GSE173508 Processing Pipeline
Publication
Discovery and functional interrogation of SARS-CoV-2 protein-RNA interactions.Research square (2022) — PMID 35313591
Dataset
GSE173508Discovery and functional interrogation of the virus and host RNA interactome of SARS-CoV-2 proteins
Processing Steps
Generate Jupyter Notebook-
1
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).
UMI-tools v1.0.0$ Bash example
# Install UMI-tools (if not already installed) # conda install -c bioconda umi-tools=1.0.0 # Reformat sequenced reads to extract Unique Molecular Identifiers (UMIs) and include them in read headers. # This command assumes UMIs are at the start of Read 1 and are 6 base pairs long. # Adjust the --bc-pattern according to the actual UMI structure and position in your reads. # For example: # - If the UMI is 6bp at the start of Read 1: --bc-pattern=NNNNNN # - If the UMI is 6bp at the start of Read 2 (for paired-end data): --bc-pattern=NNNNNN --read2-in=read2.fastq.gz --read2-out=read2_umi_extracted.fastq.gz # - If the UMI is in an index read: --bc-pattern=NNNNNN --index-in=index.fastq.gz umi_tools extract \ --input input_reads.fastq.gz \ --output output_reads_with_umi_in_header.fastq.gz \ --bc-pattern=NNNNNN \ --log umi_tools_extract.log -
2
Args: --random-seed 1 --bc-pattern NNNNNNNNNN
$ Bash example
# Install umi_tools via conda # conda install -c bioconda umi_tools # Example usage of umi_tools extract for processing Unique Molecular Identifiers (UMIs). # Replace input.fastq.gz and output.fastq.gz with your actual input and output file paths. # The --bc-pattern NNNNNNNNNN specifies a 10-nucleotide barcode at the start of the read. # The --random-seed 1 ensures reproducibility for any random processes within umi_tools. umi_tools extract --random-seed 1 --bc-pattern NNNNNNNNNN -I input.fastq.gz -S output.fastq.gz
-
3
Reads were then trimmed with cutadapt (1.14).
$ Bash example
# Install cutadapt (if not already installed) # conda install -c bioconda cutadapt=1.14 # Define input and output files (placeholders) INPUT_FASTQ="input.fastq.gz" OUTPUT_FASTQ="trimmed.fastq.gz" # Define common adapter sequence (e.g., Illumina universal adapter) ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" # Define trimming parameters MINIMUM_LENGTH=15 # Minimum read length after trimming QUALITY_CUTOFF=20 # Quality cutoff for 3' end trimming NUM_CORES=4 # Number of cores for parallel processing # Execute cutadapt command cutadapt \ -a "${ADAPTER_SEQUENCE}" \ -o "${OUTPUT_FASTQ}" \ -m ${MINIMUM_LENGTH} \ -q ${QUALITY_CUTOFF} \ --cores ${NUM_CORES} \ "${INPUT_FASTQ}" -
4
Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
$ Bash example
# Install STAR aligner (if not already installed) # conda install -c bioconda star # Clone the eCLIP repository (if not already cloned) # git clone https://github.com/YeoLab/eclip.git # cd eclip # export PATH=$(pwd)/src:$PATH # Add eclip scripts to PATH # Define input/output and reference paths (placeholders - replace with actual paths) INPUT_FASTQ="path/to/your/input_reads.fastq.gz" # Example: reads.fastq.gz or R1.fastq.gz R2.fastq.gz for paired-end OUTPUT_PREFIX="eclip_alignment_output" # Prefix for output files GENOME_DIR="path/to/STAR_index/hg38" # Path to the STAR genome index (e.g., for human hg38) # Download InvRNA*.fasta files (if not already present) # mkdir -p example_inputs # for i in $(seq 1 1560); do # wget -P example_inputs https://raw.githubusercontent.com/YeoLab/eclip/master/example/inputs/InvRNA_"$i".fasta # done # ANNOTATION_FASTA_FILES="example_inputs/InvRNA_*.fasta" # For demonstration, assuming InvRNA*.fasta are in the current directory or specified path ANNOTATION_FASTA_FILES="InvRNA*.fasta" # Adjust path if files are elsewhere # Execute the eclip_align_star.py script python eclip_align_star.py \ -i "${INPUT_FASTQ}" \ -o "${OUTPUT_PREFIX}" \ -g "${GENOME_DIR}" \ --match-read-wildcards \ -O 1 \ --times 1 \ -e 0.1 \ --quality-cutoff 6 \ -m 18 \ -a ${ANNOTATION_FASTA_FILES} -
5
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.
$ Bash example
# Install cutadapt (if not already installed) # conda install -c bioconda cutadapt=1.14 # Define input and output file paths # Assuming single-end reads for this example. Adjust for paired-end if necessary. INPUT_READS="input_reads_previously_trimmed.fastq.gz" OUTPUT_TRIMMED_READS="reads_trimmed_double_ligation.fastq.gz" # Define the 3' adapter sequence commonly associated with double-ligation events in eCLIP. # This is a placeholder; the exact adapter might vary based on the specific library preparation # and the nature of the double-ligation event. A common eCLIP 3' adapter is used here. # Example eCLIP 3' adapter from Yeo lab protocols: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" # Trim reads with cutadapt to remove the 3' adapter (double-ligation events). # The description implies a re-trimming step, so the input reads are already partially processed. # -a: 3' adapter sequence to be removed from the 3' end of the reads. # -o: Output file for trimmed reads. # -m 18: Discard reads shorter than 18 bp after trimming (a common minimum length for eCLIP reads). # -q 20,20: Trim low-quality bases from the 5' and 3' ends with a quality threshold of 20. # --cores=4: Use 4 CPU cores for parallel processing (adjust based on available resources). cutadapt -a "${ADAPTER_SEQUENCE}" \ -o "${OUTPUT_TRIMMED_READS}" \ -m 18 \ -q 20,20 \ --cores=4 \ "${INPUT_READS}" -
6
Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
$ Bash example
# Install umi_tools if not already installed # conda install -c bioconda umi_tools # Reference dataset: InvRNA_adapters.fasta # This file (e.g., InvRNA_adapters.fasta from https://github.com/YeoLab/eclip/tree/master/example/inputs/) # typically contains adapter sequences used for trimming reads, often with tools like cutadapt, # which usually precedes UMI deduplication. The '-a InvRNA*.fasta' argument in the description # is characteristic of adapter trimming, but the other arguments are specific to umi_tools dedup. # For this step, we assume the primary operation is UMI deduplication. # Placeholder for input BAM file (e.g., from alignment and adapter trimming step) # The input BAM should be sorted by coordinate for umi_tools dedup. INPUT_BAM="input_aligned_trimmed.bam" OUTPUT_BAM="output_deduplicated.bam" umi_tools dedup \ --match-read-wildcards \ -O 5 \ --times 1 \ -e 0.1 \ --quality-cutoff 6 \ -m 18 \ -I "${INPUT_BAM}" \ -S "${OUTPUT_BAM}" -
7
Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05).
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star=2.4.0i # conda install -c bioconda samtools # For indexing BAM files # --- Reference Data Setup --- # Placeholder for RepBase 18.05 FASTA file. # Actual download might require a license or specific access from GIRINST. # For demonstration, we assume 'RepBase18.05.fasta' is available in the working directory. # Example download (replace with actual source if available and licensed): # wget -O RepBase18.05.fasta.gz "http://www.girinst.org/repbase/update/RepBase18.05.fasta.gz" # gunzip RepBase18.05.fasta.gz repbase_fasta="RepBase18.05.fasta" genomeDir="RepBase_STAR_index" # Create STAR index for RepBase 18.05 # The --genomeSAindexNbases parameter is adjusted for shorter sequences typical of repeat elements. # --runThreadN: Use multiple threads for indexing to speed up the process. mkdir -p "${genomeDir}" STAR --runMode genomeGenerate \ --genomeDir "${genomeDir}" \ --genomeFastaFiles "${repbase_fasta}" \ --genomeSAindexNbases 10 \ --runThreadN 8 # --- Alignment Step --- # Input: Trimmed reads (placeholder). Replace with your actual trimmed reads file. trimmed_reads="trimmed_reads.fastq.gz" # Output prefix for STAR alignment files output_prefix="aligned_to_repeats" # Align trimmed reads to the RepBase 18.05 index # --outSAMtype BAM SortedByCoordinate: Output sorted BAM file. # --outReadsUnmapped Fastx: Output unmapped reads to a separate FASTQ file. # --outFilterMultimapNmax 100: Allow reads to map to up to 100 locations, which is common for repetitive sequences. # --alignIntronMax 1: Set to 1 to effectively disable splicing, as repeats are not typically spliced regions. # --runThreadN: Use multiple threads for alignment. STAR --runMode alignReads \ --genomeDir "${genomeDir}" \ --readFilesIn "${trimmed_reads}" \ --outFileNamePrefix "${output_prefix}" \ --outSAMtype BAM SortedByCoordinate \ --outReadsUnmapped Fastx \ --outFilterMultimapNmax 100 \ --alignIntronMax 1 \ --runThreadN 8 # Index the resulting BAM file for easier viewing and downstream processing samtools index "${output_prefix}Aligned.sortedByCoord.out.bam" -
8
Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd
$ Bash example
# Install STAR (example using conda) # conda install -c bioconda star # Note: 'human_repbase' refers to a pre-indexed STAR genome directory. # If this is a custom index including repetitive elements, ensure it's generated correctly. # For a standard human genome (e.g., hg38), you would first generate the genome index: # STAR --runThreadN 16 --runMode genomeGenerate --genomeDir /path/to/STAR_index/hg38 --genomeFastaFiles /path/to/hg38.fa --sjdbGTFfile /path/to/gencode.vXX.annotation.gtf # Align reads using STAR STAR \ --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd -
9
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2).
$ Bash example
# Install STAR (example using conda) # conda install -c bioconda star # Define variables READS_FILE="filtered_unmapped_reads.fastq.gz" # Placeholder for input reads (e.g., from a previous filtering step) GENOME_DIR="/path/to/STAR_genome_index/hg19" # Placeholder for hg19 STAR genome index directory OUTPUT_PREFIX="mapped_reads_" # Prefix for output files (e.g., mapped_reads_Aligned.sortedByCoord.out.bam) NUM_THREADS=8 # Example number of threads, adjust based on available CPU cores # Run STAR alignment STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READS_FILE}" \ --runThreadN "${NUM_THREADS}" \ --outFileNamePrefix "${OUTPUT_PREFIX}" \ --outSAMtype BAM SortedByCoordinate \ --outBAMcompression 6 \ --limitBAMsortRAM 30000000000 # Example: 30GB, adjust based on available RAM and expected BAM file size -
10
Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd
$ Bash example
# Install STAR (example using conda) # conda install -c bioconda star # Define variables GENOME_DIR="/path/to/STAR_genome_index_GRCh38" # Placeholder for GRCh38 genome index READ1_FILE="/path/to/sample_R1.fastq.gz" # Placeholder for input read 1 file OUTPUT_PREFIX="sample_aligned" # Placeholder for output file prefix # Execute STAR alignment STAR \ --runThreadN 16 \ --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READ1_FILE}" \ --outFileNamePrefix "${OUTPUT_PREFIX}" \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd -
11
Aligned reads were sorted with samtools (1.6)
$ Bash example
# Install samtools if not already installed # conda install -c bioconda samtools=1.6 # Sort an aligned BAM file # Replace input.bam with your actual input file # Replace output.sorted.bam with your desired output file name samtools sort -o output.sorted.bam input.bam
-
12
Sorted reads were collapsed with umi_tools (1.0.0).
$ Bash example
# Install umi_tools if not already installed # conda install -c bioconda umi_tools=1.0.0 # Placeholder for input sorted BAM file INPUT_SORTED_BAM="input.sorted.bam" # Placeholder for output deduplicated BAM file OUTPUT_DEDUPLICATED_BAM="output.deduplicated.bam" # Collapse sorted reads using umi_tools dedup. # This command assumes UMIs are present in the read names and uses the 'directional' method # for deduplication, which is common for RNA-seq data to preserve strand information. # Adjust --umi-separator or --extract-umi-method/--umi-tag if UMIs are stored differently # (e.g., in a BAM tag). umi_tools dedup \ --input "${INPUT_SORTED_BAM}" \ --output "${OUTPUT_DEDUPLICATED_BAM}" \ --method directional \ --log "umi_tools_dedup.log" -
13
Args: --random-seed 1 --method unique
Generic Data Processing Script (Inferred with models/gemini-2.5-flash) vUnknown$ Bash example
# This is a placeholder command as the specific tool and context are not provided. # It is assumed to be a custom script or a generic data processing step. # Replace 'my_tool_or_script.sh' with the actual executable name and specify input/output files if applicable. my_tool_or_script.sh --random-seed 1 --method unique
-
14
BAM files were used to identify peak clusters with Clipper (1.2.2).
$ Bash example
# Install CLIPper (if not already installed) # pip install clipper # Example: Download a genome size file (e.g., for hg38) if not already present # wget -O hg38.chrom.sizes http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes # Run CLIPper to identify peak clusters from BAM files # Replace 'input.bam' with your actual BAM file(s) # Replace 'hg38.chrom.sizes' with the appropriate genome size file for your reference genome # The p-value threshold (-p) is a common parameter; adjust as needed. clipper.py -b input.bam -s hg38.chrom.sizes -o output_peaks.bed -p 0.01
-
15
Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
$ Bash example
# Define input and output paths INPUT_BAM="path/to/input.bam" OUTPUT_BAM="path/to/output.bam" # Define species (reference genome) # hg19 is a human reference genome, commonly available from UCSC or NCBI. # ChlSab2_Sars is likely a specific reference, possibly for SARS-CoV-2 on Chlorocebus sabaeus (green monkey). SPECIES="hg19" # Define other parameters TIMEOUT_MS=3600000 # 1 hour in milliseconds MAX_GENES=1000000 # Assuming a custom Python script based on the arguments like --save-pickle and --maxgenes, # which are not typical for standard bioinformatics tools. The actual script name # would need to be known from the pipeline context. For demonstration, we'll use # a placeholder 'process_bam_genes.py'. This script would likely require a Python # environment with necessary libraries (e.g., pysam). # Example installation (commented out) # conda create -n myenv python=3.8 pysam # conda activate myenv # Execute the inferred custom Python script python process_bam_genes.py \ --species "${SPECIES}" \ --bam "${INPUT_BAM}" \ --timeout "${TIMEOUT_MS}" \ --maxgenes "${MAX_GENES}" \ --save-pickle \ --outfile "${OUTPUT_BAM}" -
16
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
$ Bash example
# Clone the eclip repository if not already available # git clone https://github.com/yeolab/eclip.git # export PATH=$PATH:$(pwd)/eclip/bin # Example input files (replace with actual paths) # PEAK_FILE should be a peak file with read counts, typically generated by a script like overlap_peakfi_with_bam_PE.pl # (as mentioned in the description) or another read counting tool. PEAK_FILE="peaks_with_counts.bed" IP_BAM="ip.bam" INPUT_BAM="input.bam" OUTPUT_PREFIX="normalized_peaks" # Run peaksnormalize.pl # This script normalizes peak clusters using IP and INPUT BAM files. peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}" -
17
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
$ Bash example
# Install Perl if not available # conda install -c bioconda perl # Clone the eCLIP pipeline repository to get the script # git clone https://github.com/yeolab/eclip.git # cd eclip/scripts # Define input and output file names (replace with actual paths and names) # These would be the normalized peak regions from individual replicates INPUT_PEAK_FILE_REP1="replicate1_normalized_peaks.bed" INPUT_PEAK_FILE_REP2="replicate2_normalized_peaks.bed" # Add more input files if there are more replicates OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed" # Execute the merging script # Assuming the script 'compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl' is in the current PATH or specified with its full path perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \ "${INPUT_PEAK_FILE_REP1}" \ "${INPUT_PEAK_FILE_REP2}" \ > "${OUTPUT_MERGED_PEAKS}" -
18
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
$ Bash example
# Install merge_peaks pipeline (which includes make_informationcontent_from_peaks.pl and a custom IDR script) # git clone https://github.com/yeolab/merge_peaks.git # export PATH=$PATH:$(pwd)/merge_peaks/bin # Example input files (normalized and compressed .bed files) # Assuming 'compressed.bed' refers to standard BED files that might have been gzipped and unzipped, # or simply a naming convention for processed BED files. INPUT_REP1="replicate1.bed" INPUT_REP2="replicate2.bed" OUTPUT_PREFIX="idr_output" # Step 1: Rank peaks by entropy score (information content) # make_informationcontent_from_peaks.pl is part of the merge_peaks pipeline. # It takes a BED file and outputs a new BED file with an added 'signal.value' column # representing the information content, which IDR uses for ranking. make_informationcontent_from_peaks.pl "${INPUT_REP1}" > "${INPUT_REP1%.bed}.ranked.bed" make_informationcontent_from_peaks.pl "${INPUT_REP2}" > "${INPUT_REP2%.bed}.ranked.bed" # Step 2: Use ranked peaks as inputs to IDR (version 2.0.2 logic, as implemented in merge_peaks) # The 'idr' script used here is from the merge_peaks pipeline (merge_peaks/bin/idr). # It implements the IDR 2.0.2 logic for determining reproducible peaks. idr --input-file-type bed \ --rank-by signal.value \ --output-file "${OUTPUT_PREFIX}.idr.bed" \ "${INPUT_REP1%.bed}.ranked.bed" \ "${INPUT_REP2%.bed}.ranked.bed" -
19
Reproducible peaks were filtered for those â¥20 bases in length, and not overlapping with WT negative control samples.
$ Bash example
# Install bedtools if not already installed # conda install -c bioconda bedtools # Filter peaks for length >= 20 bases awk 'BEGIN{OFS="\t"} {if ($3 - $2 >= 20) print}' reproducible_peaks.bed > reproducible_peaks_length_filtered.bed # Filter out peaks overlapping with WT negative control samples bedtools subtract -a reproducible_peaks_length_filtered.bed -b wt_negative_control.bed > final_filtered_peaks.bed
Raw Source Text
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0). Args: --random-seed 1 --bc-pattern NNNNNNNNNN Reads were then trimmed with cutadapt (1.14). Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/) Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events. Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/) Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05). Args: --runThreadN 16 \ --genomeDir human_repbase \ --readFilesIn path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2). Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd Aligned reads were sorted with samtools (1.6) Sorted reads were collapsed with umi_tools (1.0.0). Args: --random-seed 1 --method unique BAM files were used to identify peak clusters with Clipper (1.2.2). Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+. Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+ Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks. Reproducible peaks were filtered for those â¥20 bases in length, and not overlapping with WT negative control samples. Genome_build: hg19 Genome_build: ChlSab2 Genome_build: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome (MN908947.3)