GSE173498 Processing Pipeline
Publication
Discovery and functional interrogation of SARS-CoV-2 protein-RNA interactions. Research Square (2022), PMID 35313591
Dataset
GSE173498: Discovery and functional interrogation of the virus and host RNA interactome of SARS-CoV-2 proteins [eCLIP]
Processing Steps
1
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).
$ Bash example
# Install UMI-tools if not already installed
# conda install -c bioconda umi_tools=1.0.0

# Placeholders: replace input.fastq.gz with the FASTQ file that contains the
# inline randomers (UMIs) and output.fastq.gz with the desired output file.
# --bc-pattern NNNNNNNNNN treats the first 10 bases of each read as the UMI
# and moves them into the read header.
umi_tools extract \
  --random-seed 1 \
  --bc-pattern NNNNNNNNNN \
  -I input.fastq.gz \
  -S output.fastq.gz \
  --log umi_tools_extract.log
2
Args (umi_tools extract): --random-seed 1 --bc-pattern NNNNNNNNNN
$ Bash example
# These arguments belong to the umi_tools extract call described in step 1.
# Install UMI-tools if not already installed
# conda install -c bioconda umi_tools=1.0.0

# input.fastq.gz and output.fastq.gz are placeholder file names.
umi_tools extract \
  --random-seed 1 \
  --bc-pattern NNNNNNNNNN \
  --stdin input.fastq.gz \
  --stdout output.fastq.gz
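To make the effect of `--bc-pattern NNNNNNNNNN` concrete, the following is a minimal sketch that imitates the reformatting on a toy FASTQ file using only awk. It is an illustration under stated assumptions, not umi_tools itself: the file names, the `extract_umi` helper, and the two reads are hypothetical, and the sketch assumes the randomer occupies the first 10 bases of each read.

```shell
set -eu

# Toy FASTQ with two reads; the first 10 bases of each read are the randomer (UMI).
cat > toy_reads.fastq <<'EOF'
@read1 sample
ACGTACGTACTTTTGGGGCCCC
+
IIIIIIIIIIJJJJJJJJJJJJ
@read2 sample
GGGGGCCCCCAAAATTTTACGT
+
IIIIIIIIIIJJJJJJJJJJJJ
EOF

# Hypothetical helper: move the first N bases into the read name and trim
# them (and their quality values) from the read.
extract_umi() {
  awk -v n="$1" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 {
      umi = substr($0, 1, n)
      split(header, parts, " ")
      rest = substr(header, length(parts[1]) + 1)
      print parts[1] "_" umi rest      # read name gains the _UMI suffix
      print substr($0, n + 1)          # sequence without the UMI
    }
    NR % 4 == 3 { print "+" }
    NR % 4 == 0 { print substr($0, n + 1) }  # qualities without the UMI
  '
}

extract_umi 10 < toy_reads.fastq > toy_reads.umi.fastq
cat toy_reads.umi.fastq
```

Keeping the randomer in the read name is what allows the later deduplication step to recover it from the aligned BAM records.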
3
Reads were then trimmed with cutadapt (1.14).
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Placeholder file names
INPUT_READS="input_reads.fastq.gz"
TRIMMED_READS="trimmed_reads.fastq.gz"

# Placeholder adapter sequence (Illumina TruSeq Read 1 adapter).
# The dataset itself used the InvRNA*.fasta adapter files named in the Args of the next step.
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# -a: 3' adapter sequence to remove
# --quality-cutoff 6: trim low-quality bases from the 3' end
# -m 18: discard reads shorter than 18 bp after trimming
# -o: output file for trimmed reads
cutadapt -a "${ADAPTER_SEQUENCE}" --quality-cutoff 6 -m 18 -o "${TRIMMED_READS}" "${INPUT_READS}"
4
Args (cutadapt, first trimming pass): --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (FASTA sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

# Adapter FASTA: choose the matching InvRNA*.fasta file from
# https://github.com/YeoLab/eclip/tree/master/example/inputs/
# ADAPTER_FASTA is a placeholder path to that file.
ADAPTER_FASTA="path/to/InvRNA_adapters.fasta"

# First trimming pass: adapter removal and quality filtering
cutadapt \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -a "file:${ADAPTER_FASTA}" \
  -o trimmed_reads.fastq.gz \
  input_reads.fastq.gz
5
Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.14

# Placeholder file names: the input is the output of the first trimming pass
INPUT_READS="trimmed_reads.fastq.gz"
OUTPUT_READS="trimmed_twice_reads.fastq.gz"

# Placeholder adapter sequence (Illumina TruSeq Read 1 adapter).
# Replace it with the adapters actually used in the experiment.
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Second pass: remove adapter copies left behind by double-ligation events
# -O 5: require at least 5 bases of overlap between read and adapter
# -m 18: discard reads shorter than 18 bp after trimming
cutadapt -a "${ADAPTER}" -O 5 -m 18 -o "${OUTPUT_READS}" "${INPUT_READS}"
6
Args (cutadapt, second trimming pass): --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (FASTA sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)
$ Bash example
# These arguments belong to the second cutadapt pass described in step 5.
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.14

# Adapter FASTA: choose the matching InvRNA*.fasta file from
# https://github.com/YeoLab/eclip/tree/master/example/inputs/
# ADAPTER_FASTA and the read file names are placeholders.
ADAPTER_FASTA="path/to/InvRNA_adapters.fasta"

# Second trimming pass; -O 5 requires a longer adapter overlap than the first pass (-O 1)
cutadapt \
  --match-read-wildcards \
  -O 5 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -a "file:${ADAPTER_FASTA}" \
  -o trimmed_twice_reads.fastq.gz \
  trimmed_reads.fastq.gz
7
Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05).
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Define variables (all paths are placeholders)
STAR_VERSION="2.4.0i"
# RepBase 18.05 FASTA file, obtained from RepBase (e.g., http://www.girinst.org/repbase/update/index.html)
REPBASE_FASTA="repbase_18.05.fasta"
GENOME_DIR="STAR_RepBase_index"
TRIMMED_READS="trimmed_reads.fastq.gz"  # Output of the trimming steps
OUTPUT_PREFIX="repbase_mapping"

# 1. Create a STAR index for RepBase 18.05.
# For a repeat database, a GTF/GFF is typically not used and splicing is disabled.
mkdir -p "${GENOME_DIR}"
STAR --runMode genomeGenerate \
  --genomeDir "${GENOME_DIR}" \
  --genomeFastaFiles "${REPBASE_FASTA}" \
  --runThreadN 8  # Adjust threads as needed

# 2. Map trimmed reads to the RepBase index.
# --readFilesCommand zcat is needed because the input FASTQ is gzipped.
STAR --version  # Confirm that the installed version matches ${STAR_VERSION}
STAR --runMode alignReads \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${TRIMMED_READS}" \
  --readFilesCommand zcat \
  --runThreadN 8 \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 100 \
  --outFilterMismatchNmax 10 \
  --alignIntronMax 1 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 30000000000  # Adjust to available RAM (here about 30 GB)

# Optional: index the resulting BAM file
samtools index "${OUTPUT_PREFIX}_Aligned.sortedByCoordinate.bam"
8
Args (STAR, repeat element mapping):
--runThreadN 16 \
--genomeDir human_repbase \
--readFilesIn path/to/read1 \
--outFileNamePrefix out_prefix \
--outReadsUnmapped Fastx \
--outSAMtype BAM Unsorted \
--outSAMattributes All \
--outSAMunmapped Within \
--outSAMattrRGline ID:foo \
--outFilterType BySJout \
--outFilterMultimapNmax 30 \
--outFilterMultimapScoreRange 1 \
--outFilterScoreMin 10 \
--alignEndsType EndToEnd
$ Bash example
# Install STAR (example using conda)
# conda install -c bioconda star=2.4.0i

# The 'human_repbase' directory must contain a pre-built STAR index of the
# repeat element sequences, generated with STAR's genomeGenerate mode.
# Example of index generation (not part of this step):
# STAR --runThreadN <threads> --runMode genomeGenerate --genomeDir human_repbase \
#   --genomeFastaFiles /path/to/repbase_sequences.fa

# 'path/to/read1' is a placeholder for a single input FASTQ file.
# For paired-end data the argument would be '--readFilesIn path/to/read1 path/to/read2'.
STAR --runThreadN 16 \
  --genomeDir human_repbase \
  --readFilesIn path/to/read1 \
  --outFileNamePrefix out_prefix \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd
9
Reads that did not map to repeat elements were then mapped with STAR (2.4.0i) against the host genome (hg19 for human, ChlSab2 for African green monkey).
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Define variables (all paths are placeholders).
# This example assumes a combined hg19/ChlSab2 index; if each sample is mapped
# to a single host genome, build and use one index per genome instead.
GENOME_DIR="/path/to/STAR_genome_index/hg19_ChlSab2"
INPUT_FASTQ="filtered_unmapped_reads.fastq.gz"
OUTPUT_PREFIX="aligned_reads"
THREADS=8  # Adjust as needed

# Example index build (run once). It needs the FASTA files for hg19 and
# ChlSab2 (e.g., from UCSC) and, optionally, a GTF for hg19.
# STAR --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles /path/to/hg19.fa /path/to/ChlSab2.fa \
#   --sjdbGTFfile /path/to/hg19.gtf \
#   --runThreadN "${THREADS}"

# Map reads with STAR.
# --readFilesCommand zcat is needed because the input FASTQ is gzipped.
STAR --runMode alignReads \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${INPUT_FASTQ}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 20 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66 \
  --runThreadN "${THREADS}"
10
Args (STAR, genome mapping):
--runThreadN 16 \
--genomeDir genomedir \
--readFilesIn /path/to/read1 \
--outFileNamePrefix out_prefix \
--outReadsUnmapped Fastx \
--outSAMtype BAM Unsorted \
--outSAMattributes All \
--outSAMunmapped Within \
--outSAMattrRGline ID:foo \
--outFilterType BySJout \
--outFilterMultimapNmax 1 \
--outFilterMultimapScoreRange 1 \
--outFilterScoreMin 10 \
--alignEndsType EndToEnd
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.0i

# Placeholder for the genome directory (hg19 or ChlSab2 for this dataset).
# It must contain a STAR index generated with STAR --runMode genomeGenerate.
GENOME_DIR="genomedir"

# Placeholder for the input FASTQ file (uncompressed; add
# --readFilesCommand zcat if the file is gzipped).
READ_FILE_1="/path/to/read1.fastq"

# Placeholder for the output prefix
OUT_PREFIX="out_prefix"

# --outFilterMultimapNmax 1 keeps only uniquely mapping reads
STAR \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ_FILE_1}" \
  --outFileNamePrefix "${OUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd
11
Aligned reads were sorted with samtools (1.6).
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools=1.6

# Sort aligned reads (BAM format) by coordinate
# Input: aligned_reads.bam
# Output: aligned_reads.sorted.bam
samtools sort -o aligned_reads.sorted.bam aligned_reads.bam
12
Sorted reads were collapsed with umi_tools (1.0.0).
$ Bash example
# Install umi_tools if not already installed
# conda create -n umi_tools_env umi_tools=1.0.0 -c bioconda -y
# conda activate umi_tools_env

# Placeholder file names
INPUT_BAM="sorted_reads.bam"
OUTPUT_DEDUP_BAM="collapsed_reads.dedup.bam"
OUTPUT_STATS="deduplication_stats"

# umi_tools dedup needs a coordinate-sorted, indexed BAM file
samtools index "${INPUT_BAM}"

# Collapse sorted reads with umi_tools dedup.
# UMIs are read from the read name, where umi_tools extract placed them.
# --method unique collapses only reads whose UMIs are identical.
# If reads are paired-end, add --paired.
umi_tools dedup \
  --random-seed 1 \
  --method unique \
  -I "${INPUT_BAM}" \
  -S "${OUTPUT_DEDUP_BAM}" \
  --output-stats "${OUTPUT_STATS}" \
  --log umi_tools_dedup.log
13
Args (umi_tools dedup): --random-seed 1 --method unique
$ Bash example
# These arguments belong to the umi_tools dedup call described in step 12.
# sorted_reads.bam and collapsed_reads.dedup.bam are placeholder file names.
umi_tools dedup --random-seed 1 --method unique -I sorted_reads.bam -S collapsed_reads.dedup.bam
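The idea behind `--method unique` can be sketched on a toy table without umi_tools: reads that share chromosome, start position, strand, and an identical UMI count as one molecule. This is a simplified illustration, not the umi_tools implementation. The file names and records are hypothetical, and the sketch keeps the first read of each group, whereas umi_tools chooses a representative read by its own criteria.

```shell
set -eu

# Toy alignments: chrom, start, strand, read name (UMI after the last underscore).
printf '%s\t%s\t%s\t%s\n' \
  chr1 100 + readA_ACGTACGTAC \
  chr1 100 + readB_ACGTACGTAC \
  chr1 100 + readC_ACGTACGTAA \
  chr1 100 - readD_ACGTACGTAC \
  chr2 500 + readE_GGGGGCCCCC \
  chr2 500 + readF_GGGGGCCCCC \
  > toy_alignments.tsv

# Keep the first read seen for each (chrom, start, strand, UMI) combination.
awk -F'\t' '{
  n = split($4, parts, "_")
  umi = parts[n]
  key = $1 FS $2 FS $3 FS umi
  if (!(key in seen)) { seen[key] = 1; print }
}' toy_alignments.tsv > toy_dedup.tsv

cat toy_dedup.tsv
```

In this toy input, readB and readF are dropped as duplicates. readC survives because its UMI differs by one base, which is exactly the case where `unique` behaves differently from error-tolerant methods such as `directional`.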
14
BAM files were used to identify peak clusters with Clipper (1.2.2).
$ Bash example
# Install CLIPper (if not already installed); the source is at https://github.com/YeoLab/clipper

# Placeholder file names. The description mentions "BAM files" (plural),
# so this command is run once per input BAM.
INPUT_BAM="input.bam"
OUTPUT_BED="peaks.bed"

# Run CLIPper to identify peak clusters.
# --species selects the annotation set (the dataset lists hg19/ChlSab2_Sars);
# the remaining arguments used for this dataset are listed in the next step.
clipper --species hg19 --bam "${INPUT_BAM}" --outfile "${OUTPUT_BED}"
15
Args (Clipper): --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
$ Bash example
# These arguments belong to the Clipper call described in step 14.
# Replace 'path/to/input.bam' and 'path/to/output.bam' with actual file paths,
# and set --species to hg19 or ChlSab2_Sars as appropriate for the sample.
clipper \
  --species hg19 \
  --bam path/to/input.bam \
  --timeout 3600000 \
  --maxgenes 1000000 \
  --save-pickle \
  --outfile path/to/output.bam
16
Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
$ Bash example
# Clone the eclip repository if not already available
# git clone https://github.com/yeolab/eclip.git
# export PATH=$PATH:/path/to/eclip/bin

# Placeholder file names
PEAK_FILE="peaks.bed"              # Output of the peak caller (Clipper)
IP_BAM="ip_replicate1.bam"         # BAM file for the IP sample
INPUT_BAM="input_replicate1.bam"   # BAM file for the INPUT sample
OUTPUT_PREFIX="normalized_peaks"

# Normalize peak clusters with peaksnormalize.pl.
# The positional argument order shown here is an assumption, since the
# description does not give it; check the script's usage message before running.
peaksnormalize.pl "${PEAK_FILE}" "${IP_BAM}" "${INPUT_BAM}" "${OUTPUT_PREFIX}"
17
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip 0.1.5+.
$ Bash example
# Clone the eCLIP repository if not already available, or ensure the script is in your PATH.
# git clone https://github.com/yeolab/eclip.git

# Placeholder path to the directory that holds the eCLIP Perl scripts
ECLIP_SCRIPTS_DIR="path/to/eclip/scripts"

# Placeholder file names for the normalized peak regions (BED format)
INPUT_PEAKS_REP1="normalized_replicate1_peaks.bed"
INPUT_PEAKS_REP2="normalized_replicate2_peaks.bed"
OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"

# Merge overlapping normalized peak regions.
# The description gives no arguments for this script, so the call below
# (input files as positional arguments, result on standard output) is an
# assumption; check the script's usage message before running.
perl "${ECLIP_SCRIPTS_DIR}/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl" \
  "${INPUT_PEAKS_REP1}" \
  "${INPUT_PEAKS_REP2}" \
  > "${OUTPUT_MERGED_PEAKS}"
18
Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
$ Bash example
# Install IDR (e.g., via conda)
# conda install -c bioconda idr=2.0.2

# Make the merge_peaks scripts accessible (cloned or in PATH)
# git clone https://github.com/yeolab/merge_peaks.git
# export PATH=$PATH:/path/to/merge_peaks/scripts

# Placeholder file names: normalized peak files (compressed.bed) from two replicates
INPUT_REP1_BED="replicate1.compressed.bed"
INPUT_REP2_BED="replicate2.compressed.bed"

# Placeholder file names: outputs after ranking by entropy score
RANKED_REP1_BED="replicate1.ranked.bed"
RANKED_REP2_BED="replicate2.ranked.bed"

# Placeholder name for the IDR output file
IDR_OUTPUT_FILE="idr_reproducible_peaks.txt"

# Step 1: Rank normalized peak files by entropy score.
# The input/output argument order is an assumption; check the script's usage message.
perl make_informationcontent_from_peaks.pl "${INPUT_REP1_BED}" "${RANKED_REP1_BED}"
perl make_informationcontent_from_peaks.pl "${INPUT_REP2_BED}" "${RANKED_REP2_BED}"

# Step 2: Run IDR (2.0.2) to determine reproducible peaks.
# The description does not specify an IDR threshold, so none is set here.
idr --samples "${RANKED_REP1_BED}" "${RANKED_REP2_BED}" \
  --input-file-type bed \
  --output-file "${IDR_OUTPUT_FILE}"
19
Reproducible peaks were filtered for those ≥20 bases in length, and not overlapping with WT negative control samples.
$ Bash example
# The description does not name the tool used for this step. The sketch below
# swaps in awk and bedtools (an assumption, not the authors' recorded method).
# conda install -c bioconda bedtools

# Placeholder file names
REPRODUCIBLE_PEAKS="merged_reproducible_peaks.bed"
WT_CONTROL_PEAKS="WT_negative_control.bed"
FILTERED_PEAKS="filtered_reproducible_peaks.bed"

# Keep peaks of at least 20 bases (BED length = end - start), then drop any
# peak that overlaps a peak from the WT negative control samples.
awk -F'\t' '($3 - $2) >= 20' "${REPRODUCIBLE_PEAKS}" \
  | bedtools intersect -v -a stdin -b "${WT_CONTROL_PEAKS}" \
  > "${FILTERED_PEAKS}"
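The two filtering criteria can also be checked by hand on a toy example. The following is a minimal, dependency-free sketch in awk, assuming BED coordinates (0-based start, exclusive end) and ignoring strand. The file names and intervals are hypothetical and the script is not part of the recorded pipeline.

```shell
set -eu

# Toy reproducible peaks: chrom, start, end, name
printf '%s\t%s\t%s\t%s\n' \
  chr1 100 130 peak1 \
  chr1 200 215 peak2 \
  chr1 300 340 peak3 \
  chr2 50 70 peak4 \
  > toy_peaks.bed

# Toy peaks from a WT negative control sample
printf '%s\t%s\t%s\n' \
  chr1 320 360 \
  chr2 70 90 \
  > toy_control.bed

# First file: load control intervals. Second file: keep peaks that are at
# least min_len bases long and overlap no control interval.
awk -F'\t' -v min_len=20 '
  NR == FNR { n++; c_chr[n] = $1; c_start[n] = $2 + 0; c_end[n] = $3 + 0; next }
  ($3 - $2) >= min_len {
    hit = 0
    for (i = 1; i <= n; i++)
      if ($1 == c_chr[i] && ($2 + 0) < c_end[i] && ($3 + 0) > c_start[i]) { hit = 1; break }
    if (!hit) print
  }
' toy_control.bed toy_peaks.bed > toy_filtered.bed

cat toy_filtered.bed
```

Here peak2 is removed for being 15 bases long and peak3 for overlapping a control interval. peak4 is exactly 20 bases long and only touches the control interval at position 70, so it is kept.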
Raw Source Text
Sequenced reads were reformatted to include randomers in read headers with umi_tools (1.0.0).
Args: --random-seed 1 --bc-pattern NNNNNNNNNN

Reads were then trimmed with cutadapt (1.14).
Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

Reads were then trimmed once more with cutadapt (1.14) to remove double-ligation events.
Args: --match-read-wildcards -O 5 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -a InvRNA*.fasta (fasta sequences can be found at: https://github.com/YeoLab/eclip/tree/master/example/inputs/)

Trimmed reads were then mapped with STAR (2.4.0i) against a repeat element database (RepBase 18.05).
Args: --runThreadN 16 \
--genomeDir human_repbase \
--readFilesIn path/to/read1 \
--outFileNamePrefix out_prefix \
--outReadsUnmapped Fastx \
--outSAMtype BAM Unsorted \
--outSAMattributes All \
--outSAMunmapped Within \
--outSAMattrRGline ID:foo \
--outFilterType BySJout \
--outFilterMultimapNmax 30 \
--outFilterMultimapScoreRange 1 \
--outFilterScoreMin 10 \
--alignEndsType EndToEnd

Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a human genome (hg19/ChlSab2).
Args: --runThreadN 16 \
--genomeDir genomedir \
--readFilesIn /path/to/read1 \
--outFileNamePrefix out_prefix \
--outReadsUnmapped Fastx \
--outSAMtype BAM Unsorted \
--outSAMattributes All \
--outSAMunmapped Within \
--outSAMattrRGline ID:foo \
--outFilterType BySJout \
--outFilterMultimapNmax 1 \
--outFilterMultimapScoreRange 1 \
--outFilterScoreMin 10 \
--alignEndsType EndToEnd

Aligned reads were sorted with samtools (1.6)

Sorted reads were collapsed with umi_tools (1.0.0).
Args: --random-seed 1 --method unique

BAM files were used to identify peak clusters with Clipper (1.2.2).
Args: --species (hg19/ChlSab2_Sars) --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam

Peak clusters were normalized using BAM files for IP against BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.

Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+

Normalized peak (compressed.bed) files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.

Reproducible peaks were filtered for those ≥20 bases in length, and not overlapping with WT negative control samples.

Genome_build: hg19
Genome_build: ChlSab2
Genome_build: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome (MN908947.3)