GSE124023 Processing Pipeline
Publication
DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis. Life Science Alliance (2020), PMID 32817263
Processing Steps
1
Sequenced reads were removed of inline barcodes and reformatted to include randomers in read headers with eclipdemux (v0.0.1).
$ Bash example
# eclipdemux is part of the YeoLab eCLIP tooling; the source does not say how it was installed.
# pip install eclipdemux   # assumption: unverified package name

# Placeholders
INPUT_FASTQ="raw_reads.fastq.gz"
OUTPUT_FASTQ="demuxed_reads.fastq.gz"
BARCODE_FILE="inline_barcodes.txt"   # file defining the inline barcodes to remove
RANDOMER_LENGTH=10                   # recorded in the source as --length 10

# Remove inline barcodes and move the randomers into the read headers.
# Only --length is recorded in the source. The -i, -o and -b flags below are
# unverified placeholders; check the eclipdemux usage message before running.
eclipdemux -i "${INPUT_FASTQ}" -o "${OUTPUT_FASTQ}" -b "${BARCODE_FILE}" --length "${RANDOMER_LENGTH}"
-
2
Args: --length 10
$ Bash example
# "--length 10" is the randomer length passed to the eclipdemux call in step 1.
# The remaining arguments of that call are not recorded in the source.
# eclipdemux --length 10 <other arguments not recorded>
-
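The reformatting described in steps 1 and 2 can be pictured with a toy example. The sketch below is not eclipdemux's implementation: it assumes the randomer is simply the first N bases of each read, and both the function name `move_randomer_to_header` and the header layout (randomer, colon, original read name) are hypothetical choices made for illustration.

```shell
# Toy illustration only: clip the first N bases of every read in a FASTQ
# stream and store them at the front of the read header.
move_randomer_to_header() {
  # $1 = randomer length (the source records --length 10)
  awk -v n="$1" '
    NR % 4 == 1 { header = substr($0, 2); next }   # hold the read name without "@"
    NR % 4 == 2 { umi = substr($0, 1, n)           # first n bases are the randomer
                  print "@" umi ":" header         # randomer moves into the header
                  print substr($0, n + 1); next }  # remaining sequence
    NR % 4 == 3 { print "+"; next }
    NR % 4 == 0 { print substr($0, n + 1) }        # trim qualities to match
  '
}

# Example: one 16-nt read carrying a 10-nt randomer.
printf '@read1\nACGTACGTACGGATTC\n+\nIIIIIIIIIIFFFFFF\n' | move_randomer_to_header 10
```

Keeping the randomer in the header lets it travel with the read through trimming and alignment, so that it is still available when PCR duplicates are collapsed later.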
3
Reads were then trimmed with cutadapt (1.9.1).
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=1.9.1

# Input and output file names (placeholders)
INPUT_READ1="input_R1.fastq.gz"
INPUT_READ2="input_R2.fastq.gz"
OUTPUT_READ1="trimmed_R1.fastq.gz"
OUTPUT_READ2="trimmed_R2.fastq.gz"

# Generic Illumina TruSeq adapters, shown for illustration only.
# The adapter files and parameters recorded for this dataset are listed in step 4.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"   # 3' adapter found in Read 1
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"   # 3' adapter found in Read 2

# -a / -A: 3' adapter for Read 1 / Read 2
# -q 20,20: quality cutoff at the 5' and 3' ends
# -m 20: minimum read length after trimming
# -o / -p: output files for Read 1 / Read 2
cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
  -q 20,20 -m 20 \
  -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
  "${INPUT_READ1}" "${INPUT_READ2}"
-
4
Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -g g_adapters.fasta -A A_adapters.fasta -a a_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
$ Bash example
# Install cutadapt if not already available
# conda install -c bioconda cutadapt=1.9.1

# Paired-end input and output (placeholders). The -A option applies to Read 2,
# so cutadapt needs both mates and a -p output file.
INPUT_READ1="input_R1.fastq.gz"
INPUT_READ2="input_R2.fastq.gz"
OUTPUT_READ1="trimmed_R1.fastq.gz"
OUTPUT_READ2="trimmed_R2.fastq.gz"

# Adapter FASTA files generated by parsebarcodes.sh within the eclip 0.1.5+ pipeline.
# -g: 5' adapters for Read 1; -a: 3' adapters for Read 1; -A: 3' adapters for Read 2.
# The source lists bare file names; cutadapt reads adapters from a FASTA file
# only when the argument carries the file: prefix, as used below.
G_ADAPTERS_FASTA="g_adapters.fasta"
A_ADAPTERS_FASTA="A_adapters.fasta"
a_ADAPTERS_FASTA="a_adapters.fasta"

cutadapt \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -g "file:${G_ADAPTERS_FASTA}" \
  -A "file:${A_ADAPTERS_FASTA}" \
  -a "file:${a_ADAPTERS_FASTA}" \
  -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
  "${INPUT_READ1}" "${INPUT_READ2}"
-
5
Reads were then trimmed once more with cutadapt (1.9.1) to remove double-ligation events.
$ Bash example
# Install cutadapt (version 1.9.1)
# conda create -n cutadapt_env -c bioconda cutadapt=1.9.1
# conda activate cutadapt_env

# Input and output files (placeholders)
INPUT_FASTQ="input_reads.fastq.gz"
OUTPUT_FASTQ="trimmed_reads.fastq.gz"

# Generic 3' adapter sequence, shown for illustration only.
# A second trimming pass removes adapter copies left behind by double-ligation events.
# The adapter file and parameters recorded for this dataset are listed in step 6.
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# -a: 3' adapter sequence
# -m: discard reads shorter than this length after trimming
cutadapt -a "${ADAPTER_SEQUENCE}" \
  -m 15 \
  -o "${OUTPUT_FASTQ}" \
  "${INPUT_FASTQ}"
-
6
Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -A A_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
$ Bash example
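# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.9.1

# Paired-end input and output (placeholders): the reads from the first trimming pass.
# The -A option applies to Read 2, so cutadapt needs both mates and a -p output file.
INPUT_READ1="trimmed_R1.fastq.gz"
INPUT_READ2="trimmed_R2.fastq.gz"
OUTPUT_READ1="trimmed_again_R1.fastq.gz"
OUTPUT_READ2="trimmed_again_R2.fastq.gz"

# Adapter FASTA file generated by parsebarcodes.sh within the eclip 0.1.5+ pipeline.
# The source lists a bare file name; cutadapt reads adapters from a FASTA file
# only when the argument carries the file: prefix, as used below.
ADAPTER_FILE="A_adapters.fasta"

cutadapt \
  --match-read-wildcards \
  -O 1 \
  --times 1 \
  -e 0.1 \
  --quality-cutoff 6 \
  -m 18 \
  -A "file:${ADAPTER_FILE}" \
  -o "${OUTPUT_READ1}" -p "${OUTPUT_READ2}" \
  "${INPUT_READ1}" "${INPUT_READ2}"
-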
7
Trimmed reads were then mapped with STAR (2.4.0i) against a mouse-specific repeat element database (RepBase 18.05).
$ Bash example
# Install STAR (if not already installed); the source used version 2.4.0i
# conda install -c bioconda star

# Placeholders
TRIMMED_READS="trimmed_reads.fastq.gz"
MOUSE_REPBASE_STAR_INDEX="path/to/mouse_repbase_18.05_star_index"   # STAR index built from RepBase 18.05 mouse sequences
OUTPUT_PREFIX="star_repbase_mapping_"
NUM_THREADS=8   # adjust as needed

# Generic single-file invocation, shown for illustration only.
# The paired-end command line recorded for this dataset is listed in step 8.
STAR \
  --genomeDir "${MOUSE_REPBASE_STAR_INDEX}" \
  --readFilesIn "${TRIMMED_READS}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterMultimapNmax 100 \
  --runThreadN "${NUM_THREADS}"
-
8
Args: --runThreadN 16 \ --genomeDir mouse_repbase \ --readFilesIn path/to/read1 path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd
$ Bash example
# Install STAR; the source used version 2.4.0i
# conda install -c bioconda star

# STAR index directory (placeholder). In this step the index is built from the
# mouse-specific repeat element database (RepBase 18.05), not from a genome assembly.
# Example index generation (repeat FASTA path is a placeholder):
# STAR --runThreadN <num_threads> --runMode genomeGenerate --genomeDir mouse_repbase --genomeFastaFiles /path/to/mouse_repbase_18.05.fasta
GENOME_DIR="mouse_repbase"

# Input reads and output prefix (placeholders). Uncompressed FASTQ is assumed;
# add --readFilesCommand zcat for gzipped input.
READ1="path/to/read1"
READ2="path/to/read2"
OUTPUT_PREFIX="out_prefix"

STAR \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd
-
9
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a mouse genome (mm10).
$ Bash example
# Install STAR (if not already installed); the source used version 2.4.0i
# conda install -c bioconda star

# Input FASTQ file containing the reads left unmapped by the repeat-element step (placeholder)
INPUT_FASTQ="unmapped_reads.fastq.gz"

# Pre-built STAR index for the mouse genome (mm10).
# The directory should contain files such as Genome, SA and SAindex.
GENOME_INDEX_DIR="/path/to/mm10_star_index"

# Prefix for output files (e.g., mapped_reads_Aligned.sortedByCoord.out.bam)
OUTPUT_PREFIX="mapped_reads_"

# Number of threads; adjust to the available resources
NUM_THREADS=8

# Generic single-file invocation; the command line recorded for this dataset is listed in step 10.
STAR --genomeDir "${GENOME_INDEX_DIR}" \
  --readFilesIn "${INPUT_FASTQ}" \
  --readFilesCommand zcat \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes Standard
-
10
Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 /path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd
$ Bash example
# Install STAR; the source used version 2.4.0i
# conda install -c bioconda star

# The genome index must be pre-built with STAR's genomeGenerate mode.
# This step maps against the mouse genome (mm10), as stated in step 9.
STAR_GENOME_DIR="/path/to/STAR_genome_index_mm10"
READ1_FILE="/path/to/read1"
READ2_FILE="/path/to/read2"
OUTPUT_PREFIX="out_prefix"

STAR \
  --runThreadN 16 \
  --genomeDir "${STAR_GENOME_DIR}" \
  --readFilesIn "${READ1_FILE}" "${READ2_FILE}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outReadsUnmapped Fastx \
  --outSAMtype BAM Unsorted \
  --outSAMattributes All \
  --outSAMunmapped Within \
  --outSAMattrRGline ID:foo \
  --outFilterType BySJout \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --alignEndsType EndToEnd
-
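The two STAR runs are chained through the unmapped reads: with `--outReadsUnmapped Fastx`, STAR writes the unmapped mates next to its other outputs as `<prefix>Unmapped.out.mate1` and `<prefix>Unmapped.out.mate2`, and those files are the input to the genome mapping. The sketch below only builds and prints the file names and a shortened second command; the helper name `unmapped_mate` and the `repeat_` prefix are hypothetical.

```shell
# Return the path STAR uses for an unmapped mate, given the output prefix
# of the repeat-element run and the mate number (1 or 2).
unmapped_mate() {
  printf '%s\n' "${1}Unmapped.out.mate${2}"
}

REPEAT_PREFIX="repeat_"   # hypothetical --outFileNamePrefix of the RepBase run

# Dry run: print (rather than execute) the start of the mm10 mapping command.
echo "STAR --genomeDir genomedir --readFilesIn" \
  "$(unmapped_mate "${REPEAT_PREFIX}" 1)" \
  "$(unmapped_mate "${REPEAT_PREFIX}" 2)"
```

Note also the one parameter that differs between the two recorded command lines: `--outFilterMultimapNmax` is 30 for the repeat database and 1 for mm10, so only uniquely mapping reads are retained in the genome alignment.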
11
Aligned reads were sorted with samtools (1.4.1)
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools=1.4.1

# Sort aligned reads by coordinate.
# aligned_reads.bam and sorted_aligned_reads.bam are placeholders for the
# input and output BAM files.
samtools sort -o sorted_aligned_reads.bam aligned_reads.bam
-
12
Sorted reads were collapsed with barcodecollapsepe.py included in eclip 0.1.5+ pipelines.
$ Bash example
# barcodecollapsepe.py is included in the eclip 0.1.5+ pipeline; make sure it is on PATH.
# It works on the sorted BAM file from the previous step, not on FASTQ files.
# The arguments below are taken verbatim from the source (see step 13).
# Judging by the flag letters, -b is presumably the input BAM, -o the output BAM
# and -m the metrics file (assumption; check the script's usage message).
barcodecollapsepe.py -o preRmDup.bam -m metrics_file -b rmDup.bam
-
13
Args: -o preRmDup.bam -m metrics_file -b rmDup.bam
$ Bash example
# These arguments belong to the barcodecollapsepe.py call in step 12 (eclip 0.1.5+ pipeline).
barcodecollapsepe.py -o preRmDup.bam -m metrics_file -b rmDup.bam
-
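The idea behind collapsing reads by randomer can be shown on a toy table. This is not how barcodecollapsepe.py is implemented (the script works on paired-end BAM records); the sketch assumes tab-delimited text with the columns chromosome, start, strand and randomer, and treats records that agree on all four as PCR duplicates. The function name `collapse_by_randomer` is hypothetical.

```shell
# Toy illustration only: keep the first record for every distinct
# (chromosome, start, strand, randomer) combination.
collapse_by_randomer() {
  awk -F '\t' '!seen[$1 FS $2 FS $3 FS $4]++'
}

# Three reads at the same position: two share a randomer (PCR duplicates),
# the third has a different randomer and is kept as an independent molecule.
printf 'chr1\t100\t+\tACGTACGTAC\nchr1\t100\t+\tACGTACGTAC\nchr1\t100\t+\tTTTTACGTAC\n' \
  | collapse_by_randomer
```

Reads that align to the same position but carry different randomers survive, which is what distinguishes this approach from purely position-based duplicate marking.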
14
PCR de-duped reads from each inline barcode were then merged with samtools (1.4.1) merge (merged.bam)
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools=1.4.1

# Merge PCR de-duped reads from each inline barcode.
# input_deduped_barcode_1.bam and input_deduped_barcode_2.bam are placeholders
# for the de-duped BAM files.
samtools merge merged.bam input_deduped_barcode_1.bam input_deduped_barcode_2.bam
-
15
Merged alignments were split to keep just read2 using samtools (1.4.1) view.
$ Bash example
# Input: merged_alignments.bam (placeholder for the merged alignment file)
# Output: read2_alignments.bam (placeholder for the output file containing only read2)

# Keep only read2 of each pair.
# -b: output in BAM format
# -f 0x80: keep alignments with the "second in pair" flag set (0x80 = 128)
samtools view -b -f 0x80 merged_alignments.bam > read2_alignments.bam
-
16
Args: -h -b -f 128
$ Bash example
# These arguments belong to the samtools (1.4.1) view call in step 15.
# -h: include the header; -b: output BAM; -f 128: keep only "second in pair" alignments.
# File names are placeholders.
samtools view -h -b -f 128 merged.bam > read2.bam
-
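The value 128 passed to `-f` is a single bit of the SAM FLAG field (0x80, "second in pair"). A small sketch of the bit test follows; the helper name `is_read2` and the example flag values are chosen for illustration.

```shell
# Succeed when bit 128 (0x80, "second in pair") is set in the SAM flag $1.
is_read2() {
  [ $(( $1 & 128 )) -ne 0 ]
}

# Common paired-end flag values: 83 and 99 are first-in-pair, 147 and 163 second-in-pair.
for flag in 83 163 147 99; do
  if is_read2 "${flag}"; then
    echo "${flag}: read2 (kept by -f 128)"
  else
    echo "${flag}: read1 (dropped by -f 128)"
  fi
done
```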
17
Read2 BAM files were used to identify peak clusters with Clipper (1.2.2).
$ Bash example
# Install CLIPper 1.2.2 (YeoLab); the source does not say how it was installed.

# Placeholders for the input read2 BAM file and the output peak file
INPUT_BAM="input_read2.bam"
OUTPUT_PEAKS="output_peaks.bed"

# Call peak clusters on the mouse mm10 assembly, as recorded in the source.
# The executable name "clipper" is an assumption; the full recorded argument list is in step 18.
clipper --species mm10 --bam "${INPUT_BAM}" --outfile "${OUTPUT_PEAKS}"
-
18
Args: --species mm10 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
$ Bash example
# These arguments belong to the Clipper (1.2.2) call in step 17.
# The executable name "clipper" is an assumption; paths are as given in the source.
# Reference genome: mm10 (mouse, GRCm38 assembly)
clipper --species mm10 \
  --bam path/to/input.bam \
  --timeout 3600000 \
  --maxgenes 1000000 \
  --save-pickle \
  --outfile path/to/output.bam
-
19
Peak clusters were normalized using read2 BAM files for IP against read2 BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
$ Bash example
# Make the eCLIP scripts available, for example by cloning the repository:
# git clone https://github.com/yeolab/eclip.git
# and adding the directory that contains the scripts to PATH.

# Placeholders for the IP and INPUT read2 BAM files and the peak file to normalize
IP_READ2_BAM="ip_sample_read2.bam"
INPUT_READ2_BAM="input_sample_read2.bam"
PEAK_FILE="initial_peak_clusters.bed"
OUTPUT_PREFIX="normalized_peak_clusters"

# Normalize peak clusters: read2 BAM for IP against read2 BAM for INPUT.
# The source does not record this script's arguments. The option names below are
# illustrative placeholders, not the script's documented interface; check the
# usage message of peaksnormalize.pl / overlap_peakfi_with_bam_PE.pl.
peaksnormalize.pl \
  --ip_bam "${IP_READ2_BAM}" \
  --input_bam "${INPUT_READ2_BAM}" \
  --peak_file "${PEAK_FILE}" \
  --output_prefix "${OUTPUT_PREFIX}"
-
20
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
$ Bash example
# Clone the eCLIP pipeline repository to get the script
# git clone https://github.com/yeolab/eclip.git
# cd eclip

# Placeholders for the normalized peak files and the merged output
INPUT_PEAKS_REP1="replicate1_normalized_peaks.bed"
INPUT_PEAKS_REP2="replicate2_normalized_peaks.bed"
OUTPUT_MERGED_PEAKS="merged_replicate_overlapping_peaks.bed"

# Merge overlapping normalized peak regions.
# The source does not record this script's arguments. The script location, the
# argument order and the use of stdout below are assumptions; check the script's
# usage message.
perl scripts/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
  "${INPUT_PEAKS_REP1}" \
  "${INPUT_PEAKS_REP2}" \
  > "${OUTPUT_MERGED_PEAKS}"
-
21
Normalized peak files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
$ Bash example
# Install IDR (if not already installed)
# conda install -c bioconda idr=2.0.2

# Inputs are the normalized peak files ranked by entropy score, produced by
# make_informationcontent_from_peaks.pl from the merge_peaks pipeline.
# The source does not record that script's arguments, so no call is shown here.
INPUT_PEAKS_REP1="replicate1.ranked.bed"
INPUT_PEAKS_REP2="replicate2.ranked.bed"
OUTPUT_PREFIX="sample_id"

# Run IDR to determine reproducible peaks between replicates.
# --samples takes the two replicate files; --input-file-type bed is needed
# because the default input type is narrowPeak. The exact options used for
# this dataset are not recorded in the source.
idr --samples "${INPUT_PEAKS_REP1}" "${INPUT_PEAKS_REP2}" \
  --input-file-type bed \
  --plot \
  --log-output-file "${OUTPUT_PREFIX}.idr.log" \
  --output-file "${OUTPUT_PREFIX}.idr.peaks"
-
22
Reproducible peaks were annotated by overlapping peak regions with Gencode M10 annotations
$ Bash example
# Install bedtools
# conda install -c bioconda bedtools

# Input and output files
REPRODUCIBLE_PEAKS="reproducible_peaks.bed"   # placeholder: reproducible peak regions (e.g., output from IDR)
GENCODE_GTF_URL="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M10/gencode.vM10.annotation.gtf.gz"
GENCODE_GENES_BED="gencode.vM10.genes.bed"    # placeholder: Gencode M10 gene annotations in BED format
ANNOTATED_PEAKS="annotated_peaks.bed"

# Download the Gencode M10 annotation GTF (if not already present)
# wget -nc "${GENCODE_GTF_URL}"

# Extract gene features from the GTF and convert them to BED format.
# This assumes peaks are annotated with whole-gene regions; a finer conversion
# (exons, transcripts) may be needed depending on the annotation requirements.
# Example using gawk (columns: chr, start, end, gene_id|gene_name, score, strand):
# zcat gencode.vM10.annotation.gtf.gz | awk '$3 == "gene" {
#   split($9, a, ";");
#   gene_id=""; gene_name="";
#   for (i=1; i<=length(a); i++) {
#     if (a[i] ~ /gene_id/) { gene_id = substr(a[i], index(a[i], "\"")+1, length(a[i])-index(a[i], "\"")-1); }
#     if (a[i] ~ /gene_name/) { gene_name = substr(a[i], index(a[i], "\"")+1, length(a[i])-index(a[i], "\"")-1); }
#   }
#   print $1"\t"$4-1"\t"$5"\t"gene_id"|"gene_name"\t0\t"$7
# }' > "${GENCODE_GENES_BED}"

# Overlap reproducible peaks with the Gencode M10 gene annotations.
# -a: reproducible peaks BED file
# -b: Gencode gene annotations BED file
# -loj: left outer join; every peak in A is reported, followed by each overlapping
#       entry in B. Peaks without an overlap get a null B record ("." and -1).
bedtools intersect -a "${REPRODUCIBLE_PEAKS}" -b "${GENCODE_GENES_BED}" -loj > "${ANNOTATED_PEAKS}"
Raw Source Text
Sequenced reads were removed of inline barcodes and reformatted to include randomers in read headers with eclipdemux (v0.0.1).
Args: --length 10
Reads were then trimmed with cutadapt (1.9.1).
Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -g g_adapters.fasta -A A_adapters.fasta -a a_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
Reads were then trimmed once more with cutadapt (1.9.1) to remove double-ligation events.
Args: --match-read-wildcards -O 1 --times 1 -e 0.1 --quality-cutoff 6 -m 18 -A A_adapters.fasta (fasta sequences generated from parsebarcodes.sh within the eclip 0.1.5+ pipeline)
Trimmed reads were then mapped with STAR (2.4.0i) against a mouse-specific repeat element database (RepBase 18.05).
Args: --runThreadN 16 \ --genomeDir mouse_repbase \ --readFilesIn path/to/read1 path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 30 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd
Unmapped reads filtered of repeat elements were then mapped with STAR (2.4.0i) against a mouse genome (mm10).
Args: --runThreadN 16 \ --genomeDir genomedir \ --readFilesIn /path/to/read1 /path/to/read2 \ --outFileNamePrefix out_prefix \ --outReadsUnmapped Fastx \ --outSAMtype BAM Unsorted \ --outSAMattributes All \ --outSAMunmapped Within \ --outSAMattrRGline ID:foo \ --outFilterType BySJout \ --outFilterMultimapNmax 1 \ --outFilterMultimapScoreRange 1 \ --outFilterScoreMin 10 \ --alignEndsType EndToEnd
Aligned reads were sorted with samtools (1.4.1)
Sorted reads were collapsed with barcodecollapsepe.py included in eclip 0.1.5+ pipelines.
Args: -o preRmDup.bam -m metrics_file -b rmDup.bam
PCR de-duped reads from each inline barcode were then merged with samtools (1.4.1) merge (merged.bam)
Merged alignments were split to keep just read2 using samtools (1.4.1) view.
Args: -h -b -f 128
Read2 BAM files were used to identify peak clusters with Clipper (1.2.2).
Args: --species mm10 --bam path/to/input.bam --timeout 3600000 --maxgenes 1000000 --save-pickle --outfile path/to/output.bam
Peak clusters were normalized using read2 BAM files for IP against read2 BAM files for INPUT with peaksnormalize.pl (overlap_peakfi_with_bam_PE.pl), included in eclip 0.1.5+.
Overlapping normalized peak regions were merged with compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl, included within eclip-0.1.5+
Normalized peak files were ranked by entropy score (make_informationcontent_from_peaks.pl included within the merge_peaks pipeline) and used as inputs to IDR (2.0.2) to determine reproducible peaks.
Reproducible peaks were annotated by overlapping peak regions with Gencode M10 annotations
Genome_build: mm10
Supplementary_files_format_and_content: tab-delimited text files include -log10pValues and Log2FoldChange values for each IDR peaks called after cutoff of 3,3 in each parameter
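The "cutoff of 3,3" on the supplementary files can be sketched as a simple row filter. The source does not give the column layout of those files or say whether the cutoff is inclusive, so the sketch takes the two column numbers as arguments and assumes values greater than or equal to 3 pass; the function name `filter_peaks` and the example rows are hypothetical.

```shell
# Keep rows whose -log10(p-value) and log2 fold change are both >= 3.
# $1 = 1-based column holding -log10(p-value)
# $2 = 1-based column holding log2 fold change
filter_peaks() {
  awk -F '\t' -v p="$1" -v f="$2" '$p >= 3 && $f >= 3'
}

# Example rows: chrom, start, end, -log10(p-value), log2 fold change.
# The second row fails the p-value cutoff and is dropped.
printf 'chr1\t10\t50\t4.2\t3.5\nchr1\t60\t90\t2.9\t5.0\nchr2\t5\t40\t3.0\t3.0\n' \
  | filter_peaks 4 5
```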