GSE80039 Processing Pipeline
Publication
Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nature Methods (2016), PMID 27018577
Dataset
GSE80039: Enhanced CLIP (eCLIP) enables robust and scalable transcriptome-wide discovery and characterization of RNA binding protein binding sites [eCLIP - Hep…
Processing Steps
1
Library strategy: eCLIP-seq
$ Bash example
# Placeholder for running the eCLIP CWL workflow.
# 'eclip.cwl' stands for the main workflow definition file and 'eclip_inputs.yaml'
# for a job file listing the input FASTQ files, the genome reference (e.g., hg38),
# and other required parameters.
#
# Hypothetical 'eclip_inputs.yaml' for a human (hg38) sample; the actual input
# names are defined by the workflow's .cwl files:
# fastq_r1: { class: File, path: "sample_R1.fastq.gz" }
# fastq_r2: { class: File, path: "sample_R2.fastq.gz" }
# genome_fasta: { class: File, path: "/path/to/hg38.fa" }
# genome_star_index: { class: Directory, path: "/path/to/hg38_star_index" }
# adapter_sequence: "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # example adapter
# output_dir: "eclip_results"
#
# For setup and execution details, see the yeolab/eclip GitHub repository:
# https://github.com/yeolab/eclip/
#
# Install cwltool (if not already installed):
# conda install -c conda-forge cwltool
# or
# pip install cwltool
#
# Clone the eCLIP CWL workflow repository:
# git clone https://github.com/yeolab/eclip.git
# cd eclip
#
# Run the workflow (replace 'eclip.cwl' and 'eclip_inputs.yaml' with the actual paths):
cwltool eclip.cwl eclip_inputs.yaml
2
Takes the raw FASTQ files as input.
$ Bash example
# Trim Galore! is not part of the documented eCLIP pipeline, whose first trimming
# round (cutadapt round 1, step 4) starts from the raw FASTQ files. It is shown only
# as a generic alternative; its auto-detected Illumina adapter is unlikely to remove
# the barcode-specific and 5' adapter sequences listed in step 4.
# Install Trim Galore! (if not already installed)
# conda install -c bioconda trim-galore

# Raw paired-end FASTQ files (replace with actual file paths)
INPUT_FASTQ_R1="sample_R1.fastq.gz"
INPUT_FASTQ_R2="sample_R2.fastq.gz"

# Output directory for trimmed FASTQ files
OUTPUT_DIR="./trimmed_fastq"
mkdir -p "${OUTPUT_DIR}"

# Adapter trimming and quality filtering of paired-end reads; Trim Galore!
# auto-detects the adapter type and calls Cutadapt internally.
trim_galore --paired \
  --output_dir "${OUTPUT_DIR}" \
  "${INPUT_FASTQ_R1}" \
  "${INPUT_FASTQ_R2}"
3
Run to trim off both 5' and 3' adapters on both reads.
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt

# Input and output file paths (placeholders)
READ1_IN="read1.fastq.gz"
READ2_IN="read2.fastq.gz"
READ1_OUT="trimmed_read1.fastq.gz"
READ2_OUT="trimmed_read2.fastq.gz"
REPORT_FILE="cutadapt_report.txt"

# Adapter sequences copied from the round-1 command in step 4.
# Replace them if your library uses different barcodes or adapters.
ADAPTER_R1_3PRIME="NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"   # -a: 3' adapter on read 1 (N = any base)
ADAPTERS_R1_5PRIME=( -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT )  # -g: 5' adapters on read 1
ADAPTERS_R2_3PRIME=(                                             # -A: 3' adapters on read 2
  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA
  -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA
  -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC
  -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC
  -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT
)

# --quality-cutoff 6: trim low-quality 3' ends (Phred cutoff 6)
# -m 18: discard pairs in which either read is shorter than 18 nt after trimming
# -o / -p: output files for read 1 / read 2
cutadapt \
  -a "${ADAPTER_R1_3PRIME}" \
  "${ADAPTERS_R1_5PRIME[@]}" \
  "${ADAPTERS_R2_3PRIME[@]}" \
  --quality-cutoff 6 \
  -m 18 \
  -o "${READ1_OUT}" \
  -p "${READ2_OUT}" \
  "${READ1_IN}" "${READ2_IN}" > "${REPORT_FILE}" 2>&1
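To make the 3' adapter rule concrete, here is a minimal exact-match sketch in bash; the function name trim_3prime is hypothetical and it is not how cutadapt is implemented. Like cutadapt's -a/-A adapters, it removes the first adapter occurrence and everything after it, and it also trims a partial adapter running off the end of the read when at least MIN_OVERLAP bases match (compare cutadapt -O). Unlike cutadapt, it allows no mismatches (-e) and does not handle 5' adapters (-g). The adapter used is the read-1 adapter from step 4 without its NNNNN prefix.

```shell
#!/usr/bin/env bash
# Exact-match sketch of 3' adapter trimming (no mismatches, unlike cutadapt -e).
# Usage: trim_3prime READ ADAPTER [MIN_OVERLAP]
trim_3prime() {
  local read=$1 adapter=$2 min_overlap=${3:-5}
  local n=${#read} i suffix
  # Scan start positions left to right; the first hit defines the cut point.
  for (( i = 0; i <= n - min_overlap; i++ )); do
    suffix=${read:i}
    # Full adapter inside the read, or the read ends with a prefix of the adapter
    if [[ "$suffix" == "$adapter"* || "$adapter" == "$suffix"* ]]; then
      printf '%s\n' "${read:0:i}"
      return 0
    fi
  done
  printf '%s\n' "$read"   # no adapter found
}

ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
trim_3prime "ACGTACGTAGATCGGAAGAGC" "$ADAPTER"                       # prints ACGTACGT (partial adapter at the end)
trim_3prime "ACGTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTTT" "$ADAPTER"  # prints ACGT (full adapter plus extra bases)
trim_3prime "ACGTACGTAC" "$ADAPTER"                                  # prints ACGTACGTAC (no adapter)
```

Raising MIN_OVERLAP makes the function ignore short adapter fragments at the read end, which is the trade-off cutadapt's -O controls: fewer spurious trims of genuine sequence at the cost of leaving a few adapter bases behind.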
4
Command: … --quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
$ Bash example
# Install cutadapt
# conda install -c bioconda cutadapt
#
# '--quality-cutoff' is a cutadapt option, not a separate program. The start of the
# documented command is truncated in the source, so any options that preceded
# --quality-cutoff (compare the round-2 command in step 7) are not reproduced here.
cutadapt --quality-cutoff 6 -m 18 \
  -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT \
  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA \
  -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA \
  -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC \
  -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC \
  -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT \
  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz \
  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz \
  /full/path/to/files/file_R1.C01.fastq.gz \
  /full/path/to/files/file_R2.C01.fastq.gz \
  > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
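The 20 read-2 adapters passed with -A in this command (and again in round 2) follow a regular pattern: they appear to be every 15-nt window over two 27-nt sequences that differ in their first seven bases and share their last 20. The sketch below regenerates the list in the documented order; the two source sequences are reconstructed from the list itself, not quoted from the eCLIP protocol, and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Rebuild the read-2 (-A) adapter list as sliding 15-mers over two sequences.
# ASSUMPTION: SEQ_A and SEQ_B were reconstructed from the documented list
# (each is the concatenation of its overlapping 15-mers).
SEQ_A="AACTTGTAGATCGGAAGAGCGTCGTGT"
SEQ_B="AGGACCAAGATCGGAAGAGCGTCGTGT"
K=15

adapters=()
for (( i = 0; i + K <= ${#SEQ_A}; i++ )); do
  kmer_a=${SEQ_A:i:K}
  kmer_b=${SEQ_B:i:K}
  adapters+=("$kmer_a")
  # From offset 7 on both sequences are identical; add the second k-mer only if it differs
  if [[ "$kmer_b" != "$kmer_a" ]]; then
    adapters+=("$kmer_b")
  fi
done

# Turn the list into cutadapt arguments: -A SEQ -A SEQ ...
cutadapt_args=()
for seq in "${adapters[@]}"; do
  cutadapt_args+=(-A "$seq")
done

printf '%s\n' "${adapters[@]}"
```

The resulting array can be spliced into a command as cutadapt "${cutadapt_args[@]}" ..., which keeps each option and its sequence as separate arguments.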
5
Takes output from cutadapt round 1.
$ Bash example
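# Install cutadapt if not already available
# conda install -c bioconda cutadapt
#
# Cutadapt round 2 takes BOTH paired outputs of round 1 (step 4) and trims the
# read-2 3' adapters a second time (steps 6-7); it does not trim poly-A tails.
R1_IN="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
R2_IN="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"

# Round-2 outputs follow the documented naming: <round-1 name minus .fastq.gz>.round2.fastq.gz
R1_OUT="${R1_IN%.fastq.gz}.round2.fastq.gz"
R2_OUT="${R2_IN%.fastq.gz}.round2.fastq.gz"

# read2_adapters.fa is a HYPOTHETICAL FASTA file holding the 20 read-2 adapters listed
# in step 7; cutadapt reads adapters from a FASTA file via the 'file:' prefix.
cutadapt --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 \
  -A file:read2_adapters.fa \
  -o "${R1_OUT}" -p "${R2_OUT}" \
  "${R1_IN}" "${R2_IN}" > "${R1_OUT%.fastq.gz}.metrics"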
6
Run to trim off the 3' adapters on read 2, to control for double ligation events.
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt

# Input and output files (placeholders); read 1 is listed first, as in step 7
READ1_INPUT="sample_R1.fastq.gz"
READ2_INPUT="sample_R2.fastq.gz"
READ1_TRIMMED="sample_R1_trimmed.fastq.gz"
READ2_TRIMMED="sample_R2_trimmed.fastq.gz"

# Simplified stand-in: the standard Illumina TruSeq read-2 adapter. The documented
# eCLIP command (step 7) instead passes 20 barcode-specific 15-mers with -A.
READ2_ADAPTER="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# -A: 3' adapter trimmed from read 2 only (read 1 gets no -a in this round)
# -o / -p: output files for read 1 / read 2
# -m 18: discard pairs in which either read is shorter than 18 nt, as in the eCLIP pipeline
cutadapt -A "${READ2_ADAPTER}" \
  -o "${READ1_TRIMMED}" \
  -p "${READ2_TRIMMED}" \
  -m 18 \
  "${READ1_INPUT}" \
  "${READ2_INPUT}"
7
Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
$ Bash example
# Install cutadapt (e.g., using conda); '-f fastq' below is accepted by older (1.x)
# cutadapt releases, so drop it if your version rejects it
# conda install -c bioconda cutadapt

# Input and output files
INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
INPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"
OUTPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
OUTPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"
METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics"

# Read-2 adapters. Option and sequence must be separate array elements so that
# "${ADAPTERS[@]}" expands to -A SEQ -A SEQ ... rather than to single "-A SEQ" words.
ADAPTERS=(
  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA
  -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA
  -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC
  -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC
  -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT
)

# Run cutadapt (the report goes to stdout and is redirected to the metrics file)
cutadapt -f fastq \
  --match-read-wildcards \
  --times 1 \
  -e 0.1 \
  -O 5 \
  --quality-cutoff 6 \
  -m 18 \
  "${ADAPTERS[@]}" \
  -o "${OUTPUT_R1}" \
  -p "${OUTPUT_R2}" \
  "${INPUT_R1}" \
  "${INPUT_R2}" \
  > "${METRICS_FILE}"
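The -e 0.1 and -O 5 settings in the round-2 command interact with adapter length. As we read the cutadapt manual, the number of allowed errors is the error rate multiplied by the length of the matching adapter segment, rounded down (adapters without N wildcards, as here). The sketch below reproduces only that arithmetic; the helper allowed_errors is hypothetical and takes the rate in percent because bash has only integer arithmetic.

```shell
#!/usr/bin/env bash
# Allowed errors = floor(error rate x length of the matching adapter segment).
# The rate is passed in percent; integer division of positive numbers rounds down.
allowed_errors() {
  local match_len=$1 rate_percent=$2
  echo $(( match_len * rate_percent / 100 ))
}

# -e 0.1 corresponds to 10 percent; -O 5 makes 5 nt the shortest accepted match
for len in 5 9 10 15 20; do
  echo "match length ${len} nt: $(allowed_errors "$len" 10) error(s) allowed"
done
```

The loop prints 0, 0, 1, 1, and 2 allowed errors: a full 15-nt read-2 adapter tolerates one mismatch or indel, while a 5-nt partial adapter at the read end, the shortest match permitted by -O 5, must match exactly.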
8
Takes output from cutadapt round 2.
$ Bash example
# The paired outputs of cutadapt round 2 (step 7) are the inputs of the STAR
# repeat-element mapping step (step 10); poly-A trimming is not part of the
# documented pipeline.
READ1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
READ2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"

# Optional sanity check: both mates must still hold the same number of reads
# (4 lines per FASTQ record)
n1=$(( $(gzip -dc "${READ1}" | wc -l) / 4 ))
n2=$(( $(gzip -dc "${READ2}" | wc -l) / 4 ))
if [ "${n1}" -ne "${n2}" ]; then
  echo "read 1 (${n1}) and read 2 (${n2}) are out of sync" >&2
fi
9
Maps reads to a human-specific version of RepBase to remove repetitive elements; this helps control for spurious artifacts from rRNA and other repetitive reads.
$ Bash example
# BBDuk is not part of the documented pipeline, which maps reads to RepBase with
# STAR (step 10); it is shown as a k-mer-based alternative for repeat filtering.
# Install BBTools (includes BBDuk)
# conda install -c bioconda bbmap

# Input and output FASTQ files (replace with actual paths); for paired-end data,
# also pass in2=/out2= so that mates stay in sync
INPUT_READS="input_reads.fastq.gz"
OUTPUT_FILTERED_READS="filtered_reads.fastq.gz"

# FASTA of human repetitive elements, prepared beforehand, e.g., from the RepBase
# database (Genetic Information Research Institute, GIRI) or from RepeatMasker
# annotations of the human reference genome (e.g., GRCh38). If it includes rRNA
# sequences, this single step removes both rRNA and other repeat-derived reads.
HUMAN_REPBASE_FASTA="path/to/human_repbase_elements.fa"

# BBDuk discards reads that share k-mers with the reference; it does not align them.
# in / out: input reads and filtered output reads
# ref: FASTA of repetitive element sequences
# k=31: k-mer length used for matching
# hdist=1: allow one mismatch (Hamming distance) per k-mer
# stats: summary of which reference sequences were matched
# overwrite=true: replace existing output files
bbduk.sh in="${INPUT_READS}" \
  ref="${HUMAN_REPBASE_FASTA}" \
  out="${OUTPUT_FILTERED_READS}" \
  k=31 \
  hdist=1 \
  stats="${OUTPUT_FILTERED_READS}.stats" \
  overwrite=true
10
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
$ Bash example
# Directory of the STAR-indexed human RepBase database (placeholder; replace with
# the actual path)
GENOME_DIR="/path/to/RepBase_human_database_file"

# Input FASTQ files (outputs of cutadapt round 2)
READ1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
READ2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"

# OUTPUT_PREFIX names STAR's auxiliary files (Log.out, SJ.out.tab, Unmapped.out.mate1/2, ...);
# because it ends in ".bam", they are named like "...rep.bamLog.out".
# The alignments themselves go to stdout (--outStd BAM_Unsorted) and are redirected
# to FINAL_BAM_OUTPUT.
OUTPUT_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
FINAL_BAM_OUTPUT="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"

STAR --runMode alignReads \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --genomeLoad LoadAndRemove \
  --readFilesIn "${READ1}" "${READ2}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMattributes All \
  --readFilesCommand zcat \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > "${FINAL_BAM_OUTPUT}"
11
Takes output from STAR rmRep.
$ Bash example
# "rmRep" refers to repeat removal (step 9), not to removal of replicates or duplicates.
# With --outReadsUnmapped Fastx, STAR rmRep (step 10) writes the reads that did not
# map to RepBase to <prefix>Unmapped.out.mate1 and <prefix>Unmapped.out.mate2.
# These uncompressed FASTQ files are the input of genome mapping (step 13);
# PCR duplicates are removed later by barcode_collapse_pe.py (step 16).
PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
MATE1="${PREFIX}Unmapped.out.mate1"
MATE2="${PREFIX}Unmapped.out.mate2"

# Reads remaining after repeat removal (4 lines per FASTQ record)
echo "read 1 records: $(( $(wc -l < "${MATE1}") / 4 ))"
echo "read 2 records: $(( $(wc -l < "${MATE2}") / 4 ))"
12
Maps the reads that remain after repeat removal to the human genome, keeping only uniquely mapping reads.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.7.10a

# --- Reference data setup (example using GRCh38 and GENCODE v38) ---
# Download the human genome FASTA (e.g., from UCSC)
# wget -P /path/to/references/ https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
# gunzip /path/to/references/hg38.fa.gz
# Download the GENCODE v38 GTF annotation
# wget -P /path/to/references/ https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.annotation.gtf.gz
# gunzip /path/to/references/gencode.v38.annotation.gtf.gz
# Build the STAR genome index (once per reference)
# mkdir -p /path/to/STAR_index/GRCh38_gencode_v38
# STAR --runThreadN 8 \
#   --runMode genomeGenerate \
#   --genomeDir /path/to/STAR_index/GRCh38_gencode_v38 \
#   --genomeFastaFiles /path/to/references/hg38.fa \
#   --sjdbGTFfile /path/to/references/gencode.v38.annotation.gtf \
#   --sjdbOverhang 100   # typically read length - 1

# --- Alignment ---
# Maps reads to the human genome (GRCh38), keeping only uniquely mapping reads
# (--outFilterMultimapNmax 1, as in the documented command in step 13).
# Input: input_reads.fastq.gz (gzipped, hence --readFilesCommand zcat)
# Output: output_prefix_Aligned.sortedByCoord.out.bam (coordinate-sorted BAM)
STAR --runThreadN 8 \
  --genomeDir /path/to/STAR_index/GRCh38_gencode_v38 \
  --readFilesIn input_reads.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix output_prefix_ \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterMultimapNmax 1 \
  --outFilterMismatchNmax 10 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66
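Whether reads mapped uniquely can be checked afterwards from STAR's NH:i:n tag, the number of loci a read aligns to, which STAR writes with both its Standard and All SAM attribute sets. A minimal sketch in POSIX awk, wrapped in a hypothetical bash function count_unique_mappers, counts mapped SAM records carrying NH:i:1; it counts records, so each mate of a pair is counted separately.

```shell
#!/usr/bin/env bash
# Count mapped SAM records that carry NH:i:1 (the read aligns to exactly one locus).
# Reads SAM text without header from stdin.
count_unique_mappers() {
  awk -F'\t' '
    int($2 / 4) % 2 == 1 { next }       # skip unmapped records (FLAG bit 0x4)
    {
      for (i = 12; i <= NF; i++)        # optional tags start at column 12
        if ($i == "NH:i:1") { unique++; break }
    }
    END { print unique + 0 }'
}

# Usage:
#   samtools view output_prefix_Aligned.sortedByCoord.out.bam | count_unique_mappers
```

With --outFilterMultimapNmax 1 every mapped record should carry NH:i:1, so this count should equal the number of mapped records.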
13
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# STAR genome index (placeholder; hg38 is one common human reference)
GENOME_DIR="/path/to/your/STAR_index/hg38"

# Input FASTQ files: the reads that did not map to RepBase, written by STAR rmRep
# (step 10) via --outReadsUnmapped Fastx. They are uncompressed, so no
# --readFilesCommand is needed.
INPUT_READS_MATE1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1"
INPUT_READS_MATE2="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2"

# BAM file that receives the alignments, which STAR writes to stdout
OUTPUT_BAM_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"

# Prefix for STAR's other outputs (Log.out, SJ.out.tab, ...). Reusing the .bam name, as the
# documented command does, yields files such as "...rmRep.bamLog.out"; a cleaner prefix
# such as "/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep." also works.
OUTPUT_FILE_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"

# Run STAR; --outFilterMultimapNmax 1 keeps only uniquely mapping reads
STAR \
  --runMode alignReads \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --genomeLoad LoadAndRemove \
  --readFilesIn "${INPUT_READS_MATE1}" "${INPUT_READS_MATE2}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_FILE_PREFIX}" \
  --outSAMattributes All \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > "${OUTPUT_BAM_FILE}"
14
Takes output from STAR genome mapping.
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# The output of STAR genome mapping (step 13) is an unsorted BAM that also contains
# the unmapped reads (--outSAMunmapped Within). It is the input of the
# random-mer-aware PCR duplicate removal (barcode_collapse_pe.py, step 16).
GENOME_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"

# Optional: summarize mapped, unmapped, and properly paired reads before deduplication
samtools flagstat "${GENOME_BAM}"
15
Custom random-mer-aware script for PCR duplicate removal.
$ Bash example
# umi_tools is NOT the tool used in the documented pipeline, which runs the custom
# script barcode_collapse_pe.py (step 16); it is shown as a generic alternative.
# Install umi_tools if not already installed
# conda install -c bioconda umi_tools=1.1.2

# Deduplicate a coordinate-sorted, indexed BAM using UMIs (random-mers) stored in the
# read IDs. ASSUMPTION: a previous step (e.g., 'umi_tools extract') appended the UMI
# to each read ID after an underscore '_'.
# -I / -S: input and output BAM; -L: log file
# --paired: deduplicate read pairs rather than single reads
# --method=directional: umi_tools' network-based UMI grouping method
umi_tools dedup \
  -I input.sorted.bam \
  -S output.dedup.bam \
  --paired \
  --extract-umi-method=read_id \
  --umi-separator='_' \
  --method=directional \
  -L dedup.log
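The idea behind random-mer-aware duplicate removal is that reads sharing both a mapping position and a random-mer most likely derive from the same original cDNA molecule, whereas reads at the same position with different random-mers are kept as distinct molecules. The awk sketch below illustrates only that rule. It is not the logic of barcode_collapse_pe.py, and its read-name layout (random-mer before the first ':'), per-record handling that ignores mates, and use of the leftmost coordinate are hypothetical simplifications; the function name collapse_randomer_dups is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative random-mer-aware duplicate collapsing on SAM text.
# HYPOTHETICAL simplifications:
#   - the random-mer is the read-name prefix before the first ':'
#   - each record is handled on its own (mate information is ignored)
#   - position = reference + strand + leftmost coordinate (POS)
collapse_randomer_dups() {
  awk -F'\t' '
    /^@/ { print; next }                              # pass header lines through
    {
      split($1, name, ":")                            # name[1] = random-mer
      strand = (int($2 / 16) % 2 == 1) ? "-" : "+"    # FLAG bit 0x10 = reverse strand
      key = name[1] SUBSEP $3 SUBSEP strand SUBSEP $4
      if (!(key in seen)) { seen[key] = 1; print }    # keep the first read per key
    }'
}

# Usage (hypothetical file names):
#   samtools view -h genome.bam | collapse_randomer_dups | samtools view -b -o dedup.bam -
```

A real implementation also has to keep mates together and, for reverse-strand reads, key on the alignment end rather than POS, which is one reason the pipeline relies on a dedicated script.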
16
Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
$ Bash example
# Clone the eCLIP pipeline repository
# git clone https://github.com/yeolab/eclip.git
# cd eclip
# If the repository provides a conda environment file, create and activate it, e.g.:
# conda env create -f environment.yml
# conda activate eclip

# Directory containing barcode_collapse_pe.py (adjust to its location in your checkout)
ECLIP_SCRIPTS_DIR="/path/to/cloned/eclip/scripts"

# Input and output file paths
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"
OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics"

# Remove PCR duplicates using the random-mer (barcode) information
python "${ECLIP_SCRIPTS_DIR}/barcode_collapse_pe.py" \
  --bam "${INPUT_BAM}" \
  --out_file "${OUTPUT_BAM}" \
  --metrics_file "${METRICS_FILE}"
17
Takes output from barcode collapse PE.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# The output of barcode collapse PE (step 16) is the deduplicated BAM that sortSam
# (step 19) sorts; no further adapter trimming happens at this point.
BEFORE_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"
AFTER_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"

# Optional: compare mapped read-2 records (-f 128 -F 4) before and after PCR duplicate removal
echo "before: $(samtools view -c -f 128 -F 4 "${BEFORE_BAM}")"
echo "after:  $(samtools view -c -f 128 -F 4 "${AFTER_BAM}")"
18
Sorts the resulting bam file for use downstream.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Input and output file names
INPUT_BAM="input.bam"
OUTPUT_SORTED_BAM="${INPUT_BAM%.bam}.sorted.bam"

# Sort the BAM file by coordinate (samtools alternative to the Picard SortSam command in step 19)
# -o: output file
# -@: number of additional threads (adjust as needed)
# -m: memory per thread (adjust as needed, e.g., 2G for 2 GB)
samtools sort -o "${OUTPUT_SORTED_BAM}" -@ 8 -m 2G "${INPUT_BAM}"
19
Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true
$ Bash example
# Queue.jar from an older GATK distribution bundles the legacy net.sf.picard classes.
# A current Picard install (e.g., conda install -c bioconda picard) provides the same
# tool as picard.sam.SortSam ('picard SortSam ...'), with its own option syntax.

# Paths and files
GATK_QUEUE_JAR="/path/to/gatk/dist/Queue.jar"   # adjust to your GATK Queue.jar
DATA_DIR="/full/path/to/files"                   # base directory for input/output files
INPUT_BAM="${DATA_DIR}/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
OUTPUT_BAM="${DATA_DIR}/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
TMP_DIR="${DATA_DIR}/.queue/tmp"

# Create the temporary directory if it does not exist
mkdir -p "${TMP_DIR}"

# Run Picard SortSam from Queue.jar. -XX:+UseParallelOldGC was deprecated in JDK 14
# and newer JVMs may refuse it; drop that flag if java fails to start.
java -Xmx2048m \
  -XX:+UseParallelOldGC \
  -XX:ParallelGCThreads=4 \
  -XX:GCTimeLimit=50 \
  -XX:GCHeapFreeLimit=10 \
  -Djava.io.tmpdir="${TMP_DIR}" \
  -cp "${GATK_QUEUE_JAR}" \
  net.sf.picard.sam.SortSam \
  INPUT="${INPUT_BAM}" \
  TMP_DIR="${TMP_DIR}" \
  OUTPUT="${OUTPUT_BAM}" \
  VALIDATION_STRINGENCY=SILENT \
  SO=coordinate \
  CREATE_INDEX=true
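A coordinate-sorted BAM, as produced by SortSam with SO=coordinate, can be spot-checked by streaming its records through a small awk filter. The sketch below, with a hypothetical function name is_coordinate_sorted, only checks that positions never decrease within a reference and that a reference never reappears once the stream has moved past it; it does not verify that references follow the @SQ order of the header, which is the order samtools and Picard actually sort by.

```shell
#!/usr/bin/env bash
# Check that SAM records (header stripped) are in coordinate order.
# Unplaced reads (RNAME "*") are skipped. Exit status 0 = looks sorted, 1 = out of order.
is_coordinate_sorted() {
  awk -F'\t' '
    $3 == "*" { next }
    $3 != prev_ref {                     # the stream moves to a new reference
      if ($3 in done) { bad = 1; exit }  # ...that was already seen earlier
      if (prev_ref != "") done[prev_ref] = 1
      prev_ref = $3
      prev_pos = 0
    }
    $4 + 0 < prev_pos { bad = 1; exit }  # POS decreased within a reference
    { prev_pos = $4 + 0 }
    END { exit bad }'
}

# Usage:
#   samtools view file.sorted.bam | is_coordinate_sorted && echo "coordinate-sorted"
```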
20
Takes the output of sortSam and makes a bam index for use downstream.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Assuming 'sorted.bam' is the output of sortSam; writes sorted.bam.bai
samtools index sorted.bam
21
Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Input and output paths
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
OUTPUT_BAI="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai"

# Build the index
samtools index "${INPUT_BAM}" "${OUTPUT_BAI}"
22
Takes multiple final bam files as input.
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Merge several final BAM files into one, e.g., to combine technical or biological replicates.
# Replace the replicate*.bam names with your input BAM files and merged_replicates.bam
# with the desired output name. The -o option needs a recent samtools; older releases
# take the output file as the first positional argument (as in step 24).
samtools merge -o merged_replicates.bam replicate1.bam replicate2.bam replicate3.bam
23
Merges the two technical replicates for further downstream analysis.
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools

# Input and output file names (replace with actual file paths)
INPUT_REPLICATE_1="replicate1.bam"
INPUT_REPLICATE_2="replicate2.bam"
OUTPUT_MERGED_BAM="merged_replicates.bam"

# Merge the two technical replicates into a single BAM file
samtools merge -o "${OUTPUT_MERGED_BAM}" "${INPUT_REPLICATE_1}" "${INPUT_REPLICATE_2}"
24
Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Input and output files
OUTPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
INPUT_BAM_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
INPUT_BAM_2="/full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"

# Merge the sorted BAM files (output file first)
samtools merge "${OUTPUT_BAM}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"
25
Takes the output of sortSam and makes a bam index for use downstream.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools=1.19

# Assuming 'sorted.bam' is the output of sortSam; writes sorted.bam.bai
samtools index sorted.bam
26
Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Index the merged BAM file
samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
27
Takes output from sortSam.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Example: index a sorted BAM file; indexing after sorting enables fast random access to alignments.
# Input: sorted.bam (output of sortSam)
# Output: sorted.bam.bai (BAM index)
samtools index sorted.bam
28
Outputs only the second read of each pair, for use with a single-stranded peak caller.
$ Bash example
# Install samtools (example using conda)
# conda install -c bioconda samtools

# Keep only the second read of each pair: FLAG bit 0x80 (128) marks read 2.
# -b: BAM output; -h: include the header. The peak caller (step 32) expects a BAM file,
# so the reads stay as alignments rather than being converted to FASTQ.
# Input: aligned.bam (paired-end alignments); output: second_reads.bam
samtools view -hb -f 0x80 aligned.bam > second_reads.bam
29
This is the final bam file to perform analysis on.
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools
# The final BAM for analysis is the read-2-only BAM written by 'samtools view -hb -f 128'.
# samtools view keeps the record order of its coordinate-sorted input (the merged BAM),
# so this BAM is already sorted and only needs an index for random access.
FINAL_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
samtools index "${FINAL_BAM}"
30
Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools
# Define input and output file paths
INPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
OUTPUT_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
# Extract reads that are the second in a pair (flag 128), keep the header (-h), output BAM (-b)
samtools view -hb -f 128 "${INPUT_BAM}" > "${OUTPUT_BAM}"
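Because `-f 128` tests a single bit, it helps to see which bits a given FLAG value carries. Below is a small sketch that decodes a numeric FLAG into the bit names used by `samtools flags`. The helper `decode_sam_flag` is illustrative only; samtools' own `samtools flags` command does this for real.

```shell
#!/usr/bin/env bash
# Illustrative decoder: list the SAM FLAG bits set in a numeric flag value.
# Bit i of the flag corresponds to names[i]; labels follow 'samtools flags'.
decode_sam_flag() {
  local flag="$1" bit
  local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local -a set_bits=()
  for bit in "${!names[@]}"; do
    if (( flag & (1 << bit) )); then
      set_bits+=("${names[bit]}")
    fi
  done
  local IFS=,
  echo "${set_bits[*]}"
}

# Flag 163 = 1 + 2 + 32 + 128: a properly paired read 2 whose mate is reverse-complemented.
decode_sam_flag 163
```

For flag 163, the call prints `PAIRED,PROPER_PAIR,MREVERSE,READ2`. Because the READ2 bit is set, such a record passes `-f 128`.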
31
Takes results from samtools view.
$ Bash example
# Install samtools (example using conda)
# conda install -c bioconda samtools=1.19
# This step takes a BAM file (e.g., 'input.bam') generated by a previous 'samtools view' command
# and sorts it by coordinate. If that BAM was filtered from an already coordinate-sorted BAM
# (as in step 30), it is still sorted and this re-sort is optional.
samtools sort -o output.sorted.bam input.bam
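Re-sorting is only needed when the input is not already coordinate-sorted. `samtools view -h` keeps the header and record order of its input. Filtering the coordinate-sorted merged BAM should therefore yield a BAM that still declares `SO:coordinate` in its `@HD` line. Below is a minimal sketch with a hypothetical helper, `header_sort_order`, that reads the SO tag from SAM header text (for example, the output of `samtools view -H`). The tag records what the writer declared; it does not verify the actual record order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SO (sort order) value from the @HD header line.
# Reads SAM header text on stdin, e.g. from: samtools view -H file.bam
# Prints "unknown" when there is no @HD line or no SO tag.
header_sort_order() {
  local line field
  local -a fields
  while IFS= read -r line; do
    [[ $line == @HD* ]] || continue
    IFS=$'\t' read -r -a fields <<< "$line"
    for field in "${fields[@]}"; do
      if [[ $field == SO:* ]]; then
        echo "${field#SO:}"
        return 0
      fi
    done
  done
  echo "unknown"
}

# Usage (requires samtools): sort only when the header does not declare coordinate order
# if [ "$(samtools view -H input.bam | header_sort_order)" != "coordinate" ]; then
#   samtools sort -o output.sorted.bam input.bam
# fi
```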
32
Calls peaks on the read-2 BAM files produced by samtools view.
clipper, installed from source (tool and version inferred with models/gemini-2.5-flash)
$ Bash example
# Install CLIPper from source (if not already available)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install .
# Define input and output paths (placeholders; replace with your own files)
IP_BAM="path/to/your/ip_replicate1.r2.bam"           # read-2-only BAM from samtools view
OUTPUT_BED="path/to/your/ip_replicate1.r2.peaks.bed"
SPECIES="hg19"                                       # GSE80039 was processed against hg19
# Call peaks on the IP BAM alone, as in the source command. Enrichment over the
# input sample (column 5 of the supplementary BED) is computed in a separate step.
# Step 33 lists the full set of options used for this dataset.
clipper -b "${IP_BAM}" \
  -s "${SPECIES}" \
  -o "${OUTPUT_BED}"
33
Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
$ Bash example
# Installation instructions for CLIPper.
# It is recommended to use a virtual environment (e.g., conda or venv):
# conda create -n clipper_env python=3.7
# conda activate clipper_env
#
# Install CLIPper from source:
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install .
#
# Reference genome: hg19. Ensure the necessary genome files (e.g., FASTA, gene annotations)
# for hg19 are configured or available in the environment where CLIPper is run.
clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
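The command above derives the peak file name from the BAM name by replacing `.bam` with `.peaks.bed`. When several read-2 BAMs need peaks, a minimal dry-run sketch can apply that convention in a loop. `clipper_cmd` is a hypothetical helper, and its flags are copied from the source command, not tuned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the source's CLIPper command for one read-2 BAM,
# deriving the output BED name by replacing ".bam" with ".peaks.bed".
# Flags are copied verbatim from the documented command (genome build hg19).
clipper_cmd() {
  local bam="$1"
  local bed="${bam%.bam}.peaks.bed"
  echo "clipper -b ${bam} -s hg19 -o ${bed} --bonferroni --superlocal --threshold-method binomial --save-pickle"
}

# Dry run over every read-2 BAM in a directory (placeholder path);
# review the printed commands before running them.
for bam in /full/path/to/files/*.merged.r2.bam; do
  [ -e "$bam" ] || continue   # skip the literal pattern when nothing matches
  clipper_cmd "$bam"
done
```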
Raw Source Text
Library strategy: eCLIP-seq Takes output from raw files. Run to trim off both 5′ and 3′ adapters on both reads. Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics Takes output from cutadapt round 1. Run to trim off the 3′ adapters on read 2, to control for double ligation events. Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics Takes output from cutadapt round 2.
Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads. Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam Takes output from STAR rmRep. Maps unique reads to the human genome. Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam takes output from STAR genome mapping. Custom random-mer-aware script for PCR duplicate removal. 
Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics Takes output from barcode collapse PE. Sorts resulting bam file for use downstream. Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true Takes output from sortSam, makes bam index for use downstream. Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai Takes inputs from multiple final bam files. Merges the two technical replicates for further downstream analysis. Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam Takes output from sortSam, makes bam index for use downstream. Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai Takes output from sortSam. Only outputs the second read in each pair for use with single stranded peak caller. This is the final bam file to perform analysis on. Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam Takes results from samtools view. Calls peaks on those files. 
Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle Genome_build: hg19 Supplementary_files_format_and_content: bigWig, bigBed, bed (col1: chrom, col2: chromStart, col3: chromEnd, col4: -log10 pvalue, col5: log2 fold enrichment above input, col6: strand) format, contains clusters of predicted RBP binding
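According to the source, the supplementary BED files store the -log10 p-value in column 4 and the log2 fold enrichment over input in column 5. Below is a minimal awk-based sketch for filtering such a file. The default cutoffs (-log10 p ≥ 3 and log2 fold enrichment ≥ 3) are illustrative assumptions, not values stated in the source. The helper name `filter_peaks` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep peaks from a 6-column BED whose column 4
# (-log10 p-value) and column 5 (log2 fold enrichment over input) both meet
# the given cutoffs. The default cutoffs of 3 are illustrative assumptions.
filter_peaks() {
  local bed="$1" min_log10p="${2:-3}" min_log2fc="${3:-3}"
  awk -F'\t' -v p="$min_log10p" -v fc="$min_log2fc" \
    '($4 + 0 >= p + 0) && ($5 + 0 >= fc + 0)' "$bed"
}

# Usage (placeholder file names):
# filter_peaks peaks.bed > peaks.filtered.bed
# filter_peaks peaks.bed 5 2 > peaks.custom.bed
```

Lines are printed unchanged, so the output keeps the same six-column layout as the input.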