GSE78507 Processing Pipeline
Publication
Enhanced CLIP Uncovers IMP Protein-RNA Targets in Human Pluripotent Stem Cells Important for Cell Adhesion and Survival. Cell Reports (2016), PMID 27068461
Dataset
GSE78507: Enhanced CLIP uncovers IMP protein-RNA targets in human pluripotent stem cells important for cell adhesion and survival [eCLIP-Seq]
Processing Steps
1
Library strategy: eCLIP-Seq
$ Bash example
# Install the aligner and peak caller first, for example with conda
# (environment and package names are placeholders):
# conda create -n eclip_env star clipper python=3.8 -y
# conda activate eclip_env

# --- Reference genome and annotation (placeholders) ---
# The clipper command recorded in step 33 uses hg19, so hg19 is assumed here.
GENOME_FASTA="path/to/hg19.fa"
GENOME_GTF="path/to/annotation.gtf"
STAR_INDEX_DIR="path/to/STAR_genome_index_hg19"

# Build the STAR genome index (run once, if not already built)
# STAR --runMode genomeGenerate \
#   --genomeDir "${STAR_INDEX_DIR}" \
#   --genomeFastaFiles "${GENOME_FASTA}" \
#   --sjdbGTFfile "${GENOME_GTF}" \
#   --runThreadN 8

# --- Alignment with STAR (splice-aware aligner) ---
# Assumes paired-end eCLIP reads; replace the FASTQ names and output directory.
INPUT_FASTQ_R1="sample_eCLIP_R1.fastq.gz"
INPUT_FASTQ_R2="sample_eCLIP_R2.fastq.gz"
CONTROL_FASTQ_R1="sample_control_R1.fastq.gz"
CONTROL_FASTQ_R2="sample_control_R2.fastq.gz"
OUTPUT_BASE_DIR="eclip_analysis"
mkdir -p "${OUTPUT_BASE_DIR}/alignment"

# Align one read pair: $1 = output name, $2 = R1 FASTQ, $3 = R2 FASTQ
run_star() {
  STAR --genomeDir "${STAR_INDEX_DIR}" \
    --readFilesIn "$2" "$3" \
    --readFilesCommand zcat \
    --outFileNamePrefix "${OUTPUT_BASE_DIR}/alignment/$1_" \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMattributes All \
    --outFilterMultimapNmax 20 \
    --outFilterMismatchNmax 3 \
    --outFilterScoreMinOverLread 0.66 \
    --outFilterMatchNminOverLread 0.66 \
    --alignIntronMin 20 \
    --alignIntronMax 1000000 \
    --alignMatesGapMax 1000000 \
    --runThreadN 8
}

echo "Running STAR alignment for eCLIP sample..."
run_star eCLIP_sample "${INPUT_FASTQ_R1}" "${INPUT_FASTQ_R2}"
eCLIP_BAM="${OUTPUT_BASE_DIR}/alignment/eCLIP_sample_Aligned.sortedByCoord.out.bam"

echo "Running STAR alignment for control sample..."
run_star control_sample "${CONTROL_FASTQ_R1}" "${CONTROL_FASTQ_R2}"
CONTROL_BAM="${OUTPUT_BASE_DIR}/alignment/control_sample_Aligned.sortedByCoord.out.bam"

# --- Peak calling with CLIPper ---
# Options follow the command recorded in step 33, where -s names the assembly
# and no control BAM is passed. How the control BAM is used is not recorded.
mkdir -p "${OUTPUT_BASE_DIR}/peaks"
OUTPUT_PEAKS_BED="${OUTPUT_BASE_DIR}/peaks/eCLIP_sample_peaks.bed"
echo "Running CLIPper peak calling..."
clipper -b "${eCLIP_BAM}" \
  -s hg19 \
  -o "${OUTPUT_PEAKS_BED}" \
  --bonferroni \
  --superlocal \
  --threshold-method binomial \
  --save-pickle

# --- Reproducible peaks across replicates (IDR) ---
# The yeolab merge_peaks workflow (https://github.com/yeolab/merge_peaks) is one
# way to do this. Its invocation is not part of this record, so take the options
# from that repository.
-
2
Takes output from raw files.
Unknown (Inferred with models/gemini-2.5-flash) vN/A
$ Bash example
# "Takes output from raw files." names the input of the next step rather than a tool:
# the first cutadapt round (steps 3 and 4) reads the raw paired FASTQ files.
# No command is recorded for this entry.
# Optional quality control of the raw reads could look like:
# fastqc input_raw_file.fastq -o qc_output_directory
# Optional file format conversion could look like:
# samtools view -bS input.sam > output.bam
-
3
Run to trim off both 5′ and 3′ adapters on both reads.
cutadapt (Inferred with models/gemini-2.5-flash) v2.10 (Inferred with models/gemini-2.5-flash)
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt=2.10

# Input and output file paths (placeholders)
READ1_INPUT="input_read1.fastq.gz"
READ2_INPUT="input_read2.fastq.gz"
READ1_OUTPUT="trimmed_read1.fastq.gz"
READ2_OUTPUT="trimmed_read2.fastq.gz"

# Example 3' adapters for read 1 and read 2 (common Illumina sequences).
# The adapters actually used for this dataset are listed in the step 4 command.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Number of CPU cores to use (adjust as needed)
NUM_CORES=$(nproc)

# -j: number of CPU cores
# -a / -A: 3' adapter for read 1 / read 2 (5' adapters would use -g / -G)
# -q 20: trim bases below quality 20 from the 3' end (the recorded commands use 6)
# --minimum-length 18: discard reads shorter than 18 nt after trimming
# --pair-filter=any: discard the pair if either read fails a filter
# -o / -p: output file for read 1 / read 2
cutadapt \
  -j "${NUM_CORES}" \
  -a "${ADAPTER_R1}" \
  -A "${ADAPTER_R2}" \
  -q 20 \
  --minimum-length 18 \
  --pair-filter=any \
  -o "${READ1_OUTPUT}" \
  -p "${READ2_OUTPUT}" \
  "${READ1_INPUT}" "${READ2_INPUT}"
-
4
Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
$ Bash example
# The recorded command starts at "quality-cutoff 6", which matches cutadapt's
# --quality-cutoff option as used in round 2 (step 7). Any options that came
# before it are missing from the record and are not reconstructed here.
# conda install -c bioconda cutadapt

# Input and output paths
R1_INPUT="/full/path/to/files/file_R1.C01.fastq.gz"
R2_INPUT="/full/path/to/files/file_R2.C01.fastq.gz"
R1_OUTPUT="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
R2_OUTPUT="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"
METRICS_OUTPUT="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics"

# 3' adapter for read 1
ADAPTER_R1="NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# 5' adapters for read 1
ADAPTER_5P_1="CTTCCGATCTACAAGTT"
ADAPTER_5P_2="CTTCCGATCTTGGTCCT"

# 3' adapters for read 2 (overlapping 15-nt sequences).
# CTTGTAGATCGGAAG is written without a space, as in step 7.
ADAPTER_R2_LIST=(
  "AACTTGTAGATCGGA" "AGGACCAAGATCGGA" "ACTTGTAGATCGGAA" "GGACCAAGATCGGAA"
  "CTTGTAGATCGGAAG" "GACCAAGATCGGAAG" "TTGTAGATCGGAAGA" "ACCAAGATCGGAAGA"
  "TGTAGATCGGAAGAG" "CCAAGATCGGAAGAG" "GTAGATCGGAAGAGC" "CAAGATCGGAAGAGC"
  "TAGATCGGAAGAGCG" "AAGATCGGAAGAGCG" "AGATCGGAAGAGCGT" "GATCGGAAGAGCGTC"
  "ATCGGAAGAGCGTCG" "TCGGAAGAGCGTCGT" "CGGAAGAGCGTCGTG" "GGAAGAGCGTCGTGT"
)

# Build the -A arguments as an array so each adapter stays a separate word
ADAPTER_A_ARGS=()
for adapter in "${ADAPTER_R2_LIST[@]}"; do
  ADAPTER_A_ARGS+=( -A "${adapter}" )
done

cutadapt --quality-cutoff 6 -m 18 \
  -a "${ADAPTER_R1}" \
  -g "${ADAPTER_5P_1}" \
  -g "${ADAPTER_5P_2}" \
  "${ADAPTER_A_ARGS[@]}" \
  -o "${R1_OUTPUT}" \
  -p "${R2_OUTPUT}" \
  "${R1_INPUT}" \
  "${R2_INPUT}" \
  > "${METRICS_OUTPUT}"
-
5
Takes output from cutadapt round 1.
$ Bash example
# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt=3.4

# "Takes output from cutadapt round 1" names the input of the second trimming round.
# The recorded round-2 command is in step 7; it is paired-end and has its own adapter list.
# The single-end call below only illustrates the general shape of a second pass.
INPUT_FASTQ="input_round1.fastq.gz"    # output of cutadapt round 1
OUTPUT_FASTQ="trimmed_round2.fastq.gz"

# Placeholder 3' adapter; replace with the sequences used in your experiment
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# -a: 3' adapter sequence
# -o: output file for trimmed reads
# -m 18: minimum read length kept after trimming
# -q 20: quality cutoff at the 3' end (the recorded commands use 6)
# -j 8: number of CPU cores
cutadapt \
  -a "${ADAPTER}" \
  -o "${OUTPUT_FASTQ}" \
  -m 18 \
  -q 20 \
  -j 8 \
  "${INPUT_FASTQ}"
-
6
Run to trim off the 3′ adapters on read 2, to control for double ligation events.
$ Bash example
# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt=1.18

# Input and output files (placeholders)
INPUT_R2="read2.fastq.gz"
OUTPUT_TRIMMED_R2="trimmed_read2.fastq.gz"

# Placeholder 3' adapter for read 2. The sequences actually used to control for
# double ligation are the -A entries of the recorded command in step 7.
ADAPTER_R2="AAAAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

# -a: 3' adapter to remove from the end of each read
# -o: output file for the trimmed reads
cutadapt -a "${ADAPTER_R2}" -o "${OUTPUT_TRIMMED_R2}" "${INPUT_R2}"
-
7
Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
$ Bash example
# Install cutadapt (e.g., using conda)
# conda install -c bioconda cutadapt

cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 \
  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA \
  -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA \
  -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC \
  -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC \
  -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT \
  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz \
  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz \
  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz \
  /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz \
  > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
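The twenty `-A` sequences in the round-2 command look like every 15-nt window tiled across two longer sequences that differ only in their first seven bases. That reading is an observation from the list itself, not something the record states. The two full-length sequences and the helper names `tile_windows` and `adapter_windows` below are reconstructed for illustration only. A minimal sketch that regenerates the list under that assumption:

```shell
# Print every k-nt window of a sequence, one per line.
tile_windows() {
  awk -v seq="$1" -v k="$2" 'BEGIN {
    for (i = 1; i + k - 1 <= length(seq); i++) print substr(seq, i, k)
  }'
}

# Assumed full-length sequences, reconstructed by overlapping the -A entries.
FULL_1="AACTTGTAGATCGGAAGAGCGTCGTGT"
FULL_2="AGGACCAAGATCGGAAGAGCGTCGTGT"

# Unique 15-nt windows across both sequences.
adapter_windows() {
  { tile_windows "${FULL_1}" 15; tile_windows "${FULL_2}" 15; } | sort -u
}

# Format the windows as cutadapt -A arguments, one per line.
adapter_windows | sed 's/^/-A /'
```

Tiling the adapter this way lets cutadapt recognise it even when only part of it is present in the read, which may be why the recorded command lists so many near-identical sequences.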
-
8
Takes output from cutadapt round 2.
$ Bash example
# "Takes output from cutadapt round 2" names the input of the repeat-mapping step:
# the recorded STAR command in step 10 reads the two round-2 FASTQ files below.
# No third trimming round appears in the recorded commands.
INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
INPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"

# Check that both inputs exist before starting the alignment
ls -lh "${INPUT_R1}" "${INPUT_R2}"
-
9
Maps reads to a human-specific version of RepBase to remove repetitive elements, which helps control for spurious artifacts from rRNA and other repetitive reads.
$ Bash example
# Reference data: a FASTA file of human rRNA, tRNA, snRNA, snoRNA, and other
# common repetitive elements (e.g., from RepBase).
# Placeholders:
#   human_rRNA_tRNA_snRNA_snoRNA_repeats.fasta   reference FASTA
#   /path/to/star_repeat_index                    STAR index directory
#   reads.fastq.gz                                input reads
#   reads_filtered_repeats.fastq.gz               output reads with repeats removed

# Step 1: Build the STAR index for repetitive elements (run once)
# mkdir -p /path/to/star_repeat_index
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/star_repeat_index \
#   --genomeFastaFiles human_rRNA_tRNA_snRNA_snoRNA_repeats.fasta \
#   --runThreadN 8

# Step 2: Align reads to the repeat index and keep only the unmapped reads.
# This single-end sketch differs from the recorded paired-end command in step 10.
STAR --genomeDir /path/to/star_repeat_index \
  --readFilesIn reads.fastq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix repeat_filtered_ \
  --outSAMtype None \
  --outReadsUnmapped Fastx \
  --outStd Log \
  --runThreadN 8

# Unmapped reads are written uncompressed to repeat_filtered_Unmapped.out.mate1
# (plus repeat_filtered_Unmapped.out.mate2 for paired-end input).
# Compress the file so that the .gz name is accurate.
gzip -c repeat_filtered_Unmapped.out.mate1 > reads_filtered_repeats.fastq.gz
-
10
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
$ Bash example
# Define variables for input/output files and the reference directory.
# Replace with actual paths relevant to your environment.
# REF_GENOME_DIR should point to a STAR index built from RepBase human sequences.
REF_GENOME_DIR="/path/to/RepBase_human_database_file"
READ1_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
READ2_FILE="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"
OUTPUT_BAM_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"

# --outFileNamePrefix is set to the same path as the redirected BAM output, so
# auxiliary files (Log.out, SJ.out.tab, Unmapped.out.mate1, etc.) are named with
# this full path as a prefix.
OUTPUT_PREFIX_FOR_AUX_FILES="${OUTPUT_BAM_FILE}"

# Execute STAR alignment
STAR \
  --runMode alignReads \
  --runThreadN 16 \
  --genomeDir "${REF_GENOME_DIR}" \
  --genomeLoad LoadAndRemove \
  --readFilesIn "${READ1_FILE}" "${READ2_FILE}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_PREFIX_FOR_AUX_FILES}" \
  --outSAMattributes All \
  --readFilesCommand zcat \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > "${OUTPUT_BAM_FILE}"
-
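STAR appends its fixed output names directly to `--outFileNamePrefix`, with no separator. That explains why the genome-mapping command later reads files ending in `.rep.bamUnmapped.out.mate1` and `.rep.bamUnmapped.out.mate2`. The sketch below derives those names from the raw R1 path. The helper functions `round2_fastq`, `rep_bam` and `star_unmapped` are hypothetical and exist only for this illustration.

```shell
# Raw read-1 FASTQ path as it appears in the recorded commands.
R1="/full/path/to/files/file_R1.C01.fastq.gz"

# Each processing stage appends a suffix to the name of its input.
round2_fastq() { printf '%s.adapterTrim.round2.fastq.gz\n' "$1"; }
rep_bam()      { printf '%s.adapterTrim.round2.rep.bam\n' "$1"; }

# STAR writes <prefix>Unmapped.out.mate1 and <prefix>Unmapped.out.mate2.
# $1 = value given to --outFileNamePrefix, $2 = mate number
star_unmapped() { printf '%sUnmapped.out.mate%s\n' "$1" "$2"; }

REP_BAM="$(rep_bam "${R1}")"
star_unmapped "${REP_BAM}" 1
star_unmapped "${REP_BAM}" 2
```

Ending the prefix with a separator such as `.` or `_` would give more readable names, but the downstream commands in this record expect the names exactly as printed here.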
11
Takes output from STAR rmRep.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# "rmRep" refers to the repeat-removal alignment in step 10. Reads that did not
# map to the repeat database are written by STAR as <prefix>Unmapped.out.mate1
# and <prefix>Unmapped.out.mate2, and those files are the input to the genome
# alignment (recorded command in step 13).
GENOME_DIR="/path/to/genome/index"      # STAR index of the human genome
READ1="repeats_Unmapped.out.mate1"      # placeholder name
READ2="repeats_Unmapped.out.mate2"      # placeholder name
OUTPUT_PREFIX="aligned_rmRep_"
THREADS=8                               # adjust as needed

# The mate files are uncompressed FASTQ, so no --readFilesCommand is needed.
# --outFilterMultimapNmax 1 reports only uniquely mapping reads.
STAR --runThreadN "${THREADS}" \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterMultimapNmax 1
-
12
Maps unique reads to the human genome.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables (replace with actual paths and filenames).
# The assembly is a placeholder; the clipper command in step 33 uses hg19,
# so match the index to the assembly used for peak calling.
GENOME_DIR="/path/to/STAR_human_GRCh38_index"   # pre-built STAR index
INPUT_FASTQ="input_reads.fastq.gz"              # input FASTQ file
OUTPUT_PREFIX="aligned_unique_reads"            # prefix for output files
THREADS=8                                       # adjust as needed

# The STAR index must be built beforehand (run once):
# STAR --runThreadN ${THREADS} --runMode genomeGenerate --genomeDir ${GENOME_DIR} \
#   --genomeFastaFiles /path/to/human_GRCh38.fa \
#   --sjdbGTFfile /path/to/human_GRCh38.gtf \
#   --sjdbOverhang 100   # ideally read length minus 1

# Map reads to the human genome.
# --outFilterMultimapNmax 1 reports only uniquely mapping reads.
# --readFilesCommand zcat is required because the input is gzip-compressed.
STAR --runThreadN "${THREADS}" \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${INPUT_FASTQ}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outReadsUnmapped Fastx \
  --outFilterMultimapNmax 1 \
  --outFilterMismatchNmax 10 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66
-
13
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
$ Bash example
# Install STAR (example using conda)
# conda install -c bioconda star

# Paths as given in the recorded command; replace with your own.
GENOME_DIR="/path/to/STAR_database_file"
INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1"
INPUT_R2="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2"
OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"

# Prefix for auxiliary STAR output files (e.g., Log.out, SJ.out.tab).
# The primary BAM output is redirected to OUTPUT_BAM.
OUTPUT_PREFIX="${OUTPUT_BAM}"

STAR \
  --runMode alignReads \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --genomeLoad LoadAndRemove \
  --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMattributes All \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > "${OUTPUT_BAM}"
-
14
Takes output from STAR genome mapping.
$ Bash example
# "Takes output from STAR genome mapping" names the input of the duplicate-removal
# step: barcode_collapse_pe.py in step 16 reads the BAM written by the STAR
# command in step 13. No separate command is recorded for this entry.
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"

# Optional sanity check of the alignment before collapsing duplicates
# conda install -c bioconda samtools
samtools flagstat "${INPUT_BAM}"
-
15
Custom random-mer-aware script for PCR duplicate removal.
barcode_collapse_pe.py (script named in the step 16 command; version not recorded)
$ Bash example
# The recorded command for this step (step 16) calls barcode_collapse_pe.py.
# "Random-mer-aware" is read here as: reads that share a mapping position and
# the same random-mer (the random barcode added during library preparation)
# count as PCR duplicates and are collapsed.
# Where the script is installed is not recorded; it is assumed to be on PATH.
# Input: aligned_reads.bam (placeholder)
# Output: deduplicated_reads.bam and deduplicated_reads.metrics (placeholders)
barcode_collapse_pe.py \
  --bam aligned_reads.bam \
  --out_file deduplicated_reads.bam \
  --metrics_file deduplicated_reads.metrics
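To make the idea of random-mer-aware duplicate removal concrete, the sketch below collapses reads that share the same chromosome, start, strand and random-mer, keeping the first read seen for each combination. It is an illustration of the general technique under that assumed definition of a duplicate, not a reimplementation of the custom script, which works on paired-end BAM files and may use different criteria. The tab-separated input layout and the function name `collapse_by_randomer` are made up for this example.

```shell
# Keep the first read for each (chrom, start, strand, random-mer) key.
# Input on stdin: tab-separated chrom, start, strand, random-mer, read name.
collapse_by_randomer() {
  awk -F '\t' '!seen[$1 FS $2 FS $3 FS $4]++'
}

# Four toy reads: read2 duplicates read1, read3 differs in random-mer,
# read4 differs in position and strand.
printf 'chr1\t100\t+\tACGTA\tread1\nchr1\t100\t+\tACGTA\tread2\nchr1\t100\t+\tTTGCA\tread3\nchr1\t250\t-\tACGTA\tread4\n' \
  | collapse_by_randomer
```

Reads at the same position with different random-mers are kept, which is the point of the approach: position alone would wrongly merge independent molecules that happen to start at the same base.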
-
16
Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
$ Bash example
# The script comes from the Yeo lab eCLIP pipeline:
# git clone https://github.com/yeolab/eclip.git
# Its location inside the repository and the environment setup are not recorded,
# so the call below assumes barcode_collapse_pe.py is executable and on PATH.
barcode_collapse_pe.py \
  --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam \
  --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam \
  --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
-
17
Takes output from barcode collapse PE.
$ Bash example
# "Takes output from barcode collapse PE" names the input of the sorting step:
# the SortSam command in step 19 reads the .rmDup.bam file written in step 16.
# No realignment is recorded at this point.
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"

# Optional: inspect the BAM header before sorting
# conda install -c bioconda samtools
samtools view -H "${INPUT_BAM}" | head
-
18
Sorts resulting bam file for use downstream.
samtools (Inferred with models/gemini-2.5-flash) v1.10 (Inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools=1.10

# The recorded command for this step (step 19) uses Picard SortSam;
# samtools sort is an equivalent way to coordinate-sort a BAM file.
# Replace 'input.bam' with your unsorted BAM file and 'output.sorted.bam'
# with the desired output name. Adjust -@ for the number of threads.
samtools sort -@ 8 -o output.sorted.bam input.bam
-
19
Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true
$ Bash example
# Install Picard (e.g., via conda)
# conda install -c bioconda picard
# Note: the class name net.sf.picard.sam.SortSam belongs to older Picard
# releases such as the one bundled in Queue.jar.

# Define variables for paths and files
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
TMP_DIR="/full/path/to/files/.queue/tmp"
GATK_QUEUE_JAR="/path/to/gatk/dist/Queue.jar"   # GATK Queue.jar with bundled Picard

# Create the temporary directory if it doesn't exist
mkdir -p "${TMP_DIR}"

# Execute Picard SortSam via Java
java -Xmx2048m \
  -XX:+UseParallelOldGC \
  -XX:ParallelGCThreads=4 \
  -XX:GCTimeLimit=50 \
  -XX:GCHeapFreeLimit=10 \
  -Djava.io.tmpdir="${TMP_DIR}" \
  -cp "${GATK_QUEUE_JAR}" \
  net.sf.picard.sam.SortSam \
  INPUT="${INPUT_BAM}" \
  TMP_DIR="${TMP_DIR}" \
  OUTPUT="${OUTPUT_BAM}" \
  VALIDATION_STRINGENCY=SILENT \
  SO=coordinate \
  CREATE_INDEX=true
-
20
Takes output from sortSam, makes bam index for use downstream.
samtools (Inferred with models/gemini-2.5-flash) v1.19 (Inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Assuming 'sorted.bam' is the coordinate-sorted output from sortSam
samtools index sorted.bam
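Indexing only works on a coordinate-sorted BAM, so it can be worth confirming the sort order recorded in the header before running `samtools index`. The sketch below reads SAM header text on standard input and prints the `SO` tag of the `@HD` line. The function name `sort_order` is hypothetical, and the inline header is toy data so that the example runs without samtools.

```shell
# Print the SO (sort order) tag from SAM header text read on stdin.
# Prints nothing when the header has no @HD line or no SO tag.
sort_order() {
  awk -F '\t' '$1 == "@HD" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4)
  }'
}

# Toy header for illustration. With a real file, pipe in the header instead:
#   samtools view -H sorted.bam | sort_order
printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n' | sort_order
```

A result other than `coordinate` suggests the file should be sorted again before indexing.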
-
21
Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
$ Bash example
# Install samtools (e.g., using conda)
# conda install -c bioconda samtools=1.10

INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
OUTPUT_BAI="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai"

samtools index "${INPUT_BAM}" "${OUTPUT_BAI}"
-
22
Takes inputs from multiple final bam files.
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools

# Example: merge multiple coordinate-sorted BAM files into a single BAM file.
# The output stays coordinate-sorted when the inputs are.
# Replace rep1.bam, rep2.bam, rep3.bam with your actual input BAM files
# and merged_output.bam with your desired output file name.
samtools merge merged_output.bam rep1.bam rep2.bam rep3.bam
-
23
Merges the two technical replicates for further downstream analysis.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools=1.10

# Merge technical replicates (BAM files) into a single file.
# Replace 'replicate1.bam' and 'replicate2.bam' with actual input file names
# and 'merged_replicates.bam' with the desired output file name.
samtools merge merged_replicates.bam replicate1.bam replicate2.bam

# Index the merged BAM file for efficient access by downstream tools
samtools index merged_replicates.bam
-
24
Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
$ Bash example
# Install samtools (e.g., using conda)
# conda install -c bioconda samtools=1.19

# Define input and output paths
OUTPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
INPUT_BAM_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
INPUT_BAM_2="/full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"

# Merge sorted BAM files
samtools merge "${OUTPUT_BAM}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"
-
25
Takes output from sortSam, makes bam index for use downstream.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Assuming 'sorted.bam' is the output from sortSam
samtools index sorted.bam
-
26
Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
-
27
Takes output from sortSam.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Example: index a sorted BAM file
# Input: sorted.bam (output from sortSam)
# Output: sorted.bam.bai (BAM index file)
samtools index sorted.bam
-
28
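Outputs only the second read in each pair for use with the single-stranded peak caller.
$ Bash example
# The recorded command for this step (step 30) filters a BAM file, not a FASTQ file.
# -f 128 keeps reads flagged as second in pair; -h keeps the header; -b writes BAM.
# File names are placeholders.
samtools view -hb -f 128 input.bam > output.r2.bam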
-
29
This is the final bam file to perform analysis on.
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# This example assumes an aligned BAM file (aligned_reads.bam) that is being
# prepared as 'final.bam' for downstream analysis.

# Sort the BAM file by coordinate, a prerequisite for many downstream analyses
samtools sort -o final.bam aligned_reads.bam

# Index the sorted BAM file for quick random access to reads (e.g., by region)
samtools index final.bam
-
30
Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam
$ Bash example
# Install samtools (e.g., using conda)
# conda install -c bioconda samtools=1.10

# Keep only second-in-pair reads (-f 128), write BAM (-b), include the header (-h)
samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam
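The `-f 128` filter keeps a read only when bit 128 (0x80, "second in pair") is set in its SAM FLAG field. The sketch below applies the same bit test to a few typical paired-end flag values. The function name `is_read2` is hypothetical and used only for this illustration.

```shell
# Succeeds when the SAM FLAG in $1 has bit 128 (second in pair) set.
is_read2() {
  [ $(( $1 & 128 )) -ne 0 ]
}

# 83 and 99 are typical read-1 flags; 147 and 163 are typical read-2 flags.
for flag in 83 99 147 163; do
  if is_read2 "${flag}"; then
    echo "${flag}: second in pair, kept by -f 128"
  else
    echo "${flag}: first in pair, dropped by -f 128"
  fi
done
```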
-
31
Takes results from samtools view.
$ Bash example
# Install samtools (example using conda)
# conda install -c bioconda samtools=1.19

# "Takes results from samtools view" names the input of the peak-calling step:
# the clipper command in step 33 reads the read-2-only BAM written in step 30.
# No sorting command is recorded here; the optional sort below is only a
# precaution for tools that expect coordinate-sorted input.
# Input: filtered_reads.bam (the result of 'samtools view')
# Output: sorted_reads.bam (coordinate-sorted)
samtools sort -o sorted_reads.bam filtered_reads.bam
-
32
Calls peaks on those files.
clipper (inferred; version not specified)
$ Bash example
# Install clipper (if not already installed)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install .
# cd ..

# Placeholder for the input BAM file (read 2 only, from the samtools view step)
# Replace with the actual path to your file.
INPUT_BAM="path/to/your/ip_sample.r2.bam"

# Species / genome build passed to clipper; the source pipeline uses hg19
SPECIES="hg19"

# Output BED file for the called peaks
OUTPUT_BED="eclip_peaks.bed"

# Execute clipper to call peaks, using only the options given in the source command
# -b: input BAM file
# -s: species (genome build)
# -o: output BED file
clipper -b "${INPUT_BAM}" \
    -s "${SPECIES}" \
    -o "${OUTPUT_BED}" \
    --bonferroni \
    --superlocal \
    --threshold-method binomial \
    --save-pickle
-
33
Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
$ Bash example
# Install CLIPper (if not already installed)
# Installation depends on your environment; see the CLIPper repository
# (https://github.com/yeolab/clipper) for instructions.

# Define input and output paths (adjust as needed)
INPUT_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
OUTPUT_BED="/full/path/to/files/CombinedID.merged.r2.peaks.bed"
GENOME_ASSEMBLY="hg19" # Reference genome assembly

# Run CLIPper peak calling
clipper -b "${INPUT_BAM}" \
    -s "${GENOME_ASSEMBLY}" \
    -o "${OUTPUT_BED}" \
    --bonferroni \
    --superlocal \
    --threshold-method binomial \
    --save-pickle
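Once peaks are called, a quick sanity check is to count them per strand. The sketch below assumes the peak file is tab-separated with the strand in column 6 (BED6-style); the source only states that the output is "bed format", so the column layout is an assumption. `count_peaks_by_strand` is a hypothetical helper, and the three records are made-up demonstration data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the pipeline):
# counts records per strand, taking the strand from column 6 of a BED-like file.
count_peaks_by_strand() {
    awk -F'\t' '
        NF >= 6 { n[$6]++ }
        END { printf "plus=%d minus=%d\n", n["+"], n["-"] }
    ' "$1"
}

# Made-up demonstration records; replace with the real peaks file,
# e.g. /full/path/to/files/CombinedID.merged.r2.peaks.bed
demo_bed="$(mktemp)"
printf 'chr1\t100\t150\tpeak_a\t0.001\t+\n'  > "${demo_bed}"
printf 'chr1\t900\t960\tpeak_b\t0.002\t-\n' >> "${demo_bed}"
printf 'chr2\t300\t340\tpeak_c\t0.004\t+\n' >> "${demo_bed}"

count_peaks_by_strand "${demo_bed}"
```

On the demonstration data this prints `plus=2 minus=1`.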
Raw Source Text
Library strategy: eCLIP-Seq

Takes output from raw files. Run to trim off both 5' and 3' adapters on both reads.
Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics

Takes output from cutadapt round 1. Run to trim off the 3' adapters on read 2, to control for double ligation events.
Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics

Takes output from cutadapt round 2. Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads.
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam

Takes output from STAR rmRep. Maps unique reads to the human genome.
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam

Takes output from STAR genome mapping. Custom random-mer-aware script for PCR duplicate removal.
Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics

Takes output from barcode collapse PE. Sorts resulting bam file for use downstream.
Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true

Takes output from sortSam, makes bam index for use downstream.
Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai

Takes inputs from multiple final bam files. Merges the two technical replicates for further downstream analysis.
Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam

Takes output from sortSam, makes bam index for use downstream.
Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai

Takes output from sortSam. Only outputs the second read in each pair for use with single stranded peak caller. This is the final bam file to perform analysis on.
Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam

Takes results from samtools view. Calls peaks on those files.
Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle

Genome_build: hg19
Supplementary_files_format_and_content: bed format, contains clusters of predicted RBP binding, txt format contains counts of reads for both IP and Input for each gene in subtranscriptomic region, bigWigs are read densities for positive and negative strand genome wide