GSE107768 Processing Pipeline
Publication
A protein-RNA interaction atlas of the ribosome biogenesis factor AATF. Scientific Reports (2019), PMID 31363146
Processing Steps
1
Library strategy: eCLIP-seq
$ Bash example
# Install cwltool and other dependencies
# pip install cwltool
# conda install -c bioconda star clipper  # For underlying tools if running manually
# pip install git+https://github.com/yeolab/merge_peaks.git  # For underlying tools if running manually

# Define input files and parameters for the eCLIP CWL workflow
# Replace with actual paths and values
ECLIP_WORKFLOW_DIR="path/to/yeolab/eclip/workflow"  # Clone https://github.com/yeolab/eclip.git
ECLIP_CWL="${ECLIP_WORKFLOW_DIR}/eclip.cwl"
INPUT_FASTQ_R1="path/to/sample_R1.fastq.gz"
INPUT_FASTQ_R2="path/to/sample_R2.fastq.gz"      # Optional, if paired-end
CONTROL_FASTQ_R1="path/to/control_R1.fastq.gz"
CONTROL_FASTQ_R2="path/to/control_R2.fastq.gz"  # Optional, if paired-end
GENOME_FASTA="path/to/genome.fa"                # e.g., hg38
GENOME_GTF="path/to/annotations.gtf"            # e.g., GENCODE v38
STAR_INDEX="path/to/star_index"                 # Pre-built STAR index for the genome
OUTPUT_DIR="eclip_analysis_output"
SAMPLE_ID="my_eclip_sample"

# Create a CWL input YAML file
# This is a simplified example; the actual workflow might require more inputs.
# Refer to the eclip.cwl and eclip-inputs.yaml in the yeolab/eclip repository.
cat << EOF > eclip_inputs.yaml
fastq_r1:
  class: File
  path: ${INPUT_FASTQ_R1}
# fastq_r2:  # Uncomment if paired-end
#   class: File
#   path: ${INPUT_FASTQ_R2}
control_fastq_r1:
  class: File
  path: ${CONTROL_FASTQ_R1}
# control_fastq_r2:  # Uncomment if paired-end
#   class: File
#   path: ${CONTROL_FASTQ_R2}
genome_fasta:
  class: File
  path: ${GENOME_FASTA}
genome_gtf:
  class: File
  path: ${GENOME_GTF}
star_index_dir:
  class: Directory
  path: ${STAR_INDEX}
output_directory: ${OUTPUT_DIR}
sample_id: ${SAMPLE_ID}
# Add other parameters as required by the eclip.cwl workflow,
# such as adapter sequences, minimum read length, etc.
EOF

# Execute the eCLIP CWL workflow
# This workflow internally uses tools like STAR for alignment, CLIPper for peak calling,
# and merge_peaks for IDR.
cwltool --outdir "${OUTPUT_DIR}" "${ECLIP_CWL}" eclip_inputs.yaml
-
2
Takes output from raw files.
Tool: not specified (inferred with models/gemini-2.5-flash)
$ Bash example
# This step describes the input to the pipeline.
# No specific tool or command can be inferred from "Takes output from raw files."
# Assuming raw sequencing data in FASTQ format as input for subsequent steps.

# Define variables for raw input files (example placeholders)
RAW_FASTQ_R1="sample_R1.fastq.gz"
RAW_FASTQ_R2="sample_R2.fastq.gz"  # For paired-end data
# Or for single-end data:
# RAW_FASTQ="sample.fastq.gz"

# Further pipeline steps would then process these files.
-
3
Run to trim off both 5' and 3' adapters on both reads.
$ Bash example
# Install fastp if not already installed
# conda install -c bioconda fastp=0.23.2

# Define input and output file names
READ1_IN="read1.fastq.gz"
READ2_IN="read2.fastq.gz"
READ1_OUT="trimmed_read1.fastq.gz"
READ2_OUT="trimmed_read2.fastq.gz"
JSON_REPORT="fastp_report.json"
HTML_REPORT="fastp_report.html"
THREADS=8
QUAL_THRESHOLD=15  # Example quality threshold for filtering low quality bases
MIN_LENGTH=20      # Example minimum read length after trimming

# Run fastp to trim adapters and perform quality filtering
# --detect_adapter_for_pe: Automatically detect adapters for paired-end reads
# --trim_poly_g: Trim polyG tails (common in Illumina NextSeq/NovaSeq)
# --trim_poly_x: Trim polyX tails (any base)
# --correction: Enable base correction for overlapping reads
# --cut_front/--cut_tail: Cut low-quality bases from the 5' and 3' ends
# --cut_window_size: Window size for quality cutting
# --cut_mean_quality: Mean quality requirement in the window
# --low_complexity_filter: Filter reads with low complexity
# --complexity_threshold: Threshold for low complexity filtering
fastp \
  --in1 "${READ1_IN}" \
  --in2 "${READ2_IN}" \
  --out1 "${READ1_OUT}" \
  --out2 "${READ2_OUT}" \
  --json "${JSON_REPORT}" \
  --html "${HTML_REPORT}" \
  --detect_adapter_for_pe \
  --thread "${THREADS}" \
  --qualified_quality_phred "${QUAL_THRESHOLD}" \
  --length_required "${MIN_LENGTH}" \
  --trim_poly_g \
  --trim_poly_x \
  --correction \
  --cut_front \
  --cut_tail \
  --cut_window_size 4 \
  --cut_mean_quality 20 \
  --low_complexity_filter \
  --complexity_threshold 30
-
4
Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
Tool: cutadapt (the start of the recorded command is cut off; --quality-cutoff is a cutadapt option, and step 5 refers to this step as cutadapt round 1)
$ Bash example
# Install cutadapt (e.g., via conda)
# conda install -c bioconda cutadapt

# The recorded command is truncated at its start, so any options that preceded
# "--quality-cutoff 6" are unknown and are not reconstructed here.
# CTTGTAGATCGGAAG is passed as a single -A adapter, matching the round 2 command in step 7.
cutadapt --quality-cutoff 6 \
  -m 18 \
  -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -g CTTCCGATCTACAAGTT \
  -g CTTCCGATCTTGGTCCT \
  -A AACTTGTAGATCGGA \
  -A AGGACCAAGATCGGA \
  -A ACTTGTAGATCGGAA \
  -A GGACCAAGATCGGAA \
  -A CTTGTAGATCGGAAG \
  -A GACCAAGATCGGAAG \
  -A TTGTAGATCGGAAGA \
  -A ACCAAGATCGGAAGA \
  -A TGTAGATCGGAAGAG \
  -A CCAAGATCGGAAGAG \
  -A GTAGATCGGAAGAGC \
  -A CAAGATCGGAAGAGC \
  -A TAGATCGGAAGAGCG \
  -A AAGATCGGAAGAGCG \
  -A AGATCGGAAGAGCGT \
  -A GATCGGAAGAGCGTC \
  -A ATCGGAAGAGCGTCG \
  -A TCGGAAGAGCGTCGT \
  -A CGGAAGAGCGTCGTG \
  -A GGAAGAGCGTCGTGT \
  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz \
  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz \
  /full/path/to/files/file_R1.C01.fastq.gz \
  /full/path/to/files/file_R2.C01.fastq.gz \
  > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
-
5
Takes output from cutadapt round 1.
$ Bash example
# Install cutadapt if not already installed
# (the --json report used below is not available in old 1.x releases)
# conda create -n cutadapt_env -c bioconda cutadapt
# conda activate cutadapt_env

# Define input and output files
INPUT_FASTQ="input_from_cutadapt_round1.fastq.gz"
OUTPUT_FASTQ="output_cutadapt_round2.fastq.gz"
REPORT_JSON="cutadapt_round2_report.json"

# Define adapter sequence (example value: the Illumina TruSeq read 1 adapter)
# For eCLIP, this is typically a specific 3' adapter or a library-specific adapter.
ADAPTER_SEQUENCE="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Execute cutadapt for adapter and quality trimming
# -a: 3' adapter sequence to remove
# -q: Trim low-quality ends (e.g., 20 for phred score 20)
# --minimum-length: Discard reads shorter than this length after trimming (e.g., 18 for eCLIP reads)
# -o: Output file for trimmed reads
# --json: Write a JSON report with trimming statistics
cutadapt -a "${ADAPTER_SEQUENCE}" \
  -q 20 \
  --minimum-length 18 \
  -o "${OUTPUT_FASTQ}" \
  --json "${REPORT_JSON}" \
  "${INPUT_FASTQ}"
-
6
Run to trim off the 3' adapters on read 2, to control for double ligation events.
$ Bash example
# Install cutadapt (e.g., via conda)
# conda install -c bioconda cutadapt=4.0

# Define input and output file paths
INPUT_R1="read1.fastq.gz"
INPUT_R2="read2.fastq.gz"
OUTPUT_R1="trimmed_read1.fastq.gz"
OUTPUT_R2="trimmed_read2.fastq.gz"

# Example 3' adapter to remove from read 2 (placeholder value; the read 2 adapters
# the pipeline itself uses are the -A sequences in the command in step 7).
# Trimming it controls for double ligation events.
ADAPTER_R2="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Run cutadapt to trim 3' adapters from Read 2
# -A: Specifies a 3' adapter for the second read in a paired-end library.
# -o: Output file for the first read.
# -p: Output file for the second read.
# --minimum-length 18: Discard reads shorter than 18 bp after trimming.
# --cores 4: Use 4 CPU cores for faster processing (adjust based on available resources).
cutadapt -A "${ADAPTER_R2}" \
  --minimum-length 18 \
  --cores 4 \
  -o "${OUTPUT_R1}" \
  -p "${OUTPUT_R2}" \
  "${INPUT_R1}" \
  "${INPUT_R2}"
-
7
Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
$ Bash example
# Install cutadapt (e.g., via conda)
# conda install -c bioconda cutadapt
# Note: recent cutadapt releases detect the input format automatically, so "-f fastq" may need to be dropped there.
cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 \
  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA \
  -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA \
  -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG \
  -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA \
  -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG \
  -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC \
  -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG \
  -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC \
  -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT \
  -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT \
  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz \
  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz \
  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz \
  /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz \
  > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
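The twenty -A sequences in the round 2 command are not arbitrary: they appear to be every 15-nt window of two longer read 2 adapter sequences, so an adapter copy is caught at whatever offset it starts. A minimal sketch that regenerates the list is shown below. The two full-length sequences are an assumption, reconstructed by overlapping the listed windows, and the function names `tile_kmers` and `adapter_windows` are illustrative rather than part of the pipeline.

```bash
# Print every k-nt window of a sequence, one per line.
# $1 = sequence, $2 = window length
tile_kmers() {
  awk -v s="$1" -v k="$2" 'BEGIN {
    for (i = 1; i + k - 1 <= length(s); i++) print substr(s, i, k)
  }'
}

# Unique 15-mers from both full-length sequences.
# The two sequences are inferred from the -A list (assumption, not stated in the source).
adapter_windows() {
  for s in AACTTGTAGATCGGAAGAGCGTCGTGT AGGACCAAGATCGGAAGAGCGTCGTGT; do
    tile_kmers "$s" 15
  done | sort -u
}

# Emit the windows as cutadapt -A arguments.
adapter_windows | sed 's/^/-A /'
```

The two sequences differ only in their first seven bases, so their windows coincide from AGATCGGAAGAGCGT onward, which is why the command lists 20 adapters rather than 26.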
-
8
Takes output from cutadapt round 2.
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt

# Define input and output files
# INPUT_FASTQ should be the output file from cutadapt round 2.
INPUT_FASTQ="output_cutadapt_round2.fastq.gz"
OUTPUT_FASTQ="filtered_after_round2.fastq.gz"
LOG_FILE="cutadapt_after_round2.log"

# Define trimming parameters (example values, adjust as needed)
# For cutadapt version 3.4, --minimum-length, --quality-cutoff, --nextseq-trim are standard.
MIN_READ_LENGTH=18  # Minimum read length after trimming
QUALITY_CUTOFF=20   # Quality cutoff for 3' end trimming (e.g., Phred score 20)
NEXTSEQ_TRIM=20     # Quality cutoff for NextSeq-specific 3' end trimming (e.g., Phred score 20)

# If a 5' adapter needs to be trimmed, define it here.
# This is often for random Ns (e.g., UMI/barcode) or a specific linker.
# Example: ADAPTER_5_PRIME="NNNNNNNNNN"  # For 10 random Ns at the 5' end
# Example: ADAPTER_5_PRIME="AAGCAGTGGTATCAACGCAGAGTAC"  # A common 5' adapter sequence
# If no 5' adapter trimming is required, leave ADAPTER_5_PRIME empty; the -g parameter is then omitted.
ADAPTER_5_PRIME=""  # Placeholder: Set to your specific 5' adapter sequence if needed

# Execute cutadapt: quality trimming, minimum length filtering, and optional 5' adapter trimming
# This command assumes single-end reads.
cutadapt \
  --minimum-length "${MIN_READ_LENGTH}" \
  --quality-cutoff "${QUALITY_CUTOFF}" \
  --nextseq-trim="${NEXTSEQ_TRIM}" \
  ${ADAPTER_5_PRIME:+-g "${ADAPTER_5_PRIME}"} \
  -o "${OUTPUT_FASTQ}" \
  "${INPUT_FASTQ}" \
  > "${LOG_FILE}" 2>&1
-
9
Maps reads to a human-specific version of RepBase to remove repetitive elements, which helps control for spurious artifacts from rRNA (and other) repetitive reads.
Tool: bbduk.sh v38.90 (inferred with models/gemini-2.5-flash; the command recorded for this step, shown in step 10, uses STAR)
$ Bash example
# Install BBMap suite (contains bbduk.sh)
# conda install -c bioconda bbmap

# Placeholder for human RepBase and rRNA sequences. These files need to be prepared.
# A common approach is to combine known human repetitive elements (e.g., from RepBase) and rRNA sequences.
# Example (hypothetical links, actual RepBase access may require license):
# wget -O human_repeats.fasta "https://www.girinst.org/repbase/update/human_repeats.fasta"
# wget -O hg38_rRNA.fasta "https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.rRNA.fa.gz"
# cat human_repeats.fasta hg38_rRNA.fasta > human_repbase_rRNA.fasta

# Run bbduk.sh to remove reads matching human repetitive elements and rRNA
# (k-mer filtering: reads sharing a 31-mer with the reference, allowing one mismatch, are discarded)
bbduk.sh \
  in=input_reads.fastq.gz \
  out=filtered_reads.fastq.gz \
  ref=human_repbase_rRNA.fasta \
  k=31 \
  hdist=1 \
  stats=bbduk_repeat_rRNA_stats.txt \
  minlen=15
-
10
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
$ Bash example
# Define input and output paths
READ1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
READ2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"
OUTPUT_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"

# Define the genome directory for RepBase
# This directory should contain a STAR index built from a RepBase FASTA file.
# Example for building a RepBase index (adjust paths and RepBase FASTA as needed):
# wget -O RepBase_human.fasta.gz "https://www.girinst.org/server/RepBase/protected/RepBase20.09.fasta.gz"  # (Requires login)
# gunzip RepBase_human.fasta.gz
# Adjust --genomeSAindexNbases if RepBase is small:
# STAR --runMode genomeGenerate --genomeDir /path/to/RepBase_human_database_file --genomeFastaFiles RepBase_human.fasta --genomeSAindexNbases 10
GENOME_DIR="/path/to/RepBase_human_database_file"

# Execute STAR alignment command
STAR \
  --runMode alignReads \
  --runThreadN 16 \
  --genomeDir "${GENOME_DIR}" \
  --genomeLoad LoadAndRemove \
  --readFilesIn "${READ1}" "${READ2}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMattributes All \
  --readFilesCommand zcat \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > "${OUTPUT_BAM}"
-
11
Takes output from STAR rmRep.
$ Bash example
# Install STAR and Samtools
# conda install -c bioconda star samtools

# Define reference paths (using hg38 as a placeholder)
GENOME_FASTA="/path/to/references/GRCh38/GRCh38.primary_assembly.genome.fa"
GTF_FILE="/path/to/references/GRCh38/gencode.v44.annotation.gtf"
STAR_INDEX_DIR="/path/to/references/GRCh38/STAR_index"

# Create STAR genome index if it doesn't exist (optional, usually done once)
# STAR --runMode genomeGenerate \
#   --genomeDir "${STAR_INDEX_DIR}" \
#   --genomeFastaFiles "${GENOME_FASTA}" \
#   --sjdbGTFfile "${GTF_FILE}" \
#   --sjdbOverhang 100 \
#   --runThreadN 8

# Define input and output files
# "rmRep" refers to the repeat-removal STAR run (steps 9-10), not to duplicate removal.
# In this pipeline the inputs are the reads left unmapped by that run
# (the ...rep.bamUnmapped.out.mate1/mate2 files); the names below are placeholders.
INPUT_R1="sample_R1.fastq.gz"
INPUT_R2="sample_R2.fastq.gz"
OUTPUT_PREFIX="sample_aligned"

# 1. Align reads with STAR
STAR --genomeDir "${STAR_INDEX_DIR}" \
  --readFilesIn "${INPUT_R1}" "${INPUT_R2}" \
  --readFilesCommand zcat \
  --runThreadN 8 \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped None \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 30000000000  # Adjust based on available RAM (e.g., 30GB)

# Output from STAR is ${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam
# 2. Index the aligned BAM file
# (PCR duplicates are removed later in the pipeline by barcode_collapse_pe.py, step 16)
samtools index "${OUTPUT_PREFIX}_Aligned.sortedByCoord.out.bam"
-
12
Maps unique reads to the human genome.
$ Bash example
# Install BWA and Samtools if not already present
# conda install -c bioconda bwa samtools

# Define variables
REFERENCE_GENOME_PREFIX="human_genome_hg38"  # Placeholder for indexed human genome reference (e.g., hg38)
READ1="input_read1.fastq.gz"
READ2="input_read2.fastq.gz"
OUTPUT_BAM="mapped_reads.bam"
NUM_THREADS=8
READ_GROUP="@RG\tID:sample_id\tSM:sample_name\tPL:ILLUMINA\tLB:library_name"

# Index the reference genome (if not already indexed). This creates .sa, .pac, .bwt, .ann, .amb files.
# bwa index ${REFERENCE_GENOME_PREFIX}.fa

# Map reads to the human genome using BWA-MEM, then convert SAM to BAM and sort
# -t: Number of threads
# -M: Mark shorter split hits as secondary (recommended for Picard compatibility)
# -R: Read group header (important for downstream processing like GATK)
bwa mem -t "${NUM_THREADS}" -M -R "${READ_GROUP}" "${REFERENCE_GENOME_PREFIX}.fa" "${READ1}" "${READ2}" | \
  samtools view -bS - | \
  samtools sort -o "${OUTPUT_BAM}" -

# Optional: Index the sorted BAM file
# samtools index ${OUTPUT_BAM}

# Optional: Filter for uniquely mapped reads (e.g., MAPQ >= 20 and primary alignment)
# samtools view -b -q 20 -F 0x100 ${OUTPUT_BAM} > ${OUTPUT_BAM%.bam}.unique.bam
-
13
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
$ Bash example
# Reference genome directory: /path/to/STAR_database_file
# Input files: /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1,
#              /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2
# Output file: /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
STAR \
  --runMode alignReads \
  --runThreadN 16 \
  --genomeDir /path/to/STAR_database_file \
  --genomeLoad LoadAndRemove \
  --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam \
  --outSAMattributes All \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
-
14
Takes output from STAR genome mapping.
$ Bash example
# Installation (commented out)
# conda install -c bioconda star=2.5.2b

# Placeholder for genome index directory
GENOME_DIR="/path/to/STAR_index/GRCh38"
# Placeholder for input FASTQ file (single-end as commonly used in eCLIP)
READ_FILE="sample.fastq.gz"
# Placeholder for output prefix
OUTPUT_PREFIX="sample_aligned_"
# Number of threads
THREADS=8

STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ_FILE}" \
  --readFilesCommand zcat \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes All \
  --outFilterMultimapNmax 20 \
  --outFilterMismatchNmax 999 \
  --outFilterMismatchNoverLmax 0.04 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --limitBAMsortRAM 30000000000  # 30GB
-
15
Custom random-mer-aware script for PCR duplicate removal.
$ Bash example
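# Install umi_tools
# conda install -c bioconda umi-tools=1.1.2

# Generic stand-in: the pipeline's own script is barcode_collapse_pe.py (see step 16).
# Example: Deduplicate a BAM file using umi_tools, assuming UMIs are in read names
# This command is suitable for eCLIP data where UMIs are often extracted
# into the read headers in a preceding step.
# --stdin/--stdout: input and output BAM files
# --spliced-is-unique: treat a spliced read as distinct from an unspliced read
#   starting at the same position, which matters for RNA-based assays like eCLIP.
# --output-stats: prefix for the statistics files
umi_tools dedup \
  --stdin input.bam \
  --stdout deduplicated.bam \
  --method directional \
  --paired \
  --spliced-is-unique \
  --output-stats deduplication_stats \
  --log deduplication.log
-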
16
Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
Tool: barcode_collapse_pe.py v0.1.0 (part of Skipper pipeline) (inferred with models/gemini-2.5-flash)
$ Bash example
# Install Miniconda or Anaconda if not already installed
# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
# source $HOME/miniconda/bin/activate
# conda init bash

# Clone the Skipper repository to get the script and environment file
# (the repository location is inferred, not stated in the source; verify before use)
# git clone https://github.com/yeolab/skipper.git
# cd skipper

# Create and activate the conda environment using the provided environment.yml
# conda env create -f environment.yml
# conda activate skipper_env

# Navigate to the scripts directory (assuming you are in the 'skipper' directory)
# cd scripts

# Execute the barcode collapse command
python barcode_collapse_pe.py \
  --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam \
  --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam \
  --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
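The idea behind random-mer-aware duplicate removal can be illustrated without the real script: two reads count as PCR duplicates only when they share both mapping position and random-mer, so reads at the same position with different random-mers are kept. Below is a minimal sketch on a toy tab-separated table. The column layout, the function names, and the example values are assumptions for illustration; this is not the logic of barcode_collapse_pe.py itself, which operates on paired-end BAM records.

```bash
# Keep the first read seen for each (chrom, start, strand, random-mer) key.
# Input columns (tab-separated, hypothetical layout):
#   chrom  start  strand  randomer  read_name
collapse_by_randomer() {
  awk -F '\t' '!seen[$1 FS $2 FS $3 FS $4]++'
}

# Toy input: r1 and r2 share position and random-mer (duplicates);
# r3 maps to the same position but carries a different random-mer.
toy_reads() {
  printf 'chr1\t100\t+\tACGTA\tr1\n'
  printf 'chr1\t100\t+\tACGTA\tr2\n'
  printf 'chr1\t100\t+\tTTGCA\tr3\n'
}

toy_reads | collapse_by_randomer
```

A position-only deduplicator would collapse all three toy reads into one; keying on the random-mer as well is what preserves r3 as an independent molecule.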
-
17
Takes output from barcode collapse PE.
$ Bash example
# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=4.0

# Define input and output file names (placeholders)
# INPUT_R1 and INPUT_R2 stand for reads derived from the 'barcode collapse PE' step.
INPUT_R1="barcode_collapsed_reads_R1.fastq.gz"
INPUT_R2="barcode_collapsed_reads_R2.fastq.gz"
OUTPUT_R1="trimmed_R1.fastq.gz"
OUTPUT_R2="trimmed_R2.fastq.gz"
OUTPUT_LOG="cutadapt.log"

# Define standard Illumina TruSeq adapters for paired-end reads (example values).
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # Illumina TruSeq adapter, read 1
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # Illumina TruSeq adapter, read 2

# Execute cutadapt for adapter trimming on paired-end reads.
# -a: 3' adapter for R1
# -A: 3' adapter for R2
# -o: output R1 file
# -p: output R2 file
# -m 18: Discard reads shorter than 18 bp after trimming.
# -q 20: Trim low-quality bases from 3' end using a quality cutoff of 20.
# -e 0.1: Maximum error rate of 10% for adapter matching.
cutadapt -a "${ADAPTER_R1}" -A "${ADAPTER_R2}" \
  -o "${OUTPUT_R1}" -p "${OUTPUT_R2}" \
  -m 18 -q 20 -e 0.1 \
  "${INPUT_R1}" "${INPUT_R2}" > "${OUTPUT_LOG}" 2>&1
-
18
Sorts resulting bam file for use downstream.
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools

# Sort the BAM file by coordinate
# -o: Output file name
# -@: Number of threads to use (adjust as needed)
# -m: Maximum memory per thread (e.g., 2G for 2GB, adjust as needed)
# Replace 'input.bam' with the path to your unsorted BAM file.
# Replace 'output.bam' with the desired name for your sorted BAM file.
samtools sort -o output.bam -@ 8 -m 2G input.bam
-
19
Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true
$ Bash example
# Install Picard (standalone)
# conda install -c bioconda picard
# Note: the recorded command runs the legacy net.sf.picard.sam.SortSam class
# bundled in the GATK Queue.jar, so it needs that jar rather than a current Picard or GATK4 install.

# Define variables for paths
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
TMP_DIR="/full/path/to/files/.queue/tmp"
GATK_DIST_PATH="/path/to/gatk/dist"  # Path where Queue.jar is located

# Create temporary directory if it doesn't exist
mkdir -p "${TMP_DIR}"

# Execute Picard SortSam
java -Xmx2048m \
  -XX:+UseParallelOldGC \
  -XX:ParallelGCThreads=4 \
  -XX:GCTimeLimit=50 \
  -XX:GCHeapFreeLimit=10 \
  -Djava.io.tmpdir="${TMP_DIR}" \
  -cp "${GATK_DIST_PATH}/Queue.jar" \
  net.sf.picard.sam.SortSam \
  INPUT="${INPUT_BAM}" \
  TMP_DIR="${TMP_DIR}" \
  OUTPUT="${OUTPUT_BAM}" \
  VALIDATION_STRINGENCY=SILENT \
  SO=coordinate \
  CREATE_INDEX=true
-
20
Takes output from sortSam, makes bam index for use downstream.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Example: Assuming 'sorted.bam' is the output from sortSam
# Replace 'sorted.bam' with the actual path to your sorted BAM file
samtools index sorted.bam
-
21
Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Create a BAM index file
samtools index \
  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam \
  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
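Each recorded command up to this point appends a suffix to the previous output name, so the chain of intermediate files for one sample can be derived from the read 1 FASTQ path alone. The sketch below prints that chain, which can help when checking that every expected intermediate exists. The suffixes are copied from the recorded commands; the function name `eclip_outputs` is only illustrative.

```bash
# Print the per-sample intermediate files in pipeline order.
# $1 = path to the raw read 1 FASTQ (e.g. .../file_R1.C01.fastq.gz)
eclip_outputs() {
  base=$1
  printf '%s\n' \
    "${base}.adapterTrim.fastq.gz" \
    "${base}.adapterTrim.round2.fastq.gz" \
    "${base}.adapterTrim.round2.rep.bam" \
    "${base}.adapterTrim.round2.rep.bamUnmapped.out.mate1" \
    "${base}.adapterTrim.round2.rmRep.bam" \
    "${base}.adapterTrim.round2.rmRep.rmDup.bam" \
    "${base}.adapterTrim.round2.rmRep.rmDup.sorted.bam" \
    "${base}.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai"
}

eclip_outputs /full/path/to/files/file_R1.C01.fastq.gz
```

The fourth entry is produced by STAR's --outReadsUnmapped Fastx option in the repeat-mapping run, which writes unmapped mates next to the output prefix; it is the read 1 input of the genome-mapping run.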
-
22
Takes inputs from multiple final bam files.
Tool: samtools v1.19.2 (inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools=1.19.2

# This command merges multiple final BAM files into a single output BAM file.
# This is a common step when combining technical replicates or data from different lanes of the same sample.
# Replace 'input1.bam', 'input2.bam', 'input3.bam' with the actual paths to your final BAM files.
# Replace 'merged_output.bam' with the desired name for the merged file.
samtools merge merged_output.bam input1.bam input2.bam input3.bam
-
23
Merges the two technical replicates for further downstream analysis.
Tool: samtools v1.10 (inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools if not already available
# conda install -c bioconda samtools=1.10

# Merge two technical replicate BAM files
# Replace replicate1.bam and replicate2.bam with actual input file names
# Replace merged_replicates.bam with the desired output file name
samtools merge merged_replicates.bam replicate1.bam replicate2.bam
-
24
Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
$ Bash example
# Install samtools (if not already installed)
# conda install -c bioconda samtools

# Define input and output files
INPUT_BAM_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
INPUT_BAM_2="/full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
OUTPUT_BAM="/full/path/to/files/CombinedID.merged.bam"

# Execute samtools merge command
samtools merge "${OUTPUT_BAM}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"
-
25
Takes output from sortSam, makes bam index for use downstream.
Tool: samtools index v1.19.1 (inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Assuming 'sorted.bam' is the output from sortSam
samtools index sorted.bam
-
26
Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
$ Bash example
# Install samtools (e.g., via conda)
# conda install -c bioconda samtools

# Create a BAM index file
samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
-
27
Takes output from sortSam.
Tool: samtools v1.19 (inferred with models/gemini-2.5-flash)
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Input BAM file (output from sortSam)
INPUT_BAM="sorted.bam"

# Index the sorted BAM file to allow for fast random access
samtools index "${INPUT_BAM}"
-
28
Only outputs the second read in each pair for use with single stranded peak caller.
$ Bash example
# BBTools installation
# BBTools can be downloaded from SourceForge or installed via Bioconda (package name: bbmap).
# For example, using Bioconda:
# conda install -c bioconda bbmap

# Generic stand-in working on FASTQ; the command recorded for this step (step 30)
# uses samtools view -f 128 on the merged BAM instead.
# This command assumes 'input_interleaved.fastq.gz' is a single file containing
# paired-end reads in an interleaved format (R1, R2, R1, R2, ...).
# 'reformat.sh' splits the pairs; the second read of each pair is written to 'output_R2.fastq.gz'.
reformat.sh in=input_interleaved.fastq.gz int=t out=output_R1.fastq.gz out2=output_R2.fastq.gz
-
29
This is the final bam file to perform analysis on.
Tool: N/A (inferred with models/gemini-2.5-flash)
$ Bash example
No specific tool or action can be inferred from the description 'This is the final bam file to perform analysis on.' This description refers to a resulting file rather than a processing step.
-
30
Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam
$ Bash example
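# Install samtools (e.g., using conda)
# conda install -c bioconda samtools=1.10

# -h: include the header, -b: output BAM, -f 128: keep only alignments flagged as the second read in a pair
samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam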
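The -f 128 filter keeps alignments whose FLAG has bit 0x80 set, which the SAM format defines as the last segment in the template, i.e. read 2 of a pair. The same bit test can be sketched in plain shell arithmetic, which is handy for checking individual FLAG values by hand. The function name `is_read2` and the example FLAG values are illustrative.

```bash
# Succeed (exit status 0) when the SAM FLAG has bit 128 (0x80, second read in pair) set.
is_read2() {
  [ $(( $1 & 128 )) -ne 0 ]
}

# Typical FLAG values for properly paired reads:
#   99 and 83 are read 1 (bit 64), 147 and 163 are read 2 (bit 128).
for flag in 99 147 83 163; do
  if is_read2 "$flag"; then
    echo "$flag: read 2, kept by -f 128"
  else
    echo "$flag: read 1, dropped by -f 128"
  fi
done
```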
-
31
Takes results from samtools view.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Example: Convert a SAM file to a BAM file
# This command takes a SAM file (input.sam) and converts it to a BAM file (output.bam).
# The description "Takes results from samtools view" implies that this step either performs
# samtools view or processes its output. Given the tool is samtools, this example shows a common use of samtools view.
# -b: Output BAM format
# -S: Input is SAM format (optional for samtools 1.x as it auto-detects, but good for clarity)
# Replace input.sam and output.bam with your actual file names.
samtools view -bS input.sam > output.bam

# Another common use case: Filter mapped reads from a BAM file
# -F 4: Exclude reads where the FLAG indicates the read is unmapped (0x4)
# samtools view -b -F 4 input.bam > mapped_reads.bam
-
32
Calls peaks on those files.
$ Bash example
# Installation (example for a conda environment)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# conda create -n clipper_env python=3.8 numpy scipy pysam pybedtools matplotlib seaborn pandas statsmodels -y
# conda activate clipper_env

# Input and output files (placeholders)
# The source pipeline calls peaks on the read-2-only BAM built from the merged replicates.
IP_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
OUTPUT_BED="/full/path/to/files/CombinedID.merged.r2.peaks.bed"

# -s takes a species/genome build name (hg19 in the source pipeline), not a genome size.
SPECIES="hg19"

# Call peaks with the options given in the source pipeline.
clipper -b "${IP_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}" \
    --bonferroni --superlocal --threshold-method binomial --save-pickle

# Output: a BED file of called peaks at ${OUTPUT_BED}
-
33
Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
$ Bash example
# Install CLIPper (if not already installed)
# A dedicated conda environment is recommended.
# conda create -n clipper_env python=3.8
# conda activate clipper_env
# Install CLIPper from a clone of https://github.com/yeolab/clipper.git (see its README).

# Define input and output paths
INPUT_BAM="/full/path/to/files/CombinedID.merged.r2.bam"
OUTPUT_BED="/full/path/to/files/CombinedID.merged.r2.peaks.bed"
GENOME_ASSEMBLY="hg19"  # Reference genome assembly

# Run CLIPper peak calling
clipper -b "${INPUT_BAM}" -s "${GENOME_ASSEMBLY}" -o "${OUTPUT_BED}" \
    --bonferroni --superlocal --threshold-method binomial --save-pickle
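Once peak calling finishes, a quick summary of the resulting BED file helps confirm that the run produced plausible output. The sketch below assumes a standard tab-separated BED layout with at least six columns (chrom, start, end, name, score, strand); `summarize_peaks` is a hypothetical helper, and the toy records are made up for illustration rather than taken from this dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the peak count, per-strand counts, and mean
# peak width for a BED file with at least six tab-separated columns.
summarize_peaks() {
    awk -F'\t' '
        NF >= 6 {
            n++
            width += $3 - $2          # BED intervals are half-open: end - start
            if ($6 == "+") plus++; else if ($6 == "-") minus++
        }
        END {
            printf "peaks=%d plus=%d minus=%d mean_width=%.1f\n",
                   n, plus, minus, (n ? width / n : 0)
        }
    ' "$1"
}

# Toy input standing in for CombinedID.merged.r2.peaks.bed (values are made up).
demo_bed="$(mktemp)"
printf 'chr1\t100\t150\tpeak_0\t0.001\t+\n' >  "${demo_bed}"
printf 'chr1\t400\t430\tpeak_1\t0.002\t-\n' >> "${demo_bed}"
printf 'chr2\t900\t940\tpeak_2\t0.004\t+\n' >> "${demo_bed}"

summarize_peaks "${demo_bed}"
rm -f "${demo_bed}"
```

To summarize real output, pass the path given to clipper's -o option, for example `summarize_peaks /full/path/to/files/CombinedID.merged.r2.peaks.bed`.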
Raw Source Text
Library strategy: eCLIP-seq
Takes output from raw files. Run to trim off both 5' and 3' adapters on both reads.
Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
Takes output from cutadapt round 1. Run to trim off the 3' adapters on read 2, to control for double ligation events.
Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
Takes output from cutadapt round 2. Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads.
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
Takes output from STAR rmRep. Maps unique reads to the human genome.
Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
Takes output from STAR genome mapping. Custom random-mer-aware script for PCR duplicate removal.
Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
Takes output from barcode collapse PE. Sorts resulting bam file for use downstream.
Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true
Takes output from sortSam, makes bam index for use downstream.
Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
Takes inputs from multiple final bam files. Merges the two technical replicates for further downstream analysis.
Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
Takes output from sortSam, makes bam index for use downstream.
Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai
Takes output from sortSam. Only outputs the second read in each pair for use with single stranded peak caller. This is the final bam file to perform analysis on.
Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam
Takes results from samtools view. Calls peaks on those files.
Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle
Genome_build: hg19
Supplementary_files_format_and_content: bed format, contains clusters of predicted RBP binding