GSE77629 Processing Pipeline

OTHER code_examples 32 steps

Publication

Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP).

Nature methods (2016) — PMID 27018577

Dataset

Enhanced CLIP (eCLIP) enables robust and scalable transcriptome-wide discovery and characterization of RNA binding protein binding sites [eCLIP - 293…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Takes output from raw files.

cutadapt (Inferred with models/gemini-2.5-flash) v4.4

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Define input and output file names (placeholders)
INPUT_R1="sample_R1.fastq.gz"
INPUT_R2="sample_R2.fastq.gz"
OUTPUT_R1_TRIMMED="sample_R1_trimmed.fastq.gz"
OUTPUT_R2_TRIMMED="sample_R2_trimmed.fastq.gz"

# Define common Illumina TruSeq adapters (adjust if different adapters were used)
# Read 1 3' adapter
ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
# Read 2 3' adapter
ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Run cutadapt for adapter trimming and quality filtering
# -a: 3' adapter for read 1
# -A: 3' adapter for read 2
# -o: Output file for read 1
# -p: Output file for read 2
# -m: Minimum read length after trimming
# -q: Quality cutoff (Phred score) for trimming low-quality bases from the 3' ends
cutadapt -a "${ADAPTER_FWD}" \
         -A "${ADAPTER_REV}" \
         -o "${OUTPUT_R1_TRIMMED}" \
         -p "${OUTPUT_R2_TRIMMED}" \
         -m 20 -q 20 \
         "${INPUT_R1}" "${INPUT_R2}"

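cutadapt's documentation describes the -q / --quality-cutoff trimming as the BWA algorithm: subtract the cutoff from every base quality, sum those differences from the 3' end inward, and cut where that partial sum is smallest. A minimal sketch of that rule in awk, assuming Phred+33 qualities and a single read: it scans inward from the 3' end accumulating (cutoff - quality), remembers where that total peaks, and stops once it drops below zero. It is not cutadapt's own code, it ignores 5' trimming, --nextseq-trim and adapter removal, and quality_trim_length is a hypothetical name.

```bash
# Hypothetical helper: number of bases kept after BWA-style 3' quality trimming.
# $1 = quality string (Phred+33), $2 = quality cutoff (as given to -q)
quality_trim_length() {
  printf '%s\n' "$1" | awk -v cutoff="$2" '
    BEGIN { for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c }
    {
      n = length($0)
      s = 0; best = 0; keep = n
      # Walk from the 3-prime end, summing (cutoff - quality); this is the
      # negated form of summing (quality - cutoff) and cutting at its minimum.
      for (i = n; i >= 1; i--) {
        s += cutoff - (ord[substr($0, i, 1)] - 33)
        if (s < 0) break                 # bases from here inward are good enough
        if (s > best) { best = s; keep = i - 1 }
      }
      print keep
    }'
}

# Five high-quality bases (I = Q40) followed by three poor ones (# = Q2)
quality_trim_length 'IIIII###' 20    # prints 5
```

With -m 20 as in the example above, a read whose remaining length (after adapter and quality trimming) falls below 20 would then be discarded.
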
Run to trim both 5' and 3' adapters from both reads.

cutadapt (Inferred with models/gemini-2.5-flash) v4.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install cutadapt (e.g., using conda)
# conda install -c bioconda cutadapt=4.0

# Define input and output files
READ1_IN="input_R1.fastq.gz"
READ2_IN="input_R2.fastq.gz"
READ1_OUT="trimmed_R1.fastq.gz"
READ2_OUT="trimmed_R2.fastq.gz"

# Define 3' adapter sequences (Illumina universal adapters as placeholders)
# These are searched for at the 3' end of the reads.
ADAPTER_3PRIME_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # For Read 1
ADAPTER_3PRIME_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # For Read 2 (Illumina TruSeq read-2 adapter)

# Define 5' adapter sequences (placeholders, replace with actual 5' adapters if known)
# These are searched for at the 5' end of the reads.
ADAPTER_5PRIME_R1="GCTCTTCCGATCT" # Example 5' adapter sequence for Read 1
ADAPTER_5PRIME_R2="GCTCTTCCGATCT" # Example 5' adapter sequence for Read 2

# Number of CPU threads to use
NUM_THREADS=$(nproc)

# Run cutadapt to trim 5' and 3' adapters from both paired-end reads
cutadapt \
  -j "${NUM_THREADS}" \
  -a "${ADAPTER_3PRIME_R1}" \
  -A "${ADAPTER_3PRIME_R2}" \
  -g "${ADAPTER_5PRIME_R1}" \
  -G "${ADAPTER_5PRIME_R2}" \
  -o "${READ1_OUT}" \
  -p "${READ2_OUT}" \
  "${READ1_IN}" \
  "${READ2_IN}"

Command: quality-cutoff 6 -m 18 -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics

cutadapt (version not recorded; mirrors the recorded command above)

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# The recorded command above begins mid-option ("quality-cutoff 6"); any options that
# preceded it are not shown in the source, so only the visible parameters are reproduced.
IN_R1="/full/path/to/files/file_R1.C01.fastq.gz"
IN_R2="/full/path/to/files/file_R2.C01.fastq.gz"
OUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
OUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"
METRICS="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics"

# Read 1: one 3' adapter (-a) and two 5' adapters (-g)
R1_ADAPTERS=(
  -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
  -g CTTCCGATCTACAAGTT
  -g CTTCCGATCTTGGTCCT
)

# Read 2: 3' adapters (-A), in the same order as the recorded command
R2_ADAPTERS=()
for seq in \
    AACTTGTAGATCGGA AGGACCAAGATCGGA ACTTGTAGATCGGAA GGACCAAGATCGGAA \
    CTTGTAGATCGGAAG GACCAAGATCGGAAG TTGTAGATCGGAAGA ACCAAGATCGGAAGA \
    TGTAGATCGGAAGAG CCAAGATCGGAAGAG GTAGATCGGAAGAGC CAAGATCGGAAGAGC \
    TAGATCGGAAGAGCG AAGATCGGAAGAGCG AGATCGGAAGAGCGT GATCGGAAGAGCGTC \
    ATCGGAAGAGCGTCG TCGGAAGAGCGTCGT CGGAAGAGCGTCGTG GGAAGAGCGTCGTGT; do
  R2_ADAPTERS+=( -A "${seq}" )
done

# Trim adapters, quality-trim 3' ends (cutoff 6) and drop reads shorter than 18 nt
cutadapt \
  --quality-cutoff 6 \
  -m 18 \
  "${R1_ADAPTERS[@]}" \
  "${R2_ADAPTERS[@]}" \
  -o "${OUT_R1}" \
  -p "${OUT_R2}" \
  "${IN_R1}" "${IN_R2}" > "${METRICS}"

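The twenty -A sequences in the command above follow a regular pattern. Each is a 15-nt window sliding along one of two 27-nt parent sequences, and each parent appears to be the reverse complement of the last 7 nt of a -g adapter (ACAAGTT gives AACTTGT, TGGTCCT gives AGGACCA) joined to the first 20 nt of the Illumina read-2 adapter (AGATCGGAAGAGCGTCGTGT). The sketch below rebuilds the list that way, interleaving the two parents and dropping the windows they share. It reproduces the recorded list and its order, but that the authors generated it like this is an assumption, and the helper names are hypothetical.

```bash
# Hypothetical helpers that rebuild the read-2 (-A) adapter list.
R2_ADAPTER_PREFIX="AGATCGGAAGAGCGTCGTGT"   # first 20 nt of the Illumina read-2 adapter
K=15                                       # window length seen in the recorded command

# Reverse-complement a DNA string (A<->T, C<->G, N stays N).
revcomp() {
  printf '%s\n' "$1" | awk '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N" }
    {
      out = ""
      for (i = length($0); i >= 1; i--) out = out comp[substr($0, i, 1)]
      print out
    }'
}

# $@ = 5' (-g) adapters; prints one -A sequence per line.
build_r2_adapters() {
  for g in "$@"; do
    tail7=$(printf '%s\n' "$g" | awk '{ print substr($0, length($0) - 6) }')
    printf '%s%s\n' "$(revcomp "$tail7")" "$R2_ADAPTER_PREFIX"
  done | awk -v k="$K" '
    { parent[NR] = $0 }
    END {
      # Assumes all parents have the same length (27 nt here).
      n = length(parent[1]) - k + 1
      for (i = 1; i <= n; i++)
        for (p = 1; p <= NR; p++) {
          w = substr(parent[p], i, k)
          if (!(w in seen)) { seen[w] = 1; print w }
        }
    }'
}

build_r2_adapters CTTCCGATCTACAAGTT CTTCCGATCTTGGTCCT
```

Prefixing each printed line with -A gives the argument list used by both trimming rounds.
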
Takes output from cutadapt round 1.

cutadapt v1.18

$ Bash example

# Install cutadapt if not already installed
# conda install -c bioconda cutadapt=1.18

# Define input and output files
INPUT_FASTQ="round1_trimmed.fastq.gz"
OUTPUT_FASTQ="round2_trimmed.fastq.gz"

# Define the 3' adapter sequence (Illumina TruSeq read-1 adapter; placeholder).
# Note: the recorded round-2 command instead trims a list of 15-nt read-2 adapters (-A).
ADAPTER_3PRIME="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Define trimming parameters (values taken from the recorded eCLIP commands)
MIN_LENGTH=18 # Minimum read length after trimming (-m 18)
QUALITY_THRESHOLD=6 # Quality cutoff for 3' end trimming (--quality-cutoff 6)

# Execute cutadapt for round 2 trimming
# This command assumes further trimming of the 3' adapter and quality filtering
# after an initial trimming round.
cutadapt -a "${ADAPTER_3PRIME}" \
         -m "${MIN_LENGTH}" \
         -q "${QUALITY_THRESHOLD}" \
         -o "${OUTPUT_FASTQ}" \
         "${INPUT_FASTQ}"

Run to trim the 3' adapters from read 2, to control for double-ligation events.

cutadapt (Inferred with models/gemini-2.5-flash) v3.4 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt=3.4

# Define input and output file paths (placeholders)
READ1_IN="input_R1.fastq.gz"
READ2_IN="input_R2.fastq.gz"
READ1_OUT="trimmed_R1.fastq.gz"
READ2_OUT="trimmed_R2.fastq.gz"

# Define the 3' adapter sequence for Read 2 (Illumina TruSeq read-2 adapter; the
# recorded command below lists 15-nt adapter fragments instead)
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Define minimum length and NextSeq quality trim threshold
# (assumed values; the recorded command uses -m 18 and --quality-cutoff 6)
MIN_LEN=18
NEXTSEQ_TRIM_QUAL=20

# Run cutadapt to trim 3' adapters from Read 2 only (-A targets R2; no R1 adapter is given)
cutadapt \
  -A "${ADAPTER_R2}" \
  -o "${READ1_OUT}" \
  -p "${READ2_OUT}" \
  --minimum-length "${MIN_LEN}" \
  --nextseq-trim "${NEXTSEQ_TRIM_QUAL}" \
  "${READ1_IN}" \
  "${READ2_IN}"

Command: cutadapt -f fastq --match-read-wildcards --times 1 -e 0.1 -O 5 --quality-cutoff 6 -m 18 -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics

cutadapt (version inferred with models/gemini-2.5-flash)

$ Bash example

# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt

# Define input and output paths
INPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz"
INPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz"
OUTPUT_R1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz"
OUTPUT_R2="/full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz"
METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics"

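# Execute cutadapt command
# Note: -f fastq is an option from older cutadapt releases; recent releases detect the
# input format automatically and may reject -f, in which case drop that line.
cutadapt -f fastq \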
    --match-read-wildcards \
    --times 1 \
    -e 0.1 \
    -O 5 \
    --quality-cutoff 6 \
    -m 18 \
    -A AACTTGTAGATCGGA \
    -A AGGACCAAGATCGGA \
    -A ACTTGTAGATCGGAA \
    -A GGACCAAGATCGGAA \
    -A CTTGTAGATCGGAAG \
    -A GACCAAGATCGGAAG \
    -A TTGTAGATCGGAAGA \
    -A ACCAAGATCGGAAGA \
    -A TGTAGATCGGAAGAG \
    -A CCAAGATCGGAAGAG \
    -A GTAGATCGGAAGAGC \
    -A CAAGATCGGAAGAGC \
    -A TAGATCGGAAGAGCG \
    -A AAGATCGGAAGAGCG \
    -A AGATCGGAAGAGCGT \
    -A GATCGGAAGAGCGTC \
    -A ATCGGAAGAGCGTCG \
    -A TCGGAAGAGCGTCGT \
    -A CGGAAGAGCGTCGTG \
    -A GGAAGAGCGTCGTGT \
    -o "${OUTPUT_R1}" \
    -p "${OUTPUT_R2}" \
    "${INPUT_R1}" \
    "${INPUT_R2}" > "${METRICS_FILE}"

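Round 2 runs with -e 0.1 and a minimum overlap of -O 5. The cutadapt documentation describes the number of tolerated errors as the error rate times the length of the matched segment, rounded down, with partial 3' matches judged on the overlap length. Assuming that rule, the sketch below tabulates what it means for the 15-nt -A adapters: overlaps of 5 to 9 nt must match exactly, while 10 to 15 nt may carry one error. The helper name is hypothetical; cutadapt's own report prints a similar summary of allowed errors.

```bash
# Hypothetical helper: allowed errors per overlap length.
# $1 = error rate (-e), $2 = minimum overlap (-O), $3 = adapter length
allowed_errors_table() {
  awk -v e="$1" -v min="$2" -v len="$3" 'BEGIN {
    for (l = min; l <= len; l++)
      printf "%d\t%d\n", l, int(e * l)   # floor(e * overlap) for positive values
  }'
}

allowed_errors_table 0.1 5 15
```
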
Takes output from cutadapt round 2.

cutadapt v2.10 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install cutadapt if not already available
# conda install -c bioconda cutadapt

# Define input and output files
# INPUT_FASTQ is the output from the previous cutadapt round (round 2)
INPUT_FASTQ="sample_R1_trimmed_3prime.fastq.gz"
OUTPUT_FASTQ="sample_R1_trimmed_5prime.fastq.gz"
REPORT_FILE="sample_R1_trimmed_5prime.cutadapt.log"

# Define parameters for 5' adapter trimming (-g) and quality filtering.
# Assumption: the source gives no 5' adapter for this step, so the all-N sequence
# below is only a placeholder; replace it with the actual 5' adapter if known.
FIVE_PRIME_ADAPTER="NNNNNNNNNNNN"
ERROR_RATE=0.1
MIN_LENGTH=18
QUALITY_CUTOFF=20 # assumed; the recorded eCLIP commands use --quality-cutoff 6
NUM_CORES=8 # Adjust based on available resources

cutadapt \
    -g "${FIVE_PRIME_ADAPTER}" \
    -o "${OUTPUT_FASTQ}" \
    --error-rate "${ERROR_RATE}" \
    --minimum-length "${MIN_LENGTH}" \
    --quality-cutoff "${QUALITY_CUTOFF}" \
    --cores "${NUM_CORES}" \
    "${INPUT_FASTQ}" \
    > "${REPORT_FILE}" 2>&1

Maps reads to a human-specific version of RepBase to remove repetitive elements; this helps control for spurious artifacts from rRNA and other repetitive reads.

bowtie2 (Inferred with models/gemini-2.5-flash) latest version (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install bowtie2 if not available
# conda install -c bioconda bowtie2

# --- Reference Data Setup ---
# Prepare a FASTA file of human repetitive elements (e.g., human-specific RepBase
# sequences, including rRNA repeats), matching the 'human specific version of RepBase'
# described above. RepBase is distributed by the Genetic Information Research Institute
# and must be obtained separately.
# Note: a genomic blacklist is not a substitute; it lists problem regions as coordinates
# (BED), not repeat sequences that reads can be aligned to.

# Build the bowtie2 index for the repetitive elements
# Replace 'human_repetitive_elements.fasta' with your actual reference file
bowtie2-build human_repetitive_elements.fasta human_repetitive_elements_index

# --- Filtering Reads ---
# Input FASTQ file (gzipped or unzipped). Adjust for paired-end reads if necessary.
INPUT_FASTQ="input.fastq.gz"
# Output FASTQ file containing reads that *did not* map to repetitive elements (i.e., the filtered, clean reads)
OUTPUT_FILTERED_FASTQ="filtered_reads.fastq.gz"
# Output FASTQ file containing reads that *did* map to repetitive elements (i.e., the discarded repetitive reads)
OUTPUT_REPETITIVE_FASTQ="repetitive_reads.fastq.gz"
# Bowtie2 index prefix created above
INDEX_PREFIX="human_repetitive_elements_index"

# Align reads to the repetitive elements index and keep only the unmapped reads.
# These unmapped reads are considered free of the specified repetitive elements.
# --un-gz: output unmapped reads to a gzipped file
# --al-gz: output mapped reads to a gzipped file
# -S /dev/null: discard the SAM alignments (otherwise they are written to stdout)
# -p: number of threads (adjust as needed)
# -q: input reads are FASTQ format
# -x: index prefix
# -U: single-end reads (for paired-end use -1/-2 with --un-conc-gz/--al-conc-gz)
# --very-fast-local: a preset for speed, adjust as needed for sensitivity vs. speed
bowtie2 -p 8 -q -x "${INDEX_PREFIX}" -U "${INPUT_FASTQ}" \
    -S /dev/null \
    --un-gz "${OUTPUT_FILTERED_FASTQ}" \
    --al-gz "${OUTPUT_REPETITIVE_FASTQ}" \
    --very-fast-local

Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within --outFilterMultimapNmax 30 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam

STAR (version inferred with models/gemini-2.5-flash)

$ Bash example

```
# Align adapter-trimmed read pairs to the human RepBase STAR index.
# Unmapped pairs (and pairs with only one mate mapped) are written to
# <prefix>Unmapped.out.mate1/2 by --outReadsUnmapped Fastx for genome mapping.
STAR --runMode alignReads \
  --runThreadN 16 \
  --genomeDir /path/to/RepBase_human_database_file \
  --genomeLoad LoadAndRemove \
  --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 30 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam \
  --outSAMattributes All \
  --readFilesCommand zcat \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd \
  > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
```

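To see how much the repeat filter removes, compare the number of read pairs entering the RepBase mapping with the number STAR wrote to Unmapped.out.mate1. STAR's manual describes the --outReadsUnmapped Fastx files as holding unmapped and partially mapped pairs, so the ratio approximates the fraction carried forward to genome mapping. A minimal sketch, assuming GNU gzip (gzip -cdf passes uncompressed files through unchanged); the function names are hypothetical.

```bash
# Hypothetical helpers for a quick repeat-filter sanity check.
# Count FASTQ records (4 lines each) in a plain or gzip-compressed file.
count_fastq_reads() {
  gzip -cdf "$1" | awk 'END { print NR / 4 }'
}

# Percentage of input reads that STAR wrote to the Unmapped.out file.
# $1 = reads entering the repeat mapping (e.g. the round-2 R1 FASTQ)
# $2 = <prefix>Unmapped.out.mate1 from the RepBase STAR run
percent_passing_repeat_filter() {
  total=$(count_fastq_reads "$1")
  kept=$(count_fastq_reads "$2")
  awk -v t="$total" -v k="$kept" 'BEGIN {
    if (t > 0) printf "%.1f\n", 100 * k / t
    else print "NA"
  }'
}

# Example (paths follow the recorded commands):
# percent_passing_repeat_filter \
#   /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz \
#   /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1
```
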
Takes output from STAR rmRep.

samtools v1.19

$ Bash example

# Install samtools (if not already installed)
# conda install -c bioconda samtools=1.19

# "rmRep" refers to the repeat-removal mapping against RepBase, not to PCR-duplicate
# removal (duplicates are collapsed later with barcode_collapse_pe.py). That STAR run
# writes read pairs that did not map to RepBase to <prefix>Unmapped.out.mate1 and
# <prefix>Unmapped.out.mate2 (--outReadsUnmapped Fastx); these two files are the
# input to genome mapping.
REP_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam"
MATE1="${REP_PREFIX}Unmapped.out.mate1"
MATE2="${REP_PREFIX}Unmapped.out.mate2"

# Confirm both mate files exist before starting genome mapping
ls -l "${MATE1}" "${MATE2}"

# Summarize the repeat-mapping BAM. The recorded command streams it to stdout
# (--outStd BAM_Unsorted) and redirects it to the same path used as the prefix;
# flagstat does not require a sorted BAM.
samtools flagstat "${REP_PREFIX}"

Maps the reads that passed repeat removal to the human genome, keeping only uniquely mapping reads.

STAR (Inferred with models/gemini-2.5-flash) v2.7.10a

$ Bash example

# Install STAR (e.g., using conda)
# conda install -c bioconda star

# Create a placeholder for the human genome STAR index. 
# Replace '/path/to/STAR_genome_index/human_GRCh38' with the actual path to your pre-built STAR index.
# The index can be built from a FASTA file (e.g., GRCh38 primary assembly from GENCODE or UCSC) and GTF annotation file.
# Example command to build index (run once):
# STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/STAR_genome_index/human_GRCh38 --genomeFastaFiles /path/to/human_GRCh38.fa --sjdbGTFfile /path/to/human_GRCh38.gtf --sjdbOverhang 100

# Map unique reads to the human genome
# Replace 'reads_R1.fastq.gz' and 'reads_R2.fastq.gz' with your actual input FASTQ files.
# Adjust '--runThreadN' based on available CPU cores.
# The '--outFilterMultimapNmax 1' parameter ensures only uniquely mapping reads are reported.
STAR --genomeDir /path/to/STAR_genome_index/human_GRCh38 \
     --readFilesIn reads_R1.fastq.gz reads_R2.fastq.gz \
     --runThreadN 8 \
     --outFileNamePrefix aligned_reads_ \
     --outSAMtype BAM SortedByCoordinate \
     --readFilesCommand zcat \
     --outFilterMultimapNmax 1 \
     --outFilterMismatchNmax 10 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66

Command: STAR --runMode alignReads --runThreadN 16 --genomeDir /path/to/STAR_database_file --genomeLoad LoadAndRemove --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1 /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2 --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --outSAMattributes All --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterType BySJout --outReadsUnmapped Fastx --outFilterScoreMin 10 --outSAMattrRGline ID:foo --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam

STAR v2.7.x

$ Bash example

# Install STAR (example using Conda)
# conda install -c bioconda star

# Define variables
STAR_GENOME_DIR="/path/to/your/STAR_index/GRCh38" # Example: GRCh38 human genome index
READ_FILE_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1" # Input mate 1 FASTQ file
READ_FILE_2="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2" # Input mate 2 FASTQ file
OUTPUT_PREFIX="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam" # Output file prefix (including .bam for the main output)
OUTPUT_BAM="${OUTPUT_PREFIX}" # The final BAM file will be redirected to this path

# Execute STAR alignment
STAR \
  --runMode alignReads \
  --runThreadN 16 \
  --genomeDir "${STAR_GENOME_DIR}" \
  --genomeLoad LoadAndRemove \
  --readFilesIn "${READ_FILE_1}" "${READ_FILE_2}" \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --outFilterMultimapScoreRange 1 \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMattributes All \
  --outStd BAM_Unsorted \
  --outSAMtype BAM Unsorted \
  --outFilterType BySJout \
  --outReadsUnmapped Fastx \
  --outFilterScoreMin 10 \
  --outSAMattrRGline ID:foo \
  --alignEndsType EndToEnd > "${OUTPUT_BAM}"

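Because genome mapping runs with --outFilterMultimapNmax 1, every mapped record in its output should carry NH:i:1; the NH tag, which STAR writes when --outSAMattributes All is set, counts the loci a read aligned to. A quick check, sketched under that assumption, tallies primary mapped records by NH value from SAM text, so a BAM has to be piped through samtools view first. The function name is hypothetical.

```bash
# Hypothetical check: tally primary, mapped alignments by NH tag value.
# Reads SAM text on stdin; header lines are skipped.
nh_histogram() {
  awk 'BEGIN { FS = "\t" }
    /^@/ { next }
    {
      flag = $2 + 0
      if (int(flag / 4) % 2 == 1) next      # 0x4: unmapped
      if (int(flag / 256) % 2 == 1) next    # 0x100: secondary alignment
      nh = "missing"
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) { nh = substr($i, 6); break }
      count[nh]++
    }
    END { for (k in count) print "NH=" k "\t" count[k] }' | sort
}

# Usage:
# samtools view /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam | nh_histogram
```

Any line other than NH=1 would suggest that the unique-mapping filter was not applied as expected.
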
Takes output from STAR genome mapping.

STAR v2.7.10a (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install STAR if not already installed
# conda install -c bioconda star

# Define variables
# Replace with actual paths and filenames
GENOME_DIR="/path/to/STAR_index/GRCh38" # Placeholder for STAR genome index directory (e.g., GRCh38)
READ1_FASTQ="input_R1.fastq.gz" # Placeholder for input Read 1 FASTQ file
READ2_FASTQ="input_R2.fastq.gz" # Placeholder for input Read 2 FASTQ file (remove if single-end)
OUTPUT_PREFIX="sample_name" # Prefix for output files
THREADS=8 # Number of threads to use

# Run STAR genome mapping
# This command aligns RNA-seq reads (e.g., from eCLIP) to a reference genome.
# It outputs a sorted BAM file, filters for uniquely mapping reads, and allows for a few mismatches.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outFilterMultimapNmax 1 \
     --outFilterMismatchNmax 3 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66 \
     --runThreadN "${THREADS}"

Custom random-mer-aware script for PCR duplicate removal.

dedup_umi.py from yeolab/eclip workflow (Inferred with models/gemini-2.5-flash), a Python script within the yeolab/eclip workflow

$ Bash example

# Clone the eCLIP workflow repository to get the script
# git clone https://github.com/yeolab/eclip.git
# cd eclip/tools

# Install dependencies (e.g., pysam)
# pip install pysam

# Define input and output file paths
INPUT_BAM="aligned_reads_with_umis.bam" # Placeholder for your aligned BAM file with UMIs in read names
OUTPUT_DEDUP_BAM="deduplicated_reads.bam"

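# Execute the custom random-mer-aware script for PCR duplicate removal.
# The script name and its -i/-o options are inferred, not verified; the recorded
# command for this step uses barcode_collapse_pe.py instead.
# This script expects UMIs to be in the read names (e.g., @read_id:UMI)
python dedup_umi.py -i "${INPUT_BAM}" -o "${OUTPUT_DEDUP_BAM}"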

Command: barcode_collapse_pe.py --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics

barcode_collapse_pe.py (not explicitly versioned; part of the yeolab/eclip pipeline)

$ Bash example

# Install Miniconda or Anaconda if not already installed
# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# bash miniconda.sh -b -p $HOME/miniconda
# rm miniconda.sh
# export PATH="$HOME/miniconda/bin:$PATH"

# Create a conda environment for the eCLIP pipeline dependencies
# conda create -n eclip_env python=3.8 pysam numpy pandas -y
# conda activate eclip_env

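# Clone the eCLIP pipeline repository to get the script
# git clone https://github.com/yeolab/eclip.git
# Set SCRIPT_PATH to where barcode_collapse_pe.py lives in your checkout
# (placeholder path; the location within the repository is not given in the source)
SCRIPT_PATH="/path/to/barcode_collapse_pe.py"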

# Define input and output file paths
INPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam"
OUTPUT_BAM="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
METRICS_FILE="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics"

# Execute the command (assuming SCRIPT_PATH is defined after cloning the repo)
python "${SCRIPT_PATH}" \
    --bam "${INPUT_BAM}" \
    --out_file "${OUTPUT_BAM}" \
    --metrics_file "${METRICS_FILE}"

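The idea behind random-mer-aware duplicate removal is that two reads count as PCR copies only if they share both a mapping position and the random-mer carried in the adapter; reads at the same position with different random-mers are kept as independent molecules. The awk sketch below applies that rule to single reads in SAM text. It is not barcode_collapse_pe.py: it assumes, hypothetically, that the random-mer is the first colon-separated field of the read name, it ignores mate pairing, and it keeps the first read seen for each (reference, strand, 5' position, random-mer) key.

```bash
# Hypothetical single-end sketch of random-mer-aware PCR duplicate removal.
# Reads SAM text on stdin; writes header lines and one record per
# (reference, strand, 5-prime position, random-mer) key to stdout.
umi_dedup() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }
    {
      flag = $2 + 0
      if (int(flag / 4) % 2 == 1) next               # drop unmapped reads (0x4)
      split($1, name_parts, ":"); umi = name_parts[1] # assumed random-mer location
      reverse = int(flag / 16) % 2                    # 0x10: read on reverse strand
      five_prime = $4 + 0
      if (reverse) {
        # 5-prime end of a reverse-strand read = POS + reference span - 1.
        # CIGAR operations M, D, N, = and X consume reference bases;
        # soft-clipped bases are not counted.
        span = 0; cigar = $6
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
          op = substr(cigar, RLENGTH, 1)
          if (op ~ /[MDN=X]/) span += substr(cigar, 1, RLENGTH - 1) + 0
          cigar = substr(cigar, RLENGTH + 1)
        }
        five_prime = $4 + span - 1
      }
      key = $3 SUBSEP reverse SUBSEP five_prime SUBSEP umi
      if (!(key in seen)) { seen[key] = 1; print }
    }'
}

# Usage with a BAM (re-encode the output through samtools):
# samtools view -h in.bam | umi_dedup | samtools view -b -o dedup.bam -
```
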
Takes output from barcode collapse PE.

cutadapt (Inferred with models/gemini-2.5-flash) v4.0

$ Bash example

# Install cutadapt (example using conda)
# conda install -c bioconda cutadapt=4.0

# Define input and output files (assuming paired-end reads from barcode collapse)
INPUT_R1="collapsed_reads_R1.fastq.gz"
INPUT_R2="collapsed_reads_R2.fastq.gz"
OUTPUT_R1="trimmed_reads_R1.fastq.gz"
OUTPUT_R2="trimmed_reads_R2.fastq.gz"
REPORT="cutadapt_report.txt"

# Define adapter sequences (Illumina TruSeq read-1/read-2 adapters; placeholders)
# -a: 3' adapter for R1 reads
# -A: 3' adapter for R2 reads
ADAPTER_3_PRIME_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_3_PRIME_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Execute cutadapt for paired-end adapter trimming and quality filtering
# --minimum-length: Discard reads shorter than 18 bp after trimming
# --quality-cutoff: Trim low-quality bases from the 3' end using a quality score cutoff of 20
cutadapt \
  -a "${ADAPTER_3_PRIME_R1}" \
  -A "${ADAPTER_3_PRIME_R2}" \
  --minimum-length 18 \
  --quality-cutoff 20 \
  -o "${OUTPUT_R1}" \
  -p "${OUTPUT_R2}" \
  "${INPUT_R1}" \
  "${INPUT_R2}" \
  > "${REPORT}" 2>&1

Sorts resulting bam file for use downstream.

samtools (Inferred with models/gemini-2.5-flash) v1.19

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# Sort the BAM file. Replace 'input.bam' with your actual input file and 'output.bam' with your desired output file name.
samtools sort -o output.bam input.bam

# Index the sorted BAM file for downstream use (e.g., visualization, variant calling)
samtools index output.bam

Command: java -Xmx2048m -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Djava.io.tmpdir=/full/path/to/files/.queue/tmp -cp /path/to/gatk/dist/Queue.jar net.sf.picard.sam.SortSam INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam TMP_DIR=/full/path/to/files/.queue/tmp OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam VALIDATION_STRINGENCY=SILENT SO=coordinate CREATE_INDEX=true

Picard (version inferred with models/gemini-2.5-flash)

$ Bash example

# Install Picard (often bundled with GATK or available standalone)
# conda create -n picard_env picard -c bioconda -c conda-forge
# conda activate picard_env

# Define variables
# The command uses Queue.jar from GATK's distribution. This JAR is typically part of GATK 3.x
# and contains the necessary Picard classes or acts as an entry point for them.
PICARD_JAR="/path/to/gatk/dist/Queue.jar" # Adjust this path as needed
DATA_DIR="/full/path/to/files" # Adjust this path as needed
INPUT_BAM="${DATA_DIR}/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam"
OUTPUT_BAM="${DATA_DIR}/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
TMP_DIR="${DATA_DIR}/.queue/tmp"

# Create temporary directory if it doesn't exist
mkdir -p "${TMP_DIR}"

# Execute Picard SortSam
java -Xmx2048m \
     -XX:+UseParallelOldGC \
     -XX:ParallelGCThreads=4 \
     -XX:GCTimeLimit=50 \
     -XX:GCHeapFreeLimit=10 \
     -Djava.io.tmpdir="${TMP_DIR}" \
     -cp "${PICARD_JAR}" \
     net.sf.picard.sam.SortSam \
     INPUT="${INPUT_BAM}" \
     TMP_DIR="${TMP_DIR}" \
     OUTPUT="${OUTPUT_BAM}" \
     VALIDATION_STRINGENCY=SILENT \
     SO=coordinate \
     CREATE_INDEX=true

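samtools index, used in the following steps, needs a coordinate-sorted BAM. In practice that means each reference sequence forms one contiguous block of records and POS never decreases within a block. The minimal checker below tests exactly those two conditions on SAM text; it does not verify that references follow the @SQ order in the header, so treat it as a quick sanity check rather than a validator. The function name is hypothetical.

```bash
# Hypothetical check: exit 0 if the SAM records on stdin look coordinate-sorted.
check_coordinate_sorted() {
  awk 'BEGIN { FS = "\t" }
    /^@/ { next }
    {
      if ($3 != ref) {
        if ($3 in seen) { bad = 1; exit }   # reference reappears after another one
        seen[$3] = 1; ref = $3; last = -1
      }
      if ($4 + 0 < last) { bad = 1; exit } # position went backwards
      last = $4 + 0
    }
    END { exit (bad ? 1 : 0) }'
}

# Usage:
# samtools view /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam \
#   | check_coordinate_sorted && echo "looks coordinate-sorted"
```
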
Takes output from sortSam, makes bam index for use downstream.

samtools (Inferred with models/gemini-2.5-flash) v1.19 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools=1.19

# Assume the sorted BAM file from sortSam is named 'input_sorted.bam'
# This command creates an index file (e.g., 'input_sorted.bam.bai') in the same directory.
samtools index input_sorted.bam

Command: samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai

samtools v1.19

$ Bash example

# Install samtools (if not already installed)
# conda install -c bioconda samtools

samtools index /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai

Takes inputs from multiple final bam files.

samtools merge (Inferred with models/gemini-2.5-flash) v1.19

$ Bash example

# Install samtools if not already available
# conda install -c bioconda samtools

# Example: Merge multiple BAM files for a single sample (e.g., technical replicates or lanes)
# This command takes multiple input BAM files and merges them into a single output BAM file.
# Replace input_file_1.bam, input_file_2.bam, etc., with your actual BAM file paths.
# Replace combined_output.bam with your desired output file name.
samtools merge -o combined_output.bam input_file_1.bam input_file_2.bam input_file_3.bam

Merges the two technical replicates for further downstream analysis.

samtools (Inferred with models/gemini-2.5-flash) v1.18

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# Define input and output file names (example placeholders)
INPUT_REPLICATE_1="sample_replicate1.bam"
INPUT_REPLICATE_2="sample_replicate2.bam"
OUTPUT_MERGED_BAM="sample_merged_replicates.bam"

# Merge the two technical replicates BAM files
samtools merge "${OUTPUT_MERGED_BAM}" "${INPUT_REPLICATE_1}" "${INPUT_REPLICATE_2}"

Command: samtools merge /full/path/to/files/CombinedID.merged.bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam

samtools v1.10 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# Define input and output files
OUTPUT_BAM="/full/path/to/files/CombinedID.merged.bam"
INPUT_BAM_1="/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"
INPUT_BAM_2="/full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam"

# Merge multiple sorted BAM files into a single sorted BAM file
samtools merge "${OUTPUT_BAM}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"

Takes output from sortSam, makes bam index for use downstream.

samtools index (Inferred with models/gemini-2.5-flash) v1.19 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools=1.19

# Assume the output from sortSam is a sorted BAM file named 'input.sorted.bam'
# This command creates an index file 'input.sorted.bam.bai' in the same directory.
samtools index input.sorted.bam

Command: samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai

samtools v1.x

$ Bash example

# Install samtools (if not already installed)
# conda install -c bioconda samtools

samtools index /full/path/to/files/CombinedID.merged.bam /full/path/to/files/CombinedID.merged.bam.bai

Takes output from sortSam.

samtools (Inferred with models/gemini-2.5-flash) v1.10 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# This command takes a sorted BAM file (output from sortSam) and creates an index file (.bai).
# The index file is essential for quickly accessing regions of the BAM file, 
# which is required by many downstream tools (e.g., genome browsers, variant callers).
# Example: Assuming 'sorted.bam' is the output from sortSam
samtools index sorted.bam

Only outputs the second read in each pair, for use with a single-stranded peak caller.

samtools (Inferred with models/gemini-2.5-flash) v1.19

$ Bash example

# Install samtools (example using conda)
# conda install -c bioconda samtools=1.19

# Keep only the second read of each pair (FLAG bit 0x80 = 128) and write BAM, not FASTQ:
# the peak caller needs the alignments, which a FASTQ conversion would discard.
# -h: keep the header; -b: write BAM output.
# Replace input.bam with your aligned BAM file and output_R2.bam with the desired output name.
samtools view -hb -f 0x80 input.bam > output_R2.bam

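The `-f 0x80` (equivalently `-f 128`) filter keeps a record only when bit 0x80 ("second in pair") is set in the SAM FLAG field. The sketch below shows that bit test in plain shell arithmetic. `is_read2` is a hypothetical helper. The flag values 99, 147, 83 and 163 are common paired-end FLAG combinations chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: status 0 if the SAM FLAG has bit 0x80 (128, "second in pair") set.
is_read2() {
    [ $(( $1 & 128 )) -ne 0 ]
}

# Decode a few common paired-end FLAG values.
for flag in 99 147 83 163; do
    if is_read2 "$flag"; then
        echo "$flag: read 2 (kept by samtools view -f 128)"
    else
        echo "$flag: not read 2 (dropped)"
    fi
done
```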

This is the final BAM file on which analysis is performed.

samtools (Inferred with models/gemini-2.5-flash) v1.19 GitHub

$ Bash example

# Install samtools if not already available
# conda install -c bioconda samtools

# Sort the BAM file by coordinate. This is a common step to prepare a "final" BAM for analysis.
# Replace 'input.bam' with the actual aligned BAM file name.
samtools sort -o final.bam input.bam

# Index the sorted BAM file. An index (.bai) file is crucial for many downstream tools and visualization.
samtools index final.bam

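`samtools sort` orders records by reference and then by leftmost position. `samtools view` streams records in input order, so a subset taken from a sorted BAM (such as the read-2 extraction in this pipeline) stays sorted. The sketch below checks that property on SAM text, for example the output of `samtools view final.bam`. `check_sorted` is a hypothetical awk-based helper. It only tests that each reference forms one contiguous block and that POS never decreases within a block.

```shell
#!/bin/sh
# Hypothetical helper: read SAM text on stdin; status 0 if records look
# coordinate-sorted, otherwise print the first problem and return 1.
check_sorted() {
    awk -F '\t' '
        /^@/ { next }                                   # skip header lines
        {
            if ($3 != prev_ref) {                       # RNAME changed
                if ($3 in seen) { print "reference reappears at line " NR ": " $3; bad = 1; exit }
                seen[$3] = 1; prev_ref = $3; prev_pos = 0
            }
            if ($4 + 0 < prev_pos) { print "position decreases at line " NR; bad = 1; exit }
            prev_pos = $4 + 0
        }
        END { exit bad }
    '
}

# Toy records (only the first four SAM columns, for illustration).
printf 'r1\t0\tchr1\t100\nr2\t0\tchr1\t250\nr3\t0\tchr2\t50\n' | check_sorted && echo "looks coordinate-sorted"
```

With real data (requires samtools): `samtools view final.bam | check_sorted`.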

Command: samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam

samtools v1.9 GitHub

$ Bash example

# Install samtools (e.g., using conda)
# conda install -c bioconda samtools=1.9

# Extract reads that are the second in a pair (R2) from a merged BAM file
# -h: Include header in the output
# -b: Output in BAM format
# -f 128: Select reads where the FLAG has the 0x80 bit set (read is the second in a pair)
samtools view -hb -f 128 /full/path/to/files/CombinedID.merged.bam > /full/path/to/files/CombinedID.merged.r2.bam

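The same `-h -f 128` selection can be expressed over SAM text. Header lines pass through, and an alignment line is kept only if bit 128 of column 2 (FLAG) is set. The awk sketch below is for illustration only. `keep_read2` is hypothetical, and on real BAM files the `samtools view` command above is what should be run.

```shell
#!/bin/sh
# Hypothetical helper: filter SAM text on stdin the way 'samtools view -h -f 128'
# filters a BAM: keep every header line and records whose FLAG has bit 128 set.
keep_read2() {
    awk -F '\t' '
        /^@/ { print; next }                  # header line: always kept (-h)
        int($2 / 128) % 2 == 1 { print }      # FLAG bit with value 128 is set (-f 128)
    '
}

# Toy SAM: a header line plus both mates of one pair (columns truncated).
printf '@HD\tVN:1.6\tSO:coordinate\nfrag1\t99\tchr1\t100\nfrag1\t147\tchr1\t180\n' | keep_read2
```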

Takes results from samtools view.

samtools v1.10 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install samtools (e.g., via conda)
# conda install -c bioconda samtools=1.10

# Example: Convert a sorted BAM file to SAM format
# This command takes an input BAM file and outputs its content in SAM format to standard output.
# The -h flag includes the header.
# Replace 'input.bam' with your actual input file.
# The output can be redirected to a file (e.g., > output.sam) or piped to another command.
samtools view -h input.bam > output.sam

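In SAM text, column 2 (FLAG) also carries strand: bit 16 set means the read aligned to the reverse strand. This matters for strand-aware analysis of the read-2 BAM. The sketch below tallies alignments per reference (column 3) and strand from `samtools view` output. `strand_counts` is hypothetical and only illustrates the SAM column layout.

```shell
#!/bin/sh
# Hypothetical helper: count alignments per RNAME (column 3) and strand,
# using FLAG (column 2): bit 16 set = "-", otherwise "+".
# Header lines and unmapped records (FLAG bit 4) are skipped.
strand_counts() {
    awk -F '\t' '
        /^@/ { next }
        int($2 / 4) % 2 == 1 { next }                          # bit 4: unmapped
        { strand = (int($2 / 16) % 2 == 1) ? "-" : "+"; n[$3 "\t" strand]++ }
        END { for (k in n) print k "\t" n[k] }
    ' | sort
}

# Toy SAM records (columns truncated); with real data use:
#   samtools view -h input.bam | strand_counts
printf '@HD\tVN:1.6\nr1\t0\tchr1\t10\nr2\t16\tchr1\t20\nr3\t16\tchr1\t30\nr4\t4\t*\t0\n' | strand_counts
```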

Calls peaks on those files.

clipper (Inferred with models/gemini-2.5-flash) latest GitHub

$ Bash example

# Install Miniconda if not already installed
# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
# export PATH="$HOME/miniconda/bin:$PATH"
# conda init bash
# source ~/.bashrc

# Create a conda environment for clipper and its dependencies
# conda create -n clipper_env python=3.8 numpy scipy pysam -y
# conda activate clipper_env

# Install clipper from the cloned repository (provides the 'clipper' command)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install .

# Define input files and parameters (placeholders - replace with actual paths/values)
IP_BAM="path/to/your/ip_sample.r2.bam"   # read-2-only IP BAM from the previous step
GENOME="hg19"                            # -s takes a genome/species identifier; this dataset uses hg19
OUTPUT_BED="eclip_peaks.bed"

# Call clusters on the IP BAM with the options recorded in this pipeline.
# The recorded command passes no control BAM; normalization against the
# size-matched input is a separate step not shown in this command.
clipper \
    -b "${IP_BAM}" \
    -s "${GENOME}" \
    -o "${OUTPUT_BED}" \
    --bonferroni \
    --superlocal \
    --threshold-method binomial \
    --save-pickle

# Peaks are written to the path given by -o (here eclip_peaks.bed)


Command: clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle

CLIPper v0.0.1 (Inferred from setup.py) GitHub

$ Bash example

# Install CLIPper (example using conda)
# conda create -n clipper_env python=3.8
# conda activate clipper_env
# pip install git+https://github.com/yeolab/clipper.git

# Execute CLIPper command
clipper -b /full/path/to/files/CombinedID.merged.r2.bam -s hg19 -o /full/path/to/files/CombinedID.merged.r2.peaks.bed --bonferroni --superlocal --threshold-method binomial --save-pickle


Tools Used

STAR

Raw Source Text

Takes output from raw files.  Run to trim off both 5' and 3' adapters on both reads. Command: quality-cutoff 6  -m 18  -a NNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC  -g CTTCCGATCTACAAGTT -g CTTCCGATCTTGGTCCT  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz  /full/path/to/files/file_R1.C01.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.metrics
Takes output from cutadapt round 1. Run to trim off the 3' adapters on read 2, to control for double ligation events. Command: cutadapt -f fastq --match-read-wildcards  --times 1  -e 0.1  -O 5  --quality-cutoff 6  -m 18  -A AACTTGTAGATCGGA -A AGGACCAAGATCGGA -A ACTTGTAGATCGGAA -A GGACCAAGATCGGAA -A CTTGTAGATCGGAAG -A GACCAAGATCGGAAG -A TTGTAGATCGGAAGA -A ACCAAGATCGGAAGA -A TGTAGATCGGAAGAG -A CCAAGATCGGAAGAG -A GTAGATCGGAAGAGC -A CAAGATCGGAAGAGC -A TAGATCGGAAGAGCG -A AAGATCGGAAGAGCG -A AGATCGGAAGAGCGT -A GATCGGAAGAGCGTC -A ATCGGAAGAGCGTCG -A TCGGAAGAGCGTCGT -A CGGAAGAGCGTCGTG -A GGAAGAGCGTCGTGT  -o /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz  -p /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.fastq.gz  /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.fastq.gz > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.metrics
Takes output from cutadapt round 2.  Maps to human specific version of RepBase used to remove repetitive elements, helps control for spurious artifacts from rRNA (& other) repetitive reads.  Command: STAR  --runMode alignReads  --runThreadN 16  --genomeDir /path/to/RepBase_human_database_file --genomeLoad LoadAndRemove  --readFilesIn /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.fastq.gz /full/path/to/files/file_R2.C01.fastq.gz.adapterTrim.round2.fastq.gz --outSAMunmapped Within  --outFilterMultimapNmax 30  --outFilterMultimapScoreRange 1  --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam --outSAMattributes All  --readFilesCommand zcat  --outStd BAM_Unsorted  --outSAMtype BAM Unsorted  --outFilterType BySJout  --outReadsUnmapped Fastx  --outFilterScoreMin 10  --outSAMattrRGline ID:foo  --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bam
Takes output from STAR rmRep.  Maps unique reads to the human genome.  Command: STAR  --runMode alignReads  --runThreadN 16  --genomeDir  /path/to/STAR_database_file --genomeLoad LoadAndRemove  --readFilesIn  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate1  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rep.bamUnmapped.out.mate2  --outSAMunmapped Within  --outFilterMultimapNmax 1  --outFilterMultimapScoreRange 1  --outFileNamePrefix /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam  --outSAMattributes All  --outStd BAM_Unsorted  --outSAMtype BAM Unsorted  --outFilterType BySJout  --outReadsUnmapped Fastx  --outFilterScoreMin 10  --outSAMattrRGline ID:foo  --alignEndsType EndToEnd > /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam
Takes output from STAR genome mapping.  Custom random-mer-aware script for PCR duplicate removal. Command: barcode_collapse_pe.py  --bam /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.bam  --out_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam  --metrics_file /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.metrics
Takes output from barcode collapse PE.  Sorts resulting bam file for use downstream.  Command: java  -Xmx2048m  -XX:+UseParallelOldGC  -XX:ParallelGCThreads=4  -XX:GCTimeLimit=50  -XX:GCHeapFreeLimit=10  -Djava.io.tmpdir=/full/path/to/files/.queue/tmp  -cp /path/to/gatk/dist/Queue.jar  net.sf.picard.sam.SortSam  INPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.bam  TMP_DIR=/full/path/to/files/.queue/tmp  OUTPUT=/full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam  VALIDATION_STRINGENCY=SILENT  SO=coordinate  CREATE_INDEX=true
Takes output from sortSam, makes bam index for use downstream.  Command: samtools index  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam.bai
Takes inputs from multiple final bam files.  Merges the two technical replicates for further downstream analysis.  Command: samtools  merge  /full/path/to/files/CombinedID.merged.bam  /full/path/to/files/file_R1.C01.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam /full/path/to/files/file_R1.D08.fastq.gz.adapterTrim.round2.rmRep.rmDup.sorted.bam
Takes output from sortSam, makes bam index for use downstream.  Command: samtools  index  /full/path/to/files/CombinedID.merged.bam  /full/path/to/files/CombinedID.merged.bam.bai
Takes output from sortSam.  Only outputs the second read in each pair for use with single stranded peak caller.  This is the final bam file to perform analysis on.  Command: samtools view -hb -f 128  /full/path/to/files/CombinedID.merged.bam  >  /full/path/to/files/CombinedID.merged.r2.bam
Takes results from samtools view.  Calls peaks on those files.  Command: clipper  -b /full/path/to/files/CombinedID.merged.r2.bam  -s hg19  -o /full/path/to/files/CombinedID.merged.r2.peaks.bed  --bonferroni  --superlocal  --threshold-method binomial  --save-pickle
Genome_build: hg19
Supplementary_files_format_and_content: bigWig, bigBed, bed (col1: chrom, col2: chromStart, col3: chromEnd, col4: -log10 pvalue, col5: log2 fold enrichment above input, col6: strand) format, contains clusters of predicted RBP binding
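The supplementary BED files described above store the -log10 p-value in column 4 and the log2 fold enrichment over input in column 5. Significance cutoffs can therefore be applied with a simple column filter. In the sketch below, the cutoffs (-log10 p >= 3, log2 fold enrichment >= 3) are example values, not thresholds stated in this record. `filter_peaks` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: keep clusters whose -log10 p-value (col 4) and
# log2 fold enrichment over input (col 5) both meet the given cutoffs.
# Assumed column layout (from this record): chrom, start, end, -log10 p, log2 FC, strand.
filter_peaks() {
    min_log10p=$1   # e.g. 3 -> p <= 0.001 (example value, an assumption)
    min_log2fc=$2   # e.g. 3 -> >= 8-fold over input (example value, an assumption)
    awk -F '\t' -v p="$min_log10p" -v fc="$min_log2fc" \
        '$4 + 0 >= p + 0 && $5 + 0 >= fc + 0'
}

# Toy BED rows (coordinates and scores are made up for illustration).
printf 'chr1\t100\t150\t5.2\t3.5\t+\nchr1\t400\t460\t2.1\t4.0\t-\nchr2\t10\t60\t8.0\t1.2\t+\n' \
    | filter_peaks 3 3
```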
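The `-A` lists in both cutadapt commands appear to contain every 15-nt window of two 27-nt sequences. These sequences share the tail `AGATCGGAAGAGCGTCGTGT` and differ only in their first seven bases (`AACTTGT` vs `AGGACCA`). This is an observation from the listed 15-mers, not something the record states. The two parent sequences below are reconstructed by overlapping those 15-mers. `windows` is a hypothetical helper that regenerates the list in the same interleaved order.

```shell
#!/bin/sh
# Hypothetical helper: print cutadapt "-A <seq>" arguments for every K-nt window
# of the given sequences, interleaving the sequences window by window and
# skipping windows that were already printed.
windows() {
    k=$1; shift
    printf '%s\n' "$@" | awk -v k="$k" '
        { seq[NR] = $0; if (length($0) > maxlen) maxlen = length($0) }
        END {
            for (i = 1; i + k - 1 <= maxlen; i++)
                for (s = 1; s <= NR; s++) {
                    if (i + k - 1 > length(seq[s])) continue
                    w = substr(seq[s], i, k)
                    if (!(w in seen)) { seen[w] = 1; printf "-A %s\n", w }
                }
        }'
}

# Parent sequences reconstructed from the listed 15-mers (an assumption).
windows 15 AACTTGTAGATCGGAAGAGCGTCGTGT AGGACCAAGATCGGAAGAGCGTCGTGT
```

The 20 lines printed match the `-A` arguments above, in the same order. The output could be spliced into a command line with `$(windows 15 ...)`.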
