GSE86040 Processing Pipeline

RIP-Seq code_examples 6 steps

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_eCLIP_h…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Reads were demultiplexed using custom scripts and the randomer was appended to the read name.

Custom Script vN/A (Inferred with models/gemini-2.5-flash)

$ Bash example

# This example assumes a hypothetical custom Python script 'eclip_demultiplex_and_randomer_append.py'
# that demultiplexes reads by barcode and appends each read's randomer sequence to its read name.
# The randomer length below (6) is a placeholder; set it to the randomer length of your adapter.

# Installation (example for a Python script with Biopython dependency):
# conda create -n eclip_demux python=3.8
# conda activate eclip_demux
# pip install biopython

# Create a dummy barcode file (replace with actual barcodes and sample names)
# echo -e "AAAA\tsample1\nTTTT\tsample2" > barcodes.tsv

# Execute the custom script
python eclip_demultiplex_and_randomer_append.py \
  --input_fastq "raw_reads.fastq.gz" \
  --barcode_file "barcodes.tsv" \
  --randomer_length 6 \
  --output_dir "demultiplexed_reads"

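The randomer-appending part of this step can be sketched with awk. This is a hypothetical helper, not the authors' custom script. It assumes single-end, four-line FASTQ records with the randomer in the first N bases of the read; where the randomer actually sits depends on the adapter design. The `RANDOMER:original_name` header format is an illustrative choice.

```shell
# Hypothetical helper: move the first N bases of each read (assumed to be the randomer)
# into the read name, trimming the sequence and quality strings to match.
# Assumes single-end FASTQ with 4 lines per record on stdin; writes FASTQ to stdout.
# The "@RANDOMER:original_name" header format is an illustrative choice.
append_randomer_to_name() {
  local n="$1"   # randomer length
  awk -v n="$n" '
    NR % 4 == 1 { name = substr($0, 2) }                        # header without the leading @
    NR % 4 == 2 { umi = substr($0, 1, n); seq = substr($0, n + 1) }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      print "@" umi ":" name
      print seq
      print plus
      print substr($0, n + 1)                                   # quality trimmed like the sequence
    }'
}
```

For example, `zcat raw_reads.fastq.gz | append_randomer_to_name 6 | gzip > reads.randomer.fastq.gz`, where 6 is a placeholder length. Storing the randomer in the read name lets it pass through trimming and mapping unchanged, so it is still available at the duplicate-removal step.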

Reads were trimmed, filtered for repetitive elements, and mapped to human genome assembly hg19 as described in iCLIP computational analysis.

iCLIP v4.0, 2.7.10a (Inferred with models/gemini-2.5-flash)

$ Bash example

# Input FASTQ file
READS="input.fastq.gz"
OUTPUT_PREFIX="sample"

# Reference paths (placeholders - replace with actual paths)
# Download hg19 genome and annotation files from UCSC or Ensembl.
# Build STAR index for hg19:
# STAR --runThreadN <threads> --runMode genomeGenerate --genomeDir /path/to/hg19_STAR_index --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf
HG19_STAR_INDEX="/path/to/hg19_STAR_index"

# Create a repeatome FASTA file (e.g., combining rRNA, tRNA, snRNA, snoRNA sequences)
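# and build a STAR index for it.
# Example repeatome FASTA: hg19_rRNA_tRNA_snRNA_snoRNA_ensembl_ERCC.fa
# For a small reference like this, the STAR manual recommends scaling --genomeSAindexNbases
# down to min(14, log2(GenomeLength)/2 - 1).
# STAR --runThreadN <threads> --runMode genomeGenerate --genomeDir /path/to/repeatome_STAR_index --genomeFastaFiles /path/to/repeatome.fa --genomeSAindexNbases <N>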
REPEATOME_STAR_INDEX="/path/to/repeatome_STAR_index"

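# 1. Trimming with cutadapt
# The adapter below is the Illumina TruSeq 3' adapter, shown only as a placeholder;
# use the adapter sequences from your library design.
# This example is single-end for brevity. The source describes paired-end reads (read1/read2);
# for those, cutadapt takes -A for the read 2 adapter and -p for the read 2 output.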
# -a: 3' adapter sequence
# -q 20: Trim low-quality ends (Phred score < 20)
# -m 15: Discard reads shorter than 15 bp after trimming
# conda install -c bioconda cutadapt
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20 -m 15 -o "${OUTPUT_PREFIX}_trimmed.fastq.gz" "$READS"

# 2. Filtering for repetitive elements (e.g., rRNA, tRNA, snRNA, snoRNA) using STAR
# Align reads to a repeatome index and keep unmapped reads.
# --outReadsUnmapped Fastx: Output unmapped reads to a FASTQ file.
# conda install -c bioconda star
STAR --runThreadN 8 \
     --genomeDir "$REPEATOME_STAR_INDEX" \
     --readFilesIn "${OUTPUT_PREFIX}_trimmed.fastq.gz" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_repeatome_" \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 3 \
     --outFilterScoreMinOverLread 0 \
     --outFilterMatchNminOverLread 0 \
     --outFilterType BySJout \
     --outSAMattributes All \
     --outSAMtype BAM Unsorted \
     --outReadsUnmapped Fastx \
     --outStd Log \
     --readFilesCommand zcat

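# The unmapped reads from the repeatome alignment are the filtered reads. STAR writes them as
# uncompressed FASTQ, so compress them for the next step, which reads its input with zcat.
gzip -c "${OUTPUT_PREFIX}_repeatome_Unmapped.out.mate1" > "${OUTPUT_PREFIX}_filtered_for_repeats.fastq.gz"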

# 3. Mapping to human genome assembly hg19 with STAR
# --outSAMtype BAM SortedByCoordinate: Output sorted BAM file.
STAR --runThreadN 8 \
     --genomeDir "$HG19_STAR_INDEX" \
     --readFilesIn "${OUTPUT_PREFIX}_filtered_for_repeats.fastq.gz" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_hg19_" \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 3 \
     --outFilterScoreMinOverLread 0 \
     --outFilterMatchNminOverLread 0 \
     --outFilterType BySJout \
     --outSAMattributes All \
     --outSAMtype BAM SortedByCoordinate \
     --outReadsUnmapped Fastx \
     --outStd Log \
     --readFilesCommand zcat

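One quick way to check how many reads trimming and repeat filtering remove is to count FASTQ records before and after each stage. The following is a minimal sketch that assumes standard four-line FASTQ records. `count_fastq_reads` is a hypothetical helper name.

```shell
# Hypothetical helper: count reads in a FASTQ file, plain or gzip-compressed,
# assuming 4 lines per record.
count_fastq_reads() {
  local f="$1" lines
  case "$f" in
    *.gz) lines=$(gzip -cd -- "$f" | wc -l) ;;
    *)    lines=$(wc -l < "$f") ;;
  esac
  echo $(( lines / 4 ))
}
```

For example, `count_fastq_reads sample_trimmed.fastq.gz` reports the reads left after cutadapt when `OUTPUT_PREFIX="sample"`.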

PCR duplicate reads were removed based on the start positions of read1 and read2 and the sequence of the randomer. eCLIP peaks were identified using CLIPper with parameters -s hg19 -o --bonferroni --superlocal --threshold-method binomial --save-pickle (Lovci et al., NSMB, 2013).

CLIPper vN/A (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install CLIPper (example; adjust as needed), preferably in a dedicated conda environment:
# conda create -n clipper_env python=3.8
# conda activate clipper_env
# pip install git+https://github.com/yeolab/clipper.git

# Input BAM files are assumed to be PCR-deduplicated IP (CLIP) alignments; replace with your own paths.
# CLIPper calls peaks on IP reads alone; the size-matched input is used in the later
# fold-enrichment and significance steps.
CLIP_BAM_FILES="clip_sample1.dedup.bam clip_sample2.dedup.bam"
OUTPUT_DIR="clipper_peaks_hg19"

# Create the output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Parameters from the description: -s hg19 -o <output BED> --bonferroni --superlocal
# --threshold-method binomial --save-pickle. -b takes a single BAM, so run once per sample.
# Check the flags against `clipper --help` for your installed version.
for bam in ${CLIP_BAM_FILES}; do
  clipper \
    -b "${bam}" \
    -s hg19 \
    -o "${OUTPUT_DIR}/$(basename "${bam}" .bam).peaks.bed" \
    --bonferroni \
    --superlocal \
    --threshold-method binomial \
    --save-pickle
done
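The PCR duplicate-removal rule described with the CLIPper step (same read1 start, read2 start, and randomer) can be sketched in one line of awk. This is a hypothetical helper, not the authors' script. It assumes those fields have already been extracted into a tab-separated table with one line per read pair. It also keys on chromosome and strand, which the description does not mention explicitly.

```shell
# Hypothetical helper: collapse PCR duplicates among read pairs.
# stdin: tab-separated "read_id chrom strand read1_start read2_start randomer", one line per pair.
# Pairs are duplicates when chrom, strand, both start positions, and the randomer all match;
# keying on chrom and strand is an assumption beyond the source description.
# stdout: read_id of the first pair seen for each unique key.
dedup_read_pairs() {
  awk -F'\t' 'NF >= 6 && !seen[$2, $3, $4, $5, $6]++ { print $1 }'
}
```

Dedicated tools such as UMI-tools (`umi_tools dedup`) apply position-plus-randomer deduplication directly to BAM files. Whether their rules match the authors' custom scripts exactly would need checking.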

Peak strength was then normalized against a size-matched input by calculating the fold enrichment of the number of reads in IP versus the number of reads in the size-matched input.

normalize_bedgraph.py (Inferred with models/gemini-2.5-flash) vFrom yeolab/eclip workflow (CWL version, before 2021)

$ Bash example

# Assuming ip_signal.bedgraph and input_signal.bedgraph are generated from previous steps
# (e.g., from bedtools genomecov or similar signal generation tools, potentially after library size normalization).

# The script path and flags below are inferred and have not been checked against the
# yeolab/eclip repository; confirm the actual script name and options there before use.
# git clone https://github.com/yeolab/eclip.git
# cd eclip

# Execute the normalization script to calculate fold enrichment.
# -i: Input IP bedGraph file containing read counts or signal.
# -c: Input control (size-matched input) bedGraph file containing read counts or signal.
# -o: Output bedGraph file for fold enrichment.
# --method fold_enrichment: Specifies the normalization method to calculate fold enrichment.
python scripts/normalize_bedgraph.py \
    -i ip_signal.bedgraph \
    -c input_signal.bedgraph \
    -o ip_fold_enrichment.bedgraph \
    --method fold_enrichment

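The fold-enrichment calculation can also be written out for a table of per-peak read counts. This sketch makes two assumptions that the description does not state. First, it normalizes for library size by dividing each count by its library's total usable reads. Second, it substitutes a pseudocount of 1 input read when a peak has none. `peak_fold_enrichment` is a hypothetical helper.

```shell
# Hypothetical helper: library-size-normalized fold enrichment for each peak.
# stdin: tab-separated "peak_id ip_reads input_reads" (no header).
# $1 = total usable IP reads, $2 = total usable input reads.
# FE = (ip_reads / ip_total) / (input_reads / input_total)
# Assumption: a peak with 0 input reads is given a pseudocount of 1 input read.
peak_fold_enrichment() {
  local ip_total="$1" input_total="$2"
  awk -F'\t' -v ipt="$ip_total" -v inpt="$input_total" '
    NF >= 3 {
      inp = ($3 > 0) ? $3 : 1          # pseudocount for peaks with no input reads
      fe = ($2 / ipt) / (inp / inpt)
      printf "%s\t%.4g\n", $1, fe
    }'
}
```

For example, a peak with 50 IP reads and 25 input reads, where the input library (2,000,000 reads) is twice as deep as the IP library (1,000,000 reads), has FE = (50/1,000,000) / (25/2,000,000) = 4.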

Peaks were called significant if the number of reads in IP was greater than the number of reads in input and the peak had a Bonferroni-corrected Fisher's exact test p-value below 0.05.

clipper (Inferred with models/gemini-2.5-flash) vNot specified

$ Bash example

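# CLIPper calls peaks from IP reads alone, so the IP-versus-input test is a separate step.
# This sketch gathers the counts that test needs. It assumes bedtools and samtools are installed
# (conda install -c bioconda bedtools samtools) and that the BAMs are coordinate-sorted and indexed.
# Replace IP.bam, INPUT.bam, and clipper_peaks.bed with your own files (hg19, as elsewhere in this pipeline).

# Reads from the IP and the size-matched input overlapping each peak (one count column per BAM).
# -s counts only same-strand reads; confirm this matches your library's read orientation.
bedtools multicov -s -bams IP.bam INPUT.bam -bed clipper_peaks.bed > peak_counts.tsv

# Total mapped reads per library (-F 260 excludes unmapped (4) and secondary (256) alignments).
IP_TOTAL=$(samtools view -c -F 260 IP.bam)
INPUT_TOTAL=$(samtools view -c -F 260 INPUT.bam)

# Assumed test setup (the source does not give the exact table): for each peak, a 2x2 table of
# [IP reads in peak, input reads in peak; IP reads elsewhere, input reads elsewhere]
# is tested with Fisher's exact test. P-values are Bonferroni-corrected (multiplied by the number
# of peaks, capped at 1), and peaks with more IP than input reads and corrected p < 0.05 are kept.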

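The final significance rule combines two conditions. The sketch below applies both conditions to a table of per-peak counts. It assumes the p-values are precomputed, for example with Fisher's exact test in a statistics package. Bonferroni correction multiplies each p-value by the number of peaks tested and caps the result at 1. The helper name and input layout are hypothetical.

```shell
# Hypothetical helper: apply the significance rule to per-peak counts.
# $1: tab-separated file "peak_id ip_reads input_reads p_value" (no header),
#     with p-values assumed precomputed (e.g. Fisher's exact test).
# $2: significance threshold (default 0.05).
# stdout: peak_id and Bonferroni-adjusted p-value for peaks passing both conditions.
filter_significant_peaks() {
  local file="$1" alpha="${2:-0.05}"
  awk -F'\t' -v alpha="$alpha" '
    NR == FNR { if (NF >= 4) n++; next }          # pass 1: number of peaks tested
    NF >= 4 {
      adj = $4 * n                                # Bonferroni correction
      if (adj > 1) adj = 1
      if ($2 + 0 > $3 + 0 && adj < alpha + 0)     # IP > input and adjusted p < alpha
        printf "%s\t%.3g\n", $1, adj
    }' "$file" "$file"
}
```

The table is passed as a file rather than on stdin because awk reads it twice. The first pass counts the peaks (the number of tests for the correction), and the second pass filters them.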

Tools Used

iCLIP

Raw Source Text

Reads were demultiplexed using custom scripts and the randomer was appended to the read name. Reads were trimmed, filtered for repetitive elements, and mapped to human genome assembly hg19 as described in iCLIP computational analysis. PCR duplicated reads were removed based on the start positions of read1, read2, and the sequence of the randomer. eCLIP peaks were identified using CLIPPER with parameters -s hg19 -o --bonferroni --superlocal --threshold-method binomial --save-pickle (Lovci et al. NSMB, 2013). Peak strength was then normalized against a size matched input by calculating fold enrichment of number of reads in IP versus number of reads in size matched input. Peaks were called significant if the number of reads in IP was greater than the number of reads in input and the the peaks a Bonferroni corrected fisher exact p-value of less than .05.
Genome_build: hg19
Supplementary_files_format_and_content: peaks.bed and bigwig
