GSE86041 Processing Pipeline

RIP-Seq · 5 steps

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [iCLIP-seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Raw CLIP-seq reads were trimmed of polyA tails, adapters and low-quality ends using Cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.

cutadapt v2.10 GitHub

$ Bash example

# Install cutadapt (e.g., using conda)
# conda install -c bioconda cutadapt=2.10

cutadapt \
  --match-read-wildcards \
  --times 2 \
  -e 0 \
  -O 5 \
  --quality-cutoff 6 \
  -m 18 \
  -b TCGTATGCCGTCTTCTGCTTG \
  -b ATCTCGTATGCCGTCTTCTGCTTG \
  -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC \
  -b TGGAATTCTCGGGTGCCAAGG \
  -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA \
  -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT \
  -o trimmed_reads.fastq.gz \
  raw_reads.fastq.gz

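As a quick sanity check on the -m 18 setting, the sketch below scans an uncompressed FASTQ file and reports the shortest read. It is only an illustration: the helper name min_read_length and the toy input are hypothetical and not part of the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the length of the shortest read in an
# uncompressed FASTQ file (4 lines per record, sequence on line 2).
min_read_length() {
  awk 'NR % 4 == 2 { n = length($0); if (min == "" || n < min) min = n }
       END { print (min == "" ? 0 : min) }' "$1"
}

# Toy input standing in for a decompressed trimmed_reads.fastq.gz
demo_fastq="$(mktemp)"
printf '%s\n' \
  '@read1' 'ACGTACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIIIIIII' \
  '@read2' 'ACGTACGTACGTACGTAC' '+' 'IIIIIIIIIIIIIIIIII' > "${demo_fastq}"

shortest="$(min_read_length "${demo_fastq}")"
echo "shortest read: ${shortest} nt"
if [ "${shortest}" -lt 18 ]; then
  echo "warning: reads shorter than 18 nt survived trimming" >&2
fi
rm -f "${demo_fastq}"
```

On real data, the same function could be run on the output of gunzip -c trimmed_reads.fastq.gz written to a temporary file.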

Trimmed reads were mapped against a database of repetitive elements derived from RepBase (version 18.05) using Bowtie (version 1.0.0) with parameters -S -q -p 16 -e 100 -l 20 (Langmead et al. 2009).

Bowtie v1.0.0 GitHub

$ Bash example

# Install Bowtie (if not already installed)
# conda install -c bioconda bowtie

# Define input and output files
# Replace 'trimmed_reads.fastq' with your actual trimmed reads file
# (older Bowtie releases may not read gzipped input; decompress first, e.g. gunzip -k trimmed_reads.fastq.gz)
TRIMMED_READS="trimmed_reads.fastq"
# Replace 'repbase_18.05' with the path to your Bowtie index for repetitive elements
# This index should be built from the RepBase (version 18.05) repetitive elements database
BOWTIE_INDEX="repbase_18.05"
OUTPUT_SAM="mapped_to_repbase.sam"

# Run Bowtie mapping with the reported parameters:
# -S (SAM output), -q (FASTQ input), -p 16 (threads), -e 100 and -l 20 (seed settings)
bowtie -S -q -p 16 -e 100 -l 20 "${BOWTIE_INDEX}" "${TRIMMED_READS}" > "${OUTPUT_SAM}"
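The next step uses only the reads that did not map to repetitive elements, but the source does not say how they were collected. One possible approach, sketched below under that assumption, is to pull the unmapped records (SAM flag bit 0x4) out of the Bowtie SAM output and write them back as FASTQ. The helper name sam_unmapped_to_fastq and the toy SAM records are hypothetical; Bowtie's own --un option is an alternative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: convert unmapped SAM records (flag bit 0x4) to FASTQ.
# The bit is tested arithmetically so that any POSIX awk works.
sam_unmapped_to_fastq() {
  awk -F '\t' '
    /^@/ { next }            # skip SAM header lines
    int($2 / 4) % 2 == 1 {   # flag bit 0x4 set: read is unmapped
      printf "@%s\n%s\n+\n%s\n", $1, $10, $11
    }' "$1"
}

# Toy SAM standing in for mapped_to_repbase.sam
demo_sam="$(mktemp)"
{
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf 'read1\t0\tL1_repeat\t10\t255\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\tIIIIIIIIIIIIIIIIIIII\n'
  printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTGCATTGCATTGCA\tIIIIIIIIIIIIIIIIIIII\n'
} > "${demo_sam}"

# Only read2 (flag 4) is emitted as a FASTQ record
sam_unmapped_to_fastq "${demo_sam}"
rm -f "${demo_sam}"
```

Redirecting the function's output to a file gives a FASTQ that can be passed to the genome-mapping step.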

Reads not mapped to repetitive elements were mapped to the mm9 mouse genome (UCSC assembly) using STAR (version 2.3.03) with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 (Dobin et al. 2013).

STAR v2.3.03 GitHub

$ Bash example

# Install STAR if not already installed
# (the reported version, 2.3.03, may not be available from bioconda; pin whichever release you validate)
# conda install -c bioconda star

# Placeholder for STAR genome index directory for mm9 (UCSC assembly)
# You would need to download or build the mm9 STAR index first.
# Example command to build index (replace paths and threads):
# STAR --runMode genomeGenerate --genomeDir /path/to/mm9_star_index --genomeFastaFiles /path/to/mm9.fa --sjdbGTFfile /path/to/mm9.gtf --runThreadN <num_threads>
GENOME_DIR="/path/to/mm9_star_index" # Replace with actual path to mm9 STAR index

# Placeholder for input reads file (FASTQ format, reads that did not map to repetitive elements)
INPUT_READS="input_reads.fastq" # Replace with your actual input FASTQ file

# Placeholder for output prefix (STAR appends its own file names to this prefix)
OUTPUT_PREFIX="mapped_reads_"

# --outSAMtype and --runThreadN are not among the reported parameters.
# --outSAMtype may not be supported by older releases such as 2.3.x; omit it to get SAM output.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${INPUT_READS}" \
     --outSAMunmapped Within \
     --outFilterMultimapNmax 1 \
     --outFilterMultimapScoreRange 1 \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outSAMtype BAM SortedByCoordinate \
     --runThreadN 8 # Example: Adjust number of threads as needed

Reads having the same 5' mapping position were collapsed to a single read to eliminate PCR duplication.

samtools markdup (Inferred with models/gemini-2.5-flash) v1.19 GitHub

$ Bash example

# Install samtools if not already installed
# conda install -c bioconda samtools

# Input BAM file (must be sorted by coordinate)
INPUT_BAM="aligned_reads.bam"
OUTPUT_BAM="deduplicated_reads.bam"

# Approximates collapsing reads that share a 5' mapping position. The source does
# not name the tool used, so samtools markdup is an inferred stand-in.
# The -r option removes duplicate reads instead of just marking them.
# For paired-end data, run samtools fixmate -m on name-sorted reads before sorting by coordinate.
samtools markdup -r "${INPUT_BAM}" "${OUTPUT_BAM}"

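To make the collapsing rule itself concrete, the sketch below keeps one read per chromosome, strand and 5' position in a BED6 file. It is an illustration under assumptions, not the authors' implementation: the helper name collapse_by_five_prime and the toy intervals are hypothetical, and the alignments are assumed to have been converted to BED beforehand (for example with bedtools bamtobed).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep the first read seen for each
# (chromosome, strand, 5' position) in a tab-separated BED6 file.
# The 5' end is taken from the start column on '+' and the end column on '-'.
collapse_by_five_prime() {
  awk -F '\t' '
    {
      five_prime = ($6 == "-") ? $3 : $2
      key = $1 FS $6 FS five_prime
      if (!(key in seen)) { seen[key] = 1; print }
    }' "$1"
}

# Toy BED6 input: r2 duplicates r1 (same start on +), r4 duplicates r3 (same end on -)
demo_bed="$(mktemp)"
{
  printf 'chr1\t100\t120\tr1\t0\t+\n'
  printf 'chr1\t100\t125\tr2\t0\t+\n'
  printf 'chr1\t90\t120\tr3\t0\t-\n'
  printf 'chr1\t95\t120\tr4\t0\t-\n'
  printf 'chr1\t100\t130\tr5\t0\t-\n'
} > "${demo_bed}"

# Keeps r1, r3 and r5
collapse_by_five_prime "${demo_bed}"
rm -f "${demo_bed}"
```

Unlike a generic duplicate marker, this keys only on the 5' end, so reads of different lengths that start at the same position on the same strand are treated as PCR duplicates, which matches the wording of the step.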

CLIP-seq peaks were identified as previously described (Zisoulis et al, NSMB 2010).

CLIPper (inferred) vlatest GitHub

$ Bash example

# Install CLIPper (if not already installed)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# python setup.py install

# The source cites Zisoulis et al. (2010) without naming a tool. CLIPper is an
# inferred stand-in, so treat this command as an approximation of the method.

# Placeholder variables - replace these with actual file paths
IP_BAM="path/to/your/ip.bam"    # Deduplicated, coordinate-sorted and indexed BAM
SPECIES="mm9"                   # Matches the genome build used for mapping
OUTPUT_BED="clipper_peaks.bed"

# Execute CLIPper; run `clipper --help` to confirm the options available in your version
clipper -b "${IP_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}"


Tools Used

Cutadapt Bowtie STAR

Raw Source Text

Raw CLIP-seq reads were trimmed of polyA tails, adapters and low quality ends using Cutadapt with parameters --match-read-wildcards --times 2 -e 0 -O 5 --quality-cutoff 6 -m 18 -b TCGTATGCCGTCTTCTGCTTG -b ATCTCGTATGCCGTCTTCTGCTTG -b CGACAGGTTCAGAGTTCTACAGTCCGACGATC -b TGGAATTCTCGGGTGCCAAGG -b AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -b TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT. Trimmed reads were mapped against a database of repetitive elements derived from RepBase (version 18.05) using Bowtie (version 1.0.0) with parameters -S -q -p 16 -e 100 -l 20 (Langmead et al. 2009). Reads not mapped to repetitive elements were mapped to the mm9 mouse genome (UCSC assembly) using STAR (version 2.3.03) with parameters --outSAMunmapped Within --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 (Dobin et al. 2013). Reads having the same 5' mapping position were collapsed to a single read to eliminate PCR duplication. CLIP-seq peaks were identified as previously described (Zisoulis et al, NSMB 2010).
Genome_build: mm9
Supplementary_files_format_and_content: peaks.bed and bigwig
