GSE39872 Processing Pipeline

RNA-Seq, 3 steps

Publication

LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance.

Molecular Cell (2012), PMID 22959275

Dataset

LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance (HTS)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Read mapping from CLIP-seq experiments and data processing were performed as published (Polymenidou et al., 2011).

STAR v2.7.10a

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.7.10a

# NOTE: the published mapping used Bowtie 0.12.2 against hg18 (next step).
# This STAR command is an inferred, modern alternative; hg38 coordinates will
# not match the hg18 cluster BED released with this dataset.

# --- Genome Index Preparation (run once per genome; hg38 is a placeholder) ---
# This step generates the STAR genome index files. Replace paths and files as needed.
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_genome_index/hg38 \
#      --genomeFastaFiles /path/to/hg38.fa \
#      --sjdbGTFfile /path/to/gencode.v38.annotation.gtf \
#      --runThreadN 16

# --- Read Mapping for CLIP-seq ---
# Define variables
GENOME_DIR="/path/to/STAR_genome_index/hg38" # Placeholder for STAR genome index directory
INPUT_FASTQ="input.fastq.gz" # Placeholder for a gzipped single-end CLIP-seq FASTQ file
OUTPUT_PREFIX="mapped_reads" # Prefix for output files
THREADS=8 # Number of threads to use

# Map reads with STAR.
#   --readFilesCommand zcat    decompresses the gzipped FASTQ on the fly
#   --outFilterMultimapNmax 1  keeps only uniquely mapped reads
#   --alignIntronMax 1         is intended to suppress spliced alignments
#   --quantMode GeneCounts     (optional) writes per-gene read counts; needs a
#                              GTF supplied when the index was built
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${INPUT_FASTQ}" \
     --readFilesCommand zcat \
     --runThreadN "${THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes All \
     --outFilterMultimapNmax 1 \
     --outFilterMismatchNmax 3 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66 \
     --alignIntronMax 1 \
     --alignSJDBoverhangMin 1 \
     --alignSJoverhangMin 8 \
     --seedSearchStartLmax 15 \
     --seedPerReadNmax 100000 \
     --seedPerWindowNmax 100 \
     --winAnchorMultimapNmax 50 \
     --outReadsUnmapped Fastx \
     --quantMode GeneCounts

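After a STAR run, the unique-mapping rate reported in `mapped_reads_Log.final.out` is a quick check that the filters in the command behaved as intended. The sketch below assumes STAR's usual `Log.final.out` layout (a label, a `|`, then the value) and the `mapped_reads_` prefix from the example; the function name `star_unique_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "Uniquely mapped reads %" value from a STAR
# Log.final.out file (for example mapped_reads_Log.final.out).
# Returns non-zero if the line is not found.
star_unique_pct() {
  awk -F '|' '
    $1 ~ /Uniquely mapped reads %/ {   # label column, before the pipe
      val = $2
      gsub(/[ \t%]/, "", val)          # drop whitespace and the percent sign
      print val
      found = 1
    }
    END { if (!found) exit 1 }
  ' "$1"
}

# Usage (after the STAR run above):
#   pct=$(star_unique_pct mapped_reads_Log.final.out)
#   echo "Uniquely mapped: ${pct}%"
```

Because `--outFilterMultimapNmax 1` discards multi-mappers, this single percentage summarizes most of what the mapping filters kept.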

Briefly, reads were processed and mapped to the human genome (hg18, http://genome.ucsc.edu) with Bowtie version 0.12.2 (parameters -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata), and assigned to 21,605 genes as annotated previously (Yeo et al., 2009).

Bowtie v0.12.2

$ Bash example

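# Install Bowtie (version 0.12.2)
# conda install -c bioconda bowtie=0.12.2

# Download hg18 reference genome if not available
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg18/bigZips/hg18.fa.gz
# gunzip hg18.fa.gz
#
# Build Bowtie index (if not pre-built). This will create several files with the 'hg18' prefix.
# bowtie-build hg18.fa hg18

# Align reads to hg18 with the published parameters:
#   -q FASTQ input; -p 4 threads; -e 70 max sum of qualities at mismatched positions;
#   -y try hard; -l 25 seed length; -n 2 max mismatches in the seed;
#   -m 5 suppress reads with more than 5 reportable alignments;
#   --best --strata report only alignments in the best mismatch stratum.
# -S is added (not in the published list) so the output is SAM; without it,
# Bowtie writes its own tab-delimited format.
# 'reads.fastq' is your input FASTQ file; 'hg18' is the Bowtie index prefix.
bowtie -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata -S hg18 reads.fastq > output.sam

# Sort and index the alignments; clipper expects a coordinate-sorted, indexed BAM.
samtools sort -o LIN28ES_CLIPseq_aligned.bam output.sam
samtools index LIN28ES_CLIPseq_aligned.bam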

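The mapped reads were then assigned to 21,605 genes. As a minimal illustration of that idea, assuming SAM alignments (Bowtie's `-S` output) and a gene annotation in BED6 form, reads can be converted to BED intervals and counted per gene by same-strand overlap. This is a simplified stand-in, not the Yeo et al. (2009) assignment procedure; the function names and the gene BED layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the original study).

# sam_to_bed < in.sam > reads.bed
# One BED6 line per mapped read: 0-based start, exclusive end, strand from FLAG.
sam_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { next }                                    # skip SAM header lines
    {
      flag = $2 + 0
      if (int(flag / 4) % 2 == 1) next               # bit 0x4: read unmapped
      strand = (int(flag / 16) % 2 == 1) ? "-" : "+" # bit 0x10: reverse strand
      # Aligned reference length = sum of M, D, N, =, X operations in CIGAR.
      cigar = $6; len = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) len += n
        cigar = substr(cigar, RLENGTH + 1)
      }
      start = $4 - 1                                 # SAM POS is 1-based
      print $3, start, start + len, $1, 0, strand
    }'
}

# count_reads_per_gene genes.bed reads.bed
# Naive same-strand overlap count, O(genes x reads): fine for toy inputs only.
# A read overlapping several genes is counted once for each.
count_reads_per_gene() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR {                                      # first file: genes (BED6)
      g++; chr[g] = $1; gs[g] = $2 + 0; ge[g] = $3 + 0
      id[g] = $4; str[g] = $6; count[$4] = 0
      next
    }
    {                                                # second file: reads (BED6)
      s = $2 + 0; e = $3 + 0
      for (i = 1; i <= g; i++)
        if ($1 == chr[i] && $6 == str[i] && s < ge[i] && e > gs[i])
          count[id[i]]++
    }
    END { for (i = 1; i <= g; i++) print id[i], count[id[i]] }' "$1" "$2"
}
```

For example, `sam_to_bed < output.sam > reads.bed` followed by `count_reads_per_gene genes.bed reads.bed > gene_counts.tsv`, where `genes.bed` stands in for whichever annotation you use. For genome-scale data an interval tool (for example bedtools) is the practical choice; the nested loop here only makes the overlap rule explicit.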

Genome build for LIN28ES_CLIPseq_clusters.BED: hg18

clipper (latest)

$ Bash example

# Install clipper from GitHub (if not already installed); see the repository
# README for its dependencies and recommended environment.
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# python setup.py install

# Call clusters on the sorted, indexed BAM from the mapping step.
# Flags follow the yeolab/clipper README: -b input BAM, -s genome/species,
# -o output BED. Significance cutoffs are left at clipper's defaults; check
# `clipper --help` for the threshold options in your installed version.
# Bundled annotations may not include hg18 in current releases.
clipper \
    -b LIN28ES_CLIPseq_aligned.bam \
    -s hg18 \
    -o LIN28ES_CLIPseq_clusters.BED

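Before using a cluster file such as LIN28ES_CLIPseq_clusters.BED (hg18), a quick summary catches empty or malformed output. The sketch below assumes a tab-separated BED with the strand in column 6 when present; the function name `summarize_clusters` is hypothetical, and the summary does not depend on the genome build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a cluster BED file.
# Prints cluster counts per strand, the total and mean cluster width (bp), and
# returns non-zero if any interval has end <= start.
summarize_clusters() {
  awk 'BEGIN { FS = "\t" }
    /^(track|browser|#)/ { next }      # skip BED header and comment lines
    NF < 3 { next }                    # skip blank or truncated lines
    {
      w = ($3 + 0) - ($2 + 0)          # BED is 0-based, half-open
      if (w <= 0) { bad++; next }
      n++; total += w
      if (NF >= 6) strand[$6]++
    }
    END {
      printf "plus\t%d\nminus\t%d\n", strand["+"], strand["-"]
      printf "clusters\t%d\ntotal_bp\t%d\n", n, total
      if (n > 0) printf "mean_width\t%.1f\n", total / n
      if (bad > 0) { printf "malformed\t%d\n", bad; exit 1 }
    }' "$1"
}

# Usage:
#   summarize_clusters LIN28ES_CLIPseq_clusters.BED
```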

Tools Used

Bowtie v0.12.2 (published method); STAR v2.7.10a and clipper (inferred examples)

Raw Source Text

Read mapping from CLIP-seq experiments and data processing was performed as published (Polymenidou et al., 2011). Briefly, reads were processed and mapped to the human genome (hg18 http://genome.ucsc.edu; Bowtie version 0.12.2, with parameters -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata) and assigned to 21,605 genes (as annotated previously (Yeo et al., 2009)).
Genome Build:
LIN28ES_CLIPseq_clusters.BED: hg18
