GSE39872 Processing Pipeline

RNA-Seq, 3 steps

Publication

LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance.

Molecular cell (2012) — PMID 22959275

Dataset

GSE39872

LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance (HTS)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Read mapping from CLIP-seq experiments and data processing was performed as published (Polymenidou et al., 2011).

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # --- Genome Index Preparation (run once per genome) ---
    # hg38 is only a placeholder here; the study itself mapped reads to hg18 (see step 2).
    # This step generates the STAR genome index files. Replace paths and files as needed.
    # STAR --runMode genomeGenerate \
    #      --genomeDir /path/to/STAR_genome_index/hg38 \
    #      --genomeFastaFiles /path/to/hg38.fa \
    #      --sjdbGTFfile /path/to/gencode.v38.annotation.gtf \
    #      --runThreadN 16
    
    # --- Read Mapping for CLIP-seq ---
    # Define variables
    GENOME_DIR="/path/to/STAR_genome_index/hg38" # Placeholder for STAR genome index directory
    INPUT_FASTQ="input.fastq.gz" # Placeholder for input CLIP-seq FASTQ file (e.g., from a single-end experiment)
    OUTPUT_PREFIX="mapped_reads" # Prefix for output files
    THREADS=8 # Number of threads to use
    
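    # Perform read mapping using STAR. This is an inferred alternative, not the published method:
    # the study used Bowtie 0.12.2 (step 2). Parameters favor unique mapping and disable
    # spliced alignments (--alignIntronMax 1) to reflect direct RNA binding.
    # --readFilesCommand is needed because the input FASTQ is gzip-compressed.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${INPUT_FASTQ}" \
         --readFilesCommand gunzip -c \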
         --runThreadN "${THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes All \
         --outFilterMultimapNmax 1 \
         --outFilterMismatchNmax 3 \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66 \
         --alignIntronMax 1 \
         --alignMatesGapMax 1000000 \
         --alignSJDBoverhangMin 1 \
         --alignSJoverhangMin 8 \
         --seedSearchStartLmax 15 \
         --seedPerReadNmax 100000 \
         --seedPerWindowNmax 100 \
         --winAnchorMultimapNmax 50 \
         --outReadsUnmapped Fastx \
         --quantMode GeneCounts # Optional: per-gene read counts; requires an index built with --sjdbGTFfile
  2.

    Briefly, reads were processed and mapped to the human genome (hg18 http://genome.ucsc.edu; Bowtie version 0.12.2, with parameters -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata) and assigned to 21,605 genes (as annotated previously (Yeo et al., 2009)).

    Bowtie v0.12.2
    $ Bash example
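    # Install Bowtie (version 0.12.2)
    # This old release may not be packaged on bioconda; if the command below fails,
    # build it from an archived Bowtie 1 source release instead.
    # conda install -c bioconda bowtie=0.12.2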
    
    # Download hg18 reference genome if not available
    # wget https://hgdownload.soe.ucsc.edu/goldenPath/hg18/bigZips/hg18.fa.gz
    # gunzip hg18.fa.gz
    #
    # Build Bowtie index (if not pre-built). This will create several files with the 'hg18' prefix.
    # bowtie-build hg18.fa hg18
    
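    # Align reads to the hg18 human genome using Bowtie with the published parameters.
    # Assuming 'reads.fastq' is your input FASTQ file and 'hg18' is the prefix for your Bowtie index files.
    # -S (SAM output) is not in the published parameter list; it is added here so that
    # output.sam really is SAM. Without -S, Bowtie 1 writes its own tab-delimited format.
    bowtie -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata -S hg18 reads.fastq > output.sam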
  3.

    LIN28ES_CLIPseq_clusters.BED: hg18

    clipper (latest version)
    $ Bash example
    # Install clipper (if not already installed) by cloning from GitHub
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # python setup.py install
    # Note: the PyPI package named 'clipper' may be an unrelated project,
    # so 'pip install clipper' is not assumed to work here.
    
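    # Assuming the input is a coordinate-sorted, indexed BAM file named LIN28ES_CLIPseq_aligned.bam.
    # Flags follow the CLIPper README (-b BAM, -s species, -o output BED); confirm with 'clipper --help'.
    # CLIPper's bundled annotations may not include hg18, and the source text does not state
    # that CLIPper produced this file, so treat this command as a placeholder.
    clipper \
        -b LIN28ES_CLIPseq_aligned.bam \
        -s hg18 \
        -o LIN28ES_CLIPseq_clusters.BED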
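
Step 2 says reads were assigned to 21,605 annotated genes but gives no detail on how. The following is a minimal sketch of one way to do such an assignment, not the authors' procedure. It assumes the aligned reads have already been converted to BED intervals and that the gene annotation is a tab-delimited BED file with the gene name in column 4. The function name count_reads_per_gene and both file layouts are hypothetical, and strand is ignored for simplicity.

```shell
#!/usr/bin/env bash
# Sketch only: count reads overlapping each gene by interval overlap.
# Inputs (hypothetical layouts, tab-delimited):
#   genes.bed : chrom  start  end  gene_name
#   reads.bed : chrom  start  end  [further columns ignored]
# BED intervals are 0-based and half-open, so a read overlaps a gene
# when read_start < gene_end and read_end > gene_start.
count_reads_per_gene() {
    local genes_bed="$1" reads_bed="$2"
    awk 'BEGIN { FS = OFS = "\t" }
         # First file: store every gene interval.
         NR == FNR {
             n++
             gchrom[n] = $1; gstart[n] = $2 + 0; gend[n] = $3 + 0
             gname[n] = $4; count[n] = 0
             next
         }
         # Second file: compare each read with every gene
         # (quadratic, acceptable for small inputs only).
         {
             rs = $2 + 0; re = $3 + 0
             for (i = 1; i <= n; i++)
                 if ($1 == gchrom[i] && rs < gend[i] && re > gstart[i])
                     count[i]++
         }
         # Print one line per gene, in annotation order.
         END {
             for (i = 1; i <= n; i++)
                 print gname[i], count[i]
         }' "$genes_bed" "$reads_bed"
}
```

For example, `count_reads_per_gene genes.bed reads.bed > gene_counts.tsv` writes one line per gene with its read count. For genome-scale data, a dedicated interval tool would be a more practical choice than this quadratic loop.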

Tools Used

Raw Source Text
Read mapping from CLIP-seq experiments and data processing was performed as published (Polymenidou et al., 2011). Briefly, reads were processed and mapped to the human genome (hg18 http://genome.ucsc.edu; Bowtie version 0.12.2, with parameters -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata) and assigned to 21,605 genes (as annotated previously (Yeo et al., 2009)).
Genome Build:
LIN28ES_CLIPseq_clusters.BED: hg18
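
Because the cluster file is stated to use hg18 coordinates, it can be worth checking that every interval fits within the chromosome lengths of that build before combining it with other data. The sketch below does this against a two-column, tab-delimited chromosome sizes table (chromosome name, length). The function name check_bed_against_sizes and the sizes file are assumptions for illustration and are not part of the original pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: flag BED records that do not fit a chromosome sizes table.
# Inputs (hypothetical, tab-delimited):
#   chrom.sizes : chrom  length
#   clusters.bed: chrom  start  end  [further columns ignored]
# Prints one line per problem record, prefixed with the reason.
check_bed_against_sizes() {
    local sizes="$1" bed="$2"
    awk 'BEGIN { FS = "\t" }
         # First file: chromosome lengths.
         NR == FNR { size[$1] = $2 + 0; next }
         # Skip blank lines and track/browser/comment headers in the BED file.
         NF == 0 { next }
         /^(track|browser|#)/ { next }
         {
             s = $2 + 0; e = $3 + 0
             if (!($1 in size))        print "unknown_chrom\t" $0
             else if (s < 0 || e <= s) print "bad_interval\t" $0
             else if (e > size[$1])    print "out_of_bounds\t" $0
         }' "$sizes" "$bed"
}
```

An empty result from `check_bed_against_sizes hg18.chrom.sizes LIN28ES_CLIPseq_clusters.BED` is consistent with the stated build but does not prove it, since coordinates from a different assembly can also fall within these bounds.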