GSE39872 Processing Pipeline
RNA-Seq
3 steps
Publication
LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Molecular Cell (2012), PMID 22959275
Dataset
GSE39872: LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance (HTS)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Read mapping from CLIP-seq experiments and data processing was performed as published (Polymenidou et al., 2011).
$ Bash example
# NOTE: The dataset description (step 2) specifies Bowtie 0.12.2 against hg18.
# The STAR command below is an inferred alternative, not the published method;
# hg38 and all paths are placeholders.

# Install STAR (if not already installed)
# conda install -c bioconda star

# --- Genome index preparation (run once per genome) ---
# Generates the STAR genome index files. Replace paths and files as needed.
# STAR --runMode genomeGenerate \
#   --genomeDir /path/to/STAR_genome_index/hg38 \
#   --genomeFastaFiles /path/to/hg38.fa \
#   --sjdbGTFfile /path/to/gencode.v38.annotation.gtf \
#   --runThreadN 16

# --- Read mapping for CLIP-seq ---
GENOME_DIR="/path/to/STAR_genome_index/hg38"  # STAR genome index directory (placeholder)
INPUT_FASTQ="input.fastq.gz"                  # single-end CLIP-seq FASTQ file (placeholder)
OUTPUT_PREFIX="mapped_reads"                  # prefix for output files
THREADS=8                                     # number of threads to use

# Map reads with STAR, keeping uniquely mapped reads and discouraging spliced
# alignments (--alignIntronMax 1).
# --readFilesCommand zcat is required because the input FASTQ is gzip-compressed.
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${INPUT_FASTQ}" \
  --readFilesCommand zcat \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes All \
  --outFilterMultimapNmax 1 \
  --outFilterMismatchNmax 3 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66 \
  --alignIntronMax 1 \
  --alignMatesGapMax 1000000 \
  --alignSJDBoverhangMin 1 \
  --alignSJoverhangMin 8 \
  --seedSearchStartLmax 15 \
  --seedPerReadNmax 100000 \
  --seedPerWindowNmax 100 \
  --winAnchorMultimapNmax 50 \
  --outReadsUnmapped Fastx \
  --quantMode GeneCounts  # optional; requires an index built with a GTF annotation
2
Briefly, reads were processed and mapped to the human genome (hg18 http://genome.ucsc.edu; Bowtie version 0.12.2, with parameters -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata) and assigned to 21,605 genes (as annotated previously (Yeo et al., 2009)).
$ Bash example
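# Install Bowtie (the source used version 0.12.2; this exact version may not be available from bioconda)
# conda install -c bioconda bowtie=0.12.2

# Download the hg18 reference genome if not available
# (the file name below is an assumption; check the UCSC hg18 download directory)
# wget https://hgdownload.soe.ucsc.edu/goldenPath/hg18/bigZips/hg18.fa.gz
# gunzip hg18.fa.gz

# Build the Bowtie index (if not pre-built). This creates several files with the 'hg18' prefix.
# bowtie-build hg18.fa hg18

# Align reads to hg18 with the published parameters.
# 'reads.fastq' is a placeholder input file; 'hg18' is the Bowtie index prefix.
#   -q: FASTQ input; -p 4: four threads; -n 2: up to 2 mismatches in the seed; -l 25: seed length 25
#   -e 70: maximum summed quality at mismatched positions; -y: try hard to find alignments
#   -m 5: discard reads with more than 5 reportable alignments
#   --best --strata: report only alignments from the best stratum
# -S (SAM output) is added so that output.sam is actually SAM; it is not in the published parameter list.
bowtie -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata -S hg18 reads.fastq > output.sam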
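The source says mapped reads were assigned to 21,605 annotated genes but does not say how. As a rough illustration only, the sketch below counts reads that overlap each gene by at least one base, using awk on small made-up BED files. The function name `count_reads_per_gene`, the coordinates, and the any-overlap rule are all assumptions; a real analysis would use the annotation from Yeo et al. (2009) and would likely account for strand.

```bash
#!/usr/bin/env bash
set -euo pipefail

genes=$(mktemp)
reads=$(mktemp)
trap 'rm -f "$genes" "$reads"' EXIT

# Toy gene annotation in BED format (chrom, start, end, name); coordinates are invented.
printf 'chr1\t100\t500\tGENE_A\nchr1\t800\t1200\tGENE_B\nchr2\t50\t300\tGENE_C\n' > "$genes"

# Toy aligned-read intervals (chrom, start, end); also invented.
printf 'chr1\t120\t150\nchr1\t480\t510\nchr1\t600\t630\nchr1\t900\t930\nchr2\t10\t40\n' > "$reads"

# count_reads_per_gene GENES_BED READS_BED
# Prints "gene<TAB>count", counting reads that overlap a gene by at least
# one base (half-open BED intervals, strand ignored).
count_reads_per_gene() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {
      chrom[NR] = $1; start[NR] = $2 + 0; stop[NR] = $3 + 0; name[NR] = $4
      n = NR
      next
    }
    {
      for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && ($2 + 0) < stop[i] && ($3 + 0) > start[i])
          count[i]++
    }
    END { for (i = 1; i <= n; i++) print name[i], count[i] + 0 }
  ' "$1" "$2"
}

counts=$(count_reads_per_gene "$genes" "$reads")
printf '%s\n' "$counts"
```

The nested loop is fine for a toy example but scales poorly; for genome-wide data an interval tool such as bedtools would be the more usual choice.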
3
Genome build for LIN28ES_CLIPseq_clusters.BED: hg18
$ Bash example
# The source only states that LIN28ES_CLIPseq_clusters.BED uses hg18 coordinates;
# it does not describe how the clusters were called. CLIPper is shown as one possible tool.

# Install CLIPper from GitHub (if not already installed)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# python setup.py install

# LIN28ES_CLIPseq_aligned.bam is a placeholder for the sorted, indexed alignments.
# Whether CLIPper ships hg18 annotations is an assumption; check `clipper --help`
# for the supported --species values and the thresholding options in your version.
clipper \
  --bam LIN28ES_CLIPseq_aligned.bam \
  --species hg18 \
  --outfile LIN28ES_CLIPseq_clusters.BED
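Before using a cluster file such as LIN28ES_CLIPseq_clusters.BED, a quick sanity check can help. The sketch below reports the number of clusters and their mean length per chromosome. It runs on a small invented BED6 file rather than the real data, and the function name `summarize_clusters` is hypothetical. Note that hg18 coordinates are not interchangeable with hg19 or hg38; comparing against newer annotations would require a coordinate conversion first.

```bash
#!/usr/bin/env bash
set -euo pipefail

clusters=$(mktemp)
trap 'rm -f "$clusters"' EXIT

# Toy stand-in for a cluster file; rows are invented BED6 records
# (chrom, start, end, name, score, strand).
printf 'chr1\t1000\t1050\tcluster_1\t12\t+\nchr1\t2000\t2080\tcluster_2\t30\t-\nchr2\t500\t540\tcluster_3\t7\t+\n' > "$clusters"

# summarize_clusters BED
# Prints "chrom<TAB>cluster_count<TAB>mean_length", sorted by chromosome name.
# Rows whose end is not greater than their start are skipped.
summarize_clusters() {
  awk -F'\t' -v OFS='\t' '
    ($3 + 0) > ($2 + 0) { n[$1]++; len[$1] += $3 - $2 }
    END { for (c in n) print c, n[c], len[c] / n[c] }
  ' "$1" | sort -k1,1
}

summary=$(summarize_clusters "$clusters")
printf '%s\n' "$summary"
```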
Tools Used
Raw Source Text
Read mapping from CLIP-seq experiments and data processing was performed as published (Polymenidou et al., 2011). Briefly, reads were processed and mapped to the human genome (hg18 http://genome.ucsc.edu; Bowtie version 0.12.2, with parameters -q -p 4 -e 70 -y -l 25 -n 2 -m 5 --best --strata) and assigned to 21,605 genes (as annotated previously (Yeo et al., 2009)). Genome Build: LIN28ES_CLIPseq_clusters.BED: hg18