GSE34993 Processing Pipeline

RNA-Seq · Code examples · 2 steps

Publication

Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins.

Cell Reports (2012), PMID 22574288

Dataset

Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins (CLIP-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


CLIP-seq reads were processed as previously described (Polymenidou et al., 2011).

Bowtie v0.12.7

$ Bash example

# Define variables
GENOME_FASTA="mm9.fa"
GENOME_INDEX_PREFIX="mm9_index"
INPUT_FASTQ="clip_seq_reads.fastq" # Placeholder for input CLIP-seq reads
OUTPUT_SAM="aligned_reads.sam"
OUTPUT_BAM="aligned_reads.bam"
OUTPUT_PEAKS="peaks.bed" # Placeholder for peak output

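# --- Installation (commented out) ---
# Install Bowtie 1 (version 0.12.7), for example with conda from the Bioconda
# channel (an exact 0.12.x build may not be packaged; use the closest available):
# conda create -n bowtie_env -c bioconda -c conda-forge bowtie=0.12.7
# conda activate bowtie_env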

# Install samtools (for converting SAM to BAM and sorting)
# conda install -c bioconda samtools

# --- Reference Genome Preparation (commented out) ---
# Download the mouse reference genome (mm9) from UCSC
# wget -nc http://hgdownload.soe.ucsc.edu/goldenPath/mm9/bigZips/mm9.fa.gz
# gunzip -f "${GENOME_FASTA}.gz"

# Build Bowtie index for the mm9 genome
# bowtie-build "${GENOME_FASTA}" "${GENOME_INDEX_PREFIX}"

# --- CLIP-seq Read Processing ---

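# Step 1: Align CLIP-seq reads to the reference genome using Bowtie.
# NOTE: mm9 and the options below (-v 2 -m 1 --best --strata) are assumptions for
# the "previously described" protocol; they are not stated in the source text.
# For this dataset the source reports hg18 and Bowtie 0.12.2 with
# -q -l 20 -m 5 -k 5 --best (see the next step).
# -v 2: Allow up to two mismatches over the whole read.
# -m 1: Suppress all alignments for reads with more than one reportable alignment.
# --best --strata: Report only alignments in the best (fewest-mismatch) stratum.
# -S: Write SAM output (Bowtie 1 otherwise writes its own tabular format).
bowtie -v 2 -m 1 --best --strata -S "${GENOME_INDEX_PREFIX}" "${INPUT_FASTQ}" "${OUTPUT_SAM}"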

# Step 2: Convert SAM to BAM and sort the aligned reads.
# This is a standard post-alignment step for downstream analysis.
samtools view -bS "${OUTPUT_SAM}" | samtools sort -o "${OUTPUT_BAM}" -

# Step 3: Peak calling.
# Criterion attributed to Polymenidou et al., 2011 (not stated in this dataset's
# source text; verify against the paper): peaks are regions covered by at least
# five overlapping reads, identified with a custom script.
# That custom script is not provided, so no exact command can be given.
# Dedicated CLIP-seq peak callers (e.g., CLIPper, Piranha) or a custom script
# implementing the criterion above are the practical alternatives.
# Conceptual command (custom_peak_caller.sh is hypothetical):
# custom_peak_caller.sh "${OUTPUT_BAM}" 5 > "${OUTPUT_PEAKS}"
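The peak-calling step above has no runnable command. As a minimal sketch, assuming the "at least five overlapping reads" criterion from the comments and reads already converted to BED (for example with `bedtools bamtobed -i aligned_reads.bam`), the hypothetical function `call_min_depth_regions` below reports maximal regions whose read depth reaches a threshold. It is a plain coverage threshold, not the authors' script and not a statistical peak caller such as CLIPper; strand is ignored.

```shell
#!/usr/bin/env bash
set -euo pipefail

# call_min_depth_regions MIN_DEPTH < reads.bed > regions.bed
# Hypothetical helper: reports maximal regions where at least MIN_DEPTH reads
# overlap. Input is BED (chrom, start, end; 0-based, half-open). Strand is ignored.
call_min_depth_regions() {
  local min_depth="$1"
  # Turn each read into a +1 event at its start and a -1 event at its end.
  awk 'BEGIN { FS = OFS = "\t" } NF >= 3 { print $1, $2, 1; print $1, $3, -1 }' \
    | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n \
    | awk -v min="$min_depth" '
        BEGIN { OFS = "\t" }
        # Called once all events at position `pos` have been applied,
        # so a read ending where another starts does not split a region.
        function check_position() {
          if (!in_region && depth >= min) { start = pos; in_region = 1 }
          else if (in_region && depth < min) { print chrom, start, pos; in_region = 0 }
        }
        NR > 1 && ($1 != chrom || $2 != pos) { check_position() }
        $1 != chrom { depth = 0; in_region = 0 }   # new chromosome: reset state
        { chrom = $1; pos = $2; depth += $3 }
        END { if (NR > 0) check_position() }'
}
```

Usage, under the same assumptions: `bedtools bamtobed -i aligned_reads.bam | call_min_depth_regions 5 > peaks.bed`. The sweep over sorted start/end events keeps memory proportional to one chromosome's state rather than to per-base coverage.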


Briefly, reads were trimmed to remove sequencing adaptors and homopolymeric runs >10 nt, and mapped to the human genome (hg18) using Bowtie (version 0.12.2 with parameters -q -l 20 -m 5 -k 5 --best).
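The trimming is described but not shown in code. A minimal sketch follows, assuming a single known 3' adaptor sequence (the source does not name one; `AGATCGGAAGAGC` in the usage line is only a placeholder), 4-line FASTQ records, and that "removing" a homopolymeric run means truncating the read where the first run of more than 10 identical bases begins. The function name `trim_clip_reads` and its minimum-length filter are hypothetical. Exact matching misses partial or mismatched adaptors at read ends; a dedicated trimmer such as cutadapt handles those cases.

```shell
#!/usr/bin/env bash
set -euo pipefail

# trim_clip_reads ADAPTER MIN_LEN < reads.fastq > trimmed.fastq
# Hypothetical helper. For each 4-line FASTQ record it truncates the read at
#   (1) the first exact occurrence of ADAPTER, and
#   (2) the start of the first homopolymeric run longer than 10 nt,
# whichever comes first, then drops reads shorter than MIN_LEN.
trim_clip_reads() {
  local adapter="$1" min_len="$2"
  awk -v adapter="$adapter" -v min_len="$min_len" '
    # 1-based start of the first run of >10 identical bases in s, or 0 if none.
    function first_long_run(s,    i, run) {
      run = 1
      for (i = 2; i <= length(s); i++) {
        run = (substr(s, i, 1) == substr(s, i - 1, 1)) ? run + 1 : 1
        if (run > 10) return i - run + 1
      }
      return 0
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      cut = length(seq)
      a = (adapter == "") ? 0 : index(seq, adapter)
      if (a > 0) cut = a - 1                      # clip at adaptor
      h = first_long_run(seq)
      if (h > 0 && h - 1 < cut) cut = h - 1       # clip at long homopolymer
      if (cut >= min_len) {
        print header
        print substr(seq, 1, cut)
        print "+"
        print substr($0, 1, cut)                  # trim qualities to match
      }
    }'
}
```

Usage, under the same assumptions: `trim_clip_reads AGATCGGAAGAGC 15 < clip_reads.fastq > input_reads.fastq`, producing the trimmed file that the Bowtie command below expects.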

Bowtie v0.12.2

$ Bash example

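# Install Bowtie 1 (if not already installed); a 0.12.2 build may not be
# available on Bioconda, in which case use the closest 1.x release:
# conda install -c bioconda bowtie=0.12.2

# Replace 'input_reads.fastq' with your adaptor- and homopolymer-trimmed reads
# Replace 'output.sam' with your desired output alignment file name

# Ensure the hg18 index is available. If not, build it first:
# bowtie-build hg18.fa hg18

# Parameters from the source text:
# -q: input reads are FASTQ
# -l 20: seed length of 20 nt
# -k 5: report up to 5 valid alignments per read
# -m 5: suppress reads with more than 5 reportable alignments
# --best: report alignments best-first (fewest mismatches)
# -S (added here, not in the source): write SAM so the output matches the .sam name
bowtie -q -l 20 -m 5 -k 5 --best -S hg18 input_reads.fastq > output.sam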
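With `-k 5 -m 5`, Bowtie reports up to five alignments per read and suppresses reads that have more than five, so the SAM file can hold several records for one read. A minimal sketch for checking this, assuming each reported alignment is written as its own SAM record sharing the read name: the hypothetical helpers `count_alignments_per_read` and `summarize_multimapping` below count alignments per mapped read and summarize unique versus multi-mapped reads.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_alignments_per_read < output.sam
# Hypothetical helper: prints "read_name<TAB>n_alignments" for every mapped read,
# sorted by read name. Header lines (@...) and unmapped records (FLAG bit 0x4)
# are skipped.
count_alignments_per_read() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { next }                    # SAM header
       int($2 / 4) % 2 == 1 { next }    # unmapped read
       { n[$1]++ }
       END { for (r in n) print r, n[r] }' \
    | LC_ALL=C sort -t $'\t' -k1,1
}

# summarize_multimapping < output.sam  -> "unique=<n> multi=<n>"
summarize_multimapping() {
  count_alignments_per_read \
    | awk -F '\t' '{ if ($2 == 1) u++; else m++ }
                   END { printf "unique=%d multi=%d\n", u, m }'
}
```

Usage, under the same assumptions: `summarize_multimapping < output.sam`. Reads with a count of 1 are the uniquely mapped subset, which some downstream steps may prefer.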


Tools Used

Bowtie

Raw Source Text

CLIP-seq reads were processed as previously described (Polymenidou et al., 2011). Briefly, reads were trimmed to remove sequencing adaptors and homopolymeric runs >10 nt, and mapped to the human genome (hg18) using Bowtie (version 0.12.2 with parameters -q -l 20 -m 5 -k 5 --best).
