GSE34993 Processing Pipeline

RNA-Seq · Code examples · 2 steps

Publication

Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins.

Cell Reports (2012), PMID 22574288

Dataset

GSE34993

Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins (CLIP-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Step 1

    CLIP-seq reads were processed as previously described (Polymenidou et al., 2011).

    $ Bash example
    # Define variables (file names are placeholders)
    GENOME_FASTA="hg18.fa"
    GENOME_INDEX_PREFIX="hg18"
    INPUT_FASTQ="clip_seq_reads.trimmed.fastq" # Placeholder: adaptor- and homopolymer-trimmed CLIP-seq reads
    OUTPUT_SAM="aligned_reads.sam"
    OUTPUT_BAM="aligned_reads.sorted.bam"

    # --- Installation (commented out) ---
    # The source reports Bowtie version 0.12.2. A package for that exact version
    # may not be available, in which case Bowtie 0.12.2 must be built from source.
    # conda create -n bowtie_env -c bioconda bowtie samtools
    # conda activate bowtie_env

    # --- Reference Genome Preparation (commented out) ---
    # Obtain the human reference genome (hg18) FASTA, e.g. from the UCSC
    # downloads directory http://hgdownload.soe.ucsc.edu/goldenPath/hg18/bigZips/
    # and concatenate the chromosome sequences into "${GENOME_FASTA}".

    # Build the Bowtie index for hg18
    # bowtie-build "${GENOME_FASTA}" "${GENOME_INDEX_PREFIX}"

    # --- CLIP-seq Read Processing ---

    # Step 1: Align the trimmed CLIP-seq reads to hg18 with the parameters given in the source:
    # -q:     input reads are in FASTQ format
    # -l 20:  seed length of 20 nt
    # -m 5:   suppress all alignments for reads with more than 5 reportable alignments
    # -k 5:   report up to 5 valid alignments per read
    # --best: guarantee reported alignments are "best" (fewest mismatches), reported best-to-worst
    # -S:     write SAM output for samtools (added here; not part of the source parameters)
    bowtie -q -l 20 -m 5 -k 5 --best -S "${GENOME_INDEX_PREFIX}" "${INPUT_FASTQ}" "${OUTPUT_SAM}"

    # Step 2: Convert SAM to BAM, coordinate-sort, and index.
    # Standard post-alignment handling; not stated in the source.
    samtools view -bS "${OUTPUT_SAM}" | samtools sort -o "${OUTPUT_BAM}" -
    samtools index "${OUTPUT_BAM}"

    # Step 3: Peak calling.
    # The source text does not describe how CLIP-seq peaks or clusters were called;
    # refer to Polymenidou et al., 2011 for the original procedure. Dedicated
    # CLIP-seq peak callers (e.g., CLIPper, Piranha) are common modern alternatives,
    # but they are not the method reported for this dataset.
  2. Step 2

    Briefly, reads were trimmed to remove sequencing adaptors and homopolymeric runs >10 nt, and mapped to the human genome (hg18) using Bowtie (version 0.12.2 with parameters -q -l 20 -m 5 -k 5 --best).

    $ Bash example
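    # Install Bowtie (if not already installed). The source reports version 0.12.2;
    # a package for that exact version may not exist, in which case build it from source.
    # conda install -c bioconda bowtie=0.12.2

    # Placeholders: replace 'input_reads.fastq' with your adaptor- and homopolymer-trimmed
    # reads, and 'output.bowtie' with your desired output file name.

    # Ensure the hg18 index is available. If not, build it first:
    # bowtie-build hg18.fa hg18

    # Without -S, Bowtie writes its own tab-delimited alignment format, not SAM.
    # Add -S (not part of the source parameters) if SAM output is needed downstream.
    bowtie -q -l 20 -m 5 -k 5 --best hg18 input_reads.fastq > output.bowtie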
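
The trimming described in step 2 (removal of sequencing adaptors and homopolymeric runs >10 nt) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' script: the adaptor sequence is a hypothetical placeholder, "removing" a homopolymeric run is interpreted as clipping the read at the start of the first run longer than 10 nt, and the minimum-length filter (MIN_LEN) is hypothetical and not stated in the source.

```shell
#!/usr/bin/env bash
# Minimal sketch, not the authors' code: 3' adaptor clipping plus homopolymer
# trimming of a FASTQ stream, using only POSIX awk.
set -euo pipefail

# Hypothetical adaptor: substitute the sequence actually used in the library.
ADAPTER="${ADAPTER:-TCGTATGCCGTCTTCTGCTTG}"
# Runs of identical bases longer than this are trimmed (">10nt" in the source).
HOMO_MAX="${HOMO_MAX:-10}"
# Hypothetical minimum length for keeping a trimmed read (not in the source).
MIN_LEN="${MIN_LEN:-20}"

# trim_fastq: reads FASTQ on stdin, writes trimmed FASTQ on stdout.
trim_fastq() {
  awk -v adapter="$ADAPTER" -v homo_max="$HOMO_MAX" -v min_len="$MIN_LEN" '
    BEGIN {
      # Build "AAA...|CCC...|GGG...|TTT..." with homo_max+1 copies of each
      # base, so the regex matches any run longer than homo_max.
      split("A C G T", bases, " ")
      n = homo_max + 1
      re = ""
      for (b = 1; b <= 4; b++) {
        run = ""
        for (j = 0; j < n; j++) run = run bases[b]
        re = re (re == "" ? "" : "|") run
      }
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      cut = length(seq)
      # Clip at the first exact occurrence of the full adaptor.
      if (adapter != "") {
        i = index(seq, adapter)
        if (i > 0 && i - 1 < cut) cut = i - 1
      }
      # Clip at the start of the first homopolymer run longer than homo_max.
      if (match(seq, re) && RSTART - 1 < cut) cut = RSTART - 1
      # Keep the read only if enough sequence remains after trimming.
      if (cut >= min_len) {
        print header
        print substr(seq, 1, cut)
        print "+"
        print substr($0, 1, cut)
      }
    }'
}
```

Usage: `trim_fastq < clip_seq_reads.fastq > clip_seq_reads.trimmed.fastq`. The sketch only clips exact, full-length adaptor matches; partial or mismatched adaptor occurrences at the 3' end are not handled, which dedicated trimmers such as cutadapt do support.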

Tools Used

Bowtie (version 0.12.2)

Raw Source Text
CLIP-seq reads were processed as previously described (Polymenidou et al., 2011). Briefly, reads were trimmed to remove sequencing adaptors and homopolymeric runs >10nt, and mapped to the human genome (hg18) using Bowtie (version 0.12.2 with parameters -q -l 20 -m 5 -k 5 --best).