GSE34993 Processing Pipeline
RNA-Seq
2 steps
Publication
Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Reports (2012). PMID 22574288
Dataset
GSE34993: Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins (CLIP-Seq)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
CLIP-seq reads were processed as previously described (Polymenidou et al., 2011).
$ Bash example
# NOTE: the alignment parameters and the peak-calling step in this example are
# illustrative assumptions, not taken from the source text. The source text reports
# hg18 and Bowtie 0.12.2 with -q -l 20 -m 5 -k 5 --best (see step 2).

# Define variables
GENOME_FASTA="hg18.fa"
GENOME_INDEX_PREFIX="hg18_index"
INPUT_FASTQ="clip_seq_reads.fastq"   # Placeholder for input CLIP-seq reads
OUTPUT_SAM="aligned_reads.sam"
OUTPUT_BAM="aligned_reads.bam"
OUTPUT_PEAKS="peaks.bed"             # Placeholder for peak output

# --- Installation (commented out) ---
# Install Bowtie (the source text reports version 0.12.2; this assumes the version
# is available from your conda channel) and samtools:
# conda create -n bowtie_env -c bioconda bowtie=0.12.2 samtools
# conda activate bowtie_env

# --- Reference genome preparation (commented out) ---
# Obtain the human reference genome (hg18) FASTA from UCSC, save it as
# "${GENOME_FASTA}", then build the Bowtie index:
# bowtie-build "${GENOME_FASTA}" "${GENOME_INDEX_PREFIX}"

# --- CLIP-seq read processing ---
# Step 1: Align CLIP-seq reads to the genome with Bowtie.
#   -v 2: allow up to two mismatches
#   -m 1: suppress reads with more than one reportable alignment
#   --best --strata: report only alignments in the best mismatch stratum
#   -S: write SAM output (required by the samtools step below)
bowtie -S -v 2 -m 1 --best --strata "${GENOME_INDEX_PREFIX}" "${INPUT_FASTQ}" "${OUTPUT_SAM}"

# Step 2: Convert SAM to BAM and sort the aligned reads.
# This is a standard post-alignment step for downstream analysis.
samtools view -bS "${OUTPUT_SAM}" | samtools sort -o "${OUTPUT_BAM}" -

# Step 3: Peak calling.
# The source text does not describe peak calling, and no peak-calling script is
# provided, so no specific command can be given for this step.
# Dedicated CLIP-seq peak callers (e.g., CLIPper, Piranha) or a custom script
# could be used instead.
# Conceptual command only: custom_peak_caller.sh is hypothetical, and the
# five-read threshold is an unverified assumption.
# custom_peak_caller.sh "${OUTPUT_BAM}" 5 > "${OUTPUT_PEAKS}"
2
Briefly, reads were trimmed to remove sequencing adaptors and homopolymeric runs >10 nt, and mapped to the human genome (hg18) using Bowtie (version 0.12.2 with parameters -q -l 20 -m 5 -k 5 --best).
$ Bash example
# Install Bowtie if needed (assumes this version is available from your conda channel)
# conda install -c bioconda bowtie=0.12.2

# Placeholders: replace input_reads.fastq with your trimmed reads file and
# output.sam with your desired output alignment file name.
# The hg18 index must be available. If not, build it first:
# bowtie-build hg18.fa hg18

# -q -l 20 -m 5 -k 5 --best are the parameters reported in the source text.
# -S is added here (it is not in the source text) so that the output is SAM;
# without it, Bowtie 1 writes its default alignment format.
bowtie -q -l 20 -m 5 -k 5 --best -S hg18 input_reads.fastq > output.sam
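The alignment command expects reads that have already been trimmed, but the source text does not say which tool or adaptor sequence was used. The following is a minimal sketch of the trimming step (adaptor removal, then removal of homopolymeric runs >10 nt), not the authors' script. The adaptor sequence, the minimum retained length, and the choice to truncate a read at the start of the first long homopolymer run are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder 3' adaptor; NOT taken from the source text. Replace with the
# adaptor used in your library preparation.
ADAPTER="${ADAPTER:-TCGTATGCCGTCTTCTGCTTG}"
# Minimum read length kept after trimming (assumed; not stated in the source).
MIN_LEN="${MIN_LEN:-18}"

# trim_fastq: reads 4-line FASTQ records on stdin, writes trimmed records to stdout.
trim_fastq() {
  awk -v adapter="$ADAPTER" -v min_len="$MIN_LEN" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      qual = $0
      # Cut at the first exact occurrence of the adaptor, if present.
      pos = index(seq, adapter)
      if (pos > 0) seq = substr(seq, 1, pos - 1)
      # Cut at the start of the first run of 11 or more identical bases (>10 nt).
      if (match(seq, /AAAAAAAAAAA|CCCCCCCCCCC|GGGGGGGGGGG|TTTTTTTTTTT/))
        seq = substr(seq, 1, RSTART - 1)
      # Keep the quality string in step with the trimmed sequence.
      qual = substr(qual, 1, length(seq))
      # Drop reads that became too short to align meaningfully.
      if (length(seq) >= min_len) {
        print header
        print seq
        print "+"
        print qual
      }
    }
  '
}

# Usage (file names are placeholders):
#   trim_fastq < raw_reads.fastq > input_reads.fastq
```

Exact adaptor matching is the simplest possible rule; a dedicated trimmer that tolerates mismatches and partial adaptors at the read end would usually be preferable for real data.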
Tools Used
Raw Source Text
CLIP-seq reads were processed as previously described (Polymenidou et al., 2011). Briefly, reads were trimmed to remove sequencing adaptors and homopolymeric runs >10nt, and mapped to the human genome (hg18) using Bowtie (version 0.12.2 with parameters -q -l 20 -m 5 -k 5 --best).