GSE51685 Processing Pipeline
RNA-Seq
6 steps
Publication
Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for ALS and frontotemporal degeneration. Proceedings of the National Academy of Sciences of the United States of America (2013). PMID 24170860
Dataset
GSE51685: Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for amyotrophic lateral sclerosis and frontotemporal dementia (strand specifi…
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Illumina Casava1.8.2 software used for basecalling.
$ Bash example
# Illumina CASAVA 1.8.2 converts the per-cycle base call (BCL) files written by the
# instrument's real-time analysis software into demultiplexed FASTQ files.
# The source gives no command for this step. The sketch below assumes the standard
# CASAVA 1.8 BCL-to-FASTQ workflow; all paths are placeholders.
configureBclToFastq.pl \
  --input-dir /path/to/illumina/run/folder/Data/Intensities/BaseCalls \
  --output-dir /path/to/output/fastq \
  --sample-sheet /path/to/SampleSheet.csv

# configureBclToFastq.pl only writes a Makefile; the conversion itself is run with make.
cd /path/to/output/fastq && make -j 8
2
Adapters trimmed using Cutadapt v1.2.1.
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Define input and output file paths (placeholders)
READ1_INPUT="input_R1.fastq.gz"
READ2_INPUT="input_R2.fastq.gz"
READ1_TRIMMED="trimmed_R1.fastq.gz"
READ2_TRIMMED="trimmed_R2.fastq.gz"

# Define adapter sequences (replace with the actual sequences if known).
# The values below are the common Illumina TruSeq 3' adapters, shown only as examples;
# the adapter reported for this dataset appears in step 3.
# Note: the Read 2 adapter is a distinct sequence, not the reverse complement of the Read 1 adapter.
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # Example TruSeq adapter for Read 1
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # Example TruSeq adapter for Read 2

# Run Cutadapt for paired-end reads.
# -A and -p require Cutadapt 1.8 or later; v1.2.1, the version reported for this dataset,
# has no paired-end mode. --minimum-length 20 is an example value, not taken from the source.
cutadapt \
  -a "${ADAPTER_R1}" \
  -A "${ADAPTER_R2}" \
  --minimum-length 20 \
  -o "${READ1_TRIMMED}" \
  -p "${READ2_TRIMMED}" \
  "${READ1_INPUT}" \
  "${READ2_INPUT}"

# For single-end reads (the form consistent with the command in step 3), use:
# cutadapt \
#   -a "${ADAPTER_R1}" \
#   --minimum-length 20 \
#   -o "${READ1_TRIMMED}" \
#   "${READ1_INPUT}"
3
Command: cutadapt -O 5 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
$ Bash example
# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Example usage of cutadapt for adapter trimming
# Replace 'input.fastq.gz' with your actual input FASTQ file
# Replace 'output.fastq.gz' with your desired output FASTQ file name
cutadapt -O 5 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG -o output.fastq.gz input.fastq.gz
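One quick way to check that trimming with `-O 5 -a ...` changed the reads is to tabulate read lengths before and after. The following is a minimal sketch using only awk and sort; the function name `fastq_length_hist` and the demo reads are hypothetical and are not part of the original pipeline. It assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "length<TAB>count" for every read length in a FASTQ stream.
# Assumes standard 4-line FASTQ records (sequence lines are not wrapped).
fastq_length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) printf "%d\t%d\n", len, n[len] }' | sort -n
}

# Made-up demo input: two 8-base reads and one 4-base read.
demo_fastq() {
  printf '@r1\nACGTACGT\n+\nIIIIIIII\n'
  printf '@r2\nACGT\n+\nIIII\n'
  printf '@r3\nTTTTCCCC\n+\nIIIIIIII\n'
}

hist=$(demo_fastq | fastq_length_hist)
printf '%s\n' "$hist"
```

On a real file the same function can be fed from a decompressed stream, for example `gzip -dc output.fastq.gz | fastq_length_hist`. A shift toward shorter lengths after trimming suggests that adapters were found and removed.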
4
Trimmed reads mapped to mouse genome (mm9) using the GSNAP aligner (part of GMAP package, version 2012-07-20).
$ Bash example
# Install GSNAP (part of the GMAP package) if not already available
# conda install -c bioconda gmap

# Define variables for input, output, and reference genome
INPUT_READS="trimmed_reads.fastq.gz"  # Placeholder for your trimmed reads file
OUTPUT_SAM="mapped_reads.sam"         # Desired output SAM file name
GENOME_DIR="/path/to/gmap_indexes"    # Directory where GSNAP genome indexes are stored
GENOME_NAME="mm9"                     # Name of the indexed mouse genome (mm9)
NUM_THREADS=8                         # Number of CPU threads to use for alignment

# Note: Ensure the mm9 genome has been indexed for GSNAP using gmap_build prior to this step.
# Example (if the index doesn't exist):
# gmap_build -D "${GENOME_DIR}" -d "${GENOME_NAME}" /path/to/mm9.fasta

# Execute GSNAP to map trimmed reads to the mm9 mouse genome.
# --gunzip is needed because the input is gzip-compressed.
# GSNAP writes alignments to standard output, so the SAM file is created by redirection.
gsnap -D "${GENOME_DIR}" -d "${GENOME_NAME}" \
  --gunzip \
  -A sam \
  -N 1 \
  -t "${NUM_THREADS}" \
  "${INPUT_READS}" > "${OUTPUT_SAM}"
5
Command: gsnap -Q -n 10 --quality-protocol="sanger" -t 4 -N 1 -A sam -B 5 -s mm9 -d mm9
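GSNAP (GMAP package, version 2012-07-20)
$ Bash example
# Install GSNAP (example using conda; the bioconda package is named gmap)
# conda install -c bioconda gmap

# This assumes the mm9 genome has been indexed for GSNAP.
# Example indexing command (replace paths as needed):
# gmap_build -D /path/to/genome_indices -d mm9 /path/to/mm9.fa

# Execute the GSNAP command as reported. -s mm9 names a known splice sites file,
# which must already exist alongside the mm9 index.
# The reported command omits the input and output files;
# trimmed_reads.fastq and mapped_reads.sam are placeholders.
gsnap -Q -n 10 --quality-protocol="sanger" -t 4 -N 1 -A sam -B 5 -s mm9 -d mm9 \
  trimmed_reads.fastq > mapped_reads.sam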
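After mapping, a simple sanity check is to count how many SAM records are mapped versus unmapped. The sketch below reads the FLAG column directly with awk, so it needs no extra tools. The function name `sam_mapping_summary` and the demo records are hypothetical, and this check is not described in the source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count mapped and unmapped records in a SAM stream.
# Header lines start with "@"; bit 0x4 of the FLAG column (field 2) marks an unmapped read.
# Counts are per SAM record, so a read reported at several locations
# (-n 10 allows up to 10 per read) is counted once per record.
sam_mapping_summary() {
  awk -F '\t' '
    /^@/ { next }                                   # skip header lines
    { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
    END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }'
}

# Made-up demo SAM: two mapped records (FLAG 0 and 16) and one unmapped record (FLAG 4).
demo_sam() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'r1\t0\tchr1\t100\t40\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t16\tchr1\t200\t40\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

summary=$(demo_sam | sam_mapping_summary)
printf '%s\n' "$summary"
```

For a real run, `sam_mapping_summary < mapped_reads.sam` gives the same two-line summary, where `mapped_reads.sam` is the placeholder output name used in the examples above.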
6
Mapped reads assigned to coding regions of Ensembl mouse genes
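Ensembl release not specified
$ Bash example
# The source does not name the counting tool. featureCounts (Subread package) is used
# here as an assumed stand-in.
# conda install -c bioconda subread

# Define variables
# Reads were mapped to mm9 (NCBIM37), so the annotation must use the same assembly.
# Ensembl release 67 is the last release on NCBIM37; the release actually used is not stated.
GTF_FILE="Mus_musculus.NCBIM37.67.gtf"
INPUT_BAM="aligned_reads.bam"  # Replace with your actual aligned BAM (or SAM) file
OUTPUT_COUNTS="mouse_ensembl_cds_counts.txt"
NUM_THREADS=8                  # Adjust based on available resources

# Example: download the Ensembl GTF if not already present
# wget -P . ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz
# gunzip Mus_musculus.NCBIM37.67.gtf.gz

# Note: Ensembl names chromosomes "1", "2", ...; a UCSC-style mm9 index uses "chr1", "chr2", ...
# The names in the GTF and in the alignments must match, or no reads will be assigned.

# Assign mapped reads to coding regions (CDS) of Ensembl mouse genes
featureCounts -a "${GTF_FILE}" \
  -o "${OUTPUT_COUNTS}" \
  -F GTF \
  -t CDS \
  -g gene_id \
  -T "${NUM_THREADS}" \
  "${INPUT_BAM}"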
Raw Source Text
Illumina Casava1.8.2 software used for basecalling.
Adapters trimmed using Cutadapt v1.2.1.
Command: cutadapt -O 5 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
Trimmed reads mapped to mouse genome (mm9) using the GSNAP aligner (part of GMAP package, version 2012-07-20).
Command: gsnap -Q -n 10 --quality-protocol="sanger" -t 4 -N 1 -A sam -B 5 -s mm9 -d mm9
Mapped reads assigned to coding regions of Ensembl mouse genes
Supplementary_files_format_and_content: tab-delimited text file containing RPKM values for each gene
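The supplementary files are described as containing RPKM values for each gene, but the source does not say how they were computed. As a hedged illustration, RPKM can be derived from a per-gene count table as count × 10^9 / (gene length in bp × total assigned reads). The sketch below assumes a featureCounts-style table (gene ID in column 1, length in column 6, count in column 7) and uses the sum of assigned counts as the denominator. The function name `counts_to_rpkm` and the demo values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a featureCounts-style table to RPKM.
# Assumed columns: 1 = gene ID, 6 = length in bp, 7 = read count.
# Assumption: the library size is the sum of all assigned counts in the table.
counts_to_rpkm() {
  awk -F '\t' '
    /^#/ { next }                 # skip the featureCounts comment line
    $1 == "Geneid" { next }       # skip the column header
    { order[++n] = $1; len[$1] = $6; cnt[$1] = $7; total += $7 }
    END {
      printf "gene_id\tRPKM\n"
      for (i = 1; i <= n; i++) {
        id = order[i]
        rpkm = (total > 0 && len[id] > 0) ? cnt[id] * 1e9 / (len[id] * total) : 0
        printf "%s\t%.2f\n", id, rpkm
      }
    }'
}

# Made-up demo table: two genes, 400 assigned reads in total.
demo_counts() {
  printf '# Program:featureCounts (made-up demo)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\taligned_reads.bam\n'
  printf 'geneA\t1\t100\t1099\t+\t1000\t100\n'
  printf 'geneB\t1\t5000\t6999\t+\t2000\t300\n'
}

rpkm=$(demo_counts | counts_to_rpkm)
printf '%s\n' "$rpkm"
```

With the demo values, geneA gets 100 × 10^9 / (1000 × 400) = 250000.00 and geneB gets 300 × 10^9 / (2000 × 400) = 375000.00. If the original analysis normalized by total mapped reads instead of assigned reads, the denominator would differ and so would the values.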