GSE51685 Processing Pipeline

RNA-Seq, 6 steps

Publication

Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for ALS and frontotemporal degeneration.

Proceedings of the National Academy of Sciences of the United States of America (2013) — PMID 24170860

Dataset

Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for amyotrophic lateral sclerosis and frontotemporal dementia (strand specifi…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Illumina Casava1.8.2 software used for basecalling.

Illumina Casava v1.8.2

$ Bash example

# CASAVA 1.8.2 converts the per-cycle base call (BCL) files written by the sequencer
# into demultiplexed, gzipped FASTQ files. The exact invocation was not reported for
# this dataset; the commands below sketch a typical CASAVA 1.8 BCL-to-FASTQ run with
# placeholder paths.

configureBclToFastq.pl \
    --input-dir /path/to/illumina/run/folder/Data/Intensities/BaseCalls \
    --output-dir /path/to/output/Unaligned \
    --sample-sheet /path/to/SampleSheet.csv

# configureBclToFastq.pl only writes a Makefile; running make performs the conversion.
make -C /path/to/output/Unaligned -j 8
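
CASAVA 1.8 and later write FASTQ quality strings in Sanger (Phred+33) encoding, which is consistent with the --quality-protocol="sanger" setting in the GSNAP command reported for this dataset. The sketch below is one way to sanity-check that assumption on a FASTQ file before alignment. The function name guess_quality_offset and the file name in the usage comment are hypothetical, and the thresholds are a common heuristic rather than a definitive test.

```shell
# guess_quality_offset: read FASTQ from stdin and report the lowest and highest
# ASCII codes seen on quality lines, plus a rough guess at the encoding.
guess_quality_offset() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 0 {                       # every 4th line is a quality string
      for (j = 1; j <= length($0); j++) {
        ch = substr($0, j, 1)
        if (!(ch in ord)) continue      # ignore stray characters such as CR
        c = ord[ch]
        if (min == "" || c < min) min = c
        if (c > max) max = c
      }
    }
    END {
      if (min == "") { print "no quality lines found"; exit }
      # Heuristic: codes below 59 only occur with a +33 offset.
      enc = (min < 59) ? "phred33" : (min >= 64 ? "phred64" : "ambiguous")
      printf "min=%d max=%d encoding=%s\n", min, max, enc
    }
  '
}

# Usage (hypothetical file name):
#   gzip -dc sample_R1.fastq.gz | head -n 400000 | guess_quality_offset
```

Checking only the first 100,000 reads, as in the usage comment, is usually enough because low-quality bases appear early in most runs.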

Adapters trimmed using Cutadapt v1.2.1.

Cutadapt v1.2.1

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

# Define input and output file paths
READ1_INPUT="input_R1.fastq.gz"
READ2_INPUT="input_R2.fastq.gz"
READ1_TRIMMED="trimmed_R1.fastq.gz"
READ2_TRIMMED="trimmed_R2.fastq.gz"

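# Define adapter sequences (placeholders; replace with the adapters used in your library)
# Commonly used Illumina TruSeq 3' adapter sequences:
#   Read 1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
#   Read 2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
# (The Read 2 adapter is a distinct sequence, not the reverse complement of the Read 1 adapter.)
ADAPTER_R1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA" # Example TruSeq adapter found in Read 1
ADAPTER_R2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" # Example TruSeq adapter found in Read 2

# Run Cutadapt for paired-end reads
# Note: the -A option and single-pass paired-end trimming require Cutadapt 1.8 or later;
# v1.2.1, the version reported for this dataset, trims one file per invocation.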
cutadapt \
  -a "${ADAPTER_R1}" \
  -A "${ADAPTER_R2}" \
  --minimum-length 20 \
  -o "${READ1_TRIMMED}" \
  -p "${READ2_TRIMMED}" \
  "${READ1_INPUT}" \
  "${READ2_INPUT}"

# For single-end reads, use:
# cutadapt \
#   -a "${ADAPTER_R1}" \
#   --minimum-length 20 \
#   -o "${READ1_TRIMMED}" \
#   "${READ1_INPUT}"

Command: cutadapt -O 5 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG

Cutadapt (source reports v1.2.1)

$ Bash example

# Install cutadapt (if not already installed)
# conda install -c bioconda cutadapt

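# Example usage of cutadapt for adapter trimming
# -O 5: trim only when the read and the adapter overlap by at least 5 bases
# -a:   adapter sequence ligated to the 3' end of the reads
# Replace 'input.fastq.gz' with your actual input FASTQ file
# Replace 'output.fastq.gz' with your desired output FASTQ file name
cutadapt -O 5 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG -o output.fastq.gz input.fastq.gz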
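
To illustrate what the minimum-overlap setting (-O 5) in the reported command does, here is a simplified 3' adapter trimmer in awk. It is a sketch for intuition only and not Cutadapt's algorithm: it requires exact matches, whereas Cutadapt tolerates a configurable error rate. The function name trim_3prime_exact is hypothetical.

```shell
# trim_3prime_exact ADAPTER MIN_OVERLAP < in.fastq > out.fastq
# Removes the adapter (and anything after it) from the 3' end of each read when
# at least MIN_OVERLAP leading adapter bases match the read exactly.
trim_3prime_exact() {
  awk -v adapter="$1" -v min_ov="$2" '
    # Return the 1-based position where the adapter starts, or 0 if not found.
    function cut_pos(seq,    p, ov, n, alen) {
      n = length(seq); alen = length(adapter)
      for (p = 1; p <= n; p++) {
        ov = n - p + 1                 # bases left in the read from position p
        if (ov > alen) ov = alen       # cannot overlap more than the adapter length
        if (ov < min_ov) break         # all remaining suffixes are too short
        if (substr(seq, p, ov) == substr(adapter, 1, ov)) return p
      }
      return 0
    }
    NR % 4 == 2 { keep = cut_pos($0); if (keep) $0 = substr($0, 1, keep - 1); print; next }
    NR % 4 == 0 { if (keep) $0 = substr($0, 1, keep - 1); print; next }  # trim qualities to match
    { print }                          # header and "+" lines pass through unchanged
  '
}

# Usage (hypothetical file names):
#   trim_3prime_exact GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG 5 < input.fastq > output.fastq
```

With a minimum overlap of 5, a read ending in GATCGG (6 adapter bases) is trimmed, while a read ending in GATC (4 bases) is left alone. This is the trade-off the option controls: a higher value removes fewer bases by chance but leaves short adapter remnants in place.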

Trimmed reads mapped to mouse genome (mm9) using the GSNAP aligner (part of GMAP package, version 2012-07-20).

GSNAP v2012-07-20

$ Bash example

# Install GSNAP (part of GMAP package) if not already available
# conda install -c bioconda gmap

# Define variables for input, output, and reference genome
INPUT_READS="trimmed_reads.fastq.gz" # Placeholder for your trimmed reads file
OUTPUT_SAM="mapped_reads.sam"       # Desired output SAM file name
GENOME_DIR="/path/to/gmap_indexes"  # Directory where GSNAP genome indexes are stored
GENOME_NAME="mm9"                   # Name of the indexed mouse genome (mm9)
NUM_THREADS=8                       # Number of CPU threads to use for alignment

# Note: Ensure the mm9 genome has been indexed for GSNAP using gmap_build prior to this step.
# Example (if index doesn't exist):
# gmap_build -D "${GENOME_DIR}" -d "${GENOME_NAME}" /path/to/mm9.fasta

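# Execute GSNAP to map trimmed reads to the mm9 mouse genome
# --gunzip is needed because the input FASTQ is gzip-compressed. GSNAP writes
# alignments to standard output, so they are redirected into the SAM file; this
# avoids relying on an output-file option that may not exist in the 2012 release.
gsnap -D "${GENOME_DIR}" -d "${GENOME_NAME}" \
      --gunzip \
      -A sam \
      -N 1 \
      -t "${NUM_THREADS}" \
      "${INPUT_READS}" > "${OUTPUT_SAM}"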

Command: gsnap -Q -n 10 --quality-protocol="sanger" -t 4 -N 1 -A sam -B 5 -s mm9 -d mm9

GSNAP (version not specified)

$ Bash example

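# Install GSNAP (distributed in the GMAP package; example using conda)
# conda install -c bioconda gmap

# Define reference genome (mm9)
# This assumes the mm9 genome has been indexed for GSNAP.
# Example indexing command (replace paths as needed):
# gmap_build -D /path/to/genome_indices -d mm9 /path/to/mm9.fa

# Execute GSNAP command
# Flags as reported: -Q print nothing for reads with too many alignments,
# -n 10 report at most 10 alignments per read, -t 4 threads, -N 1 look for novel splicing,
# -A sam output format, -B 5 batch (memory) mode, -s mm9 known splice sites file,
# -d mm9 genome database.
# The reported command omits input and output; trimmed_reads.fastq and
# mapped_reads.sam are placeholders.
gsnap -Q -n 10 --quality-protocol="sanger" -t 4 -N 1 -A sam -B 5 -s mm9 -d mm9 \
      trimmed_reads.fastq > mapped_reads.sam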
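
After alignment it is useful to check how many reads mapped before moving on to counting. The sketch below summarises a SAM file using only awk, decoding the FLAG column arithmetically so it does not depend on gawk extensions. The function name sam_mapping_summary is hypothetical. It counts one record per read by skipping records flagged as secondary (0x100) or supplementary (0x800); whether the 2012 GSNAP release sets these flags for multi-mapped reads is an assumption worth verifying on your own output.

```shell
# sam_mapping_summary < mapped_reads.sam
# Prints the number of primary records and how many of them are mapped.
sam_mapping_summary() {
  awk -F '\t' '
    /^@/ { next }                        # skip SAM header lines
    int($2 / 256) % 2 == 1 { next }      # skip secondary alignments (flag 0x100)
    int($2 / 2048) % 2 == 1 { next }     # skip supplementary alignments (flag 0x800)
    {
      total++
      if (int($2 / 4) % 2 == 0) mapped++ # flag 0x4 unset means the read is mapped
    }
    END { printf "records=%d mapped=%d unmapped=%d\n", total, mapped, total - mapped }
  '
}

# Usage (placeholder file name):
#   sam_mapping_summary < mapped_reads.sam
```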

Mapped reads assigned to coding regions of Ensembl mouse genes

Ensembl gene annotation (counting tool not specified in source; featureCounts shown as one option)

$ Bash example

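# featureCounts is part of the Subread package (tool inferred, not stated in the source)
# conda install -c bioconda subread

# Define variables
# Ensembl mouse GTF matching the mm9 (NCBIM37) assembly used for alignment.
# Ensembl release 67 is the last release on NCBIM37; the release the authors used is not stated.
# A GRCm39 annotation would not match mm9 coordinates.
GTF_FILE="Mus_musculus.NCBIM37.67.gtf"
INPUT_BAM="aligned_reads.bam" # Replace with your aligned file (featureCounts accepts SAM or BAM)
OUTPUT_COUNTS="mouse_ensembl_cds_counts.txt"
NUM_THREADS=8 # Adjust based on available resources

# Example: Download Ensembl GTF if not already present
# wget -P . ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz
# gunzip Mus_musculus.NCBIM37.67.gtf.gz
# Note: Ensembl names chromosomes "1", "2", ...; a UCSC-style mm9 index uses "chr1", "chr2", ...
# The names in the GTF and in the alignments must agree for reads to be assigned.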

# Assign mapped reads to coding regions (CDS) of Ensembl mouse genes
featureCounts -a "${GTF_FILE}" \
              -o "${OUTPUT_COUNTS}" \
              -F GTF \
              -t CDS \
              -g gene_id \
              -T "${NUM_THREADS}" \
              "${INPUT_BAM}"

Raw Source Text

Illumina Casava1.8.2 software used for basecalling.
Adapters trimmed using Cutadapt v1.2.1. Command: cutadapt -O 5 -a GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
Trimmed reads mapped to mouse genome (mm9) using the GSNAP aligner (part of GMAP package, version 2012-07-20). Command: gsnap -Q -n 10 --quality-protocol="sanger" -t 4 -N 1 -A sam -B 5 -s mm9 -d mm9
Mapped reads assigned to coding regions of Ensembl mouse genes
Supplementary_files_format_and_content: tab-delimited text file containing RPKM values for each gene
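
The supplementary files are described as holding RPKM values per gene, but the source does not say how they were computed. A minimal sketch, assuming a featureCounts-style table (columns Geneid, Chr, Start, End, Strand, Length, count) and the usual definition RPKM = 10^9 * C / (N * L), where C is the read count for a gene, L is its counted length in bases, and N is the total number of reads. Here N is taken as the sum of assigned counts, which is an assumption; the authors may have normalised by total mapped reads instead. The function name rpkm_from_featurecounts is hypothetical.

```shell
# rpkm_from_featurecounts COUNTS_FILE
# Reads a featureCounts output table with a single sample column and prints
# a tab-delimited table of gene_id and RPKM.
rpkm_from_featurecounts() {
  awk -F '\t' '
    /^#/ || $1 == "Geneid" { next }      # skip the comment line and the column header
    { id[++n] = $1; len[n] = $6; cnt[n] = $7; total += $7 }
    END {
      print "gene_id\tRPKM"
      for (i = 1; i <= n; i++) {
        # Guard against empty tables and zero-length features.
        rpkm = (total > 0 && len[i] > 0) ? 1e9 * cnt[i] / (len[i] * total) : 0
        printf "%s\t%.3f\n", id[i], rpkm
      }
    }
  ' "$1"
}

# Usage (file name from the featureCounts example in this document):
#   rpkm_from_featurecounts mouse_ensembl_cds_counts.txt > rpkm_per_gene.txt
```

Because the counting step uses -t CDS, the Length column reflects coding sequence length, so the resulting values are per kilobase of coding sequence rather than of full transcript.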
