GSE51684 Processing Pipeline

RNA-Seq, 4 steps

Publication

Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for ALS and frontotemporal degeneration.

Proceedings of the National Academy of Sciences of the United States of America (2013) — PMID 24170860

Dataset

Targeted degradation of sense and antisense C9orf72 RNA foci as therapy for amyotrophic lateral sclerosis and frontotemporal dementia (Multiplex Anal…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Illumina Casava 1.8.2 software was used for basecalling.

Illumina Casava v1.8.2

$ Bash example

# Basecalling itself runs on the sequencer (Illumina Real Time Analysis), which
# writes per-cycle BCL files. Casava 1.8.2 then converts those BCL files to FASTQ
# with its configureBclToFastq.pl script, followed by `make` in the output directory.
# The source gives no command line for this step, so none is reproduced here.

Demultiplexing based on index sequences

Illumina Casava v1.8.2 (inferred; the source names no demultiplexing tool)

$ Bash example

# The source names no demultiplexing tool. Casava 1.8.2, used upstream, splits
# reads by index sequence while converting BCL files to FASTQ, so that route is
# assumed here. All paths are placeholders.

# BASECALLS_DIR: BaseCalls directory inside the run folder (holds the BCL files).
# SAMPLE_SHEET: CSV sample sheet with one row per lane/sample. Casava 1.8 columns:
#               FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
# OUTPUT_DIR: Directory that configureBclToFastq.pl creates for the FASTQ output.
# NUM_JOBS: Number of parallel make jobs.
BASECALLS_DIR="/path/to/run_folder/Data/Intensities/BaseCalls"
SAMPLE_SHEET="SampleSheet.csv"
OUTPUT_DIR="Unaligned"
NUM_JOBS=8

# Configure the conversion. The Index column of the sample sheet assigns reads to samples.
configureBclToFastq.pl \
  --input-dir "${BASECALLS_DIR}" \
  --output-dir "${OUTPUT_DIR}" \
  --sample-sheet "${SAMPLE_SHEET}"

# Run the generated Makefile. Per-sample FASTQ files are written under
# ${OUTPUT_DIR}/Project_<SampleProject>/Sample_<SampleID>/.
make -C "${OUTPUT_DIR}" -j "${NUM_JOBS}"
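
When only an already converted FASTQ file is available, index-based demultiplexing can also be sketched in plain awk. This is an illustration, not the procedure used for this dataset. It assumes uncompressed single-end FASTQ whose header lines follow the Casava 1.8 convention, where the index sequence is the last colon-separated field (for example `1:N:0:ATCACG`), and it requires an exact index match. The function name `demux_by_index`, the file names, the read names, and the index-to-sample table are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# demux_by_index FASTQ BARCODES OUTDIR
#   FASTQ    : uncompressed FASTQ with Casava 1.8 style headers (assumption)
#   BARCODES : tab-separated "index<TAB>sample_name" lines (hypothetical file)
#   OUTDIR   : directory that receives one FASTQ file per sample
demux_by_index() {
  local fastq="$1" barcodes="$2" outdir="$3"
  mkdir -p "${outdir}"
  awk -v outdir="${outdir}" '
    # First file: load the index -> sample table.
    NR == FNR { split($0, f, "\t"); sample[f[1]] = f[2]; next }
    # Second file: the first line of each 4-line record is the header.
    FNR % 4 == 1 {
      n = split($0, h, ":")
      idx = h[n]                       # last colon-separated field = index
      name = (idx in sample) ? sample[idx] : "undetermined"
      out = outdir "/" name ".fastq"
    }
    # Write all four lines of the record to the file chosen by its header.
    { print > out }
  ' "${barcodes}" "${fastq}"
}

# Tiny demonstration with made-up reads and indices.
workdir="$(mktemp -d)"
printf 'ATCACG\tsample_A\nCGATGT\tsample_B\n' > "${workdir}/barcodes.tsv"
printf '%s\n' \
  '@M1:1:FC:1:1101:1:1 1:N:0:ATCACG' 'ACGT' '+' 'IIII' \
  '@M1:1:FC:1:1101:1:2 1:N:0:CGATGT' 'TTGA' '+' 'IIII' \
  '@M1:1:FC:1:1101:1:3 1:N:0:GGGGGG' 'CCAA' '+' 'IIII' \
  > "${workdir}/reads.fastq"
demux_by_index "${workdir}/reads.fastq" "${workdir}/barcodes.tsv" "${workdir}/demux"
ls "${workdir}/demux"
```

Reads whose index is absent from the table go to `undetermined.fastq`. Tolerating index mismatches would need an extra comparison step that this sketch leaves out.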


Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata

Bowtie v0.12.7

$ Bash example

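# Install Bowtie 1 (example using conda, if a 0.12.7 build is available)
# conda install -c bioconda bowtie=0.12.7

# 'refseq_rna_index' is a placeholder basename for a Bowtie index built from
# RefSeq RNA sequences; 'reads.fastq' is a placeholder for one sample's reads.
# -q: FASTQ input; -e 100: cap on summed qualities at mismatched positions;
# -m 10: drop reads with more than 10 reportable alignments;
# --best --strata: keep only alignments from the best mismatch stratum.
# -S is not among the reported parameters; it is added here because Bowtie 1
# writes its own tabular format by default, and the redirect below expects SAM.
bowtie -q -e 100 -m 10 --best --strata -S refseq_rna_index reads.fastq > output.sam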


Count reads for each gene

featureCounts (Inferred with models/gemini-2.5-flash) v2.0.1

$ Bash example

# Install Subread (which includes featureCounts) if not already installed
# conda install -c bioconda subread

# The source does not name a counting tool; featureCounts is one option.
# Reads were aligned to RefSeq RNA sequences, so reference names in the
# alignment file are transcript accessions, not chromosomes, and a genome GTF
# would match none of them. This sketch therefore assumes a SAF annotation
# (tab-separated, header "GeneID Chr Start End Strand") with one row per
# transcript: GeneID = gene, Chr = transcript accession, Start = 1,
# End = transcript length, Strand = +.
INPUT_ALIGNMENTS="output.sam"            # Placeholder: SAM or BAM file from the mapping step
TRANSCRIPT_SAF="refseq_transcripts.saf"  # Placeholder: hypothetical SAF file as described above
OUTPUT_COUNTS_FILE="gene_counts.txt"
NUM_THREADS=8 # Adjust as needed

# Count reads for each gene using featureCounts
# -a: Annotation file
# -F SAF: Specify that the annotation file is in SAF format
# -o: Output file for read counts
# -s 0: Unstranded counting; the source does not state library strandedness, so adjust if known
# -T: Number of threads
featureCounts -a "${TRANSCRIPT_SAF}" \
              -F SAF \
              -o "${OUTPUT_COUNTS_FILE}" \
              -s 0 \
              -T "${NUM_THREADS}" \
              "${INPUT_ALIGNMENTS}"
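
Because the reads were mapped to RefSeq RNA sequences, the reference name of each alignment is a transcript accession, so per-gene counts can also be obtained by summing hits over all transcripts of a gene. The sketch below illustrates that idea with awk; it is not the authors' documented method. It assumes a SAM file with at most one reported alignment per read and a two-column transcript-to-gene table. The function name `count_genes`, the file names, and the accessions and gene names in the demonstration are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_genes SAM TX2GENE
#   SAM     : alignments against transcript sequences
#   TX2GENE : tab-separated "transcript_accession<TAB>gene" lines (hypothetical file)
# Prints "gene<TAB>count", sorted by gene name.
count_genes() {
  local sam="$1" tx2gene="$2"
  awk -F '\t' '
    NR == FNR { gene[$1] = $2; next }     # first file: transcript -> gene table
    /^@/ { next }                         # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }         # FLAG bit 0x4 set: read is unmapped
    int($2 / 256) % 2 == 1 { next }       # FLAG bit 0x100 set: secondary alignment
    ($3 in gene) { count[gene[$3]]++ }    # column 3 is the reference (transcript) name
    END { for (g in count) print g "\t" count[g] }
  ' "${tx2gene}" "${sam}" | sort
}

# Tiny demonstration with made-up transcripts and reads.
workdir="$(mktemp -d)"
printf 'NM_000001\tGENE1\nNM_000002\tGENE1\nNM_000003\tGENE2\n' > "${workdir}/tx2gene.tsv"
{
  printf '@SQ\tSN:NM_000001\tLN:1000\n'
  printf 'r1\t0\tNM_000001\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t16\tNM_000002\t20\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r3\t0\tNM_000003\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
} > "${workdir}/aligned.sam"
count_genes "${workdir}/aligned.sam" "${workdir}/tx2gene.tsv" > "${workdir}/gene_counts.txt"
cat "${workdir}/gene_counts.txt"
```

In the demonstration, two reads hit different transcripts of the same gene and are summed, while the unmapped read is ignored.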


Raw Source Text

Illumina Casava1.8.2 software used for basecalling.
Demultiplexing based on index sequences
Sequenced reads were mapped to Refseq RNA sequences using bowtie v0.12.7 with parameters -q -e 100 -m 10 --best --strata
Count reads for each genes
Supplementary_files_format_and_content: tab-delimited text files include counts for each genes in each samples.
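
The supplementary files are described as tab-delimited counts for each gene in each sample. As a rough sketch of how such files could be combined into one gene-by-sample matrix, the awk function below merges several two-column files. It assumes each input file has no header and holds "gene<TAB>count" lines, and it takes the sample name from the file name; the actual layout of the deposited files is not given in the source. The function name `merge_counts` and all file names and values are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_counts FILE...
#   Each FILE: tab-separated "gene<TAB>count" lines without a header (assumption).
#   Prints a matrix with one row per gene and one column per input file.
merge_counts() {
  awk -F '\t' -v OFS='\t' '
    FNR == 1 {                          # a new input file starts
      nfiles++
      name = FILENAME
      sub(/.*\//, "", name)             # strip the directory part
      sub(/\.[^.]*$/, "", name)         # strip the file extension
      sample[nfiles] = name
    }
    !($1 in seen) { seen[$1] = 1; order[++ngenes] = $1 }   # keep first-seen gene order
    { count[$1, nfiles] = $2 }
    END {
      header = "gene"
      for (i = 1; i <= nfiles; i++) header = header OFS sample[i]
      print header
      for (j = 1; j <= ngenes; j++) {
        g = order[j]
        row = g
        for (i = 1; i <= nfiles; i++)
          row = row OFS (((g, i) in count) ? count[g, i] : 0)   # missing gene -> 0
        print row
      }
    }
  ' "$@"
}

# Tiny demonstration with made-up counts.
workdir="$(mktemp -d)"
printf 'GENE1\t5\nGENE2\t0\n' > "${workdir}/sample_A.txt"
printf 'GENE1\t7\nGENE3\t2\n' > "${workdir}/sample_B.txt"
merge_counts "${workdir}/sample_A.txt" "${workdir}/sample_B.txt" > "${workdir}/count_matrix.tsv"
cat "${workdir}/count_matrix.tsv"
```

Genes missing from a sample file are filled with 0, which is a design choice of this sketch; leaving the cell empty or using NA may suit downstream tools better.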
