GSE122649 Processing Pipeline

RNA-Seq, 3 steps

Publication

Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis.

Acta neuropathologica (2022) — PMID 35778567

Dataset

Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia [motor cortex]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Base calling performed with bcl2fastq conversion software v2.17

bcl2fastq v2.17

$ Bash example

# Install bcl2fastq v2.17 (distributed by Illumina; obtain it from Illumina's software
# downloads or use the build provided by your sequencing facility; conda availability varies by channel)

# Base calling performed with bcl2fastq conversion software v2.17
# Replace /path/to/illumina/run/folder with the actual path to your BCL data
# Replace /path/to/output/fastq with your desired output directory for FASTQ files
bcl2fastq --runfolder-dir /path/to/illumina/run/folder --output-dir /path/to/output/fastq
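Before alignment it can help to confirm that each FASTQ file written by bcl2fastq is complete. The sketch below is not part of the described pipeline: count_fastq_reads is a hypothetical helper, and it assumes gzip-compressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper (not part of bcl2fastq): count reads in a gzip-compressed FASTQ.
# Assumes four lines per record; fails if the line count is not a multiple of 4,
# which usually indicates a truncated or malformed file.
count_fastq_reads() {
    gzip -dc -- "$1" | awk '
        END {
            if (NR % 4 != 0) {
                print "line count is not a multiple of 4" > "/dev/stderr"
                exit 1
            }
            print NR / 4
        }'
}
```

Usage, with a placeholder file name: count_fastq_reads /path/to/output/fastq/sample_R1.fastq.gz. The result can be compared with the "Number of input reads" line that STAR writes to its Log.final.out.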

Reads were aligned to hg19 genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04

STAR v2.5.2b

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.5.2b

# Placeholder for reference genome index
# Ensure the hg19 STAR index is built and available at this path.
# Example: STAR --runMode genomeGenerate --genomeDir /path/to/STAR_index/hg19 --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf --runThreadN <num_threads>
HG19_STAR_INDEX="/path/to/STAR_index/hg19"

# Placeholder for input reads (gzip-compressed FASTQ)
# For paired-end data, list both mates: INPUT_READS=("input_reads_R1.fastq.gz" "input_reads_R2.fastq.gz")
INPUT_READS=("input_reads.fastq.gz")

# Output prefix for alignment files
OUTPUT_PREFIX="aligned_star_"

# --readFilesCommand zcat lets STAR read gzip-compressed FASTQ; drop it for uncompressed input
# --runThreadN 8 is an example value; adjust to the cores available
STAR --genomeDir "${HG19_STAR_INDEX}" \
     --readFilesIn "${INPUT_READS[@]}" \
     --readFilesCommand zcat \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outFilterMultimapNmax 100 \
     --outFilterMismatchNoverReadLmax 0.04 \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --runThreadN 8

# The primary output BAM file will be named: ${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam
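With --outFilterMultimapNmax 100, a large share of reads may be reported as multi-mapped, so it is worth reading the alignment summary before quantification. A minimal sketch, assuming the usual "label | value" layout of STAR's Log.final.out; star_log_value is a hypothetical helper, not a STAR command.

```shell
# Hypothetical helper: print the value of one field from a STAR Log.final.out file.
# $1 = path to Log.final.out, $2 = exact field label (text left of the "|").
star_log_value() {
    awk -F'|' -v key="$2" '
        {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim padding around the label
        }
        label == key {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)     # trim the tab before the value
            print val
            exit
        }
    ' "$1"
}
```

Usage: star_log_value "${OUTPUT_PREFIX}Log.final.out" "Uniquely mapped reads %" and star_log_value "${OUTPUT_PREFIX}Log.final.out" "% of reads mapped to multiple loci".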

Genes and TE were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse

TEcount (from TEtranscripts) v2.0.3

$ Bash example

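# Install TEtranscripts (which includes TEcount) if not already installed
# Confirm which Python version TEtranscripts 2.0.3 supports before creating the
# environment (python=3.7 below is an assumption, not taken from the source)
# conda create -n tetranscripts python=3.7
# conda activate tetranscripts
# pip install TEtranscripts==2.0.3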

# Placeholder for input BAM file (e.g., aligned RNA-seq reads)
INPUT_BAM="input_aligned_reads.bam"
# Placeholder for output prefix for quantification results
OUTPUT_PREFIX="te_quantification_results"

# Reference GTF files based on description
# hg19 genic GTF from refGene (Jan 2018)
GENE_GTF="hg19_refgene_genic_jan2018.gtf"
# custom hg19 TE GTF generated from repeatMasker
TE_GTF="hg19_repeatmasker_TEs.gtf"

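# Execute TEcount with the configuration from the description (--stranded reverse)
# --mode is left at its default (multi, which distributes multi-mapped reads); the source does not state a mode
# --format BAM is the default and is shown only for clarity
# The STAR example writes a coordinate-sorted BAM; TEcount's --sortByPos flag is meant for such input (confirm with TEcount --help for your version)
TEcount --format BAM --stranded reverse -b "${INPUT_BAM}" --GTF "${GENE_GTF}" --TE "${TE_GTF}" --project "${OUTPUT_PREFIX}"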
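The source does not say how the custom hg19 TE GTF was built from RepeatMasker. One possible route, shown only as a sketch, converts a UCSC-style rmsk table into a GTF with the gene_id, transcript_id, family_id and class_id attributes that TEtranscripts-style TE annotations carry. The function name rmsk_to_te_gtf, the "hg19_rmsk" source label, the "_dupN" naming of repeated copies and the class filter are all assumptions for illustration, not the authors' method.

```shell
# Hypothetical converter: UCSC rmsk table (tab-separated, no header) -> TE GTF.
# Assumed rmsk columns: 2=swScore 6=genoName 7=genoStart (0-based) 8=genoEnd
#                       10=strand 11=repName 12=repClass 13=repFamily
rmsk_to_te_gtf() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                                    # skip an optional header line
        # Assumed filter: drop repeat classes that are not transposable elements
        $12 == "Simple_repeat" || $12 == "Low_complexity" || $12 == "Satellite" { next }
        $12 ~ /RNA$/ || index($12, "?") > 0 { next }
        {
            n = ++copies[$11]                            # running copy number per repeat name
            tid = (n == 1) ? $11 : ($11 "_dup" (n - 1))  # unique ID for each genomic copy
            attr = "gene_id \"" $11 "\"; transcript_id \"" tid "\"; family_id \"" $13 "\"; class_id \"" $12 "\";"
            # GTF coordinates are 1-based, so the 0-based start is shifted by one
            print $6, "hg19_rmsk", "exon", $7 + 1, $8, $2, $10, ".", attr
        }
    ' "$1"
}
```

Usage, with placeholder file names: rmsk_to_te_gtf rmsk.txt > hg19_repeatmasker_TEs.gtf. Compare the result with a TE GTF known to work with TEcount before relying on it.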

Tools Used

bcl2fastq

STAR

TEcount (TEtranscripts)

Raw Source Text

Base calling performed with bcl2fastq conversion software v2.17
Reads were aligned to hg19 genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Genes and TE were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse
Genome_build: hg19
Supplementary_files_format_and_content: Individual count tables were generated as output from TEcount
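Because the count tables are provided per sample, downstream analysis usually starts by combining them into one matrix. A minimal sketch, assuming each table is tab-separated with a one-line header, the feature ID in column 1 and the count in column 2; merge_count_tables is a hypothetical helper, and features missing from a sample are filled with 0.

```shell
# Hypothetical helper: merge per-sample count tables into a feature-by-sample matrix.
# Each input: tab-separated, one header line, column 1 = gene/TE ID, column 2 = count.
merge_count_tables() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                      # first line of each file is its header
            nfiles++
            names[nfiles] = $2          # sample label taken from the header
            next
        }
        {
            if (!($1 in seen)) {        # remember features in first-seen order
                seen[$1] = 1
                order[++nfeat] = $1
            }
            count[$1, nfiles] = $2
        }
        END {
            header = "feature"
            for (j = 1; j <= nfiles; j++) header = header OFS names[j]
            print header
            for (i = 1; i <= nfeat; i++) {
                id = order[i]
                line = id
                for (j = 1; j <= nfiles; j++) {
                    v = ((id, j) in count) ? count[id, j] : 0
                    line = line OFS v
                }
                print line
            }
        }
    ' "$@"
}
```

Usage, with placeholder file names: merge_count_tables sampleA.cntTable sampleB.cntTable > merged_counts.tsv.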
