GSE124439 Processing Pipeline

RNA-Seq, 2 steps

Publication

Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis.

Acta neuropathologica (2022) — PMID 35778567

Dataset

Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Paired-end Illumina RNA-Seq reads (FASTQ) were aligned to the hg19 reference genome using STAR v2.5.2b with the following options: --outFilterMultimapNmax 100 (keep reads that align to at most 100 loci) and --outFilterMismatchNoverReadLmax 0.04 (allow mismatches in at most 4% of the read length).

STAR v2.5.2b

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.5.2b

# Define input and output files
READ1="sample_R1.fastq.gz" # Placeholder for forward reads
READ2="sample_R2.fastq.gz" # Placeholder for reverse reads
GENOME_DIR="/path/to/hg19_star_index" # Placeholder for indexed hg19 genome
OUTPUT_DIR="star_alignment_output"
NUM_THREADS=8 # Example number of threads

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run STAR alignment for paired-end RNA-Seq reads
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1}" "${READ2}" \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_DIR}/" \
     --outFilterMultimapNmax 100 \
     --outFilterMismatchNoverReadLmax 0.04 \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes Standard \
     --readFilesCommand zcat # Use zcat for gzipped fastq files
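
The command above handles a single sample. For a batch run, the per-sample file names can be derived from the forward-read file. The sketch below only prints the STAR command rather than running it, so it can be inspected first. It assumes the hypothetical naming convention <sample>_R1.fastq.gz / <sample>_R2.fastq.gz used by the placeholders above; the helper names (sample_name, mate2_for, star_cmd) are illustrative and not part of STAR.

```shell
#!/usr/bin/env bash
# Sketch: derive per-sample names for a batch STAR run (dry run, prints commands only).
# ASSUMPTION: mates are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.

# Print the sample name for a forward-read file.
sample_name() {
    local base
    base="$(basename "$1")"
    printf '%s\n' "${base%_R1.fastq.gz}"
}

# Print the path of the reverse-read file that pairs with a forward-read file.
mate2_for() {
    printf '%s\n' "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# Print (do not execute) the STAR command for one sample.
# Arguments: forward-read file, STAR index directory, output root directory.
star_cmd() {
    local read1="$1" genome_dir="$2" out_root="$3"
    local sample
    sample="$(sample_name "${read1}")"
    printf 'STAR --genomeDir %s --readFilesIn %s %s --outFileNamePrefix %s/%s_ --outFilterMultimapNmax 100 --outFilterMismatchNoverReadLmax 0.04 --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat\n' \
        "${genome_dir}" "${read1}" "$(mate2_for "${read1}")" "${out_root}" "${sample}"
}
```

A loop such as `for r1 in fastq/*_R1.fastq.gz; do star_cmd "$r1" /path/to/hg19_star_index star_alignment_output; done` would print one command per sample. With a prefix ending in `<sample>_`, STAR names the sorted BAM `<sample>_Aligned.sortedByCoord.out.bam`.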


Genes and transposable elements (TEs) were quantified using TEcount (from TEtranscripts v2.0.3) with the following configuration: the hg19 genic GTF from refGene (Jan 2018), a custom hg19 TE GTF generated from RepeatMasker, and --stranded reverse.

TEcount v2.0.3

$ Bash example

# Install TEtranscripts (which includes TEcount)
# conda install -c bioconda tetranscripts

# Placeholder for hg19 genic GTF from refGene (Jan 2018)
# This file would typically be generated from UCSC Table Browser or a similar resource.
GENE_GTF="path/to/hg19_refGene_genes.gtf"

# Placeholder for custom hg19 TE GTF generated from RepeatMasker
# This file would be custom-generated by the user based on RepeatMasker output.
TE_GTF="path/to/hg19_repeatmasker_TEs.gtf"

# Input BAM file (replace with the actual path, e.g. the STAR output Aligned.sortedByCoord.out.bam)
INPUT_BAM="sample1.bam"

# Prefix for output files; TEcount writes the counts to <prefix>.cntTable
PROJECT="sample1"

# Execute TEcount for quantification of genes and transposable elements
# Add --sortByPos if the BAM is coordinate-sorted (as produced by --outSAMtype BAM SortedByCoordinate)
TEcount -b "${INPUT_BAM}" --format BAM --mode multi --stranded reverse \
        --GTF "${GENE_GTF}" --TE "${TE_GTF}" --project "${PROJECT}"
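
TEcount reports genes and TEs together in one count table, so a downstream analysis may want them separated. The sketch below is one possible way to do that, not part of the described pipeline. It assumes the table is tab-separated with a single header line, and that TE identifiers contain colons (name:family:class) while gene identifiers do not; both should be checked against the actual output. The function name split_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split a combined gene/TE count table into two files.
# ASSUMPTIONS: tab-separated input with one header line; TE rows have an ID
# containing ':' (e.g. name:family:class); gene IDs contain no ':'.

# Arguments: combined count table, output prefix.
# Writes <prefix>.genes.tsv and <prefix>.TEs.tsv, each with the header line.
split_counts() {
    local table="$1" prefix="$2"
    awk -F '\t' -v genes="${prefix}.genes.tsv" -v tes="${prefix}.TEs.tsv" '
        NR == 1  { print > genes; print > tes; next }  # copy the header to both files
        $1 ~ /:/ { print > tes; next }                 # colon in the ID: treat as TE
                 { print > genes }                     # everything else: treat as gene
    ' "${table}"
}
```

For example, `split_counts sample1.cntTable sample1` would write sample1.genes.tsv and sample1.TEs.tsv next to the input.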

Tools Used

STAR

TEcount (TEtranscripts)

Raw Source Text

fastq Illumina RNASeq paired-end reads were aligned to the hg19 reference genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Genes and transposable elements (TE) were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse
Genome_build: hg19
Supplementary_files_format_and_content: Individual count tables (SampleName_counts.txt) were generated as output from TEcount
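
Because the counts are supplied as one table per sample, most analyses first need them combined into a single matrix. The sketch below shows one way this could be done with awk. It assumes each SampleName_counts.txt file is tab-separated, has one header line, and holds the feature ID in column 1 and the count in column 2. That layout is an assumption to verify against the downloaded files, and merge_counts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: merge per-sample count tables into one feature-by-sample matrix.
# ASSUMPTIONS: files are named <SampleName>_counts.txt, tab-separated, with one
# header line, feature ID in column 1 and count in column 2.

# Usage: merge_counts A_counts.txt B_counts.txt ... > matrix.tsv
# Features missing from a sample are filled with 0.
merge_counts() {
    awk -F '\t' -v OFS='\t' '
        FNR == 1 {
            # First line of each file: register the sample, skip the header.
            nfiles++
            name = FILENAME
            sub(/.*\//, "", name)            # drop the directory part
            sub(/_counts\.txt$/, "", name)   # drop the file suffix
            samples[nfiles] = name
            next
        }
        {
            # Remember the order in which features are first seen.
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            count[$1 SUBSEP nfiles] = $2
        }
        END {
            line = "feature"
            for (j = 1; j <= nfiles; j++) line = line OFS samples[j]
            print line
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (j = 1; j <= nfiles; j++) {
                    key = order[i] SUBSEP j
                    val = (key in count) ? count[key] : 0
                    line = line OFS val
                }
                print line
            }
        }
    ' "$@"
}
```

The sample columns follow the order of the file arguments, and rows follow the order in which features first appear.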
