GSE124439 Processing Pipeline

RNA-Seq, 2 steps with code examples

Publication

Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis.

Acta neuropathologica (2022) — PMID 35778567

Dataset

GSE124439

Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
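  1. Paired-end Illumina RNA-Seq reads (FASTQ) were aligned to the hg19 reference genome using STAR v2.5.2b with the following settings: --outFilterMultimapNmax 100 (a read may align to up to 100 loci, which retains reads from repetitive TE copies) and --outFilterMismatchNoverReadLmax 0.04 (mismatches may not exceed 4% of the read length).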

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.5.2b
    
    # Define input and output files
    READ1="sample_R1.fastq.gz" # Placeholder for forward reads
    READ2="sample_R2.fastq.gz" # Placeholder for reverse reads
    GENOME_DIR="/path/to/hg19_star_index" # Placeholder for indexed hg19 genome
    OUTPUT_DIR="star_alignment_output"
    NUM_THREADS=8 # Example number of threads
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run STAR alignment for paired-end RNA-Seq reads
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READ1}" "${READ2}" \
         --runThreadN "${NUM_THREADS}" \
         --outFileNamePrefix "${OUTPUT_DIR}/" \
         --outFilterMultimapNmax 100 \
         --outFilterMismatchNoverReadLmax 0.04 \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMattributes Standard \
         --readFilesCommand zcat # Use zcat for gzipped fastq files
    
  2. Genes and transposable elements (TEs) were quantified using TEcount (from TEtranscripts v2.0.3) with the following inputs and settings: the hg19 genic GTF from refGene (Jan 2018), a custom hg19 TE GTF generated from RepeatMasker, and --stranded reverse.

    TEcount v2.0.3
    $ Bash example
    # Install TEtranscripts (which includes TEcount)
    # conda install -c bioconda tetranscripts
    
    # Placeholder for hg19 genic GTF from refGene (Jan 2018)
    # This file would typically be generated from UCSC Table Browser or a similar resource.
    GENE_GTF="path/to/hg19_refGene_genes.gtf"
    
    # Placeholder for custom hg19 TE GTF generated from RepeatMasker
    # This file would be custom-generated by the user based on RepeatMasker output.
    TE_GTF="path/to/hg19_repeatmasker_TEs.gtf"
    
    # Input BAM file (replace with the actual path, e.g. the STAR output from step 1)
    INPUT_BAM="star_alignment_output/Aligned.sortedByCoord.out.bam"
    
    # Project name; TEcount writes its count table to <project>.cntTable
    PROJECT="sample1"
    
    # Execute TEcount for quantification of genes and transposable elements.
    # The BAM is passed with -b; --sortByPos tells TEcount the BAM is coordinate-sorted.
    TEcount -b "${INPUT_BAM}" --GTF "${GENE_GTF}" --TE "${TE_GTF}" \
            --format BAM --mode multi --stranded reverse \
            --sortByPos --project "${PROJECT}"
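
Step 2 relies on a custom TE GTF generated from RepeatMasker, but the source does not say how it was built. As a rough sketch only, the function below converts a UCSC-style rmsk table (17 tab-separated columns, no header) into a GTF carrying the gene_id, transcript_id, family_id, and class_id attributes that TEcount reads. The function name rmsk_to_te_gtf, the hg19_rmsk source label, the list of excluded repeat classes, and the _dupN copy naming are all assumptions for illustration, not the authors' procedure.

```bash
#!/usr/bin/env bash
# Sketch: convert a UCSC-style rmsk table into a TE GTF.
# Function name, "hg19_rmsk" label, class filter, and "_dupN" suffixes are assumptions.
rmsk_to_te_gtf() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
  {
    # rmsk coordinates are 0-based half-open; GTF is 1-based inclusive
    chrom = $6; start = $7 + 1; stop = $8; strand = $10
    rep = $11; cls = $12; fam = $13
    # Assumed filter: drop non-TE classes and uncertain ("?") annotations
    if (cls ~ /^(Simple_repeat|Low_complexity|Satellite|rRNA|tRNA|snRNA|scRNA|srpRNA|RNA|Unknown)$/ || cls ~ /\?/ || fam ~ /\?/) next
    # Give every genomic copy of a repeat a unique transcript_id
    seen[rep]++
    tid = (seen[rep] == 1) ? rep : rep "_dup" (seen[rep] - 1)
    attr = sprintf("gene_id \"%s\"; transcript_id \"%s\"; family_id \"%s\"; class_id \"%s\";", rep, tid, fam, cls)
    print chrom, "hg19_rmsk", "exon", start, stop, $2, strand, ".", attr
  }' "$1"
}

# Example (hypothetical input file name):
# rmsk_to_te_gtf rmsk.txt > path/to/hg19_repeatmasker_TEs.gtf
```

Before use, compare the output against a TE GTF known to work with the installed TEtranscripts version, since the expected attribute layout and the set of repeat classes to keep may differ.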

Tools Used

STAR v2.5.2b
TEcount (TEtranscripts v2.0.3)

Raw Source Text
fastq Illumina RNASeq paired-end reads were aligned to the hg19 reference genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Genes and transposable elements (TE) were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse
Genome_build: hg19
Supplementary_files_format_and_content: Individual count tables (SampleName_counts.txt) were generated as output from TEcount
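
Because the deposited counts are one table per sample, downstream analysis usually starts by joining them into a single feature-by-sample matrix. The following is a minimal sketch of that join, assuming each SampleName_counts.txt is tab-separated with one header line and two columns (feature name, count). The function name merge_counts and the "feature" column label are hypothetical, and features missing from a sample are filled with 0.

```bash
#!/usr/bin/env bash
# Sketch: merge per-sample count tables into one matrix.
# Assumes: tab-separated files, one header line, columns = feature name, count.
merge_counts() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
  FNR == 1 {
    # New file: derive a column label from the file name and skip its header
    nfiles++
    label = FILENAME
    sub(/^.*\//, "", label)
    sub(/_counts\.txt$/, "", label)
    header = header OFS label
    next
  }
  {
    # Remember features in first-seen order and store the count per file
    if (!($1 in seen)) { seen[$1] = 1; order[++nfeat] = $1 }
    count[$1, nfiles] = $2
  }
  END {
    print "feature" header
    for (i = 1; i <= nfeat; i++) {
      line = order[i]
      for (j = 1; j <= nfiles; j++)
        line = line OFS (((order[i], j) in count) ? count[order[i], j] : 0)
      print line
    }
  }' "$@"
}

# Example (hypothetical file names):
# merge_counts SampleA_counts.txt SampleB_counts.txt > count_matrix.tsv
```

Gene and TE rows stay in one matrix here; if TE rows need to be separated, check how feature names are written in the actual files before filtering on them.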