GSE122649 Processing Pipeline

RNA-Seq, 3 steps

Publication

Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis.

Acta neuropathologica (2022) — PMID 35778567

Dataset

GSE122649

Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia [motor cortex]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Base calling was performed with bcl2fastq conversion software v2.17.

    bcl2fastq v2.17
    $ Bash example
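    # bcl2fastq is distributed by Illumina; installing it from the Illumina support site is assumed here.
    # Availability through conda channels is not guaranteed, so verify any package before relying on it.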
    
    # Base calling performed with bcl2fastq conversion software v2.17
    # Replace /path/to/illumina/run/folder with the actual path to your BCL data
    # Replace /path/to/output/fastq with your desired output directory for FASTQ files
    bcl2fastq --runfolder-dir /path/to/illumina/run/folder --output-dir /path/to/output/fastq
  2. Reads were aligned to the hg19 genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star=2.5.2b
    
    # Placeholder for reference genome index
    # Ensure the hg19 STAR index is built and available at this path.
    # Example: STAR --runMode genomeGenerate --genomeDir /path/to/STAR_index/hg19 --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf --runThreadN <num_threads>
    HG19_STAR_INDEX="/path/to/STAR_index/hg19"
    
    # Placeholder for input reads (gzipped FASTQ). For paired-end data, list both mates:
    # INPUT_READS=("input_reads_R1.fastq.gz" "input_reads_R2.fastq.gz")
    INPUT_READS=("input_reads.fastq.gz")
    
    # Output prefix for alignment files
    OUTPUT_PREFIX="aligned_star_"
    
    # --readFilesCommand zcat is needed because the example input is gzipped.
    # The two --outFilter* values come from the dataset description; the remaining options are example choices.
    STAR --genomeDir "${HG19_STAR_INDEX}" \
         --readFilesIn "${INPUT_READS[@]}" \
         --readFilesCommand zcat \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outFilterMultimapNmax 100 \
         --outFilterMismatchNoverReadLmax 0.04 \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard \
         --runThreadN 8 # Example: use 8 threads, adjust as needed
    
    # The primary output BAM file will be named: ${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam
  3. Genes and TEs were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse

    TEcount (from TEtranscripts) v2.0.3
    $ Bash example
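    # Install TEtranscripts (which includes TEcount) if not already installed.
    # v2.0.3 is an older release: confirm which Python version it supports before creating the environment.
    # python=3.7 below is an example choice, not taken from the dataset description.
    # conda create -n tetranscripts python=3.7
    # conda activate tetranscripts
    # pip install TEtranscripts==2.0.3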
    
    # Placeholder for input BAM file (e.g., aligned RNA-seq reads)
    INPUT_BAM="input_aligned_reads.bam"
    # Placeholder for output prefix for quantification results
    OUTPUT_PREFIX="te_quantification_results"
    
    # Reference GTF files based on description
    # hg19 genic GTF from refGene (Jan 2018)
    GENE_GTF="hg19_refgene_genic_jan2018.gtf"
    # custom hg19 TE GTF generated from repeatMasker
    TE_GTF="hg19_repeatmasker_TEs.gtf"
    
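    # Execute TEcount with the configuration from the dataset description (--stranded reverse).
    # --format BAM and --mode multi are the documented TEcount defaults; the source does not state a mode.
    # If the BAM is coordinate-sorted (as in the STAR example above), also consider --sortByPos.
    TEcount --format BAM --mode multi --stranded reverse -b "${INPUT_BAM}" --GTF "${GENE_GTF}" --TE "${TE_GTF}" --project "${OUTPUT_PREFIX}"
    # Counts are written to ${OUTPUT_PREFIX}.cntTable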
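
Because step 3 quantifies genes and TEs in a single run, both kinds of feature end up in the same tab-separated count table (typically a file with a .cntTable suffix). The sketch below shows one way to separate them afterwards. It assumes that TE identifiers contain colons (for example name:family:class, as produced by commonly used TE GTFs) and that gene identifiers do not. This should be checked against the custom TE GTF used for this dataset. The function name split_cnt_table and the output file suffixes are hypothetical.

```shell
#!/usr/bin/env bash

# Split a TEcount count table into gene rows and TE rows.
# Assumption: TE identifiers contain ':' and gene identifiers do not.
split_cnt_table() {
    local cnt_table="$1"   # input count table, e.g. te_quantification_results.cntTable
    local out_prefix="$2"  # hypothetical prefix for the two output files

    # Copy the header line to both outputs, then route each row by its first column.
    awk -F'\t' -v genes="${out_prefix}.genes.tsv" -v tes="${out_prefix}.TEs.tsv" '
        NR == 1  { print > genes; print > tes; next }
        $1 ~ /:/ { print > tes; next }
                 { print > genes }
    ' "${cnt_table}"
}
```

Example call: split_cnt_table te_quantification_results.cntTable te_quantification_results
This would write te_quantification_results.genes.tsv and te_quantification_results.TEs.tsv next to the input.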

Tools Used

bcl2fastq v2.17
STAR v2.5.2b
TEcount (from TEtranscripts v2.0.3)

Raw Source Text
Base calling performed with bcl2fastq conversion software v2.17
Reads were aligned to hg19 genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Genes and TE were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse
Genome_build: hg19
Supplementary_files_format_and_content: Individual count tables were generated as output from TEcount
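
Since the supplementary files are individual per-sample count tables, downstream analysis usually needs them combined into one matrix. The sketch below is one possible way to do this with awk. It assumes each table is tab-separated with one header line and two columns (feature identifier, count). Features missing from a sample are filled with 0, which is a choice made for this example and not something stated in the source. The function name merge_cnt_tables is hypothetical.

```shell
#!/usr/bin/env bash

# Merge per-sample TEcount tables into one matrix, printed to stdout.
# Assumption: each input is tab-separated with a header line and two columns.
merge_cnt_tables() {
    awk -F'\t' -v OFS='\t' '
        # The first line of each file is its header; column 2 names the sample.
        FNR == 1 { nfiles++; header[nfiles] = $2; next }
        {
            # Remember features in order of first appearance.
            if (!($1 in seen)) { seen[$1] = 1; order[++nfeat] = $1 }
            count[$1, nfiles] = $2
        }
        END {
            line = "gene/TE"
            for (j = 1; j <= nfiles; j++) line = line OFS header[j]
            print line
            for (i = 1; i <= nfeat; i++) {
                line = order[i]
                for (j = 1; j <= nfiles; j++) {
                    key = order[i] SUBSEP j
                    # Fill features absent from a sample with 0 (example choice).
                    line = line OFS ((key in count) ? count[key] : 0)
                }
                print line
            }
        }
    ' "$@"
}
```

Example call: merge_cnt_tables sample1.cntTable sample2.cntTable > count_matrix.tsv
The whole matrix is held in memory, which is acceptable for a sketch but may need rethinking for very large cohorts.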