GSE124439 Processing Pipeline
RNA-Seq
2 steps
Publication
Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis. Acta neuropathologica (2022), PMID 35778567
Dataset
GSE124439: Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Illumina paired-end RNA-Seq reads (FASTQ) were aligned to the hg19 reference genome using STAR v2.5.2b with the following options: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star=2.5.2b

# Define input and output files
READ1="sample_R1.fastq.gz"              # Placeholder for forward reads
READ2="sample_R2.fastq.gz"              # Placeholder for reverse reads
GENOME_DIR="/path/to/hg19_star_index"   # Placeholder for indexed hg19 genome
OUTPUT_DIR="star_alignment_output"
NUM_THREADS=8                           # Example number of threads

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run STAR alignment for paired-end RNA-Seq reads
STAR --genomeDir "${GENOME_DIR}" \
    --readFilesIn "${READ1}" "${READ2}" \
    --runThreadN "${NUM_THREADS}" \
    --outFileNamePrefix "${OUTPUT_DIR}/" \
    --outFilterMultimapNmax 100 \
    --outFilterMismatchNoverReadLmax 0.04 \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMunmapped Within \
    --outSAMattributes Standard \
    --readFilesCommand zcat   # Use zcat for gzipped fastq files
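The dataset contains many samples, so the single-sample command above would normally be repeated per FASTQ pair. The following is a minimal sketch of one way to do that, not the authors' script: it only prints one STAR command per sample so the commands can be reviewed before anything is run. The helper names (mate_of, sample_of, print_star_commands) and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming convention are assumptions made for illustration; only the two filter options come from the source text.

```shell
# Derive the R2 file name from an R1 file name.
# Assumes the hypothetical convention <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
mate_of() {
    printf '%s\n' "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# Derive the sample name: strip the directory and the _R1.fastq.gz suffix.
sample_of() {
    local base="${1##*/}"
    printf '%s\n' "${base%_R1.fastq.gz}"
}

# Print (do not run) one STAR command per R1 file.
# First argument: STAR genome index directory; remaining arguments: R1 files.
print_star_commands() {
    local genome_dir="$1"
    shift
    local read1
    for read1 in "$@"; do
        printf 'STAR --genomeDir %s --readFilesIn %s %s --readFilesCommand zcat --outFilterMultimapNmax 100 --outFilterMismatchNoverReadLmax 0.04 --outFileNamePrefix %s/\n' \
            "$genome_dir" "$read1" "$(mate_of "$read1")" "$(sample_of "$read1")"
    done
}
```

For example, print_star_commands /path/to/hg19_star_index fastq/*_R1.fastq.gz lists the commands for every R1 file in a hypothetical fastq/ directory. Each per-sample output directory must exist before the printed command is run, because STAR does not necessarily create it.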
2
Genes and transposable elements (TEs) were quantified using TEcount (from TEtranscripts v2.0.3) with the following inputs and options: the hg19 genic GTF from refGene (Jan 2018); a custom hg19 TE GTF generated from RepeatMasker; --stranded reverse
TEcount v2.0.3
$ Bash example
# Install TEtranscripts (which includes TEcount)
# conda install -c bioconda tetranscripts

# Placeholder for hg19 genic GTF from refGene (Jan 2018)
# This file would typically be generated from UCSC Table Browser or a similar resource.
GENE_GTF="path/to/hg19_refGene_genes.gtf"

# Placeholder for custom hg19 TE GTF generated from RepeatMasker
# This file would be custom-generated by the user based on RepeatMasker output.
TE_GTF="path/to/hg19_repeatmasker_TEs.gtf"

# Input BAM file (replace with the actual input file path)
INPUT_BAM="sample1.bam"

# Output prefix; TEcount writes the counts to ${PROJECT}.cntTable
PROJECT="sample1"

# Execute TEcount for quantification of genes and transposable elements
# For a coordinate-sorted BAM (e.g. STAR --outSAMtype BAM SortedByCoordinate), add --sortByPos
TEcount -b "${INPUT_BAM}" \
    --GTF "${GENE_GTF}" \
    --TE "${TE_GTF}" \
    --format BAM \
    --mode multi \
    --stranded reverse \
    --project "${PROJECT}"
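TEcount produces one count table per sample, and downstream analysis usually needs them combined into a single matrix. The following is a hedged sketch of such a merge, not part of the described pipeline. It assumes each table is tab-separated with a one-line header and two columns (feature ID, count). The function name merge_count_tables is hypothetical. Features missing from a sample are filled with 0.

```shell
# Merge per-sample two-column count tables into one matrix, keyed by feature ID.
# Assumes: tab-separated input, one header line per file, column 1 = ID, column 2 = count.
merge_count_tables() {
    awk -F '\t' -v OFS='\t' '
        FNR == 1 {                      # header line of each input file
            nfiles++
            header = header OFS $2      # collect the sample column names
            next
        }
        {
            if (!($1 in seen)) {        # remember first-seen order of IDs
                seen[$1] = 1
                order[++n] = $1
            }
            count[$1, nfiles] = $2
        }
        END {
            print "gene/TE" header
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (f = 1; f <= nfiles; f++) {
                    key = order[i] SUBSEP f
                    line = line OFS ((key in count) ? count[key] : 0)
                }
                print line
            }
        }
    ' "$@"
}
```

A call such as merge_count_tables sample1.cntTable sample2.cntTable > merged_counts.tsv (file names hypothetical) writes one row per gene or TE and one column per sample, in the order the files were given.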
Tools Used
STAR v2.5.2b
TEcount (from TEtranscripts v2.0.3)
Raw Source Text
fastq Illumina RNASeq paired-end reads were aligned to the hg19 reference genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Genes and transposable elements (TE) were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse
Genome_build: hg19
Supplementary_files_format_and_content: Individual count tables (SampleName_counts.txt) were generated as output from TEcount