GSE122649 Processing Pipeline
RNA-Seq
3 steps
Publication
Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis. Acta Neuropathologica (2022), PMID 35778567
Dataset
GSE122649: Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia [motor cortex]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Base calling performed with bcl2fastq conversion software v2.17
Bash example:
# Base calling with bcl2fastq conversion software (v2.17 in the original pipeline).
# bcl2fastq is distributed by Illumina; install it per Illumina's instructions
# (availability through conda or other package managers varies).
# Replace the placeholder paths with your Illumina run folder and output directory.
RUN_FOLDER="/path/to/illumina/run/folder"
FASTQ_DIR="/path/to/output/fastq"

# By default, bcl2fastq reads SampleSheet.csv from the run folder.
bcl2fastq \
  --runfolder-dir "${RUN_FOLDER}" \
  --output-dir "${FASTQ_DIR}"
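A quick sanity check on the bcl2fastq output is to count the reads in each FASTQ file. The sketch below relies only on the FASTQ convention that every record spans exactly four lines (header, sequence, "+", qualities); the function name `count_fastq_reads` and the toy input are illustrative, and files with wrapped multi-line sequences would break the line arithmetic.

```shell
#!/usr/bin/env bash
# Count reads in a gzipped FASTQ file: each record is 4 lines, so reads = lines / 4.
# Assumption: standard 4-line records (no wrapped sequence or quality lines).
count_fastq_reads() {
  local fq=$1 lines
  lines=$(gzip -dc "$fq" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical usage over the bcl2fastq output directory (placeholder path):
# for fq in /path/to/output/fastq/*.fastq.gz; do
#   printf '%s\t%s\n' "$(basename "$fq")" "$(count_fastq_reads "$fq")"
# done

# Self-contained demo with a toy two-read file.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/toy.fastq.gz"
count_fastq_reads "$tmp/toy.fastq.gz"   # prints 2
rm -rf "$tmp"
```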
2
Reads were aligned to the hg19 genome using STAR v2.5.2b with the following settings: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Bash example:
# Install STAR if needed, e.g.: conda install -c bioconda star=2.5.2b
# The hg19 STAR index must be built beforehand, e.g.:
#   STAR --runMode genomeGenerate --genomeDir /path/to/STAR_index/hg19 \
#        --genomeFastaFiles /path/to/hg19.fa --sjdbGTFfile /path/to/hg19.gtf \
#        --runThreadN 8
HG19_STAR_INDEX="/path/to/STAR_index/hg19"

# Input reads (placeholder). For paired-end data, pass both mates:
#   --readFilesIn input_reads_R1.fastq.gz input_reads_R2.fastq.gz
INPUT_READS="input_reads.fastq.gz"
OUTPUT_PREFIX="aligned_star_"

# --readFilesCommand zcat decompresses the gzipped FASTQ on the fly.
# The two --outFilter* settings are the ones reported for this dataset;
# the remaining options are illustrative choices.
STAR --genomeDir "${HG19_STAR_INDEX}" \
  --readFilesIn "${INPUT_READS}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outFilterMultimapNmax 100 \
  --outFilterMismatchNoverReadLmax 0.04 \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes Standard \
  --runThreadN 8

# Output BAM: ${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam
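The --outFilterMismatchNoverReadLmax 0.04 setting caps mismatches relative to read length: an alignment is reported only if mismatches divided by read length is at most 0.04, so 100-nt reads tolerate roughly 4 mismatches. The helper below (hypothetical name `max_mismatches`) computes floor(0.04 * L) with integer arithmetic. Treating L as the combined length of both mates for paired-end data follows STAR's read-length convention as we understand it, and STAR's internal rounding may differ from this sketch by one.

```shell
#!/usr/bin/env bash
# Approximate mismatch cap implied by --outFilterMismatchNoverReadLmax 0.04.
# 0.04 * L == 4 * L / 100; shell integer division floors the result.
# Assumption: for paired-end reads, L is the sum of both mate lengths.
max_mismatches() {
  local read_len=$1
  echo $(( read_len * 4 / 100 ))
}

for len in 50 100 200; do
  printf 'read length %3d -> at most %d mismatches\n' "$len" "$(max_mismatches "$len")"
done
```

For 2 x 100 nt paired-end reads (L = 200) the sketch allows 8 mismatches across the pair.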
3
Genes and transposable elements (TEs) were quantified using TEcount (from TEtranscripts v2.0.3) with the following settings: hg19 genic GTF from refGene (Jan 2018); custom hg19 TE GTF generated from RepeatMasker; --stranded reverse
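The source does not say how the custom TE GTF was built from RepeatMasker. One possible approach, sketched here under explicit assumptions, converts a UCSC-style rmsk.txt table into a GTF whose attributes are modeled on the gene_id / transcript_id / family_id / class_id layout of the TE GTFs distributed with TEtranscripts. The retained repeat classes (LINE, SINE, LTR, DNA), the `_dupN` transcript IDs, the `rmsk` source label, and the function name `rmsk_to_te_gtf` are all illustrative choices, not the authors' method.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: UCSC rmsk.txt -> TEtranscripts-style TE GTF.
# Assumed input layout (UCSC rmsk table, tab-separated, 17 columns):
#   $2 swScore, $6 genoName, $7 genoStart (0-based), $8 genoEnd, $10 strand,
#   $11 repName, $12 repClass, $13 repFamily
# Assumed filter: keep only the repeat classes listed in KEEP_CLASSES.
KEEP_CLASSES="LINE,SINE,LTR,DNA"

rmsk_to_te_gtf() {
  awk -F'\t' -v OFS='\t' -v keep="$KEEP_CLASSES" '
    BEGIN { n = split(keep, k, ","); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
    ($12 in ok) {
      dup[$11]++   # per-name counter gives each copy a unique transcript_id
      attrs = sprintf("gene_id \"%s\"; transcript_id \"%s_dup%d\"; family_id \"%s\"; class_id \"%s\";",
                      $11, $11, dup[$11], $13, $12)
      # GTF is 1-based and end-inclusive, so shift the 0-based UCSC start by one.
      print $6, "rmsk", "exon", $7 + 1, $8, $2, $10, ".", attrs
    }' "$1"
}

# Toy two-row input (made-up values): only the LINE row survives the filter.
tmp=$(mktemp -d)
printf '0\t500\t10\t0\t0\tchr1\t99\t200\t-1\t+\tL1PA2\tLINE\tL1\t1\t101\t0\t1\n' > "$tmp/rmsk.txt"
printf '0\t300\t10\t0\t0\tchr1\t300\t350\t-1\t+\t(CA)n\tSimple_repeat\tSimple_repeat\t1\t50\t0\t2\n' >> "$tmp/rmsk.txt"
rmsk_to_te_gtf "$tmp/rmsk.txt"
rm -rf "$tmp"
```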
TEcount (from TEtranscripts v2.0.3), Bash example:
# Install TEtranscripts (which provides TEcount) if needed, e.g.:
#   pip install TEtranscripts==2.0.3
# Placeholder input: an aligned RNA-seq BAM (e.g., the STAR alignment from step 2)
INPUT_BAM="input_aligned_reads.bam"
PROJECT="te_quantification_results"   # TEcount writes ${PROJECT}.cntTable

# Reference annotations described above
GENE_GTF="hg19_refgene_genic_jan2018.gtf"   # hg19 genic GTF from refGene (Jan 2018)
TE_GTF="hg19_repeatmasker_TEs.gtf"          # custom hg19 TE GTF from RepeatMasker

# --mode multi is TEcount's default (multi-mapped reads are distributed among
# their alignments); the counting mode used in the original analysis is not stated.
TEcount --format BAM \
  --mode multi \
  --stranded reverse \
  -b "${INPUT_BAM}" \
  --GTF "${GENE_GTF}" \
  --TE "${TE_GTF}" \
  --project "${PROJECT}"
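With --stranded reverse, TEcount treats the library as first-strand (as in dUTP-based protocols such as Illumina TruSeq Stranded): read 1 aligns to the strand opposite the transcript, and read 2, when present, aligns to the same strand. The function below (hypothetical name `transcript_strand`) applies that convention to a SAM FLAG using the standard bits 0x1 (paired), 0x10 (read reverse-complemented) and 0x80 (second in pair). It illustrates the rule only and is not TEcount's implementation.

```shell
#!/usr/bin/env bash
# Infer the transcript strand of a read under "reverse" (first-strand) strandedness.
# Read 1 (or a single-end read) is antisense; read 2 is sense.
transcript_strand() {
  local flag=$1 read_rev paired second
  read_rev=$(( (flag & 16) != 0 ))    # 0x10: read aligned to the reverse strand
  paired=$(( (flag & 1) != 0 ))       # 0x1: template has multiple segments
  second=$(( (flag & 128) != 0 ))     # 0x80: last segment (read 2)
  if (( paired && second )); then
    # Read 2 is sense: transcript strand equals the read's strand.
    (( read_rev )) && echo "-" || echo "+"
  else
    # Single-end read or read 1 is antisense: transcript strand is flipped.
    (( read_rev )) && echo "+" || echo "-"
  fi
}

for f in 0 16 99 147 83 163; do
  printf 'FLAG %3d -> transcript strand %s\n' "$f" "$(transcript_strand "$f")"
done
```

Both mates of a properly paired fragment agree: FLAGs 83 and 163 both map to "+", while 99 and 147 both map to "-".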
Tools Used
bcl2fastq v2.17; STAR v2.5.2b; TEcount (TEtranscripts v2.0.3)
Raw Source Text
Base calling performed with bcl2fastq conversion software v2.17
Reads were aligned to hg19 genome using STAR v2.5.2b with the following configurations: --outFilterMultimapNmax 100; --outFilterMismatchNoverReadLmax 0.04
Genes and TE were quantified using TEcount (from TEtranscripts v2.0.3) with the following configurations: hg19 genic GTF from refGene (Jan 2018), custom hg19 TE GTF generated from repeatMasker; --stranded reverse
Genome_build: hg19
Supplementary_files_format_and_content: Individual count tables were generated as output from TEcount
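Because the supplementary files are individual per-sample count tables, downstream analysis typically starts by merging them into a single matrix. The awk-based sketch below assumes each table is tab-separated with one header line, feature IDs in column 1 and counts in column 2; the function name `merge_count_tables` and the NA fill for missing features are illustrative choices. Rows follow the order of the first file, so features that appear only in later files are dropped.

```shell
#!/usr/bin/env bash
# Merge per-sample count tables (ID<TAB>count, one header line) into one matrix.
# Row order follows the first file; missing values are filled with NA.
merge_count_tables() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 { nfile++ }                 # a new input file starts
    nfile == 1 { order[++n] = $1 }       # remember row order from the first file
    { val[$1, nfile] = $2 }              # store count keyed by (feature, file index)
    END {
      for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 1; j <= nfile; j++)
          line = line OFS (((order[i], j) in val) ? val[order[i], j] : "NA")
        print line
      }
    }' "$@"
}

# Toy demo with made-up IDs and counts.
tmp=$(mktemp -d)
printf 'gene/TE\tsampleA\nGENE_A\t120\nTE_X\t7\n' > "$tmp/A.cntTable"
printf 'gene/TE\tsampleB\nGENE_A\t95\nTE_X\t11\n' > "$tmp/B.cntTable"
merge_count_tables "$tmp/A.cntTable" "$tmp/B.cntTable"
rm -rf "$tmp"
```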