GSE220459 Processing Pipeline

RNA-Seq, 2 steps

Publication

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size.

Neuron (2024) — PMID 38697111

Dataset

GSE220459

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size (RNA-seq NPC)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


The raw data was mapped using STAR.

STAR v2.7.10a (version inferred with models/gemini-2.5-flash)

$ Bash example

# Install STAR (example using conda)
# conda create -n star_env star -y
# conda activate star_env

# Build STAR genome index (run once per genome; the source reports assembly mm10, i.e. GRCm38)
# File names below are placeholders; the annotation release used by the authors is not stated.
# mkdir -p /path/to/genome_index/mm10
# STAR --runThreadN 8 \
#      --runMode genomeGenerate \
#      --genomeDir /path/to/genome_index/mm10 \
#      --genomeFastaFiles /path/to/fasta/GRCm38.primary_assembly.genome.fa \
#      --sjdbGTFfile /path/to/gtf/gencode.vM25.annotation.gtf \
#      --sjdbOverhang 100 # Recommended: (read_length - 1)

# Map raw data using STAR (example for paired-end reads; for single-end, pass one FASTQ file)
# Apart from the choice of aligner, all parameters here are illustrative and not taken from the source.
STAR --runThreadN 8 \
     --genomeDir /path/to/genome_index/mm10 \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix sample_ \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6 \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 3 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --quantMode GeneCounts # Optional: for gene-level quantification
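
When STAR is run with --quantMode GeneCounts, it writes a ReadsPerGene.out.tab file per sample. The first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), followed by one row per gene with counts for unstranded, forward-stranded and reverse-stranded libraries in columns 2 to 4. The sketch below pulls out one count column per gene. The function name star_gene_counts is hypothetical, and because the source does not state the library strandedness, the column to use is an assumption that should be checked against the data.

```shell
# Extract per-gene counts from a STAR ReadsPerGene.out.tab file.
# Usage: star_gene_counts <ReadsPerGene.out.tab> [column]
#   column 2 = unstranded (default), 3 = forward-stranded, 4 = reverse-stranded
# Hypothetical helper; not part of STAR and not the authors' in-house script.
star_gene_counts() {
  awk -F'\t' -v c="${2:-2}" '
    # Skip the four summary rows (N_unmapped, N_multimapping, ...).
    $1 !~ /^N_/ { print $1 "\t" $c }
  ' "$1"
}
```

For example, `star_gene_counts sample_ReadsPerGene.out.tab 4 > sample.counts.tsv` would write reverse-stranded counts for one sample. Comparing the N_noFeature values across columns 2 to 4 is a common way to judge which column fits the library.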


We calculated the gene-level read counts and identified differentially expressed genes using an in-house script.

In-house script (version not available)

$ Bash example

# This script calculates gene-level read counts and identifies differentially expressed genes.
# The specific implementation details are within the "in-house script".
#
# Pre-requisite: Gene-level read counts (e.g., from featureCounts, HTSeq-count, or RSEM).
# Example for generating counts (assuming aligned BAM files in 'aligned_bams/'):
# # conda install -c bioconda subread
# # featureCounts -a gencode.vM25.annotation.gtf -o gene_counts.tsv -F GTF -t exon -g gene_id aligned_bams/*.bam
# # (for paired-end libraries, add -p; Subread >= 2.0.2 also needs --countReadPairs to count fragments)
#
# Input for DE analysis: gene_counts.tsv (gene-level read counts), sample_metadata.tsv (experimental design)
# Output: diff_exp_results.tsv (table of differentially expressed genes)
#
# Reference datasets (placeholders; the source states only "Assembly: mm10"):
# - Gene annotation: gencode.vM25.annotation.gtf (GENCODE mouse release M25, GRCm38/mm10)

# Execute the in-house script for differential expression analysis.
# The actual command, script name, and parameters will depend on the specific in-house implementation.
# This is a placeholder command demonstrating typical inputs and outputs.
Rscript run_inhouse_de_script.R \
  --counts_file gene_counts.tsv \
  --metadata_file sample_metadata.tsv \
  --output_file diff_exp_results.tsv \
  --design_formula "~ condition + batch" \
  --min_reads_per_gene 10 \
  --fdr_threshold 0.05 \
  --log2fc_threshold 1.0
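
The source does not describe how the in-house script decides which genes count as differentially expressed. As a minimal sketch of the thresholding step implied by the placeholder options above, the function below keeps rows of a tab-separated results table that pass an adjusted p-value cutoff and an absolute log2 fold-change cutoff. The function name filter_de, the column names log2FoldChange and padj, and the default cutoffs (0.05 and 1.0) are all assumptions for illustration, not the authors' settings.

```shell
# Filter a tab-separated DE results table by adjusted p-value and |log2 fold change|.
# Usage: filter_de <results.tsv> [fdr_cutoff] [min_abs_log2fc]
# Assumes a header row containing columns named "log2FoldChange" and "padj"
# (hypothetical names; adjust to match the actual results file).
filter_de() {
  awk -F'\t' -v fdr="${2:-0.05}" -v lfc="${3:-1.0}" '
    NR == 1 {
      # Locate the required columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "log2FoldChange") l = i
        if ($i == "padj") p = i
      }
      if (!l || !p) exit 1   # required column missing
      print
      next
    }
    # Skip rows with non-numeric values (e.g. NA), then apply both cutoffs.
    $p ~ /^[0-9.eE+-]+$/ && $l ~ /^[0-9.eE+-]+$/ {
      a = ($l < 0) ? -$l : $l
      if ($p + 0 < fdr + 0 && a + 0 >= lfc + 0) print
    }
  ' "$1"
}
```

A call such as `filter_de diff_exp_results.tsv 0.05 1.0 > significant_genes.tsv` would keep the header plus the passing rows. Rows whose adjusted p-value is NA are dropped rather than treated as significant.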

Tools Used

STAR

Raw Source Text

The raw data was mapped using STAR.
We calculated the gene-level read counts and identified differentially expressed genes by in-house script.
Assembly: mm10
