GSE220462 Processing Pipeline — Yeo Lab Publications

Publication

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size.

Neuron (2024) — PMID 38697111

Dataset

GSE220462

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

The raw sequencing reads were mapped to the reference genome using STAR.

STAR v2.7.9a (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star

# Create STAR genome index (if not already done)
# Use the reference genome and annotation that match the organism of your samples;
# the genome build used in the study is not stated here.
# Replace /path/to/genome_fasta.fa with your reference genome FASTA file
# Replace /path/to/annotations.gtf with your gene annotations GTF file
# Replace /path/to/STAR_index with your desired index directory
# --sjdbOverhang is ideally (read length - 1); 100 is the STAR default
# STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /path/to/STAR_index --genomeFastaFiles /path/to/genome_fasta.fa --sjdbGTFfile /path/to/annotations.gtf --sjdbOverhang 100

# Define variables
GENOME_DIR="/path/to/STAR_index" # Placeholder: Replace with path to your STAR genome index
READ1="sample_R1.fastq.gz" # Placeholder: Replace with your R1 FASTQ file
READ2="sample_R2.fastq.gz" # Placeholder: Replace with your R2 FASTQ file (remove if single-end)
OUTPUT_PREFIX="sample_aligned_"
THREADS=8 # Number of threads to use

# Run STAR alignment
# --readFilesCommand zcat is required because the FASTQ files are gzip-compressed
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1}" "${READ2}" \
  --readFilesCommand zcat \
  --runThreadN "${THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes All \
  --outFilterMismatchNmax 3 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66 \
  --alignIntronMin 20 \
  --alignIntronMax 1000000 \
  --alignMatesGapMax 1000000 \
  --outReadsUnmapped Fastx \
  --outFilterType BySJout \
  --outFilterMultimapNmax 20 \
  --outFilterMultimapScoreRange 1 \
  --outFilterScoreMin 10 \
  --outFilterMatchNmin 10 \
  --limitSjdbInsertNsj 1200000 \
  --sjdbScore 1 \
  --seedSearchStartLmax 30 \
  --seedPerReadNmax 1000 \
  --seedPerWindowNmax 50 \
  --alignTranscriptsPerReadNmax 10000 \
  --alignTranscriptsPerWindowNmax 1000

# The output BAM file will be named sample_aligned_Aligned.sortedByCoord.out.bam
# Other output files (e.g., Log.final.out, SJ.out.tab) will also be generated.
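
After alignment, it is usually worth checking the mapping rate before moving on to quantification. The sketch below reads the "Uniquely mapped reads %" row from a STAR Log.final.out file and compares it to a cutoff. The function names unique_map_pct and check_mapping_rate and the 70% cutoff are assumptions made for illustration; they are not part of STAR or of the published pipeline.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the "Uniquely mapped reads %" value
# (without the percent sign) from a STAR Log.final.out file.
unique_map_pct() {
  local log_file="$1"
  # Rows in Log.final.out look like: "  Uniquely mapped reads % |<TAB>87.45%"
  awk -F'|' '/Uniquely mapped reads %/ {
    gsub(/[ \t%]/, "", $2)   # strip blanks, tabs, and the percent sign
    print $2
  }' "$log_file"
}

# Hypothetical check: report PASS or LOW against a minimum percentage.
# The default cutoff of 70 is an arbitrary illustration, not a recommendation.
check_mapping_rate() {
  local log_file="$1" min_pct="${2:-70}"
  local pct
  pct="$(unique_map_pct "$log_file")"
  if [ -z "$pct" ]; then
    echo "no mapping rate found in $log_file" >&2
    return 2
  fi
  # awk performs the floating-point comparison; exit status 0 means pass
  if awk -v p="$pct" -v m="$min_pct" 'BEGIN { exit !(p + 0 >= m + 0) }'; then
    echo "PASS $pct"
  else
    echo "LOW $pct"
    return 1
  fi
}

# Example usage with the output prefix from the alignment step
LOG="sample_aligned_Log.final.out"
if [ -f "$LOG" ]; then
  check_mapping_rate "$LOG" 70
fi
```

An appropriate cutoff depends on the library type and organism, so treat the number as a placeholder and inspect the full log when a sample is flagged.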

2

We calculated gene-level read counts and identified differentially expressed genes using an in-house script.

in-house script (version not available)

$ Bash example

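# Installation of common tools that might be wrapped by an in-house script
# (assumed stand-ins; the actual in-house script is not described in the source)
# conda install -c bioconda subread # For featureCounts
# conda install -c conda-forge r-base # For R-based DE analysis (DESeq2, edgeR)
# DESeq2 and edgeR are Bioconductor packages, so install them through BiocManager
# R -e "install.packages('BiocManager', repos='https://cloud.r-project.org'); BiocManager::install(c('DESeq2', 'edgeR'))"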

# Placeholder for input BAM files (replace with actual paths)
BAM_FILES="sample1_rep1.bam sample1_rep2.bam sample2_rep1.bam sample2_rep2.bam"

# Placeholder for gene annotation GTF file (replace with actual path or download)
# The annotation must match the organism and genome build used for alignment.
# Example download for human GRCh38 (swap in the annotation for your organism):
# wget -O Homo_sapiens.GRCh38.109.gtf.gz "https://ftp.ensembl.org/pub/release-109/gtf/homo_sapiens/Homo_sapiens.GRCh38.109.gtf.gz"
# gunzip Homo_sapiens.GRCh38.109.gtf.gz
GENE_ANNOTATION="Homo_sapiens.GRCh38.109.gtf"

# Placeholder for experimental design file (e.g., tab-separated file with sample_id, condition)
# Example design.tsv:
# sample_id    condition
# sample1_rep1    treated
# sample1_rep2    treated
# sample2_rep1    control
# sample2_rep2    control
DESIGN_FILE="design.tsv"

# Execute the conceptual in-house script
# The script name and flags below are hypothetical; the actual command depends on
# how the in-house script is implemented.
# Such a script typically takes aligned BAM files, a gene annotation GTF, and a design file,
# then performs gene quantification and differential expression analysis.
# BAM_FILES is left unquoted on purpose so that each path becomes a separate argument.
./in_house_expression_pipeline.sh \
  --bams ${BAM_FILES} \
  --gtf "${GENE_ANNOTATION}" \
  --design "${DESIGN_FILE}" \
  --output_dir ./results
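
Differential expression tools need replicates in every condition, so a quick look at the design file before running the analysis can save a failed run. The sketch below counts samples per condition in a tab-separated design file with a header row and the columns sample_id and condition, as in the example layout above. The function names replicates_per_condition and check_design and the minimum of two replicates are assumptions for illustration and are not taken from the in-house script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "condition<TAB>count" for each condition
# in a tab-separated design file whose first row is a header.
replicates_per_condition() {
  local design_file="$1"
  awk -F'\t' 'NR > 1 && NF >= 2 { n[$2]++ }
       END { for (c in n) print c "\t" n[c] }' "$design_file" | sort
}

# Hypothetical check: return non-zero if any condition has fewer than
# min_reps samples (default 2, an assumed minimum for illustration).
check_design() {
  local design_file="$1" min_reps="${2:-2}"
  replicates_per_condition "$design_file" |
    awk -F'\t' -v m="$min_reps" '
      $2 < m { printf "too few replicates: %s (%d)\n", $1, $2; bad = 1 }
      END { exit bad + 0 }'
}

# Example usage with the design file placeholder from this step
DESIGN_FILE="design.tsv"
if [ -f "$DESIGN_FILE" ]; then
  replicates_per_condition "$DESIGN_FILE"
  if check_design "$DESIGN_FILE" 2; then
    echo "design OK"
  fi
fi
```

This only validates the sample sheet; it does not perform counting or differential expression, which remain the job of the in-house script or of stand-in tools such as featureCounts and DESeq2.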

Tools Used

STAR