GSE220460 Processing Pipeline — Yeo Lab Publications

Publication

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size.

Neuron (2024) — PMID 38697111

Dataset

GSE220460

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size (RNA-seq e13invivo)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

The raw data was mapped using STAR.

STAR v2.7.10a

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables (replace with actual paths and filenames)
# Assumption: the dataset is mouse (Trp53, E13 in vivo), so a mouse index is shown.
# The genome build and annotation release are not stated in the source.
GENOME_DIR="/path/to/STAR_index/mouse" # Placeholder: Use a STAR-indexed mouse genome (e.g., GRCm39 or GRCm38/mm10)
READ1_FASTQ="input_R1.fastq.gz" # Placeholder: Path to your R1 FASTQ file
READ2_FASTQ="input_R2.fastq.gz" # Placeholder: Path to your R2 FASTQ file (remove if single-end)
OUTPUT_PREFIX="mapped_data" # Prefix for output files
NUM_THREADS=8 # Number of threads to use

# Create genome index if not already present (run once per genome)
# STAR --runMode genomeGenerate \
#      --genomeDir "${GENOME_DIR}" \
#      --genomeFastaFiles /path/to/mouse_genome.fa \
#      --sjdbGTFfile /path/to/gencode.vMXX.annotation.gtf \
#      --runThreadN "${NUM_THREADS}"

# Map raw data using STAR
# The filter thresholds below are illustrative; the source does not list the parameters used.
# --readFilesCommand zcat is needed because the FASTQ inputs are gzip-compressed.
# --quantMode GeneCounts is optional: it writes per-gene counts to
# ${OUTPUT_PREFIX}_ReadsPerGene.out.tab and requires an index built with a GTF.
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
     --readFilesCommand zcat \
     --runThreadN "${NUM_THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outFilterMultimapNmax 20 \
     --alignSJoverhangMin 8 \
     --outFilterMismatchNmax 3 \
     --outFilterScoreMinOverLread 0.66 \
     --outFilterMatchNminOverLread 0.66 \
     --quantMode GeneCounts
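
After mapping, it can be useful to confirm the alignment rate before moving on to counting. The sketch below is not part of the described pipeline; it only pulls three headline metrics out of the `Log.final.out` file that STAR writes for each run. The function name `star_mapping_summary` is made up for illustration, and the example file name assumes the `mapped_data_` prefix used in the mapping command.

```shell
# star_mapping_summary LOG_FILE
# Prints: input_reads <TAB> uniquely_mapped_pct <TAB> multi_mapped_pct
# Hypothetical helper for illustration; not part of STAR or the published pipeline.
star_mapping_summary() {
    awk -F'|' '
        # Rows in Log.final.out look like: "   Uniquely mapped reads % |<TAB>85.12%"
        {
            key = $1
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the padded label
            gsub(/[ \t%]/, "", val)            # drop whitespace and the percent sign
        }
        key == "Number of input reads"              { input = val }
        key == "Uniquely mapped reads %"            { uniq = val }
        key == "% of reads mapped to multiple loci" { multi = val }
        END { printf "%s\t%s\t%s\n", input, uniq, multi }
    ' "$1"
}

# Example call (assumes the OUTPUT_PREFIX from the mapping step):
# star_mapping_summary mapped_data_Log.final.out
```

A low uniquely mapped percentage often points to a mismatched genome index or leftover adapter sequence, so it is worth checking before interpreting any counts.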

2

We calculated gene-level read counts and identified differentially expressed genes using an in-house script.

In-house script (custom; no version given)

$ Bash example

# This script represents a conceptual execution of an "in-house script"
# for calculating gene-level read counts and performing differential expression analysis.
# The actual script name, programming language (e.g., Python, R), and parameters
# would be specific to the in-house implementation.

# --- Reference Data Setup (Example: Ensembl GRCm39, release 111 GTF) ---
# Assumption: a mouse annotation is shown because the dataset is mouse (Trp53, E13 in vivo).
# The build and release used by the authors are not stated; match whatever the STAR index was built with.
# Download the gene annotation file if not already present.
# mkdir -p references
# cd references
# wget -c https://ftp.ensembl.org/pub/release-111/gtf/mus_musculus/Mus_musculus.GRCm39.111.gtf.gz
# gunzip -f Mus_musculus.GRCm39.111.gtf.gz
# cd ..

GENE_ANNOTATION="references/Mus_musculus.GRCm39.111.gtf" # Path to your GTF file

# --- Input Data (Example: Aligned BAM files) ---
# These are placeholder BAM files that would typically be generated in a preceding alignment step.
# Replace with actual paths to your input BAM files.
INPUT_BAM_FILES=(
    "data/sample_treated_rep1.bam"
    "data/sample_treated_rep2.bam"
    "data/sample_control_rep1.bam"
    "data/sample_control_rep2.bam"
)

# Convert array to space-separated string for command line
INPUT_BAM_STRING="${INPUT_BAM_FILES[*]}"

# --- Experimental Design File ---
# A design file (e.g., CSV or TSV) is crucial for differential expression analysis,
# mapping samples to experimental conditions.
# Example content for 'design.csv':
# sample_id,condition
# sample_treated_rep1,treated
# sample_treated_rep2,treated
# sample_control_rep1,control
# sample_control_rep2,control
#
# Create a placeholder design file if it doesn't exist
# echo "sample_id,condition" > design.csv
# echo "sample_treated_rep1,treated" >> design.csv
# echo "sample_treated_rep2,treated" >> design.csv
# echo "sample_control_rep1,control" >> design.csv
# echo "sample_control_rep2,control" >> design.csv

DESIGN_FILE="design.csv"

# --- Output Files ---
OUTPUT_COUNTS_FILE="gene_level_read_counts.tsv"
OUTPUT_DE_RESULTS="differentially_expressed_genes.tsv"
OUTPUT_LOG="in_house_script.log"

# --- Execute the In-House Script ---
# This command is a conceptual representation.
# The actual script name and parameters would vary based on the in-house implementation.
# It is assumed this script handles both gene counting and DE analysis.
in_house_gene_quant_and_de_script.py \
    --input_bams "${INPUT_BAM_STRING}" \
    --gene_annotation "${GENE_ANNOTATION}" \
    --design_file "${DESIGN_FILE}" \
    --output_counts "${OUTPUT_COUNTS_FILE}" \
    --output_de_results "${OUTPUT_DE_RESULTS}" \
    --log_file "${OUTPUT_LOG}"
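
Because the in-house script itself is not available, one possible stand-in for the gene-level counting step is to reuse the per-gene tables that STAR writes when `--quantMode GeneCounts` is set. The sketch below merges several `ReadsPerGene.out.tab` files into one count matrix. It is not the authors' method, and `merge_star_counts` is a hypothetical name. The column to use depends on library strandedness, which the source does not state: column 2 holds unstranded counts, column 3 forward-stranded counts, and column 4 reverse-stranded counts.

```shell
# merge_star_counts COLUMN FILE [FILE...]
# COLUMN: 2 = unstranded, 3 = forward-stranded, 4 = reverse-stranded.
# Writes a tab-separated matrix (genes x samples) to stdout.
# Hypothetical helper for illustration; not the in-house script from the publication.
merge_star_counts() {
    count_col="$1"
    shift
    awk -v col="$count_col" -v OFS='\t' '
        FNR == 1 {
            # New input file: derive a sample name from the file name.
            nfiles++
            name = FILENAME
            sub(/.*\//, "", name)
            sub(/_?ReadsPerGene\.out\.tab$/, "", name)
            header = header OFS name
        }
        /^N_/ { next }   # skip the summary rows (N_unmapped, N_multimapping, ...)
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            counts[$1, nfiles] = $col
        }
        END {
            print "gene_id" header
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (f = 1; f <= nfiles; f++) {
                    key = order[i] SUBSEP f
                    line = line OFS ((key in counts) ? counts[key] : 0)
                }
                print line
            }
        }
    ' "$@"
}

# Example call (file names are placeholders):
# merge_star_counts 2 treated_rep1_ReadsPerGene.out.tab control_rep1_ReadsPerGene.out.tab > gene_level_read_counts.tsv
```

The resulting matrix, together with a design file like the one described above, is the usual input for a differential expression package such as DESeq2 or edgeR; which method the in-house script used is not stated in the source.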

Tools Used

STAR