GSE220460 Processing Pipeline

RNA-Seq, 2 steps

Publication

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size.

Neuron (2024) — PMID 38697111

Dataset

GSE220460

Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size (RNA-seq e13invivo)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    The raw data was mapped using STAR.

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables (replace with actual paths and filenames)
    GENOME_DIR="/path/to/STAR_index/mm10" # Placeholder: STAR index for the mouse genome (the dataset lists assembly mm10)
    READ1_FASTQ="input_R1.fastq.gz" # Placeholder: Path to your R1 FASTQ file
    READ2_FASTQ="input_R2.fastq.gz" # Placeholder: Path to your R2 FASTQ file (set to "" if single-end)
    OUTPUT_PREFIX="mapped_data" # Prefix for output files
    NUM_THREADS=8 # Number of threads to use
    
    # Create genome index if not already present (run once per genome)
    # --quantMode GeneCounts (used below) needs the annotation, so supply the GTF here.
    # Mouse GENCODE releases up to M25 are built on GRCm38/mm10.
    # STAR --runMode genomeGenerate \
    #      --genomeDir "${GENOME_DIR}" \
    #      --genomeFastaFiles /path/to/mm10.fa \
    #      --sjdbGTFfile /path/to/gencode.vMXX.annotation.gtf \
    #      --runThreadN "${NUM_THREADS}"
    
    # Map raw data using STAR
    # The source does not report any STAR parameters; the filter values below are
    # illustrative assumptions, not the authors' settings.
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READ1_FASTQ}" ${READ2_FASTQ:+"${READ2_FASTQ}"} \
         --readFilesCommand zcat \
         --runThreadN "${NUM_THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outFilterMultimapNmax 20 \
         --alignSJoverhangMin 8 \
         --outFilterMismatchNmax 3 \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66 \
         --quantMode GeneCounts # Optional: writes per-gene counts to ${OUTPUT_PREFIX}_ReadsPerGene.out.tab
    
  2.

    We calculated gene-level read counts and identified differentially expressed genes using an in-house script.

    In-house script vCustom
    $ Bash example
    # This script represents a conceptual execution of an "in-house script"
    # for calculating gene-level read counts and performing differential expression analysis.
    # The actual script name, programming language (e.g., Python, R), and parameters
    # would be specific to the in-house implementation.
    
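    # --- Reference Data Setup (Example: Ensembl GRCm38, release 102 GTF) ---
    # The dataset lists assembly mm10, which corresponds to GRCm38; release 102 is the
    # last Ensembl release on GRCm38. The source does not say which annotation was used,
    # so this choice is an assumption. Ensembl names chromosomes "1", "2", ... while
    # UCSC mm10 uses "chr1", "chr2", ...; use an annotation that matches your STAR index.
    # Download the gene annotation file if not already present.
    # mkdir -p references
    # cd references
    # wget -c https://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.gtf.gz
    # gunzip -f Mus_musculus.GRCm38.102.gtf.gz
    # cd ..
    
    GENE_ANNOTATION="references/Mus_musculus.GRCm38.102.gtf" # Path to your GTF file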
    
    # --- Input Data (Example: Aligned BAM files) ---
    # These are placeholder BAM files that would typically be generated in a preceding alignment step.
    # Replace with actual paths to your input BAM files.
    INPUT_BAM_FILES=(
        "data/sample_treated_rep1.bam"
        "data/sample_treated_rep2.bam"
        "data/sample_control_rep1.bam"
        "data/sample_control_rep2.bam"
    )
    
    # Convert array to space-separated string for command line
    INPUT_BAM_STRING="${INPUT_BAM_FILES[*]}"
    
    # --- Experimental Design File ---
    # A design file (e.g., CSV or TSV) is crucial for differential expression analysis,
    # mapping samples to experimental conditions.
    # Example content for 'design.csv':
    # sample_id,condition
    # sample_treated_rep1,treated
    # sample_treated_rep2,treated
    # sample_control_rep1,control
    # sample_control_rep2,control
    #
    # Create a placeholder design file if it doesn't exist
    # echo "sample_id,condition" > design.csv
    # echo "sample_treated_rep1,treated" >> design.csv
    # echo "sample_treated_rep2,treated" >> design.csv
    # echo "sample_control_rep1,control" >> design.csv
    # echo "sample_control_rep2,control" >> design.csv
    
    DESIGN_FILE="design.csv"
    
    # --- Output Files ---
    OUTPUT_COUNTS_FILE="gene_level_read_counts.tsv"
    OUTPUT_DE_RESULTS="differentially_expressed_genes.tsv"
    OUTPUT_LOG="in_house_script.log"
    
    # --- Execute the In-House Script ---
    # This command is a conceptual representation.
    # The actual script name and parameters would vary based on the in-house implementation.
    # It is assumed this script handles both gene counting and DE analysis.
    in_house_gene_quant_and_de_script.py \
        --input_bams "${INPUT_BAM_STRING}" \
        --gene_annotation "${GENE_ANNOTATION}" \
        --design_file "${DESIGN_FILE}" \
        --output_counts "${OUTPUT_COUNTS_FILE}" \
        --output_de_results "${OUTPUT_DE_RESULTS}" \
        --log_file "${OUTPUT_LOG}"
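
The authors' in-house script is not available, so the counting step cannot be reproduced exactly. As a rough stand-in, the sketch below assembles a gene-level count matrix from the `ReadsPerGene.out.tab` files that STAR writes when `--quantMode GeneCounts` is set (itself an assumption of the mapping example, not something the source states). The function name `merge_star_counts` is hypothetical. Column 2 of each file holds unstranded counts, and columns 3 and 4 hold counts for the two stranded protocols. The source does not report library strandedness, so the column choice is left to the caller.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_star_counts COLUMN FILE [FILE...]
# Hypothetical helper: combine STAR ReadsPerGene.out.tab files into one
# tab-separated matrix (genes in rows, samples in columns).
# COLUMN is 2 (unstranded), 3 (read strand matches gene) or 4 (reverse strand).
merge_star_counts() {
    local col="$1"; shift
    awk -v col="$col" -F'\t' '
        FNR == 1 {
            # New input file: derive a sample name from the file name.
            nfiles++
            name = FILENAME
            sub(/.*\//, "", name)
            sub(/_?ReadsPerGene\.out\.tab$/, "", name)
            header = header "\t" name
        }
        $1 ~ /^N_/ { next }   # skip the N_unmapped, N_multimapping, ... summary rows
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            counts[$1, nfiles] = $col
        }
        END {
            print "gene_id" header
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (j = 1; j <= nfiles; j++) {
                    v = ((order[i], j) in counts) ? counts[order[i], j] : 0
                    line = line "\t" v
                }
                print line
            }
        }
    ' "$@"
}
```

A call such as `merge_star_counts 2 sampleA_ReadsPerGene.out.tab sampleB_ReadsPerGene.out.tab > gene_level_read_counts.tsv` (file names are placeholders) would produce a matrix suitable as input for a differential expression tool. Which tool and statistical model the authors used is not stated.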

Raw Source Text
The raw data was mapped using STAR.
We calculated the gene-level read counts and identified differentially expressed genes by in-house script.
Assembly: mm10
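
Because the recorded assembly is mm10 (GRCm38), it can be worth confirming that an annotation file targets the same build before counting. The sketch below reads the `#!genome-build` line that Ensembl GTF files carry in their header. The function names are hypothetical, and GTFs from other providers (for example UCSC) may have no such line, in which case the check reports that the build is unknown rather than guessing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# gtf_genome_build GTF
# Print the value of the first "#!genome-build" header line.
# Exits non-zero if the header has no such line.
gtf_genome_build() {
    awk '
        /^#!genome-build[ \t]/ { print $2; found = 1; exit }
        !/^#/ { exit }          # stop at the first non-header line
        END { exit !found }
    ' "$1"
}

# gtf_matches_build GTF PREFIX
# Return 0 if the recorded build starts with PREFIX (e.g. GRCm38 matches GRCm38.p6),
# 1 if it names a different build, 2 if no build is recorded.
gtf_matches_build() {
    local build
    build="$(gtf_genome_build "$1")" || return 2
    case "$build" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example, `gtf_matches_build references/Mus_musculus.GRCm38.102.gtf GRCm38` should succeed for an Ensembl GRCm38 annotation and fail for a human GRCh38 one.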