GSE220460 Processing Pipeline
RNA-Seq
2 steps
Publication
Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size. Neuron (2024), PMID 38697111
Dataset
GSE220460: Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size (RNA-seq e13invivo)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
The raw data was mapped using STAR.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables (replace with actual paths and filenames)
GENOME_DIR="/path/to/STAR_index/mm10"  # Placeholder: STAR-indexed mouse genome (the source states Assembly: mm10)
READ1_FASTQ="input_R1.fastq.gz"        # Placeholder: path to your R1 FASTQ file
READ2_FASTQ="input_R2.fastq.gz"        # Placeholder: path to your R2 FASTQ file (remove if single-end)
OUTPUT_PREFIX="mapped_data"            # Prefix for output files
NUM_THREADS=8                          # Number of threads to use

# Create genome index if not already present (run once per genome)
# STAR --runMode genomeGenerate \
#   --genomeDir "${GENOME_DIR}" \
#   --genomeFastaFiles /path/to/mm10.fa \
#   --sjdbGTFfile /path/to/gencode.vMXX.annotation.gtf \
#   --runThreadN "${NUM_THREADS}"

# Map raw data using STAR.
# --readFilesCommand zcat is required because the FASTQ files are gzip-compressed.
# The filtering thresholds below are illustrative; the source does not report them.
# --quantMode GeneCounts is optional and writes per-gene counts to
# ${OUTPUT_PREFIX}_ReadsPerGene.out.tab.
STAR --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
  --readFilesCommand zcat \
  --runThreadN "${NUM_THREADS}" \
  --outFileNamePrefix "${OUTPUT_PREFIX}_" \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterMultimapNmax 20 \
  --alignSJoverhangMin 8 \
  --outFilterMismatchNmax 3 \
  --outFilterScoreMinOverLread 0.66 \
  --outFilterMatchNminOverLread 0.66 \
  --quantMode GeneCounts
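After mapping, it is usually worth confirming that the alignment rate looks plausible before moving on to counting. The sketch below assumes the standard STAR `Log.final.out` summary, in which each statistic is written as a label, a `|` separator, and a value. The function name `star_unique_rate` is hypothetical and is not part of STAR or of the authors' pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the source): print the value of the
# "Uniquely mapped reads %" line from a STAR Log.final.out file,
# without the trailing percent sign.
star_unique_rate() {
  awk -F'|' '
    /Uniquely mapped reads %/ {
      gsub(/[[:space:]%]/, "", $2)   # strip the tab, spaces and "%" around the value
      print $2
      exit                           # the line appears once; stop reading
    }
  ' "$1"
}
```

With the output prefix used in the STAR example, the call would be `star_unique_rate mapped_data_Log.final.out`.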
2
We calculated the gene-level read counts and identified differentially expressed genes by in-house script.
In-house script vCustom
$ Bash example
# This script is a conceptual stand-in for the "in-house script" that
# calculated gene-level read counts and performed differential expression analysis.
# The actual script name, programming language (e.g., Python, R), and parameters
# are not given in the source; every name below is a placeholder.

# --- Reference Data Setup (Example: Ensembl GRCm38, i.e. mm10, release 102 GTF) ---
# The source states Assembly: mm10, so a mouse annotation is used here.
# Ensembl names chromosomes "1", "2", ...; UCSC mm10 uses "chr1", "chr2", ...
# Make sure the GTF naming matches the genome used for alignment.
# Download the gene annotation file if not already present.
# mkdir -p references
# cd references
# wget -c https://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.gtf.gz
# gunzip -f Mus_musculus.GRCm38.102.gtf.gz
# cd ..
GENE_ANNOTATION="references/Mus_musculus.GRCm38.102.gtf"  # Path to your GTF file

# --- Input Data (Example: Aligned BAM files) ---
# These are placeholder BAM files that would typically be generated in a preceding alignment step.
# Replace with actual paths to your input BAM files.
INPUT_BAM_FILES=(
  "data/sample_treated_rep1.bam"
  "data/sample_treated_rep2.bam"
  "data/sample_control_rep1.bam"
  "data/sample_control_rep2.bam"
)

# --- Experimental Design File ---
# A design file (e.g., CSV or TSV) is crucial for differential expression analysis,
# mapping samples to experimental conditions.
# Example content for 'design.csv':
# sample_id,condition
# sample_treated_rep1,treated
# sample_treated_rep2,treated
# sample_control_rep1,control
# sample_control_rep2,control
#
# Create a placeholder design file if it doesn't exist
# echo "sample_id,condition" > design.csv
# echo "sample_treated_rep1,treated" >> design.csv
# echo "sample_treated_rep2,treated" >> design.csv
# echo "sample_control_rep1,control" >> design.csv
# echo "sample_control_rep2,control" >> design.csv
DESIGN_FILE="design.csv"

# --- Output Files ---
OUTPUT_COUNTS_FILE="gene_level_read_counts.tsv"
OUTPUT_DE_RESULTS="differentially_expressed_genes.tsv"
OUTPUT_LOG="in_house_script.log"

# --- Execute the In-House Script ---
# This command is a conceptual representation.
# The script name and all of its options are hypothetical.
# It is assumed this script handles both gene counting and DE analysis.
# The BAM paths are passed as separate arguments via the array expansion.
in_house_gene_quant_and_de_script.py \
  --input_bams "${INPUT_BAM_FILES[@]}" \
  --gene_annotation "${GENE_ANNOTATION}" \
  --design_file "${DESIGN_FILE}" \
  --output_counts "${OUTPUT_COUNTS_FILE}" \
  --output_de_results "${OUTPUT_DE_RESULTS}" \
  --log_file "${OUTPUT_LOG}"
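The source does not say how the in-house script produced gene-level read counts. One possible route, assuming STAR was run with `--quantMode GeneCounts` as in the mapping example, is to merge the per-sample `ReadsPerGene.out.tab` files into a single count matrix. Each of those files starts with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`), followed by one row per gene with unstranded counts in column 2 and the two stranded counts in columns 3 and 4. The function `merge_star_counts` below is a hypothetical sketch of that merge, not the authors' script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the authors' in-house script): merge STAR
# ReadsPerGene.out.tab files into one tab-separated gene-by-sample matrix.
# Usage: merge_star_counts COLUMN file1 [file2 ...]
#   COLUMN: 2 = unstranded, 3 = read strand 1, 4 = read strand 2
merge_star_counts() {
  local column="$1"
  shift
  awk -v col="$column" '
    BEGIN { OFS = "\t" }
    # A new input file starts: remember its name for the header row.
    FNR == 1 { nfiles++; names[nfiles] = FILENAME }
    # Skip the four summary rows that STAR writes before the gene rows.
    $1 ~ /^N_(unmapped|multimapping|noFeature|ambiguous)$/ { next }
    {
      # Record genes in order of first appearance.
      if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
      counts[$1, nfiles] = $col
    }
    END {
      header = "gene_id"
      for (f = 1; f <= nfiles; f++) header = header OFS names[f]
      print header
      for (g = 1; g <= ngenes; g++) {
        line = order[g]
        for (f = 1; f <= nfiles; f++) {
          # Genes absent from a file are reported as 0.
          value = ((order[g], f) in counts) ? counts[order[g], f] : 0
          line = line OFS value
        }
        print line
      }
    }
  ' "$@"
}
```

A call such as `merge_star_counts 2 sampleA_ReadsPerGene.out.tab sampleB_ReadsPerGene.out.tab > gene_level_read_counts.tsv` (file names are placeholders) writes the unstranded counts. Which column is correct depends on the library's strandedness, which the source does not state. The resulting matrix can then be passed to a differential expression tool of your choice.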
Tools Used
Raw Source Text
The raw data was mapped using STAR. We calculated the gene-level read counts and identified differentially expressed genes by in-house script. Assembly: mm10