GSE220459 Processing Pipeline
RNA-Seq
2 steps
Publication
Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size. Neuron (2024), PMID 38697111
Dataset
GSE220459: Epistatic interactions between NMD and TRP53 control progenitor cell maintenance and brain size (RNA-seq NPC)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
The raw reads were mapped to the mm10 mouse assembly using STAR.
$ Bash example
# NOTE: The source states only that STAR was used and that the assembly is mm10.
# All paths, file names, and parameter values below are illustrative placeholders.

# Install STAR (example using conda)
# conda create -n star_env -c conda-forge -c bioconda star -y
# conda activate star_env

# Build the STAR genome index (run once per genome; mm10/GRCm38 here)
# The FASTA and GTF names assume GENCODE mouse release M25 (GRCm38).
# mkdir -p /path/to/genome_index/mm10
# STAR --runThreadN 8 \
#      --runMode genomeGenerate \
#      --genomeDir /path/to/genome_index/mm10 \
#      --genomeFastaFiles /path/to/fasta/GRCm38.primary_assembly.genome.fa \
#      --sjdbGTFfile /path/to/gtf/gencode.vM25.annotation.gtf \
#      --sjdbOverhang 100   # recommended: read_length - 1

# Map the raw reads with STAR (example for paired-end reads; use one FASTQ for single-end)
STAR --runThreadN 8 \
     --genomeDir /path/to/genome_index/mm10 \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix sample_ \
     --outSAMtype BAM SortedByCoordinate \
     --outBAMcompression 6 \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 3 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --quantMode GeneCounts   # optional: per-gene read counts
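After mapping, it is usually worth checking how many reads aligned uniquely. The following is a minimal sketch, assuming STAR's standard `Log.final.out` summary file and the `sample_` output prefix from the example above. The function name `uniquely_mapped_pct` is hypothetical, and nothing here is taken from the authors' pipeline.

```shell
#!/usr/bin/env bash
# Sketch: report the uniquely mapped read percentage from STAR Log.final.out files.
# Assumes the standard summary line "Uniquely mapped reads % |<TAB>NN.NN%".

uniquely_mapped_pct() {
    # $1: path to a STAR Log.final.out file
    # Prints the percentage without the trailing "%" sign.
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
        print $2
    }' "$1"
}

# Print one "prefix<TAB>percent" line per log file given on the command line.
for log in "$@"; do
    printf '%s\t%s\n' "${log%Log.final.out}" "$(uniquely_mapped_pct "$log")"
done
```

Saved as a script (the file name is up to you), it could be run as `bash mapping_summary.sh *_Log.final.out` to list every sample in a directory.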
2
We calculated gene-level read counts and identified differentially expressed genes using an in-house script.
In-house script (version N/A)
$ Bash example
# The authors calculated gene-level read counts and identified differentially
# expressed genes with an unpublished in-house script. Everything below is a
# placeholder: the script name, options, and thresholds are hypothetical.
#
# Prerequisite: gene-level read counts (e.g., from featureCounts, HTSeq-count, or RSEM).
# Example for generating counts (assuming aligned BAM files in 'aligned_bams/';
# add -p for paired-end libraries):
#
# conda install -c conda-forge -c bioconda subread
#
# featureCounts -a gencode.vM25.annotation.gtf -o gene_counts.tsv \
#     -F GTF -t exon -g gene_id aligned_bams/*.bam
#
# Input for DE analysis: gene_counts.tsv (gene-level read counts),
#                        sample_metadata.tsv (experimental design)
# Output: diff_exp_results.tsv (table of differentially expressed genes)
#
# Reference datasets (placeholders):
# - Gene annotation: gencode.vM25.annotation.gtf (GENCODE mouse release M25, GRCm38/mm10)

# Run the differential expression analysis.
# The actual command, script name, and parameters depend on the in-house implementation,
# which is not described in the source.
Rscript run_inhouse_de_script.R \
    --counts_file gene_counts.tsv \
    --metadata_file sample_metadata.tsv \
    --output_file diff_exp_results.tsv \
    --design_formula "~ condition + batch" \
    --min_reads_per_gene 10 \
    --fdr_threshold 0.05 \
    --log2fc_threshold 1.0
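If STAR was run with `--quantMode GeneCounts`, each sample already has a `ReadsPerGene.out.tab` file, and a gene-by-sample count table can be assembled without a separate counting tool. The following is a minimal sketch of that merge, not the authors' in-house method. The function name `merge_star_counts` is hypothetical. The count column to use (2 = unstranded, 3 = forward strand, 4 = reverse strand) depends on the library protocol, which the source does not state.

```shell
#!/usr/bin/env bash
# Sketch: merge STAR *ReadsPerGene.out.tab files into one gene-by-sample table.

merge_star_counts() {
    # $1: count column to take from each file (2, 3, or 4; see the text above)
    # $2..: ReadsPerGene.out.tab files, one per sample
    _col="$1"; shift
    awk -v col="$_col" -v OFS='\t' '
        FNR == 1 {                          # first line of each input file
            nfiles++
            name[nfiles] = FILENAME
            sub(/.*\//, "", name[nfiles])                      # drop directory
            sub(/_?ReadsPerGene\.out\.tab$/, "", name[nfiles]) # drop STAR suffix
        }
        $1 ~ /^N_/ { next }                 # skip the four N_* summary rows
        !($1 in seen) { seen[$1] = 1; order[++ngenes] = $1 }
        { count[$1, nfiles] = $col }
        END {
            header = "gene_id"
            for (f = 1; f <= nfiles; f++) header = header OFS name[f]
            print header
            for (g = 1; g <= ngenes; g++) {
                row = order[g]
                for (f = 1; f <= nfiles; f++) {
                    key = order[g] SUBSEP f
                    row = row OFS ((key in count) ? count[key] : 0)
                }
                print row
            }
        }' "$@"
}
```

For example, `merge_star_counts 2 *_ReadsPerGene.out.tab > gene_counts.tsv` would write an unstranded count table with one column per sample, which could then serve as input to a differential expression tool of your choice.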
Tools Used
Raw Source Text
The raw data was mapped using STAR. We calculated the gene-level read counts and identified differentially expressed genes by in-house script. Assembly: mm10