GSE202202 Processing Pipeline
RNA-Seq
code_examples
4 steps
Publication
Localized molecular chaperone synthesis maintains neuronal dendrite proteostasis.Nature communications (2024) — PMID 39737952
Dataset
GSE202202RNA-seq in neurites and soma of primary hippocampal neurons isolated from P0 or P1 mice
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
Adaptors and sequence with a quality below 20 were removed using trimmomatic (galaxy version 0.38.0)
Trimmomatic v0.38.0$ Bash example
# Install Trimmomatic (if not already installed) # conda install -c bioconda trimmomatic=0.38.0 # Define input/output files (placeholders, replace with actual file names) INPUT_R1="input_R1.fastq.gz" INPUT_R2="input_R2.fastq.gz" OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz" OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz" OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz" OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz" # Define Trimmomatic JAR path (adjust if needed, e.g., if installed via conda, it might be in $CONDA_PREFIX/share/trimmomatic-0.38.0-0/trimmomatic.jar) TRIMMOMATIC_JAR="/path/to/trimmomatic-0.38.jar" # REPLACE WITH ACTUAL PATH # Define adapter file path (adjust if needed, or download). # A common adapter file is TruSeq3-PE.fa, often found in the Trimmomatic installation directory's 'adapters' folder. ADAPTER_FILE="/path/to/Trimmomatic-0.38/adapters/TruSeq3-PE.fa" # REPLACE WITH ACTUAL PATH OR DOWNLOAD # Execute Trimmomatic for paired-end reads java -jar "${TRIMMOMATIC_JAR}" PE \ -phred33 \ "${INPUT_R1}" "${INPUT_R2}" \ "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \ "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \ ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \ LEADING:20 \ TRAILING:20 \ SLIDINGWINDOW:4:20 \ MINLEN:36 -
2
Reads were aligned on mm10 genome using RNA star (galaxy version 2.7.8a+galaxy0)
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star # Placeholder for STAR genome index directory # Replace with the actual path to your mm10 STAR index GENOME_DIR="/path/to/STAR_index/mm10" # Placeholder for input FASTQ file(s) # Replace with your actual input FASTQ file(s) READ1="input_R1.fastq.gz" # READ2="input_R2.fastq.gz" # Uncomment if paired-end # Placeholder for output prefix OUTPUT_PREFIX="aligned_reads" # Run STAR alignment STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READ1}" \ --runThreadN 8 \ --outFileNamePrefix "${OUTPUT_PREFIX}_" \ --outSAMtype BAM SortedByCoordinate \ --outSAMattributes Standard -
3
Number of reads per transcript was determined using FeatureCounts (galaxy version 2.0.1+galaxy2)
featureCounts v2.0.1$ Bash example
# Install featureCounts (part of the Subread package) # conda install -c bioconda subread=2.0.1 # Define input and output files INPUT_BAM="aligned_reads.bam" # Placeholder for the input BAM file containing aligned reads ANNOTATION_GTF="Homo_sapiens.GRCh38.109.gtf" # Placeholder for a GTF annotation file (e.g., from Ensembl GRCh38) OUTPUT_COUNTS="transcript_counts.txt" # Output file for read counts # Run featureCounts to determine the number of reads per transcript (gene-level counts) # -a: Specify the annotation file (GTF/GFF format) # -o: Specify the output file for read counts # -t exon: Specify feature type to count (e.g., 'exon' is commonly used for gene-level counting) # -g gene_id: Specify attribute type in GTF/GFF annotation which contains gene identifiers # Note: While the description states 'per transcript', featureCounts is most commonly used for gene-level quantification. # For strictly isoform-specific quantification, tools like RSEM, Salmon, or Kallisto are typically employed. # This command assumes 'per transcript' refers to gene-level expression counts. featureCounts -a ${ANNOTATION_GTF} -o ${OUTPUT_COUNTS} -t exon -g gene_id ${INPUT_BAM} -
4
Differential gene expression was determined using DESeq2 (galaxy version 2.11.40.7+galaxy1)
$ Bash example
# Install DESeq2 (if not already installed) # conda install -c bioconda bioconductor-deseq2 # Create dummy count data (replace with actual featureCounts/HTSeq/Salmon/Kallisto output) # Example: counts.tsv with gene IDs as first column and sample counts in subsequent columns echo -e "gene\tsample1\tsample2\tsample3\tsample4" > counts.tsv echo -e "geneA\t100\t120\t50\t60" >> counts.tsv echo -e "geneB\t50\t60\t100\t110" >> counts.tsv echo -e "geneC\t200\t210\t180\t190" >> counts.tsv echo -e "geneD\t10\t12\t20\t22" >> counts.tsv # Create dummy sample information (replace with actual experimental design) # Example: sample_info.tsv with sample IDs as first column and experimental conditions in subsequent columns echo -e "sample\tcondition" > sample_info.tsv echo -e "sample1\tcontrol" >> sample_info.tsv echo -e "sample2\tcontrol" >> sample_info.tsv echo -e "sample3\ttreated" >> sample_info.tsv echo -e "sample4\ttreated" >> sample_info.tsv # Create an R script for DESeq2 analysis cat << 'EOF' > run_deseq2.R library(DESeq2) # Load count data # Ensure row names are gene IDs and column names are sample IDs count_data <- read.table("counts.tsv", header = TRUE, row.names = 1) # Load sample information # Ensure row names are sample IDs and column names are experimental factors sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1) # Ensure sample order in colData matches countData columns sample_info <- sample_info[colnames(count_data), , drop = FALSE] # Create DESeqDataSet object # The 'design' formula specifies the experimental design (e.g., ~ condition) dds <- DESeqDataSetFromMatrix(countData = count_data, colData = sample_info, design = ~ condition) # Run DESeq2 analysis dds <- DESeq(dds) # Get results for a specific contrast (e.g., 'treated' vs 'control' for the 'condition' factor) # Replace 'condition', 'treated', 'control' with your actual factor names and levels res <- results(dds, contrast = c("condition", "treated", "control")) # Order results by adjusted p-value res <- res[order(res$padj), ] # Save differential expression results write.csv(as.data.frame(res), file = "deseq2_results.csv") # Save normalized counts norm_counts <- counts(dds, normalized=TRUE) write.csv(as.data.frame(norm_counts), file = "deseq2_normalized_counts.csv") EOF # Execute the R script Rscript run_deseq2.R
Raw Source Text
Adaptors and sequence with a quality below 20 were removed using trimmomatic (galaxy version 0.38.0) Reads were aligned on mm10 genome using RNA star (galaxy version 2.7.8a+galaxy0) Number of reads per transcript was determined using FeatureCounts (galaxy version 2.0.1+galaxy2) Differential gene expression was determined using DESeq2 (galaxy version 2.11.40.7+galaxy1) Assembly: mm10 Supplementary files format and content: Featurecounts files, reads aligned on mm10 genome