GSE202202 Processing Pipeline

RNA-Seq code_examples 4 steps

Publication

Localized molecular chaperone synthesis maintains neuronal dendrite proteostasis.

Nature communications (2024) — PMID 39737952

Dataset

GSE202202

RNA-seq in neurites and soma of primary hippocampal neurons isolated from P0 or P1 mice

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Adaptors and sequence with a quality below 20 were removed using trimmomatic (galaxy version 0.38.0)

    Trimmomatic v0.38.0
    $ Bash example
    # Install Trimmomatic (if not already installed)
    # conda install -c bioconda trimmomatic=0.38.0
    
    # Define input/output files (placeholders, replace with actual file names)
    INPUT_R1="input_R1.fastq.gz"
    INPUT_R2="input_R2.fastq.gz"
    OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz"
    OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz"
    OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz"
    OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz"
    
    # Define Trimmomatic JAR path (adjust if needed, e.g., if installed via conda, it might be in $CONDA_PREFIX/share/trimmomatic-0.38.0-0/trimmomatic.jar)
    TRIMMOMATIC_JAR="/path/to/trimmomatic-0.38.jar" # REPLACE WITH ACTUAL PATH
    
    # Define adapter file path (adjust if needed, or download). 
    # A common adapter file is TruSeq3-PE.fa, often found in the Trimmomatic installation directory's 'adapters' folder.
    ADAPTER_FILE="/path/to/Trimmomatic-0.38/adapters/TruSeq3-PE.fa" # REPLACE WITH ACTUAL PATH OR DOWNLOAD
    
    # Execute Trimmomatic for paired-end reads
    java -jar "${TRIMMOMATIC_JAR}" PE \
        -phred33 \
        "${INPUT_R1}" "${INPUT_R2}" \
        "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
        "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
        ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
        LEADING:20 \
        TRAILING:20 \
        SLIDINGWINDOW:4:20 \
        MINLEN:36
  2. 2

    Reads were aligned on mm10 genome using RNA star (galaxy version 2.7.8a+galaxy0)

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Placeholder for STAR genome index directory
    # Replace with the actual path to your mm10 STAR index
    GENOME_DIR="/path/to/STAR_index/mm10"
    
    # Placeholder for input FASTQ file(s)
    # Replace with your actual input FASTQ file(s)
    READ1="input_R1.fastq.gz"
    # READ2="input_R2.fastq.gz" # Uncomment if paired-end
    
    # Placeholder for output prefix
    OUTPUT_PREFIX="aligned_reads"
    
    # Run STAR alignment
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READ1}" \
         --runThreadN 8 \
         --outFileNamePrefix "${OUTPUT_PREFIX}_" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard
  3. 3

    Number of reads per transcript was determined using FeatureCounts (galaxy version 2.0.1+galaxy2)

    featureCounts v2.0.1
    $ Bash example
    # Install featureCounts (part of the Subread package)
    # conda install -c bioconda subread=2.0.1
    
    # Define input and output files
    INPUT_BAM="aligned_reads.bam" # Placeholder for the input BAM file containing aligned reads
    ANNOTATION_GTF="Homo_sapiens.GRCh38.109.gtf" # Placeholder for a GTF annotation file (e.g., from Ensembl GRCh38)
    OUTPUT_COUNTS="transcript_counts.txt" # Output file for read counts
    
    # Run featureCounts to determine the number of reads per transcript (gene-level counts)
    # -a: Specify the annotation file (GTF/GFF format)
    # -o: Specify the output file for read counts
    # -t exon: Specify feature type to count (e.g., 'exon' is commonly used for gene-level counting)
    # -g gene_id: Specify attribute type in GTF/GFF annotation which contains gene identifiers
    # Note: While the description states 'per transcript', featureCounts is most commonly used for gene-level quantification.
    #       For strictly isoform-specific quantification, tools like RSEM, Salmon, or Kallisto are typically employed.
    #       This command assumes 'per transcript' refers to gene-level expression counts.
    featureCounts -a ${ANNOTATION_GTF} -o ${OUTPUT_COUNTS} -t exon -g gene_id ${INPUT_BAM}
  4. 4

    Differential gene expression was determined using DESeq2 (galaxy version 2.11.40.7+galaxy1)

    $ Bash example
    # Install DESeq2 (if not already installed)
    # conda install -c bioconda bioconductor-deseq2
    
    # Create dummy count data (replace with actual featureCounts/HTSeq/Salmon/Kallisto output)
    # Example: counts.tsv with gene IDs as first column and sample counts in subsequent columns
    echo -e "gene\tsample1\tsample2\tsample3\tsample4" > counts.tsv
    echo -e "geneA\t100\t120\t50\t60" >> counts.tsv
    echo -e "geneB\t50\t60\t100\t110" >> counts.tsv
    echo -e "geneC\t200\t210\t180\t190" >> counts.tsv
    echo -e "geneD\t10\t12\t20\t22" >> counts.tsv
    
    # Create dummy sample information (replace with actual experimental design)
    # Example: sample_info.tsv with sample IDs as first column and experimental conditions in subsequent columns
    echo -e "sample\tcondition" > sample_info.tsv
    echo -e "sample1\tcontrol" >> sample_info.tsv
    echo -e "sample2\tcontrol" >> sample_info.tsv
    echo -e "sample3\ttreated" >> sample_info.tsv
    echo -e "sample4\ttreated" >> sample_info.tsv
    
    # Create an R script for DESeq2 analysis
    cat << 'EOF' > run_deseq2.R
    library(DESeq2)
    
    # Load count data
    # Ensure row names are gene IDs and column names are sample IDs
    count_data <- read.table("counts.tsv", header = TRUE, row.names = 1)
    
    # Load sample information
    # Ensure row names are sample IDs and column names are experimental factors
    sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1)
    
    # Ensure sample order in colData matches countData columns
    sample_info <- sample_info[colnames(count_data), , drop = FALSE]
    
    # Create DESeqDataSet object
    # The 'design' formula specifies the experimental design (e.g., ~ condition)
    dds <- DESeqDataSetFromMatrix(countData = count_data,
                                  colData = sample_info,
                                  design = ~ condition)
    
    # Run DESeq2 analysis
    dds <- DESeq(dds)
    
    # Get results for a specific contrast (e.g., 'treated' vs 'control' for the 'condition' factor)
    # Replace 'condition', 'treated', 'control' with your actual factor names and levels
    res <- results(dds, contrast = c("condition", "treated", "control"))
    
    # Order results by adjusted p-value
    res <- res[order(res$padj), ]
    
    # Save differential expression results
    write.csv(as.data.frame(res), file = "deseq2_results.csv")
    
    # Save normalized counts
    norm_counts <- counts(dds, normalized=TRUE)
    write.csv(as.data.frame(norm_counts), file = "deseq2_normalized_counts.csv")
    
    EOF
    
    # Execute the R script
    Rscript run_deseq2.R

Tools Used

Raw Source Text
Adaptors and sequence with a quality below 20 were removed using trimmomatic (galaxy version 0.38.0)
Reads were aligned on mm10 genome using RNA star (galaxy version 2.7.8a+galaxy0)
Number of reads per transcript was determined using FeatureCounts (galaxy version 2.0.1+galaxy2)
Differential gene expression was determined using DESeq2 (galaxy version 2.11.40.7+galaxy1)
Assembly: mm10
Supplementary files format and content: Featurecounts files, reads aligned on mm10 genome
← Back to Analysis