GSE202202 Processing Pipeline

RNA-Seq code_examples 4 steps

Publication

Localized molecular chaperone synthesis maintains neuronal dendrite proteostasis.

Nature communications (2024) — PMID 39737952

Dataset

GSE202202

RNA-seq in neurites and soma of primary hippocampal neurons isolated from P0 or P1 mice

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Adaptors and sequence with a quality below 20 were removed using trimmomatic (galaxy version 0.38.0)

Trimmomatic v0.38.0

$ Bash example

# Install Trimmomatic (if not already installed)
# conda install -c bioconda trimmomatic=0.38.0

# Define input/output files (placeholders, replace with actual file names)
INPUT_R1="input_R1.fastq.gz"
INPUT_R2="input_R2.fastq.gz"
OUTPUT_R1_PAIRED="output_R1_paired.fastq.gz"
OUTPUT_R1_UNPAIRED="output_R1_unpaired.fastq.gz"
OUTPUT_R2_PAIRED="output_R2_paired.fastq.gz"
OUTPUT_R2_UNPAIRED="output_R2_unpaired.fastq.gz"

# Define Trimmomatic JAR path (adjust if needed, e.g., if installed via conda, it might be in $CONDA_PREFIX/share/trimmomatic-0.38.0-0/trimmomatic.jar)
TRIMMOMATIC_JAR="/path/to/trimmomatic-0.38.jar" # REPLACE WITH ACTUAL PATH

# Define adapter file path (adjust if needed, or download). 
# A common adapter file is TruSeq3-PE.fa, often found in the Trimmomatic installation directory's 'adapters' folder.
ADAPTER_FILE="/path/to/Trimmomatic-0.38/adapters/TruSeq3-PE.fa" # REPLACE WITH ACTUAL PATH OR DOWNLOAD

# Execute Trimmomatic for paired-end reads
java -jar "${TRIMMOMATIC_JAR}" PE \
    -phred33 \
    "${INPUT_R1}" "${INPUT_R2}" \
    "${OUTPUT_R1_PAIRED}" "${OUTPUT_R1_UNPAIRED}" \
    "${OUTPUT_R2_PAIRED}" "${OUTPUT_R2_UNPAIRED}" \
    ILLUMINACLIP:"${ADAPTER_FILE}":2:30:10 \
    LEADING:20 \
    TRAILING:20 \
    SLIDINGWINDOW:4:20 \
    MINLEN:36

Reads were aligned on mm10 genome using RNA star (galaxy version 2.7.8a+galaxy0)

STAR v2.7.8a GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Placeholder for STAR genome index directory
# Replace with the actual path to your mm10 STAR index
GENOME_DIR="/path/to/STAR_index/mm10"

# Placeholder for input FASTQ file(s)
# Replace with your actual input FASTQ file(s)
READ1="input_R1.fastq.gz"
# READ2="input_R2.fastq.gz" # Uncomment if paired-end

# Placeholder for output prefix
OUTPUT_PREFIX="aligned_reads"

# Run STAR alignment
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READ1}" \
     --runThreadN 8 \
     --outFileNamePrefix "${OUTPUT_PREFIX}_" \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard

View on GitHub

Number of reads per transcript was determined using FeatureCounts (galaxy version 2.0.1+galaxy2)

featureCounts v2.0.1

$ Bash example

# Install featureCounts (part of the Subread package)
# conda install -c bioconda subread=2.0.1

# Define input and output files
INPUT_BAM="aligned_reads.bam" # Placeholder for the input BAM file containing aligned reads
ANNOTATION_GTF="Homo_sapiens.GRCh38.109.gtf" # Placeholder for a GTF annotation file (e.g., from Ensembl GRCh38)
OUTPUT_COUNTS="transcript_counts.txt" # Output file for read counts

# Run featureCounts to determine the number of reads per transcript (gene-level counts)
# -a: Specify the annotation file (GTF/GFF format)
# -o: Specify the output file for read counts
# -t exon: Specify feature type to count (e.g., 'exon' is commonly used for gene-level counting)
# -g gene_id: Specify attribute type in GTF/GFF annotation which contains gene identifiers
# Note: While the description states 'per transcript', featureCounts is most commonly used for gene-level quantification.
#       For strictly isoform-specific quantification, tools like RSEM, Salmon, or Kallisto are typically employed.
#       This command assumes 'per transcript' refers to gene-level expression counts.
featureCounts -a ${ANNOTATION_GTF} -o ${OUTPUT_COUNTS} -t exon -g gene_id ${INPUT_BAM}

Differential gene expression was determined using DESeq2 (galaxy version 2.11.40.7+galaxy1)

DESeq2 v2.11.40 GitHub

$ Bash example

# Install DESeq2 (if not already installed)
# conda install -c bioconda bioconductor-deseq2

# Create dummy count data (replace with actual featureCounts/HTSeq/Salmon/Kallisto output)
# Example: counts.tsv with gene IDs as first column and sample counts in subsequent columns
echo -e "gene\tsample1\tsample2\tsample3\tsample4" > counts.tsv
echo -e "geneA\t100\t120\t50\t60" >> counts.tsv
echo -e "geneB\t50\t60\t100\t110" >> counts.tsv
echo -e "geneC\t200\t210\t180\t190" >> counts.tsv
echo -e "geneD\t10\t12\t20\t22" >> counts.tsv

# Create dummy sample information (replace with actual experimental design)
# Example: sample_info.tsv with sample IDs as first column and experimental conditions in subsequent columns
echo -e "sample\tcondition" > sample_info.tsv
echo -e "sample1\tcontrol" >> sample_info.tsv
echo -e "sample2\tcontrol" >> sample_info.tsv
echo -e "sample3\ttreated" >> sample_info.tsv
echo -e "sample4\ttreated" >> sample_info.tsv

# Create an R script for DESeq2 analysis
cat << 'EOF' > run_deseq2.R
library(DESeq2)

# Load count data
# Ensure row names are gene IDs and column names are sample IDs
count_data <- read.table("counts.tsv", header = TRUE, row.names = 1)

# Load sample information
# Ensure row names are sample IDs and column names are experimental factors
sample_info <- read.table("sample_info.tsv", header = TRUE, row.names = 1)

# Ensure sample order in colData matches countData columns
sample_info <- sample_info[colnames(count_data), , drop = FALSE]

# Create DESeqDataSet object
# The 'design' formula specifies the experimental design (e.g., ~ condition)
dds <- DESeqDataSetFromMatrix(countData = count_data,
                              colData = sample_info,
                              design = ~ condition)

# Run DESeq2 analysis
dds <- DESeq(dds)

# Get results for a specific contrast (e.g., 'treated' vs 'control' for the 'condition' factor)
# Replace 'condition', 'treated', 'control' with your actual factor names and levels
res <- results(dds, contrast = c("condition", "treated", "control"))

# Order results by adjusted p-value
res <- res[order(res$padj), ]

# Save differential expression results
write.csv(as.data.frame(res), file = "deseq2_results.csv")

# Save normalized counts
norm_counts <- counts(dds, normalized=TRUE)
write.csv(as.data.frame(norm_counts), file = "deseq2_normalized_counts.csv")

EOF

# Execute the R script
Rscript run_deseq2.R

View on GitHub

Tools Used

STAR DESeq2

Raw Source Text

Adaptors and sequence with a quality below 20 were removed using trimmomatic (galaxy version 0.38.0)
Reads were aligned on mm10 genome using RNA star (galaxy version 2.7.8a+galaxy0)
Number of reads per transcript was determined using FeatureCounts (galaxy version 2.0.1+galaxy2)
Differential gene expression was determined using DESeq2 (galaxy version 2.11.40.7+galaxy1)
Assembly: mm10
Supplementary files format and content: Featurecounts files, reads aligned on mm10 genome

← Back to Analysis