GSE237099 Processing Pipeline

RNA-Seq code_examples 3 steps

Publication

Zfp697 is an RNA-binding protein that regulates skeletal muscle inflammation and remodeling.

Proceedings of the National Academy of Sciences of the United States of America (2024) — PMID 39141348

Dataset

Analysis of skeletal muscle gene expression upon mouse hindlimb unloading and reloading

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Generate Jupyter Notebook

Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to GRCm38 whole genome using STAR 2.7.3a with extra parameters --outFilterScoreMinOverLread 0 --outFilterMatchNmin 0 --outFilterMatchNminOverLread 0 --outFilterMismatchNmax 33 --seedSearchStartLmax 12 --alignSJoverhangMin 8

STAR v2.7.3a GitHub

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables for clarity
GENOME_DIR="/path/to/GRCm38/STAR_index" # Placeholder for GRCm38 STAR index
READS_R1="input_reads_R1.fastq.gz" # Placeholder for input forward reads
READS_R2="input_reads_R2.fastq.gz" # Placeholder for input reverse reads (assuming paired-end)
OUTPUT_PREFIX="aligned_reads" # Prefix for output files
THREADS=8 # Number of threads to use

# Run STAR alignment
STAR --genomeDir "${GENOME_DIR}" \
     --readFilesIn "${READS_R1}" "${READS_R2}" \
     --runThreadN "${THREADS}" \
     --outFileNamePrefix "${OUTPUT_PREFIX}" \
     --outFilterScoreMinOverLread 0 \
     --outFilterMatchNmin 0 \
     --outFilterMatchNminOverLread 0 \
     --outFilterMismatchNmax 33 \
     --seedSearchStartLmax 12 \
     --alignSJoverhangMin 8 \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMunmapped Within \
     --outSAMattributes All \
     --readFilesCommand zcat

View on GitHub

Reads aligning to gene exons (Mus_musculus.GRCm38.102.gtf) were counted using featureCounts v2.0.3 program from Subread package, summarized at gene level with extra parameters -s 0 --countReadPairs.

featureCounts v2.0.3 GitHub

$ Bash example

# Install Subread package (which includes featureCounts)
# conda install -c bioconda subread

# Define input and output files, and reference annotation
INPUT_BAM="input.bam" # Placeholder for your aligned BAM file(s)
GTF_FILE="/path/to/Mus_musculus.GRCm38.102.gtf" # Placeholder for the Mus_musculus GRCm38.102 GTF file
OUTPUT_COUNTS="gene_counts.txt"

# Run featureCounts
featureCounts -a "${GTF_FILE}" -o "${OUTPUT_COUNTS}" -s 0 --countReadPairs "${INPUT_BAM}"

View on GitHub

Differential gene expression at FDR level 0.05 was obtained using DESeq2 R package, genes with total raw counts less then 1 across all samples were excluded from analysis

DESeq2 v1.40.2 (Inferred with models/gemini-2.5-flash) GitHub

$ Bash example

# Install R and Bioconductor if not already present
# For Ubuntu/Debian:
# sudo apt-get update
# sudo apt-get install r-base
# R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager'); BiocManager::install('DESeq2')"

# Create dummy input files for demonstration purposes.
# In a real scenario, replace these with your actual count matrix and sample metadata.

# counts_matrix.csv: A CSV file where rows are genes and columns are samples, containing raw counts.
# The first column should be gene IDs.
echo "gene_id,sample1,sample2,sample3,sample4" > counts_matrix.csv
echo "geneA,100,120,110,130" >> counts_matrix.csv
echo "geneB,50,60,55,65" >> counts_matrix.csv
echo "geneC,0,0,0,0" >> counts_matrix.csv # This gene will be filtered out as its total raw count is 0
echo "geneD,200,210,190,220" >> counts_matrix.csv
echo "geneE,5,0,3,2" >> counts_matrix.csv # This gene will NOT be filtered (sum = 10 > 0)

# sample_metadata.csv: A CSV file where rows are samples and columns are experimental factors.
# The first column should be sample names, matching the column names in counts_matrix.csv.
echo "sample,condition" > sample_metadata.csv
echo "sample1,control" >> sample_metadata.csv
echo "sample2,treatment" >> sample_metadata.csv
echo "sample3,control" >> sample_metadata.csv
echo "sample4,treatment" >> sample_metadata.csv

# R script for DESeq2 analysis
cat << 'EOF' > run_deseq2.R
library(DESeq2)

# Load count data
# Assuming the first column is gene_id and subsequent columns are sample counts
count_data <- read.csv("counts_matrix.csv", row.names = 1, check.names = FALSE)

# Load sample metadata
# Assuming the first column is sample name and subsequent columns are experimental factors
coldata <- read.csv("sample_metadata.csv", row.names = 1)

# Ensure sample names match and are in the same order
coldata <- coldata[colnames(count_data), , drop = FALSE]
stopifnot(all(rownames(coldata) == colnames(count_data)))

# Filter out genes with total raw counts less than 1 across all samples (i.e., sum of counts is 0)
# This implements the description: "genes with total raw counts less then 1 across all samples were excluded from analysis"
keep <- rowSums(count_data) >= 1
count_data_filtered <- count_data[keep, ]

# Create DESeqDataSet object
# The design formula should reflect the experimental design, e.g., ~ condition
dds <- DESeqDataSetFromMatrix(countData = count_data_filtered,
                              colData = coldata,
                              design = ~ condition)

# Run DESeq2 analysis
dds <- DESeq(dds)

# Get results with FDR level 0.05 (alpha = 0.05)
# This implements the description: "Differential gene expression at FDR level 0.05 was obtained"
res <- results(dds, alpha = 0.05)

# Order results by adjusted p-value
res_ordered <- res[order(res$padj), ]

# Save results to a CSV file
write.csv(as.data.frame(res_ordered), "deseq2_results_fdr0.05.csv")

message("DESeq2 analysis complete. Results saved to deseq2_results_fdr0.05.csv")
EOF

# Execute the R script
Rscript run_deseq2.R

View on GitHub

Tools Used

STAR DESeq2

Raw Source Text

Sequenced reads were trimmed for adaptor sequence, and masked for low-complexity or low-quality sequence, then mapped to GRCm38 whole genome using STAR 2.7.3a with extra parameters --outFilterScoreMinOverLread 0   --outFilterMatchNmin 0   --outFilterMatchNminOverLread 0   --outFilterMismatchNmax 33   --seedSearchStartLmax 12   --alignSJoverhangMin 8
Reads aligning to gene exons (Mus_musculus.GRCm38.102.gtf) were counted using featureCounts v2.0.3 program from Subread package, summarized at gene level with extra parameters -s 0 --countReadPairs.
Differential gene expression at FDR level 0.05 was obtained using DESeq2 R package, genes with total raw counts less then 1 across all samples were excluded from analysis
Assembly: GRCm38.p6
Supplementary files format and content: tab-delimited text file includes raw read counts for all samples
Supplementary files format and content: tab-delimited text files includes differetial gene expression analysis output from DESeq2

← Back to Analysis