GSE299099 Processing Pipeline

RNA-Seq, 3 steps

Publication

Structural and mechanistic analysis of covalent ligands targeting the RNA-binding protein NONO.

Cell Chemical Biology (2026), PMID 41534524

Dataset

Transcriptome changes in MCF7 cells after treatment with NONO ligands and controls

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1
Transcript abundance was quantified using Salmon [v1.3.0] with GENCODE v37 annotation.

Salmon v1.3.0
$ Bash example
```
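# -t expects a transcript FASTA (the file name here is a placeholder); -i names the output index directory
salmon index -t hg19.fa -i salmon_index -k 31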
```
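
Building the index is only the first half of the Salmon step; reads are then quantified against it with `salmon quant`. The sketch below assumes paired-end FASTQ files. The file names, thread count, and output directory are hypothetical placeholders, since the source does not describe the library layout or the options that were used. It runs Salmon only when the binary and inputs are present, and otherwise prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: none of these names come from the source.
index_dir="salmon_index"            # directory produced by `salmon index`
reads_1="sample1_R1.fastq.gz"       # assumes paired-end reads
reads_2="sample1_R2.fastq.gz"
out_dir="salmon_quant_dir1"         # quant.sf is written here

# Build the command as an array so it can be inspected before it runs.
# -l A asks Salmon to infer the library type automatically.
quant_cmd=(salmon quant -i "$index_dir" -l A
           -1 "$reads_1" -2 "$reads_2"
           -p 4 -o "$out_dir")

if command -v salmon >/dev/null 2>&1 && [ -d "$index_dir" ] \
   && [ -f "$reads_1" ] && [ -f "$reads_2" ]; then
  "${quant_cmd[@]}"
else
  # Binary or inputs missing: show what would be executed.
  echo "Dry run: ${quant_cmd[*]}"
fi
```

Each sample gets its own output directory, and the `quant.sf` file inside it is what the gene-level step reads.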

Gene-level quantification was performed using tximeta [v1.8.4].

tximeta v1.8.4

$ Bash example

# Install R and Bioconductor if not already present
# R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')"
# R -e "BiocManager::install('tximeta')"
# R -e "BiocManager::install('readr')" # For read_tsv

# Create a dummy samples.tsv file for demonstration
# This file would typically contain sample IDs, condition, and paths to quantification directories
echo -e "sample\tcondition\tquant_dir" > samples.tsv
echo -e "sample1\tcontrol\t./salmon_quant_dir1" >> samples.tsv
echo -e "sample2\ttreated\t./salmon_quant_dir2" >> samples.tsv

# Placeholder for quantification directories (e.g., from Salmon, Kallisto, RSEM)
# In a real scenario, these would be generated by a prior quantification step.
# mkdir -p salmon_quant_dir1 salmon_quant_dir2
# touch salmon_quant_dir1/quant.sf # Placeholder for Salmon output
# touch salmon_quant_dir2/quant.sf # Placeholder for Salmon output

# Reference dataset: the transcriptome and annotation used to build the Salmon index
# are what tximeta uses to build or retrieve a matching TxDb object.
# For this dataset, the source text reports GENCODE v37 annotation and assembly HG19.

# R script to perform gene-level quantification using tximeta
Rscript -e '
  library(tximeta)
  library(SummarizedExperiment) # provides assays(), used below
  library(readr)

  # Read sample metadata
  coldata <- read_tsv("samples.tsv")

  # Define paths to quantification files (e.g., Salmon quant.sf)
  # tximeta requires a "files" column pointing to the quantification output
  # and a "names" column holding the sample identifiers
  coldata$files <- file.path(coldata$quant_dir, "quant.sf")
  coldata$names <- coldata$sample

  # Import quantification data with metadata
  # tximeta will automatically detect the reference transcriptome if it was indexed with a known FASTA/GTF
  # and will try to build/load a TxDb object from Bioconductor AnnotationHub.
  se <- tximeta(coldata)

  # Summarize to gene level
  # This step requires a TxDb object, which tximeta tries to automatically create/load
  # based on the reference used for quantification.
  gse <- summarizeToGene(se)

  # Save results (e.g., gene-level counts and TPMs)
  # For demonstration, we extract counts and TPMs
  counts_matrix <- assays(gse)$counts
  tpm_matrix <- assays(gse)$abundance

  write.csv(counts_matrix, "gene_counts.csv")
  write.csv(tpm_matrix, "gene_tpm.csv")

  # Optionally, save the SummarizedExperiment object for further analysis
  saveRDS(gse, "gene_level_summarized_experiment.rds")
'
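
Conceptually, summarizing to gene level means adding up the estimated counts and TPMs of all transcripts that belong to the same gene. The sketch below illustrates that idea with `awk` on made-up data. The transcript IDs, gene IDs, values, and file names are hypothetical, and this is not how tximeta is implemented: tximeta also attaches annotation metadata and handles transcript lengths, which this toy example ignores.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy transcript-to-gene map (hypothetical IDs), tab-separated.
printf 'tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n' > toy_tx2gene.tsv

# Toy table in the column layout of Salmon's quant.sf (made-up values).
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' > toy_quant.sf
printf 'tx1\t1000\t800\t10.0\t100\n' >> toy_quant.sf
printf 'tx2\t1500\t1300\t5.0\t50\n' >> toy_quant.sf
printf 'tx3\t2000\t1800\t20.0\t300\n' >> toy_quant.sf

# First file: remember each transcript's gene.
# Second file: skip the header, then add TPM (column 4) and
# NumReads (column 5) to the running totals of the matching gene.
{
  printf 'gene\tTPM\tNumReads\n'
  awk -F'\t' -v OFS='\t' '
    NR == FNR    { gene[$1] = $2; next }
    FNR == 1     { next }
    ($1 in gene) { tpm[gene[$1]] += $4; reads[gene[$1]] += $5 }
    END          { for (g in tpm) print g, tpm[g], reads[g] }
  ' toy_tx2gene.tsv toy_quant.sf | sort
} > toy_gene_level.tsv

cat toy_gene_level.tsv
```

For the toy data, the two transcripts of `geneA` add up to 15 TPM and 150 reads, while `geneB` keeps the values of its single transcript.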

Differential gene expression was analyzed with DESeq2 [v1.30.1].

DESeq2 v1.30.1

$ Bash example

# Install DESeq2 (R package) via Bioconda
# conda install -c bioconda bioconductor-deseq2=1.30.1

# Create a placeholder R script for DESeq2 analysis
cat << 'EOF' > deseq2_analysis.R
#!/usr/bin/env Rscript

# Load DESeq2 library
library(DESeq2)

# --- Placeholder for input files ---
# Replace with actual paths to your count matrix and sample information
# The count matrix should have genes/features as rows and samples as columns.
# The sample information file should have samples as rows and metadata (e.g., 'condition') as columns.
count_matrix_file <- "counts.csv"
sample_info_file <- "sample_info.csv"
output_results_file <- "deseq2_results.csv"

# --- Load data ---
# Assuming counts are raw counts (integers) and samples are columns, genes are rows
# Adjust 'row.names' and 'sep' as needed for your file format
# For example, if your count matrix is tab-separated and has gene IDs in the first column:
# count_data <- read.delim(count_matrix_file, row.names = 1, sep = "\t")
count_data <- read.csv(count_matrix_file, row.names = 1)

# Load sample information
# For example, if your sample info is tab-separated and has sample IDs in the first column:
# sample_info <- read.delim(sample_info_file, row.names = 1, sep = "\t")
sample_info <- read.csv(sample_info_file, row.names = 1)

# Ensure sample names match between count data and sample info,
# then put the sample info rows in the same order as the count columns
stopifnot(all(colnames(count_data) %in% rownames(sample_info)))
sample_info <- sample_info[colnames(count_data), , drop = FALSE]

# --- Create DESeqDataSet object ---
# Design formula: ~ condition is a common example.
# Replace 'condition' with the actual column name in your sample_info that defines your experimental groups.
# Ensure 'condition' is a factor.
sample_info$condition <- factor(sample_info$condition)
dds <- DESeqDataSetFromMatrix(countData = round(count_data), # DESeq2 expects integer counts
                              colData = sample_info,
                              design = ~ condition)

# --- Run DESeq2 analysis ---
message("Running DESeq2 analysis...")
dds <- DESeq(dds)
message("DESeq2 analysis complete.")

# --- Extract results ---
# To test a specific contrast, name the factor and the two levels explicitly.
# For example, if 'condition' has levels 'treated' and 'control', you might use:
# res <- results(dds, contrast=c("condition", "treated", "control"))
# If you just want the default comparison (last level vs first level of the factor):
res <- results(dds)

# Order results by adjusted p-value
res_ordered <- res[order(res$padj),]

# --- Save results ---
write.csv(as.data.frame(res_ordered), file = output_results_file)

message(paste("DESeq2 results saved to:", output_results_file))
EOF

# Execute the R script
Rscript deseq2_analysis.R
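
Once the results table is written, a common follow-up is to pull out the genes that pass significance thresholds. The sketch below filters a file in the layout produced by `write.csv(as.data.frame(res_ordered))`. The cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are illustrative assumptions that are not stated in the source, and the toy rows and file names are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy results file (made-up values) mimicking write.csv output from DESeq2:
# quoted header, quoted gene IDs in the first column, NA for genes without padj.
cat > toy_deseq2_results.csv << 'EOF'
"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"geneA",500.2,2.1,0.30,7.0,2.5e-12,1.0e-10
"geneB",120.7,-1.6,0.40,-4.0,6.3e-05,0.001
"geneC",80.3,0.4,0.35,1.14,0.25,0.40
"geneD",3.1,3.0,1.90,1.58,0.11,NA
EOF

padj_cutoff=0.05   # assumed threshold, not from the source
lfc_cutoff=1       # assumed threshold, not from the source

# Keep the header, drop rows whose padj is NA, then keep rows with
# padj below the cutoff and |log2FoldChange| at or above the cutoff.
# Columns: 3 = log2FoldChange, 7 = padj.
awk -F',' -v p="$padj_cutoff" -v l="$lfc_cutoff" '
  NR == 1    { print; next }
  $7 == "NA" { next }
  {
    lfc = $3 + 0
    if (lfc < 0) lfc = -lfc
    if ($7 + 0 < p && lfc >= l) print
  }
' toy_deseq2_results.csv > toy_significant_genes.csv

cat toy_significant_genes.csv
```

Splitting on commas is adequate here because gene identifiers rarely contain commas; a file with commas inside quoted fields would need a proper CSV parser instead.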

Tools Used

Salmon

tximeta

DESeq2

Raw Source Text

Transcript abundance was quantified using Salmon [v1.3.0] with GENCODE v37 annotation.
Gene level quantification was performed using tximeta [v1.8.4].
Differential gene expression was analyzed by DESeq2 [v1.30.1]
Assembly: HG19
Supplementary files format and content: Feature counts for differential expression
