GSE299099 Processing Pipeline
RNA-Seq
3 steps
Publication
Structural and mechanistic analysis of covalent ligands targeting the RNA-binding protein NONO. Cell Chemical Biology (2026), PMID 41534524
Dataset
GSE299099: Transcriptome changes in MCF7 cells after treatment with NONO ligands and controls
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Transcript abundance was quantified using Salmon [v1.3.0] with GENCODE v37 annotation.
Salmon v1.3.0
$ Bash example
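# Build the Salmon index. -t expects a transcriptome FASTA (the file name here is a placeholder);
# -i names the output index directory and -k sets the k-mer size.
salmon index -t hg19.fa -i salmon_index -k 31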
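The example above only builds the index, while the step itself describes quantification. As a minimal sketch of the quantification call, the script below prints one `salmon quant` command per sample without running it. The sample IDs, directory names, FASTQ naming scheme, thread count, and the assumption of paired-end reads are all hypothetical and not taken from the source.

```shell
set -eu

# All values below are hypothetical placeholders, not taken from the dataset.
SAMPLES="sample1 sample2"   # sample IDs
INDEX_DIR="salmon_index"    # directory produced by `salmon index`
FASTQ_DIR="fastq"           # assumed location of paired-end FASTQ files
OUT_DIR="salmon_quant"      # one output subdirectory per sample
THREADS=8

# Print the `salmon quant` command for one paired-end sample ($1 = sample ID).
# -l A lets Salmon infer the library type; -1/-2 are the two mate files.
build_quant_cmd() {
  printf 'salmon quant -i %s -l A -1 %s -2 %s -p %s -o %s\n' \
    "$INDEX_DIR" \
    "$FASTQ_DIR/${1}_R1.fastq.gz" \
    "$FASTQ_DIR/${1}_R2.fastq.gz" \
    "$THREADS" \
    "$OUT_DIR/${1}"
}

for s in $SAMPLES; do
  cmd="$(build_quant_cmd "$s")"
  echo "$cmd"       # dry run: only print the command
  # eval "$cmd"     # uncomment to run Salmon for real
done
```

Printing the commands first makes it easy to check paths before committing compute time. For single-end libraries, the `-1`/`-2` pair would be replaced by `-r`.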
2
Gene-level quantification was performed using tximeta [v1.8.4].
$ Bash example
# Install R and Bioconductor packages if not already present
# R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')"
# R -e "BiocManager::install('tximeta')"
# R -e "BiocManager::install('readr')"  # For read_tsv

# Create a dummy samples.tsv file for demonstration
# This file would typically contain sample IDs, condition, and paths to quantification directories
echo -e "sample\tcondition\tquant_dir" > samples.tsv
echo -e "sample1\tcontrol\t./salmon_quant_dir1" >> samples.tsv
echo -e "sample2\ttreated\t./salmon_quant_dir2" >> samples.tsv

# Placeholder for the Salmon quantification directories
# In a real scenario, these would be generated by the prior quantification step.
# mkdir -p salmon_quant_dir1 salmon_quant_dir2
# touch salmon_quant_dir1/quant.sf  # Placeholder for Salmon output
# touch salmon_quant_dir2/quant.sf  # Placeholder for Salmon output

# Reference dataset: the transcriptome and annotation used to build the Salmon index
# are what tximeta uses to build or retrieve a matching TxDb object.
# For this dataset the source text reports GENCODE v37 annotation and assembly HG19.

# R script to perform gene-level quantification using tximeta
Rscript -e '
library(tximeta)
library(SummarizedExperiment)  # provides assays()
library(readr)

# Read sample metadata as a plain data.frame
coldata <- as.data.frame(read_tsv("samples.tsv"))

# tximeta requires a "files" column (paths to quant.sf) and a "names" column (sample IDs)
coldata$files <- file.path(coldata$quant_dir, "quant.sf")
coldata$names <- coldata$sample

# Import quantification data with metadata
# tximeta recognizes the reference transcriptome if the index was built from a known FASTA
# and will then try to build or load the corresponding TxDb object.
se <- tximeta(coldata)

# Summarize to gene level (requires the TxDb object identified above)
gse <- summarizeToGene(se)

# Save results: gene-level counts and TPMs
counts_matrix <- assays(gse)$counts
tpm_matrix <- assays(gse)$abundance
write.csv(counts_matrix, "gene_counts.csv")
write.csv(tpm_matrix, "gene_tpm.csv")

# Optionally, save the SummarizedExperiment object for further analysis
saveRDS(gse, "gene_level_summarized_experiment.rds")
'
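To make the gene-level step less opaque, here is a minimal sketch of the core idea behind the summarization: transcript-level TPM and read counts are summed across all transcripts of the same gene. It is not a replacement for tximeta, which also derives the transcript-to-gene mapping from the annotation and computes abundance-weighted average transcript lengths. The transcript and gene IDs, the numbers, and the `tx2gene.tsv` mapping file are toy values invented for the demonstration.

```shell
set -eu

workdir="$(mktemp -d)"

# Toy transcript-to-gene map (hypothetical IDs): transcript <TAB> gene
printf 'tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n' > "$workdir/tx2gene.tsv"

# Toy quant.sf using Salmon's column layout (values are made up)
{
  printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n'
  printf 'tx1\t1000\t800\t10.0\t100\n'
  printf 'tx2\t2000\t1800\t5.0\t50\n'
  printf 'tx3\t1500\t1300\t20.0\t200\n'
} > "$workdir/quant.sf"

# Sum TPM (column 4) and NumReads (column 5) per gene.
# $1 = transcript-to-gene map, $2 = quant.sf
summarize_to_gene() {
  printf 'gene\tTPM\tNumReads\n'
  awk -F'\t' '
    NR == FNR { gene[$1] = $2; next }   # first file: remember transcript -> gene
    FNR == 1  { next }                  # second file: skip the quant.sf header
    ($1 in gene) { tpm[gene[$1]] += $4; reads[gene[$1]] += $5 }
    END { for (g in tpm) printf "%s\t%.1f\t%.1f\n", g, tpm[g], reads[g] }
  ' "$1" "$2" | sort
}

summarize_to_gene "$workdir/tx2gene.tsv" "$workdir/quant.sf"
```

On the toy input, the two `geneA` transcripts collapse into a single row whose TPM and read count are the sums of the transcript-level values.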
3
Differential gene expression was analyzed using DESeq2 [v1.30.1].
$ Bash example
# Install DESeq2 (R package) via Bioconda
# conda install -c bioconda bioconductor-deseq2=1.30.1

# Create a placeholder R script for DESeq2 analysis
cat << 'EOF' > deseq2_analysis.R
#!/usr/bin/env Rscript

# Load DESeq2 library
library(DESeq2)

# --- Placeholder for input files ---
# Replace with actual paths to your count matrix and sample information.
# The count matrix should have genes/features as rows and samples as columns
# (for example, the gene-level counts written in the previous step).
# The sample information file should have samples as rows and metadata (e.g., 'condition') as columns.
count_matrix_file <- "counts.csv"
sample_info_file <- "sample_info.csv"
output_results_file <- "deseq2_results.csv"

# --- Load data ---
# Assumes raw counts with samples as columns and genes as rows.
# Adjust 'row.names' and 'sep' as needed for your file format.
# For example, if your count matrix is tab-separated and has gene IDs in the first column:
# count_data <- read.delim(count_matrix_file, row.names = 1, sep = "\t")
count_data <- read.csv(count_matrix_file, row.names = 1)

# Load sample information
# For example, if your sample info is tab-separated and has sample IDs in the first column:
# sample_info <- read.delim(sample_info_file, row.names = 1, sep = "\t")
sample_info <- read.csv(sample_info_file, row.names = 1)

# Ensure every sample in the count data has metadata, then put the metadata in the same order
stopifnot(all(colnames(count_data) %in% rownames(sample_info)))
sample_info <- sample_info[colnames(count_data), , drop = FALSE]

# --- Create DESeqDataSet object ---
# Design formula: ~ condition is a common example.
# Replace 'condition' with the actual column name in your sample_info that defines your experimental groups.
# Ensure 'condition' is a factor.
sample_info$condition <- factor(sample_info$condition)

dds <- DESeqDataSetFromMatrix(countData = round(count_data),  # DESeq2 expects integer counts
                              colData = sample_info,
                              design = ~ condition)

# --- Run DESeq2 analysis ---
message("Running DESeq2 analysis...")
dds <- DESeq(dds)
message("DESeq2 analysis complete.")

# --- Extract results ---
# Replace 'condition_groupA_vs_groupB' with your actual contrast.
# For example, if 'condition' has levels 'treated' and 'control', you might use:
# res <- results(dds, contrast = c("condition", "treated", "control"))
# If you just want the default comparison (last level vs first level of the factor):
res <- results(dds)

# Order results by adjusted p-value
res_ordered <- res[order(res$padj), ]

# --- Save results ---
write.csv(as.data.frame(res_ordered), file = output_results_file)
message(paste("DESeq2 results saved to:", output_results_file))
EOF

# Execute the R script
Rscript deseq2_analysis.R
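The DESeq2 script expects a comma-separated sample table whose row names match the columns of the count matrix. As a minimal sketch, the snippet below derives such a table from a tab-separated sample sheet with `sample`, `condition`, and `quant_dir` columns, then checks that the sample IDs agree with the header of a count matrix written by R's `write.csv()`. The file names, IDs, and counts are toy values, and it is an assumption that the count matrix columns are named by sample ID.

```shell
set -eu

# Convert a sample sheet (sample <TAB> condition <TAB> ...) to CSV with two columns.
# $1 = path to the tab-separated sample sheet
make_sample_info() {
  awk -F'\t' -v OFS=',' '
    NR == 1 { print "sample", "condition"; next }   # rewrite the header
    NF >= 2 { print $1, $2 }                        # keep sample ID and condition
  ' "$1"
}

# List sample IDs from the header of a write.csv()-style count matrix.
# The first header field is the empty row-name column, so it is dropped.
count_matrix_samples() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | tail -n +2
}

# Toy inputs for demonstration (made-up values)
workdir="$(mktemp -d)"
{
  printf 'sample\tcondition\tquant_dir\n'
  printf 'sample1\tcontrol\t./salmon_quant_dir1\n'
  printf 'sample2\ttreated\t./salmon_quant_dir2\n'
} > "$workdir/samples.tsv"
printf '"","sample1","sample2"\n"geneA",150,90\n"geneB",200,410\n' > "$workdir/gene_counts.csv"

make_sample_info "$workdir/samples.tsv" > "$workdir/sample_info.csv"

# Compare sample IDs (and their order) between the two files
info_ids="$(tail -n +2 "$workdir/sample_info.csv" | cut -d',' -f1)"
matrix_ids="$(count_matrix_samples "$workdir/gene_counts.csv")"
if [ "$info_ids" = "$matrix_ids" ]; then
  echo "sample IDs match"
else
  echo "sample ID mismatch" >&2
  exit 1
fi
```

Checking the IDs before calling DESeq2 catches mislabeled or reordered samples early, which would otherwise silently assign the wrong condition to a column.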
Raw Source Text
Transcript abundance was quantified using Salmon [v1.3.0] with GENCODE v37 annotation. Gene level quantification was performed using tximeta [v1.8.4]. Differential gene expression was analyzed by DESeq2 [v1.30.1]
Assembly: HG19
Supplementary files format and content: Feature counts for differential expression