GSE299099 Processing Pipeline

RNA-Seq, 3 steps

Publication

Structural and mechanistic analysis of covalent ligands targeting the RNA-binding protein NONO.

Cell Chemical Biology (2026), PMID 41534524

Dataset

GSE299099

Transcriptome changes in MCF7 cells after treatment with NONO ligands and controls

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Transcript abundance was quantified using Salmon [v1.3.0] with GENCODE v37 annotation.

    Salmon v1.3.0
    $ Bash example
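    # -t expects a transcriptome FASTA, not a genome FASTA; transcripts.fa is a placeholder for the GENCODE v37 transcript sequences
    salmon index -t transcripts.fa -i salmon_index -k 31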
  2.

    Gene level quantification was performed using tximeta [v1.8.4].

    tximeta v1.8.4
    $ Bash example
    # Install R and Bioconductor if not already present
    # R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')"
    # R -e "BiocManager::install('tximeta')"
    # R -e "BiocManager::install('readr')" # For read_tsv
    
    # Create a dummy samples.tsv file for demonstration
    # This file would typically contain sample IDs, condition, and paths to quantification directories
    echo -e "sample\tcondition\tquant_dir" > samples.tsv
    echo -e "sample1\tcontrol\t./salmon_quant_dir1" >> samples.tsv
    echo -e "sample2\ttreated\t./salmon_quant_dir2" >> samples.tsv
    
    # Placeholder for quantification directories (e.g., from Salmon, Kallisto, RSEM)
    # In a real scenario, these would be generated by a prior quantification step.
    # mkdir -p salmon_quant_dir1 salmon_quant_dir2
    # touch salmon_quant_dir1/quant.sf # Placeholder for Salmon output
    # touch salmon_quant_dir2/quant.sf # Placeholder for Salmon output
    
    # Reference dataset: The reference transcriptome and annotation used for the initial quantification
    # (e.g., Salmon, Kallisto) is implicitly used by tximeta to build or retrieve a TxDb object.
    # For this dataset, the source text reports GENCODE v37 annotation and assembly HG19.
    
    # R script to perform gene-level quantification using tximeta
    Rscript -e '
      library(tximeta)
      library(SummarizedExperiment)  # provides assays(), used below
      library(readr)
    
      # Read sample metadata
      coldata <- read_tsv("samples.tsv")
    
      # Define paths to quantification files (e.g., Salmon quant.sf)
      # tximeta requires a "files" column (paths to quant.sf) and a "names" column (sample IDs)
      coldata$files <- file.path(coldata$quant_dir, "quant.sf")
      coldata$names <- coldata$sample
    
      # Import quantification data with metadata
      # tximeta will automatically detect the reference transcriptome if it was indexed with a known FASTA/GTF
      # and will try to build/load a TxDb object from Bioconductor AnnotationHub.
      se <- tximeta(coldata)
    
      # Summarize to gene level
      # This step requires a TxDb object, which tximeta tries to automatically create/load
      # based on the reference used for quantification.
      gse <- summarizeToGene(se)
    
      # Save results (e.g., gene-level counts and TPMs)
      # For demonstration, we extract counts and TPMs
      counts_matrix <- assays(gse)$counts
      tpm_matrix <- assays(gse)$abundance
    
      write.csv(counts_matrix, "gene_counts.csv")
      write.csv(tpm_matrix, "gene_tpm.csv")
    
      # Optionally, save the SummarizedExperiment object for further analysis
      saveRDS(gse, "gene_level_summarized_experiment.rds")
    '
  3.

    Differential gene expression was analyzed with DESeq2 [v1.30.1].

    $ Bash example
    # Install DESeq2 (R package) via Bioconda
    # conda install -c bioconda bioconductor-deseq2=1.30.1
    
    # Create a placeholder R script for DESeq2 analysis
    cat << 'EOF' > deseq2_analysis.R
    #!/usr/bin/env Rscript
    
    # Load DESeq2 library
    library(DESeq2)
    
    # --- Placeholder for input files ---
    # Replace with actual paths to your count matrix and sample information
    # The count matrix should have genes/features as rows and samples as columns.
    # The sample information file should have samples as rows and metadata (e.g., 'condition') as columns.
    count_matrix_file <- "gene_counts.csv"  # gene-level counts written by the tximeta step
    sample_info_file <- "sample_info.csv"
    output_results_file <- "deseq2_results.csv"
    
    # --- Load data ---
    # Assuming counts are raw counts (integers) and samples are columns, genes are rows
    # Adjust 'row.names' and 'sep' as needed for your file format
    # For example, if your count matrix is tab-separated and has gene IDs in the first column:
    # count_data <- read.delim(count_matrix_file, row.names = 1, sep = "\t")
    # check.names = FALSE keeps sample IDs exactly as written in the header
    count_data <- read.csv(count_matrix_file, row.names = 1, check.names = FALSE)
    
    # Load sample information
    # For example, if your sample info is tab-separated and has sample IDs in the first column:
    # sample_info <- read.delim(sample_info_file, row.names = 1, sep = "\t")
    sample_info <- read.csv(sample_info_file, row.names = 1)
    
    # Ensure every count column has a matching row in the sample info,
    # then put the sample info in the same order as the count columns
    stopifnot(all(colnames(count_data) %in% rownames(sample_info)))
    sample_info <- sample_info[colnames(count_data), , drop = FALSE]
    
    # --- Create DESeqDataSet object ---
    # Design formula: ~ condition is a common example.
    # Replace 'condition' with the actual column name in your sample_info that defines your experimental groups.
    # Ensure 'condition' is a factor.
    sample_info$condition <- factor(sample_info$condition)
    dds <- DESeqDataSetFromMatrix(countData = round(as.matrix(count_data)), # DESeq2 expects an integer count matrix
                                  colData = sample_info,
                                  design = ~ condition)
    
    # --- Run DESeq2 analysis ---
    message("Running DESeq2 analysis...")
    dds <- DESeq(dds)
    message("DESeq2 analysis complete.")
    
    # --- Extract results ---
    # To name the comparison explicitly, pass a contrast.
    # For example, if 'condition' has levels 'treated' and 'control', you might use:
    # res <- results(dds, contrast=c("condition", "treated", "control"))
    # Without a contrast, results() compares the last level of 'condition' against the first (reference) level:
    res <- results(dds)
    
    # Order results by adjusted p-value
    res_ordered <- res[order(res$padj),]
    
    # --- Save results ---
    write.csv(as.data.frame(res_ordered), file = output_results_file)
    
    message(paste("DESeq2 results saved to:", output_results_file))
    EOF
    
    # Execute the R script
    Rscript deseq2_analysis.R
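
Step 1 describes transcript quantification, but its example only builds an index. The per-sample quantification could be sketched as below. This is a minimal sketch under assumptions not stated in the source: paired-end reads, hypothetical FASTQ names of the form <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, 8 threads, and automatic library-type detection (-l A). It prints the commands by default and runs them only when RUN=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed defaults (not from the source); override via the environment.
INDEX_DIR="${INDEX_DIR:-salmon_index}"
THREADS="${THREADS:-8}"

# Build the salmon quant command for one sample.
# FASTQ names are hypothetical: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
run_quant() {
  local sample="$1" outdir="$2"
  local cmd=(salmon quant -i "$INDEX_DIR" -l A
             -1 "${sample}_R1.fastq.gz" -2 "${sample}_R2.fastq.gz"
             -p "$THREADS" -o "$outdir")
  if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"          # execute salmon
  else
    echo "${cmd[*]}"     # dry run: print the command only
  fi
}

# Output directories match the demo samples.tsv used in step 2.
run_quant sample1 ./salmon_quant_dir1
run_quant sample2 ./salmon_quant_dir2
```

Each output directory then contains the quant.sf file that the tximeta step reads.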

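The DESeq2 script in step 3 reads sample_info.csv, which none of the steps above creates. One way to derive it, assuming the demo samples.tsv layout from step 2 (columns sample, condition, quant_dir), is to keep the first two columns and convert them to CSV. The sample IDs in the first column must match the column names of the count matrix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the demo samples.tsv from step 2 (overwrites any existing file).
printf 'sample\tcondition\tquant_dir\n' > samples.tsv
printf 'sample1\tcontrol\t./salmon_quant_dir1\n' >> samples.tsv
printf 'sample2\ttreated\t./salmon_quant_dir2\n' >> samples.tsv

# Keep the sample and condition columns and write them comma-separated.
awk -F'\t' 'BEGIN { OFS = "," } { print $1, $2 }' samples.tsv > sample_info.csv

cat sample_info.csv
```

With the demo rows this writes a header line (sample,condition) followed by one line per sample, which read.csv(..., row.names = 1) in step 3 turns into a data frame indexed by sample ID.
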
Tools Used

Raw Source Text
Transcript abundance was quantified using Salmon [v1.3.0] with GENCODE v37 annotation.
Gene level quantification was performed using tximeta [v1.8.4].
Differential gene expression was analyzed by DESeq2 [v1.30.1]
Assembly: HG19
Supplementary files format and content: Feature counts for differential expression