GSE14555 Processing Pipeline
2 steps
Publication
Zmat3 Is a Key Splicing Regulator in the p53 Tumor Suppression Program. Molecular Cell (2020), PMID 33157015
Dataset
GSE14555: Divergent Transcriptomic Responses to Aryl Hydrocarbon Receptor Agonists Between Rat and Human Primary Hepatocytes
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
All .CEL files (within each species) were pre-processed using the default settings of the justGCRMA function of gcrma package version 2.8.0 (Wu et al., 2004) as implemented in R.
$ Bash example
# Install BiocManager and gcrma if not already installed (run inside R):
#   if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
#   BiocManager::install("gcrma")  # installs the current gcrma release
#
# Note: gcrma 2.8.0 is a very old version (released with Bioconductor 2.0 / R 2.5).
# Installing that exact version on a modern R environment may be challenging and
# may require an older R and Bioconductor release or archived package sources.
# For demonstration, we assume gcrma and its dependencies (such as affy) are available.

# Create an R script to preprocess CEL files
cat << 'EOF' > preprocess_cel_files.R
# Load necessary libraries (gcrma depends on affy and Biobase)
library(affy)
library(gcrma)

# Directory containing the .CEL files.
# The description says "within each species", which implies running this
# script separately for each species' set of CEL files.
input_celfiles_dir <- "."  # current directory; adjust as needed

# List the .CEL files (base names only; the directory is passed separately below)
cel_files <- list.files(path = input_celfiles_dir, pattern = "\\.CEL$",
                        full.names = FALSE, ignore.case = TRUE)

if (length(cel_files) == 0) {
  stop(paste("No .CEL files found in the directory:", input_celfiles_dir))
}
message(paste("Found", length(cel_files), ".CEL files for processing."))

# Pre-process using justGCRMA with default settings.
# justGCRMA reads the CEL files itself (it takes file names, not an AffyBatch)
# and returns an ExpressionSet containing the normalized expression matrix.
message("Starting justGCRMA pre-processing with default settings...")
eset <- justGCRMA(filenames = cel_files, celfile.path = input_celfiles_dir)
message("justGCRMA pre-processing complete.")

# Extract the normalized expression matrix from the ExpressionSet object
normalized_expression_matrix <- exprs(eset)

# Save the matrix to a tab-separated file.
# Row names (probe set IDs) and column names (sample IDs) are included.
output_file <- "normalized_expression_matrix.tsv"
write.table(normalized_expression_matrix, file = output_file,
            sep = "\t", quote = FALSE, row.names = TRUE)
message(paste("Normalized expression matrix saved to:", output_file))
EOF

# Execute the R script using Rscript
Rscript preprocess_cel_files.R
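Because the CEL files were processed within each species, the script above would be run once per species. The following is a minimal dry-run sketch of such a driver, assuming the CEL files have been sorted into one subdirectory per species. The directory layout and the helper names `count_cel` and `plan_runs` are hypothetical and not part of the original pipeline. It only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Dry-run driver: plan one preprocessing run per species directory.
# ASSUMPTION: CEL files sit in one subdirectory per species, e.g.
# data/rat and data/human (hypothetical names, not from the source).

# Print the number of .CEL files (case-insensitive) directly inside a directory.
count_cel() {
  find "$1" -maxdepth 1 -type f -iname '*.cel' | wc -l | tr -d ' '
}

# For each subdirectory of $1, print the command that would be run there.
# $2 is the path to the R script; pass an absolute path, since each run
# changes into the species directory first.
plan_runs() {
  local root="$1" script="$2" dir n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    n=$(count_cel "$dir")
    if [ "$n" -eq 0 ]; then
      echo "skip ${dir%/}: no .CEL files"
      continue
    fi
    echo "(cd \"${dir%/}\" && Rscript \"$script\")  # $n CEL files"
  done
}

# Example (hypothetical layout):
#   plan_runs data "$PWD/preprocess_cel_files.R"
```

Running each printed command inside its species directory keeps the two species' arrays in separate normalization runs, and each directory receives its own normalized_expression_matrix.tsv.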
2
This function background corrects perfect-match probe intensities using probe sequence information, log2-transforms the data, quantile normalizes across the arrays, and summarizes probe intensities via the robust multiarray average (RMA) method (Irizarry et al., 2003) to give an intensity value (log2 scale) for each probe set.
R (affy/oligo package), version not specified (both inferred with models/gemini-2.5-flash)
$ Bash example
# This script demonstrates RMA normalization and summarization using the
# 'affy' R package.
# Note: rma() uses RMA's own background model and does not use probe sequence
# information. The sequence-based background correction described in this step
# is what justGCRMA() from the gcrma package (step 1) provides.
# Set CEL_FILES_PATH and OUTPUT_FILE below to your CEL file directory and
# desired output file name.

# Install Bioconductor and the 'affy' package if not already installed
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("affy")'

# Create an R script to perform RMA
cat << 'EOF' > run_rma.R
library(affy)

# Path to the CEL files (defaults to the current directory)
cel_files_path <- Sys.getenv("CEL_FILES_PATH", ".")

# Read CEL files.
# This assumes all .CEL files in the directory belong to the experiment;
# filter them first if the directory holds other arrays.
raw_data <- ReadAffy(celfile.path = cel_files_path)

# Perform RMA: background correction, quantile normalization, and
# summarization, returning log2-scale values (Irizarry et al., 2003).
eset <- rma(raw_data)

# Extract the expression matrix
expr_matrix <- exprs(eset)

# Output file name (defaults to rma_expression.tsv)
output_file <- Sys.getenv("OUTPUT_FILE", "rma_expression.tsv")

# Write results to a tab-separated file
write.table(expr_matrix, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE)
message(paste("RMA processed data saved to:", output_file))
EOF

# Set environment variables for the R script
export CEL_FILES_PATH="/path/to/your/cel_files"  # IMPORTANT: replace with your actual CEL file directory
export OUTPUT_FILE="rma_expression.tsv"

# Execute the R script
Rscript run_rma.R

# Clean up the R script
rm run_rma.R
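Since this step should yield one log2-scale value per probe set and sample, a quick check of the written matrix can catch obvious problems before downstream analysis. The sketch below is an assumption-laden helper (the name `summarize_matrix` is hypothetical): it expects a tab-separated file written by `write.table` with `row.names = TRUE`, where the header line has one field fewer than the data rows. It reports the matrix dimensions and value range. For arrays scanned at 16 bits, values far outside roughly 0 to 16 would suggest the data are not on a log2 scale.

```shell
# Summarize a write.table() TSV: probe-set count, sample count, value range.
# ASSUMPTION: the file was written with row.names = TRUE, so the header line
# has one field fewer than each data row (no label above the probe set IDs).
summarize_matrix() {
  awk -F '\t' '
    NR == 1 { nsamples = NF; next }           # header: sample names only
    NF != nsamples + 1 {                      # data row with wrong field count
      printf "line %d: expected %d fields, got %d\n", NR, nsamples + 1, NF
      bad = 1
      next
    }
    {
      for (i = 2; i <= NF; i++) {             # field 1 is the probe set ID
        v = $i + 0
        if (!seen || v < min) min = v
        if (!seen || v > max) max = v
        seen = 1
      }
      rows++
    }
    END {
      printf "probe_sets=%d samples=%d min=%.2f max=%.2f\n", rows, nsamples, min, max
      exit (bad ? 1 : 0)                      # non-zero status if any row was malformed
    }
  ' "$1"
}

# Example:
#   summarize_matrix rma_expression.tsv
#   summarize_matrix normalized_expression_matrix.tsv
```

The function exits with a non-zero status when any data row has an unexpected number of fields, so it can also gate later steps in a script.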
Tools Used
Raw Source Text
All .CEL files (within each species) were pre-processed using the default settings of the justGCRMA function of gcrma package version 2.8.0 (Wu et al., 2004) as implemented in R. This function background corrects perfect-match probe intensities using probe sequence information, log2-transforms the data, quantile normalizes across the arrays, and summarizes probe intensities via the robust multiarray average (RMA) method (Irizarry et al., 2003) to give an intensity value (log2 scale) for each probe set.