GSE16969 Processing Pipeline
2 steps
Publication
Genomic analysis of the molecular neuropathology of tuberous sclerosis using a human stem cell model. Genome Medicine (2016), PMID 27655340
Dataset
GSE16969: Gene expression analysis of TSC-tubers reveals increased expression of adhesion and inflammatory factors
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
The data were analyzed with MicroArray Suite version 5.0 (MAS 5.0) using Affymetrix default analysis settings and Robust Multi-Array Average (RMA) analysis as normalization method.
Microarray v5.0
$ Bash example
# MAS 5.0 (MicroArray Suite) is a proprietary, GUI-based software by Affymetrix.
# The following R script demonstrates how to perform Robust Multi-Array Average (RMA)
# normalization, which is a standard method for Affymetrix data analysis, often
# performed using Bioconductor packages on raw .CEL files.

# Install R and Bioconductor if not already present
# sudo apt-get update && sudo apt-get install -y r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("affy")'

# Create an R script to perform RMA
cat << 'EOF' > run_rma.R
library(affy)

# Directory containing the .CEL files, read from the CEL_FILES_DIR
# environment variable (defaults to the current directory)
cel_files_dir <- Sys.getenv("CEL_FILES_DIR", ".")
cel_files <- list.files(path = cel_files_dir, pattern = "\\.CEL$",
                        full.names = TRUE, ignore.case = TRUE)

if (length(cel_files) == 0) {
  stop("No .CEL files found in the specified directory: ", cel_files_dir)
}

# Read the AffyBatch object from CEL files
raw_data <- ReadAffy(filenames = cel_files)

# Perform RMA normalization and summarization
# This includes background correction, normalization, and summarization steps
rma_data <- rma(raw_data)

# Extract expression matrix (RMA values are on the log2 scale)
expression_matrix <- exprs(rma_data)

# Write normalized expression matrix to a file
# col.names = NA writes an empty header cell above the probe-set IDs so that
# the header lines up with the data columns
output_file <- Sys.getenv("OUTPUT_FILE", "rma_normalized_expression.tsv")
write.table(expression_matrix, file = output_file, sep = "\t",
            quote = FALSE, row.names = TRUE, col.names = NA)

message(paste("RMA normalized expression matrix written to:", output_file))
EOF

# Execute the R script
# Set the environment variable for the directory containing your .CEL files
# Example: export CEL_FILES_DIR="./data/affymetrix_cels"
export CEL_FILES_DIR="/path/to/your/cel_files"
export OUTPUT_FILE="rma_normalized_expression.tsv"
Rscript run_rma.R
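RMA is usually described as three stages: background correction, quantile normalization, and median-polish summarization. As a rough illustration of the middle stage only, the Python sketch below quantile-normalizes a toy matrix so that every array ends up with the same distribution of values. The function name `quantile_normalize` and the toy intensities are hypothetical, ties are broken by position rather than averaged, and this is not the Bioconductor implementation.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length arrays (one list per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    if not columns:
        raise ValueError("at least one array is required")
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the r-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two arrays, three probes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain the same set of values (1.5, 3.5, 5.5), and each probe keeps its within-array rank.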
2
The trimmed mean target intensity of each array was set arbitrarily to 100.
$ Bash example
# Install R and necessary packages if not already installed.
# For example, using conda:
# conda install -c r r-base

# Create an R script to perform the trimmed mean scaling normalization.
# This script assumes input data is a tab-separated file with identifiers
# (e.g., Probe IDs) in the first column and array intensities in subsequent columns.
# Replace 'input_intensities.tsv' and 'output_normalized_intensities.tsv' with actual file names.
cat << 'EOF' > normalize_trimmed_mean.R
# R script for trimmed mean target intensity normalization
# This script scales the intensity of each array such that its trimmed mean
# equals a specified target intensity.

# Function to perform trimmed mean scaling normalization
# Args:
#   input_file: Path to the input tab-separated file containing intensities.
#               Assumes first column is identifiers (e.g., ProbeID) and
#               subsequent columns are array intensities.
#   output_file: Path to the output tab-separated file for normalized intensities.
#   target_intensity: The desired trimmed mean intensity for each array (default: 100).
#   trim_fraction: The fraction (0 to 0.5) of observations to be trimmed from each end
#                  when calculating the mean (default: 0.02, i.e., 2% from each end).
normalize_trimmed_mean_scaling <- function(input_file, output_file,
                                           target_intensity = 100,
                                           trim_fraction = 0.02) {
  # Check if input file exists
  if (!file.exists(input_file)) {
    stop(paste("Error: Input file not found at", input_file))
  }

  # Read the input intensity data
  # Assuming the first column is probe IDs/identifiers and subsequent columns are array intensities
  # check.names=FALSE to prevent R from modifying column names (e.g., adding X to numeric names)
  data_raw <- read.delim(input_file, header = TRUE, row.names = 1,
                         sep = "\t", check.names = FALSE)

  # Ensure data is numeric for calculations
  data_numeric <- as.matrix(data_raw)
  if (!is.numeric(data_numeric)) {
    stop("Error: Intensity columns must contain numeric values.")
  }

  # Calculate trimmed mean for each array (column)
  # na.rm = TRUE to handle potential missing values
  trimmed_means <- apply(data_numeric, 2,
                         function(x) mean(x, trim = trim_fraction, na.rm = TRUE))

  # Check for any NaN or Inf in trimmed means, which could indicate issues
  # (e.g., all NAs in a column)
  if (any(is.nan(trimmed_means)) || any(is.infinite(trimmed_means))) {
    stop("Error: Trimmed mean calculation resulted in NaN or Inf for some arrays. Check input data.")
  }

  # Calculate scaling factors
  # Avoid division by zero if a trimmed mean is 0
  scaling_factors <- ifelse(trimmed_means == 0, 0, target_intensity / trimmed_means)

  # Apply scaling to each array
  # sweep applies a function (here, multiplication) to the rows or columns of a matrix
  # using a vector of values (scaling_factors)
  normalized_data <- sweep(data_numeric, 2, scaling_factors, "*")

  # Combine with original row names (probe IDs/identifiers)
  normalized_df <- as.data.frame(normalized_data)
  normalized_df <- cbind(Identifier = rownames(data_raw), normalized_df)

  # Write the normalized data to an output file
  write.table(normalized_df, output_file, sep = "\t", row.names = FALSE, quote = FALSE)

  message(paste("Normalization complete. Output written to:", output_file))
  message("\nTrimmed means before scaling:")
  print(trimmed_means)
  message("\nScaling factors applied:")
  print(scaling_factors)
}

# --- Script execution ---
# Define input and output files (PLACEHOLDERS - REPLACE WITH ACTUAL PATHS)
# Example: input_intensities.tsv should contain columns like:
# Identifier  Array1_Intensity  Array2_Intensity
# ProbeA      1200              1500
# ProbeB      800               950
input_file_path <- "input_intensities.tsv"
output_file_path <- "output_normalized_intensities.tsv"

# Define parameters based on the description
target_intensity_val <- 100
trim_fraction_val <- 0.02 # This is an inferred parameter (2% from each end, total 4% trimmed)

# Run the normalization function
normalize_trimmed_mean_scaling(
  input_file = input_file_path,
  output_file = output_file_path,
  target_intensity = target_intensity_val,
  trim_fraction = trim_fraction_val
)
EOF

# Execute the R script
# Ensure 'input_intensities.tsv' exists in the current directory or provide full path
# and that R is installed and in your PATH.
Rscript normalize_trimmed_mean.R
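To make the arithmetic of this step concrete, here is a small worked example in Python. It is only a sketch: the function names, the toy intensities, and the 10% trim (chosen so that a 10-value array actually loses one value at each end) are hypothetical, and the source text does not state which trim fraction was used. The trimming rule, dropping floor(n × trim) values from each end of the sorted data, mirrors the behavior of R's `mean(x, trim = ...)`.

```python
import math


def trimmed_mean(values, trim=0.02):
    """Mean after dropping floor(n * trim) values from each end of the sorted data."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def scale_to_target(values, target=100.0, trim=0.02):
    """Rescale one array so that its trimmed mean equals `target`."""
    current = trimmed_mean(values, trim)
    if current == 0:
        raise ValueError("trimmed mean is zero; the array cannot be scaled")
    factor = target / current
    return [v * factor for v in values], factor


# Hypothetical toy array with one very low and one very high outlier.
array_a = [10.0, 50.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0, 450.0, 5000.0]

# With trim = 0.1 and 10 values, the lowest (10) and highest (5000) are dropped.
before = trimmed_mean(array_a, trim=0.1)
scaled, factor = scale_to_target(array_a, target=100.0, trim=0.1)
after = trimmed_mean(scaled, trim=0.1)

print(before)           # 268.75
print(round(after, 6))  # 100.0
```

The scaling factor here is 100 / 268.75, and it is applied to every value in the array, including the outliers that were excluded when computing the trimmed mean.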
Tools Used
Raw Source Text
The data were analyzed with MicroArray Suite version 5.0 (MAS 5.0) using Affymetrix default analysis settings and Robust Multi-Array Average (RMA) analysis as normalization method. The trimmed mean target intensity of each array was set arbitrarily to 100.