GSE16969 Processing Pipeline

Publication

Genomic analysis of the molecular neuropathology of tuberous sclerosis using a human stem cell model.

Genome medicine (2016) — PMID 27655340

Dataset

GSE16969

Gene expression analysis of TSC-tubers reveals increased expression of adhesion and inflammatory factors

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    The data were analyzed with MicroArray Suite version 5.0 (MAS 5.0) using Affymetrix default analysis settings and Robust Multi-Array Average (RMA) analysis as normalization method.

    $ Bash example
    # MAS 5.0 (MicroArray Suite) is a proprietary, GUI-based software by Affymetrix.
    # The following R script demonstrates how to perform Robust Multi-Array Average (RMA)
    # normalization, which is a standard method for Affymetrix data analysis, often
    # performed using Bioconductor packages on raw .CEL files.
    
    # Install R and Bioconductor if not already present
    # sudo apt-get update && sudo apt-get install -y r-base
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org"); BiocManager::install("affy")'
    
    # Create an R script to perform RMA
    cat << 'EOF' > run_rma.R
    library(affy)
    
    # Directory containing the .CEL files, read from the CEL_FILES_DIR environment
    # variable (exported below); defaults to the current directory if unset
    cel_files_dir <- Sys.getenv("CEL_FILES_DIR", ".")
    cel_files <- list.files(path = cel_files_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
    
    if (length(cel_files) == 0) {
      stop("No .CEL files found in the specified directory: ", cel_files_dir)
    }
    
    # Read the AffyBatch object from CEL files
    raw_data <- ReadAffy(filenames = cel_files)
    
    # Perform RMA normalization and summarization
    # This includes background correction, normalization, and summarization steps
    rma_data <- rma(raw_data)
    
    # Extract expression matrix
    expression_matrix <- exprs(rma_data)
    
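    # Write normalized expression matrix to a file
    # col.names = NA adds an empty header cell above the row names so the header lines up with the data columns
    output_file <- Sys.getenv("OUTPUT_FILE", "rma_normalized_expression.tsv")
    write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)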
    
    message(paste("RMA normalized expression matrix written to:", output_file))
    EOF
    
    # Execute the R script
    # Set the environment variable for the directory containing your .CEL files
    # Example: export CEL_FILES_DIR="./data/affymetrix_cels"
    export CEL_FILES_DIR="/path/to/your/cel_files"
    export OUTPUT_FILE="rma_normalized_expression.tsv"
    Rscript run_rma.R
  2.

    The trimmed mean target intensity of each array was set arbitrarily to 100.

    R v4.x (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install R and necessary packages if not already installed.
    # For example, using conda:
    # conda install -c conda-forge r-base
    
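    # Create an R script to perform the trimmed mean scaling normalization.
    # This script assumes input data is a tab-separated file with identifiers
    # (e.g., Probe IDs) in the first column and array intensities in subsequent columns.
    # Intensities should be on the linear scale (e.g., MAS 5.0 signals); rma() output is
    # log2-transformed, so scaling it to a trimmed mean of 100 would not be meaningful.
    # Replace 'input_intensities.tsv' and 'output_normalized_intensities.tsv' with actual file names.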
    
    cat << 'EOF' > normalize_trimmed_mean.R
    # R script for trimmed mean target intensity normalization
    # This script scales the intensity of each array such that its trimmed mean
    # equals a specified target intensity.
    
    # Function to perform trimmed mean scaling normalization
    # Args:
    #   input_file: Path to the input tab-separated file containing intensities.
    #               Assumes first column is identifiers (e.g., ProbeID) and
    #               subsequent columns are array intensities.
    #   output_file: Path to the output tab-separated file for normalized intensities.
    #   target_intensity: The desired trimmed mean intensity for each array (default: 100).
    #   trim_fraction: The fraction (0 to 0.5) of observations to be trimmed from each end
    #                  when calculating the mean (default: 0.02, i.e., 2% from each end).
    normalize_trimmed_mean_scaling <- function(input_file, output_file, target_intensity = 100, trim_fraction = 0.02) {
      # Check if input file exists
      if (!file.exists(input_file)) {
        stop(paste("Error: Input file not found at", input_file))
      }
    
      # Read the input intensity data
      # Assuming the first column is probe IDs/identifiers and subsequent columns are array intensities
      # check.names=FALSE to prevent R from modifying column names (e.g., adding X to numeric names)
      data_raw <- read.delim(input_file, header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)
    
      # Ensure data is numeric for calculations
      data_numeric <- as.matrix(data_raw)
      if (!is.numeric(data_numeric)) {
        stop("Error: Intensity columns must contain numeric values.")
      }
    
      # Calculate trimmed mean for each array (column)
      # na.rm = TRUE to handle potential missing values
      trimmed_means <- apply(data_numeric, 2, function(x) mean(x, trim = trim_fraction, na.rm = TRUE))
    
      # Check for any NaN or Inf in trimmed means, which could indicate issues (e.g., all NAs in a column)
      if (any(is.nan(trimmed_means)) || any(is.infinite(trimmed_means))) {
        stop("Error: Trimmed mean calculation resulted in NaN or Inf for some arrays. Check input data.")
      }
    
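      # Calculate scaling factors
      # An array with a trimmed mean of 0 cannot be scaled to the target, so fail
      # loudly instead of silently zeroing out the whole array
      if (any(trimmed_means == 0)) {
        stop("Error: Trimmed mean is 0 for some arrays; cannot scale to the target intensity.")
      }
      scaling_factors <- target_intensity / trimmed_means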
    
      # Apply scaling to each array
      # sweep applies a function (here, multiplication) to the rows or columns of a matrix
      # using a vector of values (scaling_factors)
      normalized_data <- sweep(data_numeric, 2, scaling_factors, "*")
    
      # Combine with original row names (probe IDs/identifiers)
      normalized_df <- as.data.frame(normalized_data)
      normalized_df <- cbind(Identifier = rownames(data_raw), normalized_df)
    
      # Write the normalized data to an output file
      write.table(normalized_df, output_file, sep = "\t", row.names = FALSE, quote = FALSE)
    
      message(paste("Normalization complete. Output written to:", output_file))
      message("\nTrimmed means before scaling:")
      print(trimmed_means)
      message("\nScaling factors applied:")
      print(scaling_factors)
    }
    
    # --- Script execution ---
    # Define input and output files (PLACEHOLDERS - REPLACE WITH ACTUAL PATHS)
    # Example: input_intensities.tsv should contain columns like:
    # Identifier    Array1_Intensity    Array2_Intensity
    # ProbeA        1200                1500
    # ProbeB        800                 950
    input_file_path <- "input_intensities.tsv"
    output_file_path <- "output_normalized_intensities.tsv"
    
    # Define parameters based on the description
    target_intensity_val <- 100
    trim_fraction_val <- 0.02 # This is an inferred parameter (2% from each end, total 4% trimmed)
    
    # Run the normalization function
    normalize_trimmed_mean_scaling(
      input_file = input_file_path,
      output_file = output_file_path,
      target_intensity = target_intensity_val,
      trim_fraction = trim_fraction_val
    )
    EOF
    
    # Execute the R script
    # Ensure 'input_intensities.tsv' exists in the current directory or provide full path
    # and that R is installed and in your PATH.
    Rscript normalize_trimmed_mean.R
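
The source text names MAS 5.0 with a target intensity of 100, which the scripts in the two steps only approximate. One possible open-source route is `mas5()` from the Bioconductor `affy` package, which implements the MAS 5.0 expression measure and takes the scaling target through its `sc` argument. The sketch below is an assumption about how the described settings could be reproduced, not the authors' actual procedure; the file names `run_mas5.R` and `mas5_tgt100_expression.tsv` are hypothetical, and results may differ slightly from the Affymetrix software.

```shell
# Write a hypothetical R script (run_mas5.R) that computes MAS 5.0-style signals.
cat << 'EOF' > run_mas5.R
library(affy)

# Directory with .CEL files; CEL_FILES_DIR is the same environment variable used in step 1
cel_files_dir <- Sys.getenv("CEL_FILES_DIR", ".")
cel_files <- list.files(cel_files_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
if (length(cel_files) == 0) {
  stop("No .CEL files found in: ", cel_files_dir)
}

raw_data <- ReadAffy(filenames = cel_files)

# affy::mas5() is an open-source implementation of the MAS 5.0 algorithm, not the
# Affymetrix software itself. normalize = TRUE scales each array so that its
# trimmed mean signal equals sc; sc = 100 mirrors the target intensity in step 2.
mas5_data <- mas5(raw_data, normalize = TRUE, sc = 100)

# mas5() returns signals on the linear (not log2) scale
write.table(exprs(mas5_data), file = "mas5_tgt100_expression.tsv",
            sep = "\t", quote = FALSE, col.names = NA)
EOF
```

To run it, point `CEL_FILES_DIR` at the directory holding the .CEL files and call `Rscript run_mas5.R`. With this route the separate scaling script from step 2 would not be needed, since `mas5()` applies the scaling itself.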

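A quick way to sanity-check the scaling in step 2 is to recompute the trimmed mean of a column in the output file. The helper below is a minimal sketch: `trimmed_mean_col` is a hypothetical name, it assumes a tab-separated file with one header row and purely numeric values in the chosen column (no NA entries), and it expects a trim fraction below 0.5.

```shell
# Hypothetical helper: print the trimmed mean of one column of a tab-separated
# file that has a header row. Assumes the column holds only numeric values.
# Usage: trimmed_mean_col FILE COLUMN_NUMBER [TRIM_FRACTION]
trimmed_mean_col() {
  tail -n +2 "$1" | cut -f "$2" | LC_ALL=C sort -g |
    awk -v trim="${3:-0.02}" '
      { v[NR] = $1 }
      END {
        if (NR == 0) { print "NA"; exit }
        lo = int(NR * trim) + 1   # same index rule as R: mean(x, trim = ...)
        hi = NR + 1 - lo
        for (i = lo; i <= hi; i++) sum += v[i]
        printf "%.4f\n", sum / (hi - lo + 1)
      }'
}
```

For example, after the step 2 script has run, `trimmed_mean_col output_normalized_intensities.tsv 2` should print a value very close to 100 for the first array (column 1 holds the identifiers).
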
Tools Used

Raw Source Text
The data were analyzed with MicroArray Suite version 5.0 (MAS 5.0) using Affymetrix default analysis settings and Robust Multi-Array Average (RMA) analysis as normalization method. The trimmed mean target intensity of each array was set arbitrarily to 100.