GSE87211 Processing Pipeline — Yeo Lab Publications

Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life science alliance (2020) — PMID 32817263

Dataset

Colorectal cancer susceptibility loci as predictive markers of rectal cancer prognosis after surgery

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

Raw data were log2 transformed and normalized to the 75th percentile according to the Agilent protocol.
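
The source does not spell out the arithmetic behind this step. One common reading, assuming a percentile-shift convention on the log2 scale (an assumption, not confirmed by the source), can be written as follows. The symbols x, y, q, and the choice of the mean as the common target are illustrative.

```latex
% x_{ij}: raw intensity of probe i in sample j (n samples in total)
% q_j:    75th percentile of the log2 intensities in sample j
% \bar q: common target, here assumed to be the mean of the q_j
\begin{align}
  q_j    &= Q_{0.75}\bigl(\{\log_2 x_{kj}\}_{k}\bigr), \\
  \bar q &= \frac{1}{n}\sum_{j=1}^{n} q_j, \\
  y_{ij} &= \log_2 x_{ij} - q_j + \bar q .
\end{align}
```

For example, if a sample has a log2 75th percentile of 10.5 and the target is 10.0, every log2 value in that sample is lowered by 0.5. After the shift, every sample has a 75th percentile equal to the target.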

Microarray v3.56.2 GitHub

$ Bash example

# Install R and Bioconductor if not already installed
# sudo apt-get update
# sudo apt-get install r-base
# R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager'); BiocManager::install('limma')"

# Create a dummy input file for demonstration. 
# In a real scenario, this would be raw intensity data from multiple Agilent arrays,
# typically background-subtracted data from Agilent Feature Extraction software.
# This example simulates a matrix of raw intensities for 3 samples and 100 probes.
# A single awk process with a fixed seed writes real tab characters and
# gives reproducible values that differ between probes and samples.
awk -v n=100 'BEGIN {
    srand(42)
    OFS = "\t"
    print "Probe", "Sample1", "Sample2", "Sample3"
    for (i = 1; i <= n; i++) {
        s1 = int(500 + rand() * (2000 - 500 + 1))
        s2 = int(600 + rand() * (2200 - 600 + 1))
        s3 = int(400 + rand() * (1800 - 400 + 1))
        print "Probe" i, s1, s2, s3
    }
}' > input_raw_agilent_intensities.txt

# R script for log2 transformation and 75th percentile normalization
Rscript -e '
  library(limma) # loaded to match the listed tool; the steps below use only base R functions

  # Define input and output files
  input_file <- "input_raw_agilent_intensities.txt"
  output_file <- "normalized_agilent_data.txt"

  # Read raw intensity data
  raw_data_df <- read.table(input_file, header = TRUE, sep = "\t", row.names = 1)
  raw_intensities_matrix <- as.matrix(raw_data_df)

  # Step 1: Log2 transformation
  log2_transformed_data <- log2(raw_intensities_matrix)

  # Step 2: 75th percentile normalization
  # On the log2 scale this is a shift, not a ratio: each sample (column) is moved
  # so that its 75th percentile matches a common target.
  # Calculate the 75th percentile for each sample
  percentile_75_per_sample <- apply(log2_transformed_data, 2, function(x) quantile(x, 0.75, na.rm = TRUE))

  # Use the mean 75th percentile across all samples as the common target
  # (an assumed choice; some protocols shift each sample to zero instead)
  target_percentile_75 <- mean(percentile_75_per_sample)

  # Apply the shift: subtract the 75th percentile of each sample, then add the target
  normalized_data <- sweep(log2_transformed_data, 2, percentile_75_per_sample, FUN = "-") + target_percentile_75

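  # Write the normalized data to an output file
  # col.names = NA adds an empty header cell above the row names so the columns stay aligned
  write.table(normalized_data, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)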

  cat(paste0("Log2 transformed and 75th percentile normalized data saved to ", output_file, "\n"))
'


Tools Used

Microarray