GSE87211 Processing Pipeline


Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life Science Alliance (2020), PMID 32817263

Dataset

GSE87211

Colorectal cancer susceptibility loci as predictive markers of rectal cancer prognosis after surgery

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Raw data were log2 transformed and normalized to the 75th percentile according to the Agilent protocol.

    $ Bash example
    # Install R and Bioconductor if not already installed
    # sudo apt-get update
    # sudo apt-get install r-base
    # R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager'); BiocManager::install('limma')"
    
    # Create a dummy input file for demonstration.
    # In a real scenario, this would be raw intensity data from multiple Agilent arrays,
    # typically background-subtracted data from Agilent Feature Extraction software.
    # This example simulates a matrix of raw intensities for 3 samples and 100 probes.
    # A single awk process with a fixed seed is used so that values differ between probes
    # and the file is reproducible; printf is used because plain echo does not expand \t in bash.
    printf 'Probe\tSample1\tSample2\tSample3\n' > input_raw_agilent_intensities.txt
    awk 'BEGIN {
        srand(42)
        for (i = 1; i <= 100; i++) {
            s1 = int(500 + rand() * 1501)   # Sample1: 500 to 2000
            s2 = int(600 + rand() * 1601)   # Sample2: 600 to 2200
            s3 = int(400 + rand() * 1401)   # Sample3: 400 to 1800
            printf "Probe%d\t%d\t%d\t%d\n", i, s1, s2, s3
        }
    }' >> input_raw_agilent_intensities.txt
    
    # R script for log2 transformation and 75th percentile normalization
    Rscript -e '
      # Base R is sufficient for this step, so no Bioconductor package is loaded here
    
      # Define input and output files
      input_file <- "input_raw_agilent_intensities.txt"
      output_file <- "normalized_agilent_data.txt"
    
      # Read raw intensity data
      raw_data_df <- read.table(input_file, header = TRUE, sep = "\t", row.names = 1)
      raw_intensities_matrix <- as.matrix(raw_data_df)
    
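      # Step 1: Log2 transformation
      # Non-positive intensities (possible after background subtraction) have no finite log2,
      # so they are set to NA here; this handling is an assumption, not part of the stated protocol.
      raw_intensities_matrix[raw_intensities_matrix <= 0] <- NA
      log2_transformed_data <- log2(raw_intensities_matrix)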
    
      # Step 2: 75th percentile normalization
      # On the log2 scale, this shifts each sample (column) so its 75th percentile matches a common target.
      # Calculate the 75th percentile for each sample
      percentile_75_per_sample <- apply(log2_transformed_data, 2, function(x) quantile(x, 0.75, na.rm = TRUE))
    
      # Calculate the mean 75th percentile across all samples to use as a common target
      # (an assumption; some workflows shift the 75th percentile to zero instead)
      target_percentile_75 <- mean(percentile_75_per_sample)
    
      # Apply the shift: subtract the 75th percentile of each sample and add the target.
      # Subtraction on the log2 scale corresponds to dividing raw intensities by a per-sample factor.
      normalized_data <- sweep(log2_transformed_data, 2, percentile_75_per_sample, FUN = "-") + target_percentile_75
    
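      # Write the normalized data to an output file
      # (col.names = NA writes an empty header cell above the probe IDs so the columns stay aligned)
      write.table(normalized_data, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)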
    
      cat(paste0("Log2 transformed and 75th percentile normalized data saved to ", output_file, "\n"))
    '
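
To make the arithmetic of the percentile shift concrete, here is a small shell sketch on a toy matrix. It is an illustration under assumptions, not the authors' code: the helper name `p75_columns`, the made-up log2 values, and the choice to shift each sample's 75th percentile to zero are all hypothetical. The percentile uses linear interpolation between order statistics, which matches the default of R's `quantile`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# p75_columns FILE (hypothetical helper): print "<sample><TAB><75th percentile>"
# for each data column of a tab-separated matrix with a header row and
# probe IDs in the first column.
p75_columns() {
  awk -F'\t' '
    NR == 1 { for (c = 2; c <= NF; c++) name[c] = $c; ncol = NF; next }
    { n++; for (c = 2; c <= ncol; c++) v[c, n] = $c + 0 }
    END {
      for (c = 2; c <= ncol; c++) {
        # Insertion sort of this column (portable, no gawk-only asort)
        for (i = 1; i <= n; i++) x[i] = v[c, i]
        for (i = 2; i <= n; i++) {
          t = x[i]
          for (j = i - 1; j >= 1 && x[j] > t; j--) x[j + 1] = x[j]
          x[j + 1] = t
        }
        # Linear interpolation between neighbouring order statistics
        h = (n - 1) * 0.75; lo = int(h); frac = h - lo
        q = x[lo + 1]
        if (frac > 0) q += frac * (x[lo + 2] - x[lo + 1])
        printf "%s\t%g\n", name[c], q
      }
    }' "$1"
}

demo=$(mktemp)
shifted=$(mktemp)
trap 'rm -f "$demo" "$shifted"' EXIT

# Made-up log2 intensities: 5 probes x 2 samples
printf 'Probe\tSample1\tSample2\n' > "$demo"
printf 'P%d\t%d\t%d\n' 1 6 7  2 7 8  3 8 9  4 9 10  5 10 11 >> "$demo"

echo "75th percentile per sample before the shift:"
p75_columns "$demo"

# Subtract each sample's own 75th percentile from every value in that column
p75s=$(p75_columns "$demo" | cut -f2 | tr '\n' ' ')
awk -F'\t' -v OFS='\t' -v p="$p75s" '
  BEGIN { split(p, q, " ") }
  NR == 1 { print; next }
  { for (c = 2; c <= NF; c++) $c = $c - q[c - 1]; print }
' "$demo" > "$shifted"

echo "75th percentile per sample after the shift:"
p75_columns "$shifted"
```

With these toy values the two samples start with 75th percentiles of 9 and 10 and both end at 0 after the shift, so the samples become directly comparable on the log2 scale. Adding a common constant afterwards (for example the mean of the original percentiles) changes only the reporting scale, not the differences between samples.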
    

Raw Source Text
Raw data were log2 transformed and normalized to 75 percentile according to Agilent protocol.