GSE87211 Processing Pipeline
Publication
DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis. Life Science Alliance (2020). PMID 32817263
Dataset
GSE87211: Colorectal cancer susceptibility loci as predictive markers of rectal cancer prognosis after surgery
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Raw data were log2 transformed and normalized to the 75th percentile, according to the Agilent protocol.
$ Bash example
# Install R if it is not already available (Debian/Ubuntu).
# Only base R is used below; no Bioconductor packages are required.
# sudo apt-get update
# sudo apt-get install r-base

# Create a dummy input file for demonstration.
# In a real analysis this would be raw intensity data from multiple Agilent arrays,
# typically background-subtracted output from Agilent Feature Extraction software.
# This example simulates raw intensities for 3 samples and 100 probes.
printf 'Probe\tSample1\tSample2\tSample3\n' > input_raw_agilent_intensities.txt

# One awk process with a fixed seed generates every value. Calling srand() in
# separate awk processes reseeds from the clock and tends to repeat values.
awk 'BEGIN {
  srand(42)
  for (i = 1; i <= 100; i++) {
    s1 = int(500 + rand() * (2000 - 500 + 1))
    s2 = int(600 + rand() * (2200 - 600 + 1))
    s3 = int(400 + rand() * (1800 - 400 + 1))
    printf "Probe%d\t%d\t%d\t%d\n", i, s1, s2, s3
  }
}' >> input_raw_agilent_intensities.txt

# R script for log2 transformation and 75th percentile normalization
Rscript -e '
# Define input and output files
input_file <- "input_raw_agilent_intensities.txt"
output_file <- "normalized_agilent_data.txt"

# Read raw intensity data
raw_data_df <- read.table(input_file, header = TRUE, sep = "\t", row.names = 1)
raw_intensities_matrix <- as.matrix(raw_data_df)

# Step 1: log2 transformation
log2_transformed_data <- log2(raw_intensities_matrix)

# Step 2: 75th percentile normalization
# Shift each sample (column) so that its 75th percentile matches a common target.
percentile_75_per_sample <- apply(log2_transformed_data, 2, function(x) quantile(x, 0.75, na.rm = TRUE))

# Use the mean 75th percentile across all samples as the common target
target_percentile_75 <- mean(percentile_75_per_sample)

# On the log2 scale a scaling factor becomes an offset, so subtract the
# 75th percentile of each sample and add the target (do not divide).
normalized_data <- sweep(log2_transformed_data, 2, percentile_75_per_sample, FUN = "-") + target_percentile_75

# Write the normalized data to an output file
write.table(normalized_data, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE)
cat(paste0("Log2 transformed and 75th percentile normalized data saved to ", output_file, "\n"))
'
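A quick sanity check can follow the normalization example above: after 75th percentile normalization, every sample column should share the same 75th percentile. The sketch below is an illustration, not part of the original pipeline. The helper name `p75_of_column` is hypothetical, and it assumes the tab-delimited `normalized_agilent_data.txt` written by the example, with one header line and probe IDs in the first column. The interpolation rule is chosen to match the default of R's `quantile()`.

```shell
# Hypothetical helper: print the 75th percentile of one numeric column of a
# tab-delimited file with a header line. Uses linear interpolation between
# order statistics (the default rule of quantile() in R).
p75_of_column() {
  file=$1
  col=$2
  # Skip the header, extract the column, sort numerically, then interpolate.
  tail -n +2 "$file" | cut -f "$col" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      h = (NR - 1) * 0.75 + 1            # 1-based fractional rank
      lo = int(h)
      hi = (lo < NR) ? lo + 1 : lo
      printf "%.6f\n", v[lo] + (h - lo) * (v[hi] - v[lo])
    }'
}

# Assumed file name from the example above; columns 2-4 hold the three samples.
if [ -f normalized_agilent_data.txt ]; then
  for col in 2 3 4; do
    p75_of_column normalized_agilent_data.txt "$col"
  done
fi
```

If the normalization worked, the three printed values should agree up to rounding. A clear mismatch suggests the wrong columns were read or the normalization was not applied.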
Tools Used
Raw Source Text
Raw data were log2 transformed and normalized to 75 percentile according to Agilent protocol.