GSE31595 Processing Pipeline


Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life Science Alliance (2020), PMID 32817263

Dataset

Gene Expression Profiles in Stage II and III Colon Cancer. Application of a 128-gene signature

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


The processing and analysis of expression data were performed using the statistical software R and Bioconductor.

R (version not specified)

$ Bash example

# Install R and Bioconductor (example using Conda; packages from the conda-forge channel)
# conda create -n r_env -c conda-forge r-base r-biocmanager
# conda activate r_env
# R -e "BiocManager::install(c('affy', 'limma'))" # Example microarray packages; the source does not name the packages used

# Run an R script that processes and analyzes the expression data
# Replace 'expression_data.tsv' with your input file (e.g., an RMA-normalized expression matrix)
# Replace 'analysis_script.R' with the R script that performs the analysis
# Replace 'output_results.tsv' with the desired output file (e.g., differential expression results)
Rscript analysis_script.R expression_data.tsv output_results.tsv

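Before any R/Bioconductor processing, the raw files for the series have to be obtained from GEO. The helper below is a minimal sketch that only builds the download URL for a series' supplementary RAW archive. It assumes GEO's usual FTP layout, in which series are grouped into directories named by replacing the last three digits of the accession with `nnn` (GSE31595 becomes GSE31nnn), and it assumes GSE31595 provides a `GSE31595_RAW.tar` archive of CEL files, as Affymetrix series usually do. The function name `geo_raw_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Build the GEO download URL for a series' supplementary RAW archive.
# Assumption: series live under a directory named after the accession
# with its last three digits replaced by "nnn".
# geo_raw_url is a hypothetical helper name.
geo_raw_url() {
  local acc="$1"
  if [[ ! "$acc" =~ ^GSE[0-9]+$ ]]; then
    echo "not a GEO series accession: $acc" >&2
    return 1
  fi
  local digits="${acc#GSE}"
  local stub
  if (( ${#digits} <= 3 )); then
    stub="GSEnnn"                 # GSE1 ... GSE999
  else
    stub="GSE${digits%???}nnn"    # drop the last three digits, e.g. GSE31595 -> GSE31nnn
  fi
  echo "https://ftp.ncbi.nlm.nih.gov/geo/series/${stub}/${acc}/suppl/${acc}_RAW.tar"
}

geo_raw_url GSE31595
# prints https://ftp.ncbi.nlm.nih.gov/geo/series/GSE31nnn/GSE31595/suppl/GSE31595_RAW.tar

# Download and unpack (commented out so the sketch runs offline):
# wget "$(geo_raw_url GSE31595)"
# mkdir -p cel_files && tar -xf GSE31595_RAW.tar -C cel_files
```

The download lines are left as comments so the sketch runs without network access; the extracted CEL files (typically gzip-compressed) can then be placed in the `cel_files` directory used by the RMA step.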

For all Affymetrix CEL files, the background was corrected and the expression values were normalized using robust multiarray average (RMA).

Microarray (v1.78.0)

$ Bash example

# Install R if not already installed
# sudo apt-get update
# sudo apt-get install r-base

# Install Bioconductor and affy package
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install("affy")'

# Directory holding the Affymetrix .CEL (or .CEL.gz) files for the series.
# Place the real files here before running the script: empty placeholder
# files (e.g., created with `touch`) cannot be parsed by ReadAffy and make it fail.
mkdir -p cel_files

# Create an R script to perform RMA normalization
cat << 'EOF' > run_rma.R
library(affy)

# Define the directory containing CEL files
cel_files_dir <- "cel_files"

# List all CEL files in the directory, including gzip-compressed ones (.CEL.gz)
cel_files <- list.files(path = cel_files_dir, pattern = "\\.CEL(\\.gz)?$", full.names = TRUE, ignore.case = TRUE)

if (length(cel_files) == 0) {
  stop("No CEL files found in ", cel_files_dir, ". Please ensure your CEL files are in this directory.")
}

message(paste("Found", length(cel_files), "CEL files."))

# Read CEL files into an AffyBatch object
# For basic RMA, a simple ReadAffy call is sufficient.
# For more complex experiments, a phenoData file might be needed.
affy_batch <- ReadAffy(filenames = cel_files)

message("Performing RMA normalization...")
# Perform Robust Multiarray Average (RMA) preprocessing: background correction,
# quantile normalization, and median-polish summarization into probe sets.
# The resulting ExpressionSet holds log2-scale expression values.
eset <- rma(affy_batch)

# Extract the normalized expression matrix
expression_matrix <- exprs(eset)

# Define the output file name
output_file <- "rma_normalized_expression.tsv"

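# Write the expression matrix to a tab-separated file with the probe-set IDs
# as an explicit first column, so the header lines up with the data columns
output_df <- data.frame(probe_id = rownames(expression_matrix), expression_matrix, check.names = FALSE)
write.table(output_df, file = output_file, sep = "\t", quote = FALSE, row.names = FALSE)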

message(paste("RMA normalized expression matrix successfully written to", output_file))
EOF

# Execute the R script
Rscript run_rma.R

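RMA chains three operations: background correction of the perfect-match probe intensities, quantile normalization across arrays, and median-polish summarization of probes into probe-set values on the log2 scale. To make the middle step concrete, the sketch below quantile-normalizes a small probe-by-sample TSV with awk: every value is replaced by the mean, across samples, of the values holding the same rank, so all sample columns end up with an identical distribution. It is an illustration only, not a substitute for `affy::rma()`. The function name `quantile_normalize` and the toy matrix are hypothetical, and ties are broken by row order, which Bioconductor's implementation may handle differently.

```shell
#!/usr/bin/env bash
# Illustration of the quantile-normalization step used inside RMA.
# Input: TSV with a header row, probe IDs in column 1, one numeric column per sample.
# quantile_normalize is a hypothetical helper, not part of affy.
quantile_normalize() {
  awk '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { header = $0; ncol = NF; next }   # keep the header unchanged
    {
      n++
      id[n] = $1
      for (j = 2; j <= ncol; j++) x[n, j] = $j + 0
    }
    END {
      nsamp = ncol - 1
      if (nsamp < 1) { print "no sample columns found" > "/dev/stderr"; exit 1 }
      for (j = 2; j <= ncol; j++) {
        # Insertion sort of row indices by their value in column j
        for (i = 1; i <= n; i++) ord[i] = i
        for (i = 2; i <= n; i++) {
          k = ord[i]; m = i - 1
          while (m >= 1 && x[ord[m], j] > x[k, j]) { ord[m + 1] = ord[m]; m-- }
          ord[m + 1] = k
        }
        # Remember which row holds rank r and accumulate the rank-wise sum
        for (r = 1; r <= n; r++) { rowat[r, j] = ord[r]; rankmean[r] += x[ord[r], j] }
      }
      for (r = 1; r <= n; r++) rankmean[r] /= nsamp
      # Give every value the mean of all values that share its rank
      for (j = 2; j <= ncol; j++)
        for (r = 1; r <= n; r++) q[rowat[r, j], j] = rankmean[r]
      print header
      for (i = 1; i <= n; i++) {
        line = id[i]
        for (j = 2; j <= ncol; j++) line = line OFS q[i, j]
        print line
      }
    }
  ' "$1"
}

# Toy 3-probe x 2-sample matrix (made-up numbers, not GSE31595 data)
toy="$(mktemp)"
printf 'probe\ts1\ts2\np1\t1\t6\np2\t3\t2\np3\t2\t4\n' > "$toy"
quantile_normalize "$toy"
# prints (tab-separated):
# probe s1  s2
# p1    1.5 4.5
# p2    4.5 1.5
# p3    3   3
```

In the toy output both sample columns contain exactly the values 1.5, 3 and 4.5, which is the defining property of quantile normalization; only the assignment of those values to probes differs between samples.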

Tools Used

R, Microarray

