GSE31595 Processing Pipeline


Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life science alliance (2020) — PMID 32817263

Dataset

GSE31595

Gene Expression Profiles in Stage II and III Colon Cancer. Application of a 128-gene signature

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. The processing and analysis of expression data were performed using the statistical software R and Bioconductor.

    Tool: R (version not specified)
    $ Bash example
    # Install R and Bioconductor (example using Conda)
    # conda create -n r_env -c conda-forge r-base r-biocmanager
    # conda activate r_env
    # R -e "BiocManager::install(c('affy', 'limma'))" # Example Bioconductor packages for Affymetrix microarray analysis
    
    # Execute an R script for expression data processing and analysis
    # Replace 'expression_data.csv' with your actual input data file (e.g., a normalized microarray expression matrix)
    # Replace 'analysis_script.R' with the actual R script performing the analysis
    # Replace 'output_results.tsv' with the actual output file (e.g., differential expression results, processed data)
    Rscript analysis_script.R expression_data.csv output_results.tsv
  2. For all Affymetrix CEL files, the background was corrected and the expression values were normalized using robust multi-array average (RMA).

    $ Bash example
    # Install R if not already installed
    # sudo apt-get update
    # sudo apt-get install r-base
    
    # Install Bioconductor and affy package
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
    # R -e 'BiocManager::install("affy")'
    
    # Create a directory for the CEL files (replace with your actual CEL file directory)
    mkdir -p cel_files
    
    # Copy your actual Affymetrix .CEL files into cel_files/ before running the script.
    # Do not use empty placeholder files: ReadAffy() cannot parse them and the script will fail.
    
    # Create an R script to perform RMA normalization
    cat << 'EOF' > run_rma.R
    library(affy)
    
    # Define the directory containing CEL files
    cel_files_dir <- "cel_files"
    
    # List all CEL files in the specified directory
    cel_files <- list.files(path = cel_files_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
    
    if (length(cel_files) == 0) {
      stop("No CEL files found in '", cel_files_dir, "'. Please ensure your CEL files are in this directory.")
    }
    
    message(paste("Found", length(cel_files), "CEL files."))
    
    # Read CEL files into an AffyBatch object
    # For basic RMA, a simple ReadAffy call is sufficient.
    # For more complex experiments, a phenoData file might be needed.
    affy_batch <- ReadAffy(filenames = cel_files)
    
    message("Performing RMA normalization...")
    # Perform Robust Multiarray Average (RMA) normalization
    # This function performs background correction, normalization, and summarization.
    eset <- rma(affy_batch)
    
    # Extract the normalized expression matrix
    expression_matrix <- exprs(eset)
    
    # Define the output file name
    output_file <- "rma_normalized_expression.tsv"
    
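    # Write the expression matrix to a tab-separated file
    # row.names = TRUE keeps probe IDs as the first column; col.names = NA adds an
    # empty header cell above them so the header lines up with the data columns
    write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)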
    
    message(paste("RMA normalized expression matrix successfully written to", output_file))
    EOF
    
    # Execute the R script
    Rscript run_rma.R
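
Once the R script has finished, it can help to confirm that the output matrix has the expected shape before using it downstream. The sketch below is an illustration only and not part of the published pipeline. The helper name summarize_matrix is hypothetical, and it assumes a tab-separated file with one header row and probe IDs in the first column, as written by run_rma.R above.

```shell
# Hypothetical helper: report the number of probes (data rows), samples
# (data columns), and rows whose width differs from the first data row.
summarize_matrix() {
  awk -F'\t' '
    NR == 1 { next }              # skip the header row
    { rows++ }                    # count data rows
    rows == 1 { ncol = NF }       # remember the width of the first data row
    NF != ncol { bad++ }          # count rows with a different width
    END {
      printf "probes=%d samples=%d ragged_rows=%d\n",
             rows + 0, (ncol ? ncol - 1 : 0), bad + 0
    }
  ' "$1"
}

# Run the check only if the RMA output from the step above exists.
if [ -f rma_normalized_expression.tsv ]; then
  summarize_matrix rma_normalized_expression.tsv
fi
```

The sample count should match the number of CEL files that were read, and ragged_rows should be 0. Any other result suggests the file was truncated or written with unexpected settings.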

Tools Used

Raw Source Text
The processing and analysis of expression data were performed using the statistical software R and Bioconductor. For all Affymetrix CEL files the background were corrected and the expression were normalized using robust multiarray average (RMA).