GSE37892 Processing Pipeline


Publication

DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis.

Life science alliance (2020) — PMID 32817263

Dataset

GSE37892

A seven-gene signature aggregates a subgroup of stage II colon cancers with stage III.
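
The steps below start from raw CEL files, so a small helper for locating them may be useful. The sketch builds the supplementary-archive URL for a GEO series accession using NCBI's usual directory layout. The function name `geo_raw_url` is hypothetical, and it is an assumption that this series provides a `GSE37892_RAW.tar` archive; check the GEO record before relying on it.

```shell
#!/usr/bin/env bash
# Build the URL of the <accession>_RAW.tar supplementary archive for a GEO series.
# Assumption: the series actually provides such an archive.
geo_raw_url() {
  local acc="$1"
  local digits="${acc#GSE}"   # numeric part of the accession, e.g. 37892
  local stub
  if [ "${#digits}" -gt 3 ]; then
    stub="GSE${digits%???}nnn"   # last three digits replaced by "nnn", e.g. GSE37nnn
  else
    stub="GSEnnn"
  fi
  printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/%s_RAW.tar\n' "$stub" "$acc" "$acc"
}

# Usage (needs network access, so it is left commented out):
# curl -fLO "$(geo_raw_url GSE37892)"
# mkdir -p cel_files && tar -xf GSE37892_RAW.tar -C cel_files
```

The extracted directory can then be used as the CEL file location in the R scripts that follow.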

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    CEL files were processed in the R (v. 2.10.0)/Bioconductor (v 2.5) environment.

    R v2.10.0
    $ Bash example
    # Install R (if not already installed)
    # conda install -c conda-forge r-base
    
    # Install Bioconductor packages for CEL file processing (e.g., 'affy' for RMA normalization)
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org")'
    # R -e 'BiocManager::install("affy")'
    
    # Create a placeholder R script for processing CEL files
    cat << 'EOF' > process_cel_files.R
    # Load necessary libraries
    library(affy)
    
    # Define the directory containing CEL files
    # Replace "." with the actual path to your CEL files if they are not in the current directory
    cel_dir <- "."
    
    # List all CEL files in the specified directory
    cel_files <- list.celfiles(path = cel_dir, full.names = TRUE)
    
    # Check if any CEL files were found
    if (length(cel_files) == 0) {
      stop("No CEL files found in the specified directory: ", cel_dir)
    }
    
    message(paste("Found", length(cel_files), "CEL files."))
    
    # Read CEL files into an AffyBatch object
    # This step can be memory intensive depending on the number and size of CEL files
    raw_data <- ReadAffy(filenames = cel_files)
    
    # Perform Robust Multi-array Average (RMA) pre-processing as a generic illustration
    # (background correction, quantile normalization, and summarization in one call).
    # Note: the source text reports GCRMA, not RMA, for this dataset; see step 3.
    normalized_data <- rma(raw_data)
    
    # Extract the expression matrix (log2 transformed and normalized intensities)
    expression_matrix <- exprs(normalized_data)
    
    # Save the processed expression matrix to a CSV file
    output_csv_file <- "processed_cel_expression.csv"
    write.csv(expression_matrix, file = output_csv_file, row.names = TRUE)
    message(paste("Processed expression matrix saved to:", output_csv_file))
    
    # Optionally, save the entire ExpressionSet object for further analysis in R
    output_rdata_file <- "processed_cel_eset.RData"
    save(normalized_data, file = output_rdata_file)
    message(paste("Normalized ExpressionSet object saved to:", output_rdata_file))
    EOF
    
    # Execute the R script to process CEL files
    # Ensure that your CEL files are in the directory specified by 'cel_dir' in the R script
    Rscript process_cel_files.R
  2.

    R (v. 2.10.0)/Bioconductor (v 2.5) environment.

    $ Bash example
    # This step describes the R/Bioconductor environment used, not a specific execution command.
    # The description indicates that the analysis was performed within an R (v 2.10.0) and Bioconductor (v 2.5) environment.
    # No specific R script or command is provided in the description.
    
    # To use R 2.10.0 with Bioconductor 2.5, you would need to have both installed.
    # Installation of such old R/Bioconductor versions can be complex and might require specific system configurations or virtual environments.
    # For modern systems, using tools like `conda` or `renv` for environment management is recommended, but finding R 2.10.0 and Bioconductor 2.5 via conda might be challenging due to their age.
    
    # Example of how one might launch R, assuming it's in the PATH and the correct version is active:
    # R --version # To check the R version
    # Rscript -e "packageVersion('Biobase')" # To check a core Bioconductor package version, indicating Bioconductor environment
    
    # If a specific R script were provided, the command would typically look like:
    # Rscript your_analysis_script.R arg1 arg2
    # Or for interactive use:
    # R
  3.

    Pre-processing steps (background adjustment, normalization and summarization) were performed with the GCRMA package (v.2.18.1).

    GCRMA v2.18.1
    $ Bash example
    # Install R and Bioconductor packages if not already installed (uncomment and run if needed)
    # R -e "install.packages('BiocManager', repos = 'https://cloud.r-project.org')"
    # R -e "BiocManager::install('gcrma')"
    # R -e "BiocManager::install('affy')"
    # R -e "BiocManager::install(c('hgu133plus2cdf', 'hgu133plus2probe'))" # Example for HG-U133 Plus 2.0 arrays: ReadAffy() needs the chip's CDF package and gcrma() its probe package; substitute the pair matching your chip (e.g., hgu95av2cdf and hgu95av2probe)
    
    # Create a dummy R script to perform GCRMA pre-processing
    cat << 'EOF' > run_gcrma_preprocessing.R
    # Load necessary libraries
    library(affy)
    library(gcrma)
    
    # --- Configuration --- #
    # Define the directory containing your raw Affymetrix .CEL files
    cel_files_directory <- "./path/to/your/cel_files"
    
    # Define the output file name for the normalized expression matrix
    output_expression_file <- "gcrma_normalized_expression.txt"
    # --- End Configuration --- #
    
    # Check if the CEL files directory exists
    if (!dir.exists(cel_files_directory)) {
        stop(paste("Error: CEL files directory not found at", cel_files_directory))
    }
    
    # List all .CEL files in the specified directory
    cel_files <- list.files(path = cel_files_directory, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
    
    if (length(cel_files) == 0) {
        stop(paste("No .CEL files found in", cel_files_directory, ". Please ensure files are present and have a .CEL extension."))
    }
    
    message(paste("Found", length(cel_files), ".CEL files."))
    
    # Read CEL files into an AffyBatch object
    # This step requires that all CEL files are from the same chip type
    # and that the matching CDF package is installed (gcrma() also needs the probe package).
    raw_data <- ReadAffy(filenames = cel_files)
    
    message("Performing GCRMA pre-processing (background adjustment, normalization, summarization)...")
    
    # Perform GCRMA pre-processing
    # The gcrma function performs background adjustment, normalization, and summarization
    # by default, as described in the pipeline step.
    eset <- gcrma(raw_data)
    
    # Extract the normalized expression matrix
    expression_matrix <- exprs(eset)
    
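    # Write the normalized expression matrix to a tab-separated file
    # (col.names = NA adds an empty header cell above the probe-ID column so the header aligns with the data)
    write.table(expression_matrix, file = output_expression_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)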
    
    message(paste("GCRMA pre-processing complete. Normalized expression matrix saved to:", output_expression_file))
    EOF
    
    # Execute the R script
    Rscript run_gcrma_preprocessing.R
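
After the GCRMA script finishes, a quick structural check of the tab-separated output can catch truncated or misaligned files before downstream analysis. The sketch below is one way to do that with `awk`. The function name `check_matrix` is hypothetical. It assumes the header row holds sample names, with or without a leading empty cell, and that each data row starts with a probe ID.

```shell
#!/usr/bin/env bash
# Report the dimensions of a tab-separated expression matrix and fail if any
# data row has a different number of fields than the header implies.
check_matrix() {
  awk -F '\t' '
    # Header: sample names, optionally preceded by an empty cell for the ID column.
    NR == 1 { samples = ($1 == "") ? NF - 1 : NF; next }
    # Data rows: probe ID plus one value per sample.
    NF != samples + 1 {
      printf "row %d has %d fields, expected %d\n", NR, NF, samples + 1 > "/dev/stderr"
      bad = 1
    }
    END {
      if (NR < 2) { print "no data rows found" > "/dev/stderr"; exit 1 }
      if (bad) exit 1
      printf "%d probes x %d samples\n", NR - 1, samples
    }
  ' "$1"
}

# Usage (assumes the output file name from step 3):
# check_matrix gcrma_normalized_expression.txt
```

A non-zero exit status signals a malformed file, so the function can gate later steps in a shell pipeline.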

Raw Source Text
CEL files were processed in the R (v. 2.10.0)/Bioconductor (v 2.5) environment. Pre-processing steps (background adjustment, normalization and summarization) were performed with the GCRMA package (v.2.18.1)
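
Because the source text pins specific versions, it can help to record how a local installation compares before interpreting any differences in results. The following is a minimal sketch; the function names `r_version_from_banner` and `report_r_version` are hypothetical, and the parsing assumes the first line of `R --version` begins with `R version X.Y.Z`.

```shell
#!/usr/bin/env bash
# Extract the version number from the first line of `R --version` output (read from stdin).
r_version_from_banner() {
  sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# Compare a detected R version with the one reported in the source text (2.10.0).
report_r_version() {
  local expected="2.10.0"
  local found="$1"
  if [ "$found" = "$expected" ]; then
    echo "R $found matches the reported environment"
  else
    echo "R $found differs from the reported R $expected; results may not be identical"
  fi
}

# Usage (requires R on PATH, so it is left commented out):
# report_r_version "$(R --version | r_version_from_banner)"
```

A mismatch is expected on any modern system and is not an error in itself; the message is only a reminder that newer package versions may produce slightly different values.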