GSE75214 Processing Pipeline


Publication

RNA binding protein DDX5 directs tuft cell specification and function to regulate microbial repertoire and disease susceptibility in the intestine.

Gut (2022) — PMID 34853057

Dataset

GSE75214

Mucosal gene expression profiling in patients with inflammatory bowel disease

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Probe level analysis was performed on the Affymetrix raw data (.cel files) with the robust multichip average (RMA) method implemented in the Bioconductor package 'aroma.affymetrix' to obtain a log2 expression value for each gene probe set.

    R v4.3.x (Bioconductor 3.18)
    $ Bash example
    #!/bin/bash
    
    # Define environment variables for input/output
    # Replace with your actual CEL file directory and desired output file
    export CEL_FILES_DIR="data/cel_files"
    export OUTPUT_FILE="results/rma_log2_expression.tsv"
    
    # Create output directory if it doesn't exist
    mkdir -p "$(dirname "$OUTPUT_FILE")"
    
    # --- R Installation (commented out) ---
    # # Install R if not available (example for Ubuntu/Debian)
    # # sudo apt update
    # # sudo apt install r-base
    
    # # Install Bioconductor and aroma.affymetrix package
    # # Start R and run:
    # # if (!requireNamespace("BiocManager", quietly = TRUE))
    # #     install.packages("BiocManager")
    # # BiocManager::install("aroma.affymetrix") # also resolves CRAN packages; aroma.core is pulled in as a dependency
    
    # --- R Script for RMA Analysis ---
    # Create a temporary R script
    cat << 'EOF' > rma_analysis.R
    # R script (rma_analysis.R)
    
    library(aroma.affymetrix)
    
    # --- Configuration ---
    cel_files_dir <- Sys.getenv("CEL_FILES_DIR", "path/to/your/cel_files") # Directory containing .cel files
    output_file <- Sys.getenv("OUTPUT_FILE", "rma_log2_expression.tsv") # Output file name
    
    # Check if the directory exists
    if (!dir.exists(cel_files_dir)) {
        stop(paste("CEL files directory not found:", cel_files_dir))
    }
    
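    # --- Load Affymetrix CEL files ---
    # Create an AffymetrixCelSet object from the specified directory.
    # The chip type is read from the CEL files, but a matching CDF must be available;
    # aroma.affymetrix conventionally expects rawData/<dataSet>/<chipType>/ for CEL files
    # and annotationData/chipTypes/<chipType>/ for the CDF, so adjust paths if loading fails.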
    message(paste("Loading CEL files from:", cel_files_dir))
    cs <- AffymetrixCelSet$byPath(cel_files_dir)
    message(paste("Detected chip type:", getChipType(cs)))
    message(paste("Number of arrays:", length(cs)))
    
    # --- Perform Robust Multichip Average (RMA) ---
    # This function performs background correction, quantile normalization,
    # and summarization of probes into probe-set values
    # (median polish in the original RMA formulation).
    message("Performing RMA...")
    ds <- doRMA(cs, verbose=TRUE)
    
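    # --- Extract log2 expression matrix ---
    # Chip effects from aroma.affymetrix are typically stored on the intensity scale,
    # so take log2 explicitly; verify the value range before downstream use
    message("Extracting log2 expression matrix...")
    expr_matrix <- log2(extractMatrix(ds))
    
    # Label rows with probe-set (unit) names when the dimensions agree
    unit_names <- getUnitNames(getCdf(ds))
    if (length(unit_names) == nrow(expr_matrix)) rownames(expr_matrix) <- unit_names
    
    # --- Write results to file ---
    # col.names=NA keeps the header aligned with the row-name column
    message(paste("Writing log2 expression matrix to:", output_file))
    write.table(expr_matrix, file=output_file, sep="\t", quote=FALSE, row.names=TRUE, col.names=NA)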
    
    message("RMA analysis complete.")
    EOF
    
    # Execute the R script
    Rscript rma_analysis.R
    
    # Clean up the temporary R script
    rm rma_analysis.R
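
Before running the R step, it can help to confirm that the input files are where aroma.affymetrix is likely to look for them. The sketch below assumes the conventional layout of `rawData/<dataSet>/<chipType>/` for CEL files and `annotationData/chipTypes/<chipType>/` for the CDF. The function name `check_aroma_layout` and its three arguments are hypothetical and not part of the original pipeline. The chip type string should be taken from the GEO platform record, not guessed.

```shell
#!/bin/bash
# Sketch: pre-flight check of the directory layout conventionally used by aroma.affymetrix.
# check_aroma_layout and its arguments are illustrative names (assumptions), not pipeline code.

check_aroma_layout() {
    local root="$1" dataset="$2" chip_type="$3"
    local cel_dir="$root/rawData/$dataset/$chip_type"
    local cdf_dir="$root/annotationData/chipTypes/$chip_type"
    local n_cel n_cdf

    if [ ! -d "$cel_dir" ]; then
        echo "missing CEL directory: $cel_dir" >&2
        return 1
    fi
    if [ ! -d "$cdf_dir" ]; then
        echo "missing CDF directory: $cdf_dir" >&2
        return 1
    fi

    # Count files case-insensitively (.cel/.CEL, .cdf/.CDF); strip padding from wc output
    n_cel=$(find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | wc -l | tr -d '[:space:]')
    n_cdf=$(find "$cdf_dir" -maxdepth 1 -type f -iname '*.cdf' | wc -l | tr -d '[:space:]')

    if [ "$n_cel" -eq 0 ]; then
        echo "no CEL files in $cel_dir" >&2
        return 1
    fi
    if [ "$n_cdf" -eq 0 ]; then
        echo "no CDF file in $cdf_dir" >&2
        return 1
    fi

    echo "layout OK: $n_cel CEL file(s), $n_cdf CDF file(s)"
}
```

A call such as `check_aroma_layout . GSE75214 "$CHIP_TYPE" || exit 1` placed before `Rscript rma_analysis.R` would stop the pipeline early when files are missing; `CHIP_TYPE` here is a placeholder variable you would set yourself.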

Tools Used
