GSE75214 Processing Pipeline
1 step
Publication
RNA binding protein DDX5 directs tuft cell specification and function to regulate microbial repertoire and disease susceptibility in the intestine. Gut (2022). PMID 34853057
Dataset
GSE75214: Mucosal gene expression profiling in patients with inflammatory bowel disease study
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1. Probe-level analysis was performed on the Affymetrix raw data (.cel files) with the robust multichip average (RMA) method implemented in the Bioconductor package 'aroma.affymetrix' to obtain a log2 expression value for each gene probe set.
R 4.3.x (Bioconductor 3.18), Bash example
#!/bin/bash

# Define environment variables for input/output
# Replace with your actual CEL file directory and desired output file
export CEL_FILES_DIR="data/cel_files"
export OUTPUT_FILE="results/rma_log2_expression.tsv"

# Create output directory if it doesn't exist
mkdir -p "$(dirname "$OUTPUT_FILE")"

# --- R Installation (commented out) ---
# # Install R if not available (example for Ubuntu/Debian)
# # sudo apt update
# # sudo apt install r-base
# # Install the aroma.affymetrix package
# # Start R and run:
# # if (!requireNamespace("BiocManager", quietly = TRUE))
# #   install.packages("BiocManager")
# # BiocManager::install("aroma.affymetrix") # distributed via CRAN; BiocManager can install CRAN packages too

# --- R Script for RMA Analysis ---
# Create a temporary R script
cat << 'EOF' > rma_analysis.R
# R script (rma_analysis.R)
library(aroma.affymetrix)

# --- Configuration ---
cel_files_dir <- Sys.getenv("CEL_FILES_DIR", "path/to/your/cel_files") # Directory containing .cel files
output_file <- Sys.getenv("OUTPUT_FILE", "rma_log2_expression.tsv") # Output file name

# Check if the directory exists
if (!dir.exists(cel_files_dir)) {
  stop(paste("CEL files directory not found:", cel_files_dir))
}

# --- Load Affymetrix CEL files ---
# Create an AffymetrixCelSet object from the specified directory.
# The chip type is read from the CEL files; aroma.affymetrix also needs the
# matching CDF under annotationData/chipTypes/<chipType>/ in the working directory.
message(paste("Loading CEL files from:", cel_files_dir))
cs <- AffymetrixCelSet$byPath(cel_files_dir)
message(paste("Detected chip type:", getChipType(cs)))
message(paste("Number of arrays:", length(cs)))

# --- Perform Robust Multichip Average (RMA) ---
# doRMA() performs background correction, quantile normalization, and
# robust probe-level summarization, returning chip-effect estimates.
message("Performing RMA...")
ds <- doRMA(cs, verbose=TRUE)

# --- Extract log2 expression matrix ---
# Chip effects are stored on the intensity scale, so log2 is applied explicitly.
message("Extracting log2 expression matrix...")
expr_matrix <- log2(extractMatrix(ds))
# Label rows with probe set (unit) names; this assumes one row per unit,
# as with standard expression CDFs. Verify the dimensions before relying on it.
rownames(expr_matrix) <- getUnitNames(getCdf(ds))

# --- Write results to file ---
message(paste("Writing log2 expression matrix to:", output_file))
write.table(expr_matrix, file=output_file, sep="\t", quote=FALSE, row.names=TRUE)
message("RMA analysis complete.")
EOF

# Execute the R script
Rscript rma_analysis.R

# Clean up the temporary R script
rm rma_analysis.R
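The script above points AffymetrixCelSet$byPath at a directory of CEL files, but aroma.affymetrix conventionally resolves raw data and the chip definition file (CDF) through a fixed directory layout relative to the working directory: rawData/<dataset>/<chipType>/ and annotationData/chipTypes/<chipType>/. The following is a minimal sketch of preparing that layout. The function name setup_aroma_layout and all of its arguments are hypothetical helpers introduced here for illustration, and it assumes the CEL files have already been downloaded and decompressed.

```shell
#!/bin/bash
# Sketch only: arrange CEL files in the directory layout aroma.affymetrix looks for.
# setup_aroma_layout is a hypothetical helper, not part of any package.
#   $1 root      working directory from which Rscript will be run
#   $2 dataset   data set name (for example the GEO accession)
#   $3 chip_type chip type exactly as named by the CDF file (not guessed here)
#   $4 cel_src   directory holding the decompressed .cel/.CEL files
setup_aroma_layout() {
  local root="$1" dataset="$2" chip_type="$3" cel_src="$4"
  local raw_dir="$root/rawData/$dataset/$chip_type"
  local cdf_dir="$root/annotationData/chipTypes/$chip_type"

  mkdir -p "$raw_dir" "$cdf_dir" || return 1

  # Copy CEL files regardless of extension case; unmatched globs are skipped.
  local f n=0
  for f in "$cel_src"/*.cel "$cel_src"/*.CEL; do
    [ -e "$f" ] || continue
    cp "$f" "$raw_dir/" || return 1
    n=$((n + 1))
  done

  # Status messages go to stderr so stdout carries only the raw data path.
  echo "Copied $n CEL file(s) to $raw_dir" >&2
  echo "Place the CDF for $chip_type in $cdf_dir before running RMA" >&2
  echo "$raw_dir"
}

# Example call (placeholders, fill in your own values):
#   CEL_FILES_DIR="$(setup_aroma_layout "$PWD" GSE75214 "<chipType>" data/cel_files)"
```

With this layout in place, running the R script from the root directory lets aroma.affymetrix find the CDF on its own; the chip type string must match the CDF file name, so it is left as a placeholder rather than assumed.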
Tools Used
Raw Source Text
Probe level analysis was performed on the Affymetrix raw data (.cel files) with the robust multichip average (RMA) method implemented in the Bioconductor package 'aroma.affymetrix' to obtain a log2 expression value for each gene probe set
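Since the step is meant to yield log2 expression values, a quick range check on the exported table can flag a matrix that was accidentally left on the raw intensity scale. The sketch below assumes the tab-separated layout produced by write.table with row names (a header row of sample names, then one row-name column followed by values). The function name check_log2_range and the default upper bound of 20 are assumptions made for illustration; 16-bit scanner intensities top out near log2 = 16, so 20 is a deliberately generous cutoff.

```shell
#!/bin/bash
# Sketch only: report how many values in an expression TSV exceed a plausible
# log2 ceiling. check_log2_range is a hypothetical helper.
#   $1 tsv  path to the expression table
#   $2 max  upper bound treated as plausible for log2 values (assumed default: 20)
check_log2_range() {
  local tsv="$1" max="${2:-20}"
  awk -F'\t' -v max="$max" '
    NR == 1 { next }                 # skip the header row of sample names
    {
      for (i = 2; i <= NF; i++) {    # column 1 holds the probe set name
        if ($i == "NA") continue     # ignore missing values
        if ($i + 0 > max) bad++
        n++
      }
    }
    END {
      printf "%d values checked, %d above %s\n", n, bad, max
      exit (bad > 0)                 # non-zero exit status if anything is out of range
    }
  ' "$tsv"
}

# Example call:
#   check_log2_range results/rma_log2_expression.tsv || echo "values do not look like log2"
```

A non-zero exit status only signals that some values look too large for log2 data; it does not confirm that the normalization itself is correct.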