GSE14333 Processing Pipeline
2 steps
Publication
DDX5 promotes oncogene C3 and FABP1 expressions and drives intestinal inflammation and tumorigenesis. Life Science Alliance (2020), PMID 32817263
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Use the simpleaffy package in R/Bioconductor to calculate MAS5.0 calls.
R v4.3 (Bioconductor 3.18). Bash example:
# Install R and the Bioconductor simpleaffy package if not already installed
# For Conda (recommended for environment management):
# conda create -n bioconductor_env r-base bioconductor-simpleaffy -y
# conda activate bioconductor_env
# For R directly:
# Rscript -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# Rscript -e 'BiocManager::install("simpleaffy")'

# Create an R script to compute MAS5.0 expression values
cat << 'EOF' > mas5_normalization.R
library(simpleaffy)  # also attaches affy, which provides the functions used below

# Define input directory containing .CEL files and output file
# Assuming .CEL files are in the current directory. Adjust 'cel_dir' if needed.
cel_dir <- "."
output_file <- "mas5_normalized_expression.tsv"

# List all .CEL files in the specified directory
cel_files <- list.celfiles(cel_dir, full.names=TRUE)

# Check if any .CEL files were found
if (length(cel_files) == 0) {
  stop(paste("No .CEL files found in", cel_dir, ". Please ensure .CEL files are present or adjust 'cel_dir'."))
}

# Read Affymetrix .CEL files into an AffyBatch object
# For more complex experiments (e.g., with sample metadata), consider creating a phenoData file
# and using read.affybatch(filenames=cel_files, phenoData=pheno_data_object)
raw_data <- read.affybatch(filenames=cel_files)

# Compute MAS5.0 expression values
# Note: mas5() is provided by affy and returns values on the natural (unlogged) scale.
# Apply log2() afterwards if log-scale values are needed. The simpleaffy wrapper
# call.exprs(raw_data, "mas5") is an alternative that is documented to return log2 values.
mas5_normalized_data <- mas5(raw_data)

# Extract expression values
expression_matrix <- exprs(mas5_normalized_data)

# Write the expression matrix to a TSV file
write.table(expression_matrix, file=output_file, sep="\t", quote=FALSE, row.names=TRUE)

message(paste("MAS5.0 expression values written to:", output_file))
EOF

# Execute the R script
Rscript mas5_normalization.R
2
These values were subsequently normalized using quantile normalization.
R 4.3.0 with preprocessCore 1.62.0 (tool inferred with models/gemini-2.5-flash). Bash example:
# Install R and preprocessCore if not already available
# conda create -n r_env r-base bioconductor-preprocesscore -y
# conda activate r_env

# Example: Create a dummy input file (replace with your actual input_matrix.tsv)
# echo -e "gene\tsample1\tsample2\tsample3" > input_matrix.tsv
# echo -e "geneA\t100\t200\t50" >> input_matrix.tsv
# echo -e "geneB\t50\t100\t25" >> input_matrix.tsv
# echo -e "geneC\t200\t50\t100" >> input_matrix.tsv

# R script for quantile normalization
Rscript -e '
library(preprocessCore)

# Define input and output file names
input_file <- "input_matrix.tsv"
output_file <- "normalized_matrix.tsv"

# Read the input matrix
# Assuming the first column contains row identifiers (e.g., gene names)
# and subsequent columns contain numeric data for samples.
# header=TRUE assumes the first row contains column names (sample IDs).
# sep="\t" assumes tab-separated values.
# Adjust these parameters based on the actual input file format.
input_data <- read.table(input_file, sep="\t", header=TRUE, row.names=1, check.names=FALSE)

# Convert to matrix for preprocessCore
data_matrix <- as.matrix(input_data)

# Perform quantile normalization
normalized_matrix <- normalize.quantiles(data_matrix)

# Restore column names (sample IDs) and row names (gene IDs)
colnames(normalized_matrix) <- colnames(data_matrix)
rownames(normalized_matrix) <- rownames(data_matrix)

# Write the normalized matrix to an output file
# quote=FALSE prevents R from adding quotes around string values.
# col.names=NA is used when row.names=TRUE to leave the top-left cell empty,
# which is standard for matrices with row names written to file.
write.table(normalized_matrix, output_file, sep="\t", quote=FALSE, row.names=TRUE, col.names=NA)
'
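To make the effect of quantile normalization concrete, the following base-R sketch reimplements the idea on the toy matrix from the commented-out dummy input. It is an illustration under stated assumptions, not the preprocessCore implementation: the helper name `quantile_normalize_sketch` is hypothetical, missing values are not supported, and ties are broken by order of appearance rather than averaged.

```r
# Hypothetical helper: a minimal base-R sketch of quantile normalization.
# Assumes a numeric matrix (rows = probes/genes, columns = samples), no NAs.
quantile_normalize_sketch <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m), !anyNA(m), nrow(m) > 1)
  # Rank of each value within its own column (ties broken by position).
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Sort every column, then average across columns at each rank position.
  sorted <- apply(m, 2, sort)
  reference <- unname(rowMeans(sorted))
  # Replace each value with the reference value at its rank.
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(m)
  normalized
}

# Toy data matching the dummy input file in the Bash example.
toy <- matrix(
  c(100, 200,  50,
     50, 100,  25,
    200,  50, 100),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("sample1", "sample2", "sample3"))
)

normalized_toy <- quantile_normalize_sketch(toy)
print(round(normalized_toy, 2))
```

After normalization every sample column holds the same set of values (here 41.67, 83.33 and 166.67); only their assignment to genes differs, following each gene's rank within its sample.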
Raw Source Text
Use the simpleaffy package in R/Bioconductor to calculate MAS5.0 calls. These values were subsequently normalized using quantile normalization.