GSE34992 Processing Pipeline
2 steps
Publication
Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Reports (2012), PMID 22574288
Dataset
GSE34992: Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins (splicing arrays)
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).
Microarray (version inferred with models/gemini-2.5-flash)
Bash example
```bash
# Install Affymetrix Power Tools (APT); the conda package name below is unverified
# conda install -c bioconda affymetrix-power-tools

# Define input and output paths
# Replace with the actual CEL files and the library file for your array type
CEL_FILES=(sample1.CEL sample2.CEL sample3.CEL)  # Placeholder for actual CEL files
# Placeholder CDF name only. GSE34992 is a splicing-array dataset, so use that
# platform's own library files (e.g., from the Affymetrix support site).
CDF_FILE="HG-U133A.cdf"
OUTPUT_DIR="apt_summarize_output"
# Placeholder analysis. The source states that Iter-PLIER was used for quantification;
# check the APT documentation for the matching analysis specification.
ALGORITHM="rma"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Execute apt-probeset-summarize
# -d selects a CDF file; -c is reserved for a CLF file (used together with -p for PGF)
apt-probeset-summarize \
  -a "${ALGORITHM}" \
  -o "${OUTPUT_DIR}" \
  -d "${CDF_FILE}" \
  "${CEL_FILES[@]}"
```
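For more than a handful of arrays, apt-probeset-summarize can read its inputs from a list file passed with --cel-files instead of positional arguments. The list is a text file with a `cel_files` header line followed by one CEL path per line. The sketch below builds such a list; `make_cel_list`, `CEL_DIR`, and `LIST_FILE` are hypothetical names chosen for illustration, not part of the published pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# make_cel_list (hypothetical helper): write an APT-style CEL list, i.e. a
# "cel_files" header followed by one CEL path per line.
make_cel_list() {
    local cel_dir="$1" list_file="$2"
    printf 'cel_files\n' > "${list_file}"
    # Match both .CEL and .cel; sort so the sample order is reproducible.
    find "${cel_dir}" -maxdepth 1 -type f -iname '*.cel' | sort >> "${list_file}"
}

# Placeholder locations; override through the environment.
CEL_DIR="${CEL_DIR:-raw_cel_files}"
LIST_FILE="${LIST_FILE:-cel_list.txt}"

if [ -d "${CEL_DIR}" ]; then
    make_cel_list "${CEL_DIR}" "${LIST_FILE}"
    echo "Listed $(( $(wc -l < "${LIST_FILE}") - 1 )) CEL files in ${LIST_FILE}"
fi
```

The resulting file would then replace the positional CEL arguments, for example `--cel-files "${LIST_FILE}"`. Sorting the paths keeps the column order of the summary table stable between runs.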
2
Probesets were quantified with the Iter-PLIER algorithm.
plier (R package; tool and version inferred with models/gemini-2.5-flash)
Bash example
```bash
# Install R and Bioconductor if not already installed
# sudo apt-get update
# sudo apt-get install r-base
# R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager'); BiocManager::install(c('plier', 'affy'))"

cat << 'EOF' > run_plier.R
# Load necessary R packages
library(plier)
library(affy)  # Required for ReadAffy

# Define input CEL files directory and output file
# IMPORTANT: Replace "path/to/your/raw_cel_files" with the actual directory containing your .CEL files.
cel_files_dir <- Sys.getenv("CEL_FILES_DIR", "path/to/your/raw_cel_files")
output_file <- Sys.getenv("OUTPUT_FILE", "probeset_quantification_plier.tsv")

# Check if the CEL files directory exists
if (!dir.exists(cel_files_dir)) {
  stop(paste("Error: CEL files directory not found:", cel_files_dir,
             "\nPlease update the 'CEL_FILES_DIR' environment variable to point to your actual .CEL files."))
}

# Read CEL files into an AffyBatch object
# This step requires valid Affymetrix .CEL files and a CDF environment for the array type.
# Annotation packages such as hgu133plus2.db target 3' expression arrays and are
# only an illustration; a splicing array needs its own platform files.
# Example: BiocManager::install("hgu133plus2.db")
raw_data <- ReadAffy(celfile.path = cel_files_dir)

# Perform PLIER quantification
# justPlier() is the summarization function exported by the Bioconductor plier package.
# It runs PLIER and is used here as a stand-in; it is not confirmed to reproduce
# the Iter-PLIER variant named in the source.
eset_plier <- justPlier(raw_data)

# Extract expression values (confirm the scale, e.g. log2, before downstream use)
expression_matrix <- exprs(eset_plier)

# Write results to a TSV file; col.names = NA keeps the header aligned with the row names
write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE,
            row.names = TRUE, col.names = NA)
message(paste("PLIER quantification complete. Results saved to:", output_file))
EOF

# Set environment variables for the R script
# IMPORTANT: Replace "path/to/your/raw_cel_files" with the actual directory containing your .CEL files
export CEL_FILES_DIR="path/to/your/raw_cel_files"
export OUTPUT_FILE="probeset_quantification_plier.tsv"

# Execute the R script
Rscript run_plier.R
```
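Before the quantified matrix goes into downstream analysis, it may be worth confirming that it has the expected shape. The sketch below assumes a tab-separated file with one header line and probeset IDs in the first column, which is the layout written by the example above; `summarize_matrix` and `MATRIX` are hypothetical names, and the check is not part of the published pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# summarize_matrix (hypothetical helper): print "<probesets> <samples>" for a
# tab-separated matrix, and fail if the data rows differ in field count.
summarize_matrix() {
    awk -F'\t' '
        NR == 1 { next }          # skip the header line
        NR == 2 { nf = NF }       # field count of the first data row
        NF != nf {
            printf "line %d has %d fields, expected %d\n", NR, NF, nf > "/dev/stderr"
            bad = 1
        }
        END {
            if (bad) exit 1
            if (NR < 2) { print 0, 0; exit 0 }   # header only, or empty file
            print NR - 1, nf - 1                 # rows minus header, fields minus ID column
        }
    ' "$1"
}

# Placeholder path; matches the output name used in the R example.
MATRIX="${MATRIX:-probeset_quantification_plier.tsv}"
if [ -f "${MATRIX}" ]; then
    summarize_matrix "${MATRIX}"
fi
```

The sample count should equal the number of CEL files that were read. A non-zero exit status points to a truncated or malformed row.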
Tools Used
Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.