GSE66696 Processing Pipeline
Publication
Nxf1 natural variant E610G is a semi-dominant suppressor of IAP-induced RNA processing defects. PLoS Genetics (2015), PMID 25835743
Dataset
GSE66696: Whole brain RNA from congenic littermates does not support a general effect of Nxf1 CAST alleles on alternative pre-mRNA processing
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).
Affymetrix Power Tools (APT) v2.12.0 (version inferred with models/gemini-2.5-flash; not stated in the source)
Bash example
# Affymetrix Power Tools (APT) is distributed as a binary package by Thermo Fisher Scientific.
# Download the archive for your platform from the vendor site, unpack it, and add its bin/ directory to PATH:
# export PATH=/path/to/apt/bin:$PATH

# Placeholders: replace the CEL file names, output directory, and library files with your own.
# This series names a probe group (PGF) file, so the PGF/CLF options are used rather than a CDF.
# The CLF file name is hypothetical; the source text does not name one.
INPUT_CEL_FILES="sample1.CEL sample2.CEL sample3.CEL"
OUTPUT_DIR="apt_summarize_output"
PGF_FILE="GPL13185_mjay.pgf"
CLF_FILE="array_layout.clf"

mkdir -p "${OUTPUT_DIR}"

# "iter-plier" is the assumed analysis string for the Iter-PLIER method named in the source;
# confirm it against `apt-probeset-summarize --help` for your APT release.
apt-probeset-summarize \
  -a iter-plier \
  -p "${PGF_FILE}" \
  -c "${CLF_FILE}" \
  -o "${OUTPUT_DIR}" \
  ${INPUT_CEL_FILES}
2
The Iter-PLIER algorithm was used to quantify probesets.
Iter-PLIER, run through Affymetrix Power Tools (version not specified)
Bash example
# Iter-PLIER is run here as an analysis mode of apt-probeset-summarize, matching step 1.
# The `affy` R package is not used: its expresso() function has no "iterplier"
# background-correction method, so an R script built on it would fail.

# Placeholders: the CEL list, CLF file name, and output directory are hypothetical.
CEL_LIST="input_cel_files.txt"      # header line "cel_files", then one CEL path per line
PGF_FILE="GPL13185_mjay.pgf"
CLF_FILE="array_layout.clf"
OUTPUT_DIR="iterplier_output"

mkdir -p "${OUTPUT_DIR}"

# "iter-plier" is the assumed analysis string; confirm it with `apt-probeset-summarize --help`.
apt-probeset-summarize \
  -a iter-plier \
  -p "${PGF_FILE}" \
  -c "${CLF_FILE}" \
  -o "${OUTPUT_DIR}" \
  --cel-files "${CEL_LIST}"

# The probeset-level summary table is written into ${OUTPUT_DIR}.
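When quantification is run through apt-probeset-summarize, as the source text describes, each analysis is written as a tab-delimited summary table whose leading lines start with `#%` and carry run metadata. The sketch below strips that header and counts probesets and samples. The function name, the demo values, and the idea that the file would be called `iter-plier.summary.txt` are assumptions for illustration, not details from the series record.

```shell
#!/usr/bin/env bash
set -euo pipefail

# summarize_apt_table FILE
# Prints "<probesets> <samples>" for an APT-style summary table:
# lines starting with '#' are metadata, the first remaining line is the
# column header (probeset_id plus one column per CEL file).
summarize_apt_table() {
  grep -v '^#' "$1" | awk -F'\t' '
    NR == 1 { samples = NF - 1; next }   # header row
    NF > 0  { probesets++ }              # data rows
    END     { print probesets + 0, samples + 0 }'
}

# Demo input with hypothetical values, written to a temporary file.
demo=$(mktemp)
printf '#%%chip_type=demo\n#%%analysis=iter-plier\n' > "$demo"
printf 'probeset_id\tA1.CEL\tB1.CEL\n' >> "$demo"
printf 'ps_0001\t7.21\t7.05\n' >> "$demo"
printf 'ps_0002\t3.40\t3.62\n' >> "$demo"

summarize_apt_table "$demo"
```

On a real run, pointing the function at the summary file in the APT output directory gives a quick sanity check that the number of sample columns matches the number of CEL files supplied.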
3
probe group file: GPL13185_mjay.pgf
Affymetrix Power Tools (APT) (Inferred with models/gemini-2.5-flash), version not specified
Bash example
# Affymetrix Power Tools (APT) is downloaded as a binary package from Thermo Fisher Scientific.
# After unpacking, add its bin/ directory to PATH:
# export PATH=$PATH:/path/to/apt/bin

# GPL13185_mjay.pgf is the probe group file that defines the probesets for GEO platform GPL13185.
# Make sure it is in the working directory or give its full path, for example:
# cp /path/to/reference_files/GPL13185_mjay.pgf .

# Summarize CEL files using the specified probe group file.
# input_cel_files.txt: list of CEL paths, one per line, after the header line "cel_files".
# array_layout.clf: hypothetical name for the matching CLF file that PGF-based runs normally also need.
# output_dir: your desired output directory.
apt-probeset-summarize \
  -a iter-plier \
  --cel-files input_cel_files.txt \
  -p GPL13185_mjay.pgf \
  -c array_layout.clf \
  -o output_dir
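The `--cel-files` option of apt-probeset-summarize expects a text file whose first line is the column header `cel_files`, followed by one CEL path per line. A minimal sketch of generating that list from a directory is shown here; the `cel/` directory, the function name, and the empty demo files are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_cel_list DIR OUT
# Writes an APT-style CEL list: header line 'cel_files', then one path per line.
write_cel_list() {
  local dir=$1 out=$2
  echo "cel_files" > "$out"
  # Match both .CEL and .cel; sort for a reproducible order.
  find "$dir" -maxdepth 1 -type f -iname '*.cel' | sort >> "$out"
}

# Demo: empty placeholder files standing in for real CEL files.
workdir=$(mktemp -d)
mkdir -p "$workdir/cel"
touch "$workdir/cel/A1.CEL" "$workdir/cel/B1.CEL"

write_cel_list "$workdir/cel" "$workdir/input_cel_files.txt"
cat "$workdir/input_cel_files.txt"
```

Sorting the paths keeps the column order of the resulting summary table stable between runs, which makes later comparison of the "A" and "B" sets less error-prone.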
4
Data were analyzed with Omniviewer software: http://exon.ucsc.edu/omniviewer/.
Bash example
# Omniviewer analysis is interactive; the source text gives no command-line invocation for it.
# The command below is only a placeholder that records where the analysis was done.
echo "Data were analyzed interactively using Omniviewer: http://exon.ucsc.edu/omniviewer/"
5
The Omniviewer output is available on the series record.
OmniViewer (version not specified)
Bash example
# The Omniviewer output is deposited on the GEO series record for GSE66696.
# No command generates that output here; the command below only opens the series record in a browser.
# For Linux/WSL environments:
xdg-open "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66696"
# For macOS:
# open "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66696"
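GEO serves series-level supplementary files from a directory whose path is derived from the accession: the last three digits are replaced by `nnn` to form the parent folder. The sketch below builds that URL for this series. The function name is hypothetical, and which files are actually deposited there should be checked on the series record itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# geo_suppl_url ACCESSION
# Prints the supplementary-file directory URL for a GEO series accession.
# Assumes an accession with at least four digits (e.g. GSE66696).
geo_suppl_url() {
  local acc=$1
  local stem=${acc%???}   # drop the last three characters: GSE66696 -> GSE66
  echo "https://ftp.ncbi.nlm.nih.gov/geo/series/${stem}nnn/${acc}/suppl/"
}

geo_suppl_url GSE66696

# To fetch the directory contents (requires network access), something like:
# wget -r -np -nd "$(geo_suppl_url GSE66696)"
```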
6
The "A" set represents the Nxf1-CAST mutants and the "B" set represents the Nxf1-B6 mutants.
Data Grouping/Sample Annotation (Inferred with models/gemini-2.5-flash)
Bash example
# Define sample sets based on the description:
#   Set A: Nxf1-CAST mutants
#   Set B: Nxf1-B6 mutants
# This step is the logical grouping of samples by experimental design,
# commonly represented in a pipeline as a manifest or sample sheet.

# Create a placeholder manifest. The sample IDs below are hypothetical;
# in practice, take them from the sample metadata on the series record.
echo "Sample_ID,Set_Name,Genotype" > sample_manifest.csv
echo "Nxf1_CAST_Sample_1,A,Nxf1-CAST" >> sample_manifest.csv
echo "Nxf1_CAST_Sample_2,A,Nxf1-CAST" >> sample_manifest.csv
echo "Nxf1_B6_Sample_1,B,Nxf1-B6" >> sample_manifest.csv
echo "Nxf1_B6_Sample_2,B,Nxf1-B6" >> sample_manifest.csv

echo "Sample manifest 'sample_manifest.csv' created, defining 'A' and 'B' sets."

# No reference datasets are used in this sample grouping step.
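Once a manifest exists, a quick check that every sample is assigned to set A or B, and that neither set is empty, can catch annotation mistakes before any comparison is made. The sketch below assumes the three-column layout `Sample_ID,Set_Name,Genotype` used above; the function name is hypothetical and the sample IDs are placeholders, not the real accessions for this series.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_sets MANIFEST
# Prints "A=<n> B=<n>". Exits non-zero if a row has an unknown set label
# or if either set is empty.
count_sets() {
  awk -F',' '
    NR == 1 { next }                                   # skip header
    $2 == "A" || $2 == "B" { n[$2]++; next }           # count known sets
    { printf("unknown set on line %d: %s\n", NR, $2) > "/dev/stderr"; bad = 1 }
    END {
      if (bad || n["A"] == 0 || n["B"] == 0) exit 1
      printf "A=%d B=%d\n", n["A"], n["B"]
    }' "$1"
}

# Demo manifest with placeholder sample IDs, written to a temporary file.
manifest=$(mktemp)
cat > "$manifest" << 'EOF'
Sample_ID,Set_Name,Genotype
Nxf1_CAST_Sample_1,A,Nxf1-CAST
Nxf1_CAST_Sample_2,A,Nxf1-CAST
Nxf1_B6_Sample_1,B,Nxf1-B6
Nxf1_B6_Sample_2,B,Nxf1-B6
EOF

count_sets "$manifest"
```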
Tools Used
Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets. probe group file: GPL13185_mjay.pgf Data were analyzed with Omniviewer software: http://exon.ucsc.edu/omniviewer/. The Omniviewer output is available on the series record. The "A" set represents the Nxf1-CAST mutants and the "B" set represents the Nxf1-B6 mutants.