GSE66696 Processing Pipeline

Publication

Nxf1 natural variant E610G is a semi-dominant suppressor of IAP-induced RNA processing defects.

PLoS Genetics (2015), PMID 25835743

Dataset

Whole brain RNA from congenic littermates does not support a general effect of Nxf1 CAST alleles on alternative pre-mRNA processing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize.

Affymetrix Power Tools (APT) v2.12.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Affymetrix Power Tools (APT) is distributed as a binary package by Thermo Fisher Scientific.
# Installation varies by OS. The archive name, version, and URL below are illustrative and
# unverified; take the real download link from the vendor's APT page.
# wget https://assets.thermofisher.com/TFS-Assets/LSG/software/APT_2.12.0_Linux_x64.zip
# unzip APT_2.12.0_Linux_x64.zip
# cd APT_2.12.0_Linux_x64
# export PATH=$PWD/bin:$PATH

# Placeholders for the input CEL files, output directory, and array library files.
# Replace the sample*.CEL names with your actual CEL files.
# The source names the probe group file (GPL13185_mjay.pgf) but not the matching CLF file,
# so 'array_layout.clf' is a hypothetical placeholder.
INPUT_CEL_FILES=(sample1.CEL sample2.CEL sample3.CEL)
OUTPUT_DIR="apt_summarize_output"
PGF_FILE="GPL13185_mjay.pgf" # Probe group file named in the source
CLF_FILE="array_layout.clf"  # Hypothetical: CLF layout file matching the PGF

mkdir -p "${OUTPUT_DIR}"

# -a selects the analysis, -p the PGF file, -c the CLF file, -o the output directory.
# 'rma' is only an example analysis; the source reports Iter-plier for quantification.
apt-probeset-summarize \
    -a rma \
    -o "${OUTPUT_DIR}" \
    -p "${PGF_FILE}" \
    -c "${CLF_FILE}" \
    "${INPUT_CEL_FILES[@]}"
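
Once a summarization run finishes, it can help to confirm which samples ended up in the output table. The sketch below assumes the usual APT summary layout: a tab-delimited text file whose metadata lines start with `#`, followed by a header row whose first column is `probeset_id` and whose remaining columns are the CEL file names. The function name `list_summary_samples` and the demo table are hypothetical and only illustrate the parsing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_summary_samples FILE
# Print the sample column names (every header field after the first)
# from an APT-style summary table, skipping leading '#' metadata lines.
list_summary_samples() {
    awk -F'\t' '!/^#/ { for (i = 2; i <= NF; i++) print $i; exit }' "$1"
}

# Demo on a tiny made-up table (not real data from this series).
demo="$(mktemp)"
printf '#%%chip_type=demo\nprobeset_id\tsample1.CEL\tsample2.CEL\n1001\t7.5\t7.9\n' > "${demo}"

samples="$(list_summary_samples "${demo}")"
echo "${samples}"
```

On a real run, point the function at the summary file in the output directory, for example `list_summary_samples apt_summarize_output/rma.summary.txt`, assuming the file is named after the analysis.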

Iter-plier algorithm used to quantify probesets.

Iter-plier (version not specified; Inferred with models/gemini-2.5-flash)

$ Bash example

# Iter-plier is the probeset quantification method the source reports using with
# apt-probeset-summarize from Affymetrix Power Tools (APT).
# The Bioconductor 'affy' package has no "iterplier" method for expresso(), so this
# example calls APT directly instead of R.
#
# Assumptions (not stated in the source): the analysis string 'iter-plier', the CLF
# file name, and the CEL file names. Check the analysis names your APT version
# accepts in its help output before running.
PGF_FILE="GPL13185_mjay.pgf" # Probe group file named in the source
CLF_FILE="array_layout.clf"  # Hypothetical: CLF layout file matching the PGF
OUTPUT_DIR="iterplier_output"

mkdir -p "${OUTPUT_DIR}"

apt-probeset-summarize \
    -a iter-plier \
    -p "${PGF_FILE}" \
    -c "${CLF_FILE}" \
    -o "${OUTPUT_DIR}" \
    sample1.CEL sample2.CEL sample3.CEL

# APT writes one summary table per analysis into the output directory,
# typically named after the analysis (for example, iter-plier.summary.txt).

probe group file: GPL13185_mjay.pgf

Affymetrix Power Tools (APT), latest version (tool and version both inferred with models/gemini-2.5-flash)

$ Bash example

# Installation of Affymetrix Power Tools (APT) from Thermo Fisher Scientific:
# APT is distributed as a binary package. The archive name, version, and URL below are
# illustrative and unverified; take the real download link from the vendor's APT page.
# Example for Linux:
# wget https://assets.thermofisher.com/TFS-Assets/LSG/software/APT_2.11.2_Linux_x86_64.zip
# unzip APT_2.11.2_Linux_x86_64.zip
# export PATH=$PATH:/path/to/apt/bin

# GPL13185_mjay.pgf is the probe group file (PGF), which defines the probesets for the
# array platform used in this series (GPL13185, as the file name suggests).
# Ensure the file is accessible in your working directory or at the specified path.
# Example: copy or link the file if it is stored elsewhere.
# cp /path/to/reference_files/GPL13185_mjay.pgf .

# Example command: summarize CEL files with the specified probe group file.
# 'input_cel_files.txt' lists the CEL file paths, one per line, below a first line
# that reads 'cel_files'. Replace 'output_dir' with your desired output directory.
# Assumptions (not stated in the source for this step): the CLF file name
# 'array_layout.clf' and the 'iter-plier' analysis string.
apt-probeset-summarize \
    -a iter-plier \
    -p GPL13185_mjay.pgf \
    -c array_layout.clf \
    -o output_dir \
    --cel-files input_cel_files.txt
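
The CEL list file passed to `--cel-files` is a plain text file whose first line is the literal header `cel_files`, followed by one CEL path per line. A minimal sketch for generating it is shown below; the function name `write_cel_list` and the demo directory are hypothetical, and the empty `.CEL` files exist only to exercise the function.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_cel_list DIR OUT
# Write the 'cel_files' header to OUT, then append every .CEL file found
# directly inside DIR, sorted so the sample order is stable across runs.
write_cel_list() {
    local cel_dir="$1" out_file="$2"
    printf 'cel_files\n' > "${out_file}"
    find "${cel_dir}" -maxdepth 1 -type f -iname '*.CEL' | sort >> "${out_file}"
}

# Demo with empty placeholder files (not real CEL data).
demo_dir="$(mktemp -d)"
touch "${demo_dir}/sample1.CEL" "${demo_dir}/sample2.CEL"

write_cel_list "${demo_dir}" "${demo_dir}/input_cel_files.txt"
cat "${demo_dir}/input_cel_files.txt"
```

Writing the list from the directory contents rather than by hand avoids a mismatch between the files on disk and the files that get summarized.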

Data were analyzed with Omniviewer software: http://exon.ucsc.edu/omniviewer/.

Omniviewer (version not specified)

$ Bash example

# The source gives only the Omniviewer URL and describes no command-line interface.
# The analysis was presumably done interactively, so there is no command to run for this step.
# The following command is a placeholder that records the step.

echo "Data were analyzed using Omniviewer: http://exon.ucsc.edu/omniviewer/."

The Omniviewer output is available on the series record.

Omniviewer (version not specified)

$ Bash example

# The source states that the Omniviewer output is available on the series record,
# that is, the GEO record for GSE66696.
# No command generates this output; it can only be retrieved from the record.
# The output file names are not given in the source.
#
# The commands below open the GEO series record in a browser.
# For Linux/WSL environments:
xdg-open "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66696"
# For macOS:
# open "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66696"

The "A" set represents the Nxf1-CAST mutants and the "B" set represents the Nxf1-B6 mutants.

Data Grouping/Sample Annotation (Inferred with models/gemini-2.5-flash; version not applicable)

$ Bash example

# Define sample sets based on the description:
# Set A: Nxf1-CAST mutants
# Set B: Nxf1-B6 mutants

# This step is the logical grouping of samples by experimental design.
# A common way to represent it in a pipeline is a manifest or sample sheet.

# Create a placeholder manifest that assigns samples to sets.
# The sample IDs below are hypothetical; in practice, take them from the
# sample metadata on the series record.

echo "Sample_ID,Set_Name,Genotype" > sample_manifest.csv
echo "Nxf1_CAST_Sample_1,A,Nxf1-CAST" >> sample_manifest.csv
echo "Nxf1_CAST_Sample_2,A,Nxf1-CAST" >> sample_manifest.csv
echo "Nxf1_B6_Sample_1,B,Nxf1-B6" >> sample_manifest.csv
echo "Nxf1_B6_Sample_2,B,Nxf1-B6" >> sample_manifest.csv

echo "Sample manifest 'sample_manifest.csv' created, defining 'A' and 'B' sets."

# No reference datasets are used in this sample grouping step.
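
Before using such a manifest downstream, a quick check that every sample falls into set A or set B can catch typos early. The sketch below assumes the three-column CSV layout used in the example above (`Sample_ID,Set_Name,Genotype`); the function name `count_sets` and the demo rows are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_sets MANIFEST
# Print the number of samples in set A and set B.
# Exit non-zero if any row names a set other than A or B.
count_sets() {
    awk -F',' '
        NR > 1 {
            if ($2 != "A" && $2 != "B") {
                printf "unexpected set: %s\n", $2 > "/dev/stderr"
                bad = 1
            }
            n[$2]++
        }
        END {
            if (bad) exit 1
            printf "A %d\nB %d\n", n["A"], n["B"]
        }' "$1"
}

# Demo with made-up sample IDs.
manifest="$(mktemp)"
printf 'Sample_ID,Set_Name,Genotype\nNxf1_CAST_Sample_1,A,Nxf1-CAST\nNxf1_B6_Sample_1,B,Nxf1-B6\n' > "${manifest}"

counts="$(count_sets "${manifest}")"
echo "${counts}"
```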

Tools Used

Affymetrix Power Tools (apt-probeset-summarize), Omniviewer

Raw Source Text

Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.
probe group file: GPL13185_mjay.pgf
Data were analyzed with Omniviewer software: http://exon.ucsc.edu/omniviewer/. The Omniviewer output is available on the series record. The "A" set represents the Nxf1-CAST mutants and the "B" set represents the Nxf1-B6 mutants.
