GSE56504 Processing Pipeline — Yeo Lab Publications

Publication

Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis.

Acta neuropathologica (2022) — PMID 35778567

Dataset

Loss of nuclear TDP-43 in ALS causes altered expression of splicing machinery and widespread dysregulation of RNA splicing in motor neurons

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

The Partek Genomics Suite was used to normalize (by GC RMA) and then analyse the microarray data following Affymetrix guidelines.

Tool: Microarray (version not specified)

$ Bash example

# Partek Genomics Suite is commercial, GUI-based software. 
# The following represents the conceptual steps performed within the software, 
# as a direct command-line execution is not typically available.

# Input: Raw Affymetrix .CEL files
# Output: Normalized expression data, analysis results

# 1. Import raw microarray data (e.g., .CEL files) into Partek Genomics Suite.
#    This step typically involves selecting the raw data files from a directory 
#    within the graphical user interface.

# 2. Perform normalization using the GC RMA method.
#    Within the software, navigate to the normalization options and select "GC RMA".
#    Ensure Affymetrix guidelines are followed for probe set definition and background correction.
#    (Conceptual representation of the action, not an actual CLI command):
#    partek_genomics_suite --action normalize --method GC_RMA --input_files /path/to/affymetrix_cel_files/ --output_normalized_data /path/to/output_normalized_data.txt

# 3. Analyze the normalized microarray data following Affymetrix guidelines.
#    This step involves various statistical analyses (e.g., ANOVA, t-tests, clustering, PCA)
#    to identify differentially expressed genes or patterns, using the software's built-in tools.
#    (Conceptual representation of the action, not an actual CLI command):
#    partek_genomics_suite --action analyze --input_normalized_data /path/to/output_normalized_data.txt --output_analysis_results /path/to/analysis_results/
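
Before the GUI import described in step 1 of the comments above, it can help to inventory the raw files so that sample names and paths are recorded in one place. The sketch below is a plain shell helper and is not part of Partek Genomics Suite. The function name build_cel_manifest, the CEL_DIR and MANIFEST variables, and the two-column layout are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# build_cel_manifest DIR OUT
# Write a two-column TSV (sample_id, cel_path) listing every file in DIR
# whose name ends in .CEL or .cel. The sample ID is the file name without
# its extension.
build_cel_manifest() {
    dir=$1
    out=$2
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    printf 'sample_id\tcel_path\n' > "$out"
    for path in "$dir"/*; do
        case $path in
            *.CEL|*.cel)
                file=${path##*/}    # strip the directory part
                printf '%s\t%s\n' "${file%.*}" "$path" >> "$out"
                ;;
        esac
    done
}

# Hypothetical defaults: point CEL_DIR at the directory holding the raw files.
CEL_DIR=${CEL_DIR:-.}
MANIFEST=${MANIFEST:-cel_manifest.tsv}

build_cel_manifest "$CEL_DIR" "$MANIFEST"
echo "Listed $(( $(wc -l < "$MANIFEST") - 1 )) CEL file(s) in $MANIFEST"
```

The resulting table can be used as a checklist when selecting files in the import dialog, or as a sample sheet for any later scripted analysis.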

2

Core probesets only were used.

Tool: R (inferred with models/gemini-2.5-flash; version not specified)

$ Bash example

# Install R and Bioconductor packages if not already present
# sudo apt-get update
# sudo apt-get install r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install(c("oligo", "AnnotationDbi", "hgu133plus2.db"))' # Example for Affymetrix Human Genome U133 Plus 2.0 Array

# This is a conceptual R script to filter an expression matrix to include only core probesets.
# Replace 'your_expression_matrix.tsv' with the actual path to your pre-processed expression data.
# How "core" probesets are identified depends on the array and its annotation source.
# On Affymetrix exon arrays, "core" is the highest-confidence annotation tier (alongside "extended" and "full").
# This example instead uses a chip annotation package (hgu133plus2.db, Human Genome U133 Plus 2.0 Array) as a placeholder
# and treats probesets that map to a known gene identifier as "core". That proxy is an assumption, not the source's stated method.

Rscript -e '
library(oligo) # Or affy, depending on raw data format (CEL files) and upstream processing
library(AnnotationDbi)
library(hgu133plus2.db) # Placeholder: Example annotation package for Affymetrix Human Genome U133 Plus 2.0 Array

# --- Placeholder: Load your expression data (e.g., already normalized and summarized) ---
# If starting from CEL files, you would first build an expression matrix, for example with
# read.celfiles() and rma() from oligo, or with ReadAffy() and gcrma() from the affy and gcrma packages.
# For this step, assume you already have a matrix with probeset IDs as row names and samples as columns.
# Example: expr_data <- read.delim("your_expression_matrix.tsv", row.names = 1)
# For demonstration, build a dummy matrix with probeset IDs from the example annotation package.
dummy_probesets <- head(keys(hgu133plus2.db, keytype="PROBEID"), 100)
set.seed(123)
expr_data <- matrix(rnorm(length(dummy_probesets) * 3, mean=7, sd=1), ncol=3)
rownames(expr_data) <- dummy_probesets
colnames(expr_data) <- paste0("Sample", 1:3)
message("Original expression matrix dimensions: ", paste(dim(expr_data), collapse="x"))

# --- Identify core probesets ---
# On Affymetrix exon arrays, "core" is an annotation confidence tier defined in the array annotation files.
# The placeholder package used here does not carry that tier, so this sketch substitutes a proxy:
# keep only probesets that map to an Entrez Gene ID ("ENTREZID").
# This proxy is an assumption for illustration, not the Affymetrix definition of "core".
# (Apostrophes are avoided in these comments because the script is passed to Rscript inside single quotes.)

# Get all probeset IDs from the expression data
all_probes <- rownames(expr_data)

# Map probeset IDs to Entrez Gene IDs
# This will return a list where each element is a vector of Entrez IDs for a probeset
probe_to_entrez <- AnnotationDbi::mget(all_probes, hgu133plus2.db::hgu133plus2ENTREZID, ifnotfound=NA)

# Identify probesets that successfully map to at least one Entrez ID (i.e., are "core" in this context)
# Filter out probesets that return NA or an empty vector
core_probes <- names(probe_to_entrez)[!sapply(probe_to_entrez, function(x) all(is.na(x)) || length(x) == 0)]

message("Number of core probesets identified: ", length(core_probes))

# --- Filter the expression matrix ---
filtered_expr_data <- expr_data[core_probes, , drop = FALSE]
message("Filtered expression matrix dimensions: ", paste(dim(filtered_expr_data), collapse="x"))

# --- Save the filtered data ---
# write.table(filtered_expr_data, "filtered_core_probesets_expression.tsv", sep="\t", quote=FALSE, col.names=NA)
'
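
If a list of core probeset IDs is already available (for example, exported from the array annotation or from the analysis software), the filtering step itself does not need R. The following is a minimal sketch, assuming a tab-separated expression matrix with a header row and probeset IDs in the first column. The function name filter_core_probesets, the file names, and the toy IDs and values are hypothetical.

```shell
#!/bin/sh
set -eu

# filter_core_probesets IDS MATRIX
# Print the header row of MATRIX plus every row whose first column
# (the probeset ID) is listed in IDS (one ID per line).
filter_core_probesets() {
    # An empty ID list would make awk treat MATRIX as the ID file, so stop early.
    [ -s "$1" ] || { echo "ID list is empty or missing: $1" >&2; return 1; }
    awk -F '\t' '
        NR == FNR { keep[$1] = 1; next }   # first file: remember the listed IDs
        FNR == 1  { print; next }          # second file: always keep the header
        $1 in keep                         # keep rows whose ID was listed
    ' "$1" "$2"
}

# Toy demonstration with made-up IDs and values (not real data).
tmpdir=$(mktemp -d)
printf 'ps_0001\nps_0003\n' > "$tmpdir/core_ids.txt"
printf 'probeset_id\tSample1\tSample2\nps_0001\t7.1\t6.9\nps_0002\t5.4\t5.6\nps_0003\t8.0\t8.2\n' > "$tmpdir/expr.tsv"
filter_core_probesets "$tmpdir/core_ids.txt" "$tmpdir/expr.tsv"
rm -r "$tmpdir"
```

The toy run prints the header followed by the ps_0001 and ps_0003 rows. Rows keep their original order from the matrix, and IDs in the list that are absent from the matrix are silently ignored.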


Tools Used

Microarray