GSE56504 Processing Pipeline


Publication

Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis.

Acta neuropathologica (2022) — PMID 35778567

Dataset

GSE56504

Loss of nuclear TDP-43 in ALS causes altered expression of splicing machinery and widespread dysregulation of RNA splicing in motor neurons

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. The Partek Genomics Suite was used to normalize (by GC RMA) and then analyse the microarray data following Affymetrix guidelines.

    Tool type: Microarray (version not specified)
    Bash example
    # Partek Genomics Suite is commercial, GUI-based software.
    # The lines below are comments describing the conceptual steps performed within the software;
    # they are not runnable commands, because the workflow is normally driven through the GUI.
    
    # Input: Raw Affymetrix .CEL files
    # Output: Normalized expression data, analysis results
    
    # 1. Import raw microarray data (e.g., .CEL files) into Partek Genomics Suite.
    #    This step typically involves selecting the raw data files from a directory 
    #    within the graphical user interface.
    
    # 2. Perform normalization using the GC RMA method.
    #    Within the software, open the normalization options and select "GC RMA".
    #    Follow Affymetrix guidelines for probe set definition and background correction.
    #    (Hypothetical command shown only to summarize the action; it is not an actual CLI command):
    #    partek_genomics_suite --action normalize --method GC_RMA --input_files /path/to/affymetrix_cel_files/ --output_normalized_data /path/to/output_normalized_data.txt
    
    # 3. Analyze the normalized microarray data following Affymetrix guidelines.
    #    The source does not state which analyses were run. Typical options in the software include
    #    ANOVA, t-tests, clustering and PCA for identifying differentially expressed genes or patterns.
    #    (Hypothetical command shown only to summarize the action; it is not an actual CLI command):
    #    partek_genomics_suite --action analyze --input_normalized_data /path/to/output_normalized_data.txt --output_analysis_results /path/to/analysis_results/
    
  2. Core probesets only were used.

    Tool: R (inferred with models/gemini-2.5-flash; version not specified)
    Bash example
    # Install R and Bioconductor packages if not already present
    # sudo apt-get update
    # sudo apt-get install r-base
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
    # R -e 'BiocManager::install(c("oligo", "AnnotationDbi", "hgu133plus2.db"))' # Example for Affymetrix Human Genome U133 Plus 2.0 Array
    
    # This is a conceptual R script that filters an expression matrix down to "core" probesets.
    # Replace 'your_expression_matrix.tsv' with the actual path to your pre-processed expression data.
    # How "core" probesets are identified depends on the array and annotation source; the source text does not say.
    # This example assumes an Affymetrix chip annotation package (hgu133plus2.db, for the Human Genome U133 Plus 2.0 Array) as a placeholder.
    # On Affymetrix exon arrays, "core" usually names an annotation confidence level (core / extended / full), so check the platform first.
    
    Rscript -e '
    library(oligo) # Or affy, depending on raw data format (CEL files) and upstream processing
    library(AnnotationDbi)
    library(hgu133plus2.db) # Placeholder: Example annotation package for Affymetrix Human Genome U133 Plus 2.0 Array
    
    # --- Placeholder: Load your expression data (e.g., already normalized and summarized) ---
    # If starting from CEL files, you would use read.celfiles() and then rma() or gcrma() to get an expression matrix.
    # For this step, assume you have a matrix of probeset IDs and expression values.
    # Example: expr_data <- read.delim("your_expression_matrix.tsv", row.names = 1)
    # For demonstration, create a dummy matrix with probeset IDs from the example annotation package.
    dummy_probesets <- head(keys(hgu133plus2.db, keytype="PROBEID"), 100)
    set.seed(123)
    expr_data <- matrix(rnorm(length(dummy_probesets) * 3, mean=7, sd=1), ncol=3)
    rownames(expr_data) <- dummy_probesets
    colnames(expr_data) <- paste0("Sample", 1:3)
    message("Original expression matrix dimensions: ", paste(dim(expr_data), collapse="x"))
    
    # --- Identify core probesets ---
    # The definition of "core" probesets varies by platform. On Affymetrix exon arrays it usually
    # names an annotation confidence level (core / extended / full) taken from the array annotation.
    # The placeholder annotation package used here has no such level, so this sketch keeps
    # probesets that map to an Entrez Gene ID ("ENTREZID") as a rough stand-in.
    # Treat that proxy as an assumption, not as the method used in the study.
    
    # Get all probeset IDs from the expression data
    all_probes <- rownames(expr_data)
    
    # Map probeset IDs to Entrez Gene IDs
    # This will return a list where each element is a vector of Entrez IDs for a probeset
    probe_to_entrez <- AnnotationDbi::mget(all_probes, hgu133plus2.db::hgu133plus2ENTREZID, ifnotfound=NA)
    
    # Identify probesets that successfully map to at least one Entrez ID (i.e., are "core" in this context)
    # Filter out probesets that return NA or an empty vector
    core_probes <- names(probe_to_entrez)[!sapply(probe_to_entrez, function(x) all(is.na(x)) || length(x) == 0)]
    
    message("Number of core probesets identified: ", length(core_probes))
    
    # --- Filter the expression matrix ---
    filtered_expr_data <- expr_data[core_probes, , drop = FALSE]
    message("Filtered expression matrix dimensions: ", paste(dim(filtered_expr_data), collapse="x"))
    
    # --- Save the filtered data ---
    # write.table(filtered_expr_data, "filtered_core_probesets_expression.tsv", sep="\t", quote=FALSE, col.names=NA)
    '
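
When the array annotation already provides a list of core probeset IDs, the filtering step itself can be sketched without R. What follows is a minimal sketch under stated assumptions, not the method used in the study: the file names (`expression_matrix.tsv`, `core_probesets.txt`, `expression_core_only.tsv`), the probeset IDs and the expression values are all made up for illustration, and the matrix is assumed to be tab-separated with one header row and probeset IDs in the first column.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names, created in a temp directory only so the sketch runs end to end.
workdir="$(mktemp -d)"
expr="$workdir/expression_matrix.tsv"
core_ids="$workdir/core_probesets.txt"
filtered="$workdir/expression_core_only.tsv"

# Toy expression matrix: one header row, then one probeset per row (made-up IDs and values).
printf 'probeset_id\tSample1\tSample2\n' >  "$expr"
printf '2315101\t7.10\t6.95\n'           >> "$expr"
printf '2315102\t5.40\t5.62\n'           >> "$expr"
printf '2315103\t8.33\t8.01\n'           >> "$expr"

# Toy list of probeset IDs assumed to be flagged "core" in the array annotation.
# The list must not be empty, or the NR==FNR test below would also match the matrix.
printf '2315101\n2315103\n' > "$core_ids"

# First file (NR==FNR): remember every ID in the core list.
# Second file: keep the header line and any row whose first column is in that list.
awk -F'\t' 'NR==FNR { core[$1] = 1; next } FNR==1 || ($1 in core)' \
    "$core_ids" "$expr" > "$filtered"

cat "$filtered"
```

With real data, replace the toy `printf` lines by the normalized expression matrix and a core list exported from the annotation that matches the array platform. Rows are kept in the order of the expression matrix, not the order of the ID list.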
    

Tools Used

Raw Source Text
The Partek Genomics Suite was used to normalize (by GC RMA) and then analyse the microarray data following Affymetrix guidelines. Core probesets only were used.