GSE56504 Processing Pipeline
2 steps
Publication
Aberrant NOVA1 function disrupts alternative splicing in early stages of amyotrophic lateral sclerosis. Acta neuropathologica (2022), PMID 35778567
Dataset
GSE56504: Loss of nuclear TDP-43 in ALS causes altered expression of splicing machinery and widespread dysregulation of RNA splicing in motor neurons
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
The Partek Genomics Suite was used to normalize (by GC RMA) and then analyse the microarray data following Affymetrix guidelines.
Microarray (version not specified)
$ Bash example
# Partek Genomics Suite is commercial, GUI-based software.
# The following represents the conceptual steps performed within the software,
# as a direct command-line execution is not typically available.
# Input: Raw Affymetrix .CEL files
# Output: Normalized expression data, analysis results

# 1. Import raw microarray data (e.g., .CEL files) into Partek Genomics Suite.
#    This step typically involves selecting the raw data files from a directory
#    within the graphical user interface.

# 2. Perform normalization using the GC RMA method.
#    Within the software, navigate to the normalization options and select "GC RMA".
#    Ensure Affymetrix guidelines are followed for probe set definition and background correction.
#    (Conceptual representation of the action, not an actual CLI command):
# partek_genomics_suite --action normalize --method GC_RMA --input_files /path/to/affymetrix_cel_files/ --output_normalized_data /path/to/output_normalized_data.txt

# 3. Analyze the normalized microarray data following Affymetrix guidelines.
#    This step involves various statistical analyses (e.g., ANOVA, t-tests, clustering, PCA)
#    to identify differentially expressed genes or patterns, using the software's built-in tools.
#    (Conceptual representation of the action, not an actual CLI command):
# partek_genomics_suite --action analyze --input_normalized_data /path/to/output_normalized_data.txt --output_analysis_results /path/to/analysis_results/
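Because the Partek step cannot be scripted here, the following Python sketch illustrates only one ingredient that RMA-family methods (GC RMA included) share: quantile normalization across arrays. It is not GC RMA itself, since it omits the GC-content-based background correction and the probeset summarization, and it is not Partek's implementation. The function name quantile_normalize and the toy intensities are hypothetical, chosen for illustration.

```python
from statistics import mean


def quantile_normalize(columns):
    """Force every array (column) to share the same empirical distribution.

    `columns` is a list of equal-length lists, one list per array.
    Ties are broken by position, which is a simplification.
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n_probes)]

    normalized = []
    for col in columns:
        # order[r] is the index of the probe that holds rank r in this array.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            out[probe_idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy log2 intensities: 3 arrays x 4 probes (made-up numbers, not from GSE56504).
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
for col in normalized:
    print([round(v, 3) for v in col])
```

After normalization every array contains the same set of values, while each probe keeps its within-array rank. For real data, an established implementation should be preferred over this sketch.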
2
Core probesets only were used.
$ Bash example
# Install R and Bioconductor packages if not already present
# sudo apt-get update
# sudo apt-get install r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install(c("oligo", "AnnotationDbi", "hgu133plus2.db"))'  # Example for Affymetrix Human Genome U133 Plus 2.0 Array

# This is a conceptual R script to filter an expression matrix to include only core probesets.
# Replace 'your_expression_matrix.tsv' with the actual path to your pre-processed expression data.
# The specific method for identifying "core" probesets depends on the array and annotation source.
# On Affymetrix exon/gene ST arrays, "core" is the highest-confidence annotation tier
# (probesets supported by RefSeq and full-length mRNA evidence).
# The source does not name the array, so hgu133plus2.db is used here purely as a placeholder,
# and "maps to an Entrez Gene ID" is treated as a rough stand-in for "core".
# Note: the R code below avoids apostrophes because it sits inside a single-quoted shell string.

Rscript -e '
library(oligo)           # Or affy, depending on raw data format (CEL files) and upstream processing
library(AnnotationDbi)
library(hgu133plus2.db)  # Placeholder: example annotation package for Affymetrix Human Genome U133 Plus 2.0 Array

# --- Placeholder: load your expression data (already normalized and summarized) ---
# If starting from CEL files, you could use read.celfiles() and rma() from oligo,
# or ReadAffy() from affy followed by gcrma() from the gcrma package, to get an expression matrix.
# For this step, assume you have a matrix of probeset IDs and expression values.
# Example: expr_data <- read.delim("your_expression_matrix.tsv", row.names = 1)
# For demonstration, create a dummy matrix with probeset IDs from the example annotation package.
dummy_probesets <- head(keys(hgu133plus2.db, keytype = "PROBEID"), 100)
set.seed(123)
expr_data <- matrix(rnorm(length(dummy_probesets) * 3, mean = 7, sd = 1), ncol = 3)
rownames(expr_data) <- dummy_probesets
colnames(expr_data) <- paste0("Sample", 1:3)

message("Original expression matrix dimensions: ", paste(dim(expr_data), collapse = "x"))

# --- Identify core probesets ---
# The definition of "core" probesets depends on the array and its annotation files.
# Here, mapping to an Entrez Gene ID is used as a proxy for a well-annotated probeset.
# This is an assumption for illustration, not the Affymetrix definition of "core".

# Get all probeset IDs from the expression data
all_probes <- rownames(expr_data)

# Map probeset IDs to Entrez Gene IDs
# This returns a list where each element is a vector of Entrez IDs for a probeset
probe_to_entrez <- AnnotationDbi::mget(all_probes, hgu133plus2.db::hgu133plus2ENTREZID, ifnotfound = NA)

# Keep probesets that map to at least one Entrez ID (i.e., are "core" in this context);
# drop probesets that return NA or an empty vector
core_probes <- names(probe_to_entrez)[!sapply(probe_to_entrez, function(x) all(is.na(x)) || length(x) == 0)]

message("Number of core probesets identified: ", length(core_probes))

# --- Filter the expression matrix ---
filtered_expr_data <- expr_data[core_probes, , drop = FALSE]

message("Filtered expression matrix dimensions: ", paste(dim(filtered_expr_data), collapse = "x"))

# --- Save the filtered data ---
# write.table(filtered_expr_data, "filtered_core_probesets_expression.tsv", sep = "\t", quote = FALSE, col.names = NA)
'
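If the array's annotation file labels each probeset with an annotation level, restricting to core probesets reduces to a lookup and a filter. The Python sketch below assumes a two-column annotation table with the hypothetical column names probeset_id and level, and a tab-separated expression matrix keyed by probeset ID. The IDs and values are made up, and real annotation files should be checked for their actual column names and level labels.

```python
import csv
import io

# Hypothetical annotation table: probeset ID plus its annotation level.
ANNOTATION_CSV = """probeset_id,level
2315101,core
2315102,extended
2315103,core
2315104,full
"""

# Hypothetical normalized log2 expression matrix (tab-separated).
EXPRESSION_TSV = (
    "probeset_id\tSample1\tSample2\n"
    "2315101\t7.10\t6.90\n"
    "2315102\t5.40\t5.60\n"
    "2315103\t8.20\t8.00\n"
    "2315104\t4.30\t4.10\n"
)


def load_ids_at_level(handle, level="core"):
    """Return the set of probeset IDs annotated at the requested level."""
    reader = csv.DictReader(handle)
    return {
        row["probeset_id"]
        for row in reader
        if row["level"].strip().lower() == level
    }


def filter_expression(handle, keep_ids):
    """Return the header and only those rows whose probeset ID is in keep_ids."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row and row[0] in keep_ids]
    return header, rows


core_ids = load_ids_at_level(io.StringIO(ANNOTATION_CSV))
header, core_rows = filter_expression(io.StringIO(EXPRESSION_TSV), core_ids)

print("\t".join(header))
for row in core_rows:
    print("\t".join(row))
```

With real files, the io.StringIO objects would be replaced by open file handles. Row order in the expression matrix is preserved.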
Tools Used
Raw Source Text
The Partek Genomics Suite was used to normalize (by GC RMA) and then analyse the microarray data following Affymetrix guidelines. Core probesets only were used.