GSE86224 Processing Pipeline

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

GSE86224

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_Arrays_…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

    Microarray (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install Affymetrix Power Tools (APT).
    # APT is distributed by Thermo Fisher Scientific. A Bioconda build may also exist for
    # your platform; search the channel first, e.g.:
    # conda search -c bioconda "apt*"

    # Summarize probe-level CEL intensities into a probeset-level expression matrix.
    # Assumptions (not stated in the source): the analysis string, the CLF path, the
    # output directory, and the CEL file names are placeholders.
    # The library given in step 3 is a PGF file, so it is passed with -p together with
    # a matching CLF file (-c) rather than as a CDF file (-d).
    # CEL files are given as positional arguments; --cel-files expects a list file instead.
    # Confirm every option with `apt-probeset-summarize --help` before running.
    apt-probeset-summarize \
      -a iter-plier \
      -p path/to/mjay.pgf \
      -c path/to/matching_library.clf \
      -o output_dir \
      input_sample1.CEL input_sample2.CEL
  2.

    The iter-PLIER algorithm was used to quantify probesets.

    iter-PLIER (analysis method within APT; version not stated in the source)
    $ Bash example
    #!/bin/bash
    set -euo pipefail

    # Placeholder paths; none of these are given in the source, so adjust them to your layout.
    CEL_FILES_DIR="data/raw_cel_files"    # directory containing the Affymetrix .CEL files
    OUTPUT_DIR="results/quantification"   # where APT writes its summary and report files
    PGF_FILE="library/mjay.pgf"           # probe group file, downloaded from the URL in step 3
    CLF_FILE="library/matching_library.clf"  # hypothetical: the CLF layout file matching the PGF

    # Create the output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"

    # Collect the CEL files (case-insensitive extension match).
    shopt -s nullglob nocaseglob
    cel_files=("${CEL_FILES_DIR}"/*.cel)
    shopt -u nocaseglob
    if [ "${#cel_files[@]}" -eq 0 ]; then
      echo "Error: no .CEL files found in ${CEL_FILES_DIR}" >&2
      exit 1
    fi

    # iter-PLIER is run here as an APT analysis rather than through an R package.
    # The analysis string "iter-plier" is an assumption: the source does not give the
    # exact chipstream (background correction, normalization), so check the available
    # analyses with `apt-probeset-summarize --help` and adjust before use.
    apt-probeset-summarize \
      -a iter-plier \
      -p "${PGF_FILE}" \
      -c "${CLF_FILE}" \
      -o "${OUTPUT_DIR}" \
      "${cel_files[@]}"

    echo "iter-PLIER quantification finished; see ${OUTPUT_DIR} for the summary files."
  3.

    http://exon.ucsc.edu/documentation/mjay_library/mjay.pgf

    Unknown (inferred with models/gemini-2.5-flash), version N/A
    $ Bash example
    # The source gives only a URL for this step. In an Affymetrix Power Tools context, a
    # .pgf file is most likely a Probe Group File: the library file that maps probes to
    # probesets and is passed to apt-probeset-summarize with -p, together with a matching
    # CLF file. It is read here as the array library used in steps 1 and 2 rather than as
    # a separate processing step. This reading is an inference, not stated in the source.
    #
    # Download the library file (the URL may no longer be reachable):
    curl -fLO http://exon.ucsc.edu/documentation/mjay_library/mjay.pgf
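
With many arrays, naming every CEL file on the command line gets unwieldy. apt-probeset-summarize also accepts a list file through --cel-files: a text file whose first line is `cel_files`, followed by one CEL path per line. The sketch below writes such a file and prints the resulting command for review instead of running it. The function names, the `iter-plier` analysis string, and all paths are assumptions for illustration and do not come from the source.

```shell
# Hypothetical helper: write a CEL list file for apt-probeset-summarize --cel-files.
# Format assumed here: a header line "cel_files", then one CEL path per line.
make_cel_list() {
  local cel_dir="$1" list_file="$2"
  printf 'cel_files\n' > "$list_file"
  # -iname matches both .CEL and .cel; sorting keeps the sample order stable across runs.
  find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | LC_ALL=C sort >> "$list_file"
  # Fail if only the header line was written, i.e. no CEL files were found.
  if [ "$(($(wc -l < "$list_file")))" -le 1 ]; then
    echo "Error: no CEL files found in $cel_dir" >&2
    return 1
  fi
}

# Hypothetical helper: print (rather than run) the APT command so it can be checked first.
# The analysis string "iter-plier" is an assumption; verify it against your APT version.
print_apt_command() {
  local pgf="$1" clf="$2" out_dir="$3" list_file="$4"
  printf 'apt-probeset-summarize -a iter-plier -p %q -c %q -o %q --cel-files %q\n' \
    "$pgf" "$clf" "$out_dir" "$list_file"
}
```

A typical call would be `make_cel_list data/raw_cel_files cel_list.txt` followed by `print_apt_command mjay.pgf matching_library.clf results cel_list.txt`. Printing the command first makes it easy to confirm the library files and sample list before a long run.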

Tools Used

Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.
http://exon.ucsc.edu/documentation/mjay_library/mjay.pgf
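
Once probesets are quantified, a quick sanity check is to confirm that the summary table has the expected number of samples and probesets. The sketch below assumes an APT-style summary file: metadata lines starting with `#`, then a tab-delimited header row whose first column is the probeset ID, then one row per probeset. The function name and the file name in the usage note are hypothetical, and the layout should be checked against your own output.

```shell
# Hypothetical helper: report the dimensions of an APT-style summary table.
# Assumes '#'-prefixed metadata lines, then a tab-delimited header row whose
# first column is the probeset ID, then one row per probeset.
summary_dims() {
  local summary_file="$1"
  grep -v '^#' "$summary_file" | awk -F'\t' '
    NR == 1 { samples = NF - 1; next }   # header row: every column after the first is a sample
    NF > 0  { probesets++ }              # each remaining non-empty row is one probeset
    END     { printf "probesets=%d samples=%d\n", probesets, samples }
  '
}
```

For example, `summary_dims results/quantification/iter-plier.summary.txt` (an assumed output name) should report one sample per CEL file that was processed.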