GSE86224 Processing Pipeline — Yeo Lab Publications

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_Arrays_…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

Microarray (version inferred with models/gemini-2.5-flash)

$ Bash example

# Install Affymetrix Power Tools (APT)
# APT is distributed by Thermo Fisher Scientific. A Bioconda build may also exist, but the
# package name is not confirmed here, so search the channel before installing, e.g.:
# conda search -c bioconda 'apt*'

# Example usage of apt-probeset-summarize
# This command summarizes probe-level data from CEL files into a probeset-level expression matrix.
# --analysis selects the summarization method. iter-plier follows step 2; confirm the name
#   against your APT version (apt-probeset-summarize --explain iter-plier).
# --pgf-file and --clf-file are the library files for the array. The PGF is the file linked
#   in step 3; the CLF path is a placeholder. Arrays that ship a CDF use --cdf-file instead.
# --out-dir is the directory where the summary tables are written.
# CEL files are passed as positional arguments; replace the placeholder names with your own.
apt-probeset-summarize \
  --analysis iter-plier \
  --pgf-file path/to/mjay.pgf \
  --clf-file path/to/mjay.clf \
  --out-dir apt_output \
  --log-file apt_probeset_summarize.log \
  input_sample1.CEL input_sample2.CEL
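
When many arrays are processed, APT can read the CEL paths from a list file given to --cel-files instead of taking them on the command line. The list is a plain text file whose first line is the header cel_files, followed by one path per line. The following is a minimal sketch, assuming all CEL files sit in one directory; the helper name make_cel_list and the demo file names are hypothetical, and the demo uses empty placeholder files so it runs without real data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write an APT-style CEL list: header line "cel_files", then one path per line.
# make_cel_list is a hypothetical helper name, not part of APT.
make_cel_list() {
  local cel_dir="$1" list_file="$2"
  printf 'cel_files\n' > "${list_file}"
  # Match .CEL and .cel alike; sort so the sample order is reproducible.
  find "${cel_dir}" -maxdepth 1 -type f -iname '*.cel' | sort >> "${list_file}"
}

# Demo on empty placeholder files (not real array data).
demo_dir="$(mktemp -d)"
touch "${demo_dir}/sample_B.CEL" "${demo_dir}/sample_A.cel" "${demo_dir}/notes.txt"
make_cel_list "${demo_dir}" "${demo_dir}/cel_list.txt"
cat "${demo_dir}/cel_list.txt"
```

The resulting file would then be passed as --cel-files cel_list.txt in place of the positional CEL arguments.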

2

The Iter-plier algorithm was used to quantify probesets.

Iter-plier v1.61.0 (R package Bioconductor)

$ Bash example

#!/bin/bash
set -euo pipefail

# NOTE: this sketch uses justPlier() from the Bioconductor 'plier' package, i.e. standard
# PLIER, as a stand-in. It is not Iter-plier; the iterative variant is provided as an
# analysis by Affymetrix Power Tools (apt-probeset-summarize, step 1).

# Define variables
# Placeholder for input CEL files directory (e.g., containing Affymetrix .CEL files)
CEL_FILES_DIR="data/raw_cel_files"
# Placeholder for output directory where quantified probesets will be saved
OUTPUT_DIR="results/quantification"
# Name of the R script to be created and executed
R_SCRIPT="quantify_plier.R"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# --- R Package Installation (commented out) ---
# Install BiocManager first, then use it to install the Bioconductor packages.
# R -e 'install.packages("BiocManager", repos="https://cloud.r-project.org")'
# R -e 'BiocManager::install(c("affy", "plier"))'

# Create the R script. The quoted 'EOF' keeps bash from expanding $ and \ inside the
# script, so the regular expression below reaches R unchanged.
cat <<'EOF' > "${R_SCRIPT}"
# Load necessary R packages
library(affy)  # Reads Affymetrix .CEL files into an AffyBatch
library(plier) # Provides justPlier()

# --- Configuration from environment variables ---
cel_files_dir <- Sys.getenv("CEL_FILES_DIR")
output_dir <- Sys.getenv("OUTPUT_DIR")

# List the .CEL files in the specified directory
cel_files <- list.files(cel_files_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
if (length(cel_files) == 0) {
  stop("No .CEL files found in the specified directory: ", cel_files_dir)
}

# Create an AffyBatch object holding the raw probe intensities.
# ReadAffy() looks up a CDF environment matching the chip type; a custom array needs
# its own CDF package, which can be named through the cdfname argument.
raw_data <- ReadAffy(filenames = cel_files)

# PLIER quantification with default parameters (stand-in for Iter-plier, see NOTE above).
# justPlier() returns an ExpressionSet.
quantified_data <- justPlier(raw_data)

# Extract the expression matrix from the ExpressionSet object
expression_matrix <- exprs(quantified_data)

# Define the output file path
output_file <- file.path(output_dir, "plier_quantified_probesets.tsv")

# Save as a tab-separated file. col.names = NA writes an empty header cell above the
# probeset IDs so the header lines up with the data columns.
write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE, col.names = NA)

message("PLIER quantification complete. Results saved to: ", output_file)
EOF

# Execute the R script with Rscript, passing the paths as environment variables
CEL_FILES_DIR="${CEL_FILES_DIR}" OUTPUT_DIR="${OUTPUT_DIR}" Rscript "${R_SCRIPT}"

# With 'set -e' above, this line is reached only if Rscript exited without error.
echo "PLIER quantification pipeline finished."
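
When the quantification is run through APT, the values land in a tab-delimited summary file in the output directory (typically named after the analysis, ending in .summary.txt). Metadata lines prefixed with #% come first, then a header row starting with probeset_id and one column per CEL file. The following is a minimal sketch for stripping that metadata and checking the matrix dimensions; the helper names strip_apt_header and summary_dims are hypothetical, and the demo table is made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Drop APT metadata lines (they start with '#'); keep the header row and data rows.
strip_apt_header() {
  grep -v '^#' "$1"
}

# Print "<number of probesets> <number of samples>" for a summary table.
summary_dims() {
  strip_apt_header "$1" |
    awk -F'\t' 'NR == 1 { samples = NF - 1 } END { print NR - 1, samples }'
}

# Demo table with invented values, standing in for a real summary file.
demo="$(mktemp)"
printf '#%%chip_type=demo\n'            >  "${demo}"
printf 'probeset_id\ta.CEL\tb.CEL\n'    >> "${demo}"
printf 'ps1\t5.1\t5.3\n'                >> "${demo}"
printf 'ps2\t7.9\t8.2\n'                >> "${demo}"
summary_dims "${demo}"
```

Comparing the sample count with the number of CEL files submitted is a quick check that no array was silently dropped.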

3

Array library file (PGF): http://exon.ucsc.edu/documentation/mjay_library/mjay.pgf

Unknown (Inferred with models/gemini-2.5-flash) vN/A

$ Bash example

# The URL above points to a PGF (probe group file), a plain-text Affymetrix library file
# that lists the probes on the array grouped into probesets.
# It is not a processing step of its own: apt-probeset-summarize (the tool from step 1)
# reads it through its --pgf-file option, together with a matching CLF file.
#
# Download the library file so it can be passed to APT. The URL is taken from the step
# description and may no longer resolve.
curl -fLO "http://exon.ucsc.edu/documentation/mjay_library/mjay.pgf"
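
A PGF (probe group file) is plain text: header lines prefixed with #%, then probeset records with no indentation, atom records indented by one tab, and probe records indented by two tabs. The following is a minimal sketch for counting the records at each level as a sanity check on a downloaded library file. The helper name pgf_counts is hypothetical and the demo content is invented, not taken from mjay.pgf.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count probesets, atoms and probes in a PGF file by indentation level.
pgf_counts() {
  awk '
    /^#/    { next }              # header and metadata lines
    /^\t\t/ { probes++; next }    # two leading tabs: probe record
    /^\t/   { atoms++; next }     # one leading tab: atom record
    NF      { probesets++ }       # no indentation: probeset record
    END     { printf "probesets=%d atoms=%d probes=%d\n", probesets, atoms, probes }
  ' "$1"
}

# Tiny invented PGF-like file for demonstration.
demo="$(mktemp)"
{
  printf '#%%chip_type=demo\n'
  printf '#%%header0=probeset_id\ttype\n'
  printf '100\tmain\n'
  printf '\t1\n'
  printf '\t\t1001\tpm:st\n'
  printf '\t\t1002\tpm:st\n'
  printf '101\tmain\n'
  printf '\t2\n'
  printf '\t\t1003\tpm:st\n'
} > "${demo}"
pgf_counts "${demo}"
```

The probeset count can then be compared with the number of rows in the summary table produced by the quantification step.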

Tools Used

Microarray