GSE86038 Processing Pipeline — Yeo Lab Publications

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_Arrays_…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

Data processed using apt-probeset-summarize from Affymetrix Power Tools (APT).

Microarray, version: not specified

$ Bash example

# Install Affymetrix Power Tools (APT)
# APT is typically downloaded as a binary package from the Thermo Fisher Scientific website.
# Example installation (may vary based on OS and APT version):
# wget https://assets.thermofisher.com/TFS-Assets/LSG/software/APT_2.x.x_Linux.zip
# unzip APT_2.x.x_Linux.zip
# export PATH=$PATH:/path/to/APT_binaries

# Placeholder variables
# Replace with actual CEL files; they are passed as positional arguments at the end of the command.
# For a long list, write a text file whose first line is "cel_files" followed by one CEL path
# per line, and pass it with --cel-files cel_list.txt instead of the positional arguments.
INPUT_CEL_FILES=(sample1.CEL sample2.CEL sample3.CEL)
# Library files for the array type. PGF-based arrays need a PGF plus a matching CLF;
# the HJAY PGF is the library referenced in step 3 (the CLF file name here is assumed).
# For CDF-based arrays, use -d <array>.cdf instead of -p/-c.
PGF_FILE="hjay.pgf"
CLF_FILE="hjay.clf"
OUTPUT_DIR="apt_summarize_output"
# Analysis string: normalization, PM adjustment, and summarization method.
# Named presets such as rma-sketch are also accepted. The iter-plier string below is an
# assumption; the exact analysis string used for this dataset is not specified.
ANALYSIS="quant-norm.sketch=50000,pm-only,iter-plier"

mkdir -p "${OUTPUT_DIR}"

apt-probeset-summarize \
  -a "${ANALYSIS}" \
  -p "${PGF_FILE}" \
  -c "${CLF_FILE}" \
  -o "${OUTPUT_DIR}" \
  "${INPUT_CEL_FILES[@]}"
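Instead of listing CEL files on the command line, apt-probeset-summarize can read them from a text file passed with --cel-files, whose first line is the header `cel_files` followed by one path per line. A minimal sketch for building that file, assuming all CEL files sit in a single directory; `cel_files` and `cel_list.txt` are placeholder names, not paths from the original pipeline.

```shell
#!/usr/bin/env bash
# Build the CEL list file read by apt-probeset-summarize --cel-files:
# a header line "cel_files" followed by one CEL path per line.
set -euo pipefail

CEL_DIR="${CEL_DIR:-cel_files}"                 # placeholder input directory
CEL_LIST_FILE="${CEL_LIST_FILE:-cel_list.txt}"  # placeholder output file

# Ensure the directory exists so the sketch runs before CEL files are copied in
mkdir -p "${CEL_DIR}"

{
  echo "cel_files"
  # -iname matches both .CEL and .cel; sort gives a stable sample order
  find "${CEL_DIR}" -maxdepth 1 -type f -iname '*.cel' | sort
} > "${CEL_LIST_FILE}"

# Subtract the header line to count listed CEL files
n_cel=$(( $(wc -l < "${CEL_LIST_FILE}") - 1 ))
echo "Wrote ${CEL_LIST_FILE} listing ${n_cel} CEL file(s)"
if [ "${n_cel}" -eq 0 ]; then
  echo "Warning: no CEL files found in ${CEL_DIR}" >&2
fi
```

The resulting file is then passed as `--cel-files cel_list.txt` in place of the positional CEL paths.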

2

Iter-plier algorithm used to quantify probesets.

plier (inferred with models/gemini-2.5-flash), version: Bioconductor release (e.g., 1.78.0 for R 4.3)

$ Bash example

# Install R and the Bioconductor packages if not already present
# For example, using conda (Bioconductor packages are distributed via bioconda):
# conda create -n r_env -c conda-forge -c bioconda r-base bioconductor-affy bioconductor-plier
# conda activate r_env

# Create an R script that performs PLIER quantification
cat << 'EOF' > quantify_probesets.R
# Load necessary Bioconductor packages
library(affy)
library(plier)

# Define input and output paths
# Assuming CEL files are in a directory named 'cel_files'
cel_dir <- "cel_files"
output_file <- "probeset_quantification_iterplier.tsv"

# Get list of CEL files
cel_files <- list.files(path = cel_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)

if (length(cel_files) == 0) {
  stop("No CEL files found in the specified directory: ", cel_dir)
}

# Read CEL files into an AffyBatch object.
# The CDF environment for the array type is loaded from installed Bioconductor
# annotation packages when probe-level data are accessed; arrays described only
# by PGF/CLF files (such as the HJAY PGF in step 3) may have no such package.
raw_data <- ReadAffy(filenames = cel_files)

# Summarize probesets with PLIER via justPlier() from the 'plier' package.
# This is plain PLIER: it does not perform the iterative probe re-selection of
# APT's iter-plier, so values will not match apt-probeset-summarize exactly.
es_plier <- justPlier(raw_data, normalize = TRUE)

# Extract expression matrix
expression_matrix <- exprs(es_plier)

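# Write results to a TSV file; col.names = NA adds an empty header cell above
# the row names so the header lines up with the data columns
write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE, col.names = NA)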

message(paste("Probeset quantification completed. Results saved to:", output_file))
EOF

# Execute the R script
Rscript quantify_probesets.R

# Example directory structure for input CEL files:
# mkdir -p cel_files
# touch cel_files/sample1.CEL
# touch cel_files/sample2.CEL
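After the R script runs, a quick structural check of the output matrix can catch truncated or misaligned files before downstream analysis. A minimal sketch, assuming a tab-separated matrix with probeset IDs in the first column and one column per sample, as written by write.table above; `check_matrix` is a hypothetical helper, and the demo values are toy numbers, not data from GSE86038.

```shell
#!/usr/bin/env bash
# Sanity-check a probeset x sample expression matrix stored as TSV.
set -euo pipefail

check_matrix() {
  local tsv="$1"
  awk -F'\t' '
    NR == 1 { header = NF; next }          # remember header width
    NR == 2 { width = NF }                 # first data row defines expected width
    NF != width {
      printf "line %d has %d fields, expected %d\n", NR, NF, width > "/dev/stderr"
      bad = 1
    }
    END {
      if (NR < 2) { print "no data rows" > "/dev/stderr"; exit 1 }
      # R write.table(row.names = TRUE) omits the row-name header cell unless col.names = NA
      if (header != width && header != width - 1) {
        print "header width does not match data rows" > "/dev/stderr"
        bad = 1
      }
      if (bad) exit 1
      printf "%d probesets, %d samples\n", NR - 1, width - 1
    }' "$tsv"
}

# Demo on a toy file (hypothetical values)
demo="$(mktemp)"
printf 'sample1\tsample2\n' > "$demo"
printf 'ps_1\t7.1\t7.3\nps_2\t9.0\t8.8\n' >> "$demo"
check_matrix "$demo"   # prints: 2 probesets, 2 samples
rm -f "$demo"
```

Accepting a header one field short keeps the check usable whether or not the matrix was written with `col.names = NA`.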


3

Probeset library file (PGF): http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf

clipper (inferred with models/gemini-2.5-flash), version: latest (inferred with models/gemini-2.5-flash)

$ Bash example

# Install clipper (if not already installed)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# pip install .

# Note: clipper calls peaks from CLIP-seq read alignments (BAM); it does not read
# the PGF file referenced in this step, which is consumed by apt-probeset-summarize (-p).
# The command below only illustrates clipper's basic usage.

# Define inputs (placeholders)
IP_BAM="ip_sample.bam"   # sorted, indexed BAM of CLIP reads
# Species/genome key bundled with clipper (e.g., hg19); run `clipper -h` to list supported values
SPECIES="hg19"
OUTPUT_BED="clip_peaks.bed"

# Run clipper for peak calling
clipper -b "${IP_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}"
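The PGF referenced in this step is the probeset library that apt-probeset-summarize reads via -p. A minimal sketch for fetching it and reading the chip type declared in its header, assuming the file uses the usual PGF text layout with `#%key=value` header lines (including `#%chip_type=`) before the probeset rows; `fetch_pgf` and `pgf_chip_type` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Fetch the PGF library file and report the chip type declared in its header.
set -euo pipefail

PGF_URL="http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf"

# Download once; -f makes curl fail on HTTP errors instead of saving an error page
fetch_pgf() {
  local dest="$1"
  if [ ! -s "$dest" ]; then
    curl -fsSL -o "$dest" "$PGF_URL"
  fi
}

# Print the chip_type value; stop at the first non-header line.
# Exit status is non-zero if no chip_type header was found.
pgf_chip_type() {
  local pgf="$1"
  awk '
    { sub(/\r$/, "") }                     # tolerate CRLF line endings
    /^#%chip_type=/ { sub(/^#%chip_type=/, ""); print; found = 1; exit }
    !/^#/ { exit }
    END { exit !found }
  ' "$pgf"
}
```

Usage: `fetch_pgf hjay.pgf && pgf_chip_type hjay.pgf`. Comparing the printed chip type with the array type of the CEL files is a quick way to confirm you have the matching library file before running apt-probeset-summarize.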


Tools Used

Microarray