GSE86464 Processing Pipeline — Yeo Lab Publications

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

Microarray, version not specified (inferred with models/gemini-2.5-flash)

$ Bash example

# Install Affymetrix Power Tools (APT). The Bioconda package name below is
# unverified; confirm it before use.
# conda install -c bioconda affy-power-tools

# Example usage of apt-probeset-summarize
# This command summarizes probe-level intensities from Affymetrix CEL files into probeset-level values.
# Replace the placeholder paths with actual paths.
# -a (--analysis) selects the summarization method (e.g., rma, rma-sketch, plier-gcbg); 'rma' is a placeholder here.
# -d (--cdf-file) names a CDF library file; PGF-based arrays use -p (--pgf-file) and -c (--clf-file) instead.
# -o (--out-dir) specifies where to write the output files.
# CEL files are passed as positional arguments.

apt-probeset-summarize \
    -a rma \
    -d path/to/library.cdf \
    -o path/to/output_dir \
    path/to/input_cel_file_1.CEL path/to/input_cel_file_2.CEL
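
When many arrays are processed, apt-probeset-summarize can read the CEL paths from a list file instead of the command line. The sketch below builds such a file under the assumption that it is plain text with the header line `cel_files` followed by one path per line. The directory and file names are hypothetical, and the placeholder CEL files exist only so the snippet runs on its own.

```shell
set -eu

# Hypothetical demo directory; point cel_dir at the real CEL directory instead.
cel_dir="$(mktemp -d)"
list_file="${cel_dir}/cel_list.txt"

# Demo only: empty placeholder files standing in for real arrays.
touch "${cel_dir}/sample_A.CEL" "${cel_dir}/sample_B.CEL"

# Write the header line, then one CEL path per line.
{
    echo "cel_files"
    for f in "${cel_dir}"/*.CEL; do
        echo "${f}"
    done
} > "${list_file}"

cat "${list_file}"
```

The resulting file would then be supplied with `--cel-files "${list_file}"` in place of the positional CEL paths.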

2

The IterPLIER algorithm was used to quantify probesets.

affy v1.78.0 (inferred with models/gemini-2.5-flash)

$ Bash example

# Install R and Bioconductor if not already present
# R -e "install.packages('BiocManager')"
# R -e "BiocManager::install('affy')"

# Create an R script that reads the CEL files and quantifies probesets
cat << 'EOF' > quantify_probesets.R
library(affy)

# Define input and output directories
# Assumes raw Affymetrix .CEL files are located in the 'raw_cel_files' directory.
# Create this directory and place your .CEL files there before running.
input_dir <- "raw_cel_files"
output_dir <- "quantified_results"
dir.create(output_dir, showWarnings = FALSE)

# Check if input directory exists
if (!dir.exists(input_dir)) {
  stop("Input directory '", input_dir, "' not found. Please create it and place .CEL files inside.")
}

# List all .CEL files in the input directory
cel_files <- list.celfiles(path=input_dir, full.names=TRUE)

if (length(cel_files) == 0) {
  stop("No .CEL files found in the input directory: ", input_dir)
}
print(paste("Found", length(cel_files), ".CEL files for quantification."))

# Read the .CEL files into an AffyBatch object
# This step reads raw intensity data from the arrays.
raw_data <- ReadAffy(filenames=cel_files)

# Summarize probesets with RMA (background correction, quantile normalization,
# and median-polish summarization in a single call).
# NOTE: the 'affy' package does not provide an IterPLIER implementation, so RMA
# is used here as a stand-in and will not reproduce IterPLIER values.
# IterPLIER is a summarization method, not a background-correction step: it
# reruns PLIER on the probes that agree best with the overall probeset signal.
# ReadAffy also needs a CDF annotation for the array type, which may not exist
# for PGF-based arrays.
eset <- rma(raw_data)

# Extract the expression matrix (quantified probeset values)
expression_matrix <- exprs(eset)

# Get probeset IDs
probeset_ids <- featureNames(eset)

# Combine probeset IDs with the expression matrix into a data frame
quantified_data <- data.frame(ProbesetID = probeset_ids, expression_matrix)

# Define the output file path
output_file <- file.path(output_dir, "probeset_quantification_rma.csv")

# Write the quantified probeset data to a CSV file
write.csv(quantified_data, output_file, row.names = FALSE)

print(paste("Probeset quantification complete. Results saved to:", output_file))
EOF

# Create input directory for CEL files (if it doesn't exist)
mkdir -p raw_cel_files

# Execute the R script to perform quantification
Rscript quantify_probesets.R

# Example of creating dummy CEL files for testing (optional)
# This part is for demonstration if you don't have actual CEL files.
# It will create empty files, which 'ReadAffy' will likely fail on,
# but shows the expected input structure.
# touch raw_cel_files/sample_A.CEL
# touch raw_cel_files/sample_B.CEL
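
Probeset values written by Affymetrix Power Tools can be checked before downstream analysis. The sketch below assumes the summary is a tab-delimited text file whose metadata lines start with `#%`, followed by a header row beginning with `probeset_id` and one row per probeset. The file name `iter-plier.summary.txt` and all values are synthetic placeholders, not data from this study.

```shell
set -eu

work_dir="$(mktemp -d)"
summary="${work_dir}/iter-plier.summary.txt"   # hypothetical file name
matrix="${work_dir}/expression_matrix.tsv"

# Demo only: a synthetic summary file with two samples and three probesets.
{
    printf '#%%demo-key=demo-value\n'
    printf '#%%another-demo-key=another-value\n'
    printf 'probeset_id\tsample_A.CEL\tsample_B.CEL\n'
    printf '1001\t5.10\t5.32\n'
    printf '1002\t7.80\t7.65\n'
    printf '1003\t3.25\t3.40\n'
} > "${summary}"

# Drop the metadata lines so that only the header row and data rows remain.
grep -v '^#' "${summary}" > "${matrix}"

# Count probesets (rows minus the header) and samples (columns minus the ID).
n_probesets=$(( $(wc -l < "${matrix}") - 1 ))
n_samples="$(head -n 1 "${matrix}" | awk -F'\t' '{ print NF - 1 }')"

echo "probesets: ${n_probesets}, samples: ${n_samples}"
```

The stripped matrix can be loaded directly by most tabular readers, since the first row then holds the column names.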

3

Library file (PGF, probe group file): http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf

clipper, version not specified (tool and version inferred with models/gemini-2.5-flash)

$ Bash example

# NOTE: clipper is an inferred tool. The step above names only a PGF library file,
# which is an input for Affymetrix Power Tools rather than for clipper, so treat
# this example as unverified for this dataset.

# Install clipper (if not already installed)
# git clone https://github.com/yeolab/clipper.git
# cd clipper
# python setup.py install

# Example usage for CLIP-seq peak calling with clipper
# Replace the BAM path and species with actual values.
# Confirm option names with `clipper --help` for the installed version.
INPUT_BAM="input.bam"
OUTPUT_PREFIX="peaks"
SPECIES="hg38" # Placeholder; use the assembly the reads were aligned to

# Run clipper
clipper \
    -b "${INPUT_BAM}" \
    -s "${SPECIES}" \
    -o "${OUTPUT_PREFIX}.bed" \
    --bonferroni
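
The PGF file named in step 3 describes which probes belong to which probeset. The sketch below counts the entries in such a file, assuming the usual PGF layout: header lines start with `#%`, probeset rows have no leading tab, atom rows have one leading tab, and probe rows have two. The demo file is a simplified synthetic stand-in with fewer columns than a real PGF, and its name is hypothetical.

```shell
set -eu

work_dir="$(mktemp -d)"
pgf="${work_dir}/demo.pgf"   # hypothetical stand-in for a real PGF file

# Demo only: a tiny synthetic PGF with two probesets and three probes.
{
    printf '#%%chip_type=demo_chip\n'
    printf '#%%header0=probeset_id\ttype\n'
    printf '#%%header1=\tatom_id\n'
    printf '#%%header2=\t\tprobe_id\ttype\n'
    printf '1001\tmain\n'
    printf '\t1\n'
    printf '\t\t5001\tpm:st\n'
    printf '\t\t5002\tpm:st\n'
    printf '1002\tmain\n'
    printf '\t2\n'
    printf '\t\t5003\tpm:st\n'
} > "${pgf}"

# Classify each row by its number of leading tabs and tally the three levels.
counts="$(awk '
    /^#/    { next }
    /^$/    { next }
    /^\t\t/ { probes++; next }
    /^\t/   { atoms++; next }
            { probesets++ }
    END     { print probesets + 0, atoms + 0, probes + 0 }
' "${pgf}")"

n_probesets="${counts%% *}"
n_probes="${counts##* }"

echo "probesets: ${n_probesets}, probes: ${n_probes}"
```

Comparing the probeset count with the number of rows in the summarized output is a quick way to confirm that the same library file was used for quantification.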

Tools Used

Microarray