GSE86464 Processing Pipeline


Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

GSE86464

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS iPSC-derived motor neurons

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Step 1

    Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

    Affymetrix Power Tools (apt-probeset-summarize), version not specified
    $ Bash example
    # APT is distributed by Thermo Fisher Scientific as prebuilt binaries;
    # download it from the vendor and put its bin/ directory on PATH.
    
    # Example usage of apt-probeset-summarize (all paths are placeholders).
    # Arrays described by a PGF/CLF library pair, such as the hjay.pgf file
    # listed in step 3, use --pgf-file and --clf-file. Arrays with a CDF
    # library use --cdf-file instead.
    # --analysis names the summarization pipeline. The string below ends in
    # iter-plier to match step 2. It is an assumed example, so check it against
    # the APT documentation for your version.
    # --out-dir is the directory for result files; CEL files go at the end.
    
    apt-probeset-summarize \
        --pgf-file path/to/hjay.pgf \
        --clf-file path/to/hjay.clf \
        --analysis quant-norm.sketch=50000,pm-only,iter-plier \
        --out-dir path/to/output_dir \
        path/to/sample_1.CEL path/to/sample_2.CEL
  2. Step 2

    The iter-PLIER algorithm was used to quantify probe sets.

    Affymetrix Power Tools (apt-probeset-summarize, iter-plier method), version not specified
    $ Bash example
    # iter-PLIER is a quantification method of apt-probeset-summarize and is
    # selected through --analysis. The PGF/CLF paths and the analysis string
    # below are placeholders to adapt and verify for your APT version.
    mkdir -p raw_cel_files quantified_results
    # Place the .CEL files in raw_cel_files/ before running.
    apt-probeset-summarize \
        --pgf-file path/to/hjay.pgf \
        --clf-file path/to/hjay.clf \
        --analysis quant-norm.sketch=50000,pm-only,iter-plier \
        --out-dir quantified_results \
        raw_cel_files/*.CEL
    
    # APT writes a tab-delimited *.summary.txt file whose metadata lines start
    # with '#%'. Create an R script that loads it for downstream analysis.
    cat << 'EOF' > load_summary.R
    output_dir <- "quantified_results"
    summary_file <- list.files(output_dir, pattern = "\\.summary\\.txt$",
                               full.names = TRUE)
    if (length(summary_file) != 1) {
      stop("Expected one *.summary.txt file in '", output_dir, "', found ",
           length(summary_file))
    }
    
    # comment.char = "#" skips the '#%' metadata block; check.names = FALSE
    # keeps the CEL file names unchanged as column names.
    quantified <- read.delim(summary_file, comment.char = "#",
                             check.names = FALSE)
    
    # Column 1 holds probe set IDs; the remaining columns are per-sample signals.
    expression_matrix <- as.matrix(quantified[, -1, drop = FALSE])
    rownames(expression_matrix) <- quantified[[1]]
    message("Loaded ", nrow(expression_matrix), " probe sets x ",
            ncol(expression_matrix), " samples")
    
    output_file <- file.path(output_dir, "probeset_quantification_iterplier.csv")
    write.csv(quantified, output_file, row.names = FALSE)
    message("Results saved to: ", output_file)
    EOF
    
    # Execute the R script
    Rscript load_summary.R
    
  3. Step 3

    Library file (PGF): http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf

    Affymetrix probe group file (PGF) for apt-probeset-summarize, version not specified
    $ Bash example
    # The PGF defines the array's probe sets, atoms, and probes. It is passed to
    # apt-probeset-summarize with --pgf-file. Download it from the source URL
    # (whether the URL is still live has not been verified).
    curl -fLO http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf
    
    # --clf-file also needs the matching CLF (probe coordinate) file; the
    # source does not give its location.
    
    # Quick check that the download is a PGF: print its '#%' header lines.
    grep '^#%' hjay.pgf | head
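
The PGF in step 3 is a plain-text, tab-indented hierarchy. The following is a minimal sketch that assumes the standard Affymetrix PGF layout: header lines start with "#%", probe set rows have no leading tab, atom rows have one, and probe rows have two. The shell function below counts each level as a sanity check on a downloaded file. The helper name pgf_summary and the demo records are hypothetical. They are not part of APT, and the sketch does not validate column contents.

```bash
#!/usr/bin/env bash
# Sketch: count probe sets, atoms, and probes in an Affymetrix PGF file.
# Assumes the standard PGF layout: '#%' header lines, probe set rows with no
# leading tab, atom rows with one leading tab, probe rows with two.
set -euo pipefail

# pgf_summary (hypothetical helper): prints "probesets=N atoms=N probes=N"
pgf_summary() {
  awk '
    /^#/    { next }              # skip #% header lines
    /^\t\t/ { probes++; next }    # two leading tabs: probe row
    /^\t/   { atoms++; next }     # one leading tab: atom row
    NF > 0  { probesets++ }       # no leading tab: probe set row
    END { printf "probesets=%d atoms=%d probes=%d\n", probesets, atoms, probes }
  ' "$1"
}

# Hypothetical PGF fragment: IDs, types and sequences are made up.
demo_pgf=$(mktemp)
trap 'rm -f "$demo_pgf"' EXIT
printf '%b\n' \
  '#%header0=probeset_id\ttype' \
  '#%header1=\tatom_id' \
  '#%header2=\t\tprobe_id\ttype\tprobe_sequence' \
  '1\tmain' \
  '\t1' \
  '\t\t101\tpm:st\tACGTACGTACGTACGTACGTACGTA' \
  '\t\t102\tpm:st\tTTGCATGCATGCATGCATGCATGCA' \
  '\t2' \
  '\t\t103\tpm:st\tGGCATCGATCGATCGATCGATCGAT' \
  '2\tmain' \
  '\t3' \
  '\t\t104\tpm:st\tCCGTAGCTAGCTAGCTAGCTAGCTA' > "$demo_pgf"

pgf_summary "$demo_pgf"   # prints: probesets=2 atoms=3 probes=4
```

To check the real library file, run pgf_summary hjay.pgf. The counts only confirm that the file parsed with the expected nesting.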

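The iter-PLIER quantification in step 2 produces a tab-delimited *.summary.txt file from apt-probeset-summarize. PLIER-family signals are usually reported on a linear scale and log2-transformed before downstream comparison. The sketch below does that with awk. It assumes "#%" metadata lines, a single header row, probe set IDs in the first column, and numeric values elsewhere. The helper name log2_summary is hypothetical, and so is clamping values below 1 up to 1 so the logarithm stays finite. Neither is APT behavior. Check that your file is not already log-scaled (RMA output, for example) before applying it.

```bash
#!/usr/bin/env bash
# Sketch: log2-transform an APT *.summary.txt file (assumed linear scale).
# Assumes '#%' metadata lines, one header row, and probe set IDs in column 1.
set -euo pipefail

# log2_summary (hypothetical helper): writes the transformed table to stdout
log2_summary() {
  awk -F'\t' -v OFS='\t' -v minval=1 '
    /^#/ { next }                                  # drop #% metadata lines
    !header_done { print; header_done = 1; next }  # keep the column header row
    {
      for (i = 2; i <= NF; i++) {
        v = ($i + 0 < minval) ? minval : $i + 0    # clamp so log2 stays finite
        $i = sprintf("%.4f", log(v) / log(2))
      }
      print
    }
  ' "$1"
}

# Hypothetical two-sample summary file with made-up values.
demo_summary=$(mktemp)
trap 'rm -f "$demo_summary"' EXIT
printf '%b\n' \
  '#%guid=example' \
  'probeset_id\tA.CEL\tB.CEL' \
  '1\t8\t1024' \
  '2\t0\t2' > "$demo_summary"

# Expected tab-separated rows after the header: "1 3.0000 10.0000" and "2 0.0000 1.0000"
log2_summary "$demo_summary"
```

The clamp is an arbitrary choice. Some analyses add a fixed offset before taking the logarithm instead, and that choice changes values near the noise floor.
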
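Tools Used

Affymetrix Power Tools (APT): apt-probeset-summarize, with iter-PLIER probe set quantification and the hjay.pgf library file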

Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.
http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf