GSE86038 Processing Pipeline

Publication

Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System.

Neuron (2016) — PMID 27773581

Dataset

GSE86038

HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_Arrays_…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Data were processed with apt-probeset-summarize from the Affymetrix Power Tools (APT) package.

    Microarray (version not specified)
    $ Bash example
    # Install Affymetrix Power Tools (APT)
    # APT is typically downloaded as a binary package from the Thermo Fisher Scientific website.
    # Example installation (may vary based on OS and APT version):
    # wget https://assets.thermofisher.com/TFS-Assets/LSG/software/APT_2.x.x_Linux.zip
    # unzip APT_2.x.x_Linux.zip
    # export PATH=$PATH:/path/to/APT_binaries
    
    # Placeholder variables
    # CEL_LIST is a text file whose first line is the header "cel_files",
    # followed by one CEL file path per line. CEL files can also be passed
    # directly as trailing arguments instead of using --cel-files.
    CEL_LIST="cel_list.txt"
    # Library files for the array. The source text points to hjay.pgf; PGF-based
    # runs also need the matching CLF file (hjay.clf is an assumed name).
    PGF_FILE="hjay.pgf"
    CLF_FILE="hjay.clf"
    OUTPUT_DIR="apt_summarize_output"
    # The source text names Iter-PLIER. Confirm the exact analysis string for
    # your APT version with: apt-probeset-summarize --help
    ANALYSIS="iter-plier"
    
    mkdir -p "${OUTPUT_DIR}"
    
    apt-probeset-summarize \
      -a "${ANALYSIS}" \
      -p "${PGF_FILE}" \
      -c "${CLF_FILE}" \
      -o "${OUTPUT_DIR}" \
      --log-file "${OUTPUT_DIR}/apt_summarize.log" \
      --cel-files "${CEL_LIST}"
  2. The Iter-PLIER algorithm was used to quantify probesets.

    plier (inferred with models/gemini-2.5-flash), version: Bioconductor (e.g., 1.78.0 for R 4.3)
    $ Bash example
    # Install R and Bioconductor if not already present
    # For example, using conda:
    # conda create -n r_env r-base bioconductor-affy bioconductor-plier
    # conda activate r_env
    
    # Create an R script to run PLIER quantification (an approximation of Iter-PLIER)
    cat << 'EOF' > quantify_probesets.R
    # Load necessary Bioconductor packages
    library(affy)
    library(plier)
    
    # Define input and output paths
    # Assuming CEL files are in a directory named 'cel_files'
    cel_dir <- "cel_files"
    output_file <- "probeset_quantification_iterplier.tsv"
    
    # Get list of CEL files
    cel_files <- list.files(path = cel_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
    
    if (length(cel_files) == 0) {
      stop("No CEL files found in the specified directory: ", cel_dir)
    }
    
    # Read CEL files into an AffyBatch object.
    # ReadAffy() needs a matching CDF annotation package for the array type;
    # for a custom array this may not be available from Bioconductor.
    raw_data <- ReadAffy(filenames = cel_files)
    
    # Run PLIER with justPlier() from the 'plier' package.
    # Note: this is standard PLIER. Iter-PLIER, as named in the source text, is
    # the iterative variant provided by Affymetrix Power Tools, so treat this
    # as an approximation.
    es_plier <- justPlier(raw_data, normalize = TRUE)
    
    # Extract expression matrix
    expression_matrix <- exprs(es_plier)
    
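    # Write results to a TSV file.
    # col.names = NA adds an empty header cell above the probeset ID column,
    # so the header stays aligned with the data columns.
    write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE,
                row.names = TRUE, col.names = NA)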
    
    message(paste("Probeset quantification completed. Results saved to:", output_file))
    EOF
    
    # Execute the R script
    Rscript quantify_probesets.R
    
    # Expected input layout (copy real CEL files here before running the script;
    # empty placeholder files cannot be read by ReadAffy):
    # cel_files/sample1.CEL
    # cel_files/sample2.CEL
  3. Array library file (PGF): http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf

    clipper (inferred with models/gemini-2.5-flash), version: latest (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install clipper (if not already installed)
    # git clone https://github.com/yeolab/clipper.git
    # cd clipper
    # pip install .
    
    # Note: the source text for this step only links the array library file;
    # the association with clipper is inferred.
    
    # Define inputs (placeholders)
    IP_BAM="ip_sample.bam"
    # Species/assembly name as supported by your clipper installation (assumed value)
    SPECIES="hg19"
    OUTPUT_BED="clip_peaks.bed"
    
    # Run clipper for peak calling. clipper takes a sorted, indexed BAM and a
    # species name; it does not take a control BAM or a genome FASTA.
    clipper -b "${IP_BAM}" -s "${SPECIES}" -o "${OUTPUT_BED}"
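
The PGF file linked in step 3 is an array library file, so one useful check before summarization is to read its header and confirm which chip type it describes. The following is a minimal sketch, assuming the usual APT library-file convention of `#%key=value` header lines at the top of the file. The function name `pgf_header_field` and the local file name `hjay.pgf` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the value(s) of one "#%key=value" header field from a PGF file.
# Assumes header lines start with "#%" and come before any data lines.
pgf_header_field() {
  local pgf="$1" key="$2"
  awk -v key="${key}" '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    !/^#%/ { exit }                         # stop at the first non-header line
    index($0, "#%" key "=") == 1 { print substr($0, length(key) + 4) }
  ' "${pgf}"
}

PGF_FILE="${PGF_FILE:-hjay.pgf}"   # assumed local copy of the library file
if [[ -f "${PGF_FILE}" ]]; then
  pgf_header_field "${PGF_FILE}" chip_type
else
  echo "PGF file not found: ${PGF_FILE}" >&2
fi
```

A PGF header may list more than one `chip_type` line, in which case each value is printed on its own line. The values can then be compared with the array type recorded in the CEL files.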

Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.
http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf
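
apt-probeset-summarize can read its inputs from a list file passed with `--cel-files`: a text file whose first line is the header `cel_files`, followed by one CEL path per line. The sketch below builds such a file from a directory. The directory name `cel_files`, the output name `cel_list.txt`, and the helper `write_cel_list` are hypothetical and not part of the source description.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write an APT CEL list: header line "cel_files", then one path per line.
write_cel_list() {
  local dir="$1" out="$2"
  printf 'cel_files\n' > "${out}"
  # -iname matches both .CEL and .cel; sort keeps the sample order reproducible.
  find "${dir}" -maxdepth 1 -type f -iname '*.cel' | sort >> "${out}"
}

CEL_DIR="${CEL_DIR:-cel_files}"        # hypothetical input directory
CEL_LIST="${CEL_LIST:-cel_list.txt}"   # hypothetical output file

if [[ -d "${CEL_DIR}" ]]; then
  write_cel_list "${CEL_DIR}" "${CEL_LIST}"
  echo "Listed $(( $(wc -l < "${CEL_LIST}") - 1 )) CEL file(s) in ${CEL_LIST}"
else
  echo "CEL directory not found: ${CEL_DIR}" >&2
fi
```

The resulting file can then be passed as `--cel-files cel_list.txt`. Keeping the list in a file also records exactly which samples went into a given run.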