GSE40653 Processing Pipeline


Publication

Divergent roles of ALS-linked proteins FUS/TLS and TDP-43 intersect in processing long pre-mRNAs.

Nature Neuroscience (2012), PMID 23023293

Dataset

GSE40653

Divergent roles of ALS-linked proteins FUS/TLS and TDP-43 intersect in processing long pre-mRNAs

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

    Microarray (inferred with models/gemini-2.5-flash)
    Bash example
    # Install Affymetrix Power Tools (APT)
    # APT is distributed by Thermo Fisher; a conda build may exist, but this package name is unverified:
    # conda install -c bioconda affymetrix-power-tools
    
    # Define input and output paths
    # Replace with the actual CEL files and the library file for your array type
    CEL_FILES=(sample1.CEL sample2.CEL sample3.CEL) # Placeholder for actual CEL files
    CDF_FILE="HG-U133A.cdf" # Placeholder only; use the library file for the array actually hybridized
    OUTPUT_DIR="apt_summarize_output"
    # The source text names iter-PLIER; confirm the analysis strings your APT version
    # accepts with apt-probeset-summarize --help (others include rma and dabg)
    ALGORITHM="iter-plier"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Execute apt-probeset-summarize
    # -d takes a CDF file; -c expects a CLF file, which pairs with -p (PGF) on exon/gene arrays
    apt-probeset-summarize \
        -a "${ALGORITHM}" \
        -o "${OUTPUT_DIR}" \
        -d "${CDF_FILE}" \
        "${CEL_FILES[@]}"
  2. Probesets were quantified with the iter-PLIER algorithm.

    plier (R package; tool and version inferred with models/gemini-2.5-flash)
    Bash example
    # Install R and Bioconductor if not already installed
    # sudo apt-get update
    # sudo apt-get install r-base
    # R -e "if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager'); BiocManager::install(c('plier', 'affy'))"
    
    cat << 'EOF' > run_plier.R
    # Load necessary R packages
    library(plier)
    library(affy) # Required for ReadAffy
    
    # Define input CEL files directory and output file
    # IMPORTANT: Replace "path/to/your/raw_cel_files" with the actual directory containing your .CEL files.
    cel_files_dir <- Sys.getenv("CEL_FILES_DIR", "path/to/your/raw_cel_files")
    output_file <- Sys.getenv("OUTPUT_FILE", "probeset_quantification_plier.tsv")
    
    # Check if the CEL files directory exists
    if (!dir.exists(cel_files_dir)) {
      stop(paste("Error: CEL files directory not found:", cel_files_dir,
                 "\nPlease update the 'CEL_FILES_DIR' environment variable to point to your actual .CEL files."))
    }
    
    # Read CEL files into an AffyBatch object
    # This step requires valid Affymetrix .CEL files.
    # Ensure that the appropriate annotation package for your array type is installed
    # if you plan to use it for downstream analysis (e.g., hgu133plus2.db).
    # Example: BiocManager::install("hgu133plus2.db")
    raw_data <- ReadAffy(celfile.path = cel_files_dir)
    
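    # Perform PLIER quantification
    # justPlier() from the Bioconductor plier package runs standard PLIER, not iter-PLIER;
    # iter-PLIER itself is an analysis option of apt-probeset-summarize, so treat this as an approximation.
    eset_plier <- justPlier(raw_data, normalize = TRUE)
    
    # Extract expression values (expected to be log2 scale; verify against the plier documentation)
    expression_matrix <- exprs(eset_plier)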
    
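    # Write results to a TSV file
    # col.names = NA adds an empty header cell above the row names so header and data columns line up
    write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)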
    
    message(paste("PLIER quantification complete. Results saved to:", output_file))
    EOF
    
    # Set environment variables for the R script
    # IMPORTANT: Replace "path/to/your/raw_cel_files" with the actual directory containing your .CEL files
    export CEL_FILES_DIR="path/to/your/raw_cel_files"
    export OUTPUT_FILE="probeset_quantification_plier.tsv"
    
    # Execute the R script
    Rscript run_plier.R
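
The placeholder CEL list in step 1 can also be supplied as a file, and the resulting summary table usually needs light cleanup before loading elsewhere. The following is a minimal sketch, assuming APT's --cel-files convention (a text file whose first line is cel_files, followed by one path per line) and assuming the summary output carries metadata lines prefixed with #%. The helper names write_cel_list and strip_apt_header are hypothetical and not part of APT.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write an APT-style CEL list file.
# Line 1 is the literal header "cel_files"; each later line is one CEL path.
write_cel_list() {
    local cel_dir="$1"
    local list_file="$2"
    printf 'cel_files\n' > "${list_file}"
    # Case-insensitive match so both .CEL and .cel are picked up; sorted for reproducibility
    find "${cel_dir}" -maxdepth 1 -type f -iname '*.cel' | sort >> "${list_file}"
}

# Hypothetical helper: drop metadata lines (prefixed with "#%") from a summary table
# so that the remainder is a plain tab-separated table with a single header row.
strip_apt_header() {
    local summary_file="$1"
    # grep exits 1 when nothing is printed; do not treat that as a failure
    grep -v '^#%' "${summary_file}" || true
}
```

A possible use: run write_cel_list raw_cel cel_list.txt, pass --cel-files cel_list.txt to apt-probeset-summarize in place of the positional CEL arguments, then run strip_apt_header on the summary file in the output directory. The summary file name depends on the analysis string, so check the output directory rather than assuming it.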

Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.