GSE66696 Processing Pipeline

Publication

Nxf1 natural variant E610G is a semi-dominant suppressor of IAP-induced RNA processing defects.

PLoS Genetics (2015), PMID 25835743

Dataset

GSE66696

Whole brain RNA from congenic littermates does not support a general effect of Nxf1 CAST alleles on alternative pre-mRNA processing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Step 1

    Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

    Affymetrix Power Tools (APT) v2.12.0 (Inferred with models/gemini-2.5-flash; the source does not state a version)
    $ Bash example
    # Affymetrix Power Tools (APT) is distributed as a binary package from Thermo Fisher Scientific.
    # Download the archive for your OS from the Thermo Fisher website, unpack it, and put its
    # bin/ directory on PATH:
    # export PATH="/path/to/apt/bin:$PATH"
    
    # Placeholders: replace the CEL names with your input files.
    # GPL13185 is a PGF-based array, so the library files are passed as a PGF (-p) plus the
    # matching CLF (-c). CDF-based 3' arrays would instead pass a CDF with -d.
    INPUT_CEL_FILES=(sample1.CEL sample2.CEL sample3.CEL)
    OUTPUT_DIR="apt_summarize_output"
    PGF_FILE="GPL13185_mjay.pgf"   # probe group file named in the GEO record
    CLF_FILE="GPL13185_mjay.clf"   # hypothetical name: the matching CLF is not given in the source
    
    mkdir -p "${OUTPUT_DIR}"
    
    # rma-sketch is a generic example analysis; the GEO record reports Iter-PLIER (step 2).
    apt-probeset-summarize \
        -a rma-sketch \
        -p "${PGF_FILE}" \
        -c "${CLF_FILE}" \
        -o "${OUTPUT_DIR}" \
        "${INPUT_CEL_FILES[@]}"
  2. Step 2

    The Iter-PLIER algorithm was used to quantify probesets.

    Iter-PLIER, a quantification method in Affymetrix Power Tools (apt-probeset-summarize); version not specified
    $ Bash example
    # Iter-PLIER is run through Affymetrix Power Tools (apt-probeset-summarize).
    # The Bioconductor affy package does not provide an Iter-PLIER method, so expresso()
    # cannot be used for this step.
    # The analysis string below is an assumption: confirm the stage names for your APT
    # version with `apt-probeset-summarize --help`.
    CEL_LIST="cel_files.txt"       # header line "cel_files", then one CEL path per line
    PGF_FILE="GPL13185_mjay.pgf"   # probe group file named in the GEO record
    CLF_FILE="GPL13185_mjay.clf"   # hypothetical name: use the CLF that matches the PGF
    OUTPUT_DIR="apt_iterplier_output"
    
    mkdir -p "${OUTPUT_DIR}"
    
    # Quantile normalization, PM-only intensities, then Iter-PLIER summarization per probeset.
    apt-probeset-summarize \
        -a quant-norm,pm-only,iter-plier \
        -p "${PGF_FILE}" \
        -c "${CLF_FILE}" \
        --cel-files "${CEL_LIST}" \
        -o "${OUTPUT_DIR}"
  3. Step 3

    probe group file: GPL13185_mjay.pgf

    Affymetrix Power Tools (APT), version not specified (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Affymetrix Power Tools (APT) is distributed as a binary package from Thermo Fisher
    # Scientific; download it for your OS, unpack it, and add its bin/ directory to PATH:
    # export PATH="$PATH:/path/to/apt/bin"
    
    # GPL13185_mjay.pgf is the probe group file (PGF) for platform GPL13185; the "mjay" name
    # suggests a mouse junction array design. A PGF lists the probesets, their atoms, and their
    # probes, and is paired with a CLF file that maps probe IDs to array coordinates.
    # Make the PGF (and its CLF) available in the working directory, e.g.:
    # cp /path/to/reference_files/GPL13185_mjay.pgf .
    
    # Summarize CEL files using the PGF.
    # input_cel_files.txt: header line "cel_files", then one CEL path per line.
    # GPL13185_mjay.clf is a hypothetical name for the matching CLF file.
    # The analysis string is an example based on the Iter-PLIER method reported in step 2.
    apt-probeset-summarize \
        -a quant-norm,pm-only,iter-plier \
        --pgf-file GPL13185_mjay.pgf \
        --clf-file GPL13185_mjay.clf \
        --cel-files input_cel_files.txt \
        --out-dir output_dir
  4. Step 4

    Data were analyzed with Omniviewer software: http://exon.ucsc.edu/omniviewer/.

    Omniviewer vN/A
    $ Bash example
    # Omniviewer (http://exon.ucsc.edu/omniviewer/) is a web-based viewer. The analysis in this
    # step was done interactively in its web interface, so there is no command that reproduces it.
    # The command below only records the step.
    
    echo "Data were analyzed interactively using the Omniviewer web interface at http://exon.ucsc.edu/omniviewer/."
  5. Step 5

    The Omniviewer output is available on the series record.

    Omniviewer vNot specified
    $ Bash example
    # The GEO record says the Omniviewer output is available on the series record, presumably
    # as supplementary files of GSE66696. GEO serves series supplementary files from the
    # directory below; the file names are not given in the source, so the directory is mirrored.
    SUPPL_URL="https://ftp.ncbi.nlm.nih.gov/geo/series/GSE66nnn/GSE66696/suppl/"
    wget --recursive --no-parent --no-directories --reject "index.html*" \
        --directory-prefix GSE66696_suppl "${SUPPL_URL}"
  6. Step 6

    The "A" set represents the Nxf1-CAST mutants and the "B" set represents the Nxf1-B6 mutants.

    Data Grouping/Sample Annotation (Inferred with models/gemini-2.5-flash) vN/A
    $ Bash example
    # Define sample sets based on description
    # Set A: Nxf1-CAST mutants
    # Set B: Nxf1-B6 mutants
    
    # This step represents the logical grouping of samples based on experimental design.
    # A common way to represent this in a pipeline is through a manifest or sample sheet.
    
    # Create a placeholder manifest file to categorize samples.
    # In a real scenario, this file would be generated from experimental metadata.
    
    # Example: Assuming sample IDs are available and need to be assigned to sets.
    # This script creates a CSV file defining the sets.
    
    echo "Sample_ID,Set_Name,Genotype" > sample_manifest.csv
    echo "Nxf1_CAST_Sample_1,A,Nxf1-CAST" >> sample_manifest.csv
    echo "Nxf1_CAST_Sample_2,A,Nxf1-CAST" >> sample_manifest.csv
    echo "Nxf1_B6_Sample_1,B,Nxf1-B6" >> sample_manifest.csv
    echo "Nxf1_B6_Sample_2,B,Nxf1-B6" >> sample_manifest.csv
    
    echo "Sample manifest 'sample_manifest.csv' created, defining 'A' and 'B' sets."
    
    # The sample IDs above are placeholders; the processing text does not list the actual samples.

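Steps 1 to 3 pass CEL files to apt-probeset-summarize, either as arguments or through `--cel-files`. The helper below is a minimal sketch for building that list file, assuming the format APT expects is a header line `cel_files` followed by one CEL path per line; the function name `make_cel_list` and its arguments are hypothetical, not part of APT.

```shell
# Hypothetical helper: write a CEL-list file for `apt-probeset-summarize --cel-files`.
# Assumed list format: a header line "cel_files", then one CEL path per line.
# The body runs in a subshell ( ... ) so the shopt changes do not leak into the caller.
make_cel_list() (
  cel_dir="$1"
  list_file="$2"
  shopt -s nullglob nocaseglob      # match *.CEL and *.cel; expand to nothing if absent
  files=("${cel_dir}"/*.cel)
  if (( ${#files[@]} == 0 )); then
    echo "No CEL files found in ${cel_dir}" >&2
    exit 1                          # exits only the subshell, so the function returns 1
  fi
  {
    echo "cel_files"
    printf '%s\n' "${files[@]}"
  } > "${list_file}"
  echo "Wrote ${#files[@]} CEL paths to ${list_file}"
)
```

For example, `make_cel_list raw_cel cel_files.txt` would write `cel_files.txt` for use with `--cel-files`. Running the body in a subshell keeps `nullglob` and `nocaseglob` from changing the caller's globbing behavior.
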
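Step 2 produces per-probeset Iter-PLIER estimates, which apt-probeset-summarize writes as tab-delimited text. The sketch below assumes that the metadata lines at the top of that file start with `#` and that the first remaining line is a column header (probeset ID, then one column per CEL file); `summary_to_matrix` is a hypothetical helper, and the actual summary file name depends on the analysis string and APT version.

```shell
# Hypothetical helper: drop '#'-prefixed metadata lines from an APT summary file,
# write the remaining header + data rows to $2, and report the matrix size.
summary_to_matrix() {
  local summary="$1" out="$2"
  grep -v '^#' "$summary" > "$out" || { echo "No data rows in ${summary}" >&2; return 1; }
  # First line is the column header: probeset ID followed by one column per sample.
  awk -F'\t' 'NR == 1 { ns = NF - 1 } END { printf "%d probesets x %d samples\n", NR - 1, ns }' "$out"
}
```

A call such as `summary_to_matrix apt_iterplier_output/<summary file> matrix.tsv` gives a plain table that R or Python can read directly; the placeholder path stays unfilled because the source does not name the file.
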
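Step 3 names the probe group file GPL13185_mjay.pgf. Before summarizing, it can help to check what the PGF actually defines. This sketch assumes the usual PGF text layout: `#` header lines, probeset lines with no leading tab, atom lines indented by one tab, and probe lines indented by two tabs; `pgf_stats` is a hypothetical helper.

```shell
# Hypothetical helper: count probesets, atoms, and probes in a PGF file,
# relying on the tab-indentation hierarchy of the format.
pgf_stats() {
  awk '
    /^#/     { next }                 # header and comment lines
    /^\t\t/  { probes++;    next }    # two leading tabs: probe line
    /^\t/    { atoms++;     next }    # one leading tab: atom line
    NF > 0   { probesets++ }          # no leading tab: probeset line
    END { printf "probesets=%d atoms=%d probes=%d\n", probesets, atoms, probes }
  ' "$1"
}
```

Running `pgf_stats GPL13185_mjay.pgf` prints one summary line; a probeset count of zero usually means the wrong file was supplied.
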
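For step 6, the A/B grouping in sample_manifest.csv can drive later comparisons between the Nxf1-CAST and Nxf1-B6 sets. The helpers below are a sketch that assumes the three-column manifest layout used in that step (Sample_ID,Set_Name,Genotype) with a header row and plain, unquoted CSV fields; `samples_in_set` and `count_by_set` are hypothetical names.

```shell
# Hypothetical helper: print the Sample_ID of every row whose Set_Name equals $2.
# Assumes a header row and simple comma-separated fields without quoting.
samples_in_set() {
  awk -F',' -v s="$2" 'NR > 1 && $2 == s { print $1 }' "$1"
}

# Hypothetical helper: print "<set> <count>" for each Set_Name, sorted by set name.
count_by_set() {
  awk -F',' 'NR > 1 { n[$2]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}
```

For example, `samples_in_set sample_manifest.csv A` lists the Nxf1-CAST samples, and `count_by_set sample_manifest.csv` shows how many samples fall in each set.
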
Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets.
probe group file: GPL13185_mjay.pgf
Data were analyzed with Omniviewer software: http://exon.ucsc.edu/omniviewer/. The Omniviewer output is available on the series record. The "A" set represents the Nxf1-CAST mutants and the "B" set represents the Nxf1-B6 mutants.