GSE73843 Processing Pipeline

Publication

RNA-binding protein CPEB1 remodels host and viral RNA landscapes.

Nature Structural & Molecular Biology (2016), PMID 27775709

Dataset

GSE73843

Transcriptome analysis of diverse cell types infected with human cytomegalovirus [array]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

    Microarray (version not specified)
    $ Bash example
    # Install Affymetrix Power Tools (APT) if it is not already available.
    # The conda package name below is an assumption; confirm it on Bioconda first.
    # conda install -c bioconda apt-probeset-summarize
    
    # apt-probeset-summarize reads CEL files plus the library files for the array
    # and writes one summary table per requested analysis into an output directory.
    
    # --cel-files expects a text file whose header line is "cel_files",
    # followed by one CEL path per line, for example:
    # printf 'cel_files\nsample1.CEL\nsample2.CEL\n' > input_cel_files.txt
    
    # This dataset lists HJAY_r2.pgf as its library file, so the probe layout is
    # passed as a PGF/CLF pair rather than a CDF. The CLF file name is an assumption.
    # The analysis string "iter-plier" follows the dataset description; check it with
    #   apt-probeset-summarize --explain iter-plier
    
    # Make sure the output directory exists
    mkdir -p output_dir
    
    apt-probeset-summarize \
        --cel-files input_cel_files.txt \
        --pgf-file HJAY_r2.pgf \
        --clf-file HJAY_r2.clf \
        --analysis iter-plier \
        --out-dir output_dir
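
The `--cel-files` argument takes a text file rather than a list of paths on the command line. A minimal sketch for building that file follows, assuming APT's layout of a `cel_files` header line followed by one path per line. The directory name `cel_files` and the output name `input_cel_files.txt` are placeholders, not values taken from the dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

cel_dir="${CEL_DIR:-cel_files}"                  # hypothetical directory holding the CEL files
list_file="${LIST_FILE:-input_cel_files.txt}"    # hypothetical name for the list file

mkdir -p "$cel_dir"

# Header line expected by APT, followed by one CEL path per line.
printf 'cel_files\n' > "$list_file"
find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | sort >> "$list_file"

# Report how many arrays were listed (total lines minus the header).
n_cels=$(( $(wc -l < "$list_file") - 1 ))
echo "Listed ${n_cels} CEL file(s) in ${list_file}"
```

Sorting the paths keeps the column order of the summary table stable between runs.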
  2.

    The iter-PLIER algorithm was used to quantify probesets.

    affy (R package) v1.78.0 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install R and Bioconductor if not already installed.
    # Each Bioconductor release is tied to an R version (e.g., Bioconductor 3.18 with R 4.3).
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
    # R -e 'BiocManager::install(c("affy", "plier"))'   # 'plier' provides justPlier()
    
    # Example R script to quantify probesets using the PLIER algorithm.
    # Assuming input CEL files are in a directory named 'cel_files'
    # and output will be written to 'quantified_probesets.tsv'
    
    # Create an R script file
    cat << 'EOF' > run_plier_quantification.R
    library(affy)
    
    # --- Configuration ---
    cel_dir <- "cel_files" # Directory containing your CEL files
    output_file <- "quantified_probesets.tsv" # Desired output file name
    
    # --- Input Validation ---
    if (!dir.exists(cel_dir)) {
      stop(paste("Error: CEL file directory not found:", cel_dir))
    }
    cel_files <- list.files(path = cel_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
    if (length(cel_files) == 0) {
      stop(paste("Error: No CEL files found in", cel_dir, ". Please ensure files end with .CEL (case-insensitive)."))
    }
    
    cat("Found", length(cel_files), "CEL files.\n")
    
    # --- Read CEL files into an AffyBatch object ---
    # ReadAffy needs a CDF environment matching the array type and tries to load it
    # automatically from the chip name stored in the CEL header.
    # If that fails, install the matching Bioconductor CDF package
    # (e.g., 'hgu133plus2cdf' for HG-U133 Plus 2.0) or pass cdfname explicitly.
    # Example: raw_data <- ReadAffy(filenames = cel_files, cdfname = "hgu133plus2")
    # Whether a CDF package exists for the junction array in this dataset is not confirmed.
    tryCatch({
      raw_data <- ReadAffy(filenames = cel_files)
    }, error = function(e) {
      stop(paste("Error reading CEL files into AffyBatch:", e$message,
                 "\nConsider checking your CEL files or specifying a CDF environment."))
    })
    
    cat("Successfully loaded raw Affymetrix data.\n")
    
    # --- Quantify probesets with PLIER (justPlier from the Bioconductor 'plier' package) ---
    # The 'affy' package has no PLIER function; justPlier() lives in the separate 'plier' package.
    # Note: this is standard PLIER, not the iter-PLIER variant named in the dataset description.
    # normalize = TRUE requests quantile normalization before summarization.
    cat("Performing probeset quantification using the PLIER algorithm...\n")
    eset_plier <- plier::justPlier(raw_data, normalize = TRUE)
    
    # --- Extract expression matrix ---
    expression_matrix <- exprs(eset_plier)
    
    # --- Write results to a TSV file ---
    cat("Writing quantified probesets to:", output_file, "\n")
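    # col.names = NA writes an empty header cell above the row names so columns stay aligned.
    write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE, row.names = TRUE, col.names = NA)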
    
    cat("Probeset quantification using PLIER completed successfully.\n")
    EOF
    
    # Execute the R script
    Rscript run_plier_quantification.R
    
    # Clean up the R script (optional)
    # rm run_plier_quantification.R
  3.

    As previously described (Huelga et al., 2012).

    Tool and version not specified (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # No command can be given for this step: the description "As previously described
    # (Huelga et al., 2012)" names neither a tool nor its parameters. Consult the cited
    # paper for the method.
  4.

    HJAY_r2.pgf

    HJAY_r2.pgf, release r2 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # HJAY_r2.pgf is not an executable script. In Affymetrix Power Tools, a .pgf file
    # is a Probe Group File: a library file that maps probes to probesets and is
    # passed to apt-probeset-summarize through --pgf-file together with a .clf file.
    # The name suggests release 2 of the library for the Affymetrix human junction
    # array (HJAY); that reading is an inference from the file name.
    # PGF header metadata lines start with "#%"; list them to check the chip type.
    grep '^#%' HJAY_r2.pgf
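
Once the steps above have run, apt-probeset-summarize leaves a tab-delimited summary table preceded by `#%` metadata lines. The sketch below strips that header so the table can be loaded elsewhere. It builds a tiny made-up file so that it runs without APT installed; the file name `iter-plier.summary.txt`, the sample names, and the values are illustrative assumptions, not output from this dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

summary="iter-plier.summary.txt"   # assumed name; APT names the file after the analysis
clean="probeset_signal.tsv"        # hypothetical output name

# Fabricate a two-probeset example so the sketch is self-contained.
# This step is skipped when a real summary file is already present.
if [ ! -f "$summary" ]; then
    printf '#%%chip_type=HJAY\n' > "$summary"
    printf 'probeset_id\tsample1.CEL\tsample2.CEL\n' >> "$summary"
    printf '1001\t5.20\t5.90\n' >> "$summary"
    printf '1002\t7.10\t6.80\n' >> "$summary"
fi

# Drop the "#%" metadata lines; keep the column header and the data rows.
grep -v '^#' "$summary" > "$clean"

# Count data rows (all lines except the column header).
n_probesets=$(( $(wc -l < "$clean") - 1 ))
echo "Wrote ${n_probesets} probesets to ${clean}"
```

Keeping the original summary file untouched preserves the metadata header, which records the settings used for the run.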
    

Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets. As previously described (Huelga et al., 2012).
HJAY_r2.pgf