GSE73843 Processing Pipeline — Yeo Lab Publications

Publication

RNA-binding protein CPEB1 remodels host and viral RNA landscapes.

Nature Structural & Molecular Biology (2016), PMID 27775709

Dataset

Transcriptome analysis of diverse cell types infected with human cytomegalovirus [array]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

Data were processed with apt-probeset-summarize from Affymetrix Power Tools (APT).

Microarray (version not specified)

$ Bash example

# Install Affymetrix Power Tools (APT) if not already installed.
# Bioconda has distributed it as 'apt-probeset-summarize'; check the package name for your channel.
# conda install -c bioconda apt-probeset-summarize

# Example usage of apt-probeset-summarize.
# The command summarizes CEL files with a chosen analysis (e.g., rma, iter-plier)
# using the library files for the array type, and writes its results to an output directory.

# The --cel-files list holds one CEL path per line under a 'cel_files' header line.
# printf 'cel_files\nsample1.CEL\nsample2.CEL\n' > input_cel_files.txt

# Placeholder for the library file specific to your Affymetrix array.
# Replace 'your_array_type.cdf' with the actual CDF file path. Arrays that ship
# PGF/CLF library files (e.g., Human Gene 1.0 ST) take -p and -c instead of -d.

# Make sure the output directory exists
mkdir -p output_dir

# With -a rma, the summary table is written as output_dir/rma.summary.txt
apt-probeset-summarize \
    -a rma \
    -d your_array_type.cdf \
    -o output_dir \
    --cel-files input_cel_files.txt
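
The list passed to --cel-files is easy to get wrong, because APT expects a header line reading cel_files before the paths. A minimal sketch of building that list follows. The temporary directory and the empty placeholder files are assumptions made only so the snippet runs on its own; point cel_dir at a real CEL directory in practice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo directory; the empty files stand in for real CEL files.
cel_dir="$(mktemp -d)"
touch "$cel_dir/sample1.CEL" "$cel_dir/sample2.CEL"

cel_list="$cel_dir/input_cel_files.txt"

# The first line is the column header APT expects in a --cel-files list.
printf 'cel_files\n' > "$cel_list"

# Append one CEL path per line, sorted so the sample order is reproducible.
find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | sort >> "$cel_list"

cat "$cel_list"
```

Sorting the paths keeps the column order of the summary table stable between runs.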

2

Iter-plier algorithm used to quantify probesets.

affy (R package) v1.78.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# Install R and Bioconductor if not already installed
# (Bioconductor releases are tied to R versions, e.g., 3.19 pairs with R 4.4)
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install(c("affy", "plier"))'

# Example R script to quantify probesets with the PLIER algorithm.
# 'affy' reads the CEL files; justPlier() comes from the separate 'plier'
# Bioconductor package, whose availability depends on your Bioconductor release.
# Note: this runs standard PLIER. The iterative variant named in this step
# (iter-plier) is an analysis option of apt-probeset-summarize in APT.
# Assuming input CEL files are in a directory named 'cel_files'
# and output will be written to 'quantified_probesets.tsv'

# Create an R script file
cat << 'EOF' > run_plier_quantification.R
library(affy)   # ReadAffy(), exprs()
library(plier)  # justPlier()

# --- Configuration ---
cel_dir <- "cel_files" # Directory containing your CEL files
output_file <- "quantified_probesets.tsv" # Desired output file name

# --- Input Validation ---
if (!dir.exists(cel_dir)) {
  stop("CEL file directory not found: ", cel_dir)
}
cel_files <- list.files(path = cel_dir, pattern = "\\.CEL$", full.names = TRUE, ignore.case = TRUE)
if (length(cel_files) == 0) {
  stop("No CEL files found in ", cel_dir, ". Please ensure files end with .CEL (case-insensitive).")
}

cat("Found", length(cel_files), "CEL files.\n")

# --- Read CEL files into an AffyBatch object ---
# Summarization needs a CDF package matching the array type.
# For common arrays it is looked up automatically from the CEL header.
# If you encounter errors related to CDF, install the matching Bioconductor
# CDF package (e.g., 'hgu133plus2cdf') or name it explicitly.
# Example: raw_data <- ReadAffy(filenames = cel_files, cdfname = "hgu133plus2")
# Custom arrays may have no CDF package, in which case APT is the practical route.
raw_data <- tryCatch(
  ReadAffy(filenames = cel_files),
  error = function(e) {
    stop("Error reading CEL files into AffyBatch: ", conditionMessage(e),
         "\nConsider checking your CEL files or specifying a CDF environment.")
  }
)

cat("Successfully loaded raw Affymetrix data.\n")

# --- Quantify probesets using the PLIER algorithm (justPlier function) ---
# justPlier() from the 'plier' package summarizes probe intensities per probeset.
# Normalization is off by default; normalize = TRUE applies quantile normalization first.
cat("Performing probeset quantification using the PLIER algorithm...\n")
eset_plier <- justPlier(eset = raw_data, normalize = TRUE)

# --- Extract expression matrix ---
expression_matrix <- exprs(eset_plier)

# --- Write results to a TSV file ---
# col.names = NA adds an empty header cell above the row names so columns line up.
cat("Writing quantified probesets to:", output_file, "\n")
write.table(expression_matrix, file = output_file, sep = "\t", quote = FALSE,
            row.names = TRUE, col.names = NA)

cat("Probeset quantification using PLIER completed successfully.\n")
EOF

# Execute the R script
Rscript run_plier_quantification.R

# Clean up the R script (optional)
# rm run_plier_quantification.R
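
Before using a summary table downstream, it helps to confirm that it holds the expected number of probesets and samples. The sketch below counts both for a tab-separated table with one header row and the probeset IDs in the first column. The demo file and its values are placeholders invented for illustration; the '#' filter is there because APT summary files start with comment lines, and it is harmless for tables without them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a real summary table such as quantified_probesets.tsv.
summary="$(mktemp)"
printf '#%%demo=placeholder\n' > "$summary"
printf 'probeset_id\tsample1.CEL\tsample2.CEL\n' >> "$summary"
printf 'ps_0001\t5.2\t6.1\n' >> "$summary"
printf 'ps_0002\t7.9\t8.3\n' >> "$summary"

# Drop '#' comment lines, skip the header row, and count the remaining data rows.
n_probesets="$(grep -v '^#' "$summary" | tail -n +2 | wc -l | tr -d ' ')"

# Take the first data row and count its value columns (all fields except the ID).
n_samples="$(grep -v '^#' "$summary" | sed -n 2p | awk -F'\t' '{print NF - 1}')"

echo "probesets: $n_probesets"
echo "samples:   $n_samples"
```

Counting columns on a data row rather than on the header keeps the result correct whether or not the header carries a label for the ID column.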

3

As previously described (Huelga et al., 2012).

(Inferred with models/gemini-2.5-flash) vN/A

$ Bash example

# The specific tool, version, and command cannot be inferred from the generic description 'As previously described (Huelga et al., 2012)' without further context about the assay or step type.

4

HJAY_r2.pgf

HJAY_r2.pgf (Inferred with models/gemini-2.5-flash) vr2

$ Bash example

# "HJAY_r2.pgf" names a library file, not an executable. In Affymetrix Power
# Tools, a PGF (probe group file) maps probes to probesets and is supplied
# together with a CLF (cel layout file) for the same array, here most likely
# the Affymetrix human junction (HJAY) array.
# A hedged sketch of how it would be passed to apt-probeset-summarize follows.
# The CLF name, CEL list, and output directory are assumptions, and the exact
# analysis string used by the authors is not given in the record.
apt-probeset-summarize \
    -a iter-plier \
    -p HJAY_r2.pgf \
    -c HJAY_r2.clf \
    -o output_dir \
    --cel-files input_cel_files.txt
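
PGF files are plain text and begin with metadata lines of the form #%key=value, so the array type and library version can be checked before a run. A minimal sketch of reading that header follows. The miniature file and every value in it are placeholders written for this demo, not the contents of the real HJAY_r2.pgf.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical miniature PGF; real library files are far larger.
pgf="$(mktemp)"
printf '#%%chip_type=HJAY\n' > "$pgf"
printf '#%%lib_set_version=r2\n' >> "$pgf"
printf '#%%header0=probeset_id\ttype\n' >> "$pgf"
printf '1001\tmain\n' >> "$pgf"

# Print every header line as key=value, with the '#%' prefix removed.
awk '/^#%/ { sub(/^#%/, ""); print }' "$pgf"

# Extract a single field, here the chip type the library file declares.
chip_type="$(awk -F= '/^#%chip_type=/ { print $2 }' "$pgf")"
echo "chip type: $chip_type"
```

Comparing the declared chip type against the CEL files' array type is a cheap check that the library file matches the data.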

Tools Used

Microarray