GSE73843 Processing Pipeline
Publication
RNA-binding protein CPEB1 remodels host and viral RNA landscapes. Nature Structural & Molecular Biology (2016), PMID 27775709
Dataset
GSE73843: Transcriptome analysis of diverse cell types infected with human cytomegalovirus [array]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize.
Microarray (version not specified)
$ Bash example
# Install Affymetrix Power Tools (APT) if not already installed.
# APT is distributed by Thermo Fisher; the conda package name below is unverified.
# conda install -c bioconda affy-power-tools

# apt-probeset-summarize summarizes CEL files with a chosen analysis
# (e.g., rma, iter-plier) using the library files for the array type.

# The CEL list is a tab-delimited text file whose first line is the header "cel_files":
# printf 'cel_files\nsample1.CEL\nsample2.CEL\n' > input_cel_files.txt

# CDF-based arrays take a CDF file via -d; replace 'your_array_type.cdf' with the actual path.
# Arrays described by PGF/CLF library files (such as the HJAY_r2.pgf named in step 4)
# take -p <file.pgf> -c <file.clf> instead of -d.

# APT writes one summary file per analysis into the output directory.
mkdir -p output_dir
apt-probeset-summarize \
  -a rma \
  -d your_array_type.cdf \
  -o output_dir \
  --cel-files input_cel_files.txt
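The CEL list passed to `--cel-files` is easy to get wrong, because APT reads it as a tab-delimited table with a `cel_files` header line. A minimal sketch for building it follows; the helper name `write_cel_list`, the `cel_files` directory, and the `cel_list.txt` output name are assumptions for illustration and are not part of APT or the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_cel_list DIR OUT
# Hypothetical helper (not part of APT): writes a CEL list for --cel-files.
write_cel_list() {
    local cel_dir="$1" cel_list="$2"
    # The first line must be the column header "cel_files".
    printf 'cel_files\n' > "$cel_list"
    # Append every *.CEL / *.cel file in DIR, sorted for a reproducible order.
    find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | sort >> "$cel_list"
}

# Example call; skipped when the assumed directory does not exist.
if [ -d cel_files ]; then
    write_cel_list cel_files cel_list.txt
    # Subtract 1 for the header line.
    echo "Listed $(( $(wc -l < cel_list.txt) - 1 )) CEL file(s) in cel_list.txt"
fi
```

Sorting the paths keeps the sample columns in the same order across reruns, which makes downstream comparisons easier.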
2
Iter-plier algorithm used to quantify probesets.
$ Bash example
# Install R and Bioconductor if not already installed.
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")'
# R -e 'BiocManager::install(c("affy", "plier"))'

# Example R script that quantifies probesets with the PLIER algorithm.
# Note: this runs standard PLIER via justPlier() from the Bioconductor 'plier'
# package, not APT's iter-PLIER, and it only works for arrays with a CDF package.
# Input CEL files are read from 'cel_files'; output goes to 'quantified_probesets.tsv'.

# Create the R script file
cat << 'EOF' > run_plier_quantification.R
library(affy)   # ReadAffy(), exprs()
library(plier)  # justPlier()

# --- Configuration ---
cel_dir <- "cel_files"                      # Directory containing your CEL files
output_file <- "quantified_probesets.tsv"   # Desired output file name

# --- Input validation ---
if (!dir.exists(cel_dir)) {
  stop("CEL file directory not found: ", cel_dir)
}
cel_files <- list.files(path = cel_dir, pattern = "\\.CEL$",
                        full.names = TRUE, ignore.case = TRUE)
if (length(cel_files) == 0) {
  stop("No CEL files found in ", cel_dir, " (expected names ending in .CEL, any case).")
}
cat("Found", length(cel_files), "CEL files.\n")

# --- Read CEL files into an AffyBatch object ---
# ReadAffy() needs a CDF environment for the array type and usually infers it
# from the CEL header. If that fails, install the matching CDF package
# (e.g., 'hgu133plus2cdf') and name the array explicitly:
#   raw_data <- ReadAffy(filenames = cel_files, cdfname = "hgu133plus2")
raw_data <- tryCatch(
  ReadAffy(filenames = cel_files),
  error = function(e) {
    stop("Error reading CEL files into AffyBatch: ", conditionMessage(e),
         "\nCheck the CEL files or specify a CDF environment.")
  }
)
cat("Successfully loaded raw Affymetrix data.\n")

# --- Quantify probesets with PLIER ---
# justPlier() summarizes probesets; normalize = TRUE applies quantile normalization first.
cat("Performing probeset quantification using the PLIER algorithm...\n")
eset_plier <- justPlier(raw_data, normalize = TRUE)

# --- Extract expression matrix ---
expression_matrix <- exprs(eset_plier)

# --- Write results to a TSV file ---
cat("Writing quantified probesets to:", output_file, "\n")
write.table(expression_matrix, file = output_file, sep = "\t",
            quote = FALSE, row.names = TRUE, col.names = NA)
cat("Probeset quantification using PLIER completed successfully.\n")
EOF

# Execute the R script
Rscript run_plier_quantification.R

# Clean up the R script (optional)
# rm run_plier_quantification.R
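If the quantification is run with apt-probeset-summarize, as the step 1 description suggests, the resulting summary file starts with metadata lines prefixed by `#` before the `probeset_id` header row. The sketch below strips those lines to leave a plain tab-delimited matrix. The helper name `strip_apt_header` and the path `apt_out/iter-plier.summary.txt` are assumptions for illustration; the actual file name depends on the analysis string and output directory you used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# strip_apt_header FILE
# Hypothetical helper (not part of APT): prints FILE without the leading
# '#'-prefixed metadata lines, keeping the header row and data rows.
strip_apt_header() {
    awk '!/^#/' "$1"
}

# Assumed location of the APT summary output; adjust to your run.
summary="apt_out/iter-plier.summary.txt"
if [ -f "$summary" ]; then
    strip_apt_header "$summary" > expression_matrix.tsv
    # Subtract 1 for the probeset_id header row.
    echo "Probesets: $(( $(wc -l < expression_matrix.tsv) - 1 ))"
fi
```

The cleaned file can then be read by any tool that expects a plain table with probeset IDs in the first column.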
3
As previously described (Huelga et al., 2012).
$ Bash example
# The specific tool, version, and command cannot be inferred from the generic description 'As previously described (Huelga et al., 2012)' without further context about the assay or step type.
4
HJAY_r2.pgf
HJAY_r2.pgf (inferred with models/gemini-2.5-flash), version r2
$ Bash example
# The step description "HJAY_r2.pgf" names a file, not a command. In Affymetrix
# Power Tools, .pgf is a probe group file: a library file that defines the
# probesets of an array and is passed to apt-probeset-summarize with -p,
# together with the matching .clf layout file (-c).
# The CLF file name, CEL list, and output directory below are placeholders;
# the analysis string follows the "Iter-plier" wording of step 2 and should be
# checked against the APT documentation for your version.
apt-probeset-summarize \
  -a iter-plier \
  -p HJAY_r2.pgf \
  -c your_array.clf \
  -o output_dir \
  --cel-files input_cel_files.txt
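One reading of this step is that HJAY_r2.pgf is an Affymetrix probe group file, a library file that apt-probeset-summarize needs on disk before it can run. Under that assumption, a small pre-flight check can catch a missing or empty library file early. The helper name `check_library_files` and the CLF file name `your_array.clf` are hypothetical and do not come from the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_library_files FILE...
# Hypothetical helper (not part of APT): returns non-zero and reports on
# stderr every path that is missing or empty.
check_library_files() {
    local status=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty library file: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call; 'your_array.clf' is a placeholder for the matching CLF file.
if check_library_files HJAY_r2.pgf your_array.clf 2>/dev/null; then
    echo "Library files found"
else
    echo "Library files not found; obtain them before running APT"
fi
```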
Tools Used
Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets. As previously described (Huelga et al., 2012). HJAY_r2.pgf