GSE74250 Processing Pipeline — Yeo Lab Publications

Publication

RNA-binding protein CPEB1 remodels host and viral RNA landscapes.

Nature structural & molecular biology (2016) — PMID 27775709

Dataset

Transcriptome analysis of diverse cell types infected with human cytomegalovirus

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize.

Microarray (version not specified; inferred with models/gemini-2.5-flash)

$ Bash example

# Install Affymetrix Power Tools (APT). Installation steps vary by OS and version.
# Refer to the official Thermo Fisher Scientific documentation for current instructions
# and download links; no specific archive URL or version is given here because none
# is stated in the source.
# In outline: download the APT archive for your platform, unpack it, and locate the
# directory that contains the apt-probeset-summarize executable.

# Ensure APT executables are in your PATH
# export PATH="/path/to/APT/bin:$PATH"

# Placeholder for input CEL files. Create a file named 'input_celfiles.txt' whose first
# line is the column header 'cel_files', followed by the paths to your .CEL files, one per line.
# Example:
# echo "cel_files" > input_celfiles.txt
# echo "/path/to/sample1.CEL" >> input_celfiles.txt
# echo "/path/to/sample2.CEL" >> input_celfiles.txt

# Placeholder for the array library files. apt-probeset-summarize takes either a CDF file
# (-d/--cdf-file) or a PGF plus CLF pair (-p/--pgf-file and -c/--clf-file).
# This dataset names HJAY_r2.pgf (step 4), so the PGF/CLF form is shown here.
# The CLF file name below is an assumption. Library files describe the probe layout of
# the array; they are not named after a genome assembly.

# Create an output directory for the summarized data
mkdir -p apt_summarize_output

# Run apt-probeset-summarize with Iter-PLIER quantification (the method stated in step 2).
# -a iter-plier: analysis to run. The bare string "iter-plier" is an assumption, since the
#   source states no background-correction or normalization steps; confirm it with:
#   apt-probeset-summarize --explain iter-plier
# --cel-files: text file listing CEL file paths, one per line, after a 'cel_files' header line.
# -p / -c: PGF and CLF library files corresponding to the array type used.
# -o apt_summarize_output: output directory where summarized data will be stored.
apt-probeset-summarize -a iter-plier \
--cel-files input_celfiles.txt \
-p /path/to/HJAY_r2.pgf \
-c /path/to/HJAY_r2.clf \
-o apt_summarize_output
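
The list file passed to --cel-files needs a header line reading cel_files before the CEL paths. The helper below is a minimal sketch of building that file from a directory instead of writing it by hand. The function name make_cel_list and its arguments are hypothetical, and it assumes all CEL files sit directly in one directory.

```shell
# Hypothetical helper: write an APT-style CEL list file for every .CEL file in a directory.
# Usage: make_cel_list <cel_directory> <output_list_file>
make_cel_list() {
    local cel_dir="$1" list_file="$2"
    # First line is the column header that apt-probeset-summarize expects.
    printf 'cel_files\n' > "$list_file"
    # Append one path per line, matching .cel/.CEL case-insensitively, in a stable order.
    find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | LC_ALL=C sort >> "$list_file"
}
```

For example, `make_cel_list /path/to/cel_dir input_celfiles.txt` would produce a file that can be passed directly to --cel-files.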

2

Iter-plier algorithm used to quantify probesets.

apt-probeset-summarize, Iter-PLIER quantification (tool inferred with models/gemini-2.5-flash); version APT 1.18.0 (inferred with models/gemini-2.5-flash, not stated in the source)

$ Bash example

# Requires Affymetrix Power Tools (APT) on your PATH; see the Thermo Fisher Scientific
# documentation for installation. Iter-PLIER is run through apt-probeset-summarize,
# not through a separate executable.

# List file of input CEL paths: the first line must be the header 'cel_files',
# followed by one CEL path per line (replace with your actual list file)
CEL_LIST="input_celfiles.txt"

# Library files for the array. This dataset names HJAY_r2.pgf (step 4);
# the matching CLF file name is an assumption.
PGF_FILE="/path/to/HJAY_r2.pgf"
CLF_FILE="/path/to/HJAY_r2.clf"

# Output directory for the probeset quantification
OUT_DIR="iter_plier_output"

# Quantify probesets using the Iter-PLIER algorithm.
# The bare analysis string "iter-plier" is an assumption (the source states no
# background-correction or normalization steps); confirm it with:
#   apt-probeset-summarize --explain iter-plier
apt-probeset-summarize -a iter-plier \
    -p "${PGF_FILE}" -c "${CLF_FILE}" \
    --cel-files "${CEL_LIST}" \
    -o "${OUT_DIR}"
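
After quantification, a quick sanity check on the result is usually worthwhile. apt-probeset-summarize writes a tab-delimited summary table into the output directory, with metadata lines that start with '#' ahead of a header row (probeset_id followed by one column per CEL file). The sketch below counts probesets and samples in such a table. The function name is hypothetical, and the exact output file name (assumed here to follow the pattern <analysis>.summary.txt) should be checked against your own output directory.

```shell
# Hypothetical helper: report how many probesets and samples an APT summary table contains.
# Usage: summarize_apt_output <path/to/summary.txt>
summarize_apt_output() {
    local summary_file="$1"
    # Drop metadata lines (they start with '#'), then treat the first remaining
    # line as the header row and every later line as one probeset.
    grep -v '^#' "$summary_file" | awk -F'\t' '
        NR == 1 { samples = NF - 1; next }
        { probesets++ }
        END { printf "probesets=%d samples=%d\n", probesets, samples }'
}
```

A sample count that differs from the number of CEL files in the list file suggests that some arrays were skipped or the wrong output file was read.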

3

As previously described (Huelga et al., 2012).

Tool not specified (inferred with models/gemini-2.5-flash); version not specified

$ Bash example

# No specific command or parameters can be inferred from 'As previously described (Huelga et al., 2012)'.
# This description refers to a methodology detailed in the cited publication, not a specific software tool or command.

4

HJAY_r2.pgf

Library file, not a tool: Probe Group File (PGF) for the HJAY array (original label "Custom Script" was inferred with models/gemini-2.5-flash); version N/A

$ Bash example

# HJAY_r2.pgf is not a script and cannot be executed. The .pgf extension is the
# Affymetrix Probe Group File format, a library file that maps probes to probesets.
# It is supplied to apt-probeset-summarize with -p (--pgf-file), together with a
# matching CLF layout file via -c (--clf-file).
# No BAM input or reference genome assembly is involved at this step.

# Path to the library file (replace with the actual location)
PGF_FILE="/path/to/HJAY_r2.pgf"

# Confirm the library file is present before passing it to apt-probeset-summarize -p
test -f "${PGF_FILE}" || echo "PGF file not found: ${PGF_FILE}" >&2
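
Because .pgf is the Affymetrix Probe Group File format, a plain-text file whose header lines take the form #%key=value, the header can be inspected before use, for instance to confirm that the chip_type matches the CEL files. The following is a minimal sketch; the function name pgf_header_field is hypothetical, and which keys are present depends on the particular library file.

```shell
# Hypothetical helper: print the value(s) of one header field from a PGF (or CLF) file.
# Usage: pgf_header_field <file.pgf> <key>     e.g. pgf_header_field HJAY_r2.pgf chip_type
pgf_header_field() {
    local pgf_file="$1" key="$2"
    # Header lines look like "#%key=value"; strip the prefix and print what remains.
    sed -n "s/^#%${key}=//p" "$pgf_file"
}
```

A file may list several chip_type lines, in which case each value is printed on its own line.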

Tools Used

Microarray