GSE74250 Processing Pipeline

Publication

RNA-binding protein CPEB1 remodels host and viral RNA landscapes.

Nature Structural & Molecular Biology (2016), PMID 27775709

Dataset

GSE74250

Transcriptome analysis of diverse cell types infected with human cytomegalovirus

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Data processed using apt-probeset-summarize from the Affymetrix Power Tools (APT) package.

    Microarray (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install Affymetrix Power Tools (APT); installation steps vary by OS and version.
    # Download the APT archive for your platform from the Thermo Fisher Scientific website
    # and unpack it, following the official documentation for the exact file name and steps.
    
    # Ensure APT executables are in your PATH
    # export PATH="/path/to/APT/bin:$PATH"
    
    # Create 'input_celfiles.txt', a text file listing the paths to your .CEL files,
    # one per line, with the column header 'cel_files' on the first line.
    # Example:
    # printf 'cel_files\n' > input_celfiles.txt
    # echo "/path/to/sample1.CEL" >> input_celfiles.txt
    # echo "/path/to/sample2.CEL" >> input_celfiles.txt
    
    # Library files: the source text for this dataset names the probe group file HJAY_r2.pgf.
    # A PGF is used together with a matching CLF (cell layout file). The CLF name is not
    # given in the source, so the path below is a placeholder. Obtain both files from the
    # array provider. (Arrays distributed with a CDF use --cdf-file instead of -p and -c.)
    
    # Create an output directory for the summarized data
    mkdir -p apt_summarize_output
    
    # Run apt-probeset-summarize.
    # -a: analysis to run. The source reports Iter-PLIER; the exact analysis string is an
    #     assumption, so check it with 'apt-probeset-summarize --explain iter-plier'.
    # --cel-files: text file listing the CEL file paths (first line 'cel_files').
    # -p / -c: PGF and CLF library files for the array type.
    # -o: output directory where the summarized data will be stored.
    apt-probeset-summarize -a iter-plier \
      --cel-files input_celfiles.txt \
      -p HJAY_r2.pgf \
      -c path/to/matching_array.clf \
      -o apt_summarize_output
  2. Iter-PLIER algorithm used to quantify probesets.

    Affymetrix Power Tools (APT), Iter-PLIER method; version APT 1.18.0 (tool and version inferred with models/gemini-2.5-flash)
    $ Bash example
    # Iter-PLIER is a quantification method selected with -a in apt-probeset-summarize,
    # not a standalone executable. APT must be installed and on your PATH.
    
    # Text file listing the CEL paths, one per line, with 'cel_files' as the first line
    CEL_LIST="input_celfiles.txt"
    
    # Library files for the array type (replace with actual paths).
    # The source names HJAY_r2.pgf; the matching CLF name is not given, so it is a placeholder.
    PGF_FILE="HJAY_r2.pgf"
    CLF_FILE="path/to/your/array_type.clf"
    
    # Output directory for the probeset quantification
    OUT_DIR="iterplier_output"
    
    # Print APT's own description of the method and its parameters
    apt-probeset-summarize --explain iter-plier
    
    # Quantify probesets with Iter-PLIER.
    # Assumption: the exact analysis string used in the study is not reported; adjust as needed.
    apt-probeset-summarize -a iter-plier \
      --cel-files "${CEL_LIST}" \
      -p "${PGF_FILE}" -c "${CLF_FILE}" \
      -o "${OUT_DIR}"
  3. As previously described (Huelga et al., 2012).

    Tool and version not specified (inferred with models/gemini-2.5-flash)
    $ Bash example
    # No command or parameters can be inferred from "As previously described (Huelga et al., 2012)".
    # The description points to the methodology in the cited publication, not to a specific tool or command.
  4. HJAY_r2.pgf

    Array library file (probe group file), not a tool; no version applies (inferred with models/gemini-2.5-flash)
    $ Bash example
    # HJAY_r2.pgf is a probe group file (PGF): an APT library file that defines the
    # probesets of the array. It is an input to apt-probeset-summarize, not a script.
    PGF_FILE="HJAY_r2.pgf"
    
    # Fail early if the library file is missing from the working directory
    [ -f "${PGF_FILE}" ] || { echo "Missing ${PGF_FILE}" >&2; exit 1; }
    
    # The file is then passed as: apt-probeset-summarize -p "${PGF_FILE}" -c <matching CLF> ...
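
The CEL list consumed by --cel-files in step 1 can be generated rather than typed by hand. The sketch below assumes the list format APT expects (a first line reading cel_files, then one path per line). The helper name make_cel_list and any directory names are hypothetical, introduced here only for illustration.

```shell
#!/usr/bin/env bash
# make_cel_list DIR OUT
# Hypothetical helper: write an APT-style CEL list for every .CEL file directly in DIR.
make_cel_list() {
    local cel_dir="$1" out_file="$2"

    # First line is the column header that --cel-files looks for
    printf 'cel_files\n' > "${out_file}"

    # Append the CEL paths in sorted order so the sample order is reproducible.
    # Only the upper-case .CEL extension is matched; adjust the pattern if needed.
    find "${cel_dir}" -maxdepth 1 -type f -name '*.CEL' | sort >> "${out_file}"

    # Return non-zero if no CEL files were found (the list holds only the header)
    tail -n +2 "${out_file}" | grep -q .
}
```

A call such as make_cel_list /path/to/cel_dir input_celfiles.txt would then produce the file passed to --cel-files. Sorting the paths is a design choice that keeps the column order of the summary output stable between runs.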

Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets. As previously described (Huelga et al., 2012).
HJAY_r2.pgf