GSE74250 Processing Pipeline
4 steps
Publication
RNA-binding protein CPEB1 remodels host and viral RNA landscapes. Nature Structural & Molecular Biology (2016). PMID 27775709
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize.
Microarray (inferred with models/gemini-2.5-flash). Bash example:
# Install Affymetrix Power Tools (APT). Installation steps vary by OS and version;
# refer to the official Thermo Fisher Scientific documentation for current instructions.
# Ensure the APT executables are in your PATH:
# export PATH="/path/to/APT/bin:$PATH"

# List the input CEL files in 'input_celfiles.txt', one path per line.
# APT expects the first line of this file to be the column header 'cel_files'.
# Example:
# printf 'cel_files\n' > input_celfiles.txt
# echo "/path/to/sample1.CEL" >> input_celfiles.txt
# echo "/path/to/sample2.CEL" >> input_celfiles.txt

# Library file: obtain the file that matches your array type from its provider.
# 'array_type.cdf' below is a placeholder. CDF files are passed with --cdf-file;
# arrays described by a PGF (Probe Group File), such as the HJAY_r2.pgf named in step 4,
# use --pgf-file together with the matching --clf-file instead.

# Create an output directory for the summarized data
mkdir -p apt_summarize_output

# Run apt-probeset-summarize. RMA (Robust Multi-array Average) is an inferred choice;
# the source text for this step names only the tool.
# -a rma: background correction, normalization, and summarization with RMA.
# --cel-files: text file listing the CEL file paths.
# --cdf-file: CDF library file for the array type used.
# -o apt_summarize_output: output directory for the summarized data.
apt-probeset-summarize -a rma \
  --cel-files input_celfiles.txt \
  --cdf-file array_type.cdf \
  -o apt_summarize_output
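The CEL list passed to --cel-files is easy to get wrong by hand, so a small helper can build it from a directory. The following is a minimal sketch, assuming all CEL files sit directly in one directory; `make_cel_list` is a hypothetical helper name and is not part of APT.

```bash
#!/usr/bin/env bash
# make_cel_list (hypothetical helper, not part of APT):
# writes the header line 'cel_files' followed by one CEL path per line.
make_cel_list() {
  local cel_dir="$1" out_file="$2"
  printf 'cel_files\n' > "$out_file"
  # -iname matches both .CEL and .cel; sort keeps the order reproducible.
  find "$cel_dir" -maxdepth 1 -type f -iname '*.cel' | sort >> "$out_file"
}

# Usage (paths are placeholders):
# make_cel_list /path/to/cel_dir input_celfiles.txt
```

The resulting file can be handed to apt-probeset-summarize through --cel-files. Sorting the paths keeps the sample column order stable between runs.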
2
Iter-plier algorithm used to quantify probesets.
Iter-PLIER via apt-probeset-summarize (tool and version APT 1.18.0 inferred with models/gemini-2.5-flash). Bash example:
# Iter-PLIER is run as an analysis method of apt-probeset-summarize, the tool named in step 1.
# Assumptions to verify against 'apt-probeset-summarize --help' for your APT version:
# the analysis name 'iter-plier' and the CLF file path, which is a placeholder.

# Probe group file named in the source text
PGF_FILE="HJAY_r2.pgf"
# Matching chip layout file (placeholder path)
CLF_FILE="path/to/your/array_type.clf"
# Text file listing the CEL files: header line 'cel_files', then one path per line
CEL_LIST="input_celfiles.txt"
# Output directory for the probeset quantification
OUT_DIR="iterplier_output"
mkdir -p "${OUT_DIR}"

# Quantify probesets with the Iter-PLIER algorithm
apt-probeset-summarize -a iter-plier \
  --pgf-file "${PGF_FILE}" \
  --clf-file "${CLF_FILE}" \
  --cel-files "${CEL_LIST}" \
  -o "${OUT_DIR}"
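After quantification, a quick sanity check is to count the probesets and samples in the summary table. The sketch below assumes the usual APT summary layout: metadata lines starting with '#', then a tab-delimited table whose first column is the probeset ID. `summary_dims` is a hypothetical helper name, and the output file name in the usage line is an assumption.

```bash
#!/usr/bin/env bash
# summary_dims (hypothetical helper): prints "<probesets> <samples>"
# for a tab-delimited summary table with '#'-prefixed metadata lines.
summary_dims() {
  local summary_file="$1"
  # Skip metadata lines; the first remaining line is the column header.
  grep -v '^#' "$summary_file" | awk -F'\t' '
    NR == 1 { samples = NF - 1; next }   # first column holds the probeset ID
    NF > 0  { probesets++ }              # count non-empty data rows
    END     { print probesets + 0, samples + 0 }'
}

# Usage (file name is an assumption about the APT output naming):
# summary_dims iterplier_output/iter-plier.summary.txt
```

If the sample count does not match the number of CEL files listed, the run most likely skipped or failed on some inputs.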
3
As previously described (Huelga et al., 2012).
Tool not specified (inferred with models/gemini-2.5-flash); version not specified. Bash example:
# No specific command or parameters can be inferred from 'As previously described (Huelga et al., 2012)'.
# This description refers to a methodology detailed in the cited publication, not a specific software tool or command.
4
HJAY_r2.pgf
Library file: probe group file (PGF), not a script (classification inferred from the .pgf extension); version N/A. Bash example:
# HJAY_r2.pgf carries the .pgf extension of an Affymetrix probe group file, which defines
# how probes are grouped into probesets. It is an input read by apt-probeset-summarize,
# not an executable, so it takes no --input_file, --output_file, or genome assembly arguments.
# The file is supplied with --pgf-file, alongside the matching chip layout file via --clf-file.

PGF_FILE="HJAY_r2.pgf"

# Confirm the file is readable before starting a run
[ -r "${PGF_FILE}" ] || echo "PGF not found: ${PGF_FILE}" >&2

# Show the first lines of the file; PGF files begin with '#%' header lines describing the array
head -n 20 "${PGF_FILE}"
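To check that a PGF matches the arrays being processed, it can help to read individual fields from its header. The sketch below assumes the header consists of '#%key=value' lines at the top of the file; `pgf_header_value` is a hypothetical helper name, and the key names in the usage lines (chip_type, lib_set_version) are assumptions that may differ in a custom library file such as HJAY_r2.pgf.

```bash
#!/usr/bin/env bash
# pgf_header_value (hypothetical helper): prints the value of one
# '#%key=value' header line from the top of a PGF-style file.
pgf_header_value() {
  local pgf_file="$1" key="$2"
  awk -v key="$key" '
    !/^#%/ { exit }                    # stop at the first non-header line
    {
      line = substr($0, 3)             # drop the leading "#%"
      eq = index(line, "=")
      if (eq > 0 && substr(line, 1, eq - 1) == key) {
        print substr(line, eq + 1)
        exit
      }
    }' "$pgf_file"
}

# Usage (key names are assumptions):
# pgf_header_value HJAY_r2.pgf chip_type
# pgf_header_value HJAY_r2.pgf lib_set_version
```

Stopping at the first non-header line avoids scanning the full probe listing, which can be large. An empty result means the key was not found in the header.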
Tools Used
Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets. As previously described (Huelga et al., 2012). HJAY_r2.pgf