GSE86038 Processing Pipeline
Publication
Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System. Neuron (2016), PMID 27773581
Dataset
GSE86038: HNRNPA2B1 regulates alternative RNA processing in the nervous system and accumulates in granules in ALS IPSC-derived motor neurons [hnRNPA2B1_Arrays_…
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Data were processed with the Affymetrix Power Tools (APT) program apt-probeset-summarize.
Tool: apt-probeset-summarize (Microarray; version not specified). Bash example:
# Install Affymetrix Power Tools (APT). APT is distributed as a binary package from
# the Thermo Fisher Scientific website; download steps vary by OS and APT version.
# export PATH="$PATH:/path/to/apt/bin"

# Placeholder inputs: replace with real paths.
# This array uses a PGF library file (hjay.pgf, listed in step 3) rather than a CDF,
# so the matching CLF file is also required; "hjay.clf" is an assumed name.
CEL_DIR="cel_files"
PGF_FILE="hjay.pgf"
CLF_FILE="hjay.clf"
OUTPUT_DIR="apt_summarize_output"
# APT analysis string. "rma-sketch" is a standard example; the submitters report
# Iter-PLIER (step 2), so check `apt-probeset-summarize --help` for the matching string.
ANALYSIS="rma-sketch"

mkdir -p "${OUTPUT_DIR}"

# --cel-files takes a text file whose first line is "cel_files", then one CEL path per line.
CEL_LIST="${OUTPUT_DIR}/cel_list.txt"
{ echo "cel_files"; ls "${CEL_DIR}"/*.CEL; } > "${CEL_LIST}"

apt-probeset-summarize \
  -a "${ANALYSIS}" \
  -p "${PGF_FILE}" \
  -c "${CLF_FILE}" \
  -o "${OUTPUT_DIR}" \
  --cel-files "${CEL_LIST}"
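After apt-probeset-summarize finishes, a quick sanity check is to count the probesets and samples in the summary table it writes (APT typically names it after the analysis string, e.g. rma-sketch.summary.txt; treat that file name as an assumption). The sketch below assumes the usual APT layout: "#%" header lines, then one tab-delimited column-header row starting with probeset_id, then one row per probeset. The function name summary_dims is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report probeset and sample counts in an APT summary file.
# Assumes "#%" header lines, one tab-delimited column-header row, then data rows.
summary_dims() {
  awk -F '\t' '
    /^#/ { next }                                             # skip "#%" header lines
    !seen_header { nsamples = NF - 1; seen_header = 1; next } # probeset_id + one column per sample
    NF > 0 { nprobesets++ }                                   # one row per probeset
    END { printf "probesets=%d samples=%d\n", nprobesets, nsamples }
  ' "$1"
}

# Usage (file name is an assumption):
# summary_dims apt_summarize_output/rma-sketch.summary.txt
```

A mismatch between the reported sample count and the number of CEL files in the input list usually points to a file that APT skipped or failed to read.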
2
The Iter-PLIER algorithm was used to quantify probesets.
Bash example:
# Install R and Bioconductor if not already present, e.g. with conda:
# conda create -n r_env -c conda-forge -c bioconda r-base bioconductor-affy bioconductor-plier
# conda activate r_env

# Write an R script that runs PLIER with the Bioconductor 'plier' package.
# Note: justPlier() implements PLIER, not Affymetrix's iterative Iter-PLIER, and
# ReadAffy() needs a CDF-based annotation, which a PGF-only array such as HJAY may lack.
cat << 'EOF' > quantify_probesets.R
library(affy)
library(plier)

# Assumes CEL files are in a directory named 'cel_files'
cel_dir <- "cel_files"
output_file <- "probeset_quantification_plier.tsv"

cel_files <- list.files(path = cel_dir, pattern = "\\.CEL$",
                        full.names = TRUE, ignore.case = TRUE)
if (length(cel_files) == 0) {
  stop("No CEL files found in the specified directory: ", cel_dir)
}

# Read CEL files into an AffyBatch; affy looks up the CDF package for the array type.
raw_data <- ReadAffy(filenames = cel_files)

# Run PLIER with normalization enabled; justPlier() returns an ExpressionSet.
es_plier <- justPlier(raw_data, normalize = TRUE)
expression_matrix <- exprs(es_plier)

# col.names = NA leaves a blank first header cell so columns line up with row names.
write.table(expression_matrix, file = output_file, sep = "\t",
            quote = FALSE, row.names = TRUE, col.names = NA)
message("Probeset quantification completed. Results saved to: ", output_file)
EOF

# Execute the R script (expects CEL files in ./cel_files, e.g. after mkdir -p cel_files)
Rscript quantify_probesets.R
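Iter-PLIER is generally described as running PLIER repeatedly and, at each pass, discarding probes that correlate poorly with the overall probeset signal. The sketch below illustrates only that probe-selection loop, in awk: it replaces the PLIER fit with a plain per-sample mean of the kept probes, so it is not Affymetrix's implementation. The function name select_probes, the input layout (one probe per row, one tab-separated log-intensity per sample, no header) and the threshold arguments are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of iterative probe selection (NOT the APT Iter-PLIER code).
# Usage: select_probes FILE MIN_PROBES MIN_CORR
#   FILE: tab-separated, one probe per row, one log-intensity per sample, no header.
# Prints the 1-based row numbers of the probes that are kept.
select_probes() {
  awk -F '\t' -v min_probes="$2" -v min_corr="$3" '
    # Pearson correlation between probe i and the current signal s[].
    function corr(i,    j, mx, ms, dx, ds, sxy, sxx, syy) {
      mx = 0; ms = 0
      for (j = 1; j <= nsamp; j++) { mx += x[i, j]; ms += s[j] }
      mx /= nsamp; ms /= nsamp
      sxy = 0; sxx = 0; syy = 0
      for (j = 1; j <= nsamp; j++) {
        dx = x[i, j] - mx; ds = s[j] - ms
        sxy += dx * ds; sxx += dx * dx; syy += ds * ds
      }
      return (sxx > 0 && syy > 0) ? sxy / sqrt(sxx * syy) : 0
    }
    BEGIN { min_probes += 0; min_corr += 0 }   # force numeric comparisons
    { for (j = 1; j <= NF; j++) x[NR, j] = $j; nsamp = NF; keep[NR] = 1 }
    END {
      nprobe = NR; nkeep = NR
      while (nkeep > min_probes) {
        # Stand-in for the PLIER fit: per-sample mean of the kept probes.
        for (j = 1; j <= nsamp; j++) {
          s[j] = 0
          for (i = 1; i <= nprobe; i++) if (keep[i]) s[j] += x[i, j]
          s[j] /= nkeep
        }
        # Find the kept probe that tracks the signal worst.
        worst = 0; worst_r = 2
        for (i = 1; i <= nprobe; i++)
          if (keep[i] && (r = corr(i)) < worst_r) { worst_r = r; worst = i }
        if (worst_r >= min_corr) break   # every remaining probe is good enough
        keep[worst] = 0; nkeep--         # drop it and re-estimate the signal
      }
      for (i = 1; i <= nprobe; i++) if (keep[i]) print i
    }
  ' "$1"
}
```

With three probes that rise together across samples and a fourth that falls, select_probes FILE 2 0.5 drops the fourth probe on the first pass and stops once all remaining probes correlate above 0.5. Re-estimating the signal after each removal is the design point: an outlier probe distorts the signal it is compared against, so dropping one probe at a time avoids discarding good probes early.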
3
http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf
Library file: hjay.pgf, the probe group file (PGF) for the array, read by apt-probeset-summarize through -p/--pgf-file. Bash example:
# Download the PGF referenced by the submitters.
# The URL may no longer be reachable; a local copy of hjay.pgf is needed either way.
PGF_URL="http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf"
mkdir -p library_files
wget -O library_files/hjay.pgf "${PGF_URL}"

# The PGF is used together with the matching CLF file (-c/--clf-file);
# "hjay.clf" is a placeholder name, and cel_list.txt is a "cel_files"-headed list.
# apt-probeset-summarize -a rma-sketch -p library_files/hjay.pgf -c library_files/hjay.clf \
#   -o apt_summarize_output --cel-files cel_list.txt
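Before passing the PGF to apt-probeset-summarize, it can help to confirm that the download is complete by counting its entries. This sketch assumes the standard tab-indented PGF layout: "#"-prefixed header lines, probeset rows with no leading tab, atom rows with one leading tab, and probe rows with two. The function name pgf_counts is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count probesets, atoms and probes in a PGF file.
# Assumes the hierarchical PGF layout where indentation depth (leading tabs) gives the level.
pgf_counts() {
  awk '
    /^#/    { next }         # skip "#%" header and "#" column-header lines
    /^\t\t/ { probes++; next }  # probe rows: two leading tabs
    /^\t/   { atoms++; next }   # atom rows: one leading tab
    NF > 0  { probesets++ }     # probeset rows: no leading tab
    END { printf "probesets=%d atoms=%d probes=%d\n", probesets, atoms, probes }
  ' "$1"
}

# Usage:
# pgf_counts library_files/hjay.pgf
```

A truncated download typically shows up as a probe count far below what the array documentation lists, or as a file that ends partway through a probeset.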
Raw Source Text
Data processed using Affymetrix package (Affy Power Tools) apt-probeset-summarize. Iter-plier algorithm used to quantify probesets. http://exon.ucsc.edu/documentation/mjay_library/hjay.pgf