GSE16681 Processing Pipeline — Yeo Lab Publications

Publication

A distinct microRNA signature for definitive endoderm derived from human embryonic stem cells.

Stem Cells and Development (2010), PMID 19807270

Dataset

mRNA expression data from differentiation of human ESCs into definitive endoderm, Cyt49 on matrigel

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1. The data were normalised using quantile normalisation with IlluminaGUI in R.

R (version not specified)

$ Bash example

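# Install R and limma if not already present
# conda install -c conda-forge r-base
# conda install -c bioconda bioconductor-limma

# Note: the source reports IlluminaGUI for this step. limma's
# normalizeBetweenArrays() is used here as a stand-in, so results may
# differ from the published analysis.

# Write an R script for quantile normalization.
# The script reads the data matrix from 'input_data.tsv'
# and writes the result to 'normalized_data.tsv'.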
cat << 'EOF' > normalize_data.R
# Load necessary library
library(limma)

# --- Configuration ---
input_file <- "input_data.tsv" # Placeholder for input data file
output_file <- "normalized_data.tsv" # Placeholder for output data file

# --- Load Data ---
# Assuming input_data.tsv is a tab-separated file with header
# and the first column is gene/feature IDs, and subsequent columns are samples
# Adjust read.delim parameters based on actual file format
data_matrix <- as.matrix(read.delim(input_file, row.names = 1, sep = "\t", header = TRUE))

# --- Perform Quantile Normalization ---
# The 'method="quantile"' argument specifies quantile normalization
normalized_matrix <- normalizeBetweenArrays(data_matrix, method = "quantile")

# --- Save Normalized Data ---
# Write the normalized matrix to a new tab-separated file
write.table(normalized_matrix, file = output_file, sep = "\t", quote = FALSE, col.names = NA)

message(paste("Quantile normalization complete. Normalized data saved to:", output_file))
EOF

# Create a dummy input file for demonstration
# (printf handles \t consistently across shells, unlike echo -e)
printf 'Gene\tSample1\tSample2\tSample3\n' > input_data.tsv
printf 'GeneA\t100\t120\t90\n' >> input_data.tsv
printf 'GeneB\t50\t60\t45\n' >> input_data.tsv
printf 'GeneC\t200\t210\t180\n' >> input_data.tsv
printf 'GeneD\t75\t80\t70\n' >> input_data.tsv

# Execute the R script
Rscript normalize_data.R
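
A quick sanity check on the output may be useful here. Quantile normalization forces every sample onto the same distribution, so the sorted values of each sample column should be identical. The sketch below tests that property. The function name `check_quantile_normalized` is hypothetical and not part of the published pipeline. It assumes a tab-separated file with a header row, feature IDs in the first column, and no tied values within a sample (tie handling can make columns differ slightly).

```shell
# Hypothetical helper (not part of the published pipeline): checks that all
# sample columns of a tab-separated matrix contain the same sorted values.
# Assumes a header row, feature IDs in column 1, and no tied values.
check_quantile_normalized() {
  local file ncols ref cur col
  file="$1"
  # Count columns from the header row (column 1 holds the feature IDs)
  ncols=$(head -n 1 "$file" | awk -F'\t' '{ print NF }')
  # Sorted values of the first sample column serve as the reference
  ref=$(tail -n +2 "$file" | cut -f 2 | sort -n)
  col=3
  while [ "$col" -le "$ncols" ]; do
    cur=$(tail -n +2 "$file" | cut -f "$col" | sort -n)
    if [ "$cur" != "$ref" ]; then
      echo "column $col does not match column 2" >&2
      return 1
    fi
    col=$((col + 1))
  done
  echo "all sample columns share one distribution"
}

# Usage, assuming normalize_data.R has already written normalized_data.tsv
if [ -f normalized_data.tsv ]; then
  check_quantile_normalized normalized_data.tsv
fi
```

In the dummy input, all three samples rank the genes in the same order, so each gene should end up with the same value in every column: the mean of that rank across samples.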

Tools Used

R