GSE72408 Processing Pipeline — Yeo Lab Publications

Publication

The long noncoding RNA Malat1 regulates CD8+ T cell differentiation by mediating epigenetic repression.

The Journal of experimental medicine (2022) — PMID 35593887

Dataset

The transcription factors ZEB2 and T-bet cooperate to program cytotoxic T cell terminal differentiation

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

The R beadarray package was used: readIdatFiles read the raw values from the IDAT files, and normaliseIllumina (method neqc, which returns log2-transformed values) produced the normalised values.

R (version not specified)

$ Bash example

# Install R and Bioconductor (if not already installed)
# sudo apt update
# sudo apt install -y r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org"); BiocManager::install("beadarray", update = FALSE, ask = FALSE)'

# Create a directory for input IDAT files and a dummy sample sheet for demonstration
mkdir -p input_idat_files
# NOTE: Replace with actual IDAT files and SampleSheet.csv
# For a real run, you would place your .idat files in input_idat_files/
# and your SampleSheet.csv in the working directory or specified path.
# Example dummy SampleSheet.csv (adjust columns as per your actual data)
cat <<EOF > input_sample_sheet.csv
[Header]
Investigator Name,John Doe
Project Name,MyProject
Experiment Name,IlluminaArrayExperiment
Date,2023-10-27
[Data]
Sample_ID,Array_ID,Sentrix_ID,Sentrix_Position,Sample_Group
Sample1,1,200000000001,R01C01,Control
Sample2,2,200000000002,R01C02,Treatment
EOF

# Create the R script
cat << 'EOF' > process_illumina.R
# Load the beadarray package
library(beadarray)

# Define input/output paths using environment variables for flexibility
idat_files_dir <- Sys.getenv("IDAT_FILES_DIR", "input_idat_files")
sample_sheet_path <- Sys.getenv("SAMPLE_SHEET_PATH", "input_sample_sheet.csv")

output_raw_file <- Sys.getenv("OUTPUT_RAW_FILE", "raw_expression_values.csv")
output_normalized_file <- Sys.getenv("OUTPUT_NORMALIZED_FILE", "normalized_expression_values.csv")

# Check if input directory and sample sheet exist
if (!dir.exists(idat_files_dir)) {
  stop(paste("Input IDAT files directory not found:", idat_files_dir))
}
if (!file.exists(sample_sheet_path)) {
  stop(paste("Sample sheet not found:", sample_sheet_path))
}

message(paste("Reading IDAT files from:", idat_files_dir))
message(paste("Using sample sheet:", sample_sheet_path))

# Read raw data from IDAT files
# readIdatFiles() takes a character vector of IDAT file paths and returns a
# summary-level ExpressionSetIllumina object; it has no sample sheet argument,
# so the sample sheet checked above serves only as a metadata record.
idat_files <- list.files(idat_files_dir, pattern = "\\.idat$", full.names = TRUE, ignore.case = TRUE)
if (length(idat_files) == 0) {
  stop(paste("No .idat files found in:", idat_files_dir))
}
raw_data_obj <- readIdatFiles(idatFiles = idat_files)

# Extract and save raw expression values
# Illumina expression BeadChips are single-channel, so the raw values are the
# untransformed summarised probe intensities stored in exprs().
message("Extracting and saving raw expression values...")
raw_expression_matrix <- exprs(raw_data_obj)
write.csv(raw_expression_matrix, file = output_raw_file, row.names = TRUE)
message(paste("Raw expression values saved to:", output_raw_file))

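# Normalise data using the neqc method
# neqc (from limma) background-corrects using negative control probes, then
# quantile-normalises and log2-transforms. It relies on a probe Status
# annotation (by default fData(raw_data_obj)$Status) to tell negative control
# probes from regular ones; check that this column is populated before use.
message("Normalizing data using neqc method (log2 transformed)...")
normalized_data_obj <- normaliseIllumina(raw_data_obj, method = "neqc")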

# Extract normalized expression matrix
normalized_expression_matrix <- exprs(normalized_data_obj)

# Save normalized expression values
write.csv(normalized_expression_matrix, file = output_normalized_file, row.names = TRUE)
message(paste("Normalized (neqc, log2) expression values saved to:", output_normalized_file))
EOF

# Set environment variables for input/output paths (optional, defaults are used if not set)
# export IDAT_FILES_DIR="path/to/your/idat_files"
# export SAMPLE_SHEET_PATH="path/to/your/sample_sheet.csv"
# export OUTPUT_RAW_FILE="my_raw_expression.csv"
# export OUTPUT_NORMALIZED_FILE="my_normalized_expression.csv"

# Execute the R script
Rscript process_illumina.R
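
After the script finishes, a quick sanity check on the two CSV files can catch an empty or mismatched result early. The sketch below is not part of the original pipeline: csv_shape and check_outputs are hypothetical helper names, and the check assumes simple CSV files as written by write.csv (one header line, no commas inside quoted fields). It requires the same number of columns in both files but only requires that the normalized file has no more rows than the raw file, since neqc may drop control probes.

```shell
# Hypothetical post-run check (an assumption, not from the original pipeline).

csv_shape() {
  # Print "<data rows> <columns>" for a CSV with a single header line.
  awk -F',' 'NR == 1 { cols = NF }
             END { rows = (NR > 0) ? NR - 1 : 0; print rows, cols + 0 }' "$1"
}

check_outputs() {
  raw="$1"
  norm="$2"
  # Both files must exist and be non-empty.
  for f in "$raw" "$norm"; do
    if [ ! -s "$f" ]; then
      echo "Missing or empty file: $f" >&2
      return 1
    fi
  done
  raw_rows="$(csv_shape "$raw" | cut -d' ' -f1)"
  raw_cols="$(csv_shape "$raw" | cut -d' ' -f2)"
  norm_rows="$(csv_shape "$norm" | cut -d' ' -f1)"
  norm_cols="$(csv_shape "$norm" | cut -d' ' -f2)"
  # Same samples (columns) are expected in both matrices.
  if [ "$raw_cols" -ne "$norm_cols" ]; then
    echo "Column mismatch: raw=$raw_cols normalized=$norm_cols" >&2
    return 1
  fi
  # Normalization may remove control probes but should never add rows.
  if [ "$norm_rows" -gt "$raw_rows" ]; then
    echo "Normalized file has more rows than raw: $norm_rows > $raw_rows" >&2
    return 1
  fi
  echo "OK: raw ${raw_rows}x${raw_cols}, normalized ${norm_rows}x${norm_cols}"
}

# Run the check only when the raw output file is present.
if [ -f "${OUTPUT_RAW_FILE:-raw_expression_values.csv}" ]; then
  check_outputs "${OUTPUT_RAW_FILE:-raw_expression_values.csv}" \
                "${OUTPUT_NORMALIZED_FILE:-normalized_expression_values.csv}"
fi
```

The file names default to the ones used by the R script above and follow the same environment variables, so the check can be pasted directly after the Rscript call.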


Tools Used

R