GSE72408 Processing Pipeline
1 step
Publication
The long noncoding RNA Malat1 regulates CD8+ T cell differentiation by mediating epigenetic repression. The Journal of Experimental Medicine (2022), PMID 35593887
Dataset
GSE72408: The transcription factors ZEB2 and T-bet cooperate to program cytotoxic T cell terminal differentiation
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
The R package beadarray was used: readIdatFiles read the raw values from the IDAT files, and normaliseIllumina produced the normalised values (neqc method, log2-transformed).
$ Bash example
# Install R and Bioconductor (if not already installed)
# sudo apt update
# sudo apt install -y r-base
# R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org"); BiocManager::install("beadarray", update = FALSE, ask = FALSE)'

# Create a directory for the input IDAT files
mkdir -p input_idat_files
# NOTE: Place the actual .idat files in input_idat_files/ before running.
# readIdatFiles() takes a vector of IDAT file paths, so no sample sheet is passed to it here.

# Create the R script
cat << 'EOF' > process_illumina.R
# Load the beadarray package
library(beadarray)

# Define input/output paths using environment variables for flexibility
idat_files_dir <- Sys.getenv("IDAT_FILES_DIR", "input_idat_files")
output_raw_file <- Sys.getenv("OUTPUT_RAW_FILE", "raw_expression_values.csv")
output_normalized_file <- Sys.getenv("OUTPUT_NORMALIZED_FILE", "normalized_expression_values.csv")

# Check that the input directory exists and contains IDAT files
if (!dir.exists(idat_files_dir)) {
  stop(paste("Input IDAT files directory not found:", idat_files_dir))
}
idat_files <- list.files(idat_files_dir, pattern = "\\.idat$",
                         full.names = TRUE, ignore.case = TRUE)
if (length(idat_files) == 0) {
  stop(paste("No .idat files found in:", idat_files_dir))
}
message(paste("Reading", length(idat_files), "IDAT files from:", idat_files_dir))

# Read raw data from the IDAT files
# readIdatFiles() returns an ExpressionSetIllumina object (summary-level data)
raw_data_obj <- readIdatFiles(idatFiles = idat_files)

# Extract and save the raw (unnormalised) expression values
message("Extracting and saving raw expression values...")
raw_expression_matrix <- exprs(raw_data_obj)
write.csv(raw_expression_matrix, file = output_raw_file, row.names = TRUE)
message(paste("Raw expression values saved to:", output_raw_file))

# Normalise data using the neqc method
# neqc performs background correction, quantile normalisation, and log2 transformation.
# It relies on negative control probes, identified by default through fData(raw_data_obj)$Status.
# If that column is absent (it may be for data read directly from IDAT files),
# supply the probe status through the status argument of normaliseIllumina().
message("Normalizing data using neqc method (log2 transformed)...")
normalized_data_obj <- normaliseIllumina(raw_data_obj, method = "neqc")

# Extract the normalized expression matrix
normalized_expression_matrix <- exprs(normalized_data_obj)

# Save normalized expression values
write.csv(normalized_expression_matrix, file = output_normalized_file, row.names = TRUE)
message(paste("Normalized (neqc, log2) expression values saved to:", output_normalized_file))
EOF

# Set environment variables for input/output paths (optional, defaults are used if not set)
# export IDAT_FILES_DIR="path/to/your/idat_files"
# export OUTPUT_RAW_FILE="my_raw_expression.csv"
# export OUTPUT_NORMALIZED_FILE="my_normalized_expression.csv"

# Execute the R script
Rscript process_illumina.R
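Since the snippet above may be inferred, a quick sanity check of the two CSV outputs can help before they are used downstream. The following is a minimal sketch, not part of the original pipeline: the helper names csv_shape and check_outputs are hypothetical, and it assumes the files were written by write.csv with row.names = TRUE (one header line, first column holding probe IDs) and that no field contains an embedded comma.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting the CSV outputs of the R script.

# csv_shape FILE: print "<data rows> <sample columns>" for a CSV with
# one header line and row names in the first column.
csv_shape() {
    awk -F',' '
        NR == 1 { cols = NF - 1 }   # header: drop the row-name column
        END {
            if (NR == 0) print 0, 0
            else print NR - 1, cols # data rows exclude the header line
        }' "$1"
}

# check_outputs RAW NORMALIZED: report both shapes and succeed only if
# the number of sample columns matches.
check_outputs() {
    raw_shape=$(csv_shape "$1")
    norm_shape=$(csv_shape "$2")
    raw_cols=${raw_shape#* }
    norm_cols=${norm_shape#* }
    echo "raw: $raw_shape  normalized: $norm_shape"
    [ "$raw_cols" = "$norm_cols" ]
}
```

For example, `check_outputs raw_expression_values.csv normalized_expression_values.csv` prints the two shapes and returns a non-zero status if the sample counts disagree. Only the column counts are compared, because the row counts may legitimately differ if control probes are dropped during neqc normalisation.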
Tools Used
Raw Source Text
Utilized R::beadarray package with the readIdatFiles and normaliseIllumina functions to extract raw and normalised (neqc, log2 transformed) values.