GSE72408 Processing Pipeline


Publication

The long noncoding RNA Malat1 regulates CD8+ T cell differentiation by mediating epigenetic repression.

The Journal of experimental medicine (2022) — PMID 35593887

Dataset

GSE72408

The transcription factors ZEB2 and T-bet cooperate to program cytotoxic T cell terminal differentiation

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Raw and normalised (neqc, log2-transformed) expression values were extracted with the R beadarray package, using its readIdatFiles and normaliseIllumina functions.

    R (version not specified)
    $ Bash example
    # Install R and Bioconductor (if not already installed)
    # sudo apt update
    # sudo apt install -y r-base
    # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org"); BiocManager::install("beadarray", update = FALSE, ask = FALSE)'
    
    # Create a directory for the input IDAT files
    mkdir -p input_idat_files
    # NOTE: Place the actual .idat files for this dataset in input_idat_files/
    # before running. readIdatFiles() takes a vector of IDAT file paths, so no
    # sample sheet is needed for this step.
    
    # Create the R script
    cat << 'EOF' > process_illumina.R
    # Load beadarray, plus Biobase for the exprs() accessor
    library(beadarray)
    library(Biobase)
    
    # Define input/output paths using environment variables for flexibility
    idat_files_dir <- Sys.getenv("IDAT_FILES_DIR", "input_idat_files")
    output_raw_file <- Sys.getenv("OUTPUT_RAW_FILE", "raw_expression_values.csv")
    output_normalized_file <- Sys.getenv("OUTPUT_NORMALIZED_FILE", "normalized_expression_values.csv")
    
    # Check that the input directory exists and contains IDAT files
    if (!dir.exists(idat_files_dir)) {
      stop("Input IDAT files directory not found: ", idat_files_dir)
    }
    idat_files <- list.files(idat_files_dir, pattern = "\\.idat$",
                             full.names = TRUE, ignore.case = TRUE)
    if (length(idat_files) == 0) {
      stop("No .idat files found in: ", idat_files_dir)
    }
    message("Reading ", length(idat_files), " IDAT files from: ", idat_files_dir)
    
    # Read raw data from the IDAT files.
    # readIdatFiles() returns a summarised ExpressionSetIllumina object.
    raw_data_obj <- readIdatFiles(idatFiles = idat_files)
    
    # Extract and save the raw (unnormalised) probe intensities
    message("Extracting and saving raw expression values...")
    raw_expression_matrix <- exprs(raw_data_obj)
    write.csv(raw_expression_matrix, file = output_raw_file, row.names = TRUE)
    message("Raw expression values saved to: ", output_raw_file)
    
    # Normalise with neqc: background correction against negative control probes,
    # quantile normalisation, and log2 transformation.
    # NOTE: neqc needs to know which probes are negative controls. By default
    # normaliseIllumina() looks for this in fData(raw_data_obj)$Status; if that
    # column is missing, supply the probe status via the 'status' argument.
    message("Normalising data using the neqc method (log2 transformed)...")
    normalized_data_obj <- normaliseIllumina(raw_data_obj, method = "neqc")
    
    # Extract and save the normalised expression matrix
    normalized_expression_matrix <- exprs(normalized_data_obj)
    write.csv(normalized_expression_matrix, file = output_normalized_file, row.names = TRUE)
    message("Normalised (neqc, log2) expression values saved to: ", output_normalized_file)
    EOF
    
    # Set environment variables for input/output paths (optional, defaults are used if not set)
    # export IDAT_FILES_DIR="path/to/your/idat_files"
    # export OUTPUT_RAW_FILE="my_raw_expression.csv"
    # export OUTPUT_NORMALIZED_FILE="my_normalized_expression.csv"
    
    # Execute the R script
    Rscript process_illumina.R
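
Because the snippet above is only a starting point, it can help to sanity-check the input directory before the run and the two CSV outputs after it. The following is a minimal sketch of such checks, not part of the original pipeline: the helper names count_idat_files and check_csv are hypothetical, and the CSV check assumes no field contains a quoted comma, which is an assumption about the output rather than a general property of CSV files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original pipeline) for sanity-checking
# the inputs and outputs of the processing step.

# Print the number of files ending in .idat (case-insensitive) directly
# inside the given directory.
count_idat_files() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -iname '*.idat' | wc -l | tr -d ' '
}

# Succeed only if the CSV exists, is non-empty, has a header plus at least
# one data row, and every row has the same number of comma-separated fields.
# ASSUMPTION: no field contains a quoted comma.
check_csv() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  awk -F',' '
    NR == 1 { n = NF; next }      # remember the header width
    NF != n { bad = 1; exit }     # a ragged row marks the file as bad
    END { if (bad || NR < 2) exit 1 }
  ' "$file"
}
```

For example, running `count_idat_files input_idat_files` before the R script and `check_csv raw_expression_values.csv` after it gives a quick signal that the run read and wrote something plausible. It does not validate the expression values themselves.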

