GSE184564 Processing Pipeline

RNA-Seq code_examples 3 steps

Publication

RNA binding protein DDX5 directs tuft cell specification and function to regulate microbial repertoire and disease susceptibility in the intestine.

Gut (2022) — PMID 34853057

Dataset

GSE184564

Spatial transcriptomic studies of the small intestine from WT and DDX5△IEC (KO) mice

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. 1

    Space v3.1.0 to align reads to mouse reference genome (mm10)

    STAR (Inferred with models/gemini-2.5-flash) vNot specified GitHub
    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables
    # Assuming paired-end reads, adjust if single-end
    READS_R1="sample_R1.fastq.gz"
    READS_R2="sample_R2.fastq.gz"
    GENOME_DIR="/path/to/STAR_index/mm10" # Placeholder for mouse (mm10) STAR index
    OUTPUT_PREFIX="sample_aligned_"
    NUM_THREADS=8 # Adjust based on available CPU cores
    RAM_LIMIT_BAM_SORT=60000000000 # 60GB, adjust based on available RAM
    
    # Note: The STAR genome index for mm10 must be pre-built. 
    # Example command to build STAR index (run once per genome):
    # STAR --runThreadN "${NUM_THREADS}" \
    #      --runMode genomeGenerate \
    #      --genomeDir "${GENOME_DIR}" \
    #      --genomeFastaFiles /path/to/mm10.fa \
    #      --sjdbGTFfile /path/to/mm10.gtf \
    #      --sjdbOverhang 100 # Adjust based on read length
    
    # Align reads to the mouse (mm10) reference genome using STAR
    STAR --genomeDir "${GENOME_DIR}" \
         --readFilesIn "${READS_R1}" "${READS_R2}" \
         --readFilesCommand zcat \
         --runThreadN "${NUM_THREADS}" \
         --outFileNamePrefix "${OUTPUT_PREFIX}" \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes All \
         --outFilterMultimapNmax 1 \
         --outFilterMismatchNmax 3 \
         --alignIntronMax 1 \
         --alignEndsType Local \
         --outSAMunmapped Within \
         --outSAMstrandField intronMotif \
         --outFilterScoreMinOverLread 0.66 \
         --outFilterMatchNminOverLread 0.66 \
         --limitBAMsortRAM "${RAM_LIMIT_BAM_SORT}"
    
  2. 2

    Used Loupe Browser to filter Atoh1, Sox4, Lgr5 high progenitor-containing spots and Dclk1, Klf4, Lyz1, Chga high mature secretory lineage-containing spots (Log2 expression > 0.5)

    Loupe Browser v(Inferred with models/gemini-2.5-flash) GitHub
    $ Bash example
    # Loupe Browser is primarily a GUI tool for interactive visualization and analysis.
    # The filtering described is typically performed interactively within the Loupe Browser application.
    # Below is a conceptual Python script that would achieve similar filtering programmatically
    # if the expression data were available in a tabular format (e.g., CSV).
    
    # Create a dummy expression_matrix.csv for demonstration purposes
    # In a real scenario, this file would be generated by upstream steps (e.g., Space Ranger output)
    echo "Spot_ID,Atoh1,Sox4,Lgr5,Dclk1,Klf4,Lyz1,Chga,OtherGene" > expression_matrix.csv
    echo "Spot1,0.1,0.2,0.6,0.1,0.1,0.1,0.1,1.0" >> expression_matrix.csv # Progenitor
    echo "Spot2,0.7,0.1,0.2,0.1,0.1,0.1,0.1,1.0" >> expression_matrix.csv # Progenitor
    echo "Spot3,0.1,0.1,0.1,0.6,0.2,0.1,0.1,1.0" >> expression_matrix.csv # Mature Secretory
    echo "Spot4,0.1,0.1,0.1,0.1,0.7,0.1,0.1,1.0" >> expression_matrix.csv # Mature Secretory
    echo "Spot5,0.1,0.1,0.1,0.1,0.1,0.1,0.1,1.0" >> expression_matrix.csv # Neither
    echo "Spot6,0.8,0.7,0.6,0.9,0.8,0.7,0.6,1.0" >> expression_matrix.csv # Both (will appear in both outputs)
    
    # Ensure Python and pandas are installed
    # conda install pandas numpy
    
    # Execute the Python script for filtering
    python -c '
    import pandas as pd
    import numpy as np
    
    input_matrix = "expression_matrix.csv"
    output_progenitor = "filtered_progenitor_spots.csv"
    output_mature_secretory = "filtered_mature_secretory_spots.csv"
    
    progenitor_genes = ["Atoh1", "Sox4", "Lgr5"]
    mature_secretory_genes = ["Dclk1", "Klf4", "Lyz1", "Chga"]
    log2_threshold = 0.5
    
    try:
        df = pd.read_csv(input_matrix, index_col=0)
    
        progenitor_filter = df[progenitor_genes].gt(log2_threshold).any(axis=1)
        filtered_progenitor_spots = df[progenitor_filter]
    
        mature_secretory_filter = df[mature_secretory_genes].gt(log2_threshold).any(axis=1)
        filtered_mature_secretory_spots = df[mature_secretory_filter]
    
        filtered_progenitor_spots.to_csv(output_progenitor)
        filtered_mature_secretory_spots.to_csv(output_mature_secretory)
    
        print(f"Filtered progenitor spots saved to {output_progenitor}")
        print(f"Filtered mature secretory spots saved to {output_mature_secretory}")
    
    except FileNotFoundError:
        print(f"Error: Input file \\'{input_matrix}\\' not found. Please ensure the file exists.")
    except KeyError as e:
        print(f"Error: One or more specified genes not found in the expression matrix columns: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
    '
    
    # Clean up dummy file (optional, for a real pipeline this would be an actual input)
    # rm expression_matrix.csv
  3. 3

    Differential gene expression analysis

    DESeq2 (Inferred with models/gemini-2.5-flash) v1.36.0 (R 4.2.0) GitHub
    $ Bash example
    # This script performs differential gene expression analysis using DESeq2.
    # It assumes you have a gene count matrix (e.g., from featureCounts, Salmon, Kallisto, RSEM)
    # and a sample metadata file with experimental design information.
    
    # --- Installation (uncomment and run if DESeq2 is not installed) ---
    # # Install R and Bioconductor if not already present
    # # For example, using conda:
    # # conda create -n deseq2_env r-base=4.2.0 bioconductor-deseq2=1.36.0 -c bioconda -c conda-forge -y
    # # conda activate deseq2_env
    
    # # Alternatively, within R:
    # # if (!requireNamespace("BiocManager", quietly = TRUE))
    # #     install.packages("BiocManager")
    # # BiocManager::install("DESeq2")
    
    # --- Prepare Input Files (replace with your actual data) ---
    # Create a dummy design file (replace with your actual sample metadata)
    # This file should contain sample names as row names and experimental conditions as columns.
    # Example: sample,condition,batch
    #          sample1,treated,A
    #          sample2,treated,A
    #          sample3,control,B
    #          sample4,control,B
    cat << EOF > design.csv
    sample,condition,batch
    sample1,treated,A
    sample2,treated,A
    sample3,control,B
    sample4,control,B
    EOF
    
    # Create dummy count files (replace with your actual gene count matrix)
    # This file should contain gene IDs as row names and sample names as columns with raw counts.
    # Example: gene_id,sample1,sample2,sample3,sample4
    #          geneA,100,120,50,60
    #          geneB,200,210,100,110
    #          geneC,50,60,150,160
    #          geneD,10,12,5,6
    #          geneE,300,320,150,170
    cat << EOF > counts.csv
    gene_id,sample1,sample2,sample3,sample4
    geneA,100,120,50,60
    geneB,200,210,100,110
    geneC,50,60,150,160
    geneD,10,12,5,6
    geneE,300,320,150,170
    EOF
    
    # --- R Script for DESeq2 Analysis ---
    # This script will perform the differential expression analysis and save the results.
    cat << 'EOF' > run_deseq2.R
    library(DESeq2)
    
    # Load count data
    # Ensure the first column is gene IDs and subsequent columns are raw counts for each sample.
    count_data <- read.csv("counts.csv", row.names = 1)
    count_data <- as.matrix(count_data)
    
    # Load sample metadata
    # Ensure the first column is sample IDs and subsequent columns are experimental factors.
    sample_info <- read.csv("design.csv", row.names = 1)
    
    # Ensure sample order in colData matches countData columns
    sample_info <- sample_info[colnames(count_data), , drop = FALSE]
    
    # Create DESeqDataSet object
    # The 'design' formula specifies the experimental factors to model.
    # Example: ~ condition (for a simple two-group comparison)
    # Example: ~ batch + condition (to account for batch effects)
    dds <- DESeqDataSetFromMatrix(countData = count_data,
                                  colData = sample_info,
                                  design = ~ condition) # Adjust design formula as per your experiment
    
    # Pre-filtering: remove genes with very low counts across all samples
    # This can improve power by removing genes that are unlikely to be differentially expressed.
    keep <- rowSums(counts(dds)) >= 10
    dds <- dds[keep,]
    
    # Run DESeq2 analysis
    dds <- DESeq(dds)
    
    # Get results for a specific contrast (e.g., 'treated' vs 'control')
    # Replace 'condition', 'treated', 'control' with your actual factor name and levels.
    res <- results(dds, contrast = c("condition", "treated", "control"))
    
    # Order results by adjusted p-value
    res <- res[order(res$padj), ]
    
    # Save results to a CSV file
    write.csv(as.data.frame(res), file = "deseq2_results.csv")
    
    # Generate and save normalized counts
    normalized_counts <- counts(dds, normalized = TRUE)
    write.csv(as.data.frame(normalized_counts), file = "normalized_counts.csv")
    
    message("Differential gene expression analysis complete. Results saved to deseq2_results.csv and normalized_counts.csv")
    
    # Optional: Generate MA plot
    # png("MA_plot.png")
    # plotMA(res, main="MA plot of DESeq2 results")
    # dev.off()
    
    # Optional: Generate Volcano plot (requires EnhancedVolcano package)
    # if (!requireNamespace("BiocManager", quietly = TRUE))
    #     install.packages("BiocManager")
    # BiocManager::install("EnhancedVolcano")
    # library(EnhancedVolcano)
    # png("Volcano_plot.png", width=800, height=800)
    # EnhancedVolcano(res,
    #                 lab = rownames(res),
    #                 x = 'log2FoldChange',
    #                 y = 'pvalue',
    #                 title = 'Volcano Plot',
    #                 pCutoff = 0.05,
    #                 FCcutoff = 1.5,
    #                 pointSize = 3.0,
    #                 labSize = 6.0,
    #                 colAlpha = 1,
    #                 legendPosition = 'right',
    #                 caption = bquote(~Log[2]~"FC cutoff, 1.5; p-value cutoff, 0.05"),
    #                 gridlines.major = FALSE,
    #                 gridlines.minor = FALSE,
    #                 border = 'full',
    #                 borderWidth = 1.5,
    #                 borderColour = 'black')
    # dev.off()
    EOF
    
    # --- Execute the R script ---
    Rscript run_deseq2.R
    
    echo "Differential gene expression analysis finished. Check deseq2_results.csv for results."
    
Raw Source Text
Space v3.1.0 to align reads to mouse reference genome (mm10)
Used Loupe Browser to filter Atoh1, Sox4, Lgr5 high progenitor-containing spots and Dclk1, Klf4, Lyz1, Chga high mature secretory lineage-containing spots (Log2 expression > 0.5)
Differential gene expression analysis
Genome_build: mm10
← Back to Analysis