GSE184564 Processing Pipeline
RNA-Seq
code_examples
3 steps
Publication
RNA binding protein DDX5 directs tuft cell specification and function to regulate microbial repertoire and disease susceptibility in the intestine.Gut (2022) — PMID 34853057
Dataset
GSE184564Spatial transcriptomic studies of the small intestine from WT and DDX5â³IEC (KO) mice
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
Space v3.1.0 to align reads to mouse reference genome (mm10)
$ Bash example
# Install STAR (if not already installed) # conda install -c bioconda star # Define variables # Assuming paired-end reads, adjust if single-end READS_R1="sample_R1.fastq.gz" READS_R2="sample_R2.fastq.gz" GENOME_DIR="/path/to/STAR_index/mm10" # Placeholder for mouse (mm10) STAR index OUTPUT_PREFIX="sample_aligned_" NUM_THREADS=8 # Adjust based on available CPU cores RAM_LIMIT_BAM_SORT=60000000000 # 60GB, adjust based on available RAM # Note: The STAR genome index for mm10 must be pre-built. # Example command to build STAR index (run once per genome): # STAR --runThreadN "${NUM_THREADS}" \ # --runMode genomeGenerate \ # --genomeDir "${GENOME_DIR}" \ # --genomeFastaFiles /path/to/mm10.fa \ # --sjdbGTFfile /path/to/mm10.gtf \ # --sjdbOverhang 100 # Adjust based on read length # Align reads to the mouse (mm10) reference genome using STAR STAR --genomeDir "${GENOME_DIR}" \ --readFilesIn "${READS_R1}" "${READS_R2}" \ --readFilesCommand zcat \ --runThreadN "${NUM_THREADS}" \ --outFileNamePrefix "${OUTPUT_PREFIX}" \ --outSAMtype BAM SortedByCoordinate \ --outSAMattributes All \ --outFilterMultimapNmax 1 \ --outFilterMismatchNmax 3 \ --alignIntronMax 1 \ --alignEndsType Local \ --outSAMunmapped Within \ --outSAMstrandField intronMotif \ --outFilterScoreMinOverLread 0.66 \ --outFilterMatchNminOverLread 0.66 \ --limitBAMsortRAM "${RAM_LIMIT_BAM_SORT}" -
2
Used Loupe Browser to filter Atoh1, Sox4, Lgr5 high progenitor-containing spots and Dclk1, Klf4, Lyz1, Chga high mature secretory lineage-containing spots (Log2 expression > 0.5)
$ Bash example
# Loupe Browser is primarily a GUI tool for interactive visualization and analysis. # The filtering described is typically performed interactively within the Loupe Browser application. # Below is a conceptual Python script that would achieve similar filtering programmatically # if the expression data were available in a tabular format (e.g., CSV). # Create a dummy expression_matrix.csv for demonstration purposes # In a real scenario, this file would be generated by upstream steps (e.g., Space Ranger output) echo "Spot_ID,Atoh1,Sox4,Lgr5,Dclk1,Klf4,Lyz1,Chga,OtherGene" > expression_matrix.csv echo "Spot1,0.1,0.2,0.6,0.1,0.1,0.1,0.1,1.0" >> expression_matrix.csv # Progenitor echo "Spot2,0.7,0.1,0.2,0.1,0.1,0.1,0.1,1.0" >> expression_matrix.csv # Progenitor echo "Spot3,0.1,0.1,0.1,0.6,0.2,0.1,0.1,1.0" >> expression_matrix.csv # Mature Secretory echo "Spot4,0.1,0.1,0.1,0.1,0.7,0.1,0.1,1.0" >> expression_matrix.csv # Mature Secretory echo "Spot5,0.1,0.1,0.1,0.1,0.1,0.1,0.1,1.0" >> expression_matrix.csv # Neither echo "Spot6,0.8,0.7,0.6,0.9,0.8,0.7,0.6,1.0" >> expression_matrix.csv # Both (will appear in both outputs) # Ensure Python and pandas are installed # conda install pandas numpy # Execute the Python script for filtering python -c ' import pandas as pd import numpy as np input_matrix = "expression_matrix.csv" output_progenitor = "filtered_progenitor_spots.csv" output_mature_secretory = "filtered_mature_secretory_spots.csv" progenitor_genes = ["Atoh1", "Sox4", "Lgr5"] mature_secretory_genes = ["Dclk1", "Klf4", "Lyz1", "Chga"] log2_threshold = 0.5 try: df = pd.read_csv(input_matrix, index_col=0) progenitor_filter = df[progenitor_genes].gt(log2_threshold).any(axis=1) filtered_progenitor_spots = df[progenitor_filter] mature_secretory_filter = df[mature_secretory_genes].gt(log2_threshold).any(axis=1) filtered_mature_secretory_spots = df[mature_secretory_filter] filtered_progenitor_spots.to_csv(output_progenitor) filtered_mature_secretory_spots.to_csv(output_mature_secretory) print(f"Filtered progenitor spots saved to {output_progenitor}") print(f"Filtered mature secretory spots saved to {output_mature_secretory}") except FileNotFoundError: print(f"Error: Input file \\'{input_matrix}\\' not found. Please ensure the file exists.") except KeyError as e: print(f"Error: One or more specified genes not found in the expression matrix columns: {e}") except Exception as e: print(f"An unexpected error occurred: {e}") ' # Clean up dummy file (optional, for a real pipeline this would be an actual input) # rm expression_matrix.csv -
3
Differential gene expression analysis
$ Bash example
# This script performs differential gene expression analysis using DESeq2. # It assumes you have a gene count matrix (e.g., from featureCounts, Salmon, Kallisto, RSEM) # and a sample metadata file with experimental design information. # --- Installation (uncomment and run if DESeq2 is not installed) --- # # Install R and Bioconductor if not already present # # For example, using conda: # # conda create -n deseq2_env r-base=4.2.0 bioconductor-deseq2=1.36.0 -c bioconda -c conda-forge -y # # conda activate deseq2_env # # Alternatively, within R: # # if (!requireNamespace("BiocManager", quietly = TRUE)) # # install.packages("BiocManager") # # BiocManager::install("DESeq2") # --- Prepare Input Files (replace with your actual data) --- # Create a dummy design file (replace with your actual sample metadata) # This file should contain sample names as row names and experimental conditions as columns. # Example: sample,condition,batch # sample1,treated,A # sample2,treated,A # sample3,control,B # sample4,control,B cat << EOF > design.csv sample,condition,batch sample1,treated,A sample2,treated,A sample3,control,B sample4,control,B EOF # Create dummy count files (replace with your actual gene count matrix) # This file should contain gene IDs as row names and sample names as columns with raw counts. # Example: gene_id,sample1,sample2,sample3,sample4 # geneA,100,120,50,60 # geneB,200,210,100,110 # geneC,50,60,150,160 # geneD,10,12,5,6 # geneE,300,320,150,170 cat << EOF > counts.csv gene_id,sample1,sample2,sample3,sample4 geneA,100,120,50,60 geneB,200,210,100,110 geneC,50,60,150,160 geneD,10,12,5,6 geneE,300,320,150,170 EOF # --- R Script for DESeq2 Analysis --- # This script will perform the differential expression analysis and save the results. cat << 'EOF' > run_deseq2.R library(DESeq2) # Load count data # Ensure the first column is gene IDs and subsequent columns are raw counts for each sample. count_data <- read.csv("counts.csv", row.names = 1) count_data <- as.matrix(count_data) # Load sample metadata # Ensure the first column is sample IDs and subsequent columns are experimental factors. sample_info <- read.csv("design.csv", row.names = 1) # Ensure sample order in colData matches countData columns sample_info <- sample_info[colnames(count_data), , drop = FALSE] # Create DESeqDataSet object # The 'design' formula specifies the experimental factors to model. # Example: ~ condition (for a simple two-group comparison) # Example: ~ batch + condition (to account for batch effects) dds <- DESeqDataSetFromMatrix(countData = count_data, colData = sample_info, design = ~ condition) # Adjust design formula as per your experiment # Pre-filtering: remove genes with very low counts across all samples # This can improve power by removing genes that are unlikely to be differentially expressed. keep <- rowSums(counts(dds)) >= 10 dds <- dds[keep,] # Run DESeq2 analysis dds <- DESeq(dds) # Get results for a specific contrast (e.g., 'treated' vs 'control') # Replace 'condition', 'treated', 'control' with your actual factor name and levels. res <- results(dds, contrast = c("condition", "treated", "control")) # Order results by adjusted p-value res <- res[order(res$padj), ] # Save results to a CSV file write.csv(as.data.frame(res), file = "deseq2_results.csv") # Generate and save normalized counts normalized_counts <- counts(dds, normalized = TRUE) write.csv(as.data.frame(normalized_counts), file = "normalized_counts.csv") message("Differential gene expression analysis complete. Results saved to deseq2_results.csv and normalized_counts.csv") # Optional: Generate MA plot # png("MA_plot.png") # plotMA(res, main="MA plot of DESeq2 results") # dev.off() # Optional: Generate Volcano plot (requires EnhancedVolcano package) # if (!requireNamespace("BiocManager", quietly = TRUE)) # install.packages("BiocManager") # BiocManager::install("EnhancedVolcano") # library(EnhancedVolcano) # png("Volcano_plot.png", width=800, height=800) # EnhancedVolcano(res, # lab = rownames(res), # x = 'log2FoldChange', # y = 'pvalue', # title = 'Volcano Plot', # pCutoff = 0.05, # FCcutoff = 1.5, # pointSize = 3.0, # labSize = 6.0, # colAlpha = 1, # legendPosition = 'right', # caption = bquote(~Log[2]~"FC cutoff, 1.5; p-value cutoff, 0.05"), # gridlines.major = FALSE, # gridlines.minor = FALSE, # border = 'full', # borderWidth = 1.5, # borderColour = 'black') # dev.off() EOF # --- Execute the R script --- Rscript run_deseq2.R echo "Differential gene expression analysis finished. Check deseq2_results.csv for results."
Raw Source Text
Space v3.1.0 to align reads to mouse reference genome (mm10) Used Loupe Browser to filter Atoh1, Sox4, Lgr5 high progenitor-containing spots and Dclk1, Klf4, Lyz1, Chga high mature secretory lineage-containing spots (Log2 expression > 0.5) Differential gene expression analysis Genome_build: mm10