GSE162335 Processing Pipeline


Publication

RNA binding protein DDX5 restricts RORγt+ Treg suppressor function to promote intestine inflammation.

Science Advances (2023), PMID 36724232

Dataset

GSE162335

Transcriptional Survey of Ileal-Anal Pouch Immune Cells from Ulcerative Colitis

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


The Cell Ranger software suite (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) from 10x Genomics was used to demultiplex cellular barcodes, align reads to the human genome (GRCh38, Ensembl; http://useast.ensembl.org/Homo_sapiens/Info/Index), and perform UMI counting.

Cell Ranger (version not stated; the source links to the "latest" documentation)

$ Bash example

# Install Cell Ranger (example, specific version might vary)
# wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-8.0.0.tar.gz
# tar -xzf cellranger-8.0.0.tar.gz
# export PATH=/path/to/cellranger-8.0.0:$PATH

# Download or build a Cell Ranger reference for GRCh38
# The description specifies GRCh38 with Ensembl annotation but not the exact reference build.
# A 10x Genomics pre-built reference is one option (an assumption, not confirmed by the source):
# wget https://cf.10xgenomics.com/releases/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
# tar -xzf refdata-gex-GRCh38-2024-A.tar.gz
# REF_GENOME_PATH="/path/to/refdata-gex-GRCh38-2024-A"

# Placeholder for input FASTQ files directory
FASTQ_DIR="/path/to/your/fastqs"
# Placeholder for sample ID (e.g., the prefix of your fastq files)
SAMPLE_ID="my_single_cell_sample"
# Placeholder for the 10x Genomics reference transcriptome path
# This should be a path to a directory containing the 'fasta' and 'genes' subdirectories
REF_GENOME_PATH="/path/to/10x_genomics_refdata_gex_GRCh38_202X_A"

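# Note: Cell Ranger 8.0 and later require --create-bam; drop that line for older releases,
# which do not recognize it.
cellranger count \
    --id="${SAMPLE_ID}_analysis" \
    --transcriptome="${REF_GENOME_PATH}" \
    --fastqs="${FASTQ_DIR}" \
    --sample="${SAMPLE_ID}" \
    --create-bam=true \
    --localcores=8 \
    --localmem=64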
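
Before moving on to Seurat, it can help to confirm that the run produced a filtered matrix and to see how many cells were kept. The sketch below assumes the output layout of Cell Ranger 3.0 or later (gzipped `barcodes.tsv.gz`, `features.tsv.gz`, and `matrix.mtx.gz` under `outs/filtered_feature_bc_matrix`). The function name `count_filtered_cells` is hypothetical and is not part of Cell Ranger.

```shell
# Report the number of cell barcodes in a Cell Ranger filtered matrix directory.
# Prints the count on stdout; returns non-zero if an expected file is missing.
count_filtered_cells() {
    local matrix_dir="$1"
    local f
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        if [ ! -f "${matrix_dir}/${f}" ]; then
            echo "missing ${matrix_dir}/${f}" >&2
            return 1
        fi
    done
    # One barcode per line, so the line count is the number of cells kept
    gzip -cd "${matrix_dir}/barcodes.tsv.gz" | wc -l | tr -d ' '
}

# Usage (hypothetical path, built from the --id placeholder used above):
# count_filtered_cells "my_single_cell_sample_analysis/outs/filtered_feature_bc_matrix"
```

A missing file usually means the run did not finish, so the function fails loudly rather than printing a zero.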

Starting from the filtered counts, Seurat version 3.1.3 was used to process the single-cell data, including normalization, integration, dimension reduction, and UMAP representation.

Seurat v3.1.3 (UMAP run through Seurat)

$ Bash example

# Install R if not already installed
# sudo apt-get update && sudo apt-get install r-base
#
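# Install Seurat and its dependencies (like uwot for UMAP) in R.
# A plain install.packages("Seurat") pulls the current release, not the v3.1.3 reported by the source.
# Pinning the version with the remotes package is one option (it may require a matching older R version):
# Rscript -e 'install.packages("remotes", repos="http://cran.us.r-project.org")'
# Rscript -e 'remotes::install_version("Seurat", version="3.1.3", repos="http://cran.us.r-project.org")'
# Rscript -e 'install.packages("patchwork", repos="http://cran.us.r-project.org")' # Often used with Seurat
# Rscript -e 'install.packages("uwot", repos="http://cran.us.r-project.org")' # UMAP dependency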

# Placeholder for the filtered counts input file. The GEO record describes the
# supplementary files as tab-delimited, with genes as rows and cells as columns.
# "filtered_counts.tsv" is a hypothetical name; point the script at a real matrix.
# A toy matrix with a handful of genes and cells will not work here, because the
# steps below request 2,000 variable features and 30 principal components.

# R script to process single-cell data using Seurat (the source reports v3.1.3)
Rscript -e '
library(Seurat)
library(uwot) # UMAP dependency

# Load filtered counts data
# Adjust this loading step to the actual input format (e.g., 10x Genomics output via Read10X)
# Assumes a tab-delimited file: first column is gene names, remaining columns are cells
counts_df <- read.delim("filtered_counts.tsv", row.names = 1, check.names = FALSE)
counts_matrix <- as.matrix(counts_df)

# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = counts_matrix, project = "single_cell_analysis")

# 1. Normalization
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)

# 2. Identify highly variable features
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

# 3. Scale data (for PCA)
all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)

# 4. Dimension Reduction (PCA)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj), npcs = 30)

# 5. Integration (not run in this single-sample sketch)
# The description mentions "integration" but names neither the method nor the batches.
# In Seurat v3, integration with FindIntegrationAnchors and IntegrateData is applied to
# per-sample objects after normalization and before scaling and PCA, not after PCA.
# With no sample metadata available here, we proceed with a single-sample workflow.

# 6. UMAP representation
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30) # Use first 30 PCs for UMAP

# Save the processed Seurat object
saveRDS(seurat_obj, file = "processed_seurat_object.rds")

# Optional: Save UMAP coordinates to a CSV file
umap_coords <- Embeddings(seurat_obj, reduction = "umap")
write.csv(umap_coords, file = "umap_coordinates.csv")
'
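
The source lists integration among the Seurat steps but does not say which method or sample grouping was used. The following is a minimal sketch of the standard Seurat v3 anchor-based workflow, offered as one plausible reading and not as the authors' confirmed procedure. The input file `merged_seurat_object.rds`, the metadata column `sample`, and the script name `integrate_samples.R` are all hypothetical.

```shell
# Write a hypothetical integration script; quoting EOF keeps the shell from altering the R code
cat > integrate_samples.R <<'EOF'
library(Seurat)

# Hypothetical input: one Seurat object holding all samples, with a metadata
# column named "sample" that identifies the sample of origin for each cell
merged <- readRDS("merged_seurat_object.rds")

# Split by sample, then normalize and pick variable features per sample
sample_list <- SplitObject(merged, split.by = "sample")
sample_list <- lapply(sample_list, function(x) {
  x <- NormalizeData(x)
  FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# Find anchors across samples and build the integrated assay
anchors <- FindIntegrationAnchors(object.list = sample_list, dims = 1:30)
integrated <- IntegrateData(anchorset = anchors, dims = 1:30)

# Downstream steps run on the integrated assay
DefaultAssay(integrated) <- "integrated"
integrated <- ScaleData(integrated)
integrated <- RunPCA(integrated, npcs = 30)
integrated <- RunUMAP(integrated, reduction = "pca", dims = 1:30)

saveRDS(integrated, file = "integrated_seurat_object.rds")
EOF

# Run the script only when Rscript and the input object are actually present
if command -v Rscript >/dev/null 2>&1 && [ -f merged_seurat_object.rds ]; then
    Rscript integrate_samples.R
else
    echo "Skipping integration: Rscript or merged_seurat_object.rds not found" >&2
fi
```

The choice of 30 dimensions and 2,000 variable features mirrors the single-sample example and is an assumption; the source does not report these parameters.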


Tools Used

UMAP

Raw Source Text

The Cellranger software suite (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) from 10X was used to demultiplex cellular barcodes, align reads to the human genome (GRCh38 ensemble, http://useast.ensembl.org/Homo_sapiens/Info/Index) and perform UMI counting
From filtered counts Seurat1 version 3.1.3 was used to process the single cell data including normalization, integration, dimension reduction, UMAP representation
Genome_build: GRCh38
Supplementary_files_format_and_content: tab-delimited count files, rows are genes and columns are cells
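
Given that layout, a quick shape check on a downloaded count file can catch a truncated or transposed matrix before it is loaded into R. This is a sketch under one assumption the record does not state: that the header row carries a label for the gene column followed by one label per cell. If the header omits the gene-column label (as R's `write.table` does by default), the reported cell count will be one too low. The function name `count_matrix_dims` is hypothetical.

```shell
# Print "<n_genes> <n_cells>" for a tab-delimited count matrix whose first row
# is a header and whose first column holds gene names.
count_matrix_dims() {
    awk -F'\t' '
        NR == 1 { n_cells = NF - 1; next }   # header: gene-name column plus one column per cell
        NF > 0  { n_genes++ }                # every non-empty data row is one gene
        END     { print n_genes + 0, n_cells + 0 }
    ' "$1"
}

# Usage (hypothetical file name):
# count_matrix_dims filtered_counts.tsv
```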
