GSE162335 Processing Pipeline

Publication

RNA binding protein DDX5 restricts RORγt+ Treg suppressor function to promote intestine inflammation.

Science Advances (2023), PMID 36724232

Dataset

GSE162335

Transcriptional Survey of Ileal-Anal Pouch Immune Cells from Ulcerative Colitis

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    The Cell Ranger software suite (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) from 10x Genomics was used to demultiplex cellular barcodes, align reads to the human genome (GRCh38, Ensembl, http://useast.ensembl.org/Homo_sapiens/Info/Index), and perform UMI counting.

    Cell Ranger (version not specified in the source)
    $ Bash example
    # Install Cell Ranger (example, specific version might vary)
    # wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-8.0.0.tar.gz
    # tar -xzf cellranger-8.0.0.tar.gz
    # export PATH=/path/to/cellranger-8.0.0:$PATH
    
    # Download or build the 10x Genomics reference transcriptome for GRCh38
    # The description says reads were aligned to GRCh38 (Ensembl) but does not name the exact reference build,
    # so a 10x Genomics pre-built GRCh38 reference is assumed here.
    # Example for a recent GRCh38 reference:
    # wget https://cf.10xgenomics.com/releases/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
    # tar -xzf refdata-gex-GRCh38-2024-A.tar.gz
    # REF_GENOME_PATH="/path/to/refdata-gex-GRCh38-2024-A"
    
    # Placeholder for input FASTQ files directory
    FASTQ_DIR="/path/to/your/fastqs"
    # Placeholder for sample ID (e.g., the prefix of your fastq files)
    SAMPLE_ID="my_single_cell_sample"
    # Placeholder for the 10x Genomics reference transcriptome path
    # This should be a path to a directory containing the 'fasta' and 'genes' subdirectories
    REF_GENOME_PATH="/path/to/10x_genomics_refdata_gex_GRCh38_202X_A"
    
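    # --create-bam is a required argument from Cell Ranger 8.0 onward; remove it for older versions
    cellranger count \
        --id="${SAMPLE_ID}_analysis" \
        --transcriptome="${REF_GENOME_PATH}" \
        --fastqs="${FASTQ_DIR}" \
        --sample="${SAMPLE_ID}" \
        --create-bam=true \
        --localcores=8 \
        --localmem=64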
  2.

    Starting from the filtered counts, Seurat version 3.1.3 was used to process the single-cell data, including normalization, integration, dimension reduction, and UMAP representation.

    Seurat v3.1.3 (UMAP computed via Seurat)
    $ Bash example
    # Install R if not already installed
    # sudo apt-get update && sudo apt-get install r-base
    #
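    # Install Seurat and its dependencies (like uwot for UMAP) in R
    # Note: install.packages() installs the current CRAN release, not the 3.1.3 release used in the source.
    # Rscript -e 'install.packages("Seurat", repos="http://cran.us.r-project.org")'
    # To pin the version instead (may not build on recent R releases):
    # Rscript -e 'remotes::install_version("Seurat", version = "3.1.3", repos = "http://cran.us.r-project.org")'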
    # Rscript -e 'install.packages("patchwork", repos="http://cran.us.r-project.org")' # Often used with Seurat
    # Rscript -e 'install.packages("uwot", repos="http://cran.us.r-project.org")' # UMAP dependency
    
    # Placeholder for the filtered counts input file.
    # The GEO supplementary files are tab-delimited count matrices (rows are genes, columns are cells),
    # so "filtered_counts.tsv" below stands in for one such file; replace it with a real path.
    # A real matrix is required: the PCA and UMAP steps below use 30 components
    # and will fail on a toy matrix with only a handful of genes and cells.
    
    # R script to process single cell data using Seurat (the source used v3.1.3)
    Rscript -e '
    library(Seurat)
    library(uwot) # UMAP dependency
    
    # Load filtered counts data
    # Adjust this loading based on the actual input format (e.g., 10x Genomics output via Read10X)
    # Assuming a tab-delimited file where the first column is gene names and the remaining columns are cells
    counts_df <- read.delim("filtered_counts.tsv", row.names = 1, check.names = FALSE)
    counts_matrix <- as.matrix(counts_df)
    
    # Create Seurat object
    seurat_obj <- CreateSeuratObject(counts = counts_matrix, project = "single_cell_analysis")
    
    # 1. Normalization
    seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)
    
    # 2. Identify highly variable features
    seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)
    
    # 3. Scale data (for PCA)
    all.genes <- rownames(seurat_obj)
    seurat_obj <- ScaleData(seurat_obj, features = all.genes)
    
    # 4. Dimension Reduction (PCA)
    seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj), npcs = 30)
    
    # 5. Integration (placeholder - typically for multiple datasets/batches)
    # The description mentions "integration". If multiple datasets were to be integrated,
    # the workflow would involve FindIntegrationAnchors and IntegrateData. For a single dataset,
    # this step might refer to batch correction if relevant metadata is available.
    # As no specific integration method or multiple datasets are mentioned, we proceed with a single-sample workflow.
    
    # 6. UMAP representation
    seurat_obj <- RunUMAP(seurat_obj, dims = 1:30) # Use first 30 PCs for UMAP
    
    # Save the processed Seurat object
    saveRDS(seurat_obj, file = "processed_seurat_object.rds")
    
    # Optional: Save UMAP coordinates to a CSV file
    umap_coords <- Embeddings(seurat_obj, reduction = "umap")
    write.csv(umap_coords, file = "umap_coordinates.csv")
    '
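Before moving from step 1 to step 2, it can help to confirm that the Cell Ranger run produced a filtered matrix. The sketch below assumes the output layout used by Cell Ranger 3 and later (`outs/filtered_feature_bc_matrix/` with three gzipped files); the source does not state which version was run, and the function name `check_cellranger_outs` is hypothetical.

```bash
# Return 0 if a `cellranger count` run directory contains the three
# filtered-matrix files, non-zero otherwise.
# Assumption: Cell Ranger >= 3 layout (outs/filtered_feature_bc_matrix/).
check_cellranger_outs() {
    local run_dir="$1"
    local matrix_dir="${run_dir}/outs/filtered_feature_bc_matrix"
    local missing=0
    local f
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        # -s: file exists and is not empty
        if [ ! -s "${matrix_dir}/${f}" ]; then
            echo "missing or empty: ${matrix_dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the placeholder names from step 1, this would be called as `check_cellranger_outs "${SAMPLE_ID}_analysis"`.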

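Step 2 lists integration, but the example above runs a single-sample workflow and leaves that step as a comment. The sketch below shows what the standard Seurat v3 anchor-based integration could look like. The source does not say which integration method or settings were used, so this is an assumption, as are the `sample` metadata column, the script name `integrate_sketch.R`, and the output file name.

```bash
# Write a hypothetical R script sketching Seurat v3 anchor-based integration.
cat > integrate_sketch.R <<'EOF'
library(Seurat)

# Assumption: a Seurat object whose metadata has a "sample" column
# identifying the batches to integrate.
seurat_obj <- readRDS("processed_seurat_object.rds")

# Split by sample, then normalize and select variable features per sample
obj_list <- SplitObject(seurat_obj, split.by = "sample")
obj_list <- lapply(obj_list, function(x) {
  x <- NormalizeData(x)
  FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# Find anchors across samples and build an integrated assay
anchors <- FindIntegrationAnchors(object.list = obj_list, dims = 1:30)
integrated <- IntegrateData(anchorset = anchors, dims = 1:30)

# Downstream steps run on the integrated assay
DefaultAssay(integrated) <- "integrated"
integrated <- ScaleData(integrated)
integrated <- RunPCA(integrated, npcs = 30)
integrated <- RunUMAP(integrated, reduction = "pca", dims = 1:30)

saveRDS(integrated, file = "integrated_seurat_object.rds")
EOF

echo "Wrote integrate_sketch.R; run with: Rscript integrate_sketch.R"
```

The script is written to disk rather than executed so it can be reviewed and adapted first; it needs Seurat installed and an object with per-sample metadata before it will run.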
Tools Used

Cell Ranger (version not specified in the source), Seurat 3.1.3

Raw Source Text
The Cellranger software suite (https://support.10xgenomics.com/single-cell-gene- expression/software/pipelines/latest/what-is-cell-ranger) from 10X was used to demultiplex cellular barcodes, align reads to the human genome (GRCh38 ensemble, http://useast.ensembl.org/Homo_sapiens/Info/Index) and perform UMI counting
From filtered counts Seurat1 version 3.1.3 was used to process the single cell data including normalization, integration, dimension reduction, UMAP representation
Genome_build: GRCh38
Supplementary_files_format_and_content: tab-delimited count files, rows are genes and columns are cells
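Given the stated supplementary file layout (tab-delimited, rows are genes, columns are cells), a quick dimension check can catch a transposed or truncated matrix before loading it into R. This is a minimal sketch: the function name `count_matrix_dims` is hypothetical, and it assumes one header row of cell identifiers and gene names in the first column. A gzip-compressed file would need to be decompressed first.

```bash
# Print "<n_genes> <n_cells>" for a tab-delimited count matrix.
# Assumes one header row of cell identifiers and gene names in column 1.
count_matrix_dims() {
    local counts_file="$1"
    awk -F'\t' '
        # First data row: one gene name followed by one value per cell
        NR == 2 { n_cells = NF - 1 }
        END {
            n_genes = (NR > 0) ? NR - 1 : 0   # every row except the header
            print n_genes, n_cells + 0
        }
    ' "$counts_file"
}
```

For example, `count_matrix_dims counts.tsv` (where `counts.tsv` is a placeholder path) prints the gene and cell counts separated by a space.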