GSE162335 Processing Pipeline
2 steps
Publication
RNA binding protein DDX5 restricts RORγt<sup>+</sup> T<sub>reg</sub> suppressor function to promote intestine inflammation. Science Advances (2023). PMID: 36724232
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
The Cell Ranger software suite (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) from 10x Genomics was used to demultiplex cellular barcodes, align reads to the human genome (GRCh38, Ensembl, http://useast.ensembl.org/Homo_sapiens/Info/Index), and perform UMI counting.
Cell Ranger (version: latest)
$ Bash example
# Install Cell Ranger (example; the specific version may vary)
# wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-8.0.0.tar.gz
# tar -xzf cellranger-8.0.0.tar.gz
# export PATH=/path/to/cellranger-8.0.0:$PATH

# Download or build a 10x Genomics reference transcriptome for GRCh38.
# The description specifies the human genome (GRCh38, Ensembl); a pre-built
# 10x Genomics reference is one option, not necessarily the one the authors used.
# Example for a recent GRCh38 reference:
# wget https://cf.10xgenomics.com/releases/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
# tar -xzf refdata-gex-GRCh38-2024-A.tar.gz

# Placeholder: directory containing the input FASTQ files
FASTQ_DIR="/path/to/your/fastqs"
# Placeholder: sample ID (the prefix of your FASTQ file names)
SAMPLE_ID="my_single_cell_sample"
# Placeholder: path to the 10x Genomics reference transcriptome.
# This should be a directory containing the 'fasta' and 'genes' subdirectories.
REF_GENOME_PATH="/path/to/10x_genomics_refdata_gex_GRCh38_202X_A"

# Note: Cell Ranger 8.0 and later also require --create-bam=true or --create-bam=false.
cellranger count \
  --id="${SAMPLE_ID}_analysis" \
  --transcriptome="${REF_GENOME_PATH}" \
  --fastqs="${FASTQ_DIR}" \
  --sample="${SAMPLE_ID}" \
  --localcores=8 \
  --localmem=64
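The next step starts from the filtered counts, so it can help to confirm that a `cellranger count` run actually produced them. The following is a minimal sketch, assuming Cell Ranger 3.0 or later, which writes `barcodes.tsv.gz`, `features.tsv.gz`, and `matrix.mtx.gz` under `outs/filtered_feature_bc_matrix/`. The function name `check_filtered_matrix` is a hypothetical helper, not part of Cell Ranger.

```shell
# Hypothetical helper: verify that a `cellranger count` run directory
# contains a non-empty filtered feature-barcode matrix.
# Assumes the Cell Ranger >= 3.0 output layout.
check_filtered_matrix() {
  local run_dir="$1"
  local mtx_dir="${run_dir}/outs/filtered_feature_bc_matrix"
  local f
  for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "${mtx_dir}/${f}" ]; then
      echo "missing or empty: ${mtx_dir}/${f}" >&2
      return 1
    fi
  done
  # On success, print the matrix directory so it can be passed downstream
  echo "${mtx_dir}"
}
```

For example, `check_filtered_matrix "${SAMPLE_ID}_analysis"` prints the matrix directory on success and returns a non-zero status otherwise.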
2
From the filtered counts, Seurat version 3.1.3 was used to process the single-cell data, including normalization, integration, dimension reduction, and UMAP representation.
$ Bash example
# Install R if not already installed
# sudo apt-get update && sudo apt-get install r-base
#
# Install Seurat and its dependencies in R. CRAN serves the current release; to match
# the stated version, one option is remotes::install_version("Seurat", version = "3.1.3").
# Rscript -e 'install.packages("Seurat", repos="http://cran.us.r-project.org")'
# Rscript -e 'install.packages("patchwork", repos="http://cran.us.r-project.org")'  # Often used with Seurat
# Rscript -e 'install.packages("uwot", repos="http://cran.us.r-project.org")'  # UMAP dependency

# Placeholder input: filtered_counts.tsv, a tab-delimited matrix (rows are genes, columns are cells).
# A real count matrix is required: with only a handful of genes or cells,
# the PCA (npcs = 30) and UMAP (dims = 1:30) steps below will fail.

# R script to process single-cell data with Seurat
Rscript -e '
library(Seurat)
library(uwot)  # UMAP dependency

# Load the filtered counts. Adjust this to the actual input format (e.g., 10x Genomics output).
# Assumes a tab-delimited file: first column is gene names, remaining columns are cells.
counts_df <- read.delim("filtered_counts.tsv", row.names = 1, check.names = FALSE)
counts_matrix <- as.matrix(counts_df)

# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = counts_matrix, project = "single_cell_analysis")

# 1. Normalization
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)

# 2. Identify highly variable features
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

# 3. Scale data (for PCA)
all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)

# 4. Dimension reduction (PCA)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj), npcs = 30)

# 5. Integration (placeholder, typically for multiple datasets or batches)
# The description mentions "integration". With multiple datasets, the workflow
# would involve FindIntegrationAnchors and IntegrateData. The description names
# neither the integration method nor the batches, so this sketch proceeds with
# a single-sample workflow.

# 6. UMAP representation
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)  # Use the first 30 PCs

# Save the processed Seurat object
saveRDS(seurat_obj, file = "processed_seurat_object.rds")

# Optional: save UMAP coordinates to a CSV file
umap_coords <- Embeddings(seurat_obj, reduction = "umap")
write.csv(umap_coords, file = "umap_coordinates.csv")
'
Tools Used
Raw Source Text
The Cellranger software suite (https://support.10xgenomics.com/single-cell-gene- expression/software/pipelines/latest/what-is-cell-ranger) from 10X was used to demultiplex cellular barcodes, align reads to the human genome (GRCh38 ensemble, http://useast.ensembl.org/Homo_sapiens/Info/Index) and perform UMI counting
From filtered counts Seurat1 version 3.1.3 was used to process the single cell data including normalization, integration, dimension reduction, UMAP representation
Genome_build: GRCh38
Supplementary_files_format_and_content: tab-delimited count files, rows are genes and columns are cells
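Because the supplementary files are described as tab-delimited count files with genes as rows and cells as columns, a quick shape check before loading one into R can be sketched in shell. This is an illustrative sketch, not part of the original pipeline. It assumes a header row of cell IDs that includes a label for the gene column, and the function names `matrix_shape` and `cell_totals` are hypothetical.

```shell
# Hypothetical helper: print "<n_genes> <n_cells>" for a tab-delimited count matrix.
# Assumes row 1 is a header whose first field labels the gene column.
matrix_shape() {
  awk -F'\t' '
    NR == 1 { ncells = NF - 1; next }  # header: every field after the first is a cell
    NF > 1  { ngenes++ }               # each remaining non-empty row is one gene
    END     { print ngenes + 0, ncells + 0 }
  ' "$1"
}

# Hypothetical helper: print the total count per cell as "cell<TAB>total" lines.
cell_totals() {
  awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 2; i <= n; i++) total[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}
```

If the header has no label for the gene column (as in files written by R's `write.table` with row names), the cell count reported by `matrix_shape` would be one too low, so check the first line of the file before relying on it.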