GSE205941 Processing Pipeline
5 steps
Publication
Small intestine and colon tissue-resident memory CD8+ T cells exhibit molecular heterogeneity and differential dependence on Eomes. Immunity (2023), PMID 36580919
Dataset
GSE205941: Small intestine and colon tissue-resident memory CD8+ T cells exhibit transcriptional, epigenetic, and functional heterogeneity in concert with diffe…
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Cell Ranger (v6.0.1) was used to process sequencing information and single cell barcodes.
Cell Ranger v6.0.1
$ Bash example
# Cell Ranger (v6.0.1) was used to process sequencing information and single cell barcodes.

# Installation (commented out):
# Cell Ranger is downloaded and installed directly from 10x Genomics.
# The 10x download links are usually signed and time-limited, so the URL below
# may need to be replaced with a current link from the 10x Genomics downloads page.
# wget https://cf.10xgenomics.com/releases/cell-ranger/cellranger-6.0.1.tar.gz
# tar -xzf cellranger-6.0.1.tar.gz
# export PATH=/path/to/cellranger-6.0.1:$PATH

# Reference dataset setup (placeholder):
# The source states "Assembly: mm10" (mouse) but not the exact reference build.
# The pre-built 10x mm10 2020-A reference is used here as an assumption.
# wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-mm10-2020-A.tar.gz
# tar -xzf refdata-gex-mm10-2020-A.tar.gz
REFERENCE_TRANSCRIPTOME="/path/to/refdata-gex-mm10-2020-A"  # <<< REPLACE with your actual reference path

# Input FASTQ files (placeholder):
# FASTQ files should follow 10x naming (e.g., SampleName_S1_L001_R1_001.fastq.gz).
FASTQ_DIR="/path/to/your/fastq_directory"  # <<< REPLACE with your actual FASTQ directory

# Sample ID and output directory:
# SAMPLE_ID must match the sample-name prefix of the FASTQ files.
SAMPLE_ID="my_single_cell_sample"  # <<< REPLACE with your sample identifier
OUTPUT_DIR="${SAMPLE_ID}_cellranger_output"

# Run cellranger count. Results are written to ./${OUTPUT_DIR}/outs/.
# --expect-cells is a placeholder; adjust it to your experiment design.
cellranger count \
  --id="${OUTPUT_DIR}" \
  --transcriptome="${REFERENCE_TRANSCRIPTOME}" \
  --fastqs="${FASTQ_DIR}" \
  --sample="${SAMPLE_ID}" \
  --expect-cells=3000
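Before handing the counts to Seurat, it can help to confirm that the run produced the three matrix files this dataset lists as its supplementary format (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz). The following is a minimal sketch, not part of the published pipeline: the function name `check_10x_matrix_dir` is hypothetical, and the `outs/filtered_feature_bc_matrix` location assumes the default Cell Ranger output layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that a directory holds the three 10x matrix files.
# Returns 0 when all three exist and are non-empty, 1 otherwise.
check_10x_matrix_dir() {
  local dir="$1"
  local missing=0
  local f
  for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A typical call after the step above would be `check_10x_matrix_dir "${OUTPUT_DIR}/outs/filtered_feature_bc_matrix" || exit 1`.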
2
Gene and cell filtering, clustering, differential and average expression using Seurat (v4.1.1)
$ Bash example
#!/bin/bash
# This script performs gene and cell filtering, clustering, differential and average expression
# using the Seurat R package (v4.1.1).

# --- Installation Instructions (commented out) ---
# Install R if not already available on your system.
# For Debian/Ubuntu:
# sudo apt-get update
# sudo apt-get install r-base
# Install Seurat within R. A plain install.packages("Seurat") pulls the latest release;
# to pin v4.1.1, one option is the remotes package:
# R -q -e 'install.packages("remotes", repos="https://cran.rstudio.com/")'
# R -q -e 'remotes::install_version("Seurat", version="4.1.1", repos="https://cran.rstudio.com/")'
# R -q -e 'install.packages("patchwork", repos="https://cran.rstudio.com/")'  # Often used for plotting

# --- Define Input and Output Paths ---
# Directory with the 10x matrix files (matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz).
# IMPORTANT: Replace 'path/to/your/10x_data' with the actual path to your input data.
INPUT_10X_DATA_DIR="path/to/your/10x_data"

# Output directory for Seurat analysis results
OUTPUT_DIR="seurat_analysis_results"
mkdir -p "$OUTPUT_DIR"

# R script filename
R_SCRIPT_FILE="run_seurat_analysis.R"

# --- Create R Script for Seurat Analysis ---
# The heredoc is unquoted so the shell fills in the two path variables.
# The R code below therefore avoids the $ operator (the shell would expand it).
cat <<EOF > "$R_SCRIPT_FILE"
library(Seurat)
library(SeuratObject)
# library(patchwork) # Uncomment if you plan to generate plots within R

# --- 1. Load Data ---
data_dir <- "$INPUT_10X_DATA_DIR"
if (!dir.exists(data_dir)) {
  stop("Error: Input data directory does not exist: ", data_dir,
       "\nPlease update INPUT_10X_DATA_DIR in the bash script.")
}
# For CITE-seq input, Read10X() returns a list with one matrix per feature type;
# in that case pass only the gene expression matrix to CreateSeuratObject().
counts <- Read10X(data.dir = data_dir)

# Create Seurat object
# min.cells: include features expressed in at least this many cells
# min.features: include cells with at least this many features
seurat_obj <- CreateSeuratObject(counts = counts, project = "scRNAseq_analysis",
                                 min.cells = 3, min.features = 200)

# --- 2. Quality Control and Filtering ---
# Mitochondrial percentage. The dataset is mouse (mm10), where mitochondrial
# gene symbols start with lowercase "mt-" (human would be "^MT-").
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^mt-")

# Filter cells based on QC metrics.
# These thresholds are placeholders; the source does not state the values used.
# nFeature_RNA: number of genes detected per cell
# percent.mt: percentage of mitochondrial reads
seurat_obj <- subset(seurat_obj,
                     subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

# --- 3. Normalization and Feature Selection ---
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize",
                            scale.factor = 10000)
# "vst" (variance stabilizing transformation) is the usual choice for UMI data
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

# --- 4. Scaling and Dimensionality Reduction ---
# Scale all genes without regressing out covariates (kept simple here).
all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))

# --- 5. Clustering ---
# Placeholder: first 10 PCs. Inspect ElbowPlot(seurat_obj) and adjust 'num_pcs'.
num_pcs <- 10
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:num_pcs)

# Louvain clustering; higher resolution gives more clusters (e.g., 0.4 to 1.2).
resolution_param <- 0.5
seurat_obj <- FindClusters(seurat_obj, resolution = resolution_param)

# Run UMAP for visualization
seurat_obj <- RunUMAP(seurat_obj, dims = 1:num_pcs)

# --- 6. Differential Expression Analysis ---
# Markers for each cluster versus all other cells
# only.pos: only return positive markers
# min.pct: minimum fraction of cells in either group expressing the gene
# logfc.threshold: minimum log-fold change for a gene to be tested
all_cluster_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE,
                                      min.pct = 0.25, logfc.threshold = 0.25)
write.csv(all_cluster_markers,
          file = file.path("$OUTPUT_DIR", "all_cluster_markers.csv"), row.names = FALSE)

# Example: markers for a specific cluster (e.g., cluster 0) vs. all other cells
# cluster0_markers <- FindMarkers(seurat_obj, ident.1 = 0, min.pct = 0.25, logfc.threshold = 0.25)
# write.csv(cluster0_markers, file = file.path("$OUTPUT_DIR", "cluster0_markers.csv"), row.names = TRUE)

# --- 7. Average Expression ---
avg_expression_by_cluster <- AverageExpression(seurat_obj, group.by = "seurat_clusters")
# [["RNA"]] instead of \$RNA keeps the shell from expanding it inside this heredoc.
write.csv(avg_expression_by_cluster[["RNA"]],
          file = file.path("$OUTPUT_DIR", "average_expression_by_cluster.csv"), row.names = TRUE)

# --- 8. Save Processed Seurat Object ---
saveRDS(seurat_obj, file = file.path("$OUTPUT_DIR", "processed_seurat_object.rds"))
message("Seurat analysis complete. Results saved to: ", "$OUTPUT_DIR")
EOF

# --- Execute the R Script ---
Rscript "$R_SCRIPT_FILE"
3
Cell Ranger output was converted to .loom files with velocyto (v0.17.17) for velocity analysis.
$ Bash example
# Install velocyto and its dependencies (commented out).
# velocyto needs numpy and cython installed before it builds, and samtools on the PATH.
# pip install numpy cython
# pip install velocyto==0.17.17
# pip install loompy

# Placeholder for the Cell Ranger run directory.
# This is the folder created by `cellranger count` (the one that contains `outs/`,
# which in turn holds `possorted_genome_bam.bam` and the filtered matrix).
CELLRANGER_OUTPUT_DIR="path/to/cellranger_output_directory"

# Placeholder for the reference GTF file.
# It should match the annotation used by Cell Ranger. The dataset's assembly is mm10,
# so the genes.gtf shipped inside a 10x mm10 reference is assumed here.
GENES_GTF="/path/to/refdata-gex-mm10-2020-A/genes/genes.gtf"

# Convert Cell Ranger output to a .loom file using velocyto.
# The 'run10x' subcommand is designed for 10x Genomics Cell Ranger output.
velocyto run10x "${CELLRANGER_OUTPUT_DIR}" "${GENES_GTF}"
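`velocyto run10x` does not take an output argument; it writes the result inside the sample folder as `<sample_folder>/velocyto/<sample_name>.loom`, where the sample name is the folder's basename. The sketch below derives that path so it can be passed to the next step. The function name `expected_loom_path` is hypothetical and only illustrates the naming convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the .loom path that `velocyto run10x` is expected
# to create for a given Cell Ranger run folder.
expected_loom_path() {
  local sample_dir="${1%/}"   # drop one trailing slash, if present
  local sample_name
  sample_name="$(basename "${sample_dir}")"
  printf '%s/velocyto/%s.loom\n' "${sample_dir}" "${sample_name}"
}
```

For example, `LOOM_FILE="$(expected_loom_path "${CELLRANGER_OUTPUT_DIR}")"` followed by `[ -s "${LOOM_FILE}" ]` checks that the conversion actually produced a file.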
4
Velocity and latent time analysis using scVelo (v0.2.4).
$ Bash example
# Install scVelo (if not already installed)
# pip install scvelo==0.2.4
# # Or using conda:
# # conda install -c conda-forge scvelo=0.2.4

# This runs a basic scVelo workflow for velocity and latent time analysis.
# It assumes an 'input.loom' file with spliced and unspliced counts,
# as produced by velocyto (e.g., `velocyto run10x` on a Cell Ranger run folder).
# The output is an AnnData file 'scvelo_analysis_output.h5ad' containing
# velocity and latent time information.
# Latent time requires the dynamical model, so recover_dynamics() is run first and
# velocity is computed with mode='dynamical'. All numeric parameters are placeholders;
# the source does not state the values used.
python - <<'EOF'
import scvelo as scv

adata = scv.read('input.loom', cache=True)
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.recover_dynamics(adata)              # fit splicing kinetics per gene
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
adata.write('scvelo_analysis_output.h5ad')
EOF
5
Library strategy: CITE-seq
Cell Ranger v7.0.0 (tool and version inferred with models/gemini-2.5-flash)
$ Bash example
# Install Cell Ranger (example; adjust path as needed)
# wget https://cf.10xgenomics.com/releases/cell-ranger/cellranger-7.0.0.tar.gz
# tar -xzf cellranger-7.0.0.tar.gz
# export PATH=/path/to/cellranger-7.0.0:$PATH

# Define inputs (paths written into config.csv should be absolute)
INPUT_FASTQ_DIR="/path/to/raw_data/cite_seq_fastqs"
SAMPLE_ID="my_cite_seq_sample"

# Cell Ranger reference data. The dataset's assembly is mm10 (mouse);
# the exact reference build is not stated, so 2020-A is an assumption.
# Download from 10x Genomics: https://www.10xgenomics.com/support/software/cell-ranger/latest/downloads
CELLRANGER_REF="/path/to/refdata-gex-mm10-2020-A"

# Create a config.csv file for cellranger multi.
# It names the references and lists each FASTQ set with its library type.
# Example config.csv content (replace paths and FASTQ IDs with your own;
# check the section and column names against the docs for your Cell Ranger version):
# [gene-expression]
# reference,/path/to/refdata-gex-mm10-2020-A
#
# [feature]
# reference,/path/to/feature_reference.csv
#
# [libraries]
# fastq_id,fastqs,feature_types
# my_cite_seq_gex,/path/to/raw_data/cite_seq_fastqs,Gene Expression
# my_cite_seq_adt,/path/to/raw_data/cite_seq_fastqs,Antibody Capture
#
# A feature_reference.csv has these columns (example rows for ADTs; the names,
# pattern, and sequences are placeholders and depend on the antibody panel used):
# id,name,read,pattern,sequence,feature_type
# ADT_1,CD3,R2,5P(BC),<BARCODE_SEQUENCE_1>,Antibody Capture
# ADT_2,CD4,R2,5P(BC),<BARCODE_SEQUENCE_2>,Antibody Capture
# ...

# For demonstration, assume a config.csv is already prepared.
CONFIG_CSV="config.csv"  # Path to your prepared config.csv

# Run cellranger multi for combined gene expression and feature barcode analysis.
# Results are written to ./${SAMPLE_ID}/ in the current working directory.
cellranger multi --id="${SAMPLE_ID}" --csv="${CONFIG_CSV}"
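The step above assumes a prepared config.csv. As a rough sketch of how one might generate it, the function below prints a config for one gene expression library and one antibody library that share a FASTQ directory. The function name `write_multi_config` and the FASTQ IDs are hypothetical, and the section and column names reflect the `cellranger multi` config layout as commonly documented by 10x Genomics; verify them against the documentation for the Cell Ranger version in use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a cellranger multi config for one GEX library
# and one antibody (ADT) library stored in the same FASTQ directory.
# Arguments: transcriptome path, feature reference CSV, FASTQ dir, GEX fastq_id, ADT fastq_id
write_multi_config() {
  local transcriptome="$1" feature_ref="$2" fastq_dir="$3"
  local gex_id="$4" adt_id="$5"
  cat <<EOF
[gene-expression]
reference,${transcriptome}

[feature]
reference,${feature_ref}

[libraries]
fastq_id,fastqs,feature_types
${gex_id},${fastq_dir},Gene Expression
${adt_id},${fastq_dir},Antibody Capture
EOF
}
```

Usage would look like `write_multi_config "${CELLRANGER_REF}" /path/to/feature_reference.csv "${INPUT_FASTQ_DIR}" my_cite_seq_gex my_cite_seq_adt > "${CONFIG_CSV}"`, with absolute paths for the reference, feature reference, and FASTQ directory.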
Raw Source Text
Cell Ranger (v6.0.1) was used to process sequencing information and single cell barcodes.
Gene and cell filtering, clustering, differential and average expression using Seurat (v4.1.1)
Cell Ranger output was converted to .loom files with velocyto (v0.17.17) for velocity analysis.
Velocity and latent time anlysis using scVelo (v0.2.4).
Assembly: mm10
Supplementary files format and content: 10x Genomics output files: barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz
Library strategy: CITE-seq