GSE205941 Processing Pipeline


Publication

Small intestine and colon tissue-resident memory CD8+ T cells exhibit molecular heterogeneity and differential dependence on Eomes.

Immunity (2023) — PMID 36580919

Dataset

GSE205941

Small intestine and colon tissue-resident memory CD8+ T cells exhibit transcriptional, epigenetic, and functional heterogeneity in concert with diffe…

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Cell Ranger (v6.0.1) was used to process sequencing information and single cell barcodes.

Cell Ranger v6.0.1

$ Bash example

# Cell Ranger (v6.0.1) was used to process sequencing information and single cell barcodes.

# Installation instructions (commented out):
# Cell Ranger is typically downloaded and installed directly from 10x Genomics.
# For example, to install version 6.0.1:
# wget https://cf.10xgenomics.com/releases/cell-ranger/cellranger-6.0.1.tar.gz
# tar -xzf cellranger-6.0.1.tar.gz
# export PATH=/path/to/cellranger-6.0.1:$PATH
# Ensure the Cell Ranger executable is in your PATH.

# Reference dataset setup (placeholder):
# The source lists the assembly as mm10 (mouse) but does not name the exact reference build.
# The pre-built 10x Genomics mm10 2020-A reference is assumed here:
# wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-mm10-2020-A.tar.gz
# tar -xzf refdata-gex-mm10-2020-A.tar.gz
REFERENCE_TRANSCRIPTOME="/path/to/refdata-gex-mm10-2020-A" # <<< REPLACE with your actual reference path

# Input FASTQ files (placeholder):
# Replace with the directory containing your FASTQ files.
# FASTQ files should be named according to 10x Genomics specifications (e.g., SampleName_S1_L001_R1_001.fastq.gz).
FASTQ_DIR="/path/to/your/fastq_directory" # <<< REPLACE with your actual FASTQ directory

# Sample ID and output directory:
SAMPLE_ID="my_single_cell_sample" # <<< REPLACE with your sample identifier
OUTPUT_DIR="${SAMPLE_ID}_cellranger_output"

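# Execute Cell Ranger count command:
# This command processes sequencing information and single cell barcodes for a gene expression library.
# Because the library strategy is CITE-seq, antibody reads would instead be supplied with
# --libraries and --feature-ref (which replace --fastqs/--sample); that setup is not shown here.
# Parameters like --expect-cells might need adjustment based on experiment design.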
cellranger count \
    --id="${OUTPUT_DIR}" \
    --transcriptome="${REFERENCE_TRANSCRIPTOME}" \
    --fastqs="${FASTQ_DIR}" \
    --sample="${SAMPLE_ID}" \
    --expect-cells=3000 # Example: Expected number of cells. Adjust based on your experiment.
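
Cell Ranger locates its input by file name, so a quick check of the FASTQ names can catch problems before a long run starts. The following is a minimal Python sketch, assuming the naming scheme quoted in the comments above (SampleName_S1_L001_R1_001.fastq.gz). The regular expression and the helper names parse_fastq_name and samples_in are illustrative and are not part of Cell Ranger.

```python
import re

# Assumed naming scheme: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the fields of a FASTQ file name as a dict, or None if it does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


def samples_in(filenames):
    """Collect the distinct sample prefixes found among the matching file names."""
    parsed = (parse_fastq_name(name) for name in filenames)
    return sorted({fields["sample"] for fields in parsed if fields})
```

The sample prefix returned by samples_in is the value that --sample would need to match.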

Gene and cell filtering, clustering, differential and average expression using Seurat (v4.1.1)

Seurat v4.1.1

$ Bash example

#!/bin/bash

# This script performs gene and cell filtering, clustering, differential and average expression
# using the Seurat R package (v4.1.1).

# --- Installation Instructions (commented out) ---
# Install R if not already available on your system.
# For Debian/Ubuntu:
# sudo apt-get update
# sudo apt-get install r-base

# Install Seurat and its dependencies within R:
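# A plain install.packages("Seurat") installs the current release, not v4.1.1.
# To pin the version named in the source, one option is remotes::install_version:
# R -q -e 'install.packages("remotes", repos="https://cran.rstudio.com/")'
# R -q -e 'remotes::install_version("Seurat", version="4.1.1", repos="https://cran.rstudio.com/")'
# R -q -e 'install.packages("patchwork", repos="https://cran.rstudio.com/")' # Often used for plotting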

# --- Define Input and Output Paths ---
# Placeholder for your 10x Genomics data directory (containing matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz)
# IMPORTANT: Replace 'path/to/your/10x_data' with the actual path to your input data.
INPUT_10X_DATA_DIR="path/to/your/10x_data"

# Output directory for Seurat analysis results
OUTPUT_DIR="seurat_analysis_results"
mkdir -p "$OUTPUT_DIR"

# R script filename
R_SCRIPT_FILE="run_seurat_analysis.R"

# --- Create R Script for Seurat Analysis ---
cat <<EOF > "$R_SCRIPT_FILE"
library(Seurat)
library(SeuratObject)
# library(patchwork) # Uncomment if you plan to generate plots within R

# --- 1. Load Data ---
# Read 10x Genomics data (matrix, barcodes, features)
# Ensure the INPUT_10X_DATA_DIR variable points to the correct directory.
data_dir <- "$INPUT_10X_DATA_DIR"
if (!dir.exists(data_dir)) {
  stop("Error: Input data directory does not exist: ", data_dir, "\nPlease update INPUT_10X_DATA_DIR in the bash script.")
}
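counts <- Read10X(data.dir = data_dir)
# For CITE-seq matrices with several feature types, Read10X returns a named list;
# keep the gene expression matrix for the RNA assay in that case.
if (is.list(counts)) counts <- counts[["Gene Expression"]]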

# Create Seurat object
# min.cells: include features expressed in at least this many cells
# min.features: include cells with at least this many features
seurat_obj <- CreateSeuratObject(counts = counts, project = "scRNAseq_analysis", min.cells = 3, min.features = 200)

# --- 2. Quality Control and Filtering ---
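# Calculate mitochondrial percentage
# The assembly is mm10 (mouse), where mitochondrial gene symbols start with "mt-" (human symbols use "MT-").
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^mt-")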

# Filter cells based on QC metrics
# Adjust these thresholds based on your specific dataset's quality and expected cell types.
# nFeature_RNA: number of genes detected per cell
# nCount_RNA: total number of molecules (UMIs) detected per cell
# percent.mt: percentage of mitochondrial reads
seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

# --- 3. Normalization and Feature Selection ---
# Normalize data using LogNormalize method with a scale factor of 10,000
seurat_obj <- NormalizeData(seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)

# Identify highly variable features (genes)
# selection.method: "vst" (variance stabilizing transformation) is recommended for UMI data
# nfeatures: number of variable features to identify
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)

# --- 4. Scaling and Dimensionality Reduction ---
# Scale the data (regress out unwanted variation if needed, e.g., percent.mt, nCount_RNA)
# For simplicity, scaling all genes here without regression.
all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)

# Run Principal Component Analysis (PCA)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))

# --- 5. Clustering ---
# Determine the number of dimensions (PCs) to use for clustering.
# This often involves inspecting an ElbowPlot or JackStrawPlot (not included here for brevity).
# Placeholder: Using the first 10 PCs. Adjust 'num_pcs' as appropriate for your data.
num_pcs <- 10

# Find cell neighbors based on PCA space
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:num_pcs)

# Find clusters using the Louvain algorithm
# resolution: controls the granularity of the clustering. Higher values lead to more clusters.
resolution_param <- 0.5 # Adjust this value (e.g., 0.4 to 1.2) based on desired cluster number
seurat_obj <- FindClusters(seurat_obj, resolution = resolution_param)

# Run UMAP for visualization
seurat_obj <- RunUMAP(seurat_obj, dims = 1:num_pcs)

# --- 6. Differential Expression Analysis ---
# Find markers for all clusters compared to all other cells
# only.pos: only return positive markers
# min.pct: minimum percentage of cells in either of the two groups a gene must be expressed in
# logfc.threshold: minimum log-fold change for a gene to be considered a marker
all_cluster_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
write.csv(all_cluster_markers, file = file.path("$OUTPUT_DIR", "all_cluster_markers.csv"), row.names = FALSE)

# Example: Find markers for a specific cluster (e.g., cluster 0) vs. all other cells
# cluster0_markers <- FindMarkers(seurat_obj, ident.1 = 0, min.pct = 0.25, logfc.threshold = 0.25)
# write.csv(cluster0_markers, file = file.path("$OUTPUT_DIR", "cluster0_markers.csv"), row.names = TRUE)

# --- 7. Average Expression ---
# Calculate average expression of all genes across all identified clusters
# Note: the dollar sign below is backslash-escaped so the unquoted shell heredoc does not expand it.
avg_expression_by_cluster <- AverageExpression(seurat_obj, group.by = "seurat_clusters")
write.csv(avg_expression_by_cluster\$RNA, file = file.path("$OUTPUT_DIR", "average_expression_by_cluster.csv"), row.names = TRUE)

# --- 8. Save Processed Seurat Object ---
saveRDS(seurat_obj, file = file.path("$OUTPUT_DIR", "processed_seurat_object.rds"))

message("Seurat analysis complete. Results saved to: ", "$OUTPUT_DIR")
EOF

# --- Execute the R Script ---
Rscript "$R_SCRIPT_FILE"
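
The mitochondrial filter depends on the gene-symbol prefix, and the prefix depends on the species. The source lists the assembly as mm10, so mouse symbols (mt-Co1, mt-Nd1, and so on) are expected. The sketch below re-implements the percentage and threshold logic in plain Python on a toy cell to show how a human-style "^MT-" pattern would silently report 0% on mouse data. The counts, the function names, and the relaxed gene-number bounds are hypothetical; this is not Seurat's code.

```python
import re


def percent_matching(counts, pattern):
    """Percentage of a cell's total counts that come from genes matching `pattern`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matched = sum(n for gene, n in counts.items() if re.search(pattern, gene))
    return 100.0 * matched / total


def passes_qc(counts, mito_pattern, min_genes=200, max_genes=2500, max_mito=5.0):
    """Apply the same three thresholds as the subset() call: gene number and mito percentage."""
    n_genes = sum(1 for n in counts.values() if n > 0)
    within_gene_bounds = min_genes < n_genes < max_genes
    return within_gene_bounds and percent_matching(counts, mito_pattern) < max_mito


# Toy cell with mouse-style gene symbols (hypothetical counts).
toy_cell = {"mt-Co1": 30, "mt-Nd1": 20, "Cd8a": 25, "Itgae": 25}

mouse_pct = percent_matching(toy_cell, r"^mt-")  # half of the counts are mitochondrial
human_pct = percent_matching(toy_cell, r"^MT-")  # no gene matches the human prefix
```

With the human prefix the toy cell looks clean and would pass the 5% cutoff even though half of its counts are mitochondrial, which is why the pattern should be checked against the actual feature names.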


Cell Ranger output was converted to .loom files with velocyto (v0.17.17) for velocity analysis.

velocyto v0.17.17

$ Bash example

# Install velocyto and its dependencies (commented out)
# velocyto builds against numpy and Cython, so install those first:
# pip install numpy cython
# pip install velocyto==0.17.17
# samtools must also be available on your PATH.

# Placeholder for the Cell Ranger run directory
# This is the top-level folder created by `cellranger count` (the one that contains `outs/`),
# with `outs/possorted_genome_bam.bam` and the filtered barcode matrix inside it.
CELLRANGER_OUTPUT_DIR="path/to/cellranger_output_directory"

# Placeholder for the reference GTF file
# The GTF should match the annotation used by Cell Ranger; for an mm10 reference from 10x Genomics
# this is typically the genes/genes.gtf file inside the reference folder.
GENES_GTF="/path/to/refdata-gex-mm10-2020-A/genes/genes.gtf"

# Convert Cell Ranger output to .loom files using velocyto
# The 'run10x' command is specifically designed for 10x Genomics Cell Ranger output.
# An optional repeat-mask GTF can be passed with -m.
velocyto run10x "${CELLRANGER_OUTPUT_DIR}" "${GENES_GTF}"
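
Before launching velocyto it can help to confirm that the run folder has the expected layout. The Python sketch below checks for the sorted BAM and the filtered barcode file, and reports where the .loom file is expected to appear. It assumes a Cell Ranger v3+ layout (outs/filtered_feature_bc_matrix) and that run10x writes to a velocyto/ subfolder named after the run folder; both are assumptions to verify against your installation, and check_run10x_inputs is a hypothetical helper name.

```python
from pathlib import Path


def check_run10x_inputs(run_dir):
    """Return (missing_inputs, expected_loom_path) for a Cell Ranger run folder."""
    run_dir = Path(run_dir)
    required = [
        run_dir / "outs" / "possorted_genome_bam.bam",
        run_dir / "outs" / "filtered_feature_bc_matrix" / "barcodes.tsv.gz",
    ]
    missing = [str(path) for path in required if not path.exists()]
    # Assumed output location: <run_dir>/velocyto/<run_dir name>.loom
    expected_loom = run_dir / "velocyto" / f"{run_dir.name}.loom"
    return missing, expected_loom
```

An empty list of missing inputs means the folder looks ready for velocyto run10x.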


Velocity and latent time analysis using scVelo (v0.2.4).

scVelo v0.2.4

$ Bash example

# Install scVelo (if not already installed)
# pip install scvelo==0.2.4
# # Or using conda:
# # conda install -c conda-forge scvelo=0.2.4

# Basic scVelo workflow for velocity and latent time analysis (parameter values are illustrative).
# It assumes 'input.loom' holds the spliced/unspliced counts written by velocyto
# (e.g., velocyto run10x /path/to/cellranger_run annotation.gtf).
# latent_time requires the dynamical model, so recover_dynamics runs before velocity(mode='dynamical').
# Output: 'scvelo_analysis_output.h5ad', an AnnData file with velocities and latent time.
python - <<'PY'
import scvelo as scv

adata = scv.read('input.loom', cache=True)
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.recover_dynamics(adata)  # fit splicing kinetics (slow on large datasets)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
adata.write('scvelo_analysis_output.h5ad')
PY


Library strategy: CITE-seq

Cell Ranger v7.0.0 (tool and version inferred with models/gemini-2.5-flash; this version is not stated in the source)

$ Bash example

# Install CellRanger (example - adjust path as needed)
# wget https://cf.10xgenomics.com/releases/cell-ranger/cellranger-7.0.0.tar.gz
# tar -xzf cellranger-7.0.0.tar.gz
# export PATH=/path/to/cellranger-7.0.0:$PATH

# Define the run ID
SAMPLE_ID="my_cite_seq_sample"

# Cell Ranger reference data (mm10, matching the assembly listed in the source)
# Download from 10x Genomics: https://www.10xgenomics.com/support/software/cell-ranger/latest/downloads
# The reference and FASTQ paths are set inside config.csv rather than on the command line.

# Create a config.csv file for cellranger multi
# This file specifies the references, the FASTQ locations, and the feature type of each library.
# Example config.csv content (replace with actual paths and FASTQ IDs):
# [gene-expression]
# reference,/path/to/refdata-gex-mm10-2020-A
#
# [feature]
# reference,/path/to/feature_reference.csv
#
# [libraries]
# fastq_id,fastqs,feature_types
# my_cite_seq_gex,/path/to/raw_data/cite_seq_fastqs,Gene Expression
# my_cite_seq_adt,/path/to/raw_data/cite_seq_fastqs,Antibody Capture
#
# A feature_reference.csv would contain (placeholder rows; take the pattern and barcode
# sequences from your antibody vendor's documentation):
# id,name,read,pattern,sequence,feature_type
# ADT_1,CD3,R2,<pattern>,<antibody barcode sequence>,Antibody Capture
# ADT_2,CD4,R2,<pattern>,<antibody barcode sequence>,Antibody Capture
# ...

# For demonstration, assume a config.csv is already prepared.
CONFIG_CSV="config.csv" # Path to your prepared config.csv

# Run cellranger multi for combined gene expression and feature barcode analysis
# Results are written to a folder named after --id in the current working directory.
cellranger multi --id="${SAMPLE_ID}" --csv="${CONFIG_CSV}"
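
Writing the multi config by hand is error-prone, so it can be generated from a few variables. The Python sketch below assembles a config with [gene-expression], [feature], and [libraries] sections. The section layout follows the 10x Genomics multi config format as understood here and should be checked against the documentation for your Cell Ranger version. The function name build_multi_config and all paths and FASTQ IDs are placeholders.

```python
def build_multi_config(transcriptome, feature_ref, libraries):
    """Return the text of a cellranger multi config CSV.

    libraries: list of (fastq_id, fastqs_dir, feature_type) tuples.
    """
    lines = [
        "[gene-expression]",
        f"reference,{transcriptome}",
        "",
        "[feature]",
        f"reference,{feature_ref}",
        "",
        "[libraries]",
        "fastq_id,fastqs,feature_types",
    ]
    lines += [",".join(row) for row in libraries]
    return "\n".join(lines) + "\n"


# Placeholder paths and FASTQ IDs for illustration only.
config_text = build_multi_config(
    "/path/to/refdata-gex-mm10-2020-A",
    "/path/to/feature_reference.csv",
    [
        ("my_cite_seq_gex", "/path/to/raw_data/cite_seq_fastqs", "Gene Expression"),
        ("my_cite_seq_adt", "/path/to/raw_data/cite_seq_fastqs", "Antibody Capture"),
    ],
)
```

The resulting text can be written to config.csv and passed to cellranger multi with --csv.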


Raw Source Text

Cell Ranger (v6.0.1) was used to process sequencing information and single cell barcodes.
Gene and cell filtering, clustering, differential and average expression using Seurat (v4.1.1)
Cell Ranger output was converted to .loom files with velocyto (v0.17.17) for velocity analysis.
Velocity and latent time anlysis using scVelo (v0.2.4).
Assembly: mm10
Supplementary files format and content: 10x Genomics output files: barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz
Library strategy: CITE-seq
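
The three supplementary files listed above form a sparse matrix: matrix.mtx.gz holds one-based (feature row, barcode column, count) triplets, while the two TSV files supply the row and column labels. The sketch below reads them with the Python standard library only, as a way to inspect a download without Seurat. It assumes a features file whose second column is the gene name; read_10x_triplets is a hypothetical helper, and for real analyses a sparse-matrix reader is the better choice.

```python
import gzip
from pathlib import Path


def read_10x_triplets(folder):
    """Return {(gene_name, barcode): count} from a 10x barcodes/features/matrix folder."""
    folder = Path(folder)
    with gzip.open(folder / "barcodes.tsv.gz", "rt") as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    with gzip.open(folder / "features.tsv.gz", "rt") as fh:
        features = [line.rstrip("\n").split("\t") for line in fh if line.strip()]

    counts = {}
    with gzip.open(folder / "matrix.mtx.gz", "rt") as fh:
        # Skip the Matrix Market header and comment lines, which start with '%'.
        body = (line for line in fh if not line.startswith("%"))
        n_rows, n_cols, _n_entries = map(int, next(body).split())
        if (n_rows, n_cols) != (len(features), len(barcodes)):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in body:
            if not line.strip():
                continue
            row, col, value = line.split()
            gene_name = features[int(row) - 1][1]  # indices in the file are one-based
            counts[(gene_name, barcodes[int(col) - 1])] = int(value)
    return counts
```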
