GSE122713 Processing Pipeline

Publication

Heterogenous Populations of Tissue-Resident CD8+ T Cells Are Generated in Response to Infection and Malignancy.

Immunity (2020) — PMID 32433949

Dataset

GSE122713

scRNA-seq, bulk RNA-seq, and ATAC-seq data from progenitor exhausted and terminally exhausted CD8+ T cells from tumors and chronic viral infection

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Sample demultiplexing, barcode processing, alignment, filtering, UMI counting, and aggregation of multiple sequencing runs were performed using the Cell Ranger analysis pipeline (v1.2)

Cell Ranger v1.2

$ Bash example

# Install Cell Ranger (example: download and add to PATH)
# Download Cell Ranger v1.2 from the 10x Genomics website (requires login).
# The URL below is illustrative; use the link provided on the download page.
#   wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-1.2.0.tar.gz
#   tar -xzf cellranger-1.2.0.tar.gz
#   export PATH=/path/to/cellranger-1.2.0:$PATH

# Define variables
OUTPUT_ID="my_cellranger_analysis"
FASTQ_DIR="/path/to/your/fastq_files" # Directory containing FASTQ files (e.g., from cellranger mkfastq)
SAMPLE_NAME="my_sample"
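# Reference transcriptome: the Genome_build for this dataset is mm10 (mouse), so use a mouse
# reference compatible with your Cell Ranger version (e.g., refdata-cellranger-mm10-1.2.0)
# Download from 10x Genomics: https://www.10xgenomics.com/resources/cell-ranger-downloads
TRANSCRIPTOME_REF="/path/to/refdata-cellranger-mm10-1.2.0"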

# Run cellranger count to perform barcode processing, alignment, filtering, and UMI counting.
# Sample demultiplexing (BCL to FASTQ) happens upstream, in cellranger mkfastq.
# This command processes a single sample; FASTQs from multiple sequencing runs/lanes are combined if they are all in the specified directory.
cellranger count \
    --id="${OUTPUT_ID}" \
    --transcriptome="${TRANSCRIPTOME_REF}" \
    --fastqs="${FASTQ_DIR}" \
    --sample="${SAMPLE_NAME}" \
    --localcores=8 \
    --localmem=64

# If "aggregation of multiple sequencing runs" refers to combining outputs from multiple *samples*,
# you would use cellranger aggr after running cellranger count for each sample.
# Example for cellranger aggr:
#   Create an aggregation CSV file (e.g., aggregation.csv) listing each library and its molecule_info.h5 path.
#   Older Cell Ranger releases name the first column library_id (newer ones use sample_id);
#   check the aggr documentation for your version.
#     library_id,molecule_h5
#     sample1,/path/to/sample1_output/outs/molecule_info.h5
#     sample2,/path/to/sample2_output/outs/molecule_info.h5
#   cellranger aggr --id="aggregated_output" --csv="aggregation.csv" --normalize=none
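
The aggregation CSV described in the comments above can be generated rather than typed by hand. The following is a minimal sketch, assuming each sample was processed by `cellranger count` into its own output directory containing `outs/molecule_info.h5`. The function name `write_aggr_csv` and the directory paths are hypothetical. The header of the first column is passed as an argument because it appears to differ between Cell Ranger releases (`library_id` in older documentation, `sample_id` in newer), so check the documentation for the version in use.

```shell
# Hypothetical helper: print an aggregation CSV for `cellranger aggr`.
# Usage: write_aggr_csv <id_column_name> <count_output_dir>...
write_aggr_csv() {
    id_column="$1"
    shift
    echo "${id_column},molecule_h5"
    for run_dir in "$@"; do
        # Use the basename of the output directory as the library/sample identifier
        # and strip any trailing slash before appending the molecule_info.h5 path.
        echo "$(basename "${run_dir}"),${run_dir%/}/outs/molecule_info.h5"
    done
}

# Example with placeholder paths (no files are read or checked here).
write_aggr_csv library_id /path/to/sample1_output /path/to/sample2_output
```

Redirecting the output to a file (for example `> aggregation.csv`) gives a CSV that can be passed to `--csv`. The sketch does not verify that the `molecule_info.h5` files exist.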

Data normalization, dimension reduction, and differential expression analysis were performed using the Seurat R package (v2.2.0)

Seurat v2.2.0

$ Bash example

# Install R if not already present
# sudo apt-get update
# sudo apt-get install -y r-base

# Install devtools if not already present
# R -e 'install.packages("devtools")'

# Install Seurat v2.2.0 using devtools::install_version
# R -e 'devtools::install_version("Seurat", version = "2.2.0", repos = "http://cran.us.r-project.org")'

# This script assumes Seurat v2.2.0 is installed and R is available.
# It demonstrates the typical steps for normalization, dimension reduction, and differential expression.
# Set data.dir to a directory containing matrix.mtx, barcodes.tsv, and genes.tsv.
# The parameter values below are common tutorial defaults, not values reported for this dataset.

Rscript -e '
library(Seurat)

# --- 1. Load Data ---
# Read a 10x-style count matrix (matrix.mtx, barcodes.tsv, genes.tsv) from a directory.
# A tiny random matrix is not a usable stand-in here: the variable-gene cutoffs and the
# default number of principal components need realistically sized, sparse data.
# The object name pbmc follows the Seurat tutorial; the cells here are mouse CD8+ T cells.
counts <- Read10X(data.dir = "path/to/10x/data/")
pbmc <- CreateSeuratObject(raw.data = counts)

# --- 2. Normalization ---
# Normalizes the raw expression data. LogNormalize is the default for Seurat v2.
pbmc <- NormalizeData(object = pbmc, normalization.method = "LogNormalize", scale.factor = 10000)

# --- 3. Find Variable Genes (Optional but common before dimension reduction) ---
# Identifies genes that are highly variable across cells.
pbmc <- FindVariableGenes(object = pbmc, mean.function = ExpMean, dispersion.function = LogVMR, 
                          x.low.cutoff = 0.0125, x.high.cutoff = 3, y.cutoff = 0.5)

# --- 4. Scale Data ---
# Scales and centers the data. This is crucial before performing PCA.
# Common covariates like "nUMI" (number of unique molecular identifiers) can be regressed out.
pbmc <- ScaleData(object = pbmc, vars.to.regress = c("nUMI"))

# --- 5. Dimension Reduction (PCA) ---
# Performs Principal Component Analysis on the scaled data.
pbmc <- RunPCA(object = pbmc, pc.genes = pbmc@var.genes, do.print = FALSE, pcs.print = 1:5, genes.print = 5)

# --- 6. Cluster Cells (Optional but common after dimension reduction) ---
# Finds clusters of cells based on their PCA embeddings.
pbmc <- FindClusters(object = pbmc, reduction.type = "pca", dims.use = 1:10, resolution = 0.6, print.output = 0, save.SNN = TRUE)

# --- 7. Differential Expression Analysis ---
# Identifies genes that are differentially expressed between cell clusters.
# Example: Find markers for cluster 0 compared to all other cells.
# Replace "0" with the actual cluster ID you want to analyze.
cluster_0_markers <- FindMarkers(object = pbmc, ident.1 = 0, min.pct = 0.25)

# To find markers for all clusters, uncomment the following:
# all_markers <- FindAllMarkers(object = pbmc, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)

# --- 8. Save Results (Optional) ---
# Save the processed Seurat object and/or differential expression results.
# saveRDS(pbmc, file = "processed_seurat_object.rds")
# write.csv(cluster_0_markers, "cluster_0_markers.csv")
# write.csv(all_markers, "all_cluster_markers.csv")
'

Single cell signatures scoring was performed using the FastProject software (v1.1.0)

FastProject v1.1.0

$ Bash example

# Install FastProject (if not already installed)
# pip install fastproject

# Example command for single cell signature scoring with FastProject v1.1.0
# This is a generic example; the description does not give input files, output directory, or other parameters.
# FastProject predates the AnnData (.h5ad) format and takes a tab-delimited text expression matrix
# (genes x cells) as a positional argument.
# Replace input_expression_matrix.txt with your actual single-cell expression matrix.
# Replace gene_signatures.gmt with your actual gene signature file (e.g., in GMT format).
# Replace output_directory with your desired output location.
# Confirm the option names with `fastproject -h` for your installed version.
fastproject input_expression_matrix.txt \
  -s gene_signatures.gmt \
  -o output_directory
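
Signature scoring needs a gene signature file, which the description does not supply. As a rough sketch of the GMT layout mentioned above, the snippet below writes a small placeholder file and checks its structure. GMT is tab-delimited with one gene set per line: a set name, a description, then the member genes. The signature names, the gene names (`GeneA` and so on), the file name, and the function `validate_gmt` are all placeholders, not signatures used in the study.

```shell
# Hypothetical helper: check that every line of a GMT file has a name,
# a description, and at least one gene (three or more tab-separated fields).
validate_gmt() {
    awk -F '\t' 'NF < 3 { printf "line %d has only %d field(s)\n", NR, NF; bad = 1 }
                 END { exit bad }' "$1"
}

# Write a two-signature placeholder file; the gene names are illustrative only.
printf 'SIGNATURE_A\tplaceholder description\tGeneA\tGeneB\tGeneC\n'  > example_signatures.gmt
printf 'SIGNATURE_B\tplaceholder description\tGeneD\tGeneE\n'        >> example_signatures.gmt

validate_gmt example_signatures.gmt && echo "example_signatures.gmt looks well-formed"
```

For a mouse dataset such as this one (mm10), the gene identifiers in the signature file need to match the identifiers used in the expression matrix.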

Raw Source Text

Sample demultiplexing, barcode processing, alignment, filtering, UMI counting, and aggregation of multiple sequencing runs were performed using the Cell Ranger analysis pipeline (v1.2)
Data normalization, dimension reduction, and differential expression analysis were performed using the Seurat R package (v2.2.0)
Single cell signatures scoring was performed using the FastProject software (v1.1.0)
Genome_build: mm10
Supplementary_files_format_and_content: matrix.mtx file is count matrix in sparse matrix format; barcodes.tsv is list of barcodes for each sample; genes.tsv is list of genes detected in samples
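
Before loading the three supplementary files into an analysis package, it can help to confirm that they agree with each other. The sketch below assumes the files have been decompressed and placed in one directory under exactly the names `matrix.mtx`, `genes.tsv`, and `barcodes.tsv` (files downloaded from GEO may be gzipped or carry a sample prefix). It also assumes the usual 10x layout in which matrix rows are genes and columns are barcodes. The function name `check_10x_dir` is hypothetical.

```shell
# Hypothetical helper: compare the dimensions declared in matrix.mtx with the
# number of lines in genes.tsv and barcodes.tsv.
check_10x_dir() {
    dir="$1"
    for f in matrix.mtx genes.tsv barcodes.tsv; do
        [ -f "${dir}/${f}" ] || { echo "missing ${f}" >&2; return 1; }
    done
    # In MatrixMarket format the first non-comment line is "<rows> <columns> <entries>".
    mtx_rows=$(awk '!/^%/ { print $1; exit }' "${dir}/matrix.mtx")
    mtx_cols=$(awk '!/^%/ { print $2; exit }' "${dir}/matrix.mtx")
    n_genes=$(awk 'END { print NR }' "${dir}/genes.tsv")
    n_barcodes=$(awk 'END { print NR }' "${dir}/barcodes.tsv")
    echo "genes: matrix.mtx=${mtx_rows} genes.tsv=${n_genes}"
    echo "barcodes: matrix.mtx=${mtx_cols} barcodes.tsv=${n_barcodes}"
    # Succeed only if both dimensions match.
    [ "${mtx_rows}" = "${n_genes}" ] && [ "${mtx_cols}" = "${n_barcodes}" ]
}
```

Calling `check_10x_dir /path/to/matrix_directory` prints both pairs of counts and returns a non-zero status if either pair disagrees or a file is missing.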
