GSE131847 Processing Pipeline

RNA-Seq, 5 steps

Publication

Heterogenous Populations of Tissue-Resident CD8+ T Cells Are Generated in Response to Infection and Malignancy.

Immunity (2020) — PMID 32433949

Dataset

GSE131847

Molecular determinants and heterogeneity of circulating and tissue-resident memory CD8+ T lymphocytes revealed by single-cell RNA sequencing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Data were processed with Cell Ranger 2.1.0 using default parameters, with mm10 as the reference genome.

Cell Ranger v2.1.0

$ Bash example

# Cell Ranger is a proprietary software from 10x Genomics.
# Download and installation instructions are available on the 10x Genomics website:
# https://www.10xgenomics.com/support/software/cell-ranger/downloads
# Ensure cellranger 2.1.0 is in your PATH.

# Download pre-built mm10 reference from 10x Genomics (or build your own using cellranger mkref):
# https://www.10xgenomics.com/support/software/cell-ranger/downloads/latest
REFERENCE_PATH="/path/to/cellranger_ref/mm10"

# Define input and output paths
SAMPLE_ID="my_sample"
FASTQS_DIR="/path/to/fastq_files"
OUTPUT_DIR="/path/to/output_directory"

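# Run cellranger count with its default analysis parameters.
# Replace SAMPLE_ID, FASTQS_DIR, OUTPUT_DIR, and REFERENCE_PATH above with actual values.
# cellranger count writes its results to ./${SAMPLE_ID} in the current directory,
# so change into OUTPUT_DIR first.
mkdir -p "${OUTPUT_DIR}" && cd "${OUTPUT_DIR}"

# --sample must match the sample-name prefix of the FASTQ files; it is set to
# SAMPLE_ID here for convenience only.
# --localcores and --localmem (in GB) only cap resource usage. They are not part
# of the source description and can be adjusted or omitted.
cellranger count \
    --id="${SAMPLE_ID}" \
    --transcriptome="${REFERENCE_PATH}" \
    --fastqs="${FASTQS_DIR}" \
    --sample="${SAMPLE_ID}" \
    --localcores=8 \
    --localmem=64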

Raw cell reads were then loaded into R using the cellrangerRkit package.

R with cellrangerRkit (versions not stated in the source; inferred with models/gemini-2.5-flash)

$ Bash example

# Install R if not already installed (e.g., using conda)
# conda create -n r_env r-base -y
# conda activate r_env

# Install the 'cellrangerRkit' package.
# Note: 'cellrangerRkit' is the Cell Ranger R Kit from 10x Genomics. It is not
# distributed through CRAN or Bioconductor and is no longer actively maintained,
# so it has to be installed from an archived source that you trust.
# Placeholder for a source-based installation (the repository name is NOT a real location):
# R -e 'install.packages("devtools")'
# R -e 'devtools::install_github("your_organization/cellrangerRkit")' # Replace with the actual source

# Create an R script to load the Cell Ranger output
cat << 'EOF' > load_cellranger_data.R
# Load the cellrangerRkit package
library(cellrangerRkit)

# Path to the Cell Ranger pipestance, read from an environment variable.
# The pipestance is the directory created by `cellranger count --id=...` (it contains 'outs/').
# Cell Ranger 2.x writes the matrices under outs/filtered_gene_bc_matrices/<genome>/
# as 'matrix.mtx', 'genes.tsv', and 'barcodes.tsv'.
cellranger_output_dir <- Sys.getenv("CELLRANGER_OUTPUT_DIR")

if (cellranger_output_dir == "") {
    stop("CELLRANGER_OUTPUT_DIR environment variable is not set. Please provide the path to the Cell Ranger output directory.")
}

message("Loading raw cell-reads data from: ", cellranger_output_dir)

# Load the gene-barcode matrix with load_cellranger_matrix(), the loader described
# in the cellrangerRkit vignette. Verify the function and its arguments against the
# version you have installed. The source does not state whether the filtered or the
# raw (unfiltered) barcode matrix was loaded.
cell_reads_data <- load_cellranger_matrix(cellranger_output_dir)

# Further processing or saving the loaded data can be added here
# For example, converting to a Seurat object, or saving to an RDS file
# saveRDS(cell_reads_data, file = "loaded_cell_reads.rds")

message("Raw cell-reads data loaded successfully into R.")
# You can inspect the loaded data, e.g.,
# print(cell_reads_data)
EOF

# Set the environment variable for the Cell Ranger output directory
# Replace 'path/to/your/cellranger/output' with the actual pipestance path (the directory containing 'outs/')
export CELLRANGER_OUTPUT_DIR="path/to/your/cellranger/output"

# Execute the R script
Rscript load_cellranger_data.R
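
To make the loading step concrete, the following is a minimal sketch of what reading a Cell Ranger gene-barcode matrix involves, written with only the Python standard library. It is not the cellrangerRkit code path. It assumes the Matrix Market coordinate layout that Cell Ranger 2.x writes (matrix.mtx with genes as rows and barcodes as columns, plus genes.tsv and barcodes.tsv); the function name read_mex is hypothetical.

```python
import os


def read_mex(matrix_dir):
    """Return {barcode: {gene_symbol: umi_count}} from a Cell Ranger 2.x matrix directory.

    Assumes matrix.mtx, genes.tsv (gene_id<TAB>gene_symbol) and barcodes.tsv
    are present in matrix_dir.
    """
    with open(os.path.join(matrix_dir, "genes.tsv")) as fh:
        genes = [line.rstrip("\n").split("\t")[1] for line in fh if line.strip()]
    with open(os.path.join(matrix_dir, "barcodes.tsv")) as fh:
        barcodes = [line.strip() for line in fh if line.strip()]

    # Start every barcode with an empty profile so cells without counts are kept.
    counts = {barcode: {} for barcode in barcodes}

    with open(os.path.join(matrix_dir, "matrix.mtx")) as fh:
        # Skip the %%MatrixMarket banner and any other % comment lines.
        data_lines = (line for line in fh if line.strip() and not line.startswith("%"))
        n_rows, n_cols, _n_entries = map(int, next(data_lines).split())
        if (n_rows, n_cols) != (len(genes), len(barcodes)):
            raise ValueError("matrix dimensions do not match genes.tsv/barcodes.tsv")
        for line in data_lines:
            row, col, value = line.split()
            gene = genes[int(row) - 1]        # Matrix Market indices are 1-based
            barcode = barcodes[int(col) - 1]
            cell = counts[barcode]
            # Sum in case two gene IDs share one symbol.
            cell[gene] = cell.get(gene, 0) + int(value)
    return counts
```

Calling read_mex on the outs/filtered_gene_bc_matrices/mm10 directory of a run would return one dictionary of UMI counts per barcode. For real datasets a sparse-matrix reader is a better fit, since this dictionary form is only meant to show the file layout.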


The scRNA-seq dataset was then further filtered on the number of genes detected per cell and on the ratio of mitochondrial gene counts to total counts.

Seurat v5.0.0 (tool and version inferred with models/gemini-2.5-flash)

$ Bash example

cat << 'EOF' > filter_scrnaseq.R
library(Seurat)

# Load your Seurat object (replace 'input_seurat_object.rds' with your actual input path)
# This script assumes you have a Seurat object saved as an RDS file.
# If starting from raw counts (e.g., Cell Ranger output), you would first create the Seurat object:
# counts <- Read10X(data.dir = "path/to/10x/data/")
# seurat_obj <- CreateSeuratObject(counts = counts, project = "scRNAseq_analysis")
seurat_obj <- readRDS("input_seurat_object.rds")

# Calculate mitochondrial percentage
# This dataset is mouse (mm10), where mitochondrial gene symbols typically start with "mt-"
# (human annotations use "MT-"). Adjust the pattern if your annotation differs.
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^mt-")

# Filter cells on gene number (nFeature_RNA), total UMIs (nCount_RNA), and
# mitochondrial percentage (percent.mt), using the thresholds stated in the source:
# - nFeature_RNA > 400
# - nCount_RNA > 0
# - percent.mt between 0.5 and 20. The source does not say whether the bounds are
#   inclusive; inclusive bounds are assumed here.
# Before reusing these thresholds on other data, inspect violin or scatter plots
# of nFeature_RNA, nCount_RNA, and percent.mt.
seurat_obj_filtered <- subset(seurat_obj, subset = nFeature_RNA > 400 & nCount_RNA > 0 & percent.mt >= 0.5 & percent.mt <= 20)

# Save the filtered Seurat object
saveRDS(seurat_obj_filtered, "output_seurat_object_filtered.rds")

# Optional: Print summary of filtering results
cat("Original number of cells: ", ncol(seurat_obj), "\n")
cat("Filtered number of cells: ", ncol(seurat_obj_filtered), "\n")
EOF

# Install Seurat (if not already installed)
# R -q -e "install.packages('Seurat')"
# R -q -e "install.packages('SeuratObject')"

# Execute the R script
Rscript filter_scrnaseq.R
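
The two quantities this step filters on can be computed per cell from a row of the cell-by-gene UMI table. The sketch below is illustrative only: the function name qc_metrics and the toy counts are hypothetical, and the "mt-" prefix for mouse mitochondrial gene symbols is an assumption about the annotation rather than something stated in the source.

```python
def qc_metrics(gene_counts, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a {gene_symbol: umi_count} mapping."""
    # Number of genes with at least one UMI in this cell.
    n_genes = sum(1 for count in gene_counts.values() if count > 0)
    # Total UMIs across all genes.
    total_umis = sum(gene_counts.values())
    # UMIs assigned to mitochondrial genes (assumed prefix, see above).
    mito_umis = sum(
        count for gene, count in gene_counts.items() if gene.startswith(mito_prefix)
    )
    # Ratio of mitochondrial to total counts, as a percentage.
    pct_mito = 100.0 * mito_umis / total_umis if total_umis > 0 else 0.0
    return {"n_genes": n_genes, "total_umis": total_umis, "pct_mito": pct_mito}


# Toy cell: 3 detected genes, 10 UMIs in total, 1 of them mitochondrial.
cell = {"Cd8a": 6, "Itgae": 3, "mt-Co1": 1, "Klrg1": 0}
print(qc_metrics(cell))
```

Cells with no UMIs get a mitochondrial percentage of 0 here to avoid a division by zero; they would be removed by a total-UMI filter anyway.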


Only cells with > 400 genes, UMI > 0, and 0.5% to 20% of their UMIs mapping to mitochondrial genes were kept for downstream analysis.

Scanpy v1.9.1 (inferred with models/gemini-2.5-flash; the source describes this filtering in R)

$ Bash example

# Install Scanpy if not already installed
# pip install scanpy

# Create a Python script to perform cell filtering based on QC metrics
cat << 'EOF' > filter_cells_qc.py
import scanpy as sc
import sys

# Define input and output file paths
input_h5ad = sys.argv[1]
output_h5ad = sys.argv[2]

# Load the AnnData object
adata = sc.read_h5ad(input_h5ad)

print(f"Initial number of cells: {adata.n_obs}")

# Compute QC metrics if they are not already stored in adata.obs.
# Mouse mitochondrial gene symbols are assumed to start with "mt-".
if 'pct_counts_mt' not in adata.obs:
    adata.var['mt'] = adata.var_names.str.startswith('mt-')
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Apply filtering criteria:
# 1. Number of genes detected (n_genes_by_counts) > 400
# 2. Total UMIs (total_counts) > 0
# 3. Percentage of mitochondrial UMIs (pct_counts_mt) between 0.5% and 20%
#    Note: the source does not say whether the bounds are inclusive;
#    inclusive bounds are assumed here. Adjust if needed.
filtered_adata = adata[
    (adata.obs['n_genes_by_counts'] > 400) &
    (adata.obs['total_counts'] > 0) &
    (adata.obs['pct_counts_mt'] >= 0.5) &
    (adata.obs['pct_counts_mt'] <= 20.0)
].copy() # Use .copy() to ensure a new AnnData object is created

print(f"Number of cells after filtering: {filtered_adata.n_obs}")

# Save the filtered AnnData object
filtered_adata.write(output_h5ad)
EOF

# Execute the Python script with placeholder input and output files
# Replace 'input_raw_cells.h5ad' with your actual input AnnData file
# Replace 'output_filtered_cells.h5ad' with your desired output file name
python filter_cells_qc.py input_raw_cells.h5ad output_filtered_cells.h5ad
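
The "0.5% ~ 20%" range leaves open whether the bounds themselves are included. The sketch below applies the three stated thresholds to pre-computed metrics and makes that choice explicit. The function keep_cell, its inclusive flag, and the toy cells are hypothetical illustrations, not part of the original pipeline.

```python
def keep_cell(n_genes, total_umis, pct_mito,
              min_genes=400, mito_range=(0.5, 20.0), inclusive=True):
    """Return True if a cell passes the thresholds given in the source text."""
    low, high = mito_range
    if inclusive:
        mito_ok = low <= pct_mito <= high
    else:
        mito_ok = low < pct_mito < high
    # "> 400 genes" and "UMI > 0" are strict inequalities in the source.
    return n_genes > min_genes and total_umis > 0 and mito_ok


# Toy metrics (made up for illustration).
cells = {
    "good":      {"n_genes": 1200, "total_umis": 4000, "pct_mito": 3.2},
    "few_genes": {"n_genes": 400,  "total_umis": 900,  "pct_mito": 2.0},
    "high_mito": {"n_genes": 900,  "total_umis": 2500, "pct_mito": 35.0},
    "low_mito":  {"n_genes": 800,  "total_umis": 2000, "pct_mito": 0.1},
    "edge":      {"n_genes": 650,  "total_umis": 1800, "pct_mito": 20.0},
}

kept = [name for name, metrics in cells.items() if keep_cell(**metrics)]
print(kept)
```

With inclusive bounds the "edge" cell at exactly 20% is kept; passing inclusive=False would drop it. The lower bound of 0.5% removes cells with almost no mitochondrial signal, which is less common than filtering on an upper bound alone.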


For the scRNA-seq libraries, the first 16 bp of R1 are the cell barcode and the next 10 bp (positions 17-26) are the UMI.

UMI-tools v1.1.2 (illustrative; the source does not name a tool for this step)

$ Bash example

# Install UMI-tools if not already installed
# conda install -c conda-forge -c bioconda umi_tools

# Note: Cell Ranger handles barcode and UMI parsing itself. This standalone
# extraction only illustrates the read layout described above.

# Define input and output file names
R1_IN="R1.fastq.gz"
R2_IN="R2.fastq.gz"
R1_OUT="R1_extracted.fastq.gz"
R2_OUT="R2_extracted.fastq.gz"

# Define the barcode and UMI pattern for R1
# In UMI-tools patterns, C marks a cell barcode base and N marks a UMI base.
# 16 x C followed by 10 x N means the cell barcode is the first 16 bp of R1,
# followed immediately by the 10 bp UMI.
BC_PATTERN="CCCCCCCCCCCCCCCCNNNNNNNNNN"

# Extract the cell barcode and UMI from R1 and add them to the read names of both R1 and R2
umi_tools extract \
    --bc-pattern="${BC_PATTERN}" \
    --stdin "${R1_IN}" \
    --stdout "${R1_OUT}" \
    --read2-in "${R2_IN}" \
    --read2-out "${R2_OUT}"
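
The read layout described above amounts to two fixed slices of the R1 sequence. A minimal sketch, assuming only the positions given in the source (the function name split_r1 and the example read are hypothetical):

```python
CB_LEN = 16   # cell barcode: bases 1-16 of R1
UMI_LEN = 10  # UMI: bases 17-26 of R1


def split_r1(r1_seq):
    """Split an R1 sequence into (cell_barcode, umi)."""
    if len(r1_seq) < CB_LEN + UMI_LEN:
        raise ValueError("R1 is shorter than barcode + UMI (26 bp)")
    cell_barcode = r1_seq[:CB_LEN]
    umi = r1_seq[CB_LEN:CB_LEN + UMI_LEN]
    return cell_barcode, umi


# Made-up read: 16 bp barcode, 10 bp UMI, then trailing bases that are ignored.
r1 = "AAACCTGAGAAACCAT" + "GGTTCAGTAC" + "TTTT"
cell_barcode, umi = split_r1(r1)
print(cell_barcode, umi)
```

Any bases beyond position 26 of R1 are ignored here; the transcript sequence itself comes from R2.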


Tools Used

R scRNA-seq

Raw Source Text

data were processed by cellranger 2.1.0 with default parameter, using mm10 as reference.
Raw cell-reads were then loaded to R using the cellrangerRkit package. The scRNA-seq dataset was then further filtered based on gene numbers and mitochondria gene counts total counts ratio. Only cells with > 400 genes,UMI > 0,and 0.5% ~ 20% of their UMIs mappingto mitochondria genes were kept for downstream analysis.
For the scRNA-seq, the first 16 bp of R1 is the cell barcode and the next 10bp (17-26bp) is the UMI.
Genome_build: mm10
Supplementary_files_format_and_content: cell-gene UMI table: UMI table with each row represent a cell and each column represent a gene.
