GSE131847 Processing Pipeline
RNA-Seq
5 steps
Publication
Heterogenous Populations of Tissue-Resident CD8+ T Cells Are Generated in Response to Infection and Malignancy. Immunity (2020), PMID 32433949
Dataset
GSE131847: Molecular determinants and heterogeneity of circulating and tissue-resident memory CD8+ T lymphocytes revealed by single-cell RNA sequencing
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Data were processed with Cell Ranger 2.1.0 using default parameters and mm10 as the reference.
Cell Ranger v2.1.0
$ Bash example
# Cell Ranger is proprietary software from 10x Genomics.
# Download and installation instructions are available on the 10x Genomics website:
# https://www.10xgenomics.com/support/software/cell-ranger/downloads
# Ensure cellranger 2.1.0 is in your PATH.

# Download a pre-built mm10 reference from 10x Genomics (or build your own using cellranger mkref):
# https://www.10xgenomics.com/support/software/cell-ranger/downloads/latest
REFERENCE_PATH="/path/to/cellranger_ref/mm10"

# Define input and output paths (placeholders; replace with actual values)
SAMPLE_ID="my_sample"
FASTQS_DIR="/path/to/fastq_files"
OUTPUT_DIR="/path/to/output_directory"

# cellranger count writes its results to ./${SAMPLE_ID} in the current working
# directory, so run it from the intended output location.
mkdir -p "${OUTPUT_DIR}"
cd "${OUTPUT_DIR}"

# Run cellranger count with default analysis parameters.
# --localcores and --localmem only cap resource usage (cores, GB of RAM);
# adjust them to your machine.
cellranger count \
  --id="${SAMPLE_ID}" \
  --transcriptome="${REFERENCE_PATH}" \
  --fastqs="${FASTQS_DIR}" \
  --sample="${SAMPLE_ID}" \
  --localcores=8 \
  --localmem=64
2
Raw cell reads were then loaded into R using the cellrangerRkit package.
R, cellrangerRkit (versions not stated in the source; tools inferred from the description with models/gemini-2.5-flash)
$ Bash example
# Install R if not already installed (e.g., using conda)
# conda create -n r_env r-base -y
# conda activate r_env

# Install the 'cellrangerRkit' package.
# Note: cellrangerRkit was distributed by 10x Genomics, not through CRAN or
# Bioconductor, so install.packages() will not find it. Install it from an
# archived copy of the package; the repository below is a placeholder.
# R -e 'install.packages("devtools")'
# R -e 'devtools::install_github("your_organization/cellrangerRkit")'  # Replace with the actual source

# Create an R script to load the Cell Ranger output
cat << 'EOF' > load_cellranger_data.R
# Load the cellrangerRkit package
library(cellrangerRkit)

# Path to the Cell Ranger pipestance: the top-level directory written by
# 'cellranger count', i.e. the one containing 'outs/'. With Cell Ranger 2.x the
# matrices sit under outs/{raw,filtered}_gene_bc_matrices/<genome>/ as
# 'matrix.mtx', 'barcodes.tsv', and 'genes.tsv'.
pipestance_path <- Sys.getenv("CELLRANGER_OUTPUT_DIR")
if (pipestance_path == "") {
  stop("CELLRANGER_OUTPUT_DIR environment variable is not set. Please provide the path to the Cell Ranger output directory.")
}

message(paste("Loading gene-barcode matrix from:", pipestance_path))

# load_cellranger_matrix() reads the gene-barcode matrix from a pipestance.
# It is expected to return the barcode-filtered matrix by default; check
# ?load_cellranger_matrix in your installed version for how to request the
# raw matrix instead.
cell_reads_data <- load_cellranger_matrix(pipestance_path)

# Further processing or saving of the loaded data can be added here,
# for example saving to an RDS file:
# saveRDS(cell_reads_data, file = "loaded_cell_reads.rds")

message("Gene-barcode matrix loaded successfully into R.")

# You can inspect the loaded data, e.g.,
# print(cell_reads_data)
EOF

# Set the environment variable to the Cell Ranger pipestance directory
# (the directory that contains 'outs/'); replace the placeholder path.
export CELLRANGER_OUTPUT_DIR="/path/to/your/cellranger/output"

# Execute the R script
Rscript load_cellranger_data.R
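For readers who want to inspect the Cell Ranger matrix without cellrangerRkit, the sketch below parses Matrix Market coordinate text (the format of `matrix.mtx`) using only the Python standard library. It is an illustration, not the authors' method: `read_mtx_triplets` is a hypothetical helper name, and the inline example data are made up to show the layout.

```python
def read_mtx_triplets(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries maps (row, col) -> count using 0-based indices; in Cell Ranger
    output, rows are genes and columns are cell barcodes.
    """
    shape = None
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        fields = line.split()
        if shape is None:
            # First data line: number of rows, columns, and non-zero entries
            shape = (int(fields[0]), int(fields[1]))
            continue
        row, col, value = int(fields[0]), int(fields[1]), int(fields[2])
        entries[(row - 1, col - 1)] = value  # the file uses 1-based indices
    if shape is None:
        raise ValueError("no size line found")
    return shape[0], shape[1], entries


# Made-up example: 3 genes x 2 cells with 3 non-zero counts
example = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 5
3 1 2
2 2 7
""".splitlines()

n_genes, n_cells, counts = read_mtx_triplets(example)
print(n_genes, n_cells, counts[(2, 0)])
```

For a real file, pass an open file handle instead of the example lines, for instance `with open("matrix.mtx") as fh: read_mtx_triplets(fh)`. A dictionary of triplets is only practical for small inspections; a sparse-matrix library is the better choice for full datasets.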
3
The scRNA-seq dataset was then further filtered based on the number of detected genes and the ratio of mitochondrial gene counts to total counts.
$ Bash example
cat << 'EOF' > filter_scrnaseq.R
# Seurat is used here for illustration; the source does not name the tool used for filtering.
library(Seurat)

# Load your Seurat object (replace 'input_seurat_object.rds' with your actual input path).
# This script assumes you have a Seurat object saved as an RDS file.
# If starting from raw counts (e.g., 10x Genomics output), first create the Seurat object:
# counts <- Read10X(data.dir = "path/to/10x/data/")
# seurat_obj <- CreateSeuratObject(counts = counts, project = "scRNAseq_analysis")
seurat_obj <- readRDS("input_seurat_object.rds")

# Calculate the mitochondrial percentage.
# The pattern depends on species and annotation: "^MT-" for human, "^mt-" for mouse.
# This dataset is mouse (mm10), so "^mt-" is used.
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^mt-")

# Filter cells on gene numbers (nFeature_RNA), total UMIs (nCount_RNA), and
# mitochondrial percentage (percent.mt). The thresholds follow the values stated
# for this dataset: > 400 genes, UMI > 0, and 0.5% to 20% mitochondrial UMIs.
# The source does not say whether the mitochondrial bounds are inclusive; strict
# inequalities are used here. For other datasets, choose thresholds by inspecting
# violin or scatter plots of nFeature_RNA, nCount_RNA, and percent.mt.
seurat_obj_filtered <- subset(
  seurat_obj,
  subset = nFeature_RNA > 400 & nCount_RNA > 0 & percent.mt > 0.5 & percent.mt < 20
)

# Save the filtered Seurat object
saveRDS(seurat_obj_filtered, "output_seurat_object_filtered.rds")

# Optional: print a summary of the filtering results
cat("Original number of cells: ", ncol(seurat_obj), "\n")
cat("Filtered number of cells: ", ncol(seurat_obj_filtered), "\n")
EOF

# Install Seurat (if not already installed)
# R -q -e "install.packages('Seurat')"
# R -q -e "install.packages('SeuratObject')"

# Execute the R script
Rscript filter_scrnaseq.R
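The two quantities this step filters on can be made concrete with a small worked example. The sketch below computes, for a single cell, the number of detected genes and the mitochondrial percentage (100 × mitochondrial UMIs / total UMIs). The function name `qc_metrics`, the `mt-` prefix (an assumption matching mouse gene symbols), and the example counts are illustrative and not taken from the source.

```python
def qc_metrics(cell_counts, mito_prefix="mt-"):
    """Return (n_genes, total_umis, pct_mito) for one cell.

    cell_counts maps gene symbol -> UMI count. mito_prefix="mt-" is an
    assumption that matches mouse (mm10) gene symbols such as mt-Nd1.
    """
    # A gene counts as detected when it has at least one UMI
    n_genes = sum(1 for count in cell_counts.values() if count > 0)
    total_umis = sum(cell_counts.values())
    mito_umis = sum(
        count for gene, count in cell_counts.items()
        if gene.startswith(mito_prefix)
    )
    # Guard against division by zero for empty cells
    pct_mito = 100.0 * mito_umis / total_umis if total_umis > 0 else 0.0
    return n_genes, total_umis, pct_mito


# Made-up counts for one cell: 4 of 20 UMIs come from mitochondrial genes
cell = {"Cd8a": 6, "Itgae": 0, "mt-Nd1": 3, "mt-Co1": 1, "Actb": 10}
print(qc_metrics(cell))  # (4, 20, 20.0)
```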
4
Only cells with > 400 detected genes, a UMI count > 0, and 0.5% to 20% of their UMIs mapping to mitochondrial genes were kept for downstream analysis.
$ Bash example
# Install Scanpy if not already installed
# pip install scanpy

# Create a Python script to perform cell filtering based on QC metrics
cat << 'EOF' > filter_cells_qc.py
import sys

import scanpy as sc

# Define input and output file paths
input_h5ad = sys.argv[1]
output_h5ad = sys.argv[2]

# Load the AnnData object
adata = sc.read_h5ad(input_h5ad)
print(f"Initial number of cells: {adata.n_obs}")

# Compute QC metrics if they are not already in adata.obs.
# Mitochondrial genes are flagged by the "mt-" prefix, which assumes mouse
# (mm10) gene symbols in adata.var_names.
if "pct_counts_mt" not in adata.obs:
    adata.var["mt"] = adata.var_names.str.startswith("mt-")
    sc.pp.calculate_qc_metrics(
        adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
    )

# Apply filtering criteria:
# 1. Number of genes detected (n_genes_by_counts) > 400
# 2. Total UMIs (total_counts) > 0
# 3. Percentage of mitochondrial UMIs (pct_counts_mt) between 0.5% and 20%
#    (both bounds exclusive here; the source does not state inclusivity)
keep = (
    (adata.obs["n_genes_by_counts"] > 400)
    & (adata.obs["total_counts"] > 0)
    & (adata.obs["pct_counts_mt"] > 0.5)
    & (adata.obs["pct_counts_mt"] < 20.0)
)
filtered_adata = adata[keep].copy()  # .copy() turns the view into a new AnnData object
print(f"Number of cells after filtering: {filtered_adata.n_obs}")

# Save the filtered AnnData object
filtered_adata.write(output_h5ad)
EOF

# Execute the Python script with placeholder input and output files
# Replace 'input_raw_cells.h5ad' with your actual input AnnData file
# Replace 'output_filtered_cells.h5ad' with your desired output file name
python filter_cells_qc.py input_raw_cells.h5ad output_filtered_cells.h5ad
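When tuning or auditing a filter like this, it helps to see which criterion removes each cell. The following minimal sketch applies the three stated thresholds to precomputed per-cell metrics and reports the failed criteria. The helper name `failed_criteria`, the shortened barcodes, and the metric values are hypothetical, and the strict mitochondrial bounds are an assumption because the source only gives "0.5% ~ 20%".

```python
MIN_GENES = 400       # keep cells with more than 400 detected genes
MIN_PCT_MITO = 0.5    # lower bound on mitochondrial UMI percentage
MAX_PCT_MITO = 20.0   # upper bound on mitochondrial UMI percentage


def failed_criteria(n_genes, total_umis, pct_mito):
    """List the stated criteria a cell fails; an empty list means keep."""
    failed = []
    if not n_genes > MIN_GENES:
        failed.append("genes")
    if not total_umis > 0:
        failed.append("umi")
    # Strict bounds are an assumption; the source does not state inclusivity
    if not MIN_PCT_MITO < pct_mito < MAX_PCT_MITO:
        failed.append("mito")
    return failed


# Made-up metrics: barcode -> (n_genes, total_umis, pct_mito)
cells = {
    "AAAC": (1200, 5000, 3.2),   # passes every criterion
    "AAAG": (350, 900, 4.0),     # too few genes
    "AAAT": (800, 2500, 35.0),   # mitochondrial percentage too high
    "AACA": (950, 3000, 0.1),    # mitochondrial percentage too low
}
kept = [barcode for barcode, m in cells.items() if not failed_criteria(*m)]
print(kept)  # ['AAAC']
```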
5
For the scRNA-seq data, the first 16 bp of R1 is the cell barcode and the next 10 bp (positions 17-26) is the UMI.
$ Bash example
# Cell Ranger parses this read layout itself; a separate extraction like the one
# below is only needed for pipelines that do not use Cell Ranger.

# Install UMI-tools if not already installed
# conda install -c bioconda umi_tools

# Define input and output file names
R1_IN="R1.fastq.gz"
R2_IN="R2.fastq.gz"
R1_OUT="R1_extracted.fastq.gz"
R2_OUT="R2_extracted.fastq.gz"

# Define the barcode and UMI pattern for R1.
# umi_tools uses C for cell-barcode bases and N for UMI bases, so sixteen Cs
# followed by ten Ns means the cell barcode is the first 16 bp of R1 and the
# UMI is the next 10 bp.
BC_PATTERN="CCCCCCCCCCCCCCCCNNNNNNNNNN"

# Extract the cell barcode and UMI from R1 and add them to the read headers of both R1 and R2
umi_tools extract \
  --bc-pattern="${BC_PATTERN}" \
  --stdin "${R1_IN}" \
  --stdout "${R1_OUT}" \
  --read2-in "${R2_IN}" \
  --read2-out "${R2_OUT}"
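The read layout described in this step amounts to two fixed-position slices of R1. The sketch below shows that slicing on a made-up sequence; `split_r1` is a hypothetical helper, and the lengths are the 16 bp and 10 bp stated in the step description.

```python
CB_LEN = 16   # cell barcode: R1 bases 1-16
UMI_LEN = 10  # UMI: R1 bases 17-26


def split_r1(sequence):
    """Return (cell_barcode, umi) from an R1 read sequence."""
    if len(sequence) < CB_LEN + UMI_LEN:
        raise ValueError("R1 is shorter than 26 bp")
    return sequence[:CB_LEN], sequence[CB_LEN:CB_LEN + UMI_LEN]


# Made-up R1 sequence: 16 bp barcode followed by a 10 bp UMI
r1 = "ACGTACGTACGTACGT" + "TTGGCCAATT"
print(split_r1(r1))  # ('ACGTACGTACGTACGT', 'TTGGCCAATT')
```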
Raw Source Text
data were processed by cellranger 2.1.0 with default parameter, using mm10 as reference. Raw cell-reads were then loaded to R using the cellrangerRkit package. The scRNA-seq dataset was then further filtered based on gene numbers and mitochondria gene counts total counts ratio. Only cells with > 400 genes,UMI > 0,and 0.5% ~ 20% of their UMIs mappingto mitochondria genes were kept for downstream analysis. For the scRNA-seq, the first 16 bp of R1 is the cell barcode and the next 10bp (17-26bp) is the UMI.
Genome_build: mm10
Supplementary_files_format_and_content: cell-gene UMI table: UMI table with each row represent a cell and each column represent a gene.
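The supplementary UMI table is described as cells in rows and genes in columns. A minimal sketch of reading such a table with the Python standard library follows. The tab delimiter, the header row of gene names, the cell identifiers in the first column, and the helper name `read_umi_table` are all assumptions, since the source does not describe the file beyond its orientation; the inline data are made up.

```python
import csv
import io


def read_umi_table(handle, delimiter="\t"):
    """Read a cell-by-gene UMI table into {cell: {gene: count}}.

    Assumes a header row of gene names with the first column holding
    cell identifiers; the delimiter is also an assumption.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    genes = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {g: int(v) for g, v in zip(genes, row[1:])}
    return table


# Made-up table: two cells (rows) by three genes (columns)
text = "cell\tCd8a\tItgae\tmt-Nd1\nAAAC\t6\t0\t3\nAAAG\t2\t4\t1\n"
table = read_umi_table(io.StringIO(text))
print(table["AAAG"]["Itgae"])  # 4
```

Many single-cell tools expect genes in rows and cells in columns, so check the orientation a tool expects before passing the table on.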