GSE147081 Processing Pipeline
RNA-Seq
code_examples
5 steps
Publication
Heterogenous Populations of Tissue-Resident CD8<sup>+</sup> T Cells Are Generated in Response to Infection and Malignancy.Immunity (2020) — PMID 32433949
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
data were processed by cellranger 2.1.0 with default parameter, using mm10 as reference.
Cell Ranger v2.1.0$ Bash example
# Install Cell Ranger (example, adjust for specific system and version) # Note: Cell Ranger 2.1.0 is an older version. Ensure compatibility with your system and reference. # wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-2.1.0.tar.gz # tar -xzf cellranger-2.1.0.tar.gz # export PATH=/path/to/cellranger-2.1.0:$PATH # Download or build mm10 reference compatible with Cell Ranger 2.1.0 (example, adjust path) # 10x Genomics typically provides pre-built references. For 2.1.0, this might be refdata-cellranger-mm10-1.2.0 or similar. # Example placeholder for reference path: # REF_PATH="/path/to/cellranger_ref/mm10_compatible_with_2.1.0" # If building, ensure the GTF and FASTA files are appropriate for mm10 and the Cell Ranger version. # cellranger mkref --genome=mm10 --fasta=genome.fa --genes=genes.gtf # Define input and output paths SAMPLE_ID="my_cellranger_run" FASTQ_DIR="/path/to/your/fastq_files" # Directory containing FASTQ files (e.g., from mkfastq output) REFERENCE_PATH="/path/to/cellranger_ref/mm10_2.1.0_compatible" # Path to the Cell Ranger compatible mm10 reference transcriptome # Run cellranger count with default parameters # The description states "default parameter", so we use the standard 'count' command # with essential inputs. Other parameters like --expect-cells, --chemistry, etc., # would use their default values for Cell Ranger 2.1.0. cellranger count \ --id="${SAMPLE_ID}" \ --transcriptome="${REFERENCE_PATH}" \ --fastqs="${FASTQ_DIR}" \ --sample="your_sample_name" # Replace with actual sample name(s) as found in FASTQ file names -
2
Raw cell-reads were then loaded to R using the cellrangerRkit package.
R vNot specified in description (Inferred with models/gemini-2.5-flash)$ Bash example
# Install cellrangerRkit package (if not already installed) # R -e 'install.packages("cellrangerRkit")' # Create a placeholder R script to load Cell Ranger data cat <<EOF > load_cell_reads.R library(cellrangerRkit) # Define the path to the Cell Ranger output directory # This directory should contain files like matrix.mtx, barcodes.tsv, features.tsv # Replace 'path/to/cellranger_output_directory' with the actual path to your Cell Ranger output cellranger_output_dir <- "path/to/cellranger_output_directory" # Load the raw cell-reads matrix # The function load_cellranger_matrix reads the sparse matrix, barcodes, and features cell_reads_data <- load_cellranger_matrix(cellranger_output_dir) # You can now work with 'cell_reads_data' object, e.g., inspect dimensions, save it # For example, to save the loaded data for further R analysis: saveRDS(cell_reads_data, "loaded_cell_reads.rds") message(paste("Successfully loaded data from:", cellranger_output_dir)) message(paste("Dimensions of the loaded matrix:", paste(dim(cell_reads_data@exprs), collapse = " x "))) EOF # Execute the R script Rscript load_cell_reads.R -
3
The scRNA-seq dataset was then further filtered based on gene numbers and mitochondria gene counts tototal counts ratio.
$ Bash example
# Install Seurat if not already installed # R -e 'install.packages("Seurat")' # R -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("Seurat")' # Example R script for filtering scRNA-seq data using Seurat Rscript -e ' library(Seurat) # Load your Seurat object (replace with your actual input file) # Assuming the input is a Seurat object saved as an .rds file seurat_object <- readRDS("input_seurat_object.rds") # Calculate mitochondrial percentage (if not already present) # This assumes mitochondrial genes are prefixed with "MT-" for human or "mt-" for mouse. # Adjust the pattern based on your organism. seurat_object[["percent.mt"]] <- PercentageFeatureSet(seurat_object, pattern = "^MT-") # Filter cells based on: # 1. Number of genes detected (nFeature_RNA): typically > 200 and < 2500-5000 # 2. Mitochondrial percentage (percent.mt): typically < 5-10% # These thresholds are common starting points and may need adjustment based on data quality. # The description implies filtering based on "gene numbers" (nFeature_RNA) and "mitochondria gene counts to total counts ratio" (percent.mt). filtered_seurat_object <- subset(seurat_object, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) # Save the filtered Seurat object saveRDS(filtered_seurat_object, file = "filtered_seurat_object.rds") print(paste("Original number of cells:", ncol(seurat_object))) print(paste("Filtered number of cells:", ncol(filtered_seurat_object))) ' -
4
Only cells with > 400 genes,UMI > 0,and 0.5% ~ 20% of their UMIs mappingto mitochondria genes were kept for downstream analysis.
$ Bash example
#!/bin/bash # Define input and output files INPUT_H5AD="raw_adata.h5ad" # Placeholder for input AnnData object (e.g., from cellranger aggr or similar) OUTPUT_H5AD="filtered_adata.h5ad" # Placeholder for output AnnData object MT_GENE_PREFIX="MT-" # Common prefix for mitochondrial genes (e.g., "MT-" for human GRCh38, "mt-" for mouse GRCm38) # Check if input file exists if [ ! -f "$INPUT_H5AD" ]; then echo "Error: Input file $INPUT_H5AD not found." >&2 exit 1 fi # Install scanpy if not already installed (commented out) # conda create -n scanpy_env python=3.9 -y # conda activate scanpy_env # pip install scanpy anndata pandas numpy # Execute Python script for filtering python -c " import scanpy as sc import pandas as pd import numpy as np import sys input_h5ad = \"$INPUT_H5AD\" output_h5ad = \"$OUTPUT_H5AD\" mt_gene_prefix = \"$MT_GENE_PREFIX\" try: adata = sc.read_h5ad(input_h5ad) except FileNotFoundError: print(f\"Error: Input file {input_h5ad} not found.\", file=sys.stderr) sys.exit(1) except Exception as e: print(f\"Error loading AnnData object: {e}\", file=sys.stderr) sys.exit(1) print(f\"Initial number of cells: {adata.n_obs}\") # Identify mitochondrial genes based on prefix in adata.var_names # This step is crucial and depends on the reference genome annotation used during quantification. # For human GRCh38, 'MT-' is a common prefix. Adjust 'mt_gene_prefix' if needed for other species/annotations. adata.var['mt'] = adata.var_names.str.startswith(mt_gene_prefix) # Calculate QC metrics including mitochondrial percentage if not already present # 'qc_vars=['mt']' ensures 'pct_counts_mt' is calculated based on the 'mt' column in adata.var # 'n_genes_by_counts' and 'total_counts' (UMIs) are also calculated by default. sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True) # Apply filtering criteria: # 1. Cells with > 400 genes detected (n_genes_by_counts) # 2. Cells with > 0 UMIs (total_counts) # 3. Cells with 0.5% ~ 20% of UMIs mapping to mitochondrial genes (pct_counts_mt) # Filter by number of genes adata = adata[adata.obs['n_genes_by_counts'] > 400, :].copy() # Filter by total UMIs (total_counts) adata = adata[adata.obs['total_counts'] > 0, :].copy() # Filter by mitochondrial percentage adata = adata[adata.obs['pct_counts_mt'] > 0.5, :].copy() adata = adata[adata.obs['pct_counts_mt'] < 20.0, :].copy() print(f\"Number of cells after filtering: {adata.n_obs}\") try: adata.write(output_h5ad) except Exception as e: print(f\"Error saving filtered AnnData object: {e}\", file=sys.stderr) sys.exit(1) print(f\"Filtered data saved to {output_h5ad}\") " -
5
For the scRNA-seq, the first 16 bp of R1 is the cell barcode and the next 10bp (17-26bp) is the UMI.
$ Bash example
# Install umi_tools if not already installed # conda install -c bioconda umi_tools # Example usage: Replace R1.fastq.gz and R2.fastq.gz with actual input files # and R1_extracted.fastq.gz, R2_extracted.fastq.gz with desired output files. # The cell barcode (C{16}) and UMI (N{10}) are extracted from R1. umi_tools extract \ --bc-pattern=C{16}N{10} \ --extract-method=regex \ --stdin=R1.fastq.gz \ --stdout=R1_extracted.fastq.gz \ --read2-in=R2.fastq.gz \ --read2-out=R2_extracted.fastq.gz \ --log=umi_extract.log
Raw Source Text
data were processed by cellranger 2.1.0 with default parameter, using mm10 as reference. Raw cell-reads were then loaded to R using the cellrangerRkit package. The scRNA-seq dataset was then further filtered based on gene numbers and mitochondria gene counts tototal counts ratio. Only cells with > 400 genes,UMI > 0,and 0.5% ~ 20% of their UMIs mappingto mitochondria genes were kept for downstream analysis. For the scRNA-seq, the first 16 bp of R1 is the cell barcode and the next 10bp (17-26bp) is the UMI. Genome_build: mm10 Supplementary_files_format_and_content: CellRanger output of filtered gene matrix with barcodes & features files