GSE213470 Processing Pipeline
RNA-Seq
code_examples
2 steps
Publication
Early transcriptional and epigenetic divergence of CD8+ T cells responding to acute versus chronic infection.PLoS biology (2023) — PMID 36716323
Dataset
GSE213470Gene expression profiles at the single cell level of CD8+ T cells from the spleens of LCMV-Armstrong or LCMV-Clone 13 infected mice
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
Generate Jupyter Notebook-
1
Alignment, cell barcode processing, umis processing, abundance measurements: cellranger count (version 2.1.1)
Cell Ranger v2.1.1$ Bash example
# Download and install Cell Ranger from 10x Genomics website (version 2.1.1 or compatible) # https://www.10xgenomics.com/support/software/cell-ranger/downloads # Define variables for clarity OUTPUT_ID="my_cellranger_run" TRANSCRIPTOME_REF="/path/to/refdata-gex-GRCh38-2020-A" # Placeholder: Use a Cell Ranger-compatible transcriptome reference (e.g., human GRCh38) FASTQ_DIR="/path/to/your/fastq_files" # Directory containing your FASTQ files SAMPLE_NAME="my_sample" # Name of your sample (matches FASTQ file prefix, e.g., my_sample_S1_L001_R1_001.fastq.gz) EXPECTED_CELLS=3000 # Optional: Adjust based on your experiment's expected cell count # Execute cellranger count cellranger count \ --id="${OUTPUT_ID}" \ --transcriptome="${TRANSCRIPTOME_REF}" \ --fastqs="${FASTQ_DIR}" \ --sample="${SAMPLE_NAME}" \ --expect-cells="${EXPECTED_CELLS}" \ --localcores=8 \ --localmem=64 -
2
Cells and genes filtering, dimension reduction: Scanpy (version 1.7.2)
$ Bash example
# Install Scanpy and its dependencies (if not already installed) # conda create -n scanpy_env python=3.8 # conda activate scanpy_env # conda install -c conda-forge scanpy python-igraph leidenalg # Create a Python script for Scanpy operations cat << 'EOF' > run_scanpy_processing.py import scanpy as sc import numpy as np import pandas as pd # Set verbosity for scanpy sc.settings.verbosity = 3 # errors (0), warnings (1), info (2), hints (3) sc.logging.print_header() sc.settings.set_figure_params(dpi=80, facecolor='white') # --- Configuration --- # INPUT_H5AD = "input_raw_counts.h5ad" # Replace with your actual input file OUTPUT_H5AD = "output_processed.h5ad" # --- Load Data --- try: adata = sc.read_h5ad(INPUT_H5AD) except FileNotFoundError: print(f"Error: Input file {INPUT_H5AD} not found. Creating a dummy AnnData object for demonstration.") # Create a dummy AnnData object for demonstration purposes n_cells = 1000 n_genes = 2000 counts = np.random.randint(0, 100, size=(n_cells, n_genes)) gene_names = [f'gene_{i}' for i in range(n_genes)] cell_names = [f'cell_{i}' for i in range(n_cells)] adata = sc.AnnData(counts, obs=pd.DataFrame(index=cell_names), var=pd.DataFrame(index=gene_names)) adata.var['mt'] = adata.var_names.str.startswith('MT-') # Dummy mitochondrial genes print(f"Initial AnnData object: {adata}") # --- Quality Control and Filtering --- # Calculate QC metrics sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True) # Filter cells based on number of genes and counts # Example thresholds: min_genes=200, max_genes=2500, max_counts=10000, max_mt_percent=5 print("Filtering cells...") initial_cells = adata.n_obs sc.pp.filter_cells(adata, min_genes=200) adata = adata[adata.obs.n_genes_by_counts < 2500, :] adata = adata[adata.obs.total_counts < 10000, :] adata = adata[adata.obs.pct_counts_mt < 5, :] print(f"Filtered {initial_cells - adata.n_obs} cells. Remaining cells: {adata.n_obs}") # Filter genes based on number of cells print("Filtering genes...") initial_genes = adata.n_vars sc.pp.filter_genes(adata, min_cells=3) print(f"Filtered {initial_genes - adata.n_vars} genes. Remaining genes: {adata.n_vars}") # --- Normalization and Log-transformation --- print("Normalizing and log-transforming data...") sc.pp.normalize_total(adata, target_sum=1e4) # Normalize each cell's total counts to 10,000 sc.pp.log1p(adata) # Log-transform the data # --- Feature Selection (Highly Variable Genes) --- print("Identifying highly variable genes...") sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5) # Subset to highly variable genes for dimension reduction adata = adata[:, adata.var.highly_variable] print(f"Number of highly variable genes: {adata.n_vars}") # --- Scaling --- print("Scaling data...") sc.pp.scale(adata, max_value=10) # Scale data to unit variance, clip values above 10 # --- Dimension Reduction --- print("Performing PCA...") sc.tl.pca(adata, svd_solver='arpack') print("Computing neighbors graph...") sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40) # Use 40 principal components for neighbor graph print("Performing UMAP...") sc.tl.umap(adata) # Uniform Manifold Approximation and Projection # --- Save Processed Data --- print(f"Saving processed AnnData object to {OUTPUT_H5AD}") adata.write(OUTPUT_H5AD) print("Processing complete.") EOF # Execute the Python script python run_scanpy_processing.py
Tools Used
Raw Source Text
Alignment, cell barcode processing, umis processing, abundance measurements: cellranger count (version 2.1.1) Cells and genes filtering, dimension reduction: Scanpy (version 1.7.2) Assembly: mm10 Supplementary files format and content: Tab-separated values files and matrix files