GSE201898 Processing Pipeline


Publication

Science Translational Medicine (2022), PMID 35767654

Dataset

GSE201898

MECP2-related pathways are dysregulated in a cortical organoid model of myotonic dystrophy

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Alignment, cell barcode processing, UMI processing, and abundance measurement: cellranger count (version 3.0.2)

Cell Ranger v3.0.2

$ Bash example

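cellranger_version="3.0.2"

# Cell Ranger is distributed as a tarball from the 10x Genomics downloads page; unpack it and add it to your PATH.
# 10x Genomics typically issues time-limited download links, so copy the current link for this version from their site.
# Example (adjust link and path as needed):
# wget -O "cellranger-${cellranger_version}.tar.gz" "<download link from 10x Genomics>"
# tar -xzf "cellranger-${cellranger_version}.tar.gz"
# export PATH="/path/to/cellranger-${cellranger_version}:$PATH"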

# Define variables for the run (all values below are placeholders)
SAMPLE_ID="my_sample_id" # A unique ID for this run; also the name of the output directory
FASTQ_DIR="/path/to/your/fastqs" # Directory containing FASTQ files (e.g., from bcl2fastq or mkfastq)
SAMPLE_NAME="sample_1" # The sample name prefix of your FASTQ files (e.g., sample_1_S1_L001_R1_001.fastq.gz)
# The source reports assembly hg19 and uses refdata-cellranger-hg19-3.0.0 in the calmd step,
# so the same Cell Ranger reference is assumed here.
TRANSCRIPTOME_REF="/path/to/refdata-cellranger-hg19-3.0.0"

# Execute cellranger count
cellranger count \
    --id="${SAMPLE_ID}" \
    --transcriptome="${TRANSCRIPTOME_REF}" \
    --fastqs="${FASTQ_DIR}" \
    --sample="${SAMPLE_NAME}" \
    --expect-cells=3000 # Optional: not stated in the source; 3000 is the Cell Ranger default, adjust as needed
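
Before launching cellranger count, it can help to confirm that the --sample prefix actually matches files in the FASTQ directory. The following is a minimal sketch of such a check, assuming the bcl2fastq/mkfastq naming pattern mentioned in the comments above. The helper name count_matching_fastqs and the demo file names are hypothetical and not part of the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count files in a directory whose names follow the pattern
# <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
count_matching_fastqs() {
    local fastq_dir="$1" sample="$2" n=0 f
    for f in "${fastq_dir}/${sample}"_S*_L[0-9][0-9][0-9]_[RI][12]_001.fastq.gz; do
        # An unmatched glob stays literal, so test that the file really exists
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Demo on empty placeholder files in a temporary directory
demo_dir="$(mktemp -d)"
touch "${demo_dir}/sample_1_S1_L001_R1_001.fastq.gz" \
      "${demo_dir}/sample_1_S1_L001_R2_001.fastq.gz" \
      "${demo_dir}/other_S2_L001_R1_001.fastq.gz"

n_fastqs="$(count_matching_fastqs "${demo_dir}" "sample_1")"
echo "sample_1: ${n_fastqs} matching FASTQ file(s)"

rm -rf "${demo_dir}"
```

A count of zero for the chosen prefix suggests that SAMPLE_NAME or FASTQ_DIR needs correcting before the run is started.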

MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam

samtools calmd (reference: refdata-cellranger-hg19-3.0.0)

$ Bash example

# Install samtools if not already available
# conda install -c bioconda samtools

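# Add MD tags to alignments
# -b           : write compressed BAM output
# -r           : compute the BQ tag (base alignment quality)
# --threads 15 : use 15 additional threads
samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam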
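
To confirm that the MD tags were written, one option is to scan the optional fields of each mapped record for an MD:Z: entry. The sketch below runs on a few toy tab-separated SAM records invented for illustration. With real data, the same awk program could read from `samtools view possorted_genome_bam_MD.bam` instead of the temporary file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy SAM records (illustrative only): r1 and r2 carry MD tags, r3 is unmapped,
# r4 is mapped but has no MD tag.
toy_sam="$(mktemp)"
printf 'r1\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tFFFFF\tNM:i:0\tMD:Z:5\tCB:Z:AAAC-1\n'   >  "$toy_sam"
printf 'r2\t0\tchr1\t200\t255\t5M\t*\t0\t0\tACGTT\tFFFFF\tNM:i:1\tMD:Z:4A0\tCB:Z:AAAG-1\n' >> "$toy_sam"
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tFFFFF\n'                                        >> "$toy_sam"
printf 'r4\t16\tchr2\t50\t255\t5M\t*\t0\t0\tTTTTT\tFFFFF\tNM:i:0\n'                        >> "$toy_sam"

# Count mapped records that lack an MD:Z: tag in the optional fields (columns 12+)
missing_md="$(awk -F'\t' '
    int($2 / 4) % 2 == 1 { next }   # skip unmapped reads (FLAG bit 0x4 set)
    {
        has_md = 0
        for (i = 12; i <= NF; i++) if ($i ~ /^MD:Z:/) has_md = 1
        if (!has_md) n++
    }
    END { print n + 0 }
' "$toy_sam")"

echo "mapped records without MD tag: ${missing_md}"

rm -f "$toy_sam"
```

On the toy input only r4 is reported, since unmapped reads are skipped. A non-zero count on the real calmd output would be worth investigating.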


Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.

fgbio v2.3.0 (tool inferred with models/gemini-2.5-flash; not named in the source text)

$ Bash example

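# The source text does not name the tool used for this step; fgbio was inferred,
# but a tag-based split tool could not be confirmed in the fgbio documentation.
# This sketch therefore uses `samtools split -d`, which splits by tag value
# (added in samtools 1.15). Treat it as one possible approach and check
# `samtools split --help` for your version.
# conda install -c bioconda "samtools>=1.15"

# Input: the MD-tagged BAM from the previous step
INPUT_BAM="possorted_genome_bam_MD.bam"

# Output directory for the per-barcode BAM files (illustrative name)
OUTPUT_DIR="barcode_split"
mkdir -p "${OUTPUT_DIR}"

# Split on the CB tag (CB:Z holds the corrected cell barcode written by Cell Ranger).
# -d CB : write reads to one file per distinct CB value
# -M -1 : remove the default cap of 100 output files
# -f    : output name pattern; %! expands to the tag value, %. to the format extension
# Many barcodes mean many output files, so the open-file limit (ulimit -n) may need raising.
samtools split -d CB -M -1 -f "${OUTPUT_DIR}/%!.%." "${INPUT_BAM}"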
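
The splitting logic itself is simple: read the CB:Z value of each record and append the record to a file named after that value. The following sketch illustrates this on toy SAM text with awk, using no BAM tooling. The records and barcodes are invented, SAM header lines are not handled, and records without a CB tag are skipped, so this is an illustration of the idea rather than a replacement for a BAM-aware tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy SAM records (illustrative only): two barcodes plus one read without a CB tag
toy_sam="$(mktemp)"
out_dir="$(mktemp -d)"
printf 'r1\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tFFFFF\tCB:Z:AAAC-1\n' >  "$toy_sam"
printf 'r2\t0\tchr1\t200\t255\t5M\t*\t0\t0\tACGTT\tFFFFF\tCB:Z:AAAG-1\n' >> "$toy_sam"
printf 'r3\t0\tchr1\t300\t255\t5M\t*\t0\t0\tGGGTA\tFFFFF\tCB:Z:AAAC-1\n' >> "$toy_sam"
printf 'r4\t0\tchr1\t400\t255\t5M\t*\t0\t0\tCCCTA\tFFFFF\n'              >> "$toy_sam"

# Append each record to <out_dir>/<barcode>.sam based on its CB:Z value
awk -F'\t' -v out_dir="$out_dir" '
    {
        for (i = 12; i <= NF; i++) {
            if ($i ~ /^CB:Z:/) {
                barcode = substr($i, 6)          # strip the "CB:Z:" prefix
                out = out_dir "/" barcode ".sam"
                print >> out
                close(out)                       # avoid holding many files open
                break
            }
        }
    }
' "$toy_sam"

n_files="$(find "$out_dir" -name '*.sam' | wc -l | tr -d '[:space:]')"
n_aaac="$(awk 'END { print NR }' "$out_dir/AAAC-1.sam")"
echo "per-barcode files: ${n_files}; records for AAAC-1: ${n_aaac}"

rm -rf "$toy_sam" "$out_dir"
```

Closing each file after every write keeps the number of open file handles low at the cost of speed, which matters when a dataset contains thousands of barcodes.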


Raw Source Text

Alignment, cell barcode processing, umis processing, abundance measurements: cellranger count (version 3.0.2)
MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam
Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.
Assembly: hg19
Supplementary files format and content: Tab-separated values files and matrix files
