GSE201891 Processing Pipeline

RNA-Seq, 3 processing steps

Publication

Science Translational Medicine (2022), PMID 35767654

Dataset

GSE201891

MECP2-related pathways are dysregulated in a cortical organoid model of Myotonic dystrophy [scRNA-Seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Alignment, cell barcode processing, UMI processing, and abundance measurements: cellranger count (version 3.0.2)

Cell Ranger v3.0.2

$ Bash example

# Cell Ranger 3.0.2 is typically installed by downloading the tarball and extracting it.
# Example:
# wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-3.0.2.tar.gz
# tar -xzf cellranger-3.0.2.tar.gz
# export PATH=/path/to/cellranger-3.0.2:$PATH

# Download a reference transcriptome if not already available.
# The source reports alignment to hg19 with refdata-cellranger-hg19-3.0.0
# (distributed by 10x Genomics), for example:
# tar -xzf refdata-cellranger-hg19-3.0.0.tar.gz
# mv refdata-cellranger-hg19-3.0.0 /path/to/your/references/

# Define variables (replace with actual paths and names)
SAMPLE_ID="my_sample_id"          # Output directory name for this run (--id)
FASTQ_SAMPLE="my_sample"          # FASTQ file-name prefix, e.g. my_sample_S1_L001_R1_001.fastq.gz
FASTQ_DIR="/path/to/your/fastqs"  # Directory containing FASTQ files for the sample
TRANSCRIPTOME_REF="/path/to/your/references/refdata-cellranger-hg19-3.0.0"  # hg19 reference, as in the source

# Run cellranger count for alignment, cell barcode processing, UMI processing, and abundance measurements.
# --sample selects FASTQs by file-name prefix; it is required when FASTQ_DIR holds several samples.
cellranger count \
    --id="${SAMPLE_ID}" \
    --transcriptome="${TRANSCRIPTOME_REF}" \
    --fastqs="${FASTQ_DIR}" \
    --sample="${FASTQ_SAMPLE}"
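Before moving on, it can help to confirm that cellranger count produced the files the later steps rely on. The sketch below assumes the Cell Ranger 3.x output layout under the run directory's outs/ folder (possorted_genome_bam.bam with its index, and a filtered_feature_bc_matrix/ directory holding barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz); the function name check_cellranger_outs is hypothetical and not part of Cell Ranger.

```shell
#!/usr/bin/env bash
# Sketch: check that a cellranger count run directory contains the outputs
# used downstream. Assumes the Cell Ranger 3.x "outs/" layout; the function
# name check_cellranger_outs is hypothetical, not part of Cell Ranger.
check_cellranger_outs() {
    local run_dir="$1"
    local missing=0
    local f
    for f in \
        outs/possorted_genome_bam.bam \
        outs/possorted_genome_bam.bam.bai \
        outs/filtered_feature_bc_matrix/barcodes.tsv.gz \
        outs/filtered_feature_bc_matrix/features.tsv.gz \
        outs/filtered_feature_bc_matrix/matrix.mtx.gz
    do
        # -s: file exists and is non-empty
        if [ ! -s "${run_dir}/${f}" ]; then
            echo "missing or empty: ${run_dir}/${f}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Usage (SAMPLE_ID as defined above):
# check_cellranger_outs "${SAMPLE_ID}" && echo "outputs look complete"
```

The function returns a non-zero status if anything is missing, so it can gate the samtools steps in a script.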

MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam

samtools (version not stated in source)

$ Bash example

# Install samtools (e.g., using conda)
# conda install -c bioconda samtools=1.9

# Define input and output files
INPUT_BAM="possorted_genome_bam.bam"
OUTPUT_BAM="possorted_genome_bam_MD.bam"
REFERENCE_FASTA="refdata-cellranger-hg19-3.0.0/fasta/genome.fa"

# Add MD tags to alignments
samtools calmd --threads 15 -rb "${INPUT_BAM}" "${REFERENCE_FASTA}" > "${OUTPUT_BAM}"
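The MD tag records the reference bases at mismatched and deleted positions, so once calmd has run, a read's mismatch count can be read directly off the tag. A minimal sketch, assuming the MD grammar from the SAM specification (digits for runs of matching bases, a single base for a mismatch, ^ followed by bases for a deletion); the helper names md_mismatches and mismatches_per_read are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count mismatches encoded in an MD:Z string.
# MD grammar (SAM spec): [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
#   digits   -> run of matching bases
#   letter   -> reference base at a mismatch
#   ^letters -> reference bases deleted from the read (not mismatches)
# md_mismatches is a hypothetical helper, not a samtools command.
md_mismatches() {
    local md="$1"
    # Drop deletion runs (^ plus their bases), then count the remaining letters.
    printf '%s' "${md}" | sed -E 's/\^[A-Za-z]+//g' | tr -cd 'A-Za-z' | wc -c | tr -d ' '
}

# Hypothetical helper: read SAM records on stdin and print
# "<read name><TAB><mismatches>" for each record that carries an MD tag.
mismatches_per_read() {
    awk -v FS='\t' '{
        # Optional SAM tags start at column 12
        for (i = 12; i <= NF; i++)
            if (substr($i, 1, 5) == "MD:Z:") { print $1 "\t" substr($i, 6); break }
    }' | while IFS=$'\t' read -r name md; do
        printf '%s\t%s\n' "${name}" "$(md_mismatches "${md}")"
    done
}

# Usage (spot check on the first reads of the calmd output):
# samtools view possorted_genome_bam_MD.bam | head -n 1000 | mismatches_per_read
```

This calls sed once per read, so it is meant for spot checks rather than whole-BAM statistics.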


Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.

samtools v1.19.1 (version inferred; not stated in source)

$ Bash example

# Install samtools if not already installed
# (the loop below needs a samtools version whose view command supports -d tag filtering)
# conda install -c bioconda samtools

# Define input BAM file (output of the calmd step) and output directory
INPUT_BAM="possorted_genome_bam_MD.bam"
OUTPUT_DIR="split_bams"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Extract unique CB:Z tags from the input BAM file:
# print alignments as SAM, pull out each CB:Z tag with grep,
# keep the barcode value, then sort and deduplicate.
samtools view "${INPUT_BAM}" | \
    grep -o 'CB:Z:[^[:space:]]*' | \
    cut -d':' -f3 | \
    sort -u > unique_barcodes.list

# Loop through each unique barcode and write its reads to a separate BAM file
while IFS= read -r BARCODE; do
    if [ -n "${BARCODE}" ]; then  # Skip empty lines
        echo "Splitting reads for barcode: ${BARCODE}"
        # samtools view -d TAG:VALUE keeps only records whose tag equals VALUE.
        # Unlike filtering SAM text with grep (which also drops the @ header
        # lines), it keeps the header, so each output is a valid BAM.
        samtools view -b -d "CB:${BARCODE}" -o "${OUTPUT_DIR}/${BARCODE}.bam" "${INPUT_BAM}"

        # Index the new BAM (the input is coordinate-sorted, so each subset is too)
        samtools index "${OUTPUT_DIR}/${BARCODE}.bam"
    fi
done < unique_barcodes.list

# Clean up the temporary file containing unique barcodes
rm unique_barcodes.list
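The loop above rereads the whole BAM once per barcode, which scales poorly when there are thousands of cells. A single-pass alternative, sketched here as an assumption rather than the authors' method, streams SAM text once with awk and writes each read to a per-barcode file after copying the header into it. The helper name split_sam_by_cb is hypothetical; because each distinct barcode holds an open file, very large barcode sets may hit the open-file limit (see ulimit -n).

```shell
#!/usr/bin/env bash
# Sketch: split SAM text (header + alignments) into one SAM file per CB:Z
# barcode in a single pass. split_sam_by_cb is a hypothetical helper name.
# Reads without a CB:Z tag are skipped. Each barcode keeps one output file
# open, so very large barcode sets may exceed the open-file limit (ulimit -n).
split_sam_by_cb() {
    local out_dir="$1"
    mkdir -p "${out_dir}"
    awk -v dir="${out_dir}" -v FS='\t' '
        # Header lines come before all alignments in SAM: buffer them.
        /^@/ { header = header $0 "\n"; next }
        {
            cb = ""
            # Optional SAM tags start at column 12.
            for (i = 12; i <= NF; i++) {
                if (substr($i, 1, 5) == "CB:Z:") { cb = substr($i, 6); break }
            }
            if (cb == "") next
            out = dir "/" cb ".sam"
            # First read for this barcode: write the header first.
            if (!(out in seen)) { seen[out] = 1; printf "%s", header > out }
            print > out
        }
    '
}
```

For example, samtools view -h possorted_genome_bam_MD.bam | split_sam_by_cb split_sams writes one SAM file per barcode into split_sams/, and each file can then be converted with samtools view -b -o BARCODE.bam BARCODE.sam.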


Raw Source Text

Assembly: hg19
Supplementary files format and content: Tab-separated values files and matrix files
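If the matrix files follow the Matrix Market coordinate layout that Cell Ranger writes (an assumption; the source only says "matrix files"), the first line after the %-prefixed banner and comments gives the number of rows (features), columns (barcodes), and non-zero entries. A minimal sketch that reports those three numbers for plain or gzip-compressed files; the function name mtx_dims and the path in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "rows cols nonzeros" from a Matrix Market coordinate file.
# gzip -cdf decompresses .gz input and passes plain text through unchanged.
# mtx_dims is a hypothetical helper name.
mtx_dims() {
    gzip -cdf "$1" | awk '
        /^%/ { next }                  # skip the %%MatrixMarket banner and % comments
        { print $1, $2, $3; exit }     # first data line holds the dimensions
    '
}

# Usage (hypothetical path):
# mtx_dims filtered_feature_bc_matrix/matrix.mtx.gz
# Cross-check the column count against the barcode list:
# gzip -cd filtered_feature_bc_matrix/barcodes.tsv.gz | wc -l
```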
