GSE201891 Processing Pipeline

RNA-Seq, 3 steps

Publication

MECP2-related pathways are dysregulated in a cortical organoid model of myotonic dystrophy.

Science Translational Medicine (2022), PMID 35767654

Dataset

GSE201891

MECP2-related pathways are dysregulated in a cortical organoid model of Myotonic dystrophy [scRNA-Seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Alignment, cell barcode processing, umis processing, abundance measurements: cellranger count (version 3.0.2)

    Cell Ranger v3.0.2
    $ Bash example
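    # Cell Ranger 3.0.2 is typically installed by downloading the tarball and extracting it.
    # The tarball is distributed through the 10x Genomics downloads page; the URL below is
    # illustrative and may not resolve as written.
    # Example:
    # wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-3.0.2.tar.gz
    # tar -xzf cellranger-3.0.2.tar.gz
    # export PATH=/path/to/cellranger-3.0.2:$PATH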
    
    # Obtain a Cell Ranger reference transcriptome if not already available.
    # The source reports assembly hg19, and step 2 points to refdata-cellranger-hg19-3.0.0,
    # so that reference is assumed here (download the tarball from the 10x Genomics website).
    # Example:
    # tar -xzf refdata-cellranger-hg19-3.0.0.tar.gz
    # mv refdata-cellranger-hg19-3.0.0 /path/to/your/references/
    
    # Define variables (replace with actual paths and sample ID)
    SAMPLE_ID="my_sample_id"
    FASTQ_DIR="/path/to/your/fastqs" # Directory containing FASTQ files for the sample
    TRANSCRIPTOME_REF="/path/to/your/references/refdata-cellranger-hg19-3.0.0" # hg19 reference, matching step 2
    
    # Run cellranger count for alignment, cell barcode processing, UMI processing, and abundance measurements.
    # --id names the output directory; --sample must match the sample-name prefix of the FASTQ files.
    # Here the two are assumed to be identical; --sample can be dropped if FASTQ_DIR holds a single sample.
    cellranger count \
        --id="${SAMPLE_ID}" \
        --transcriptome="${TRANSCRIPTOME_REF}" \
        --fastqs="${FASTQ_DIR}" \
        --sample="${SAMPLE_ID}"
    
  2.

    MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam

    samtools (version not stated in the source; the hg19 reference is the Cell Ranger v3.0.0 build)
    $ Bash example
    # Install samtools (e.g., using conda)
    # conda install -c bioconda samtools=1.9
    
    # Define input and output files
    INPUT_BAM="possorted_genome_bam.bam"
    OUTPUT_BAM="possorted_genome_bam_MD.bam"
    REFERENCE_FASTA="refdata-cellranger-hg19-3.0.0/fasta/genome.fa"
    
    # Add MD tags to alignments
    samtools calmd --threads 15 -rb "${INPUT_BAM}" "${REFERENCE_FASTA}" > "${OUTPUT_BAM}"
  3.

    Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.

    samtools v1.19.1 (tool and version inferred with models/gemini-2.5-flash, not stated in the source)
    $ Bash example
    # Install samtools if not already installed
    # conda install -c bioconda samtools
    
    # Define input BAM file and output directory.
    # The input is assumed to be the MD-tagged BAM produced in step 2.
    INPUT_BAM="possorted_genome_bam_MD.bam"
    OUTPUT_DIR="split_bams"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Extract unique CB:Z tags from the input BAM file.
    # grep isolates the CB:Z field, cut keeps the barcode value, and sort -u deduplicates.
    # Reads without a CB:Z tag (no valid cell barcode) are ignored.
    samtools view "${INPUT_BAM}" | \
        grep -o 'CB:Z:[^[:space:]]*' | \
        cut -d':' -f3 | \
        sort -u > unique_barcodes.list
    
    # Loop through each unique barcode and write its reads to a separate BAM file.
    # Note: this rescans the input once per barcode, which is slow for thousands of cells.
    while IFS= read -r BARCODE; do
        if [ -n "$BARCODE" ]; then # Ensure the barcode is not empty
            echo "Splitting reads for barcode: ${BARCODE}"
            # samtools view -b: write BAM output (the header is always included)
            # -d CB:<barcode>: keep only alignments whose CB tag equals this barcode
            #                  (tag filtering requires samtools >= 1.12)
            samtools view -b -d "CB:${BARCODE}" "${INPUT_BAM}" \
                > "${OUTPUT_DIR}/${BARCODE}.bam"
            
            # Index the newly created BAM file (optional, but good practice for downstream tools)
            samtools index "${OUTPUT_DIR}/${BARCODE}.bam"
        fi
    done < unique_barcodes.list
    
    # Clean up the temporary file containing unique barcodes
    rm unique_barcodes.list
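
A quick way to sanity-check the per-barcode split in step 3 is to tally alignments per CB:Z tag in the unsplit BAM and compare those counts with the size of each split file. The sketch below is not part of the published pipeline; `count_reads_per_barcode` and the `NO_CB` label are hypothetical names introduced here, and the function only assumes SAM text on standard input.

```shell
# Hypothetical helper: tally alignments per CB:Z barcode from SAM text on stdin.
# Records without a CB:Z tag are counted under the placeholder label "NO_CB".
count_reads_per_barcode() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            bc = "NO_CB"
            for (i = 12; i <= NF; i++) {   # optional tags start at column 12
                if ($i ~ /^CB:Z:/) { bc = substr($i, 6); break }
            }
            n[bc]++
        }
        END { for (b in n) printf "%s\t%d\n", b, n[b] }
    ' | sort
}

# Usage (assumes samtools is on PATH and the BAM from step 2 exists):
# samtools view possorted_genome_bam_MD.bam | count_reads_per_barcode > reads_per_barcode.tsv
# For any barcode, `samtools view -c split_bams/<BARCODE>.bam` should equal its count in that table.
```

The `NO_CB` row shows how many alignments carry no cell barcode and therefore do not end up in any per-barcode file.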
Raw Source Text
Alignment, cell barcode processing, umis processing, abundance measurements: cellranger count (version 3.0.2)
MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam
Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.
Assembly: hg19
Supplementary files format and content: Tab-separated values files and matrix files
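
Because the source reports assembly hg19 and step 2 recomputes MD tags against `refdata-cellranger-hg19-3.0.0/fasta/genome.fa`, it can help to confirm that the BAM was aligned to that same FASTA before running `samtools calmd`. The following is a minimal sketch under that assumption; `check_sq_against_fai` is a hypothetical helper that compares the `@SQ` lines of a SAM header with a `.fai` index and is not part of the published pipeline.

```shell
# Hypothetical helper: compare @SQ names and lengths in a SAM header file ($1)
# with a FASTA index file ($2). Prints each mismatch and returns non-zero if any is found.
check_sq_against_fai() {
    awk -F'\t' '
        NR == FNR { len[$1] = $2; next }   # first file: .fai (column 1 = name, column 2 = length)
        $1 == "@SQ" {
            name = ""; ln = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) ln = substr($i, 4)
            }
            if (!(name in len))       { print "missing in FASTA: " name; bad = 1 }
            else if (len[name] != ln) { print "length mismatch: " name; bad = 1 }
        }
        END { exit bad }
    ' "$2" "$1"
}

# Usage (assumes samtools is on PATH):
# samtools faidx refdata-cellranger-hg19-3.0.0/fasta/genome.fa      # writes genome.fa.fai
# samtools view -H possorted_genome_bam.bam > header.sam
# check_sq_against_fai header.sam refdata-cellranger-hg19-3.0.0/fasta/genome.fa.fai
```

A silent, zero-exit run means every sequence in the BAM header has a same-named, same-length entry in the FASTA index; it does not verify the sequence content itself.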