GSE201891 Processing Pipeline
RNA-Seq
3 steps
Publication
MECP2-related pathways are dysregulated in a cortical organoid model of myotonic dystrophy. Science Translational Medicine (2022), PMID 35767654
Dataset
GSE201891: MECP2-related pathways are dysregulated in a cortical organoid model of Myotonic dystrophy [scRNA-Seq]
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Alignment, cell barcode processing, UMI processing, and abundance measurement: cellranger count (version 3.0.2)
Cell Ranger v3.0.2
$ Bash example
# Cell Ranger 3.0.2 is typically installed by downloading the tarball from
# the 10x Genomics downloads page and extracting it. Example:
#   tar -xzf cellranger-3.0.2.tar.gz
#   export PATH=/path/to/cellranger-3.0.2:$PATH

# The dataset was aligned to hg19, and step 2 names the 10x reference
# refdata-cellranger-hg19-3.0.0. Download and extract it if not already available.
# Example:
#   tar -xzf refdata-cellranger-hg19-3.0.0.tar.gz
#   mv refdata-cellranger-hg19-3.0.0 /path/to/your/references/

# Define variables (replace with actual paths and sample ID)
SAMPLE_ID="my_sample_id"
FASTQ_DIR="/path/to/your/fastqs"   # Directory containing FASTQ files for the sample
TRANSCRIPTOME_REF="/path/to/your/references/refdata-cellranger-hg19-3.0.0"

# Run cellranger count for alignment, cell barcode processing, UMI processing,
# and abundance measurement.
# --sample selects one sample when FASTQ_DIR holds FASTQ files for several.
cellranger count \
    --id="${SAMPLE_ID}" \
    --transcriptome="${TRANSCRIPTOME_REF}" \
    --fastqs="${FASTQ_DIR}" \
    --sample="${SAMPLE_ID}"
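The value passed to --sample has to match the sample prefix of the FASTQ file names. As a rough pre-flight check, the sketch below lists the distinct prefixes in a directory, assuming the files follow the bcl2fastq naming pattern <Sample>_S<n>_L<lane>_<R1|R2|I1>_001.fastq.gz. The function name list_fastq_samples is hypothetical and not part of Cell Ranger.

```bash
#!/usr/bin/env bash
# Print the distinct sample prefixes found in a FASTQ directory.
# Assumption: files are named <Sample>_S<n>_L<lane>_<R1|R2|I1>_001.fastq.gz;
# files that do not match this pattern are ignored.
list_fastq_samples() {
    local fastq_dir="$1"
    local f
    for f in "${fastq_dir}"/*.fastq.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename "$f"
    done |
        # Keep only the sample prefix of names that match the pattern
        sed -E -n 's/^(.+)_S[0-9]+_L[0-9]{3}_[RI][12]_001\.fastq\.gz$/\1/p' |
        sort -u
}
```

Running list_fastq_samples "${FASTQ_DIR}" before cellranger count shows which names are available to pass as --sample.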
2
MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam
$ Bash example
# Install samtools (e.g., using conda). The version pin is an assumption;
# the source does not state which samtools version was used.
# conda install -c bioconda samtools=1.9

# Define input and output files
INPUT_BAM="possorted_genome_bam.bam"
OUTPUT_BAM="possorted_genome_bam_MD.bam"
REFERENCE_FASTA="refdata-cellranger-hg19-3.0.0/fasta/genome.fa"

# Add MD tags to alignments
samtools calmd --threads 15 -rb "${INPUT_BAM}" "${REFERENCE_FASTA}" > "${OUTPUT_BAM}"
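A quick way to confirm that the calmd step worked is to count how many mapped records carry an MD:Z tag. The sketch below does this on SAM text read from standard input; the function name count_md_tags is hypothetical, and the check is an illustration, not part of the published pipeline.

```bash
#!/usr/bin/env bash
# Read SAM records on stdin and print "<mapped records> <mapped records with MD:Z>".
count_md_tags() {
    awk -F'\t' '
        /^@/ { next }                     # skip header lines
        int($2 / 4) % 2 == 1 { next }     # FLAG bit 0x4 set: read is unmapped
        {
            mapped++
            # Optional tags start at column 12 of a SAM record
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^MD:Z:/) { with_md++; break }
            }
        }
        END { printf "%d %d\n", mapped, with_md }
    '
}
```

For example, samtools view possorted_genome_bam_MD.bam | head -n 100000 | count_md_tags should print two equal numbers if every mapped read in the sampled records received an MD tag.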
3
Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.
$ Bash example
# Install samtools if not already installed
# conda install -c bioconda samtools

# Define input BAM file and output directory
INPUT_BAM="possorted_genome_bam_MD.bam"   # assumed: the MD-tagged BAM from step 2
OUTPUT_DIR="split_bams"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Extract the unique CB:Z barcode values from the input BAM file:
# stream the records as SAM text, pull out each CB:Z tag with grep,
# keep the barcode value, and deduplicate.
samtools view "${INPUT_BAM}" | \
    grep -o 'CB:Z:[^[:space:]]*' | \
    cut -d':' -f3 | \
    sort -u > unique_barcodes.list

# Loop through each unique barcode and write its reads to a separate BAM file.
# Note: this rereads the whole BAM once per barcode, which is slow for many barcodes.
while IFS= read -r BARCODE; do
    if [ -n "$BARCODE" ]; then   # Ensure the barcode is not empty
        echo "Splitting reads for barcode: ${BARCODE}"
        # samtools view -h: emit the header followed by the records as SAM text
        # awk: pass header lines (starting with @) through unchanged and keep
        #      only records whose optional fields include the exact CB:Z tag,
        #      wherever the tag sits on the line
        # samtools view -b -: convert the filtered SAM stream back to BAM
        samtools view -h "${INPUT_BAM}" | \
            awk -F'\t' -v tag="CB:Z:${BARCODE}" '
                /^@/ { print; next }
                { for (i = 12; i <= NF; i++) if ($i == tag) { print; break } }
            ' | \
            samtools view -b - > "${OUTPUT_DIR}/${BARCODE}.bam"
        # Index the new BAM file (optional, but good practice for downstream tools)
        samtools index "${OUTPUT_DIR}/${BARCODE}.bam"
    fi
done < unique_barcodes.list

# Clean up the temporary file containing unique barcodes
rm unique_barcodes.list
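Before splitting, it can help to know how many reads each barcode carries, for instance to skip barcodes with very few reads. The sketch below tallies reads per CB:Z value in a single pass over SAM text; the function name count_reads_per_barcode is hypothetical, and any read-count cutoff would be an analysis choice that the source does not describe.

```bash
#!/usr/bin/env bash
# Read SAM records on stdin and print "<barcode><TAB><read count>",
# sorted by descending count and then by barcode.
# Records without a CB:Z tag are ignored.
count_reads_per_barcode() {
    awk -F'\t' '
        /^@/ { next }                     # skip header lines
        {
            # Optional tags start at column 12 of a SAM record
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^CB:Z:/) {
                    counts[substr($i, 6)]++   # drop the 5-character "CB:Z:" prefix
                    break
                }
            }
        }
        END { for (bc in counts) printf "%s\t%d\n", bc, counts[bc] }
    ' | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

For example, samtools view possorted_genome_bam_MD.bam | count_reads_per_barcode > barcode_counts.tsv writes one line per barcode, and the number of lines is the number of per-barcode BAM files the split would produce.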
Raw Source Text
Alignment, cell barcode processing, umis processing, abundance measurements: cellranger count (version 3.0.2)
MD tags were added to alignments with samtools calmd --threads 15 -rb possorted_genome_bam.bam refdata-cellranger-hg19-3.0.0/fasta/genome.fa > possorted_genome_bam_MD.bam
Reads were split based on the CB:Z tag, resulting in one BAM file per barcode.
Assembly: hg19
Supplementary files format and content: Tab-separated values files and matrix files