GSE244832 scRNA-seq Data Processing
Publication
RNA-binding protein LARP6 coordinates hepatic stellate cell activation and liver fibrosis. The Journal of Clinical Investigation (2026), PMID 41746718
Dataset
GSE244832
Processing Steps
1
For each sequenced snATAC-seq library, we obtained four FASTQ files containing the paired-end DNA reads and the combinatorial indexes for i5 (768 different PCR indices) and T7 (96 different tagmentation indices; Supplementary Table 8).
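The four files of one library have to be read in lockstep, because record n of every file describes the same fragment. With 768 i5 and 96 T7 indices, the combined barcode space is 768 × 96 = 73,728 combinations. The minimal sketch below assumes plain 4-line FASTQ records and that all four files share the same first token in each read name. The ordering read1/read2/i5/T7 and the function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header[1:].rstrip("\n"), seq, qual)


def read_in_lockstep(read1: TextIO, read2: TextIO, i5: TextIO, t7: TextIO) -> Iterator[Tuple[FastqRecord, ...]]:
    """Iterate the four files together (hypothetical order: R1, R2, i5, T7)."""
    streams = [read_fastq(h) for h in (read1, read2, i5, t7)]
    for records in zip_longest(*streams):
        if None in records:
            raise ValueError("FASTQ files have different numbers of records")
        # Assumption: the first whitespace-delimited token of the name is shared.
        names = {r.name.split()[0] for r in records}
        if len(names) != 1:
            raise ValueError(f"out-of-sync read names: {names}")
        yield records
```

Raising on a length or name mismatch is deliberate: silently pairing the wrong records would assign fragments to the wrong cells.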
2
We selected all reads with at most two mismatches per individual index (the Hamming distance between each pair of indices is 4) and then prepended the full combined barcode to the read name in the FASTQ files (https://gitlab.com/Grouumf/ATACdemultiplex/).
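Error-tolerant index matching can be sketched as a nearest-whitelist search. One subtlety: with a minimum pairwise Hamming distance of 4, an observed index with two mismatches can sit at distance 2 from two different whitelist entries, so a tolerant matcher must decide what to do with ties. This sketch discards tied reads; that policy, the `barcode:name` read-name layout, and all function names are assumptions, not necessarily how ATACdemultiplex behaves.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("indices must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_index(observed: str, whitelist: Sequence[str], max_mismatch: int = 2) -> Optional[str]:
    """Return the unique closest whitelist index within max_mismatch, else None."""
    best, best_dist, tie = None, max_mismatch + 1, False
    for candidate in whitelist:
        d = hamming(observed, candidate)
        if d < best_dist:
            best, best_dist, tie = candidate, d, False
        elif d == best_dist:
            tie = True  # assumption: ambiguous reads are dropped
    return None if tie or best is None else best


def tag_read_name(name: str, i5: str, t7: str,
                  i5_list: Sequence[str], t7_list: Sequence[str]) -> Optional[str]:
    """Prepend the corrected i5+T7 barcode to the read name, or return None to drop the read."""
    i5_hit = match_index(i5, i5_list)
    t7_hit = match_index(t7, t7_list)
    if i5_hit is None or t7_hit is None:
        return None
    return f"{i5_hit}{t7_hit}:{name}"  # hypothetical name layout
```

For example, with whitelist entries `AAAAAAAA` and `AAAATTTT` (distance 4), the observed index `AAAAAATT` is at distance 2 from both and is rejected.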
3
Next, we used Trim Galore (v0.4.4) to remove adapter sequences from the reads before alignment.
Tool: Trim Galore v0.4.4
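The core idea of 3' adapter trimming is to cut the read where adapter sequence begins, including a partial adapter running off the end of the read. The sketch below assumes the Nextera adapter prefix `CTGTCTCTTATA` (Tn5-based libraries; the source does not name the adapter) and uses exact matching only. Trim Galore itself wraps Cutadapt, which tolerates mismatches and by default also quality-trims, so this is a simplification rather than its algorithm.

```python
from typing import Tuple

# Assumption: Nextera adapter prefix, as used for Tn5-tagmented libraries.
NEXTERA_ADAPTER = "CTGTCTCTTATA"


def trim_3prime_adapter(seq: str, qual: str, adapter: str = NEXTERA_ADAPTER,
                        min_overlap: int = 1) -> Tuple[str, str]:
    """Cut at the first full adapter occurrence, or at a partial adapter at the 3' end."""
    cut = seq.find(adapter)
    if cut == -1:
        cut = len(seq)
        # Look for the longest adapter prefix that the read ends with.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    # Quality string is trimmed to the same length as the sequence.
    return seq[:cut], qual[:cut]
```

Setting `min_overlap=1` trims even a single matching base at the 3' end, which is aggressive; raising it trades sensitivity for fewer spurious trims.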
4
We aligned reads to the reference genome using bwa mem (v0.7.17) [10] and then used samtools to remove unmapped, low-mapping-quality (MAPQ < 30), secondary, and mitochondrial reads.
Tool: BWA v0.7.17
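The four read filters map onto standard SAM fields: FLAG bit 0x4 (unmapped), FLAG bit 0x100 (secondary), MAPQ (column 5), and the reference name (column 3). The sketch below applies them to SAM text lines. The mitochondrial contig names are an assumption. For comparison, `samtools view -F 260 -q 30` expresses the first three criteria (260 = 0x4 | 0x100), with mitochondrial reads removed separately; the source does not give the exact commands used.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
MITO_NAMES = {"chrM", "MT"}  # assumption: mitochondrial contig naming


def keep_alignment(sam_line: str, min_mapq: int = 30) -> bool:
    """Return True if a SAM line passes the unmapped/MAPQ/secondary/mitochondrial filters."""
    if sam_line.startswith("@"):
        return True  # header lines are always kept
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
    if flag & UNMAPPED or flag & SECONDARY:
        return False
    if mapq < min_mapq:
        return False
    return rname not in MITO_NAMES
```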
5
Downstream analysis of snATAC-seq peaks was conducted with Signac.
Tool: Signac
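Peak-level snATAC-seq analysis typically starts from a cell-by-peak count matrix. The sketch below illustrates the idea under stated assumptions: fragments as `(chrom, start, end, cell_barcode)` tuples and peaks as BED-style half-open intervals, with each overlapping fragment counted once per peak. It is not Signac's implementation (Signac is an R package); the function name is hypothetical, and the linear scan per chromosome would be replaced by an interval index on real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int]            # (chrom, start, end), half-open like BED
Fragment = Tuple[str, int, int, str]   # (chrom, start, end, cell barcode)


def count_fragments_in_peaks(fragments: Iterable[Fragment],
                             peaks: List[Peak]) -> Dict[str, Dict[Peak, int]]:
    """Return counts[cell][peak] = number of fragments overlapping that peak."""
    by_chrom: Dict[str, List[Peak]] = defaultdict(list)
    for peak in peaks:
        by_chrom[peak[0]].append(peak)
    counts: Dict[str, Dict[Peak, int]] = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, cell in fragments:
        for peak in by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < peak[2] and peak[1] < end:
                counts[cell][peak] += 1
    return counts
```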
6
For the snRNA-seq libraries, sequencing reads were demultiplexed (cellranger mkfastq) and processed (cellranger count) using the Cell Ranger software package v3.0.2 (10x Genomics).
Tool: Cell Ranger
7
Reads were aligned to the human reference genome hg38 with Cell Ranger v3.0.2.
Tool: Cell Ranger
8
Reads mapping to both intronic and exonic sequences were retained.
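Retaining intronic reads matters for single-nucleus data because nuclei hold a large share of unspliced pre-mRNA, so many reads fall in introns. The sketch below shows the classification idea for one gene, using a single aligned position and half-open intervals. Real pipelines, Cell Ranger included, use fuller overlap rules across many genes, so the function names and the single-position rule here are assumptions for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def classify_read(pos: int, gene_span: Interval, exons: List[Interval]) -> str:
    """Label an aligned read position as exonic, intronic or intergenic for one gene."""
    if not (gene_span[0] <= pos < gene_span[1]):
        return "intergenic"
    if any(start <= pos < end for start, end in exons):
        return "exonic"
    return "intronic"


def retained(label: str) -> bool:
    """Keep exonic and intronic reads alike, as described in this step."""
    return label in {"exonic", "intronic"}
```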
9
The resulting UMI feature-barcode count matrices were loaded into Seurat for downstream processing.
Tool: Seurat
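A UMI feature-barcode matrix records, for each gene and cell barcode, the number of distinct UMIs rather than the number of reads, which collapses PCR duplicates. The minimal sketch below assumes reads have already been assigned a `(cell_barcode, umi, gene)` triple. It omits the UMI error correction and cell calling that Cell Ranger performs, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_umis(reads: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], int]:
    """reads are (cell_barcode, umi, gene); returns counts[(gene, cell)] = distinct UMIs."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(gene, cell)].add(umi)  # duplicate reads of one molecule collapse here
    # Sparse representation: only non-zero (feature, barcode) pairs are stored.
    return {key: len(seen) for key, seen in umis.items()}
```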
Assembly: hg38
Supplementary files format and content:
snATAC-seq: BED files containing regions of open chromatin (peaks), one per sample.
snRNA-seq: hLIVER_processed_files.tar.gz, which contains a sparse matrix file of raw counts (one file for all samples), CSV files of gene names and cell IDs, and a CSV file of cell metadata (including cluster ID, sample ID, condition, and QC metrics).
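The snRNA-seq archive pairs one sparse count matrix with separate gene-name and cell-ID files. The source does not state the matrix format, so this sketch assumes a Matrix Market coordinate file (the format written by R's `Matrix::writeMM`) with genes as rows and cells as columns, and general integer or real values. Function names are hypothetical; with SciPy available, `scipy.io.mmread` reads the same format.

```python
from typing import Dict, List, TextIO, Tuple


def read_matrix_market(handle: TextIO) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    """Parse a Matrix Market 'coordinate' file into ((n_rows, n_cols), {(row, col): value})."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries: Dict[Tuple[int, int], float] = {}
    for _ in range(n_entries):
        i, j, value = handle.readline().split()
        entries[(int(i) - 1, int(j) - 1)] = float(value)  # file indices are 1-based
    return (n_rows, n_cols), entries


def label_counts(entries: Dict[Tuple[int, int], float],
                 genes: List[str], cells: List[str]) -> Dict[Tuple[str, str], float]:
    """Attach names, assuming rows follow the gene file and columns the cell-ID file."""
    return {(genes[i], cells[j]): v for (i, j), v in entries.items()}
```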