GSE244832 scRNA-seq Data Processing

scRNA-seq geo_data_processing 9 steps

Publication

RNA-binding protein LARP6 coordinates hepatic stellate cell activation and liver fibrosis.

The Journal of clinical investigation (2026) — PMID 41746718

Dataset

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1

For each sequenced snATAC-seq library, we obtained four FASTQ files: the paired-end DNA reads and the combinatorial indexes for i5 (768 different PCR indices) and T7 (96 different tagmentation indices; Supplementary Table 8).
2

We selected all reads with at most 2 mismatches per individual index (the Hamming distance between each pair of indices is 4) and then wrote the full barcode at the beginning of the read name in the FASTQ files (https://gitlab.com/Grouumf/ATACdemultiplex/).
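
The index-matching rule in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the ATACdemultiplex implementation: the function names, the toy whitelist, and the read-name layout (`@<i5><T7>:<original name>`) are hypothetical. Because a read with 2 mismatches can be equally close to two indices that are 4 apart, the sketch rejects ties; whether the original tool does so is not stated in the source.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_index(observed: str, whitelist: Sequence[str],
                max_mismatches: int = 2) -> Optional[str]:
    """Return the single closest whitelist index within max_mismatches.

    Returns None when no index is close enough or when the closest
    distance is shared by more than one index (assumed tie rule).
    """
    best = None
    best_dist = max_mismatches + 1
    tie = False
    for candidate in whitelist:
        d = hamming(observed, candidate)
        if d < best_dist:
            best, best_dist, tie = candidate, d, False
        elif d == best_dist and d <= max_mismatches:
            tie = True
    return None if tie else best


def tag_read_name(header: str, i5: str, t7: str) -> str:
    """Prepend the corrected barcode to a FASTQ header (hypothetical layout)."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    return f"@{i5}{t7}:{header[1:]}"
```

A read whose i5 or T7 index returns `None` from `match_index` would be discarded; all other reads would be renamed with `tag_read_name` before trimming.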
3

Next, we used Trim Galore (v0.4.4) to remove adapter sequences from the reads before alignment.

Trim Galore v0.4.4
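
A paired-end Trim Galore call for this step might be assembled as shown here. The source gives only the tool and version, so the file names and output directory are hypothetical and no non-default trimming options are assumed; `--paired` and `--output_dir` are standard Trim Galore options.

```python
import shlex
from typing import List


def trim_galore_command(read1: str, read2: str, out_dir: str) -> List[str]:
    """Build a paired-end Trim Galore command line (default adapter detection)."""
    return ["trim_galore", "--paired", "--output_dir", out_dir, read1, read2]


# Hypothetical file names, for illustration only.
cmd = trim_galore_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "trimmed")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Trim Galore is installed.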
4

We aligned reads to the reference genome using bwa mem (v0.7.17) and then used samtools to remove unmapped, low-mapping-quality (MAPQ < 30), secondary, and mitochondrial reads.

BWA v0.7.17
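
The filtering criteria in this step can be expressed as a predicate over SAM records. This is a sketch of the logic only, not the authors' samtools invocation: the function names are hypothetical, and the mitochondrial contig names (`chrM`, `MT`) are an assumption about the reference naming. The flag and quality filters correspond roughly to `samtools view -q 30 -F 260`, where 260 is the sum of the unmapped (4) and secondary (256) flag bits.

```python
from typing import FrozenSet, Iterable, Iterator

UNMAPPED = 0x4     # SAM flag bit: read is unmapped
SECONDARY = 0x100  # SAM flag bit: secondary alignment


def keep_alignment(flag: int, rname: str, mapq: int, min_mapq: int = 30,
                   mito_contigs: FrozenSet[str] = frozenset({"chrM", "MT"})) -> bool:
    """True if the alignment passes all four filters described in the text."""
    if flag & (UNMAPPED | SECONDARY):
        return False
    if mapq < min_mapq:
        return False
    return rname not in mito_contigs


def filter_sam_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: QNAME, FLAG, RNAME, POS, MAPQ, ...
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if keep_alignment(flag, rname, mapq):
            yield line
```

In practice the same filter would be applied to BAM files with samtools directly; the Python version is only meant to make each criterion explicit.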
5

Downstream analysis of snATAC-seq peaks was conducted with Signac.

Signac
6

Sequencing reads were demultiplexed (cellranger mkfastq) and processed (cellranger count) using the Cell Ranger software package v3.0.2 (10x Genomics).

Cell Ranger
7

Reads were aligned to the human reference hg38 (Cell Ranger software package v3.0.2).

Cell Ranger
8

Reads mapping to both intronic and exonic sequences were retained.
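
The Cell Ranger steps above (demultiplexing, counting, alignment to hg38) might be scripted as in the sketch below. All paths, IDs, and sample names are hypothetical. The source does not say how intronic reads were retained; with Cell Ranger 3.x a common approach was to count against a custom pre-mRNA reference, and the reference path here assumes that approach.

```python
from typing import List


def mkfastq_command(run_id: str, run_dir: str, samplesheet_csv: str) -> List[str]:
    """Build a `cellranger mkfastq` call that demultiplexes one sequencing run."""
    return ["cellranger", "mkfastq",
            f"--id={run_id}", f"--run={run_dir}", f"--csv={samplesheet_csv}"]


def count_command(sample_id: str, transcriptome: str,
                  fastq_dir: str, sample_name: str) -> List[str]:
    """Build a `cellranger count` call for a single sample."""
    return ["cellranger", "count",
            f"--id={sample_id}", f"--transcriptome={transcriptome}",
            f"--fastqs={fastq_dir}", f"--sample={sample_name}"]


# Hypothetical inputs; the pre-mRNA reference path is an assumption.
demux = mkfastq_command("run1", "/data/bcl/run1", "samplesheet.csv")
count = count_command("liver_01", "/refs/hg38_premrna",
                      "run1/outs/fastq_path", "liver_01")
print(" ".join(demux))
print(" ".join(count))
```

Each list can be passed to `subprocess.run(..., check=True)` where Cell Ranger is installed.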
9

The resulting UMI feature-barcode count matrices were loaded into Seurat for downstream processing.

Seurat

Tools Used

Trim Galore v0.4.4

BWA v0.7.17

samtools

Signac

Cell Ranger v3.0.2

Seurat

Raw Source Text

Assembly: hg38
Supplementary files format and content: snATAC-seq: bed files containing regions of open chromatin (peaks), per sample
Supplementary files format and content: snRNA-seq: hLIVER_processed_files.tar.gz. Contains sparse matrix file containing raw counts (one file for all samples), csv files containing gene names and cell ids, and csv file containing cell metadata (including cluster ID, sample ID, condition, QC metrics).
