GSE125970 Processing Pipeline

RNA-Seq, 4 steps

Publication

Stratification of enterochromaffin cells by single-cell expression analysis.

eLife (2025) — PMID 40184163

Dataset

Single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


Quality control and preprocessing of high-throughput sequencing data were performed using SOAPnuke with parameters -n 0.1 -l 10 -A 0.25 -Q 2 -G --seqType 1.

SOAPnuke (version not specified)

$ Bash example

# Install SOAPnuke (if not already installed)
# conda install -c bioconda soapnuke

# Create output directory
mkdir -p filtered_data

# Define input and output paths
RAW_READS="raw_reads.fastq"
OUTPUT_DIR="filtered_data"
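CLEAN_FASTQ="sample_name.clean.fq.gz" # hypothetical file name for the filtered reads, written inside OUTPUT_DIR

# Run SOAPnuke for quality control and preprocessing
# -f: input reads; -o: output directory; -C: file name for the clean reads
# (single-end shown; paired-end data would also need -r and -D for read 2)
SOAPnuke filter \
    -f "${RAW_READS}" \
    -o "${OUTPUT_DIR}" \
    -C "${CLEAN_FASTQ}" \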
    -n 0.1 \
    -l 10 \
    -A 0.25 \
    -Q 2 \
    -G \
    --seqType 1
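
To make the `-n 0.1` setting more concrete, here is a minimal sketch of an N-content filter written in awk. It is an illustration only, not SOAPnuke's implementation: the toy FASTQ file, the variable `max_n`, and the choice to keep reads whose N fraction is exactly at the threshold are all assumptions.

```shell
# Toy FASTQ with two hypothetical reads: read1 has no N bases, read2 is 50% N
printf '@read1\nACGTACGTAC\n+\nIIIIIIIIII\n@read2\nANNNNNCGTA\n+\nIIIIIIIIII\n' > toy.fastq

# Keep a 4-line FASTQ record only if its fraction of N bases is <= max_n
kept=$(awk -v max_n=0.1 '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0; tmp = $0; n = gsub(/[Nn]/, "", tmp) }  # n = number of N bases
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
        if (length(seq) > 0 && n / length(seq) <= max_n)
            print header "\n" seq "\n" plus "\n" $0
    }
' toy.fastq)
echo "$kept"
```

Only read1 passes here, because half of read2's bases are N.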


16 bp 10x™ barcodes and 10 bp UMIs were encoded at the start of Read 1 (R1).

umi_tools v1.1.2 (inferred with models/gemini-2.5-flash)

$ Bash example

# Install umi_tools if not already installed
# conda install -c bioconda umi_tools

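# Placeholder for input and output files
# Replace 'input_R1.fastq.gz' with your actual Read 1 file
# Replace 'output_R1_umi_extracted.fastq.gz' with your desired output file name

# In --bc-pattern, each C marks a cell-barcode base and each N marks a UMI base:
# 16 C's (barcode) followed by 10 N's (UMI)
umi_tools extract \
    --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \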
    -I input_R1.fastq.gz \
    -S output_R1_umi_extracted.fastq.gz
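
The read layout described in this step (16 bp barcode, then 10 bp UMI, at the start of R1) can also be sketched with plain shell tools. The sequence below is made up for illustration, and the variable names are hypothetical.

```shell
# Hypothetical first 26 bases of an R1 read: 16 bp cell barcode + 10 bp UMI
r1_seq="AAACCTGAGAAACCATTTGCATGACT"

# cut -c selects character positions (1-based, inclusive)
barcode=$(printf '%s' "$r1_seq" | cut -c1-16)   # bases 1-16
umi=$(printf '%s' "$r1_seq" | cut -c17-26)      # bases 17-26

echo "barcode=${barcode} umi=${umi}"
```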


Read alignment, filtering, barcode counting, and UMI counting were performed using Cell Ranger 2.1 with default parameters.

Cell Ranger v2.1

$ Bash example

# Install Cell Ranger (example, adjust paths as needed)
# Download cellranger-2.1.0.tar.gz from the 10x Genomics website (download links are time-limited), then:
# tar -xzf cellranger-2.1.0.tar.gz
# export PATH=/path/to/cellranger-2.1.0:$PATH

# Do not create the output directory beforehand: cellranger count creates a directory
# named after --id and stops if an unrelated directory with that name already exists.

# Run cellranger count for read alignment, filtering, barcode counting, and UMI counting
# Replace 'path/to/fastqs' with the actual directory containing FASTQ files (e.g., /data/fastqs)
# Replace 'path/to/transcriptome_reference' with the actual path to the Cell Ranger-compatible transcriptome reference (e.g., /ref/cellranger/refdata-cellranger-GRCh38-1.2.0)
# Replace 'my_sample_name' with the sample-name prefix of your FASTQ files (the part before _S1_L001_R1_001.fastq.gz)
cellranger count \
    --id=my_cellranger_output \
    --transcriptome=path/to/transcriptome_reference \
    --fastqs=path/to/fastqs \
    --sample=my_sample_name
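
The value passed to `--sample` has to match the sample-name prefix of the FASTQ files. As a rough sketch, assuming the files follow the usual bcl2fastq naming pattern (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`), the prefix can be recovered from a file name like this; the file name itself is hypothetical.

```shell
# Hypothetical FASTQ file name following the bcl2fastq naming pattern
fastq_name="my_sample_name_S1_L001_R1_001.fastq.gz"

# Strip the _S<n>_L<lane>_<read>_001.fastq.gz suffix to recover the sample name
sample=$(printf '%s\n' "$fastq_name" | sed -E 's/_S[0-9]+_L[0-9]{3}_(R1|R2|I1)_001\.fastq\.gz$//')

echo "$sample"
```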

Sample normalization and scaling of gene expression data were performed with Seurat 2.3.2.

Seurat v2.3.2

$ Bash example

# Install Seurat (version 2.3.2 might require specific R/package versions)
# A dedicated conda environment is recommended for older versions.
# For example, to install R 3.4.x and Seurat 2.3.2 (package availability on these channels is not guaranteed):
# conda create -n seurat232 r=3.4.4 r-seurat=2.3.2 -c conda-forge -c bioconda
# conda activate seurat232

# Create an R script for normalization and scaling
cat << 'EOF' > run_seurat_normalization.R
library(Seurat)

# Load your raw count matrix (replace 'counts.tsv' with your actual file path)
# The count matrix should have genes as rows and cells as columns.
# Example: counts <- read.table("counts.tsv", sep="\t", header=TRUE, row.names=1)
# For demonstration, let's create a dummy matrix
set.seed(123)
counts <- matrix(sample(0:100, 1000, replace = TRUE), ncol = 10)
rownames(counts) <- paste0("gene", 1:100)
colnames(counts) <- paste0("cell", 1:10)

# Create a Seurat object
# For Seurat v2, the raw.data slot was used for the initial count matrix
seurat_obj <- CreateSeuratObject(raw.data = counts)

# Normalize the data
# LogNormalize is a common method for scRNA-seq data
seurat_obj <- NormalizeData(object = seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)

# Scale the data
# This scales and centers the data across cells for each gene
seurat_obj <- ScaleData(object = seurat_obj)

# Save the processed Seurat object
saveRDS(seurat_obj, "normalized_scaled_seurat_object.rds")

# Optionally, print a summary or head of the scaled data
# print(head(seurat_obj@scale.data[1:5, 1:5]))
EOF

# Execute the R script
Rscript run_seurat_normalization.R
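
For intuition about what `LogNormalize` computes, each count is divided by the cell's total count, multiplied by the scale factor, and transformed with log(1 + x). The awk sketch below applies that formula to one hypothetical cell; the gene names and counts are invented, and it is an illustration of the arithmetic rather than Seurat's code.

```shell
# Hypothetical UMI counts for a single cell, one gene per line (geneA=3, geneB=0, geneC=1)
normalized=$(printf 'geneA\t3\ngeneB\t0\ngeneC\t1\n' | awk -F'\t' -v scale=10000 '
    { gene[NR] = $1; count[NR] = $2; total += $2 }   # accumulate the total count of the cell
    END {
        # log() in awk is the natural logarithm
        for (i = 1; i <= NR; i++)
            printf "%s\t%.4f\n", gene[i], log(1 + count[i] / total * scale)
    }
')
echo "$normalized"
```

A gene with zero counts stays at zero after the transform, which keeps the normalized matrix sparse.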


Raw Source Text

Quality control and preprocessing of high throughput sequencing data were using SOAPnuke with parameters -n 0.1 -l 10 -A 0.25 -Q 2 -G --seqType 1
16 bp 10xTM Barcodes and 10bp UMIs were encoded at the start of Read 1 (R1).
Reads alignment, filtering, barcode counting, and UMI counting were performed using cellranger 2.1 with default parameters
Sample normalization and scaled gene expression data were calculated by Seurat2.3.2
Genome_build: GRCh38_Human
Supplementary_files_format_and_content: tab-delimited text files include raw UMIcounts and scaled data for each cell
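
A quick sanity check on a tab-delimited raw UMI count file is to sum the counts per cell. The sketch below assumes genes are rows and cells are columns with a header row of cell IDs; the actual layout of the supplementary files is not described here, and the toy file and its values are hypothetical.

```shell
# Toy tab-delimited matrix: header row of cell IDs, then one row per gene (hypothetical values)
printf 'gene\tcell1\tcell2\ngeneA\t3\t0\ngeneB\t1\t5\n' > toy_umi_counts.tsv

# Sum each cell's column to get the total UMI count per cell
totals=$(awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 2; i <= NF; i++) sum[i] += $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], sum[i] }
' toy_umi_counts.tsv)
echo "$totals"
```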
