GSE125970 Processing Pipeline

RNA-Seq, 4 steps

Publication

Stratification of enterochromaffin cells by single-cell expression analysis.

eLife (2025) — PMID 40184163

Dataset

GSE125970

Single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Quality control and preprocessing of the high-throughput sequencing data were performed using SOAPnuke with parameters -n 0.1 -l 10 -A 0.25 -Q 2 -G --seqType 1.

    SOAPnuke (version not specified)
    $ Bash example
    # Install SOAPnuke (if not already installed)
    # conda install -c bioconda soapnuke
    
    # Create output directory
    mkdir -p filtered_data
    
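    # Define input and output paths
    RAW_READS="raw_reads.fastq"
    OUTPUT_DIR="filtered_data"
    CLEAN_READS="sample_name.clean.fq.gz" # clean-read file name, written inside OUTPUT_DIR
    
    # Run SOAPnuke for quality control and preprocessing
    # -1 is the input FASTQ and -C the clean-read output name; -f would set an adapter sequence instead.
    # For paired-end data, also pass -2 (Read 2 input) and -D (clean Read 2 output name).
    # Option meanings differ between SOAPnuke releases; confirm with `SOAPnuke filter -h`.
    SOAPnuke filter \
        -1 "${RAW_READS}" \
        -o "${OUTPUT_DIR}" \
        -C "${CLEAN_READS}" \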
        -n 0.1 \
        -l 10 \
        -A 0.25 \
        -Q 2 \
        -G \
        --seqType 1
  2.

    16 bp 10x™ barcodes and 10 bp UMIs were encoded at the start of Read 1 (R1).

    umi_tools v1.1.2 (inferred with models/gemini-2.5-flash)
    $ Bash example
    # Install umi_tools if not already installed
    # conda install -c bioconda umi_tools
    
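    # Placeholder input and output files
    # Replace 'input_R1.fastq.gz' with your actual Read 1 file
    # Replace 'output_R1_umi_extracted.fastq.gz' with your desired output file name
    
    # With the default string method, each C marks a cell-barcode base and each N a UMI base,
    # so 16 C's followed by 10 N's matches the Read 1 layout described above.
    # For paired-end data, also pass --read2-in and --read2-out so Read 2 names carry the barcode and UMI.
    umi_tools extract \
        --extract-method=string \
        --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
        -I input_R1.fastq.gz \
        -S output_R1_umi_extracted.fastq.gz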
  3.

    Read alignment, filtering, barcode counting, and UMI counting were performed using Cell Ranger 2.1 with default parameters.

    Cell Ranger v2.1
    $ Bash example
    # Install Cell Ranger 2.1.0 (download the tarball from the 10x Genomics website; adjust paths as needed)
    # tar -xzf cellranger-2.1.0.tar.gz
    # export PATH=/path/to/cellranger-2.1.0:$PATH
    
    # Do not create the output directory beforehand: cellranger count creates ./<id> itself
    # and stops if a directory with that name exists but is not a Cell Ranger run directory.
    
    # Run cellranger count for read alignment, filtering, barcode counting, and UMI counting
    # Replace 'path/to/fastqs' with the actual directory containing FASTQ files (e.g., /data/fastqs)
    # Replace 'path/to/transcriptome_reference' with the actual path to the Cell Ranger-compatible transcriptome reference (e.g., /ref/cellranger/refdata-cellranger-GRCh38-1.2.0)
    # Replace 'my_sample_name' with the sample-name prefix of your FASTQ file names
    cellranger count \
        --id=my_cellranger_output \
        --transcriptome=path/to/transcriptome_reference \
        --fastqs=path/to/fastqs \
        --sample=my_sample_name
  4.

    Normalized and scaled gene expression data were calculated with Seurat 2.3.2.

    Seurat v2.3.2
    $ Bash example
    # Install Seurat (version 2.3.2 needs a matching, older R and package set)
    # A dedicated conda environment is one option. The version pins below are an example
    # and may not resolve on current channels:
    # conda create -n seurat232 r=3.4.4 r-seurat=2.3.2 -c conda-forge -c bioconda
    # conda activate seurat232
    
    # Create an R script for normalization and scaling
    cat << 'EOF' > run_seurat_normalization.R
    library(Seurat)
    
    # Load your raw count matrix (replace 'counts.tsv' with your actual file path)
    # The count matrix should have genes as rows and cells as columns.
    # Example: counts <- read.table("counts.tsv", sep="\t", header=TRUE, row.names=1)
    # For demonstration, let's create a dummy matrix
    set.seed(123)
    counts <- matrix(sample(0:100, 1000, replace = TRUE), ncol = 10)
    rownames(counts) <- paste0("gene", 1:100)
    colnames(counts) <- paste0("cell", 1:10)
    
    # Create a Seurat object
    # For Seurat v2, the raw.data slot was used for the initial count matrix
    seurat_obj <- CreateSeuratObject(raw.data = counts)
    
    # Normalize the data
    # LogNormalize is a common method for scRNA-seq data
    seurat_obj <- NormalizeData(object = seurat_obj, normalization.method = "LogNormalize", scale.factor = 10000)
    
    # Scale the data
    # This scales and centers the data across cells for each gene
    seurat_obj <- ScaleData(object = seurat_obj)
    
    # Save the processed Seurat object
    saveRDS(seurat_obj, "normalized_scaled_seurat_object.rds")
    
    # Optionally, print a summary or head of the scaled data
    # print(head(seurat_obj@scale.data[1:5, 1:5]))
    EOF
    
    # Execute the R script
    Rscript run_seurat_normalization.R
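
As a minimal sketch of the Read 1 layout described in step 2, the snippet below splits a read into its cell barcode and UMI purely by position (16 bases, then 10 bases). The function name `split_r1` and the example sequence are made up for illustration. Cell Ranger performs this parsing internally, so the snippet is only meant to make the layout concrete.

```shell
# Lengths taken from step 2: 16 bp cell barcode followed by a 10 bp UMI.
BARCODE_LEN=16
UMI_LEN=10

# split_r1 (hypothetical helper): print "<barcode><TAB><umi>" for one Read 1 sequence.
split_r1() {
    # Bases 1..16 are the cell barcode
    barcode=$(printf '%s\n' "$1" | cut -c1-"$BARCODE_LEN")
    # Bases 17..26 are the UMI
    umi=$(printf '%s\n' "$1" | cut -c$((BARCODE_LEN + 1))-$((BARCODE_LEN + UMI_LEN)))
    printf '%s\t%s\n' "$barcode" "$umi"
}

# Invented 26-base read prefix, used only to demonstrate the split
split_r1 "AAACCTGAGAAACCATTTGCATGCAA"
```

Any bases after position 26 are ignored by this sketch. In real 10x data the cDNA insert is on Read 2.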
Raw Source Text
Quality control and preprocessing of high throughput sequencing data were using SOAPnuke with parameters -n 0.1 -l 10 -A 0.25 -Q 2 -G --seqType 1
16 bp 10xTM Barcodes and 10bp UMIs were encoded at the start of Read 1 (R1).
Reads alignment, filtering, barcode counting, and UMI counting were performed using cellranger 2.1 with default parameters
Sample normalization and scaled gene expression data were calculated by Seurat2.3.2
Genome_build: GRCh38_Human
Supplementary_files_format_and_content: tab-delimited text files include raw UMIcounts and scaled data for each cell
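
The raw UMI counts in these files relate to normalized values through the LogNormalize setting used in the Seurat example above: each count is divided by its cell's total, multiplied by the scale factor (10000), and passed through ln(1 + x). The sketch below illustrates that arithmetic with awk. It assumes a table with genes as rows, cells as columns, a header row, and gene names in the first column; the actual layout of the deposited files should be checked first. The file name `toy_counts.tsv`, its contents, and the function name `lognormalize` are hypothetical. The sketch does not reproduce the centering and scaling that Seurat's ScaleData applies afterwards.

```shell
# Toy table made up for illustration: 2 genes x 2 cells (assumed layout)
printf 'gene\tcell1\tcell2\nGENE_A\t1\t0\nGENE_B\t3\t5\n' > toy_counts.tsv

# lognormalize (hypothetical helper): print ln(1 + count / cell_total * 10000) per entry.
lognormalize() {
    awk -F'\t' -v OFS='\t' -v scale=10000 '
        NR == 1 { print; next }                  # pass the header through unchanged
        {
            rows[NR] = $0                        # cache each gene row
            for (i = 2; i <= NF; i++) total[i] += $i   # accumulate per-cell totals
        }
        END {
            for (r = 2; r <= NR; r++) {
                n = split(rows[r], f, FS)
                line = f[1]
                for (i = 2; i <= n; i++) {
                    # cells with zero total counts are reported as 0
                    v = (total[i] > 0) ? log(1 + f[i] / total[i] * scale) : 0
                    line = line OFS sprintf("%.4f", v)
                }
                print line
            }
        }' "$1"
}

lognormalize toy_counts.tsv
```

The table is held in memory so the per-cell totals are known before any value is printed, which is fine for a toy example but not for a full count matrix.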