GSE125970 Processing Pipeline
RNA-Seq
4 steps
Publication
Stratification of enterochromaffin cells by single-cell expression analysis. eLife (2025), PMID 40184163
Dataset
GSE125970: Single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Quality control and preprocessing of the high-throughput sequencing data were performed using SOAPnuke with parameters -n 0.1 -l 10 -A 0.25 -Q 2 -G --seqType 1.
$ Bash example
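# Install SOAPnuke (if not already installed)
# conda install -c bioconda soapnuke

# Create output directory
mkdir -p filtered_data

# Input and output names below are placeholders. Paired-end input is an
# assumption here, since 10x libraries carry the barcode/UMI on R1 and cDNA on R2.
RAW_R1="raw_reads_R1.fastq.gz"
RAW_R2="raw_reads_R2.fastq.gz"
OUTPUT_DIR="filtered_data"
OUTPUT_PREFIX="sample_name"

# Run SOAPnuke with the parameters reported by the authors.
# -1/-2 are the input FASTQs, -C/-D the cleaned output files, -o the output directory.
# Option letters vary between SOAPnuke releases, so confirm with `SOAPnuke filter -h`.
SOAPnuke filter \
  -1 "${RAW_R1}" \
  -2 "${RAW_R2}" \
  -C "${OUTPUT_PREFIX}_R1.clean.fastq.gz" \
  -D "${OUTPUT_PREFIX}_R2.clean.fastq.gz" \
  -o "${OUTPUT_DIR}" \
  -n 0.1 \
  -l 10 \
  -A 0.25 \
  -Q 2 \
  -G \
  --seqType 1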
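To make the filter thresholds concrete, the following is a minimal sketch of the per-read checks that `-n 0.1` and `-l 10` suggest: discard a read when more than 10% of its bases are N, or when too many of its bases fall below Phred 10. It is not SOAPnuke's implementation. The helper name `filter_reads`, the Phred+33 encoding, the 0.5 cutoff for the low-quality fraction, and the exact boundary comparisons are all assumptions, and adapter handling and the remaining options are ignored.

```shell
#!/usr/bin/env bash
# Illustrative FASTQ filter; NOT SOAPnuke's actual implementation.
# filter_reads is a hypothetical helper. It reads an uncompressed FASTQ with
# 4-line records on stdin and assumes Phred+33 quality strings.
filter_reads() {
  local max_n_rate="${1:-0.1}"     # mirrors -n 0.1
  local low_qual="${2:-10}"        # mirrors -l 10
  local max_lowq_rate="${3:-0.5}"  # assumed cutoff for the low-quality fraction
  awk -v nmax="$max_n_rate" -v lq="$low_qual" -v qmax="$max_lowq_rate" '
    BEGIN {
      # Lookup table from quality character to Phred score (Phred+33).
      for (i = 33; i < 127; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      len = length(seq)
      if (len == 0) next
      tmp = seq
      n_count = gsub(/[Nn]/, "", tmp)   # number of N bases in the read
      low = 0
      for (i = 1; i <= len; i++)
        if (phred[substr($0, i, 1)] < lq) low++
      # Keep the record only if both fractions are within their limits.
      if (n_count / len <= nmax && low / len <= qmax)
        print header "\n" seq "\n" plus "\n" $0
    }
  '
}
```

For example, `gzip -dc raw_reads_R2.fastq.gz | filter_reads 0.1 10 0.5 > kept_R2.fastq` would apply it to one file (file names are placeholders). Filtering one mate on its own breaks read pairing, so this is mainly useful for inspecting which reads the thresholds would reject.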
2
16 bp 10x™ Barcodes and 10 bp UMIs were encoded at the start of Read 1 (R1).
$ Bash example
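# Install umi_tools if not already installed
# conda install -c bioconda umi_tools

# Cell Ranger parses this read layout itself; a separate extraction like the one
# below is only relevant for workflows that do not use Cell Ranger.
# File names are placeholders; replace them with your actual Read 1 / Read 2 files.
# --bc-pattern describes R1: 16 cell-barcode bases (C) followed by 10 UMI bases (N).
umi_tools extract \
  --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
  -I input_R1.fastq.gz \
  -S output_R1_umi_extracted.fastq.gz \
  --read2-in=input_R2.fastq.gz \
  --read2-out=output_R2_umi_extracted.fastq.gz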
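The read layout described in this step can also be illustrated directly: the first 16 bases of R1 are the cell barcode and the next 10 are the UMI. The sketch below only demonstrates that slicing. `split_barcode_umi` is a hypothetical helper, not part of Cell Ranger or umi_tools, and it assumes an uncompressed FASTQ with 4-line records on stdin.

```shell
#!/usr/bin/env bash
# split_barcode_umi is a hypothetical helper for illustration only.
# Prints one line per read: read_name <TAB> cell_barcode <TAB> UMI
split_barcode_umi() {
  local bc_len="${1:-16}"    # barcode length stated in the source
  local umi_len="${2:-10}"   # UMI length stated in the source
  awk -v bc="$bc_len" -v umi="$umi_len" '
    NR % 4 == 1 { name = substr($1, 2) }     # drop the leading "@", keep the first token
    NR % 4 == 2 {
      if (length($0) < bc + umi) next        # skip reads too short to hold both
      printf "%s\t%s\t%s\n", name, substr($0, 1, bc), substr($0, bc + 1, umi)
    }
  '
}
```

A call such as `gzip -dc input_R1.fastq.gz | split_barcode_umi | head` (placeholder file name) gives a quick look at the barcodes and UMIs before any alignment is run.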
3
Read alignment, filtering, barcode counting, and UMI counting were performed using Cell Ranger 2.1 with default parameters.
Cell Ranger v2.1
$ Bash example
# Install Cell Ranger (example, adjust path as needed)
# wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-2.1.0.tar.gz
# tar -xzf cellranger-2.1.0.tar.gz
# export PATH=/path/to/cellranger-2.1.0:$PATH

# Do not pre-create the output directory: cellranger count creates a directory
# named after --id and stops if an unrelated directory of that name already exists.

# Run cellranger count for read alignment, filtering, barcode counting, and UMI counting
# Replace 'path/to/fastqs' with the actual directory containing FASTQ files (e.g., /data/fastqs)
# Replace 'path/to/transcriptome_reference' with the actual path to the Cell Ranger-compatible
# transcriptome reference (e.g., /ref/cellranger/refdata-cellranger-GRCh38-1.2.0)
# Replace 'my_sample_name' with the sample-name prefix used in your FASTQ file names
cellranger count \
  --id=my_cellranger_output \
  --transcriptome=path/to/transcriptome_reference \
  --fastqs=path/to/fastqs \
  --sample=my_sample_name
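Once the run finishes, a quick sanity check is to total the UMI counts per barcode from the filtered matrix. The sketch below assumes the Cell Ranger 2.x layout, in which `outs/filtered_gene_bc_matrices/<genome>/` holds an uncompressed `matrix.mtx` (Matrix Market format, genes as rows and barcodes as columns) next to `barcodes.tsv`. `umi_per_barcode` is a hypothetical helper, and the directory name in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# umi_per_barcode is a hypothetical helper for illustration only.
# Usage: umi_per_barcode matrix.mtx barcodes.tsv
# Prints: barcode <TAB> total UMI count, in barcodes.tsv order.
umi_per_barcode() {
  local mtx="$1" barcodes="$2"
  awk '
    FNR == NR { bc[FNR] = $1; next }      # first file: line i names matrix column i
    /^%/ { next }                          # Matrix Market banner and comment lines
    !seen_dims { seen_dims = 1; next }     # first data line is "n_genes n_barcodes n_entries"
    { total[$2] += $3 }                    # entry line: gene_index barcode_index count
    END {
      for (i = 1; i in bc; i++) printf "%s\t%d\n", bc[i], total[i]
    }
  ' "$barcodes" "$mtx"
}
```

For instance, with a GRCh38 reference the call might look like `umi_per_barcode my_cellranger_output/outs/filtered_gene_bc_matrices/GRCh38/matrix.mtx my_cellranger_output/outs/filtered_gene_bc_matrices/GRCh38/barcodes.tsv | sort -k2,2nr | head`; the genome subdirectory name depends on the reference that was used.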
4
Normalized and scaled gene expression data were calculated with Seurat 2.3.2.
$ Bash example
# Install Seurat (version 2.3.2 might require specific R/package versions)
# A dedicated conda environment is recommended for older versions.
# For example (availability of this exact version in the channels is not guaranteed):
# conda create -n seurat232 r=3.4.4 r-seurat=2.3.2 -c conda-forge -c bioconda
# conda activate seurat232

# Create an R script for normalization and scaling
cat << 'EOF' > run_seurat_normalization.R
library(Seurat)

# Load your raw count matrix (replace 'counts.tsv' with your actual file path)
# The count matrix should have genes as rows and cells as columns.
# Example: counts <- read.table("counts.tsv", sep="\t", header=TRUE, row.names=1)

# For demonstration, create a dummy matrix (100 genes x 10 cells)
set.seed(123)
counts <- matrix(sample(0:100, 1000, replace = TRUE), ncol = 10)
rownames(counts) <- paste0("gene", 1:100)
colnames(counts) <- paste0("cell", 1:10)

# Create a Seurat object
# In Seurat v2, the raw.data argument takes the initial count matrix
seurat_obj <- CreateSeuratObject(raw.data = counts)

# Normalize the data
# LogNormalize with scale.factor 10000 is the Seurat default; the source does not
# state which settings the authors used
seurat_obj <- NormalizeData(object = seurat_obj,
                            normalization.method = "LogNormalize",
                            scale.factor = 10000)

# Scale the data
# This centers and scales each gene across cells
seurat_obj <- ScaleData(object = seurat_obj)

# Save the processed Seurat object
saveRDS(seurat_obj, "normalized_scaled_seurat_object.rds")

# Optionally, inspect a corner of the scaled data
# print(seurat_obj@scale.data[1:5, 1:5])
EOF

# Execute the R script
Rscript run_seurat_normalization.R
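The normalization arithmetic can also be sketched outside R. Assuming Seurat's `LogNormalize` method with a scale factor of 10000 (the package default; the source does not state which settings were used), each count is divided by the total count of its cell, multiplied by the scale factor, and transformed with the natural log of 1 + x. `log_normalize` is a hypothetical helper, and the input layout (tab-delimited, a header row, gene IDs in the first column, one column per cell) is an assumption. The subsequent per-gene centering and scaling is not shown.

```shell
#!/usr/bin/env bash
# log_normalize is a hypothetical helper for illustration only; it is not Seurat code.
# Reads a tab-delimited gene x cell count table on stdin and prints
# log(1 + count / cell_total * scale) with four decimal places.
# The whole table is held in memory, so this suits small examples only.
log_normalize() {
  local scale="${1:-10000}"   # assumed to match Seurat's default scale.factor
  awk -F'\t' -v OFS='\t' -v sf="$scale" '
    NR == 1 { print; ncol = NF; next }       # pass the header row through unchanged
    {
      gene[NR] = $1
      for (j = 2; j <= NF; j++) { val[NR, j] = $j; colsum[j] += $j }
      nrow = NR
    }
    END {
      for (i = 2; i <= nrow; i++) {
        line = gene[i]
        for (j = 2; j <= ncol; j++) {
          # Cells with a zero total are reported as 0 to avoid dividing by zero.
          norm = (colsum[j] > 0) ? log(1 + val[i, j] / colsum[j] * sf) : 0
          line = line OFS sprintf("%.4f", norm)
        }
        print line
      }
    }
  '
}
```

Running `log_normalize < counts.tsv` (placeholder file name) on a small table is a convenient way to cross-check a few values against the normalized data that Seurat produces.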
Raw Source Text
Quality control and preprocessing of high throughput sequencing data were using SOAPnuke with parameters -n 0.1 -l 10 -A 0.25 -Q 2 -G --seqType 1
16 bp 10xTM Barcodes and 10bp UMIs were encoded at the start of Read 1 (R1).
Reads alignment, filtering, barcode counting, and UMI counting were performed using cellranger 2.1 with default parameters
Sample normalization and scaled gene expression data were calculated by Seurat2.3.2
Genome_build: GRCh38_Human
Supplementary_files_format_and_content: tab-delimited text files include raw UMIcounts and scaled data for each cell