GSE147005 Processing Pipeline

RNA-Seq, 4 steps

Publication

Splicing factor SRSF1 deficiency in the liver triggers NASH-like pathology and cell death.

Nature Communications (2023), PMID 36759613

Dataset

Loss of canonical splicing factor SRSF1 in hepatocytes results in acute liver injury and regeneration

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

Fastq files were generated and demultiplexed with the bcl2fastq v2.17.1.14 Conversion Software (Illumina)

bcl2fastq v2.17.1.14

$ Bash example

# Install bcl2fastq from Illumina's installer, or load it as an environment module if available:
# module load bcl2fastq/2.17.1.14

# Example command. The source names only the tool and version, so every option below is illustrative.
# Replace /path/to/run_directory and /path/to/output_directory with actual paths.
bcl2fastq \
  --runfolder-dir /path/to/run_directory \
  --output-dir /path/to/output_directory \
  --no-lane-splitting \
  --barcode-mismatches 1 \
  --minimum-trimmed-read-length 0 --mask-short-adapter-reads 0 \
  --ignore-missing-bcls --ignore-missing-positions \
  --create-fastq-for-index-reads \
  --loading-threads 4 --processing-threads 4 --writing-threads 4
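
A simple check after demultiplexing is to confirm that both mates of a pair contain the same number of reads. The sketch below is an addition and not part of the published pipeline. It assumes paired-end, gzip-compressed FASTQ output with four lines per record; the function names `count_reads` and `check_pair` and the file names in the usage comment are hypothetical.

```shell
# Count reads in a gzip-compressed FASTQ file (assumes 4 lines per record).
count_reads() {
    gzip -cd "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Report the read counts of both mates; succeed only if they are equal.
check_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Usage (hypothetical file names):
#   check_pair sample_R1.fastq.gz sample_R2.fastq.gz || echo "mate files differ in length"
```

A mismatch usually points to a truncated or incomplete file and is worth resolving before quantification.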

Sequenced reads were trimmed for adaptor sequence and transcript abundances were computed using kallisto v0.46.0

kallisto v0.46.0

$ Bash example

# Install kallisto (if not already installed)
# conda install -c bioconda kallisto=0.46.0

# The source states that reads were adaptor-trimmed but does not name the trimming tool;
# the FASTQ files below are assumed to be already trimmed.

# kallisto index built from the GENCODE vM19 (GRCm38) mouse transcriptome, for example:
# kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa
KALLISTO_INDEX="gencode.vM19.grcm38.idx"

# Placeholder for input reads (replace with actual file paths).
# Paired-end input is an assumption; single-end reads need --single with -l and -s.
READ1="sample_R1.fastq.gz"
READ2="sample_R2.fastq.gz"

# Output directory for kallisto results
OUTPUT_DIR="kallisto_quant_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Compute transcript abundances using kallisto
kallisto quant \
  -i "${KALLISTO_INDEX}" \
  -o "${OUTPUT_DIR}" \
  "${READ1}" "${READ2}"

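kallisto writes a `run_info.json` file into each output directory, and the fraction of reads that pseudoaligned is a useful first check on a quantification run. The sketch below is an addition, not part of the published pipeline. It assumes the file contains the integer fields `n_processed` and `n_pseudoaligned`, one per line; the helper names `json_int` and `pseudoalignment_rate` are hypothetical, and plain `sed` is used only to avoid a dependency on a JSON parser.

```shell
# Extract an integer field (e.g. n_processed) from a kallisto run_info.json.
# Assumes one "key": value pair per line.
json_int() {
    sed -n "s/.*\"$2\": *\([0-9][0-9]*\).*/\1/p" "$1" | head -n 1
}

# Print the percentage of processed reads that pseudoaligned, or NA if unavailable.
pseudoalignment_rate() {
    processed=$(json_int "$1" n_processed)
    aligned=$(json_int "$1" n_pseudoaligned)
    awk -v a="$aligned" -v p="$processed" \
        'BEGIN { if (p > 0) printf "%.1f\n", 100 * a / p; else print "NA" }'
}

# Usage (directory name taken from the example above):
#   pseudoalignment_rate kallisto_quant_output/run_info.json
```

A very low rate can indicate a mismatch between the sample's species and the index, which is worth ruling out for a mouse dataset such as this one.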

kallisto index was generated using Gencode annotation, GRCm38_vM19

kallisto (version not stated for this step; v0.46.0 assumed to match the quantification step)

$ Bash example

# Install kallisto
# conda install -c bioconda kallisto=0.46.0

# Download Gencode vM19 (GRCm38) transcriptome FASTA
# wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.transcripts.fa.gz
# gunzip gencode.vM19.transcripts.fa.gz

# Generate kallisto index
kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa
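
GENCODE transcript FASTA headers are pipe-delimited, with the transcript ID in the first field and the gene ID in the second, so the same file used for the index can also supply a transcript-to-gene table for gene-level summarization. The sketch below is an addition, not part of the published pipeline; the function name `make_tx2gene` and the output file name are hypothetical, and it assumes the headers have not been modified.

```shell
# Print a two-column TSV (transcript ID, gene ID) from GENCODE FASTA headers.
# Accepts a plain or gzip-compressed FASTA file.
make_tx2gene() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F'[|]' '/^>/ { sub(/^>/, "", $1); print $1 "\t" $2 }'
}

# Usage (output file name is a placeholder):
#   make_tx2gene gencode.vM19.transcripts.fa > tx2gene.tsv
```

Because kallisto keeps the full header as the target ID, a table like this matches the quantification output only if the import step strips everything after the first `|` (tximport provides `ignoreAfterBar = TRUE` for this).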

Differential gene expression analysis was performed with the kallisto abundance tables using tximport and DESeq2

tximport and DESeq2 (versions not specified)

$ Bash example

# Install kallisto (if not already installed)
# conda install -c bioconda kallisto

# Reference data: kallisto needs an index built from a transcriptome FASTA file.
# The source specifies the GENCODE vM19 (GRCm38) mouse transcriptome:
# wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.transcripts.fa.gz
# gunzip gencode.vM19.transcripts.fa.gz

# Build kallisto index
# kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa

# Perform kallisto quantification for each sample
# Replace with actual sample names, read files, and index path
SAMPLES=("sample_1" "sample_2" "sample_3" "sample_4") # Placeholder sample names
INDEX="gencode.vM19.grcm38.idx" # Path to kallisto index
OUTPUT_DIR="kallisto_quant_results"
mkdir -p "${OUTPUT_DIR}"

for SAMPLE in "${SAMPLES[@]}"; do
    READ1="${SAMPLE}_R1.fastq.gz" # Adjust if your files are named differently
    READ2="${SAMPLE}_R2.fastq.gz" # Adjust if your files are named differently
    kallisto quant -i "${INDEX}" -o "${OUTPUT_DIR}/${SAMPLE}" "${READ1}" "${READ2}"
done

# Create the R script file for tximport and DESeq2 analysis
cat << 'EOF' > run_deseq2.R
# Install R packages (if not already installed)
# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("tximport")
# BiocManager::install("DESeq2")
# BiocManager::install("AnnotationDbi") # For gene mapping
# BiocManager::install("org.Mm.eg.db") # For mouse gene mapping

library(tximport)
library(DESeq2)
# library(AnnotationDbi)
# library(org.Mm.eg.db) # Mouse annotation; the dataset is mm10/GRCm38

# Define paths to kallisto output directories
# These should match the output from the kallisto quant step
kallisto_output_dir <- "kallisto_quant_results"
sample_names <- c("sample_1", "sample_2", "sample_3", "sample_4") # Must match samples used in kallisto quant
files <- file.path(kallisto_output_dir, sample_names, "abundance.h5")
names(files) <- sample_names

# Create a sample information table (design matrix)
# Replace with your actual experimental design (e.g., conditions, batches)
# Example: 2 conditions, 2 replicates each
sample_info <- data.frame(
  sample = sample_names,
  condition = factor(c("control", "control", "treated", "treated")),
  replicate = factor(c("rep1", "rep2", "rep1", "rep2"))
)
rownames(sample_info) <- sample_names

# Build a transcript-to-gene mapping, since the source describes differential gene expression.
# Requires the GenomicFeatures package and the GENCODE vM19 GTF that matches the index.
# In recent Bioconductor releases makeTxDbFromGFF() is provided by the txdbmaker package instead.
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("gencode.vM19.annotation.gtf.gz")
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- AnnotationDbi::select(txdb, k, "GENEID", "TXNAME")

# Import kallisto abundance data and summarize to gene level.
# GENCODE target IDs are pipe-delimited, so ignoreAfterBar = TRUE keeps only the transcript ID.
# For transcript-level analysis, drop tx2gene and pass txOut = TRUE instead.
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)

# Create DESeq2 object
dds <- DESeqDataSetFromTximport(txi, colData = sample_info, design = ~ condition)

# Pre-filtering (optional, but recommended for DESeq2)
# Remove genes/transcripts with very low counts across all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

# Run DESeq2 analysis
dds <- DESeq(dds)

# Get results
res <- results(dds)
summary(res)

# Order results by adjusted p-value
res_ordered <- res[order(res$padj),]

# Save results
write.csv(as.data.frame(res_ordered), file = "deseq2_results.csv")

# Optional: Generate an MA plot
# png("deseq2_MA_plot.png")
# plotMA(res, main="DESeq2 MA-plot")
# dev.off()
EOF

# Execute the R script for differential gene expression analysis
Rscript run_deseq2.R
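
The deposited supplementary files include a table of TPM values for annotated transcripts. A table of that shape can be assembled from the per-sample kallisto outputs as sketched below. This is an addition, not the authors' code. It assumes each sample directory contains kallisto's `abundance.tsv` (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`) and that all samples were quantified against the same index, so rows are in the same order; the function name `merge_tpm` and the output file name are hypothetical.

```shell
# Usage: merge_tpm QUANT_DIR SAMPLE [SAMPLE ...]
# Prints a tab-separated table: target_id followed by one TPM column per sample.
merge_tpm() {
    quant_dir=$1
    shift
    out=$(mktemp)
    # Transcript IDs come from the first sample (header row dropped).
    tail -n +2 "$quant_dir/$1/abundance.tsv" | cut -f 1 > "$out"
    header="target_id"
    for sample in "$@"; do
        header=$(printf '%s\t%s' "$header" "$sample")
        # Column 5 of abundance.tsv is the TPM estimate.
        tail -n +2 "$quant_dir/$sample/abundance.tsv" | cut -f 5 | paste "$out" - > "$out.next"
        mv "$out.next" "$out"
    done
    printf '%s\n' "$header"
    cat "$out"
    rm -f "$out"
}

# Usage (sample names are the placeholders from the example above):
#   merge_tpm kallisto_quant_results sample_1 sample_2 sample_3 sample_4 > tpm_table.tsv
```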

Raw Source Text

Fastq files were generated and demultiplexed with the bcl2fastq v2.17.1.14 Conversion Software (Illumina)
Sequenced reads were trimmed for adaptor sequence and transcript abundances were computed using kallisto v0.46.0
kallisto index was generated using Gencode annotation, GRCm38_vM19
Differential gene expression analysis was performed with the kallisto abundance tables using tximport and DESeq2
Genome_build: mm10
Supplementary_files_format_and_content: Table of TPM values of annotated transcripts and DESeq2 output of differential gene expression