GSE147005 Processing Pipeline

RNA-Seq, 4 steps

Publication

Splicing factor SRSF1 deficiency in the liver triggers NASH-like pathology and cell death.

Nature Communications (2023), PMID 36759613

Dataset

GSE147005

Loss of canonical splicing factor SRSF1 in hepatocytes results in acute liver injury and regeneration

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    Fastq files were generated and demultiplexed with the bcl2fastq v2.17.1.14 Conversion Software (Illumina)

    bcl2fastq v2.17.1.14
    $ Bash example
    # Install bcl2fastq (distributed by Illumina; on some clusters it is available as a module)
    # module load bcl2fastq/2.17.1.14
    
    # Example command. The source states only the tool and version, so every option below is an assumption.
    # Replace the run folder, output directory, and sample sheet paths with actual paths.
    bcl2fastq \
      --runfolder-dir /path/to/run_directory \
      --output-dir /path/to/output_directory \
      --sample-sheet /path/to/run_directory/SampleSheet.csv \
      --barcode-mismatches 1 \
      --loading-threads 4 \
      --processing-threads 4 \
      --writing-threads 4
  2.

    Sequenced reads were trimmed for adaptor sequence and transcript abundances were computed using kallisto v0.46.0

    kallisto v0.46.0
    $ Bash example
    # Install kallisto (if not already installed)
    # conda install -c bioconda kallisto=0.46.0
    
    # kallisto index built from the mouse GENCODE vM19 (GRCm38) transcriptome, as stated in step 3:
    # kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa.gz
    
    # Placeholder for adapter-trimmed input reads (replace with actual file paths).
    # The source does not name the trimming tool or say whether the library is paired-end; paired-end input is assumed here.
    READ1="sample_R1.fastq.gz"
    READ2="sample_R2.fastq.gz"
    
    # Path to the kallisto index file
    KALLISTO_INDEX="gencode.vM19.grcm38.idx"
    
    # Output directory for kallisto results
    OUTPUT_DIR="kallisto_quant_output"
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Compute transcript abundances using kallisto
    kallisto quant \
      -i "${KALLISTO_INDEX}" \
      -o "${OUTPUT_DIR}" \
      "${READ1}" "${READ2}"
  3.

    kallisto index was generated using Gencode annotation, GRCm38_vM19

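    kallisto v0.46.0 (version not stated for this step; assumed to match the v0.46.0 used for quantification in step 2)
    $ Bash example
    # Install kallisto (same version as the quantification step)
    # conda install -c bioconda kallisto=0.46.0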
    
    # Download Gencode vM19 (GRCm38) transcriptome FASTA
    # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.transcripts.fa.gz
    # gunzip gencode.vM19.transcripts.fa.gz
    
    # Generate kallisto index
    kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa
  4.

    Differential gene expression analysis was performed with the kallisto abundance tables using tximport and DESeq2

    tximport and DESeq2 (versions not specified)
    $ Bash example
    # Install kallisto (if not already installed)
    # conda install -c bioconda kallisto=0.46.0
    
    # Reference data: the source states the index was built from GENCODE vM19 (GRCm38) mouse transcripts.
    # wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.transcripts.fa.gz
    
    # Build kallisto index (skip if already built in step 3)
    # kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa.gz
    
    # Perform kallisto quantification for each sample (skip if already done in step 2)
    # Replace with actual sample names, read files, and index path
    SAMPLES=("sample_1" "sample_2" "sample_3" "sample_4") # Example sample names
    INDEX="gencode.vM19.grcm38.idx" # Path to kallisto index
    OUTPUT_DIR="kallisto_quant_results"
    mkdir -p "${OUTPUT_DIR}"
    
    for SAMPLE in "${SAMPLES[@]}"; do
        READ1="${SAMPLE}_R1.fastq.gz" # Adjust if your files are named differently
        READ2="${SAMPLE}_R2.fastq.gz" # Adjust if your files are named differently
        kallisto quant -i "${INDEX}" -o "${OUTPUT_DIR}/${SAMPLE}" "${READ1}" "${READ2}"
    done
    
    # Create the R script file for tximport and DESeq2 analysis
    cat << 'EOF' > run_deseq2.R
    # Install R packages (if not already installed)
    # if (!requireNamespace("BiocManager", quietly = TRUE))
    #     install.packages("BiocManager")
    # BiocManager::install("tximport")
    # BiocManager::install("DESeq2")
    # BiocManager::install("AnnotationDbi") # For gene annotation
    # BiocManager::install("org.Mm.eg.db") # For mouse gene annotation
    
    library(tximport)
    library(DESeq2)
    # library(AnnotationDbi)
    # library(org.Mm.eg.db) # Mouse annotation, matching the GRCm38 reference
    
    # Define paths to kallisto output directories
    # These should match the output from the kallisto quant step
    kallisto_output_dir <- "kallisto_quant_results"
    sample_names <- c("sample_1", "sample_2", "sample_3", "sample_4") # Must match samples used in kallisto quant
    files <- file.path(kallisto_output_dir, sample_names, "abundance.h5")
    names(files) <- sample_names
    
    # Create a sample information table (design matrix)
    # Replace with your actual experimental design (e.g., conditions, batches)
    # Example: 2 conditions, 2 replicates each
    sample_info <- data.frame(
      sample = sample_names,
      condition = factor(c("control", "control", "treated", "treated")),
      replicate = factor(c("rep1", "rep2", "rep1", "rep2"))
    )
    rownames(sample_info) <- sample_names
    
    # Build a transcript-to-gene table for gene-level analysis (the source reports differential gene expression).
    # Assumption: the index was built from the GENCODE transcripts FASTA, so each kallisto target_id
    # looks like "ENSMUST...|ENSMUSG...|..." with the transcript ID first and the gene ID second.
    first_tsv <- file.path(kallisto_output_dir, sample_names[1], "abundance.tsv")
    target_ids <- read.delim(first_tsv, stringsAsFactors = FALSE)$target_id
    id_parts <- strsplit(target_ids, "|", fixed = TRUE)
    tx2gene <- data.frame(
      TXNAME = vapply(id_parts, `[`, character(1), 1),
      GENEID = vapply(id_parts, `[`, character(1), 2)
    )
    
    # Import kallisto abundances and summarize them to gene level.
    # ignoreAfterBar = TRUE trims each target_id at the first "|" before matching it to TXNAME.
    txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)
    
    # Create DESeq2 object
    dds <- DESeqDataSetFromTximport(txi, colData = sample_info, design = ~ condition)
    
    # Pre-filtering (optional, but recommended for DESeq2)
    # Remove genes/transcripts with very low counts across all samples
    keep <- rowSums(counts(dds)) >= 10
    dds <- dds[keep,]
    
    # Run DESeq2 analysis
    dds <- DESeq(dds)
    
    # Get results
    res <- results(dds)
    summary(res)
    
    # Order results by adjusted p-value
    res_ordered <- res[order(res$padj),]
    
    # Save results
    write.csv(as.data.frame(res_ordered), file = "deseq2_results.csv")
    
    # Optional: Generate an MA plot
    # png("deseq2_MA_plot.png")
    # plotMA(res, main="DESeq2 MA-plot")
    # dev.off()
    EOF
    
    # Execute the R script for differential gene expression analysis
    Rscript run_deseq2.R
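
Before the abundance tables are handed to tximport, it can help to confirm that every sample pseudoaligned at a plausible rate. The sketch below is not part of the published pipeline. The function names `pseudoaligned_pct` and `report_pseudoalignment` are hypothetical, and it assumes that each kallisto output directory sits at `<results_dir>/<sample>/` and contains a `run_info.json` with a `p_pseudoaligned` field.

```shell
# Print the pseudoalignment percentage recorded in one kallisto output directory.
# Assumption: run_info.json holds a line such as  "p_pseudoaligned": 85.3,
pseudoaligned_pct() {
    local run_info="$1/run_info.json"
    # Report NA (and a non-zero status) when the sample has no run_info.json
    [ -f "${run_info}" ] || { echo "NA"; return 1; }
    sed -n 's/.*"p_pseudoaligned": *\([0-9.]*\).*/\1/p' "${run_info}"
}

# Print one "sample<TAB>percent" line per sample directory under a results directory.
report_pseudoalignment() {
    local results_dir="$1" sample_dir
    for sample_dir in "${results_dir}"/*/; do
        # Skip the unexpanded glob pattern when no subdirectories exist
        [ -d "${sample_dir}" ] || continue
        printf '%s\t%s\n' "$(basename "${sample_dir}")" "$(pseudoaligned_pct "${sample_dir%/}")"
    done
}
```

With the directory layout from step 4, `report_pseudoalignment kallisto_quant_results` would print one tab-separated line per sample. A sample reported as `NA` has no `run_info.json`, which usually means its quantification did not finish.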
Raw Source Text
Fastq files were generated and demultiplexed with the bcl2fastq v2.17.1.14 Conversion Software (Illumina)
Sequenced reads were trimmed for adaptor sequence and transcript abundances were computed using kallisto v0.46.0
kallisto index was generated using Gencode annotation, GRCm38_vM19
Differential gene expression analysis was performed with the kallisto abundance tables using tximport and DESeq2
Genome_build: mm10
Supplementary_files_format_and_content: Table of TPM values of annotated transcripts and DESeq2 output of differential gene expression
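
A table of transcript TPM values like the one listed under the supplementary files could be assembled from the per-sample kallisto outputs roughly as follows. This is a sketch under assumptions, not the authors' script. `tpm_matrix` is a hypothetical helper, the layout `<results_dir>/<sample>/abundance.tsv` follows the example in step 4, and TPM is read from the fifth column (`tpm`) of kallisto's plain-text output.

```shell
# Merge the tpm column of every <results_dir>/<sample>/abundance.tsv into one
# tab-separated table: one row per transcript, one column per sample.
tpm_matrix() {
    local results_dir="$1"
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {
            # Header line of each file: record the sample name (parent directory)
            n = split(FILENAME, parts, "/")
            sample[++nsamples] = parts[n - 1]
            next
        }
        {
            # Remember the order in which transcripts first appear
            if (!($1 in seen)) { seen[$1] = 1; order[++ntx] = $1 }
            tpm[$1, nsamples] = $5
        }
        END {
            line = "target_id"
            for (s = 1; s <= nsamples; s++) line = line OFS sample[s]
            print line
            for (t = 1; t <= ntx; t++) {
                line = order[t]
                for (s = 1; s <= nsamples; s++) line = line OFS tpm[order[t], s]
                print line
            }
        }
    ' "${results_dir}"/*/abundance.tsv
}
```

For example, `tpm_matrix kallisto_quant_results > transcript_tpm.tsv` would write the combined table, where `transcript_tpm.tsv` is a placeholder file name. Because every sample is quantified against the same index, each row is expected to have a value in every sample column.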