GSE147005 Processing Pipeline
RNA-Seq
code_examples
4 steps
Publication
Splicing factor SRSF1 deficiency in the liver triggers NASH-like pathology and cell death. Nature Communications (2023), PMID 36759613
Dataset
GSE147005: Loss of canonical splicing factor SRSF1 in hepatocytes results in acute liver injury and regeneration
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
FASTQ files were generated and demultiplexed with the bcl2fastq v2.17.1.14 Conversion Software (Illumina).
$ Bash example
```bash
# Install bcl2fastq via Illumina's installer or your site's module system, e.g.:
# module load bcl2fastq/2.17.1.14

# Replace the paths below with the actual run folder and output directory.
# Demultiplexing uses <run_directory>/SampleSheet.csv unless --sample-sheet is given.
bcl2fastq \
  --runfolder-dir /path/to/run_directory \
  --output-dir /path/to/output_directory \
  --no-lane-splitting \
  --barcode-mismatches 1 \
  --loading-threads 4 \
  --processing-threads 4 \
  --writing-threads 4
```
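With `--no-lane-splitting`, bcl2fastq names its outputs `<Sample>_S<N>_R1_001.fastq.gz` and `<Sample>_S<N>_R2_001.fastq.gz`. The Bash sketch below pairs these files per sample before quantification. The helper name `pair_fastqs` is hypothetical. It assumes paired-end reads and that all FASTQs sit directly in the output directory; bcl2fastq may instead write into project or sample subfolders, depending on the sample sheet. It also skips the `Undetermined` reads.

```bash
#!/usr/bin/env bash
# Sketch: list R1/R2 FASTQ pairs written by bcl2fastq with --no-lane-splitting.
# Assumes the default naming <Sample>_S<N>_R1_001.fastq.gz / _R2_001.fastq.gz
# and that all FASTQs sit directly in the given directory (no subfolders).
set -euo pipefail

# Hypothetical helper: prints "sample<TAB>R1<TAB>R2" for every complete pair.
pair_fastqs() {
  local dir="$1" r1 r2 sample
  shopt -s nullglob                      # empty loop instead of a literal glob if nothing matches
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    sample="$(basename "$r1")"
    sample="${sample%_S[0-9]*_R1_001.fastq.gz}"   # drop the _S<N>_R1_001.fastq.gz suffix
    if [[ "$sample" == "Undetermined" ]]; then
      continue                           # reads whose barcodes matched no sample
    fi
    if [[ -f "$r2" ]]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      printf 'warning: no R2 mate for %s\n' "$r1" >&2
    fi
  done
}

# Usage: bash pair_fastqs.sh /path/to/output_directory > samples.tsv
if [[ $# -gt 0 ]]; then pair_fastqs "$1"; fi
```

The resulting three-column table can drive a per-sample loop, so that read file names never have to be typed by hand.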
2
Sequenced reads were trimmed of adaptor sequences (the trimming tool is not specified), and transcript abundances were computed with kallisto v0.46.0.
$ Bash example
```bash
# Install kallisto (if not already installed)
# conda install -c bioconda kallisto=0.46.0

# Inputs: adapter-trimmed paired-end reads (placeholders). The trimming tool is not
# specified in the source, so substitute the output of your own trimming step.
READ1="sample_R1.trimmed.fastq.gz"
READ2="sample_R2.trimmed.fastq.gz"

# kallisto index built from the GENCODE mouse vM19 (GRCm38) transcriptome (see step 3)
KALLISTO_INDEX="gencode.vM19.grcm38.idx"

# Output directory for kallisto results
OUTPUT_DIR="kallisto_quant_output"
mkdir -p "${OUTPUT_DIR}"

# Compute transcript abundances (writes abundance.tsv, abundance.h5 and run_info.json).
# Paired-end reads are assumed; for single-end data add: --single -l <mean fragment length> -s <sd>
kallisto quant \
  -i "${KALLISTO_INDEX}" \
  -o "${OUTPUT_DIR}" \
  "${READ1}" "${READ2}"
```
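kallisto reports `est_counts`, `eff_length` and `tpm` for each transcript in `abundance.tsv`. The TPM column is the length-normalized count rate rescaled so that it sums to one million: TPM_i = 1e6 * (est_counts_i / eff_length_i) / sum_j (est_counts_j / eff_length_j). The Bash/awk sketch below recomputes TPM from the other columns as a sanity check. It is not a step from the original pipeline. The helper name `recompute_tpm` is hypothetical, and the sketch assumes kallisto's standard five-column layout.

```bash
#!/usr/bin/env bash
# Sketch: recompute TPM from a kallisto abundance.tsv to show how the tpm column
# relates to est_counts and eff_length:
#   TPM_i = 1e6 * (est_counts_i / eff_length_i) / sum_j (est_counts_j / eff_length_j)
# Assumes kallisto's column order: target_id length eff_length est_counts tpm
set -euo pipefail

# Hypothetical helper: prints target_id, kallisto's TPM and the recomputed TPM.
recompute_tpm() {
  # The file is read twice: pass 1 sums the count rates, pass 2 normalizes them.
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1 && $3 > 0) total += $4 / $3; next }
    FNR == 1  { print "target_id", "tpm_kallisto", "tpm_recomputed"; next }
    {
      rate = ($3 > 0) ? $4 / $3 : 0
      tpm = (total > 0) ? 1e6 * rate / total : 0
      printf "%s\t%s\t%.4f\n", $1, $5, tpm
    }
  ' "$1" "$1"
}

# Usage: bash recompute_tpm.sh kallisto_quant_output/abundance.tsv
if [[ $# -gt 0 ]]; then recompute_tpm "$1"; fi
```

The recomputed and reported values should agree up to rounding. A large mismatch usually means the file was edited or the columns are not in the expected order.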
3
The kallisto index was built from the GENCODE mouse annotation GRCm38_vM19 (GENCODE release M19).
$ Bash example
```bash
# Install kallisto; pin the version used for quantification, since index files
# are not guaranteed to be compatible across kallisto versions
# conda install -c bioconda kallisto=0.46.0

# Download the GENCODE vM19 (GRCm38) mouse transcriptome FASTA
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M19/gencode.vM19.transcripts.fa.gz

# Build the kallisto index (kallisto reads gzip-compressed FASTA directly)
kallisto index -i gencode.vM19.grcm38.idx gencode.vM19.transcripts.fa.gz
```
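GENCODE transcript FASTA headers are pipe-delimited (`transcript_id|gene_id|havana_gene|havana_transcript|transcript_name|gene_name|length|biotype|`). kallisto names each target by its header text up to the first whitespace, and GENCODE headers contain none, so the `target_id` values in `abundance.tsv` are typically the whole header. The Bash sketch below extracts a transcript-to-gene table from the same FASTA, of the kind tximport needs for gene-level summarization. The helper name `make_tx2gene` is hypothetical. The sketch also assumes the headers were not rewritten before indexing.

```bash
#!/usr/bin/env bash
# Sketch: build a transcript-to-gene table from GENCODE transcript FASTA headers
# (e.g. gencode.vM19.transcripts.fa.gz). GENCODE headers are pipe-delimited:
#   >transcript_id|gene_id|havana_gene|havana_transcript|transcript_name|gene_name|length|biotype|
# Column 1 keeps the whole header, assumed to equal the kallisto target_id when
# the index is built from the unmodified FASTA.
set -euo pipefail

# Hypothetical helper: prints target_id<TAB>gene_id<TAB>gene_name per transcript.
make_tx2gene() {
  # gzip -dcf decompresses .gz input and passes plain text through unchanged
  gzip -dcf "$1" | awk -F'|' -v OFS='\t' '
    /^>/ {
      sub(/^>/, "")          # strip the ">"; assigning to $0 re-splits the fields
      print $0, $2, $6
    }
  '
}

# Usage: bash make_tx2gene.sh gencode.vM19.transcripts.fa.gz > tx2gene.tsv
if [[ $# -gt 0 ]]; then make_tx2gene "$1"; fi
```

The third column (gene name) is not needed by tximport. It is kept only so that gene IDs in the DESeq2 output can later be labeled with readable symbols.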
4
Differential gene expression analysis was performed on the kallisto abundance tables using tximport and DESeq2.
$ Bash example
```bash
# Quantify every sample with kallisto against the GENCODE vM19 (GRCm38) index (see steps 2 and 3)
# Replace with actual sample names, read files, and index path
SAMPLES=("sample_1" "sample_2" "sample_3" "sample_4")   # placeholder sample names
INDEX="gencode.vM19.grcm38.idx"
OUTPUT_DIR="kallisto_quant_results"
mkdir -p "${OUTPUT_DIR}"

for SAMPLE in "${SAMPLES[@]}"; do
  READ1="${SAMPLE}_R1.fastq.gz"   # adjust if your files are named differently
  READ2="${SAMPLE}_R2.fastq.gz"
  kallisto quant -i "${INDEX}" -o "${OUTPUT_DIR}/${SAMPLE}" "${READ1}" "${READ2}"
done

# Write the R script for tximport and DESeq2
cat << 'EOF' > run_deseq2.R
# Install R packages (if not already installed)
# if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# BiocManager::install(c("tximport", "DESeq2"))

library(tximport)
library(DESeq2)

# kallisto output directories; must match the quantification loop above
kallisto_output_dir <- "kallisto_quant_results"
sample_names <- c("sample_1", "sample_2", "sample_3", "sample_4")
files <- file.path(kallisto_output_dir, sample_names, "abundance.tsv")
names(files) <- sample_names

# Sample table: placeholder design with 2 conditions x 2 replicates.
# Replace with the real condition labels for each sample.
sample_info <- data.frame(
  condition = factor(c("control", "control", "treated", "treated"),
                     levels = c("control", "treated")),
  replicate = factor(c("rep1", "rep2", "rep1", "rep2")),
  row.names = sample_names
)

# Transcript-to-gene map for gene-level DGE (assumption: gene-level summarization).
# With an index built from the unmodified GENCODE FASTA, each kallisto target_id is
# the full pipe-delimited header (transcript_id|gene_id|...), so field 2 is the gene ID.
tx_ids <- read.delim(files[1], stringsAsFactors = FALSE)$target_id
tx2gene <- data.frame(
  TXNAME = tx_ids,
  GENEID = vapply(strsplit(tx_ids, "|", fixed = TRUE), `[`, character(1), 2)
)

# Import transcript estimates and summarize them to gene level
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)

# Build the DESeq2 object and drop genes with very low counts across all samples
dds <- DESeqDataSetFromTximport(txi, colData = sample_info, design = ~ condition)
dds <- dds[rowSums(counts(dds)) >= 10, ]

# Fit the model and extract treated vs control results
dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "treated", "control"))
summary(res)

# Save results ordered by adjusted p-value
res_ordered <- res[order(res$padj), ]
write.csv(as.data.frame(res_ordered), file = "deseq2_results.csv")

# Optional: MA plot
# png("deseq2_MA_plot.png"); plotMA(res, main = "DESeq2 MA-plot"); dev.off()
EOF

# Run the differential gene expression analysis
Rscript run_deseq2.R
```
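`write.csv(as.data.frame(res), ...)` writes a quoted row-name column followed by `baseMean`, `log2FoldChange`, `lfcSE`, `stat`, `pvalue` and `padj`. DESeq2 sets `padj` to `NA` for rows removed by independent filtering or flagged as outliers. The Bash/awk sketch below pulls significant genes out of that CSV. The helper name `filter_deseq2` is hypothetical, and the default cut-offs (padj < 0.05, |log2FoldChange| >= 1) are illustrative assumptions, not thresholds reported for GSE147005.

```bash
#!/usr/bin/env bash
# Sketch: list significant genes from deseq2_results.csv written by
# write.csv(as.data.frame(res), ...). Default thresholds are assumptions.
# Columns: "",baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
set -euo pipefail

# Hypothetical helper: filter_deseq2 <csv> [padj_max] [abs_log2fc_min]
filter_deseq2() {
  local csv="$1" padj_max="${2:-0.05}" lfc_min="${3:-1}"
  awk -F',' -v OFS='\t' -v pmax="$padj_max" -v lmin="$lfc_min" '
    NR == 1 { print "gene", "log2FoldChange", "padj"; next }
    $7 == "NA" || $3 == "NA" { next }      # skip filtered or outlier rows
    {
      gsub(/"/, "", $1)                     # write.csv quotes row names
      lfc = $3 + 0
      abs_lfc = (lfc < 0) ? -lfc : lfc
      if ($7 + 0 < pmax + 0 && abs_lfc >= lmin + 0) print $1, $3, $7
    }
  ' "$csv"
}

# Usage: bash filter_deseq2.sh deseq2_results.csv 0.05 1 > significant_genes.tsv
if [[ $# -gt 0 ]]; then filter_deseq2 "$@"; fi
```

The comma split assumes gene IDs contain no commas. That holds for GENCODE gene IDs, but not necessarily for arbitrary row names.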
Raw Source Text
Fastq files were generated and demultiplexed with the bcl2fastq v2.17.1.14 Conversion Software (Illumina)
Sequenced reads were trimmed for adaptor sequence and transcript abundances were computed using kallisto v0.46.0
kallisto index was generated using Gencode annotation, GRCm38_vM19
Differential gene expression analysis was performed with the kallisto abundance tables using tximport and DESeq2
Genome_build: mm10
Supplementary_files_format_and_content: Table of TPM values of annotated transcripts and DESeq2 output of differential gene expression