GSE135012 Processing Pipeline

RNA-Seq, 2 steps

Publication

An in vivo genome-wide CRISPR screen identifies the RNA-binding protein Staufen2 as a key regulator of myeloid leukemia.

Nature cancer (2020) — PMID 34109316

Dataset

GSE135012

Stau2 knockdown in human bcCML cells (K562)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1. Kallisto – transcript quantification

    kallisto v0.46.1

    Bash example:
    # Install kallisto using conda
    # conda create -n kallisto_env -c bioconda kallisto=0.46.1
    # conda activate kallisto_env
    
    # Example: Build kallisto index (if not already built)
    # Replace 'transcripts.fasta.gz' with your reference transcriptome FASTA file (e.g., from Ensembl or GENCODE)
    # kallisto index -i human_GRCh38_transcriptome.idx transcripts.fasta.gz
    
    # Perform transcript quantification
    # -i: Path to the kallisto index file (e.g., human_GRCh38_transcriptome.idx)
    # -o: Output directory for quantification results
    # -b: Number of bootstrap samples (e.g., 100 for robust estimates)
    # -t: Number of threads to use
    # Input FASTQ files (can be gzipped, space-separated for paired-end reads)
    kallisto quant -i human_GRCh38_transcriptome.idx -o kallisto_quant_output -b 100 -t 8 sample_R1.fastq.gz sample_R2.fastq.gz
  2. Sleuth – gene differential expression analysis/gene normalized abundance measurements

    Sleuth (version not specified)

    Bash example:
    # --- Installation (commented out) ---
    # # Install R if not already installed
    # # sudo apt-get update && sudo apt-get install r-base
    #
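    # # sleuth is not distributed through Bioconductor; install it from GitHub
    # # (or from Bioconda as r-sleuth). It depends on the Bioconductor package rhdf5.
    # # R -e 'install.packages(c("BiocManager", "devtools"), repos = "https://cloud.r-project.org")'
    # # R -e 'BiocManager::install("rhdf5")'
    # # R -e 'devtools::install_github("pachterlab/sleuth")'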
    
    # --- Prepare input data (example placeholders) ---
    # Assumes kallisto quant has already been run once per sample with bootstraps (-b),
    # producing one output directory per sample. Each directory must contain the
    # abundance.h5 written by kallisto; empty placeholder files make sleuth_prep fail.
    for d in kallisto_output/sample1_condA kallisto_output/sample2_condA \
             kallisto_output/sample3_condB kallisto_output/sample4_condB; do
        [ -s "$d/abundance.h5" ] || { echo "Missing kallisto output: $d/abundance.h5" >&2; exit 1; }
    done
    
    # Create a sample information file (s2c table).
    # printf is used because a heredoc would write \t literally instead of a tab character.
    printf 'sample\tcondition\tpath\n' > sample_info.tsv
    printf '%s\t%s\t%s\n' \
        sample1_condA condA kallisto_output/sample1_condA \
        sample2_condA condA kallisto_output/sample2_condA \
        sample3_condB condB kallisto_output/sample3_condB \
        sample4_condB condB kallisto_output/sample4_condB >> sample_info.tsv
    
    # --- Sleuth R script ---
    # This script performs differential expression analysis using Sleuth.
    # It reads Kallisto output and a sample information table.
    cat << 'EOF_R_SCRIPT' > run_sleuth.R
    library(sleuth)
    
    # Read sample information table
    s2c <- read.table("sample_info.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
    
    # Ensure paths are absolute or relative to the current working directory
    # For this example, paths are relative to where the script is run.
    # s2c$path <- file.path(getwd(), s2c$path) # Uncomment if paths in s2c are relative to a different base
    
    # Create a sleuth object
    # 'extra_bootstrap_summary = TRUE' and 'read_bootstrap_tpm = TRUE' keep bootstrap summaries
    # of est_counts and TPM, which some of sleuth's plotting functions rely on
    so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)
    
    # Define models for differential expression
    # Full model: accounts for the 'condition' variable
    so <- sleuth_fit(so, ~condition, 'full')
    # Reduced model: null model (no condition effect)
    so <- sleuth_fit(so, ~1, 'reduced')
    
    # Perform likelihood ratio test (LRT) to compare models
    so <- sleuth_lrt(so, 'reduced', 'full')
    
    # Extract results for the LRT
    results_table <- sleuth_results(so, 'reduced:full', test_type = 'lrt', show_all = FALSE)
    
    # Filter and save significant results (e.g., q-value <= 0.05)
    significant_results <- subset(results_table, qval <= 0.05)
    write.table(significant_results, "sleuth_differential_expression_results.tsv", sep = "\t", quote = FALSE, row.names = FALSE)
    
    # Optional: Save normalized abundance measurements (e.g., aggregated gene-level TPMs)
    # This requires a transcript-to-gene mapping, which is not provided in this generic example.
    # Assuming a data frame t2g with columns 'target_id' and 'ext_gene' (hypothetical here),
    # and a sleuth version that supports gene_mode (0.30.0 or later):
    # so <- sleuth_prep(s2c, target_mapping = t2g, aggregation_column = 'ext_gene', gene_mode = TRUE,
    #                   extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)
    # gene_tpm_matrix <- sleuth_to_matrix(so, 'obs_norm', 'tpm')
    # write.table(gene_tpm_matrix, "sleuth_gene_normalized_abundance_tpm.tsv", sep = "\t", quote = FALSE, row.names = TRUE)
    
    message("Sleuth analysis complete. Results saved to sleuth_differential_expression_results.tsv")
    EOF_R_SCRIPT
    
    # --- Execute Sleuth analysis ---
    Rscript run_sleuth.R
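
The sample table in step 2 is written out by hand for four samples. As a minimal sketch, it could instead be generated from the kallisto output directories themselves. This assumes each directory is named <sample>_<condition>, as in the example; build_sample_table is a hypothetical helper name, not part of kallisto or sleuth.

```shell
# Hypothetical helper: write a sleuth sample table (sample, condition, path)
# listing every directory under <root> that holds a non-empty abundance.h5.
# Assumption: directory names end in _<condition>, e.g. sample1_condA.
build_sample_table() {
    root="$1"
    out="$2"
    printf 'sample\tcondition\tpath\n' > "$out"
    for dir in "$root"/*/; do
        dir="${dir%/}"                            # drop the trailing slash from the glob
        [ -s "$dir/abundance.h5" ] || continue    # skip missing or empty kallisto results
        name="$(basename "$dir")"
        # ${name##*_} keeps the text after the last underscore as the condition
        printf '%s\t%s\t%s\n' "$name" "${name##*_}" "$dir" >> "$out"
    done
}
```

Calling `build_sample_table kallisto_output sample_info.tsv` would then produce the table read by run_sleuth.R. Directories without a usable abundance.h5 are skipped silently, so it is worth checking the row count against the expected number of samples.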
Raw Source Text
Kallisto – transcript quantification
Sleuth – gene differential expression analysis/gene normalized abundance measurements
Genome_build: hg38
Supplementary_files_format_and_content: .xls, TPM values and sleuth output
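
The supplementary files are described as TPM values plus sleuth output. As a rough sketch of how per-sample TPMs could be gathered from kallisto's plain-text results, the hypothetical helper below stacks the tpm column of each abundance.tsv (whose columns are target_id, length, eff_length, est_counts, tpm) into one long table. These are raw kallisto TPMs, not sleuth-normalized values, and the one-directory-per-sample layout is an assumption.

```shell
# Hypothetical helper: print sample, target_id, tpm for every
# <root>/<sample>/abundance.tsv found under the given root directory.
collect_tpm() {
    root="$1"
    printf 'sample\ttarget_id\ttpm\n'
    for dir in "$root"/*/; do
        dir="${dir%/}"
        [ -f "$dir/abundance.tsv" ] || continue
        # Column 1 is target_id and column 5 is tpm; NR > 1 skips the header row.
        awk -v s="$(basename "$dir")" 'BEGIN { FS = OFS = "\t" } NR > 1 { print s, $1, $5 }' \
            "$dir/abundance.tsv"
    done
}
```

For example, `collect_tpm kallisto_output > kallisto_tpm_long.tsv` writes one row per sample and transcript.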