GSE135012 Processing Pipeline — Yeo Lab Publications

Publication

An in vivo genome-wide CRISPR screen identifies the RNA-binding protein Staufen2 as a key regulator of myeloid leukemia.

Nature Cancer (2020), PMID 34109316

Dataset

GSE135012

Stau2 knockdown in human bcCML cells (K562)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

1

Kallisto: transcript quantification

kallisto v0.46.1

$ Bash example

# Install kallisto using conda (bioconda packages also rely on the conda-forge channel)
# conda create -n kallisto_env -c conda-forge -c bioconda kallisto=0.46.1
# conda activate kallisto_env

# Example: Build kallisto index (if not already built)
# Replace 'transcripts.fasta.gz' with your reference transcriptome FASTA file (e.g., from Ensembl or GENCODE)
# kallisto index -i human_GRCh38_transcriptome.idx transcripts.fasta.gz

# Perform transcript quantification
# -i: Path to the kallisto index file (e.g., human_GRCh38_transcriptome.idx)
# -o: Output directory for quantification results
# -b: Number of bootstrap samples (e.g., 100 for robust estimates)
# -t: Number of threads to use
# Input FASTQ files (can be gzipped, space-separated for paired-end reads)
kallisto quant -i human_GRCh38_transcriptome.idx -o kallisto_quant_output -b 100 -t 8 sample_R1.fastq.gz sample_R2.fastq.gz
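
The single-sample command above can be extended to several samples, which the Sleuth step needs. The following is a minimal sketch, not the authors' method: the helper name quant_all, the DRY_RUN switch, and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming scheme are all assumptions made for illustration.

```shell
# Hypothetical helper: print (or run) one kallisto quant command per paired-end sample.
# Assumes FASTQ files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz in one directory.
quant_all() {
  fastq_dir="$1"; index="$2"; out_root="$3"
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # unmatched glob stays literal; skip it
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # derive the mate file name
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      continue
    fi
    sample="$(basename "$r1" _R1.fastq.gz)"
    # DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 executes it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo kallisto quant -i "$index" -o "$out_root/$sample" -b 100 -t 8 "$r1" "$r2"
    else
      kallisto quant -i "$index" -o "$out_root/$sample" -b 100 -t 8 "$r1" "$r2"
    fi
  done
}

# Example (prints the commands without running them):
# quant_all fastq human_GRCh38_transcriptome.idx kallisto_output
```

Printing the commands first makes it easy to confirm that sample names and output directories line up with the paths later listed in the Sleuth sample table.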

2

Sleuth: gene differential expression analysis / gene normalized abundance measurements

Sleuth (version not specified)

$ Bash example

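# --- Installation (commented out) ---
# Install R if not already installed
# sudo apt-get update && sudo apt-get install r-base
#
# sleuth is not distributed through Bioconductor: install its rhdf5 dependency
# from Bioconductor and sleuth itself from GitHub (the bioconda package r-sleuth is an alternative)
# R -e 'install.packages(c("BiocManager", "devtools"), repos = "https://cloud.r-project.org")'
# R -e 'BiocManager::install("rhdf5")'
# R -e 'devtools::install_github("pachterlab/sleuth")'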

# --- Prepare input data (example placeholders) ---
# Assumes kallisto quant has already been run with bootstraps (-b) for every sample,
# with one output directory per sample, for example:
#   kallisto_output/sample1_condA/abundance.h5
#   kallisto_output/sample2_condA/abundance.h5
#   kallisto_output/sample3_condB/abundance.h5
#   kallisto_output/sample4_condB/abundance.h5
# Sleuth needs the real abundance.h5 files written by kallisto. Empty placeholder
# files created with touch are not valid HDF5 and will make sleuth_prep fail.

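# Create a sample information file (s2c table) with real tab separators.
# printf expands \t and \n; a plain heredoc would write the backslash sequences literally.
{
  printf 'sample\tcondition\tpath\n'
  printf '%s\t%s\t%s\n' sample1_condA condA kallisto_output/sample1_condA
  printf '%s\t%s\t%s\n' sample2_condA condA kallisto_output/sample2_condA
  printf '%s\t%s\t%s\n' sample3_condB condB kallisto_output/sample3_condB
  printf '%s\t%s\t%s\n' sample4_condB condB kallisto_output/sample4_condB
} > sample_info.tsv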

# --- Sleuth R script ---
# This script performs differential expression analysis using Sleuth.
# It reads Kallisto output and a sample information table.
cat << 'EOF_R_SCRIPT' > run_sleuth.R
library(sleuth)

# Read sample information table
s2c <- read.table("sample_info.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Ensure paths are absolute or relative to the current working directory
# For this example, paths are relative to where the script is run.
# s2c$path <- file.path(getwd(), s2c$path) # Uncomment if paths in s2c are relative to a different base

# Create a sleuth object
# 'extra_bootstrap_summary = TRUE' and 'read_bootstrap_tpm = TRUE' keep the bootstrap
# summaries (counts and TPM) used by sleuth plotting functions such as plot_bootstrap()
so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)

# Define models for differential expression
# Full model: accounts for the 'condition' variable
so <- sleuth_fit(so, ~condition, 'full')
# Reduced model: null model (no condition effect)
so <- sleuth_fit(so, ~1, 'reduced')

# Perform likelihood ratio test (LRT) to compare models
so <- sleuth_lrt(so, 'reduced', 'full')

# Extract results for the LRT
results_table <- sleuth_results(so, 'reduced:full', test_type = 'lrt', show_all = FALSE)

# Filter and save significant results (e.g., q-value <= 0.05)
significant_results <- subset(results_table, qval <= 0.05)
write.table(significant_results, "sleuth_differential_expression_results.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

# Optional: save normalized abundance measurements (e.g., gene-level TPMs)
# This requires a transcript-to-gene mapping, which is not provided in this generic example.
# Assuming a data frame t2g with columns target_id and ens_gene (hypothetical here):
# so_gene <- sleuth_prep(s2c, target_mapping = t2g, aggregation_column = "ens_gene",
#                        gene_mode = TRUE, extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)
# gene_tpm_matrix <- sleuth_to_matrix(so_gene, "obs_norm", "tpm")
# write.table(gene_tpm_matrix, "sleuth_gene_normalized_abundance_tpm.tsv", sep = "\t", quote = FALSE, row.names = TRUE)

message("Sleuth analysis complete. Results saved to sleuth_differential_expression_results.tsv")
EOF_R_SCRIPT

# --- Execute Sleuth analysis ---
Rscript run_sleuth.R
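
Because sleuth_prep stops on the first unreadable sample, it can help to validate the sample table before running the Rscript step above. The following is a minimal sketch under stated assumptions: check_s2c is a hypothetical helper name, and the table is assumed to be tab-separated with a header row and the columns sample, condition, and path in that order.

```shell
# Hypothetical pre-flight check: every 'path' in the sample table must contain
# a non-empty abundance.h5. Returns 0 if all are present, 1 otherwise.
check_s2c() {
  table="$1"
  status=0
  tab="$(printf '\t')"
  {
    read -r _header                          # skip the header row
    while IFS="$tab" read -r sample condition path; do
      [ -n "$sample" ] || continue           # ignore blank lines
      if [ ! -s "$path/abundance.h5" ]; then
        echo "missing or empty: $path/abundance.h5 (sample $sample)" >&2
        status=1
      fi
    done
  } < "$table"
  return "$status"
}

# Example: run the analysis only if every sample passes the check.
# check_s2c sample_info.tsv && Rscript run_sleuth.R
```

The -s test only confirms the file exists and is non-empty; it does not verify that the file is valid HDF5 or that bootstraps were generated.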
