GSE135012 Processing Pipeline
RNA-Seq
2 steps
Publication
An in vivo genome-wide CRISPR screen identifies the RNA-binding protein Staufen2 as a key regulator of myeloid leukemia. Nature Cancer (2020), PMID 34109316
Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
Processing Steps
1
Kallisto - transcript quantification
$ Bash example
# Install kallisto using conda
# conda create -n kallisto_env -c bioconda kallisto=0.46.1
# conda activate kallisto_env

# Example: Build kallisto index (if not already built)
# Replace 'transcripts.fasta.gz' with your reference transcriptome FASTA file (e.g., from Ensembl or GENCODE)
# kallisto index -i human_GRCh38_transcriptome.idx transcripts.fasta.gz

# Perform transcript quantification
# -i: Path to the kallisto index file (e.g., human_GRCh38_transcriptome.idx)
# -o: Output directory for quantification results
# -b: Number of bootstrap samples (e.g., 100 for robust estimates)
# -t: Number of threads to use
# Input FASTQ files (can be gzipped, space-separated for paired-end reads)
kallisto quant -i human_GRCh38_transcriptome.idx -o kallisto_quant_output -b 100 -t 8 sample_R1.fastq.gz sample_R2.fastq.gz
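Once several samples have been quantified, their TPM values can be gathered into a single table, similar in spirit to the TPM values listed among the supplementary files. The following is a minimal sketch and not the authors' method: the function name merge_tpm and the one-directory-per-sample layout are assumptions made for illustration. It relies only on the standard column order of kallisto's abundance.tsv (target_id, length, eff_length, est_counts, tpm).

```shell
# Sketch: merge the tpm column of several kallisto abundance.tsv files into
# one tab-separated matrix (rows = transcripts, columns = samples).
# merge_tpm and the directory layout are hypothetical, for illustration only.

merge_tpm() {
    # Usage: merge_tpm <sample_dir> [<sample_dir> ...]
    # Each <sample_dir> must contain an abundance.tsv written by kallisto quant.
    if [ "$#" -eq 0 ]; then
        echo "usage: merge_tpm <sample_dir> [<sample_dir> ...]" >&2
        return 1
    fi
    # Turn every directory argument into the path of its abundance.tsv.
    for d in "$@"; do
        set -- "$@" "$d/abundance.tsv"
        shift
    done
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                       # header line of each input file
            n++                          # n = index of the current sample
            m = split(FILENAME, p, "/")
            name[n] = p[m - 1]           # sample name = parent directory name
            next
        }
        n == 1 { order[++k] = $1 }       # keep transcript order of the first file
        { tpm[$1, n] = $5 }              # column 5 of abundance.tsv is tpm
        END {
            line = "target_id"
            for (i = 1; i <= n; i++) line = line OFS name[i]
            print line
            for (j = 1; j <= k; j++) {
                line = order[j]
                for (i = 1; i <= n; i++) line = line OFS tpm[order[j], i]
                print line
            }
        }
    ' "$@"
}
```

A call such as merge_tpm kallisto_output/sample1_condA kallisto_output/sample2_condA > tpm_matrix.tsv would write the matrix to a file. The sketch assumes all samples were quantified against the same index, so every file lists the same transcripts.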
2
Sleuth - gene differential expression analysis/gene normalized abundance measurements
$ Bash example
# --- Installation (commented out) ---
# # Install R if not already installed
# # sudo apt-get update && sudo apt-get install r-base
#
# # sleuth is distributed via GitHub (pachterlab/sleuth) and Bioconda (r-sleuth), not Bioconductor;
# # its rhdf5 dependency is installed from Bioconductor
# # R -e 'install.packages(c("BiocManager", "devtools"), repos = "https://cloud.r-project.org")'
# # R -e 'BiocManager::install("rhdf5")'
# # R -e 'devtools::install_github("pachterlab/sleuth")'

# --- Prepare input data (example placeholders) ---
# Assumes kallisto quantification (run with -b so that bootstraps are available)
# has already been performed for every sample, so that each directory below
# contains a real abundance.h5. Empty placeholder files will not work:
# sleuth_prep cannot read them.
#   kallisto_output/sample1_condA
#   kallisto_output/sample2_condA
#   kallisto_output/sample3_condB
#   kallisto_output/sample4_condB

# Create a sample information file (s2c table).
# printf is used because a plain heredoc would write "\t" literally instead of a tab.
printf 'sample\tcondition\tpath\n' > sample_info.tsv
printf '%s\t%s\t%s\n' \
  sample1_condA condA kallisto_output/sample1_condA \
  sample2_condA condA kallisto_output/sample2_condA \
  sample3_condB condB kallisto_output/sample3_condB \
  sample4_condB condB kallisto_output/sample4_condB >> sample_info.tsv

# --- Sleuth R script ---
# This script performs differential expression analysis using Sleuth.
# It reads Kallisto output and a sample information table.
cat << 'EOF_R_SCRIPT' > run_sleuth.R
library(sleuth)

# Read sample information table
s2c <- read.table("sample_info.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Paths in s2c are relative to the directory the script is run from.
# s2c$path <- file.path(getwd(), s2c$path)  # Uncomment to convert them to absolute paths

# Create a sleuth object
# extra_bootstrap_summary and read_bootstrap_tpm add the bootstrap summaries used by some sleuth plots
so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)

# Define models for differential expression
# Full model: accounts for the 'condition' variable
so <- sleuth_fit(so, ~condition, 'full')
# Reduced model: null model (no condition effect)
so <- sleuth_fit(so, ~1, 'reduced')

# Perform likelihood ratio test (LRT) to compare models
so <- sleuth_lrt(so, 'reduced', 'full')

# Extract results for the LRT
results_table <- sleuth_results(so, 'reduced:full', test_type = 'lrt', show_all = FALSE)

# Filter and save significant results (e.g., q-value <= 0.05)
significant_results <- subset(results_table, qval <= 0.05)
write.table(significant_results, "sleuth_differential_expression_results.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

# Optional: save gene-level normalized abundances (TPM).
# This requires a transcript-to-gene table, which is not provided in this generic example:
# a data frame (here called t2g) with a target_id column matching the kallisto transcript IDs
# and a gene column such as ext_gene. Check the argument names against your installed sleuth version.
# so_gene <- sleuth_prep(s2c, target_mapping = t2g, aggregation_column = "ext_gene", gene_mode = TRUE,
#                        extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)
# gene_tpm_matrix <- sleuth_to_matrix(so_gene, "obs_norm", "tpm")
# write.table(gene_tpm_matrix, "sleuth_gene_normalized_abundance_tpm.tsv", sep = "\t", quote = FALSE, row.names = TRUE)

message("Sleuth analysis complete. Results saved to sleuth_differential_expression_results.tsv")
EOF_R_SCRIPT

# --- Execute Sleuth analysis ---
Rscript run_sleuth.R
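Because sleuth reads an abundance.h5 file from every path listed in the sample table, it can help to verify those files before launching R. The helper below is a hedged sketch under stated assumptions: check_kallisto_dirs is a hypothetical name, and the table is assumed to be tab-separated with a header row and the columns sample, condition, path, as in sample_info.tsv.

```shell
# Sketch: check that every path in a sleuth sample table holds a non-empty
# abundance.h5 before running the R script.
# check_kallisto_dirs is a hypothetical helper, not part of kallisto or sleuth.

check_kallisto_dirs() {
    # Usage: check_kallisto_dirs <sample_table.tsv>
    # Reports each problem on stderr; returns non-zero if any file is missing or empty.
    table=$1
    missing=0
    # Read the three tab-separated columns, skipping the header line.
    while IFS="$(printf '\t')" read -r sample condition path; do
        [ -n "$sample" ] || continue      # ignore blank lines
        if [ ! -s "$path/abundance.h5" ]; then
            echo "missing or empty: $path/abundance.h5 (sample $sample)" >&2
            missing=$((missing + 1))
        fi
    done <<EOF
$(tail -n +2 "$table")
EOF
    [ "$missing" -eq 0 ]
}
```

For example, check_kallisto_dirs sample_info.tsv && Rscript run_sleuth.R would start the analysis only when every sample directory looks complete. The check only tests that each file exists and is non-empty; it does not validate the HDF5 contents.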
Raw Source Text
Kallisto - transcript quantification
Sleuth - gene differential expression analysis/gene normalized abundance measurements
Genome_build: hg38
Supplementary_files_format_and_content: .xls, TPM values and sleuth output