GSE100943 Processing Pipeline
Publication
Elimination of Toxic Microsatellite Repeat Expansion RNA by RNA-Targeting Cas9. Cell (2017), PMID 28803727
Dataset
GSE100943: Microsatellite expansion RNA visualization, elimination, and reversal of molecular pathology by RNA-targeting Cas9
Processing Steps
1
RNA-seq data were aligned to the human hg19 genome build using the OLego aligner, and alternative splicing was estimated as described below.
$ Bash example
olego -h
2
Quantas software, as described in Charizanis et al. (2012, Neuron), was used to estimate alternative splicing.
$ Bash example
# Quantas is distributed as a collection of scripts rather than a single binary.
# NOTE: `quantas_estimate_as` and all of its flags below are hypothetical placeholders,
# not a documented Quantas interface. Consult the Quantas documentation (step 7)
# for the actual scripts and options before running anything.

# Define input and output files
INPUT_BAM="aligned_reads.bam"          # Placeholder for the OLego alignment file
GENOME_ANNOTATION="hg19.gtf"           # Placeholder annotation; must match the hg19 build
OUTPUT_DIR="quantas_results"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Illustrative command only (hypothetical wrapper name and flags)
quantas_estimate_as \
  --input_bam "${INPUT_BAM}" \
  --genome_annotation "${GENOME_ANNOTATION}" \
  --output_file "${OUTPUT_DIR}/alternative_splicing_events.tsv" \
  --log_file "${OUTPUT_DIR}/quantas.log"
3
OLego alignment files were used to count observed junction reads for each exon.
$ Bash example
# NOTE: DEXSeq is shown here as a stand-in for exon-level counting. The source attributes
# this step to Quantas, and DEXSeq counts reads per exonic part rather than junction reads only.
# Install DEXSeq and its helpers if needed, e.g.:
# conda install -c conda-forge -c bioconda bioconductor-dexseq htseq samtools
# The Python scripts (dexseq_prepare_annotation.py, dexseq_count.py) ship in the
# python_scripts directory of the DEXSeq R package.

# Define variables
BAM_FILE="sample.bam"                     # Replace with the actual OLego-derived BAM file
GTF_FILE="gencode.v19.annotation.gtf"     # Placeholder: GENCODE v19 matches the hg19 build in the source
DEXSEQ_GFF="gencode.v19.dexseq.gff"
OUTPUT_FILE="sample_dexseq_exon_counts.txt"

# Download the GTF if not available
# wget -P . https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Step 1: Prepare the annotation file for DEXSeq.
# This converts a standard GTF to a DEXSeq-compatible GFF that defines
# "exonic parts" with unique IDs for exon-level counting.
dexseq_prepare_annotation.py "${GTF_FILE}" "${DEXSEQ_GFF}"

# Step 2: Sort the BAM file by read name (expected for paired-end counting)
samtools sort -n "${BAM_FILE}" -o "${BAM_FILE%.bam}.nsorted.bam"

# Step 3: Count reads per exonic part
# -p yes : reads are paired-end (use 'no' for single-end)
# -s no  : strandedness (yes, no, or reverse); adjust to the library prep
# -f bam : input format is BAM
dexseq_count.py -p yes -s no -f bam "${DEXSEQ_GFF}" "${BAM_FILE%.bam}.nsorted.bam" "${OUTPUT_FILE}"
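To make the idea of "junction reads" concrete, the following is a minimal sketch that tallies splice junctions directly from SAM records by walking each CIGAR string. It is not the Quantas implementation; the function name `count_junctions` and the output layout are assumptions made for illustration, and it counts reads per intron rather than assigning them to annotated exons.

```shell
# Hypothetical helper: read SAM on stdin, print "chrom:intron_start-intron_end<TAB>read_count".
count_junctions() {
  awk -F'\t' '
    /^@/ { next }                       # skip SAM header lines
    $6 ~ /N/ {                          # only spliced alignments contain an N operation
      pos = $4 + 0                      # 1-based leftmost reference position
      cigar = $6
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op  = substr(cigar, RLENGTH, 1)
        # An N operation spans the intron from pos to pos + len - 1
        if (op == "N") junc[$3 ":" pos "-" (pos + len - 1)]++
        # M, D, N, = and X consume reference bases; I, S, H, P do not
        if (op ~ /[MDN=X]/) pos += len
        cigar = substr(cigar, RLENGTH + 1)
      }
    }
    END { for (j in junc) print j "\t" junc[j] }
  ' | LC_ALL=C sort
}
```

With samtools available, a call such as `samtools view sample.bam | count_junctions` would list every observed intron with its supporting read count; the BAM name is a placeholder.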
4
The weighted number of exon or exon-junction fragments uniquely supporting the inclusion or skipping isoform of each cassette exon was counted, and a probability score was assigned to each isoform.
$ Bash example
# NOTE: `skipper` and its flags below are hypothetical placeholders for a splicing
# quantification command; they are not a confirmed interface of any published tool.
# In the source, this step is performed by Quantas, which counts exon and
# exon-junction fragments uniquely supporting the inclusion or skipping isoform
# and assigns a probability score to each isoform.

# Placeholder for the input BAM file (in the source, reads were aligned with OLego)
INPUT_BAM="aligned_reads.bam"

# Placeholder for the GTF annotation file; it must match the hg19 build.
# GENCODE v19 is one hg19-compatible option:
# https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
GTF_FILE="gencode.v19.annotation.gtf"

# Output file for splicing quantification results
OUTPUT_FILE="splicing_quantification.tsv"

# Illustrative command only (hypothetical tool name and flags)
skipper --bam_file "${INPUT_BAM}" --gtf_file "${GTF_FILE}" --output_file "${OUTPUT_FILE}"
5
A Fisher's exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini multiple testing correction to estimate the false discovery rate (FDR).
$ Bash example
# NOTE: rMATS is shown here as a stand-in. The source describes a Fisher's exact test
# with Benjamini correction, whereas rMATS evaluates differential splicing with a
# likelihood-ratio test under its own statistical model and reports p-values and FDRs.

# Install rMATS-turbo (e.g., via conda)
# conda create -n rmats_env -c conda-forge -c bioconda rmats
# conda activate rmats_env

# rMATS expects each --b1/--b2 file to hold one comma-separated line of BAM paths.
# Replace the placeholder paths with the actual control and treatment replicates.
echo "/path/to/control_rep1.bam,/path/to/control_rep2.bam" > control_bams.txt
echo "/path/to/treatment_rep1.bam,/path/to/treatment_rep2.bam" > treatment_bams.txt

# Annotation placeholder; choose a GTF that matches the hg19 (GRCh37) alignments
GENOME_GTF="/path/to/Homo_sapiens.GRCh37.75.gtf"

# Output and temporary directories
OUTPUT_DIR="rmats_output"
TMP_DIR="rmats_tmp"
mkdir -p "$OUTPUT_DIR" "$TMP_DIR"

# -t, --readLength, and --libType are placeholders; set them to match the library.
rmats.py \
  --b1 control_bams.txt \
  --b2 treatment_bams.txt \
  --gtf "$GENOME_GTF" \
  --od "$OUTPUT_DIR" \
  --tmp "$TMP_DIR" \
  -t paired \
  --readLength 100 \
  --nthread 8 \
  --libType fr-firststrand
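The test named in this step can also be sketched directly. The block below is a minimal illustration, not the Quantas code: it runs a two-sided Fisher's exact test on a 2x2 table of inclusion and skipping fragment counts in two conditions, then applies a Benjamini-Hochberg adjustment. Treating "Benjamini" as Benjamini-Hochberg, the function name `fisher_bh`, and the five-column input layout are all assumptions.

```shell
# Hypothetical helper. stdin (tab-separated): id, inc_cond1, skip_cond1, inc_cond2, skip_cond2
# stdout: id, two-sided Fisher p-value, BH-adjusted p-value (in input order)
fisher_bh() {
  awk -F'\t' '
    function lf(n,   i) {                      # log(n!), cached incrementally
      for (i = maxn + 1; i <= n; i++) lfc[i] = lfc[i - 1] + log(i)
      if (n > maxn) maxn = n
      return lfc[n] + 0
    }
    function hyp(x, r1, r2, c1,   n, num, den) {   # hypergeometric P(top-left cell = x)
      n = r1 + r2
      num = lf(r1) + lf(r2) + lf(c1) + lf(n - c1) - lf(n)
      den = lf(x) + lf(r1 - x) + lf(c1 - x) + lf(r2 - c1 + x)
      return exp(num - den)
    }
    function fisher(a, b, c, d,   r1, r2, c1, lo, hi, x, pobs, p, q) {
      r1 = a + b; r2 = c + d; c1 = a + c
      lo = (c1 - r2 > 0) ? c1 - r2 : 0
      hi = (r1 < c1) ? r1 : c1
      pobs = hyp(a, r1, r2, c1); p = 0
      for (x = lo; x <= hi; x++) {             # sum tables no more likely than the observed one
        q = hyp(x, r1, r2, c1)
        if (q <= pobs * (1 + 1e-7)) p += q
      }
      return (p > 1) ? 1 : p
    }
    { id[NR] = $1; pv[NR] = fisher($2 + 0, $3 + 0, $4 + 0, $5 + 0); ord[NR] = NR }
    END {
      for (i = 2; i <= NR; i++) {              # insertion sort of row indices by p-value
        k = ord[i]
        for (j = i - 1; j >= 1 && pv[ord[j]] > pv[k]; j--) ord[j + 1] = ord[j]
        ord[j + 1] = k
      }
      run = 1                                  # BH step-up with running minimum, capped at 1
      for (i = NR; i >= 1; i--) {
        v = pv[ord[i]] * NR / i
        if (v < run) run = v
        adj[ord[i]] = run
      }
      for (i = 1; i <= NR; i++) printf "%s\t%.4g\t%.4g\n", id[i], pv[i], adj[i]
    }'
}
```

Counts are pooled per condition in this sketch; how replicates are combined in the actual analysis is not stated in the source.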
6
In addition, inclusion or exclusion junction reads were used to calculate the proportional change of exon inclusion (dI).
$ Bash example
# NOTE: rMATS is shown here as a stand-in; in the source, dI is computed by Quantas.

# Install rMATS-turbo (example using conda)
# conda create -n rmats_env -c conda-forge -c bioconda rmats
# conda activate rmats_env

# This assumes aligned BAM files for two conditions (e.g., control and treatment)
# and a genome annotation GTF file. All paths below are placeholders.

# rMATS reads each BAM list from a text file holding one comma-separated line
echo "path/to/control_rep1.bam,path/to/control_rep2.bam" > condition1_bams.txt
echo "path/to/treatment_rep1.bam,path/to/treatment_rep2.bam" > condition2_bams.txt

# Annotation placeholder; choose a GTF that matches the hg19 (GRCh37) alignments
GENOME_GTF="path/to/Homo_sapiens.GRCh37.75.gtf"

# Output and temporary directories
OUTPUT_DIR="rmats_output_dI_calculation"
TMP_DIR="rmats_tmp"
mkdir -p "${OUTPUT_DIR}" "${TMP_DIR}"

# Placeholders: set these to match the sequencing run and library prep
READ_LENGTH=50
NUM_THREADS=8
LIBRARY_TYPE="fr-firststrand"   # options: fr-unstranded, fr-firststrand, fr-secondstrand

# Run rMATS. For skipped exons, 'SE.MATS.JC.txt' (junction counts only) and
# 'SE.MATS.JCEC.txt' (junction plus exon counts) report inclusion levels and
# their difference between conditions in the IncLevelDifference column.
rmats.py \
  --b1 condition1_bams.txt \
  --b2 condition2_bams.txt \
  --gtf "${GENOME_GTF}" \
  --od "${OUTPUT_DIR}" \
  --tmp "${TMP_DIR}" \
  -t paired \
  --readLength "${READ_LENGTH}" \
  --nthread "${NUM_THREADS}" \
  --libType "${LIBRARY_TYPE}"
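As a worked illustration of dI, the sketch below assumes the simplest definition: the inclusion fraction (inclusion junction reads divided by inclusion plus skipping junction reads) in the second condition minus that in the first. Quantas may weight or normalize junction reads differently, so both the formula and the function name `delta_inclusion` should be read as assumptions.

```shell
# Hypothetical helper. stdin (tab-separated): exon_id, inc_cond1, skip_cond1, inc_cond2, skip_cond2
# stdout: exon_id, dI = inc2/(inc2+skip2) - inc1/(inc1+skip1), or NA without coverage
delta_inclusion() {
  awk -F'\t' -v OFS='\t' '
    {
      n1 = $2 + $3                      # junction reads in condition 1
      n2 = $4 + $5                      # junction reads in condition 2
      if (n1 == 0 || n2 == 0) { print $1, "NA"; next }
      printf "%s\t%.3f\n", $1, $4 / n2 - $2 / n1
    }'
}

# Example: an exon with 80 inclusion / 20 skipping reads in condition 1
# and 20 inclusion / 80 skipping reads in condition 2
printf 'exonA\t80\t20\t20\t80\n' | delta_inclusion
```

The example exon drops from 80% to 20% inclusion, so the sketch reports a dI of -0.600.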
7
See documentation at http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation.
Quantas v1.0
$ Bash example
# Quantas is a tool for quantifying alternative splicing from RNA-seq data.
# NOTE: the download URL, build steps, and the `quantas` command with its -a/-r/-o
# flags below are hypothetical placeholders. Follow the Quantas documentation
# linked above for the actual installation and usage.

# Hypothetical installation steps (Linux environment assumed):
# wget http://zhanglab.c2b2.columbia.edu/downloads/quantas_v1.0.tar.gz
# tar -xzf quantas_v1.0.tar.gz
# cd quantas_v1.0
# make
# export PATH=$(pwd):$PATH   # Add Quantas to your PATH

# Example usage:
# Replace 'hg19.gtf' with an annotation file that matches the hg19 build.
# Replace 'sample.bam' with the actual RNA-seq alignment file.

# Create an output directory
mkdir -p quantas_output

# Illustrative command only (hypothetical interface)
quantas -a hg19.gtf -r sample.bam -o quantas_output
Tools Used
Raw Source Text
RNA-seq data was aligned to the human hg19 genome build using Olego alignes, and alternative splicing was estimated as described below. Quantas software as described in Charizanis et al, 2012, Neuron, was used to estimate alternative splicing. Olego aligned alignment files were used to count observed junction reads for each exon. Weighted number of exon or exon-junction fragments uniquely supporting the inclusion or skipping isoform of each cassette exon and a probability score was assigned to each isoform. A Fisher's exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini multiple testing correction to estimate the false discovery rate (FDR). In addition, inclusion or exclusion junction reads were used to calculate the proportional change of exon inclusion (dI). See documentation at http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation.
Genome_build: hg19
Supplementary_files_format_and_content: RPKM