GSE100943 Processing Pipeline

RNA-Seq code_examples 7 steps

Publication

Elimination of Toxic Microsatellite Repeat Expansion RNA by RNA-Targeting Cas9.

Cell (2017) — PMID 28803727

Dataset

GSE100943

Microsatellite expansion RNA visualization, elimination, and reversal of molecular pathology by RNA-targeting Cas9

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    RNA-seq data were aligned to the human hg19 genome build using the OLego aligner, and alternative splicing was estimated as described below.

    $ Bash example
    olego -h
  2.

    Quantas software as described in Charizanis et al, 2012, Neuron, was used to estimate alternative splicing.

    Quantas v2012 (Inferred from publication date) GitHub
    $ Bash example
    # Quantas is, as far as its documentation indicates, distributed as a collection of scripts
    # rather than a single executable, so the exact invocation depends on the installed version.
    # The command below is a placeholder that assumes a hypothetical shell wrapper.
    
    # Define input and output files
    INPUT_BAM="aligned_reads.bam" # Placeholder for input BAM file
    GENOME_ANNOTATION="hg19_annotation.gtf" # Placeholder; the annotation must match the hg19 build used for alignment
    OUTPUT_DIR="quantas_results"
    
    # Create output directory
    mkdir -p "${OUTPUT_DIR}"
    
    # Placeholder for the actual Quantas execution command
    # This command is illustrative and assumes a command-line interface for Quantas.
    # Replace with actual Quantas command if available.
    quantas_estimate_as \
        --input_bam "${INPUT_BAM}" \
        --genome_annotation "${GENOME_ANNOTATION}" \
        --output_file "${OUTPUT_DIR}/alternative_splicing_events.tsv" \
        --log_file "${OUTPUT_DIR}/quantas.log"
  3.

    OLego alignment files were used to count the observed junction reads for each exon.

    dexseq_count.py (Inferred with models/gemini-2.5-flash) v1.40.0 GitHub
    $ Bash example
    # Install DEXSeq (Bioconductor R package) if not already installed, e.g.:
    # conda install -c conda-forge -c bioconda bioconductor-dexseq
    # The Python scripts (dexseq_prepare_annotation.py, dexseq_count.py) ship in the
    # package's python_scripts directory and require the HTSeq Python package.
    
    # Define variables
    BAM_FILE="sample.bam" # Replace with the actual BAM file derived from the OLego alignment
    GTF_FILE="gencode.v19.annotation.gtf" # GENCODE v19 is built on GRCh37/hg19 (placeholder; match the alignment build)
    DEXSEQ_GFF="gencode.v19.dexseq.gff"
    OUTPUT_FILE="sample_dexseq_exon_counts.txt"
    
    # Download GTF if not available (example for human GRCh37/hg19)
    # wget -P . https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
    # gunzip gencode.v19.annotation.gtf.gz
    
    # Step 1: Prepare the annotation file for DEXSeq
    # This script converts a standard GTF/GFF to a DEXSeq-compatible GFF,
    # defining "exonic parts" and assigning unique IDs for exon-level counting.
    # Ensure 'dexseq_prepare_annotation.py' is in your PATH.
    dexseq_prepare_annotation.py "${GTF_FILE}" "${DEXSEQ_GFF}"
    
    # Step 2: Sort the BAM file by read name (required for paired-end counting with dexseq_count.py)
    # conda install -c bioconda samtools
    samtools sort -n "${BAM_FILE}" -o "${BAM_FILE%.bam}.nsorted.bam"
    
    # Step 3: Count reads per exon using dexseq_count.py
    # -p yes: Input reads are paired-end (use 'no' for single-end)
    # -s no: Strandedness (yes: forward, reverse: reverse, no: unstranded). Adjust based on library prep.
    #        'no' is a safe default if not specified.
    # -f bam: Input file format is BAM
    # Ensure 'dexseq_count.py' is in your PATH.
    dexseq_count.py -p yes -s no -f bam "${DEXSEQ_GFF}" "${BAM_FILE%.bam}.nsorted.bam" "${OUTPUT_FILE}"
    
  4.

    The weighted number of exon or exon-junction fragments uniquely supporting the inclusion or skipping isoform of each cassette exon was calculated, and a probability score was assigned to each isoform.

    skipper (Inferred with models/gemini-2.5-flash) v0.1.0 GitHub
    $ Bash example
    # NOTE: the tool name, install command, and flags in this example were inferred rather than
    # taken from the publication; treat the `skipper` invocation below as an unverified
    # placeholder and check it against the tool's own documentation before use.
    # pip install skipper
    
    # Illustrative usage for alternative splicing quantification: counting exon and
    # exon-junction fragments that uniquely support the inclusion or skipping isoform
    # and assigning a probability score.
    
    # Placeholder for input BAM file (in this pipeline, reads aligned with OLego)
    INPUT_BAM="aligned_reads.bam"
    
    # Placeholder for GTF annotation file; it should match the hg19 build used for alignment
    # (e.g., GENCODE release 19)
    # Download from: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
    GTF_FILE="gencode.v19.annotation.gtf"
    
    # Output file for splicing quantification results
    OUTPUT_FILE="splicing_quantification.tsv"
    
    # Run skipper to quantify alternative splicing events
    skipper --bam_file "${INPUT_BAM}" --gtf_file "${GTF_FILE}" --output_file "${OUTPUT_FILE}"
  5.

    A Fisher’s exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini multiple testing correction to estimate the false discovery rate (FDR).

    rMATS (Inferred with models/gemini-2.5-flash) v4.1.1 GitHub
    $ Bash example
    # Install rMATS-turbo (e.g., via conda)
    # conda create -n rmats_env -c conda-forge -c bioconda rmats
    # conda activate rmats_env
    
    # Define input BAM files (replace with actual paths)
    # Assuming two conditions: 'control' and 'treatment' with replicates
    # rMATS expects each list file to hold the BAM paths on a single comma-separated line
    echo "/path/to/control_rep1.bam,/path/to/control_rep2.bam" > control_bams.txt
    
    # Same format for condition 2 (e.g., treatment replicates)
    echo "/path/to/treatment_rep1.bam,/path/to/treatment_rep2.bam" > treatment_bams.txt
    
    # Define the annotation (replace with actual path/version)
    # The data were aligned to hg19, so the GTF must use the same build and chromosome naming
    GENOME_GTF="/path/to/hg19_annotation.gtf" # Placeholder for an hg19 (GRCh37) GTF, e.g., from GENCODE or Ensembl
    
    # Define output and temporary directories
    OUTPUT_DIR="rmats_output"
    TMP_DIR="rmats_tmp"
    mkdir -p "$OUTPUT_DIR" "$TMP_DIR"
    
    # Run rMATS-turbo for alternative splicing analysis.
    # rMATS quantifies splicing events using both exon and exon-junction fragments
    # and calculates statistical significance (p-values) and False Discovery Rate (FDR)
    # using Benjamini-Hochberg correction, which aligns with the description.
    # Note: While the description mentions "Fisher’s exact test", rMATS uses a more complex
    # statistical model (likelihood ratio test based on beta-binomial distribution) to evaluate
    # differential splicing, but it provides the p-values and FDRs as described.
    rmats.py \
        --b1 control_bams.txt \
        --b2 treatment_bams.txt \
        --gtf "$GENOME_GTF" \
        --od "$OUTPUT_DIR" \
        --tmp "$TMP_DIR" \
        -t paired \
        --readLength 100 \
        --nthread 8 \
        --libType fr-firststrand \
        --task both
  6.

    In addition, inclusion or exclusion junction reads were used to calculate the proportional change of exon inclusion (dI).

    rMATS (Inferred with models/gemini-2.5-flash) v4.1.2 GitHub
    $ Bash example
    # Install rMATS (example using conda)
    # conda create -n rmats_env python=3.8
    # conda activate rmats_env
    # conda install -c conda-forge -c bioconda rmats
    
    # Example usage of rMATS for calculating differential exon inclusion (dI/dPSI)
    # This assumes you have aligned BAM files for two conditions (e.g., control and treatment)
    # and a genome annotation GTF file.
    
    # Define input BAM files for two conditions
    # rMATS expects --b1/--b2 to be text files, each holding a comma-separated list of BAM paths
    BAM_FILES_CONDITION1="b1.txt"
    BAM_FILES_CONDITION2="b2.txt"
    echo "path/to/control_rep1.bam,path/to/control_rep2.bam" > "${BAM_FILES_CONDITION1}"
    echo "path/to/treatment_rep1.bam,path/to/treatment_rep2.bam" > "${BAM_FILES_CONDITION2}"
    
    # Define genome annotation GTF file
    # The data were aligned to hg19, so use an annotation on the same build. Replace with your specific GTF path.
    GENOME_GTF="path/to/hg19_annotation.gtf" # Placeholder for an hg19 (GRCh37) GTF
    
    # Define output directory
    OUTPUT_DIR="rmats_output_dI_calculation"
    mkdir -p "${OUTPUT_DIR}"
    
    # Define temporary directory
    TMP_DIR="rmats_tmp"
    mkdir -p "${TMP_DIR}"
    
    # Define read length (e.g., 50bp)
    READ_LENGTH=50
    
    # Define number of threads
    NUM_THREADS=8
    
    # Define library type (e.g., fr-firststrand for dUTP/directional RNA-seq)
    # Common options: fr-unstranded, fr-firststrand, fr-secondstrand
    LIBRARY_TYPE="fr-firststrand"
    
    # Run rMATS to calculate differential splicing events, including skipped exons (SE)
    # The output will include 'SE.MATS.JC.txt' (junction counts only) and 'SE.MATS.JCEC.txt'
    # (junction and exon counts), which report inclusion levels and their difference
    # (IncLevelDifference, i.e. dPSI/dI) for skipped exons.
    rmats.py \
        --b1 "${BAM_FILES_CONDITION1}" \
        --b2 "${BAM_FILES_CONDITION2}" \
        --gtf "${GENOME_GTF}" \
        --od "${OUTPUT_DIR}" \
        --tmp "${TMP_DIR}" \
        -t paired \
        --readLength "${READ_LENGTH}" \
        --nthread "${NUM_THREADS}" \
        --libType "${LIBRARY_TYPE}"
  7.

    See documentation at http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation.

    Quantas v1.0
    $ Bash example
    # Quantas is a tool for quantifying alternative splicing from RNA-seq data.
    # NOTE: the download URL, build steps, and `quantas` command line below are unverified
    # placeholders; follow the Quantas documentation linked above for the actual
    # installation and usage.
    # wget http://zhanglab.c2b2.columbia.edu/downloads/quantas_v1.0.tar.gz
    # tar -xzf quantas_v1.0.tar.gz
    # cd quantas_v1.0
    # make
    # export PATH=$(pwd):$PATH # Add Quantas to your PATH
    
    # Example usage:
    # Replace 'hg19_annotation.gtf' with your actual hg19 annotation file.
    # Replace 'sample.bam' with your actual RNA-seq alignment BAM file.
    # Ensure the BAM file is sorted and indexed.
    
    # Create an output directory
    mkdir -p quantas_output
    
    # Run Quantas (placeholder command)
    quantas -a hg19_annotation.gtf -r sample.bam -o quantas_output
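The statistics described in steps 5 and 6 can be sketched without any splicing-specific tool. The example below is an illustration, not the Quantas implementation: it assumes a hypothetical tab-separated input of per-exon inclusion and skipping fragment counts (rounded to integers) for a control and a treatment group, defines dI as treatment minus control inclusion, and uses a plain two-sided Fisher's exact test followed by Benjamini-Hochberg adjustment. The function name `splicing_stats` and the column layout are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch only: per-exon dI, two-sided Fisher exact p-value, and Benjamini-Hochberg FDR.
# Hypothetical input on stdin, tab-separated, one cassette exon per line:
#   exon_id  inc_ctrl  skip_ctrl  inc_trt  skip_trt
splicing_stats() {
  awk -F'\t' '
    # log(n!) by direct summation (fine for a sketch; slow for very deep counts)
    function lfact(n,   i, s) { s = 0; for (i = 2; i <= n; i++) s += log(i); return s }
    # log binomial coefficient C(n, k)
    function lchoose(n, k) { return lfact(n) - lfact(k) - lfact(n - k) }
    # log hypergeometric probability of x inclusion fragments in the control row
    function lhyper(x, r1, r2, c1) {
      return lchoose(r1, x) + lchoose(r2, c1 - x) - lchoose(r1 + r2, c1)
    }
    # two-sided p-value: sum over all tables no more likely than the observed one
    function fisher(a, b, c, d,   r1, r2, c1, lo, hi, x, lp, lobs, p) {
      r1 = a + b; r2 = c + d; c1 = a + c
      lo = (c1 > r2) ? c1 - r2 : 0
      hi = (c1 < r1) ? c1 : r1
      lobs = lhyper(a, r1, r2, c1)
      p = 0
      for (x = lo; x <= hi; x++) {
        lp = lhyper(x, r1, r2, c1)
        if (lp <= lobs + 1e-7) p += exp(lp)
      }
      return (p > 1) ? 1 : p
    }
    NF >= 5 {
      a = $2 + 0; b = $3 + 0; c = $4 + 0; d = $5 + 0
      if (a + b == 0 || c + d == 0) next   # inclusion level undefined
      di = c / (c + d) - a / (a + b)       # dI = I_treatment - I_control
      printf "%s\t%.3f\t%.10g\n", $1, di, fisher(a, b, c, d)
    }' |
  LC_ALL=C sort -t "$(printf '\t')" -k3,3g |
  awk -F'\t' '
    { id[NR] = $1; di[NR] = $2; p[NR] = $3 + 0 }
    END {
      # Benjamini-Hochberg step-up: walk from the largest p-value down,
      # keeping the adjusted values monotone
      prev = 1
      for (i = NR; i >= 1; i--) {
        q = p[i] * NR / i
        if (q > prev) q = prev
        fdr[i] = prev = q
      }
      print "exon_id\tdI\tp_value\tfdr"
      for (i = 1; i <= NR; i++)
        printf "%s\t%s\t%.3g\t%.3g\n", id[i], di[i], p[i], fdr[i]
    }'
}

# Usage with made-up counts
printf 'exonA\t10\t0\t0\t10\nexonB\t5\t5\t5\t5\n' | splicing_stats
```

Rows come back sorted by p-value. In real data, the FDR and dI columns would typically be filtered together (for example, a small FDR and a large absolute dI), with thresholds taken from the publication rather than from this sketch.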

Raw Source Text
RNA-seq data was aligned to the human hg19 genome build using Olego alignes, and alternative splicing was estimated as described below.
Quantas software as described in Charizanis et al, 2012, Neuron, was used to estimate alternative splicing. Olego aligned alignment files were used to count observed junction reads for each exon. Weighted number of exon or exon-junction fragments uniquely supporting the inclusion or skipping isoform of each cassette exon and a probability score was assigned to each isoform. A Fisher’s exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini multiple testing correction to estimate the false discovery rate (FDR). In addition, inclusion or exclusion junction reads were used to calculate the proportional change of exon inclusion (dI). See documentation at http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation.
Genome_build: hg19
Supplementary_files_format_and_content: RPKM
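
The supplementary files report expression as RPKM (reads per kilobase of transcript per million mapped reads). As a reminder of what that unit means, here is a minimal sketch of the standard formula; the helper name `rpkm` and the numbers are made up for illustration and are not taken from the dataset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# rpkm READS LENGTH_BP TOTAL_MAPPED_READS
# RPKM = reads * 1e9 / (feature length in bp * total mapped reads)
rpkm() {
  awk -v r="$1" -v len="$2" -v total="$3" \
    'BEGIN { printf "%.2f\n", r * 1e9 / (len * total) }'
}

# 500 reads on a 2 kb feature in a library of 10 million mapped reads
rpkm 500 2000 10000000
```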