GSE100943 Processing Pipeline

RNA-Seq code_examples 7 steps

Publication

Elimination of Toxic Microsatellite Repeat Expansion RNA by RNA-Targeting Cas9.

Cell (2017) — PMID 28803727

Dataset

Microsatellite expansion RNA visualization, elimination, and reversal of molecular pathology by RNA-targeting Cas9

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps


1
RNA-seq data were aligned to the human hg19 genome build using the OLego aligner, and alternative splicing was estimated as described below.

OLego
$ Bash example
```
olego -h
```
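Because every annotation used in later steps must match the hg19 build named above, it is worth checking the alignment header before counting anything. The Bash/awk sketch below is an illustration, not part of the published pipeline. The function name `guess_build` is hypothetical. It reads SAM header (`@SQ`) lines on stdin and identifies the build only from the length of chromosome 1: 249,250,621 bp in GRCh37/hg19 and 248,956,422 bp in GRCh38/hg38.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the human genome build from SAM @SQ header lines on stdin.
# Only chromosome 1 (named "chr1" or "1") is inspected; its LN: value is compared
# against the known GRCh37/hg19 and GRCh38/hg38 lengths.
guess_build() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)   # sequence name
        if ($i ~ /^LN:/) len = substr($i, 4)    # sequence length
      }
      if (name == "chr1" || name == "1") chr1_len = len + 0
    }
    END {
      if (chr1_len == 249250621)      print "GRCh37/hg19"
      else if (chr1_len == 248956422) print "GRCh38/hg38"
      else                            print "unknown"
    }'
}
```

OLego writes SAM, so the header can be checked directly with `grep '^@SQ' sample.sam | guess_build`, or with `samtools view -H sample.bam | guess_build` after conversion to BAM.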

Quantas software, as described in Charizanis et al. (2012, Neuron), was used to estimate alternative splicing.

Quantas v2012 (Inferred from publication date)

$ Bash example

```bash
# The source does not give the Quantas command line. The command below is a
# hypothetical placeholder; see the Quantas documentation for the actual scripts.

# Define input and output files
INPUT_BAM="aligned_reads.bam"                    # placeholder: OLego alignments (hg19)
GENOME_ANNOTATION="gencode.v19.annotation.gtf"   # placeholder: hg19/GRCh37 annotation, matching the alignment build
OUTPUT_DIR="quantas_results"

# Create output directory
mkdir -p "${OUTPUT_DIR}"

# Hypothetical command: 'quantas_estimate_as' and its flags are not a verified Quantas interface.
# Replace with the actual Quantas commands.
quantas_estimate_as \
    --input_bam "${INPUT_BAM}" \
    --genome_annotation "${GENOME_ANNOTATION}" \
    --output_file "${OUTPUT_DIR}/alternative_splicing_events.tsv" \
    --log_file "${OUTPUT_DIR}/quantas.log"
```


The OLego alignment files were used to count the observed junction reads for each exon.

dexseq_count.py (Inferred with models/gemini-2.5-flash) v1.40.0

$ Bash example

```bash
# Install DEXSeq (Bioconductor) and HTSeq if not already installed
# conda install -c bioconda bioconductor-dexseq htseq
# The Python scripts (dexseq_prepare_annotation.py, dexseq_count.py) ship with the R package
# (see system.file("python_scripts", package = "DEXSeq") in R) and require HTSeq.
# Note: dexseq_count.py counts reads per exonic part; it does not report junction reads separately.

# Define variables
BAM_FILE="sample.bam"                   # placeholder: OLego-aligned BAM (hg19)
GTF_FILE="gencode.v19.annotation.gtf"   # GENCODE release 19 = GRCh37/hg19, matching the alignments
DEXSEQ_GFF="gencode.v19.dexseq.gff"
OUTPUT_FILE="sample_dexseq_exon_counts.txt"

# Download the GTF if not available (GENCODE release 19, GRCh37/hg19)
# wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
# gunzip gencode.v19.annotation.gtf.gz

# Step 1: Prepare the annotation file for DEXSeq.
# This script converts a GTF into a DEXSeq-compatible GFF that defines "exonic parts"
# and assigns each a unique ID for exon-level counting.
# Ensure 'dexseq_prepare_annotation.py' is in your PATH.
dexseq_prepare_annotation.py "${GTF_FILE}" "${DEXSEQ_GFF}"

# Step 2: Sort the BAM file by read name (for paired-end counting with dexseq_count.py)
# conda install -c bioconda samtools
samtools sort -n "${BAM_FILE}" -o "${BAM_FILE%.bam}.nsorted.bam"

# Step 3: Count reads per exonic part with dexseq_count.py
# -p yes: paired-end reads (use 'no' for single-end)
# -s no:  unstranded; use 'yes' (forward) or 'reverse' for stranded libraries.
#         'no' is a placeholder; match it to the library prep.
# -f bam: input file format is BAM
# Ensure 'dexseq_count.py' is in your PATH.
dexseq_count.py -p yes -s no -f bam "${DEXSEQ_GFF}" "${BAM_FILE%.bam}.nsorted.bam" "${OUTPUT_FILE}"
```

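Junction reads can also be counted straight from the OLego alignments, because a spliced read records each intron it spans as an `N` operation in its CIGAR string. The Bash/awk sketch below is illustrative and is not the Quantas implementation. The function name `extract_junctions` is hypothetical. It reads SAM records on stdin, skips unmapped (FLAG 0x4) and secondary (FLAG 0x100) alignments, and counts each read or mate once per junction it spans. Assigning these junction counts to individual exons, for example by matching intron ends to exon boundaries, is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count reads per splice junction from SAM records on stdin.
# Output (tab-separated): chrom, intron_start, intron_end (1-based, inclusive), read_count
extract_junctions() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    int($2 / 4) % 2 == 1 { next }       # FLAG 0x4: read unmapped
    int($2 / 256) % 2 == 1 { next }     # FLAG 0x100: secondary alignment
    {
      pos = $4; cigar = $6
      # Walk the CIGAR string one operation at a time, tracking the reference position.
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op == "N") count[$3 "\t" pos "\t" (pos + len - 1)]++   # skipped region = intron
        if (op ~ /[MDN=X]/) pos += len  # these operations consume reference bases
        cigar = substr(cigar, RLENGTH + 1)
      }
    }
    END { for (k in count) print k "\t" count[k] }
  ' | sort -k1,1 -k2,2n -k3,3n
}
```

A typical call on a BAM file would be `samtools view sample.bam | extract_junctions > junctions.tsv`.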

For each cassette exon, the weighted number of exon or exon-junction fragments uniquely supporting the inclusion or the skipping isoform was computed, and a probability score was assigned to each isoform.

skipper (Inferred with models/gemini-2.5-flash) v0.1.0

$ Bash example

```bash
# NOTE: the source attributes this step to Quantas. 'skipper' was inferred automatically;
# the install command and flags below are unverified placeholders.
# pip install skipper

# Intended step: for each cassette exon, count exon and exon-junction fragments that
# uniquely support the inclusion or skipping isoform, and assign each isoform a probability score.

# Placeholder for input BAM file (OLego alignments to hg19)
INPUT_BAM="aligned_reads.bam"

# Placeholder GTF annotation matching hg19 (GENCODE release 19, GRCh37/hg19)
# Download from: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
GTF_FILE="gencode.v19.annotation.gtf"

# Output file for splicing quantification results
OUTPUT_FILE="splicing_quantification.tsv"

# Run the (unverified) skipper command
skipper --bam_file "${INPUT_BAM}" --gtf_file "${GTF_FILE}" --output_file "${OUTPUT_FILE}"
```

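The inclusion-versus-skipping comparison in this step can be illustrated with a simplified percent-spliced-in (PSI) estimate. This sketch does not reproduce Quantas's fragment weighting or its probability score. It assumes, as a common simplification, that the inclusion isoform has two informative junctions (upstream and downstream of the cassette exon) while the skipping isoform has one, so inclusion-junction reads are halved before the inclusion fraction is computed. The function name `psi_from_counts` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: simplified PSI for one cassette exon.
#   $1 = reads on the two inclusion junctions (summed)
#   $2 = reads on the skipping junction
# Prints PSI with 4 decimals, or NA when there are no informative reads.
psi_from_counts() {
  awk -v inc="$1" -v skip="$2" 'BEGIN {
    i = inc / 2     # assumption: normalize for two inclusion junctions vs. one skipping junction
    s = skip + 0
    if (i + s == 0) { print "NA"; exit }
    printf "%.4f\n", i / (i + s)
  }'
}

psi_from_counts 20 10   # 10 / (10 + 10), prints 0.5000
```

Halving the inclusion count is the simplest length correction; tools such as rMATS instead normalize by the effective number of read positions on each isoform.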

A Fisher's exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini multiple-testing correction to estimate the false discovery rate (FDR).

rMATS (Inferred with models/gemini-2.5-flash) v4.1.1

$ Bash example

```bash
# Install rMATS-turbo (e.g., via bioconda)
# conda create -n rmats_env -c conda-forge -c bioconda rmats
# conda activate rmats_env

# Define input BAM files (replace with actual paths).
# Two conditions, 'control' and 'treatment', with replicates.
# rMATS expects each --b1/--b2 file to hold ONE line of comma-separated BAM paths.
echo "/path/to/control_rep1.bam,/path/to/control_rep2.bam" > control_bams.txt
echo "/path/to/treatment_rep1.bam,/path/to/treatment_rep2.bam" > treatment_bams.txt

# The annotation must match the hg19 build used for alignment (placeholder path).
# GENCODE release 19 (GRCh37/hg19) uses UCSC-style chromosome names (chr1, ...).
GENOME_GTF="/path/to/gencode.v19.annotation.gtf"

# Define output and temporary directories
OUTPUT_DIR="rmats_output"
TMP_DIR="rmats_tmp"
mkdir -p "$OUTPUT_DIR" "$TMP_DIR"

# Run rMATS-turbo. It reports p-values and Benjamini-Hochberg FDRs for splicing events,
# using junction counts only (JC) and junction plus exon-body counts (JCEC).
# Note: rMATS does not use Fisher's exact test; it applies its own likelihood-ratio test
# to a model of replicate inclusion levels, so its p-values will differ from the method
# described above.
# --readLength and --libType are placeholders; set them to match the actual library.
rmats.py \
    --b1 control_bams.txt \
    --b2 treatment_bams.txt \
    --gtf "$GENOME_GTF" \
    --od "$OUTPUT_DIR" \
    --tmp "$TMP_DIR" \
    -t paired \
    --readLength 100 \
    --nthread 8 \
    --libType fr-firststrand
```

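The test described in this step can also be sketched without rMATS. The Bash/awk functions below are illustrative assumptions, not the Quantas code, and both names are hypothetical. `fisher_two_sided` runs a two-sided Fisher's exact test on a 2×2 table whose rows are the two conditions and whose columns are fragment counts supporting inclusion and skipping; it sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. `bh_fdr` applies the Benjamini-Hochberg step-up adjustment to a list of p-values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: two-sided Fisher's exact test for the 2x2 table
#                 inclusion  skipping
#   condition 1       a          b
#   condition 2       c          d
fisher_two_sided() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" '
    # log probability of the table whose top-left cell is x (hypergeometric, margins fixed)
    function lp(x) { return lf[r1] + lf[r2] + lf[c1] + lf[c2] - lf[n] - lf[x] - lf[r1 - x] - lf[c1 - x] - lf[r2 - c1 + x] }
    BEGIN {
      a += 0; b += 0; c += 0; d += 0
      r1 = a + b; r2 = c + d; c1 = a + c; c2 = b + d; n = r1 + r2
      lf[0] = 0
      for (i = 1; i <= n; i++) lf[i] = lf[i - 1] + log(i)   # log-factorial table
      lo = (c1 - r2 > 0) ? c1 - r2 : 0                        # feasible range of x
      hi = (r1 < c1) ? r1 : c1
      obs = lp(a); p = 0
      # sum probabilities of all tables no more likely than the observed one
      for (x = lo; x <= hi; x++) { lx = lp(x); if (lx <= obs + 1e-7) p += exp(lx) }
      if (p > 1) p = 1
      printf "%.6g\n", p
    }'
}

# Hypothetical helper: Benjamini-Hochberg adjusted p-values.
# stdin: one p-value per line; stdout: one adjusted value per line, in input order.
bh_fdr() {
  awk '
    { p[NR] = $1 + 0; idx[NR] = NR }
    END {
      n = NR
      # insertion sort of indices by ascending p-value (adequate for a sketch)
      for (i = 2; i <= n; i++)
        for (j = i; j > 1 && p[idx[j - 1]] > p[idx[j]]; j--) { t = idx[j]; idx[j] = idx[j - 1]; idx[j - 1] = t }
      # step-up: q at rank r = min over k >= r of n * p_(k) / k, capped at 1
      prev = 1
      for (r = n; r >= 1; r--) {
        q = p[idx[r]] * n / r
        if (q > prev) q = prev
        prev = q
        out[idx[r]] = q
      }
      for (i = 1; i <= n; i++) printf "%.6g\n", out[i]
    }'
}
```

For example, `fisher_two_sided 8 2 1 5` gives a p-value of about 0.035; the p-values for all tested exons can then be piped, one per line, through `bh_fdr` to obtain FDRs.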

In addition, inclusion and exclusion junction reads were used to calculate the proportional change in exon inclusion (dI).

rMATS (Inferred with models/gemini-2.5-flash) v4.1.2

$ Bash example

```bash
# Install rMATS (example using bioconda)
# conda create -n rmats_env -c conda-forge -c bioconda rmats
# conda activate rmats_env

# Example: differential exon inclusion (dI; reported by rMATS as IncLevelDifference)
# from aligned BAM files for two conditions and a GTF annotation.

# BAM files for two conditions (replace with actual paths).
# --b1/--b2 take text files, each holding one line of comma-separated BAM paths.
BAM_FILES_CONDITION1="path/to/control_rep1.bam,path/to/control_rep2.bam"
BAM_FILES_CONDITION2="path/to/treatment_rep1.bam,path/to/treatment_rep2.bam"
echo "${BAM_FILES_CONDITION1}" > b1.txt
echo "${BAM_FILES_CONDITION2}" > b2.txt

# Annotation matching the hg19 alignments (placeholder path; GENCODE release 19)
GENOME_GTF="path/to/gencode.v19.annotation.gtf"

# Output and temporary directories
OUTPUT_DIR="rmats_output_dI_calculation"
TMP_DIR="rmats_tmp"
mkdir -p "${OUTPUT_DIR}" "${TMP_DIR}"

# Placeholders: set to match the actual sequencing run and library
READ_LENGTH=50
NUM_THREADS=8
# Common options: fr-unstranded, fr-firststrand (e.g., dUTP), fr-secondstrand
LIBRARY_TYPE="fr-firststrand"

# Skipped-exon results are written to SE.MATS.JC.txt (junction counts only, matching the
# junction-read dI described above) and SE.MATS.JCEC.txt (junctions plus exon body).
# IncLevelDifference in these files is average(IncLevel1) - average(IncLevel2).
rmats.py \
    --b1 b1.txt \
    --b2 b2.txt \
    --gtf "${GENOME_GTF}" \
    --od "${OUTPUT_DIR}" \
    --tmp "${TMP_DIR}" \
    -t paired \
    --readLength "${READ_LENGTH}" \
    --nthread "${NUM_THREADS}" \
    --libType "${LIBRARY_TYPE}"
```

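The dI described in this step is the change in the inclusion fraction between two conditions. The awk sketch below computes it from a hypothetical tab-separated table of junction counts: event ID, then inclusion and skipping reads for condition 1, then the same for condition 2. The function name `delta_inclusion` and this input layout are assumptions. The sketch halves inclusion-junction reads, a common simplification because the inclusion isoform has two junctions and the skipping isoform one. It reports dI as condition 2 minus condition 1 and does not model replicate variability. Note that rMATS's `IncLevelDifference` uses the opposite sign (sample 1 minus sample 2).

```bash
#!/usr/bin/env bash
# Hypothetical helper: per-event dI from junction read counts.
# stdin (TSV):  event_id  inc1  skip1  inc2  skip2
# stdout (TSV): event_id  psi1  psi2  dI    (dI = psi2 - psi1; NA if a condition has no reads)
delta_inclusion() {
  awk -F'\t' -v OFS='\t' '
    # simplified PSI; returns -1 when there are no informative reads
    function psi(inc, skip,   i) {
      i = inc / 2          # assumption: two inclusion junctions vs. one skipping junction
      return (i + skip == 0) ? -1 : i / (i + skip)
    }
    {
      p1 = psi($2, $3); p2 = psi($4, $5)
      s1 = (p1 < 0) ? "NA" : sprintf("%.4f", p1)
      s2 = (p2 < 0) ? "NA" : sprintf("%.4f", p2)
      d  = (p1 < 0 || p2 < 0) ? "NA" : sprintf("%.4f", p2 - p1)
      print $1, s1, s2, d
    }'
}
```

dI is computed from the unrounded PSI values, so rounding only affects what is printed.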

See the Quantas documentation at http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation.

Quantas v1.0

$ Bash example

```bash
# Quantas quantifies alternative splicing from RNA-seq alignments.
# The download, build, and command below are hypothetical placeholders: the archive name,
# the 'make' step, and the 'quantas -a/-r/-o' interface are unverified. Follow the
# Quantas documentation for the actual installation and scripts.
# wget http://zhanglab.c2b2.columbia.edu/downloads/quantas_v1.0.tar.gz
# tar -xzf quantas_v1.0.tar.gz
# cd quantas_v1.0
# make
# export PATH="$(pwd):$PATH"   # add Quantas to PATH

# Example usage (placeholders):
# Replace 'gencode.v19.annotation.gtf' with an hg19 annotation matching the alignments.
# Replace 'sample.bam' with an OLego alignment BAM file (sorted and indexed).

# Create an output directory
mkdir -p quantas_output

# Run Quantas (hypothetical interface)
quantas -a gencode.v19.annotation.gtf -r sample.bam -o quantas_output
```

Tools Used

OLego, Quantas

Raw Source Text

RNA-seq data was aligned to the human hg19 genome build using Olego alignes, and alternative splicing was estimated as described below.
Quantas software as described in Charizanis et al, 2012, Neuron, was used to estimate alternative splicing. Olego aligned alignment files were used to count observed junction reads for each exon. Weighted number of exon or exon-junction fragments uniquely supporting the inclusion or skipping isoform of each cassette exon and a probability score was assigned to each isoform. A Fisher's exact test was used to evaluate the statistical significance of splicing changes using both exon and exon-junction fragments, followed by Benjamini multiple testing correction to estimate the false discovery rate (FDR). In addition, inclusion or exclusion junction reads were used to calculate the proportional change of exon inclusion (dI). See documentation at http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation.
Genome_build: hg19
Supplementary_files_format_and_content: RPKM
