GSE176045 Processing Pipeline

Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature Communications (2021), PMID 34732726

Dataset

GSE176045

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing (WTB RNA-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.

STAR v2.4.2a

$ Bash example

# Install STAR (if not already installed)
# conda install -c bioconda star=2.4.2a

# Create a directory for reference files
mkdir -p reference_data
cd reference_data

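# Download human hg19 reference genome (UCSC hg19)
# NOTE: UCSC names sequences "chr1", "chr2", ... while the Ensembl GTF below uses "1", "2", ...
# Harmonize the names (or take the FASTA and GTF from the same provider) before indexing;
# otherwise STAR cannot match the annotation to the genome.
wget -O hg19.fa.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
gunzip hg19.fa.gz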

# Download Ensembl 72 GTF annotation for GRCh37 (hg19)
wget -O Homo_sapiens.GRCh37.72.gtf.gz ftp://ftp.ensembl.org/pub/release-72/gtf/homo_sapiens/Homo_sapiens.GRCh37.72.gtf.gz
gunzip Homo_sapiens.GRCh37.72.gtf.gz

cd ..

# Create a directory for the STAR genome index
mkdir -p STAR_index_hg19_Ensembl72

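# Build the STAR genome index using hg19 and Ensembl 72 GTF
# --sjdbOverhang is ideally (read length - 1); 100 is a placeholder because the source does not state the read length.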
STAR --runMode genomeGenerate \
     --genomeDir STAR_index_hg19_Ensembl72 \
     --genomeFastaFiles reference_data/hg19.fa \
     --sjdbGTFfile reference_data/Homo_sapiens.GRCh37.72.gtf \
     --sjdbOverhang 100 \
     --runThreadN 8 # Adjust thread count as needed

# Assuming input RNA-Seq FASTQ files are named read1.fastq.gz and read2.fastq.gz
# Replace with your actual input file names

# Perform RNA-Seq alignment using STAR
STAR --runMode alignReads \
     --genomeDir STAR_index_hg19_Ensembl72 \
     --readFilesIn read1.fastq.gz read2.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix aligned_reads_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --quantMode GeneCounts \
     --twopassMode Basic \
     --runThreadN 8 # Adjust thread count as needed
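
Because the example above takes the FASTA from UCSC and the GTF from Ensembl, it can help to confirm that both files use the same sequence names before building the index. The following is a minimal sketch and not part of the published pipeline; the function name check_seqnames is hypothetical, and it assumes an uncompressed FASTA and GTF.

```shell
# check_seqnames FASTA GTF
# Succeeds (exit 0) when every sequence name in column 1 of the GTF
# also appears as a FASTA header; otherwise lists the missing names on stderr.
check_seqnames() {
    fasta_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA header names: text after ">" up to the first whitespace
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fasta_names"
    # GTF sequence names: first column of non-comment lines
    grep -v '^#' "$2" | cut -f1 | sort -u > "$gtf_names"
    # Names present in the GTF but absent from the FASTA
    missing=$(comm -13 "$fasta_names" "$gtf_names")
    rm -f "$fasta_names" "$gtf_names"
    if [ -n "$missing" ]; then
        printf 'GTF sequence names absent from FASTA:\n%s\n' "$missing" >&2
        return 1
    fi
    return 0
}
```

For example, `check_seqnames reference_data/hg19.fa reference_data/Homo_sapiens.GRCh37.72.gtf` would be expected to fail until the "chr" prefix difference is resolved.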

To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.

RNA-seq v1.0.0 (Inferred with models/gemini-2.5-flash)

$ Bash example

# NOTE: The source describes MultiPath-PSI as distributed through AltAnalyze, not as a standalone tool.
# The repository URL, package, and `multipath_psi` command in this block are unverified placeholders
# and may not exist; treat the block as illustrative only.
# git clone https://github.com/zhanglab-ucsf/MultiPath-PSI.git
# cd MultiPath-PSI
# pip install .

# Define input files and reference data placeholders
# Use the same genome build and annotation as the alignment step (hg19, Ensembl 72).
INPUT_BAM="path/to/your/ipsc_cm_rnaseq_sample.bam"
GTF_FILE="reference_data/Homo_sapiens.GRCh37.72.gtf"
GENOME_FASTA="reference_data/hg19.fa"
OUTPUT_DIR="multipath_psi_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# HYPOTHETICAL invocation: the subcommand and flag names are not confirmed against any released tool.
multipath_psi run \
    --bam_file "${INPUT_BAM}" \
    --gtf_file "${GTF_FILE}" \
    --genome_fasta "${GENOME_FASTA}" \
    --output_dir "${OUTPUT_DIR}" \
    --threads 8 # Example: Use 8 CPU threads for processing
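
To make the Percent Spliced In quantity concrete, the sketch below computes the generic two-path PSI: reads supporting inclusion divided by reads supporting inclusion or exclusion, as a percentage. This is the textbook definition only and is not claimed to reproduce the MultiPath-PSI algorithm; the function name psi and its arguments are hypothetical.

```shell
# psi INCLUSION_READS EXCLUSION_READS
# Prints PSI as a percentage with one decimal place,
# or "NA" when no reads support either path.
psi() {
    awk -v inc="$1" -v exc="$2" 'BEGIN {
        total = inc + exc
        if (total == 0) { print "NA"; exit }
        printf "%.1f\n", 100 * inc / total
    }'
}
```

With 30 inclusion reads and 10 exclusion reads, `psi 30 10` gives 75.0, meaning three quarters of the informative reads support the included form.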

MultiPath-PSI is available through AltAnalyze version 2.1.1, requires aligned BAM files as input, and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).

AltAnalyze v2.1.1

$ Bash example

# Installation: AltAnalyze can be installed via pip or downloaded from its website/GitHub.
# To install a specific version (e.g., 2.1.1), it might be necessary to download the source
# or use a specific release tag if available via pip.
# pip install AltAnalyze==2.1.1 # (If this specific version is available via pip)
# Alternatively, clone the repository and run from source:
# git clone https://github.com/altanalyze/altanalyze.git
# cd altanalyze
# python setup.py install # Or just run AltAnalyze.py directly

# Define input and output directories
INPUT_BAM_DIR="/path/to/aligned_bam_files" # Directory containing aligned BAM files
OUTPUT_DIR="/path/to/multipath_psi_results" # Directory for AltAnalyze output

# AltAnalyze typically requires a grouping file for differential splicing analysis.
# Create a placeholder grouping file (e.g., RNASeq_groups.txt) if not provided.
# Assumed format, to be verified against the AltAnalyze documentation: SampleID (without .bam extension), Group, Batch (tab-separated)
# Example:
# echo -e "sample1\tControl\tBatch1\nsample2\tControl\tBatch1\nsample3\tTreated\tBatch1\nsample4\tTreated\tBatch1" > RNASeq_groups.txt
GROUPING_FILE="RNASeq_groups.txt"
# Create a dummy grouping file for demonstration if it doesn't exist
if [ ! -f "${GROUPING_FILE}" ]; then
    printf 'sample1\tControl\tBatch1\nsample2\tControl\tBatch1\n' > "${GROUPING_FILE}"
    # Print the reminder to stderr; writing it into the file would corrupt the tab-separated table.
    echo "NOTE: A proper grouping file with all sample IDs from your BAMs is required for meaningful analysis." >&2
fi

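# Execute AltAnalyze with MultiPath-PSI algorithm
# NOTE: The option names below are inferred and have not been verified against AltAnalyze 2.1.1;
# confirm them in the AltAnalyze command-line documentation before running.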
# --species: Reference species (e.g., Hs for Homo sapiens, Mm for Mus musculus).
# --platform RNASeq: Specifies RNA-Seq data as input.
# --inputdir: Path to the directory containing input BAM files.
# --outputdir: Path to the directory where results will be saved.
# --runMultiPathPSI yes: Activates the MultiPath-PSI algorithm.
# --grouping: Path to the grouping file for differential analysis.
python AltAnalyze.py \
    --species Hs \
    --platform RNASeq \
    --inputdir "${INPUT_BAM_DIR}" \
    --outputdir "${OUTPUT_DIR}" \
    --runMultiPathPSI yes \
    --grouping "${GROUPING_FILE}"
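
Since the analysis depends on every BAM file having a row in the grouping file, a quick consistency check before launching a long run may save time. This is a minimal sketch under the assumption that the first tab-separated column of the grouping file holds the BAM file name without its .bam extension; the function name check_groups is hypothetical.

```shell
# check_groups BAM_DIR GROUPS_FILE
# Succeeds (exit 0) when each *.bam file in BAM_DIR has a matching entry
# in column 1 of GROUPS_FILE; otherwise reports the missing samples on stderr.
check_groups() {
    status=0
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue          # skip the literal pattern when no BAMs exist
        sample=$(basename "$bam" .bam)
        # Exact match against the first tab-separated column
        if ! awk -F'\t' -v s="$sample" '$1 == s { found = 1 } END { exit !found }' "$2"; then
            echo "No grouping entry for sample: $sample" >&2
            status=1
        fi
    done
    return "$status"
}
```

A call such as `check_groups "${INPUT_BAM_DIR}" "${GROUPING_FILE}"` could be placed immediately before the AltAnalyze command.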

For gene expression quantification, kallisto (https://pachterlab.github.io/kallisto/) was used.

kallisto v0.46.2

$ Bash example

# Install kallisto (example using conda)
# conda create -n kallisto_env kallisto=0.46.2 -c bioconda -c conda-forge
# conda activate kallisto_env

# Placeholder for reference transcriptome index
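# This index should be built once from a FASTA file of transcripts (e.g., from GENCODE or Ensembl).
# The source does not state which transcriptome was used for kallisto; the GENCODE name below is only a placeholder.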
# Example command to build index (assuming human_gencode_vXX_transcripts.fasta.gz is available):
# kallisto index -i human_gencode_vXX_transcriptome.idx human_gencode_vXX_transcripts.fasta.gz

# Define variables
TRANSCRIPTOME_INDEX="human_gencode_vXX_transcriptome.idx" # Placeholder for latest human GENCODE transcriptome index
READS_R1="sample_R1.fastq.gz" # Placeholder for input forward reads
READS_R2="sample_R2.fastq.gz" # Placeholder for input reverse reads
OUTPUT_DIR="kallisto_quant_output"
THREADS=8 # Number of threads to use

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run kallisto quantification for paired-end reads
# --bias (sequence-based bias correction) is optional and is not stated in the source.
kallisto quant \
  -i "${TRANSCRIPTOME_INDEX}" \
  -o "${OUTPUT_DIR}" \
  -t "${THREADS}" \
  --bias \
  "${READS_R1}" \
  "${READS_R2}"

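kallisto reports transcript-level abundances in abundance.tsv (columns target_id, length, eff_length, est_counts, tpm), so gene-level expression requires summing over each gene's transcripts. The sketch below shows one way to do that with awk; the two-column transcript-to-gene map and the function name gene_tpm are assumptions for illustration, and the source does not describe how the authors aggregated to gene level.

```shell
# gene_tpm TX2GENE_TSV ABUNDANCE_TSV
# TX2GENE_TSV: tab-separated transcript ID and gene ID, no header (hypothetical file).
# ABUNDANCE_TSV: kallisto abundance.tsv with its header line.
# Prints gene ID and summed TPM (two decimals), sorted by gene ID.
gene_tpm() {
    awk -F'\t' '
        NR == FNR    { gene[$1] = $2; next }     # first file: transcript -> gene map
        FNR == 1     { next }                    # skip the abundance.tsv header
        ($1 in gene) { tpm[gene[$1]] += $5 }     # column 5 holds tpm
        END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
    ' "$1" "$2" | sort
}
```

Transcripts absent from the map are silently ignored here, so the target_id values in abundance.tsv need to match the map's transcript IDs exactly, including any version suffix.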

Tools Used

STAR RNA-seq

Raw Source Text

RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm). For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: PSI splicing events predicted from MultiPath-PSI
