GSE175886 Processing Pipeline

Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature Communications (2021), PMID 34732726

Dataset

GSE175886

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.

Processing Steps

RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.

STAR v2.4.2a GitHub

$ Bash example

# Install STAR (example using conda)
# conda install -c bioconda star=2.4.2a

# Build STAR genome index (if not already built)
# This step would typically be done once for a given genome/annotation combination.
# The description implies the index for hg19 and Ensembl 72 is already available.
# Example command to build the index (replace paths and number of threads):
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_hg19_Ensembl72_index \
#      --genomeFastaFiles /path/to/hg19.fa \
#      --sjdbGTFfile /path/to/Ensembl72.gtf \
#      --sjdbOverhang 100 \
#      --runThreadN 8

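# Align RNA-Seq data using STAR
# Replace /path/to/STAR_hg19_Ensembl72_index with the actual path to your genome index.
# Replace read1.fastq.gz and read2.fastq.gz with your input FASTQ files
# (paired-end input is assumed here; the source text does not state the library layout).
# Adjust --runThreadN based on available CPU cores.
# The filtering and intron-size options below resemble the ENCODE-style settings in the
# STAR manual; the source text does not report the parameters actually used.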
STAR --genomeDir /path/to/STAR_hg19_Ensembl72_index \
     --readFilesIn read1.fastq.gz read2.fastq.gz \
     --runThreadN 8 \
     --outFileNamePrefix aligned_reads_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --readFilesCommand zcat \
     --outFilterType BySJout \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 999 \
     --outFilterMismatchNoverLmax 0.04 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --limitBAMsortRAM 30000000000 # Example: 30GB RAM for sorting
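
After alignment, it is usually worth checking the mapping rate before moving on to splicing analysis. The following is a minimal sketch, not part of the published pipeline: it reads the "Uniquely mapped reads %" line that STAR writes to its Log.final.out summary (aligned_reads_Log.final.out with the prefix used above). The function name unique_rate and the toy log contents are hypothetical and only illustrate the parsing.

```shell
# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
unique_rate() {
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
        print $2
    }' "$1"
}

# Toy log fragment with made-up numbers, used only to demonstrate the parser.
demo_log="$(mktemp)"
printf '   Number of input reads |\t%s\n' '1000000' >  "$demo_log"
printf ' Uniquely mapped reads %% |\t%s\n' '91.25%' >> "$demo_log"

rate="$(unique_rate "$demo_log")"
echo "uniquely mapped: ${rate}%"
```

For a real run, call unique_rate on each sample's Log.final.out and inspect any sample whose rate is much lower than the others.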

To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.

MultiPath-PSI (via AltAnalyze v2.1.1)

$ Bash example

# MultiPath-PSI is distributed as part of AltAnalyze (see the next step). The source
# text does not describe a standalone installer or command-line tool for it, so none
# is assumed here.

# Reference files should match the alignment step (hg19 genome, Ensembl 72 annotation)
# rather than a newer assembly.

# Example RNA-Seq BAM files (replace with the STAR output for each iPSC-CM sample)
INPUT_BAM_1="ipsc_cm_sample1.bam"
INPUT_BAM_2="ipsc_cm_sample2.bam"

# Create output directory
OUTPUT_DIR="multipath_psi_output"
mkdir -p "${OUTPUT_DIR}"

# PSI quantification itself is run through AltAnalyze on these BAM files.
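
To make the Percent Spliced In quantity concrete, here is a minimal sketch of the generic two-isoform definition, PSI = inclusion reads / (inclusion reads + exclusion reads). This is an assumption-level illustration only: MultiPath-PSI as implemented in AltAnalyze handles multiple competing junctions and is not reproduced here. The function name psi and the read counts are hypothetical.

```shell
# psi INCLUSION EXCLUSION
# Print inclusion / (inclusion + exclusion), or NA when no reads support the event.
psi() {
    awk -v inc="$1" -v exc="$2" 'BEGIN {
        total = inc + exc
        if (total == 0) { print "NA"; exit }   # undefined without supporting reads
        printf "%.3f\n", inc / total
    }'
}

psi 30 10   # toy numbers: 30 inclusion reads, 10 exclusion reads
psi 0 0     # no coverage at this event
```

Returning NA for uncovered events, rather than 0, keeps missing data distinguishable from complete exclusion in downstream comparisons.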

MultiPath-PSI is available through AltAnalyze version 2.1.1, requires aligned BAM files as input, and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).

AltAnalyze v2.1.1 GitHub

$ Bash example

# Install AltAnalyze (if not already installed)
# pip install AltAnalyze

# Assuming AltAnalyze.py is in your PATH or current directory
# Create a directory for input BAM files and copy your aligned BAMs into it
# mkdir -p /path/to/input_bams
# cp your_aligned_file1.bam /path/to/input_bams/
# cp your_aligned_file2.bam /path/to/input_bams/

# Run AltAnalyze on the aligned BAM files; its RNA-Seq splicing workflow includes MultiPath-PSI.
# The flag names below follow the AltAnalyze command-line documentation as best understood;
# confirm them against the documentation for v2.1.1 before use.
# --species: species code (e.g., Hs for Homo sapiens, Mm for Mus musculus).
# --platform: assay type (RNASeq).
# --bedDir: directory containing the aligned BAM files.
# --expname: a label for this analysis (placeholder shown).
# --output: directory to store AltAnalyze results.
python AltAnalyze.py \
    --species Hs \
    --platform RNASeq \
    --bedDir /path/to/input_bams \
    --expname ipsc_cm_rbm20 \
    --output altanalyze_multipath_psi_output
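
Because AltAnalyze takes a directory of BAM files, the per-sample STAR outputs need to be gathered in one place first. The sketch below shows one way to do that with symbolic links. It assumes STAR was run with --outFileNamePrefix set to "<sample>_", so each output is named <sample>_Aligned.sortedByCoord.out.bam; the function name stage_bams and the sample names are hypothetical, and whether AltAnalyze accepts symlinked BAMs should be checked (copy the files instead if it does not).

```shell
# stage_bams SRC_DIR DEST_DIR
# Link every STAR coordinate-sorted BAM in SRC_DIR into DEST_DIR as <sample>.bam.
stage_bams() {
    local src dest bam sample
    src="$(cd "$1" && pwd)"                 # absolute path so links stay valid
    dest="$2"
    mkdir -p "$dest"
    for bam in "$src"/*_Aligned.sortedByCoord.out.bam; do
        [ -e "$bam" ] || continue           # glob matched nothing
        sample="$(basename "$bam" _Aligned.sortedByCoord.out.bam)"
        ln -sf "$bam" "$dest/$sample.bam"
    done
}

# Demo with empty placeholder files standing in for real alignments.
work="$(mktemp -d)"
touch "$work/sampleA_Aligned.sortedByCoord.out.bam" \
      "$work/sampleB_Aligned.sortedByCoord.out.bam"
stage_bams "$work" "$work/input_bams"
ls "$work/input_bams"
```

Linking rather than copying avoids duplicating large BAM files while still giving each sample a short, readable name.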

Kallisto (https://pachterlab.github.io/kallisto/) was used for gene expression quantification.

kallisto (version not specified) GitHub

$ Bash example

# Install kallisto (example using conda)
# conda install -c bioconda kallisto

# Placeholder for kallisto index creation
# Replace 'transcripts.fasta' with your actual transcriptome FASTA file (e.g., from Ensembl, GENCODE, RefSeq)
# Replace 'human_transcriptome.idx' with your desired index name
# kallisto index -i human_transcriptome.idx transcripts.fasta

# Example kallisto quantification command for paired-end reads
# Replace 'human_transcriptome.idx' with your actual kallisto index
# Replace 'read1.fastq.gz' and 'read2.fastq.gz' with your actual input FASTQ files
# Replace 'output_dir' with your desired output directory
kallisto quant -i human_transcriptome.idx -o output_dir read1.fastq.gz read2.fastq.gz

# For single-end reads, use the --single flag and specify fragment length and standard deviation:
# kallisto quant -i human_transcriptome.idx -o output_dir --single -l 200 -s 20 read1.fastq.gz
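
kallisto reports abundances per transcript in abundance.tsv (columns target_id, length, eff_length, est_counts, tpm), while the deposited data are described as gene-level quantification. The source does not say how transcripts were collapsed to genes, so the following is only a sketch of one common approach: summing TPM over the transcripts of each gene using a two-column transcript-to-gene table. The function name gene_tpm, the tx2gene.tsv file, and all identifiers and values are hypothetical.

```shell
# gene_tpm TX2GENE ABUNDANCE
# Sum transcript-level TPM (column 5 of abundance.tsv) per gene.
gene_tpm() {
    awk -F'\t' '
        NR == FNR    { gene[$1] = $2; next }    # first file: transcript -> gene
        FNR == 1     { next }                   # skip the abundance.tsv header
        ($1 in gene) { tpm[gene[$1]] += $5 }    # accumulate TPM for mapped transcripts
        END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
    ' "$1" "$2" | sort
}

# Toy inputs with made-up identifiers and values.
demo="$(mktemp -d)"
printf 'TX1\tGENE_A\nTX2\tGENE_A\nTX3\tGENE_B\n' > "$demo/tx2gene.tsv"
printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >  "$demo/abundance.tsv"
printf 'TX1\t1000\t850\t100\t12.5\n'                       >> "$demo/abundance.tsv"
printf 'TX2\t1200\t1050\t50\t7.5\n'                        >> "$demo/abundance.tsv"
printf 'TX3\t900\t750\t10\t3.25\n'                         >> "$demo/abundance.tsv"

gene_tpm "$demo/tx2gene.tsv" "$demo/abundance.tsv"
```

Transcripts missing from the mapping table are silently skipped, so with real data it is worth confirming that the identifiers in the two files match, including any version suffixes.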

Tools Used

STAR, AltAnalyze (MultiPath-PSI), kallisto

Raw Source Text

RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).
For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: Kallisto-gene expression Quantification data
