GSE176045 Processing Pipeline
Publication
Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies. Nature Communications (2021), PMID 34732726
Dataset
GSE176045: RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing (WTB RNA-Seq)
Processing Steps
1
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.
$ Bash example
# Install STAR (if not already installed); the source does not state the STAR version used
# conda install -c bioconda star=2.4.2a

# Create a directory for reference files
mkdir -p reference_data
cd reference_data

# Download human hg19 reference genome (UCSC hg19)
wget -O hg19.fa.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
gunzip hg19.fa.gz

# Download Ensembl 72 GTF annotation for GRCh37 (hg19)
wget -O Homo_sapiens.GRCh37.72.gtf.gz ftp://ftp.ensembl.org/pub/release-72/gtf/homo_sapiens/Homo_sapiens.GRCh37.72.gtf.gz
gunzip Homo_sapiens.GRCh37.72.gtf.gz
cd ..

# Create a directory for the STAR genome index
mkdir -p STAR_index_hg19_Ensembl72

# Build the STAR genome index using hg19 and the Ensembl 72 GTF.
# UCSC hg19 names chromosomes chr1, chr2, ...; the Ensembl GTF uses 1, 2, ...
# --sjdbGTFchrPrefix chr adds the prefix to the GTF names so the two agree
# (the mitochondrial name still differs: MT in Ensembl, chrM in UCSC).
# --sjdbOverhang should be read length minus 1; 100 is an assumed default.
STAR --runMode genomeGenerate \
     --genomeDir STAR_index_hg19_Ensembl72 \
     --genomeFastaFiles reference_data/hg19.fa \
     --sjdbGTFfile reference_data/Homo_sapiens.GRCh37.72.gtf \
     --sjdbGTFchrPrefix chr \
     --sjdbOverhang 100 \
     --runThreadN 8  # Adjust thread count as needed

# Assuming input RNA-Seq FASTQ files are named read1.fastq.gz and read2.fastq.gz
# Replace with your actual input file names

# Perform RNA-Seq alignment using STAR
STAR --runMode alignReads \
     --genomeDir STAR_index_hg19_Ensembl72 \
     --readFilesIn read1.fastq.gz read2.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix aligned_reads_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --quantMode GeneCounts \
     --twopassMode Basic \
     --runThreadN 8  # Adjust thread count as needed
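Because the alignment step combines a genome FASTA and a GTF from different providers, it can help to confirm that both files use the same sequence names before building the index. The following is a minimal sketch, not part of the published pipeline: the function name missing_gtf_chroms is hypothetical, and it assumes both files are uncompressed.

```shell
# Hypothetical helper: print every sequence name used in a GTF (column 1)
# that does not appear as a header name in a FASTA file.
# Prints nothing when all GTF sequence names are present in the FASTA.
# Usage: missing_gtf_chroms genome.fa annotation.gtf
missing_gtf_chroms() {
    awk '
        # First file (FASTA): record the name that follows ">" on header lines.
        NR == FNR { if (/^>/) seen[substr($1, 2)] = 1; next }
        # Second file (GTF): skip comment lines.
        /^#/ { next }
        # Report each unmatched sequence name once, in order of first appearance.
        !($1 in seen) && !($1 in reported) { print $1; reported[$1] = 1 }
    ' "$1" "$2"
}
```

Any name printed by this check points to annotation records that the aligner cannot place on the supplied genome.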
2
To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.
$ Bash example (RNA-seq v1.0.0, inferred with models/gemini-2.5-flash)
# NOTE: This block is an inferred illustration. The repository URL and the
# multipath_psi command and flags below are unverified placeholders.
# According to the source, MultiPath-PSI is run through AltAnalyze 2.1.1 (see step 3).

# Install MultiPath-PSI (example, check official documentation for current instructions)
# git clone https://github.com/zhanglab-ucsf/MultiPath-PSI.git
# cd MultiPath-PSI
# pip install .

# Define input files and reference data placeholders
# Replace with actual paths to your iPSC-CM RNA-Seq BAM files, GTF annotation, and genome FASTA.
# The references match the alignment step (hg19, Ensembl 72).
INPUT_BAM="path/to/your/ipsc_cm_rnaseq_sample.bam"
GTF_FILE="path/to/your/Homo_sapiens.GRCh37.72.gtf"
GENOME_FASTA="path/to/your/hg19.fa"
OUTPUT_DIR="multipath_psi_output"

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run MultiPath-PSI to detect alternative splicing events (placeholder invocation).
# Adjust parameters such as the thread count as needed for your analysis.
multipath_psi run \
    --bam_file "${INPUT_BAM}" \
    --gtf_file "${GTF_FILE}" \
    --genome_fasta "${GENOME_FASTA}" \
    --output_dir "${OUTPUT_DIR}" \
    --threads 8  # Example: use 8 CPU threads for processing
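As background on what a Percent Spliced In value expresses, the sketch below computes the generic ratio PSI = inclusion reads / (inclusion reads + exclusion reads) for each event. It is not the MultiPath-PSI algorithm, whose handling of multiple competing junctions is described in the AltAnalyze documentation. The function name psi_from_counts and the three-column input layout are assumptions made for illustration.

```shell
# Hypothetical helper: generic PSI per splicing event.
# Input: tab-separated file with columns event_id, inclusion_reads, exclusion_reads
#        (an assumed layout, not an AltAnalyze output format).
# Output: event_id and PSI rounded to three decimals; "NA" when the event has no reads.
# Usage: psi_from_counts counts.tsv
psi_from_counts() {
    awk -F '\t' '
        {
            total = $2 + $3
            if (total > 0) printf "%s\t%.3f\n", $1, $2 / total
            else           printf "%s\tNA\n", $1
        }
    ' "$1"
}
```

A PSI near 1 means most reads support inclusion of the event, and a value near 0 means most support exclusion.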
3
MultiPath-PSI is available through AltAnalyze version 2.1.1. It requires aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).
$ Bash example
# NOTE: This block is an inferred illustration. The AltAnalyze option names used
# below have not been verified; confirm them against the AltAnalyze
# command-line documentation before running.

# Installation: AltAnalyze can be installed via pip or downloaded from its website/GitHub.
# To install a specific version (e.g., 2.1.1), it might be necessary to download the source
# or use a specific release tag if available via pip.
# pip install AltAnalyze==2.1.1  # (If this specific version is available via pip)
# Alternatively, clone the repository and run from source:
# git clone https://github.com/altanalyze/altanalyze.git
# cd altanalyze
# python setup.py install  # Or just run AltAnalyze.py directly

# Define input and output directories
INPUT_BAM_DIR="/path/to/aligned_bam_files"    # Directory containing aligned BAM files
OUTPUT_DIR="/path/to/multipath_psi_results"   # Directory for AltAnalyze output

# AltAnalyze typically requires a grouping file for differential splicing analysis.
# Assumed format (tab-separated): SampleID (without .bam extension), Group, Batch.
# Confirm the expected column layout in the AltAnalyze documentation.
GROUPING_FILE="RNASeq_groups.txt"

# Create a dummy grouping file for demonstration if it doesn't exist.
# The reminder goes to stderr so it is not written into the grouping file itself.
if [ ! -f "${GROUPING_FILE}" ]; then
    printf 'sample1\tControl\tBatch1\nsample2\tControl\tBatch1\n' > "${GROUPING_FILE}"
    echo "NOTE: A proper grouping file with all sample IDs from your BAMs is required for meaningful analysis." >&2
fi

# Execute AltAnalyze with the MultiPath-PSI algorithm
# --species: Reference species (e.g., Hs for Homo sapiens, Mm for Mus musculus).
# --platform RNASeq: Specifies RNA-Seq data as input.
# --inputdir: Path to the directory containing input BAM files.
# --outputdir: Path to the directory where results will be saved.
# --runMultiPathPSI yes: Activates the MultiPath-PSI algorithm.
# --grouping: Path to the grouping file for differential analysis.
python AltAnalyze.py \
    --species Hs \
    --platform RNASeq \
    --inputdir "${INPUT_BAM_DIR}" \
    --outputdir "${OUTPUT_DIR}" \
    --runMultiPathPSI yes \
    --grouping "${GROUPING_FILE}"
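Since the splicing analysis depends on every BAM file being assigned to a group, a quick consistency check before launching the run can catch missing entries. This is a minimal sketch under stated assumptions: the function name bams_missing_from_groups is hypothetical, and the grouping file is assumed to be tab-separated with the sample ID (BAM file name without the .bam extension) in the first column.

```shell
# Hypothetical helper: print the sample name of every BAM file in a directory
# that has no row in the grouping file (exact match on column 1).
# Prints nothing when every BAM file is covered.
# Usage: bams_missing_from_groups /path/to/bam_dir RNASeq_groups.txt
bams_missing_from_groups() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue              # the directory holds no BAM files
        sample=$(basename "$bam" .bam)
        # awk exits 0 only if some row has the sample name in column 1.
        awk -F '\t' -v s="$sample" '$1 == s { found = 1 } END { exit !found }' "$2" \
            || echo "$sample"
    done
}
```

An empty result means the grouping file covers all BAM files found in the directory.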
4
For gene expression quantification, Kallisto (https://pachterlab.github.io/kallisto/) was used.
$ Bash example
# Install kallisto (example using conda)
# conda create -n kallisto_env kallisto=0.46.2 -c bioconda -c conda-forge
# conda activate kallisto_env

# Placeholder for reference transcriptome index
# This index should be built once from a FASTA file of transcripts (e.g., from GENCODE or Ensembl).
# Example command to build index (assuming human_gencode_vXX_transcripts.fasta.gz is available):
# kallisto index -i human_gencode_vXX_transcriptome.idx human_gencode_vXX_transcripts.fasta.gz

# Define variables
TRANSCRIPTOME_INDEX="human_gencode_vXX_transcriptome.idx"  # Placeholder for latest human GENCODE transcriptome index
READS_R1="sample_R1.fastq.gz"  # Placeholder for input forward reads
READS_R2="sample_R2.fastq.gz"  # Placeholder for input reverse reads
OUTPUT_DIR="kallisto_quant_output"
THREADS=8  # Number of threads to use

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

# Run kallisto quantification for paired-end reads
kallisto quant \
    -i "${TRANSCRIPTOME_INDEX}" \
    -o "${OUTPUT_DIR}" \
    -t "${THREADS}" \
    --bias \
    "${READS_R1}" \
    "${READS_R2}"
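Kallisto reports abundances per transcript in abundance.tsv, so gene-level expression requires summing over each gene's transcripts. The source does not say how this was done; the sketch below shows one simple way, summing TPM per gene. The function name gene_tpm_from_kallisto and the two-column mapping file (transcript ID, gene ID) are assumptions made for illustration.

```shell
# Hypothetical helper: sum kallisto transcript TPM values per gene.
# Arg 1: tab-separated mapping file with columns transcript_id, gene_id (assumed layout).
# Arg 2: kallisto abundance.tsv (columns: target_id, length, eff_length, est_counts, tpm).
# Output: gene_id and summed TPM (two decimals), sorted by gene_id.
# Transcripts absent from the mapping file are ignored.
# Usage: gene_tpm_from_kallisto tx2gene.tsv kallisto_quant_output/abundance.tsv
gene_tpm_from_kallisto() {
    awk -F '\t' '
        NR == FNR    { gene[$1] = $2; next }        # first file: transcript -> gene map
        FNR == 1     { next }                       # skip the abundance.tsv header row
        ($1 in gene) { tpm[gene[$1]] += $5 }        # column 5 holds the TPM value
        END          { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
    ' "$1" "$2" | sort
}
```

Transcript IDs must match exactly between the two files, including any version suffix such as ".1".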
Raw Source Text
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm). For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used. Genome_build: hg19 Supplementary_files_format_and_content: PSI splicing events predicted from MultiPath-PSI