GSE113947 Processing Pipeline
Publication
Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies. Nature Communications (2021), PMID 34732726
Dataset
GSE113947: RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing [RNA-Seq]
Processing Steps
1
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.
$ Bash example
# Install STAR (if not already installed)
# conda install -c bioconda star

# Define variables for input and output; replace with actual paths and filenames.
# GENOME_DIR should contain a STAR index built from the human hg19 genome
# and the Ensembl 72 transcriptome annotation.
GENOME_DIR="/path/to/STAR_index/human_hg19_ensembl72"
READ1_FASTQ="input_R1.fastq.gz"
READ2_FASTQ="input_R2.fastq.gz"  # For single-end data, drop this variable and its use in --readFilesIn
OUTPUT_PREFIX="sample_aligned_"
NUM_THREADS=8  # Adjust based on available CPU cores

# Align RNA-Seq data to the human hg19 reference genome and transcriptome (Ensembl 72)
STAR \
  --genomeDir "${GENOME_DIR}" \
  --readFilesIn "${READ1_FASTQ}" "${READ2_FASTQ}" \
  --readFilesCommand zcat \
  --outFileNamePrefix "${OUTPUT_PREFIX}" \
  --outSAMtype BAM SortedByCoordinate \
  --outSAMattributes Standard \
  --runThreadN "${NUM_THREADS}"
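The alignment step presumes that a STAR index already exists. As a minimal sketch of how such an index might be built, the helper below prints a `STAR --runMode genomeGenerate` command for review rather than running it. The FASTA and GTF paths, the thread count, and the 100-base read length are all assumptions; the source does not report them. Chromosome names in the FASTA and the GTF must match (UCSC hg19 uses `chr1`, Ensembl annotation uses `1`).

```shell
#!/usr/bin/env bash
# Print (do not run) a STAR genomeGenerate command so it can be reviewed first.
# Every path and the read length are placeholders, not values from the source.
star_index_cmd() {
  local genome_dir="$1" fasta="$2" gtf="$3" read_len="$4" threads="$5"
  # The STAR manual recommends sjdbOverhang = read length - 1.
  local overhang=$(( read_len - 1 ))
  printf 'STAR --runMode genomeGenerate --genomeDir %s --genomeFastaFiles %s --sjdbGTFfile %s --sjdbOverhang %d --runThreadN %d\n' \
    "$genome_dir" "$fasta" "$gtf" "$overhang" "$threads"
}

# Hypothetical inputs: an hg19 FASTA and an Ensembl 72 GTF stored locally.
star_index_cmd "/path/to/STAR_index/human_hg19_ensembl72" \
  "/path/to/hg19.fa" "/path/to/ensembl72.gtf" 100 8
```

Once the printed command looks right for the local files, it can be pasted into a shell to build the index.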
2
To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.
$ Bash example
# MultiPath-PSI is not run as a separate program here: per the source text it is
# distributed with AltAnalyze (version 2.1.1) and takes aligned BAM files as input,
# so the STAR output from step 1 (hg19, Ensembl 72) is the starting point.
# The source describes no upstream rMATS or LeafCutter step and no separate
# configuration file, so none is shown.

# Placeholder: directory holding the coordinate-sorted BAM files from step 1.
BAM_DIR="/path/to/aligned_bams"

# Confirm the BAM files are present before starting the AltAnalyze run in step 3.
ls -lh "${BAM_DIR}"/*.bam
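For readers unfamiliar with the Percent Spliced In metric itself, the sketch below computes the generic ratio, inclusion reads divided by the sum of inclusion and exclusion reads. It only illustrates the basic quantity; it is not MultiPath-PSI's algorithm, which is documented by AltAnalyze, and the read counts are made up.

```shell
#!/usr/bin/env bash
# Generic PSI: inclusion reads / (inclusion reads + exclusion reads).
# Prints NA when no reads support either isoform.
psi() {
  awk -v inc="$1" -v exc="$2" 'BEGIN {
    total = inc + exc
    if (total == 0) { print "NA"; exit }
    printf "%.3f\n", inc / total
  }'
}

# Illustrative counts: 30 junction reads support inclusion, 10 support exclusion.
psi 30 10   # prints 0.750
```

Returning NA instead of 0 for events with no supporting reads keeps unmeasured events distinct from events that are fully skipped.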
3
MultiPath-PSI is available through AltAnalyze version 2.1.1, requires aligned BAM files as input, and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).
$ Bash example
# Install AltAnalyze 2.1.1 following the project's own instructions
# (http://altanalyze.readthedocs.io); a dedicated Python environment is recommended.

# Sketch of a command-line run on a directory of aligned BAM files.
# MultiPath-PSI is part of AltAnalyze's RNA-Seq splicing analysis.
# The flag names below follow the AltAnalyze command-line documentation as an
# assumption; confirm them against the documentation for version 2.1.1.
# Paths and the experiment name are placeholders.
BAM_DIR="/path/to/aligned_bams"
OUTPUT_DIR="/path/to/output_multipathpsi"

# One-time download of the Ensembl 72 species database for human (Hs).
# python AltAnalyze.py --species Hs --update Official --version EnsMart72

# Group and comparison files (--groupdir, --compdir) can be added for
# differential analysis between genotypes.
python AltAnalyze.py \
  --species Hs \
  --platform RNASeq \
  --bedDir "${BAM_DIR}" \
  --output "${OUTPUT_DIR}" \
  --expname iPSC_CM_RBM20
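Because this step takes aligned BAM files as input, a small pre-flight check can catch an empty or mistyped input directory before a long run starts. The sketch below is an illustrative helper, not part of AltAnalyze; the function name and the default path are hypothetical.

```shell
#!/usr/bin/env bash
# List the BAM files in a directory; return non-zero if there are none.
list_bams() {
  local dir="$1" found=1 f
  for f in "$dir"/*.bam; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    basename "$f"
    found=0
  done
  return "$found"
}

BAM_DIR="${BAM_DIR:-/path/to/aligned_bams}"   # placeholder path
if ! list_bams "$BAM_DIR"; then
  echo "No BAM files found in ${BAM_DIR}" >&2
fi
```

The printed file names can also be compared against the sample sheet to confirm that every sample was aligned.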
4
Kallisto (https://pachterlab.github.io/kallisto/) was used for gene expression quantification.
$ Bash example
# Install kallisto (example using conda)
# conda install -c bioconda kallisto

# Build a kallisto index from a transcriptome FASTA.
# The source does not state which transcriptome was used; a transcript set on
# the hg19/GRCh37 assembly (for example Ensembl 72 cDNA) would match the
# genome build reported for this dataset. The file names below are placeholders.
TRANSCRIPTOME_FASTA="/path/to/transcripts.fa.gz"
KALLISTO_INDEX="human_hg19_transcripts.idx"
# kallisto index -i "${KALLISTO_INDEX}" "${TRANSCRIPTOME_FASTA}"

# Run kallisto quantification for paired-end reads.
# Replace the FASTQ names and the output directory with your own.
# For single-end reads, kallisto quant also needs --single with -l and -s.
kallisto quant \
  -i "${KALLISTO_INDEX}" \
  -o kallisto_quant_output \
  read1.fastq.gz read2.fastq.gz
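Kallisto reports abundances per transcript in `abundance.tsv`, so gene-level values require summing over each gene's transcripts. The source does not say how this aggregation was done; the sketch below shows one simple way, assuming a hypothetical two-column mapping file (transcript ID, then gene ID, tab-separated) whose transcript IDs match kallisto's `target_id` column exactly.

```shell
#!/usr/bin/env bash
# Sum kallisto transcript TPMs per gene.
# $1: two-column TSV, transcript_id <tab> gene_id (hypothetical mapping file)
# $2: kallisto abundance.tsv (columns: target_id length eff_length est_counts tpm)
sum_tpm_by_gene() {
  awk -F'\t' '
    NR == FNR { gene[$1] = $2; next }       # first file: load the mapping
    FNR == 1  { next }                      # second file: skip the header row
    ($1 in gene) { tpm[gene[$1]] += $5 }    # accumulate TPM per gene
    END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
  ' "$1" "$2" | sort
}

# Example call (file names are placeholders):
# sum_tpm_by_gene tx2gene.tsv kallisto_quant_output/abundance.tsv > gene_tpm.tsv
```

Transcripts missing from the mapping are silently skipped, so it is worth checking that ID formats (for example, with or without version suffixes) agree between the two files.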
Raw Source Text
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm). For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: Kallisto-gene expression Quantification data