GSE113947 Processing Pipeline

RNA-Seq, 4 steps

Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature Communications (2021), PMID 34732726

Dataset

GSE113947

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing [RNA-Seq]

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.

    $ Bash example
    # Install STAR (if not already installed)
    # conda install -c bioconda star
    
    # Define variables for input and output
    # Replace with actual paths and filenames
    # The GENOME_DIR should contain a STAR index built from the human hg19 genome and Ensembl 72 transcriptome annotation.
    GENOME_DIR="/path/to/STAR_index/human_hg19_ensembl72"
    # List the FASTQ files; for single-end data keep only the first entry
    READS=("input_R1.fastq.gz" "input_R2.fastq.gz")
    OUTPUT_PREFIX="sample_aligned_"
    NUM_THREADS=8 # Adjust based on available CPU cores
    
    # Align RNA-Seq data to the human hg19 reference genome and transcriptome (Ensembl 72)
    # zcat decompresses the gzipped FASTQ files on the fly (on macOS, use "gunzip -c")
    STAR \
      --genomeDir "${GENOME_DIR}" \
      --readFilesIn "${READS[@]}" \
      --readFilesCommand zcat \
      --outFileNamePrefix "${OUTPUT_PREFIX}" \
      --outSAMtype BAM SortedByCoordinate \
      --outSAMattributes Standard \
      --runThreadN "${NUM_THREADS}"
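
The alignment command above assumes that a STAR index already exists. A minimal sketch of how such an index could be built from an hg19 FASTA and an Ensembl 72 GTF is shown here. The file paths, the `build_star_index_cmd` helper name, and the `--sjdbOverhang 100` value (read length minus one, assuming 101 bp reads) are assumptions, since the source does not report them. The helper only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical paths; replace with the actual hg19 FASTA and Ensembl 72 GTF.
# Chromosome names must match between the two files (e.g., "chr1" vs "1").
GENOME_FASTA="/path/to/hg19.fa"
ANNOTATION_GTF="/path/to/Homo_sapiens.GRCh37.72.gtf"
GENOME_DIR="/path/to/STAR_index/human_hg19_ensembl72"
SJDB_OVERHANG=100   # assumed: read length - 1
NUM_THREADS=8

# Print the STAR genomeGenerate command without running it
build_star_index_cmd() {
  printf '%s ' STAR --runMode genomeGenerate \
    --genomeDir "${GENOME_DIR}" \
    --genomeFastaFiles "${GENOME_FASTA}" \
    --sjdbGTFfile "${ANNOTATION_GTF}" \
    --sjdbOverhang "${SJDB_OVERHANG}" \
    --runThreadN "${NUM_THREADS}"
  printf '\n'
}

build_star_index_cmd
```

Once the printed command looks right for your files, run it in the shell; the resulting directory is what `GENOME_DIR` points to in the alignment step.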
  2.

    To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.

    RNA-seq v1.0.0
    $ Bash example
    # MultiPath-PSI is not installed or run as a standalone tool here: per the source, it is
    # available through AltAnalyze version 2.1.1 (see the next step) and takes aligned BAM files as input.
    # The input for this step is therefore the coordinate-sorted BAM file written by STAR above
    # (hg19, Ensembl 72).
    OUTPUT_PREFIX="sample_aligned_"
    BAM_FILE="${OUTPUT_PREFIX}Aligned.sortedByCoord.out.bam"
    
    # Confirm that the alignment exists before starting the splicing analysis
    if [ ! -s "${BAM_FILE}" ]; then
      echo "Missing or empty BAM file: ${BAM_FILE}" >&2
      exit 1
    fi
    
    # Collect the BAM files of all samples in one directory for AltAnalyze
    # Replace with the actual directory; repeat the copy for each sample
    BAM_DIR="/path/to/aligned_bams"
    mkdir -p "${BAM_DIR}"
    cp "${BAM_FILE}" "${BAM_DIR}/"
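
As a point of reference, PSI for a single splicing event is commonly defined as the fraction of junction reads that support inclusion: PSI = I / (I + E), where I and E are the inclusion and exclusion read counts. The sketch below implements only this generic definition, not the MultiPath-PSI algorithm itself; the `psi` helper and the read counts are made up for illustration.

```shell
# Generic PSI: inclusion reads / (inclusion + exclusion reads)
# Prints NA when no junction reads cover the event
psi() {
  awk -v inc="$1" -v exc="$2" 'BEGIN {
    total = inc + exc
    if (total == 0) { print "NA" } else { printf "%.3f\n", inc / total }
  }'
}

psi 30 10   # 30 inclusion reads, 10 exclusion reads -> 0.750
psi 0 0     # event without coverage -> NA
```

Returning NA for uncovered events, rather than 0, keeps "no evidence" distinct from "fully excluded" in downstream comparisons.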
  3.

    MultiPath-PSI is available through AltAnalyze version 2.1.1, requires aligned BAM files as input, and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).

    AltAnalyze v2.1.1 GitHub
    $ Bash example
    # Install AltAnalyze (if not already installed), ideally in a dedicated Python environment.
    # The pip package name and version pin below are assumptions; check the AltAnalyze documentation for the supported install route.
    # pip install AltAnalyze==2.1.1
    
    # Example command for running MultiPath-PSI through AltAnalyze version 2.1.1
    # Input: a directory containing the aligned BAM files (e.g., /path/to/aligned_bams).
    # Output: a directory where the MultiPath-PSI results will be stored (e.g., /path/to/output_multipathpsi).
    # Species: Hs (Homo sapiens), matching the human iPSC-derived cardiomyocytes in this dataset.
    # The flags below, in particular --run, --input and --runMultiPathPSI, are inferred and have not been
    # verified against the AltAnalyze command-line documentation; confirm them there before running.
    # Ensure 'AltAnalyze.py' is in the working directory or specify its full path.
    python AltAnalyze.py \
      --run RNASeq \
      --species Hs \
      --platform RNASeq \
      --input /path/to/aligned_bams \
      --output /path/to/output_multipathpsi \
      --runMultiPathPSI
  4.

    For gene expression quantification, Kallisto (https://pachterlab.github.io/kallisto/) was used.

    kallisto v0.48.0 GitHub
    $ Bash example
    # Install kallisto (example using conda)
    # conda install -c bioconda kallisto
    
    # Placeholder for the kallisto index. The source reports genome build hg19 but does not name the
    # transcriptome used for kallisto; the GRCh38/GENCODE v45 files below are only an example, so
    # substitute a transcriptome that matches your annotation (e.g., Ensembl 72 for hg19).
    # First, download a transcriptome FASTA file (e.g., from GENCODE or Ensembl)
    # Example: wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.transcripts.fa.gz
    # Then build the index:
    # kallisto index -i human_GRCh38_gencode_v45.idx gencode.v45.transcripts.fa.gz
    
    # Run kallisto quantification for paired-end reads
    # Replace 'read1.fastq.gz' and 'read2.fastq.gz' with your actual input FASTQ files
    # Replace 'human_GRCh38_gencode_v45.idx' with your actual Kallisto index file
    # Replace 'kallisto_quant_output' with your desired output directory name
    kallisto quant -i human_GRCh38_gencode_v45.idx -o kallisto_quant_output read1.fastq.gz read2.fastq.gz
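
kallisto reports abundances per transcript in `abundance.tsv` (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`), while the deposited files are described as gene expression quantification. The source does not say how transcripts were summarized to genes. One common approach, sketched here as an assumption, sums transcript TPM per gene using a two-column transcript-to-gene table. The `sum_tpm_by_gene` helper and the `tx2gene.tsv` file name are hypothetical.

```shell
# Sum transcript-level TPM per gene.
# $1: tab-separated transcript-to-gene table (transcript_id, gene_id), no header;
#     transcript IDs must match the target_id values in abundance.tsv exactly
# $2: kallisto abundance.tsv (one header line, TPM in column 5)
sum_tpm_by_gene() {
  awk -F '\t' '
    NR == FNR { gene[$1] = $2; next }      # first file: load the mapping
    FNR == 1 { next }                      # second file: skip the header
    ($1 in gene) { tpm[gene[$1]] += $5 }   # accumulate TPM per gene
    END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
  ' "$1" "$2" | sort
}

# Usage: sum_tpm_by_gene tx2gene.tsv kallisto_quant_output/abundance.tsv > gene_tpm.tsv
```

Transcripts missing from the mapping table are silently skipped, so it is worth comparing the number of mapped transcripts against the number of rows in `abundance.tsv` before relying on the totals.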

Tools Used

Raw Source Text
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).
For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: Kallisto-gene expression Quantification data