GSE176045 Processing Pipeline

RNA-Seq, 4 steps

Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature communications (2021) — PMID 34732726

Dataset

GSE176045

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing (WTB RNA-Seq)

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.

    $ Bash example
    # Install STAR (if not already installed)
    # The source does not state the STAR version; 2.4.2a is only an example.
    # conda install -c bioconda star=2.4.2a
    
    # Create a directory for reference files
    mkdir -p reference_data
    cd reference_data
    
    # Download human hg19 reference genome (UCSC hg19)
    wget -O hg19.fa.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
    gunzip hg19.fa.gz
    
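    # Download Ensembl 72 GTF annotation for GRCh37 (hg19)
    wget -O Homo_sapiens.GRCh37.72.gtf.gz ftp://ftp.ensembl.org/pub/release-72/gtf/homo_sapiens/Homo_sapiens.GRCh37.72.gtf.gz
    gunzip Homo_sapiens.GRCh37.72.gtf.gz
    
    # UCSC hg19 names chromosomes "chr1", "chrX", ... while the Ensembl GTF uses "1", "X", ...
    # STAR needs the names to match, so keep chromosomes 1-22, X and Y and add the "chr" prefix.
    # MT and patch/scaffold contigs are dropped here (hg19 chrM is a different sequence from Ensembl's MT).
    awk 'BEGIN { FS = OFS = "\t" } /^#/ { print; next } $1 ~ /^([0-9]+|X|Y)$/ { $1 = "chr" $1; print }' \
        Homo_sapiens.GRCh37.72.gtf > Homo_sapiens.GRCh37.72.chr.tmp
    mv Homo_sapiens.GRCh37.72.chr.tmp Homo_sapiens.GRCh37.72.gtf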
    
    cd ..
    
    # Create a directory for the STAR genome index
    mkdir -p STAR_index_hg19_Ensembl72
    
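    # Build the STAR genome index using hg19 and Ensembl 72 GTF
    # --sjdbOverhang is ideally read length minus 1; 100 assumes ~101 bp reads (read length is not stated in the source)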
    STAR --runMode genomeGenerate \
         --genomeDir STAR_index_hg19_Ensembl72 \
         --genomeFastaFiles reference_data/hg19.fa \
         --sjdbGTFfile reference_data/Homo_sapiens.GRCh37.72.gtf \
         --sjdbOverhang 100 \
         --runThreadN 8 # Adjust thread count as needed
    
    # Assuming input RNA-Seq FASTQ files are named read1.fastq.gz and read2.fastq.gz
    # Replace with your actual input file names
    
    # Perform RNA-Seq alignment using STAR
    STAR --runMode alignReads \
         --genomeDir STAR_index_hg19_Ensembl72 \
         --readFilesIn read1.fastq.gz read2.fastq.gz \
         --readFilesCommand zcat \
         --outFileNamePrefix aligned_reads_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard \
         --quantMode GeneCounts \
         --twopassMode Basic \
         --runThreadN 8 # Adjust thread count as needed
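
    Because the example above passes --quantMode GeneCounts, STAR also writes a ReadsPerGene.out.tab file next to the BAM (here aligned_reads_ReadsPerGene.out.tab). Its columns hold unstranded, forward-stranded and reverse-stranded counts, so comparing the column totals is a quick way to guess the library strandedness. The sketch below is only an illustration: the function name and the demo table are made up, and the source does not say whether the authors inspected this file.

    ```shell
    # Sum the three count columns of a STAR ReadsPerGene.out.tab file,
    # skipping the four N_* summary rows at the top of the file.
    summarize_star_counts() {
        awk -F'\t' '
            $1 !~ /^N_/ { unstr += $2; fwd += $3; rev += $4 }
            END { printf "unstranded=%d forward=%d reverse=%d\n", unstr, fwd, rev }
        ' "$1"
    }

    # Tiny made-up table so the sketch runs on its own (gene IDs are hypothetical)
    demo_counts="$(mktemp)"
    printf 'N_unmapped\t10\t10\t10\n'       >  "$demo_counts"
    printf 'N_multimapping\t5\t5\t5\n'      >> "$demo_counts"
    printf 'N_noFeature\t3\t40\t4\n'        >> "$demo_counts"
    printf 'N_ambiguous\t1\t0\t1\n'         >> "$demo_counts"
    printf 'ENSG00000000001\t100\t2\t98\n'  >> "$demo_counts"
    printf 'ENSG00000000002\t50\t1\t49\n'   >> "$demo_counts"

    summary="$(summarize_star_counts "$demo_counts")"
    echo "$summary"
    rm -f "$demo_counts"
    ```

    In the demo table the reverse column nearly equals the unstranded one, which is the pattern a reverse-stranded library would show. For real data, call summarize_star_counts on the file STAR produced.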
    
  2.

    To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.

    RNA-seq v1.0.0 (Inferred with models/gemini-2.5-flash)
    $ Bash example
    # MultiPath-PSI is distributed as part of AltAnalyze (see step 3). The source does not
    # describe a standalone MultiPath-PSI command, so this step only stages the aligned BAM
    # files that AltAnalyze takes as input.
    
    # Placeholders: replace with your actual paths and sample names.
    STAR_BAM="aligned_reads_Aligned.sortedByCoord.out.bam" # STAR output from step 1 (prefix aligned_reads_)
    BAM_DIR="path/to/aligned_bam_files"                    # Input directory for AltAnalyze
    SAMPLE_ID="sample1"                                    # Hypothetical sample name
    
    # Create the input directory if it doesn't exist
    mkdir -p "${BAM_DIR}"
    
    # Copy the coordinate-sorted BAM under a sample-specific name
    cp "${STAR_BAM}" "${BAM_DIR}/${SAMPLE_ID}.bam"
    
    # Index the BAM (assumes samtools is installed; harmless if AltAnalyze does not need the index)
    samtools index "${BAM_DIR}/${SAMPLE_ID}.bam"
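
    As background for the PSI values this step produces, the sketch below computes a generic Percent Spliced In ratio: reads supporting inclusion divided by reads supporting inclusion plus exclusion. This is the textbook definition only, not the MultiPath-PSI algorithm, which handles events with more than two paths as described in the AltAnalyze documentation. The function name and read counts are hypothetical.

    ```shell
    # Generic PSI: inclusion / (inclusion + exclusion), printed to three decimals.
    # Prints NA when no reads cover the event.
    psi() {
        awk -v inc="$1" -v exc="$2" 'BEGIN {
            total = inc + exc
            if (total == 0) { print "NA"; exit }
            printf "%.3f\n", inc / total
        }'
    }

    psi_example="$(psi 30 10)"   # 30 inclusion reads, 10 exclusion reads
    psi_empty="$(psi 0 0)"       # event with no coverage
    echo "PSI (30 vs 10): $psi_example"
    echo "PSI (0 vs 0):   $psi_empty"
    ```

    Reporting NA for uncovered events, rather than 0, keeps "not measured" distinct from "fully excluded" in downstream comparisons.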
  3.

    MultiPath-PSI is available through AltAnalyze version 2.1.1, requires aligned BAM files as input, and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).

    AltAnalyze v2.1.1 GitHub
    $ Bash example
    # Installation: AltAnalyze is available as a download and as source on GitHub.
    # The 2.x series targets Python 2.7; check the documentation for the exact setup for 2.1.1.
    # git clone https://github.com/nsalomonis/altanalyze.git
    # cd altanalyze
    
    # Define input and output locations (placeholders)
    INPUT_BAM_DIR="/path/to/aligned_bam_files"   # Directory containing aligned BAM files
    OUTPUT_DIR="/path/to/multipath_psi_results"  # Directory for AltAnalyze output
    EXP_NAME="RBM20_iPSC_CM"                     # Hypothetical experiment name
    
    # AltAnalyze reads sample groups and comparisons from two tab-separated files.
    # Layout as described in the AltAnalyze documentation; verify it, including how
    # sample names must match the input files, before use.
    #   groups file: sample name <TAB> group number <TAB> group name
    #   comps file:  group number <TAB> group number (one comparison per row)
    GROUPS_FILE="groups.${EXP_NAME}.txt"
    COMPS_FILE="comps.${EXP_NAME}.txt"
    if [ ! -f "${GROUPS_FILE}" ]; then
        # Dummy two-sample file for demonstration only
        printf 'sample1\t1\tControl\nsample2\t2\tMutant\n' > "${GROUPS_FILE}"
        echo "NOTE: replace ${GROUPS_FILE} with a file listing every sample." >&2
    fi
    [ -f "${COMPS_FILE}" ] || printf '2\t1\n' > "${COMPS_FILE}"
    
    # One-time download of the species database (EnsMart72 corresponds to Ensembl 72)
    # python AltAnalyze.py --species Hs --update Official --version EnsMart72
    
    # Run the AltAnalyze RNA-Seq workflow on the BAM directory.
    # Flag names follow the AltAnalyze command-line documentation; confirm them against
    # version 2.1.1, including whether MultiPath-PSI needs any additional option.
    python AltAnalyze.py \
        --species Hs \
        --platform RNASeq \
        --bedDir "${INPUT_BAM_DIR}" \
        --output "${OUTPUT_DIR}" \
        --expname "${EXP_NAME}" \
        --groupdir "${GROUPS_FILE}" \
        --compdir "${COMPS_FILE}"
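
    A grouping file that omits samples is an easy mistake to make before a long AltAnalyze run. The sketch below lists BAM files that have no matching row in a tab-separated grouping file, comparing the first column to the BAM base names after stripping a trailing .bam or .bed. The function name, file names and demo directory are hypothetical, and the exact naming AltAnalyze expects should be checked against its documentation.

    ```shell
    # Print the sample name of every BAM in a directory that has no row
    # (first column) in the given tab-separated grouping file.
    missing_from_groups() {
        mfg_bam_dir="$1"
        mfg_groups="$2"
        for mfg_bam in "$mfg_bam_dir"/*.bam; do
            [ -e "$mfg_bam" ] || continue            # directory without BAM files
            mfg_sample="$(basename "$mfg_bam" .bam)"
            if ! cut -f1 "$mfg_groups" | sed -E 's/\.(bam|bed)$//' | grep -Fxq "$mfg_sample"; then
                echo "$mfg_sample"
            fi
        done
    }

    # Self-contained demo: three empty placeholder BAMs, two of them listed
    demo_dir="$(mktemp -d)"
    : > "$demo_dir/sample1.bam"
    : > "$demo_dir/sample2.bam"
    : > "$demo_dir/sample3.bam"
    printf 'sample1\t1\tControl\nsample2.bam\t2\tMutant\n' > "$demo_dir/groups.txt"

    missing="$(missing_from_groups "$demo_dir" "$demo_dir/groups.txt")"
    echo "Missing from grouping file: $missing"
    rm -rf "$demo_dir"
    ```

    An empty result means every BAM is covered; anything printed should be added to the grouping file or removed from the input directory.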
  4.

    For gene expression quantification, Kallisto (https://pachterlab.github.io/kallisto/) was used.

    kallisto v0.46.2 GitHub
    $ Bash example
    # Install kallisto (example using conda)
    # conda create -n kallisto_env kallisto=0.46.2 -c bioconda -c conda-forge
    # conda activate kallisto_env
    
    # Reference transcriptome index (placeholder)
    # Build the index once from a transcript FASTA. The source does not say which transcriptome
    # was used for kallisto; an Ensembl 72 cDNA FASTA would match step 1 (assumption).
    # Example command (transcripts.fasta.gz is a placeholder):
    # kallisto index -i transcriptome.idx transcripts.fasta.gz
    
    # Define variables
    TRANSCRIPTOME_INDEX="transcriptome.idx" # Placeholder for the index built above
    READS_R1="sample_R1.fastq.gz" # Placeholder for input forward reads
    READS_R2="sample_R2.fastq.gz" # Placeholder for input reverse reads
    OUTPUT_DIR="kallisto_quant_output"
    THREADS=8 # Number of threads to use
    
    # Create output directory if it doesn't exist
    mkdir -p "${OUTPUT_DIR}"
    
    # Run kallisto quantification for paired-end reads
    # (--bias enables sequence-bias correction; it is optional and not stated in the source)
    kallisto quant \
      -i "${TRANSCRIPTOME_INDEX}" \
      -o "${OUTPUT_DIR}" \
      -t "${THREADS}" \
      --bias \
      "${READS_R1}" \
      "${READS_R2}"

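    kallisto reports transcript-level estimates in abundance.tsv (columns target_id, length, eff_length, est_counts, tpm), while the step above speaks of gene expression. The source does not say how transcripts were summarized to genes; one simple approach, sketched here as an assumption, is to sum TPM over the transcripts of each gene using a two-column transcript-to-gene map. Tools such as tximport do this more carefully. All file names and IDs below are made up.

    ```shell
    # Sum kallisto TPM per gene.
    #   $1: tab-separated map, transcript ID <TAB> gene ID
    #   $2: kallisto abundance.tsv
    gene_level_tpm() {
        awk -F'\t' '
            NR == FNR { gene[$1] = $2; next }      # first file: load the map
            FNR == 1  { next }                     # skip the abundance.tsv header
            ($1 in gene) { tpm[gene[$1]] += $5 }   # column 5 is tpm
            END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
        ' "$1" "$2" | sort
    }

    # Self-contained demo with hypothetical transcripts T1-T3 and genes G1-G2
    demo_dir="$(mktemp -d)"
    printf 'T1\tG1\nT2\tG1\nT3\tG2\n' > "$demo_dir/tx2gene.tsv"
    printf 'target_id\tlength\teff_length\test_counts\ttpm\n' >  "$demo_dir/abundance.tsv"
    printf 'T1\t1000\t850\t20\t10.5\n'                         >> "$demo_dir/abundance.tsv"
    printf 'T2\t1200\t1050\t9\t4.5\n'                          >> "$demo_dir/abundance.tsv"
    printf 'T3\t900\t750\t150\t85\n'                           >> "$demo_dir/abundance.tsv"

    gene_tpm="$(gene_level_tpm "$demo_dir/tx2gene.tsv" "$demo_dir/abundance.tsv")"
    echo "$gene_tpm"
    rm -rf "$demo_dir"
    ```

    For real data, point the second argument at kallisto_quant_output/abundance.tsv and build the map from the same annotation release used for the kallisto index.
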
Raw Source Text
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm). For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: PSI splicing events predicted from MultiPath-PSI