GSE175886 Processing Pipeline


Publication

Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies.

Nature Communications (2021), PMID 34732726

Dataset

GSE175886

RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing

Warning: Pipeline descriptions and code snippets may be inferred or AI-generated. Use them only as a starting point to guide analysis, and validate before use.
  1.

    RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.

    $ Bash example
    # Install STAR (example using conda; the source does not state which STAR version was used)
    # conda install -c bioconda star
    
    # Build STAR genome index (if not already built)
    # This step is typically done once for a given genome/annotation combination.
    # The source does not say whether a prebuilt hg19 / Ensembl 72 index was used.
    # Example command to build the index (replace paths and number of threads;
    # --sjdbOverhang is usually set to read length minus 1):
    # STAR --runMode genomeGenerate \
    #      --genomeDir /path/to/STAR_hg19_Ensembl72_index \
    #      --genomeFastaFiles /path/to/hg19.fa \
    #      --sjdbGTFfile /path/to/Ensembl72.gtf \
    #      --sjdbOverhang 100 \
    #      --runThreadN 8
    
    # Align RNA-Seq data using STAR
    # Replace /path/to/STAR_hg19_Ensembl72_index with the actual path to your genome index.
    # Replace read1.fastq.gz and read2.fastq.gz with your input FASTQ files.
    # Adjust --runThreadN based on available CPU cores.
    STAR --genomeDir /path/to/STAR_hg19_Ensembl72_index \
         --readFilesIn read1.fastq.gz read2.fastq.gz \
         --runThreadN 8 \
         --outFileNamePrefix aligned_reads_ \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes Standard \
         --readFilesCommand zcat \
         --outFilterType BySJout \
         --outFilterMultimapNmax 20 \
         --outFilterMismatchNmax 999 \
         --outFilterMismatchNoverLmax 0.04 \
         --alignIntronMin 20 \
         --alignIntronMax 1000000 \
         --alignMatesGapMax 1000000 \
         --limitBAMsortRAM 30000000000 # Example: 30GB RAM for sorting
    
  2.

    To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.

    $ Bash example
    # The source describes MultiPath-PSI as available through AltAnalyze (see step 3);
    # it does not describe a standalone `multipath-psi` package or command, so none is shown here.
    # Any reference used must match the alignment in step 1 (hg19, Ensembl 72), not a newer assembly.
    
    # The input to MultiPath-PSI is the set of aligned BAM files produced by STAR.
    # Example file names (replace with the actual iPSC-CM RNA-Seq BAMs):
    INPUT_BAM_1="ipsc_cm_sample1.bam"
    INPUT_BAM_2="ipsc_cm_sample2.bam"
    
    # Collect the BAMs in one directory for AltAnalyze and index them with samtools.
    # Indexing is a common convention, not a requirement stated in the source.
    BAM_DIR="altanalyze_input_bams"
    mkdir -p "${BAM_DIR}"
    for bam in "${INPUT_BAM_1}" "${INPUT_BAM_2}"; do
        cp "${bam}" "${BAM_DIR}/"
        samtools index "${BAM_DIR}/$(basename "${bam}")"
    done
  3.

    MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).

    AltAnalyze v2.1.1
    $ Bash example
    # Install AltAnalyze (if not already installed)
    # pip install AltAnalyze
    
    # Assuming AltAnalyze.py is in your PATH or current directory
    # Create a directory for input BAM files and copy your aligned BAMs into it
    # mkdir -p /path/to/input_bams
    # cp your_aligned_file1.bam /path/to/input_bams/
    # cp your_aligned_file2.bam /path/to/input_bams/
    
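    # Run AltAnalyze on the BAM directory; MultiPath-PSI is part of its RNA-Seq splicing analysis.
    # Flag names below follow the AltAnalyze command-line documentation as best understood;
    # verify them against your installed version before use.
    # --species: species code (e.g., Hs for Homo sapiens, Mm for Mus musculus).
    # --platform: assay type (RNASeq).
    # --bedDir: directory containing the aligned BAM (or junction BED) files.
    # --output: directory in which to store AltAnalyze results.
    # --expname: experiment name used to label outputs (placeholder value here).
    # Group and comparison files (--groupdir, --compdir) are typically also needed
    # to compare mutant and control samples.
    python AltAnalyze.py \
        --species Hs \
        --platform RNASeq \
        --bedDir /path/to/input_bams \
        --output altanalyze_multipath_psi_output \
        --expname RBM20_iPSC_CM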
  4.

    Kallisto (https://pachterlab.github.io/kallisto/) was used for gene expression quantification.

    kallisto (version not specified)
    $ Bash example
    # Install kallisto (example using conda)
    # conda install -c bioconda kallisto
    
    # Placeholder for kallisto index creation
    # Replace 'transcripts.fasta' with your actual transcriptome FASTA file (e.g., from Ensembl, GENCODE, RefSeq)
    # Replace 'human_transcriptome.idx' with your desired index name
    # kallisto index -i human_transcriptome.idx transcripts.fasta
    
    # Example kallisto quantification command for paired-end reads
    # Replace 'human_transcriptome.idx' with your actual kallisto index
    # Replace 'read1.fastq.gz' and 'read2.fastq.gz' with your actual input FASTQ files
    # Replace 'output_dir' with your desired output directory
    kallisto quant -i human_transcriptome.idx -o output_dir read1.fastq.gz read2.fastq.gz
    
    # For single-end reads, use the --single flag and specify fragment length and standard deviation:
    # kallisto quant -i human_transcriptome.idx -o output_dir --single -l 200 -s 20 read1.fastq.gz
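
The supplementary files for this series are described as gene-level kallisto quantification, whereas kallisto itself reports transcript-level estimates in `abundance.tsv`. The source does not say how the authors summarized transcripts to genes. The sketch below shows one simple possibility, summing TPM per gene. It assumes a hypothetical headerless, tab-separated mapping file `tx2gene.tsv` (transcript ID, gene ID) derived from the same annotation as the kallisto index; the function name `sum_tpm_by_gene` is illustrative only.

```shell
# Sum transcript-level TPM from kallisto's abundance.tsv to gene level.
# Assumption: tx2gene.tsv is non-empty, tab-separated, has no header, and its
# transcript IDs match the target_id column of abundance.tsv exactly.
sum_tpm_by_gene() {
    local tx2gene="$1" abundance="$2"
    awk -F'\t' '
        NR == FNR    { gene[$1] = $2; next }        # first file: load transcript -> gene map
        FNR == 1     { next }                       # second file: skip the abundance.tsv header
        ($1 in gene) { tpm[gene[$1]] += $5 }        # column 5 of abundance.tsv is tpm
        END          { for (g in tpm) printf "%s\t%.4f\n", g, tpm[g] }
    ' "$tx2gene" "$abundance" | LC_ALL=C sort -k1,1
}

# Example usage (paths are placeholders):
# sum_tpm_by_gene tx2gene.tsv output_dir/abundance.tsv > gene_tpm.tsv
```

Transcripts missing from the mapping are silently skipped, so it is worth checking that the number of mapped transcripts is close to the number of rows in `abundance.tsv`.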

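Returning to the STAR alignment in step 1: STAR writes a summary file named `<prefix>Log.final.out` next to the BAM, which with the prefix used above would be `aligned_reads_Log.final.out`. A quick check of the uniquely mapped percentage before running the splicing analysis can catch failed or mislabeled samples. The helper names and the 70% default threshold below are illustrative assumptions, not values from the source.

```shell
# Print the "Uniquely mapped reads %" value from a STAR Log.final.out file.
star_unique_pct() {
    local log="$1"
    # Relevant line looks like: "   Uniquely mapped reads % |<TAB>93.52%"
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
        print $2
        exit
    }' "$log"
}

# Succeed if the uniquely mapped percentage is at least the given minimum
# (default 70, an arbitrary example threshold).
star_passes_qc() {
    local log="$1" min="${2:-70}"
    local pct
    pct="$(star_unique_pct "$log")"
    [ -n "$pct" ] && awk -v p="$pct" -v m="$min" 'BEGIN { exit !(p + 0 >= m + 0) }'
}

# Example usage:
# star_passes_qc aligned_reads_Log.final.out 70 || echo "Low unique mapping rate" >&2
```
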
Raw Source Text
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).
For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: Kallisto-gene expression Quantification data