GSE175886 Processing Pipeline
Publication
Gain-of-function cardiomyopathic mutations in RBM20 rewire splicing regulation and re-distribute ribonucleoprotein granules within processing bodies. Nature Communications (2021), PMID 34732726
Dataset
GSE175886: RNA-Seq of isogenic human iPS cell-derived cardiomyocytes with RBM20 mutations created by genome editing
Processing Steps
1
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR.
$ Bash example
# Install STAR (example using conda; the STAR version used in the study is not stated)
# conda install -c bioconda star

# Build the STAR genome index (if not already built).
# This is typically done once per genome/annotation combination.
# Example command (replace paths and thread count; --sjdbOverhang is ideally read length minus 1):
# STAR --runMode genomeGenerate \
#      --genomeDir /path/to/STAR_hg19_Ensembl72_index \
#      --genomeFastaFiles /path/to/hg19.fa \
#      --sjdbGTFfile /path/to/Ensembl72.gtf \
#      --sjdbOverhang 100 \
#      --runThreadN 8

# Align RNA-Seq data using STAR.
# Replace /path/to/STAR_hg19_Ensembl72_index with the actual path to your genome index.
# Replace read1.fastq.gz and read2.fastq.gz with your input FASTQ files.
# Adjust --runThreadN to the available CPU cores.
# The filtering and intron-size values below are illustrative; the study does not report its STAR parameters.
STAR --genomeDir /path/to/STAR_hg19_Ensembl72_index \
     --readFilesIn read1.fastq.gz read2.fastq.gz \
     --runThreadN 8 \
     --outFileNamePrefix aligned_reads_ \
     --outSAMtype BAM SortedByCoordinate \
     --outSAMattributes Standard \
     --readFilesCommand zcat \
     --outFilterType BySJout \
     --outFilterMultimapNmax 20 \
     --outFilterMismatchNmax 999 \
     --outFilterMismatchNoverLmax 0.04 \
     --alignIntronMin 20 \
     --alignIntronMax 1000000 \
     --alignMatesGapMax 1000000 \
     --limitBAMsortRAM 30000000000  # example: about 30 GB of RAM for BAM sorting
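The single-sample alignment above can be extended to a directory of paired-end samples. The following is a minimal dry-run sketch, not the authors' workflow: it only prints one STAR command per sample so the pairing can be inspected before anything is executed. The `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme and the function names are assumptions introduced for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one STAR command per paired-end sample.
# Assumed (hypothetical) file naming: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz

# List sample names by stripping the _R1 suffix from each R1 file.
list_samples() {
    local fastq_dir="$1" r1
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # skip when the glob matches nothing
        basename "$r1" _R1.fastq.gz
    done
}

# Build the STAR command line for one sample (printed, not executed).
star_cmd_for_sample() {
    local fastq_dir="$1" sample="$2" index_dir="$3"
    printf 'STAR --genomeDir %s --readFilesIn %s/%s_R1.fastq.gz %s/%s_R2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix %s_\n' \
        "$index_dir" "$fastq_dir" "$sample" "$fastq_dir" "$sample" "$sample"
}

# Print commands for every sample that has both mates; warn about the rest.
print_star_commands() {
    local fastq_dir="$1" index_dir="$2" sample
    list_samples "$fastq_dir" | while read -r sample; do
        if [ -e "$fastq_dir/${sample}_R2.fastq.gz" ]; then
            star_cmd_for_sample "$fastq_dir" "$sample" "$index_dir"
        else
            echo "warning: no R2 mate for $sample, skipping" >&2
        fi
    done
}
```

For example, `print_star_commands /path/to/fastq /path/to/STAR_hg19_Ensembl72_index` lists the commands; once they look right, the output can be piped to `bash` to run them.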
2
To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI.
$ Bash example
# NOTE: the `multipath-psi` package and its `build`/`quant` subcommands shown here are
# hypothetical placeholders, not a confirmed command-line interface. According to the
# processing description, MultiPath-PSI is run through AltAnalyze 2.1.1 (see step 3).
# pip install multipath-psi   # unverified package name

# Reference genome and annotation placeholders.
# The study used hg19 with Ensembl 72, so the same references as in step 1 are assumed here.
GENOME_FA="hg19.fa"                  # placeholder for the hg19 reference genome
GTF_FILE="Ensembl72.gtf"             # placeholder for the Ensembl 72 annotation
INDEX_DIR="multipath_psi_index"
OUTPUT_DIR="multipath_psi_output"

# Hypothetical index creation from the genome FASTA and the GTF annotation.
mkdir -p "${INDEX_DIR}"
# multipath-psi build -g "${GENOME_FA}" -a "${GTF_FILE}" -o "${INDEX_DIR}"

# Example RNA-Seq BAM files (replace with the aligned iPSC-CM BAM files).
INPUT_BAM_1="ipsc_cm_sample1.bam"
INPUT_BAM_2="ipsc_cm_sample2.bam"

# Create the output directory.
mkdir -p "${OUTPUT_DIR}"

# Hypothetical quantification of alternative splicing events (PSI values) from the BAM files.
# multipath-psi quant -i "${INDEX_DIR}" -o "${OUTPUT_DIR}" "${INPUT_BAM_1}" "${INPUT_BAM_2}"
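To make the Percent Spliced In idea concrete, here is a minimal sketch of the generic PSI calculation, inclusion reads divided by inclusion plus exclusion reads. It is not the MultiPath-PSI algorithm itself, whose details are not given in this description. The three-column input table (event, inclusion reads, exclusion reads) and the function name `compute_psi` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Generic PSI sketch: PSI = inclusion / (inclusion + exclusion).
# Input (hypothetical layout): tab-delimited file with a header line and the
# columns event, inclusion_reads, exclusion_reads.
compute_psi() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        NR == 1 { print "event", "psi"; next }      # replace the input header
        {
            total = $2 + $3
            if (total == 0) {
                print $1, "NA"                       # no reads support the event
            } else {
                printf "%s\t%.3f\n", $1, $2 / total
            }
        }' "$1"
}
```

Calling `compute_psi counts.tsv` prints one PSI value per event; events without any supporting reads are reported as `NA` rather than as zero, since their inclusion level is unknown.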
3
MultiPath-PSI is available through AltAnalyze version 2.1.1, requires aligned BAM files as input, and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm).
$ Bash example
# Install AltAnalyze (if not already installed); the study used version 2.1.1
# pip install AltAnalyze
# Assuming AltAnalyze.py is in your PATH or current directory

# Create a directory for input BAM files and copy your aligned BAMs into it
# mkdir -p /path/to/input_bams
# cp your_aligned_file1.bam /path/to/input_bams/
# cp your_aligned_file2.bam /path/to/input_bams/

# Run AltAnalyze on the BAM directory; its RNA-Seq workflow includes MultiPath-PSI.
# The option names below are an assumption based on the AltAnalyze command-line
# documentation and should be verified against version 2.1.1 before use.
# --species:  species code (e.g., Hs for Homo sapiens, Mm for Mus musculus)
# --platform: assay type (e.g., RNASeq)
# --bedDir:   directory containing the aligned BAM files
# --output:   directory in which AltAnalyze stores its results
# --expname:  experiment name (hypothetical value shown)
python AltAnalyze.py \
    --species Hs \
    --platform RNASeq \
    --bedDir /path/to/input_bams \
    --output altanalyze_multipath_psi_output \
    --expname RBM20_iPSC_CM
4
Kallisto (https://pachterlab.github.io/kallisto/) was used for gene expression quantification.
$ Bash example
# Install kallisto (example using conda)
# conda install -c bioconda kallisto

# Placeholder for kallisto index creation
# Replace 'transcripts.fasta' with your actual transcriptome FASTA file (e.g., from Ensembl, GENCODE, RefSeq)
# Replace 'human_transcriptome.idx' with your desired index name
# kallisto index -i human_transcriptome.idx transcripts.fasta

# Example kallisto quantification command for paired-end reads
# Replace 'human_transcriptome.idx' with your actual kallisto index
# Replace 'read1.fastq.gz' and 'read2.fastq.gz' with your actual input FASTQ files
# Replace 'output_dir' with your desired output directory
kallisto quant -i human_transcriptome.idx -o output_dir read1.fastq.gz read2.fastq.gz

# For single-end reads, use the --single flag and specify fragment length and standard deviation:
# kallisto quant -i human_transcriptome.idx -o output_dir --single -l 200 -s 20 read1.fastq.gz
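Kallisto reports abundances per transcript, while the deposited supplementary files are described as gene expression quantification. How the authors summarized transcripts to genes is not stated; one common approach, sketched below as an assumption, is to sum transcript TPM values per gene. The two-column transcript-to-gene map (`tx2gene.tsv`, no header) and the function name `sum_tpm_by_gene` are hypothetical; the `abundance.tsv` layout (target_id, length, eff_length, est_counts, tpm) is kallisto's standard output.

```shell
#!/usr/bin/env bash
# Sketch: sum kallisto transcript-level TPM values per gene.
# $1: hypothetical tab-delimited map without header (transcript_id, gene_id)
# $2: kallisto abundance.tsv (target_id, length, eff_length, est_counts, tpm)
sum_tpm_by_gene() {
    local tx2gene="$1" abundance="$2"
    awk -F'\t' '
        NR == FNR { gene[$1] = $2; next }            # first file: transcript -> gene
        FNR == 1 { next }                            # skip the abundance.tsv header
        ($1 in gene) { tpm[gene[$1]] += $5 }         # column 5 holds the TPM value
        END { for (g in tpm) printf "%s\t%.2f\n", g, tpm[g] }
    ' "$tx2gene" "$abundance" | sort
}
```

For example, `sum_tpm_by_gene tx2gene.tsv output_dir/abundance.tsv > gene_tpm.tsv` writes one line per gene. Transcripts missing from the map are silently ignored, so it is worth checking that the map and the kallisto index come from the same annotation release.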
Raw Source Text
RNA-Seq data was aligned to the human hg19 reference genome and transcriptome (Ensembl 72) using the software STAR. To accurately detect diverse alternative splicing events in iPSC-CM RNA-Seq, we adapted a recently developed Percent Spliced In (PSI) splicing method called MultiPath-PSI. MultiPath-PSI is available through AltAnalyze version 2.1.1, requiring aligned BAM files as input and was extensively benchmarked against other local splicing variation approaches (http://altanalyze.readthedocs.io/en/latest/Algorithms/#multipath-psi-splicing-algorithm). For gene expression Quantification Kallisto (https://pachterlab.github.io/kallisto/) was used.
Genome_build: hg19
Supplementary_files_format_and_content: Kallisto-gene expression Quantification data