HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence.

Molecular Cell · 2023 · Vol. 83 (14) · pp. 2595-2611.e11

Abstract

RNA-binding proteins (RBPs) control RNA metabolism to orchestrate gene expression and, when dysfunctional, underlie human diseases. Proteome-wide discovery efforts predict thousands of RBP candidates, many of which lack canonical RNA-binding domains (RBDs). Here, we present a hybrid ensemble RBP classifier (HydRA), which leverages information from both intermolecular protein interactions and internal protein sequence patterns to predict RNA-binding capacity with unparalleled specificity and sensitivity using support vector machines (SVMs), convolutional neural networks (CNNs), and Transformer-based protein language models. Occlusion mapping by HydRA robustly detects known RBDs and predicts hundreds of uncharacterized RNA-binding associated domains. Enhanced CLIP (eCLIP) for HydRA-predicted RBP candidates reveals transcriptome-wide RNA targets and confirms RNA-binding activity for HydRA-predicted RNA-binding associated domains. HydRA accelerates construction of a comprehensive RBP catalog and expands the diversity of RNA-binding associated domains.
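
The abstract names occlusion mapping as the way HydRA localizes RNA-binding regions. The general idea can be sketched as follows: mask a sliding window of the sequence, re-score the masked sequence, and attribute the drop in score to the masked residues. This is a minimal illustration of the generic technique, not HydRA's implementation. The function names, the window size, the mask character, and the toy scoring function (which merely counts R and G residues) are all assumptions made for the example; a real use would pass a trained classifier's scoring function instead.

```python
from typing import Callable, List


def occlusion_map(
    sequence: str,
    score_fn: Callable[[str], float],
    window: int = 5,
    mask_char: str = "X",
) -> List[float]:
    """Return one occlusion score per residue.

    Each score is the mean drop in score_fn over all masked windows
    that cover the residue. Larger values mean the residue matters more.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    base = score_fn(sequence)
    drops = [0.0] * len(sequence)
    counts = [0] * len(sequence)
    for start in range(len(sequence) - window + 1):
        # Replace one window with the mask character and re-score.
        occluded = sequence[:start] + mask_char * window + sequence[start + window:]
        delta = base - score_fn(occluded)
        for i in range(start, start + window):
            drops[i] += delta
            counts[i] += 1
    return [d / c for d, c in zip(drops, counts)]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of R/G residues."""
    return sum(ch in "RG" for ch in seq) / len(seq)


# Toy sequence with an R/G-rich stretch at positions 6-11 (0-based).
seq = "MAAAAARGGRGGAAAAAL"
scores = occlusion_map(seq, toy_score, window=3)
peak = max(range(len(seq)), key=scores.__getitem__)
print(f"highest occlusion score at residue index {peak}")
```

Averaging over every window that covers a residue smooths the per-residue signal; a single non-overlapping pass would be cheaper but coarser.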

Publication Types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

MeSH Terms

Animals; Humans; RNA; Protein Binding; Binding Sites; Hydra; Deep Learning

Funding

U24 HG009889 NHGRI NIH HHS (United States)
R01 HG004659 NHGRI NIH HHS (United States)
K22 NS112678 NINDS NIH HHS (United States)

Linked Datasets (1)

GSE221870 GSE via ncbi_elink
GEO

HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence

Homo sapiens
152 data files
File Type Size
100_to_225_v5_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 153.0 MB
25_to_100_v5_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 72.2 MB
50_to_125_v5_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 104.2 MB
75_to_150_v5_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 130.8 MB
ACTN3_DD_rep1.ACTN3_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 46.6 MB
ACTN3_DD_rep1.ACTN3_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDu… OTHER 53.2 MB
ACTN3_DD_rep2.ACTN3_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 46.4 MB
ACTN3_DD_rep2.ACTN3_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDu… OTHER 85.3 MB
ACTN3_DD_rep3.ACTN3_CLIP3.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 53.7 MB
ACTN3_DD_rep3.ACTN3_INPUT3.umi.r1.fq.genome_mappedSoSo.rmDu… OTHER 61.9 MB
ACTN3_rep1.ACTN3_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.… OTHER 49.0 MB
ACTN3_rep1.ACTN3_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 79.7 MB
ACTN3_rep2.ACTN3_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDupSo.… OTHER 44.2 MB
ACTN3_rep2.ACTN3_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 122.9 MB
ACTN3_rep3.ACTN3_CLIP3.umi.r1.fq.genome_mappedSoSo.rmDupSo.… OTHER 27.5 MB
ACTN3_rep3.ACTN3_INPUT3.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 105.5 MB
HSP90a_293T.13_XL1_input.NIL.r1.fqTrTr.sorted.STARUnmapped.… OTHER 37.9 MB
HSP90a_293T.13_XL1_IP.C01.r1.fqTrTr.sorted.STARUnmapped.out… OTHER 129.9 MB
HSP90a_293T.13_XL2_input.NIL.r1.fqTrTr.sorted.STARUnmapped.… OTHER 44.2 MB
HSP90a_293T.13_XL2_IP.A04.r1.fqTrTr.sorted.STARUnmapped.out… OTHER 106.9 MB
HSP90a_293T.pDEST_input.NIL.r1.fqTrTr.sorted.STARUnmapped.o… OTHER 35.6 MB
HSP90a_293T.pDEST_IP.A03.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 29.1 MB
INO80B_DomainDeleted_rep1.INO80B_DD_CLIP1.umi.r1.fq.genome_… OTHER 31.5 MB
INO80B_DomainDeleted_rep1.INO80B_DD_INPUT1.umi.r1.fq.genome… OTHER 32.6 MB
INO80B_DomainDeleted_rep2.INO80B_DD_CLIP2.umi.r1.fq.genome_… OTHER 8.4 MB
INO80B_DomainDeleted_rep2.INO80B_DD_INPUT2.umi.r1.fq.genome… OTHER 15.8 MB
INO80B.INO80B_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 16.6 MB
INO80B.INO80B_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 16.8 MB
INO80B.INO80B_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 72.4 MB
INO80B.INO80B_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 55.8 MB
MCCC1_DD_rep1.MCCC1_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 40.7 MB
MCCC1_DD_rep1.MCCC1_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDu… OTHER 33.2 MB
MCCC1_DD_rep2.MCCC1_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 24.6 MB
MCCC1_DD_rep2.MCCC1_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDu… OTHER 41.2 MB
MCCC1_DD_rep3.MCCC1_CLIP3.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 19.4 MB
MCCC1_DD_rep3.MCCC1_INPUT3.umi.r1.fq.genome_mappedSoSo.rmDu… OTHER 42.5 MB
MCCC1_rep1.MCCC1_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.… OTHER 84.8 MB
MCCC1_rep1.MCCC1_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 101.9 MB
MCCC1_rep2.MCCC1_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDupSo.… OTHER 78.0 MB
MCCC1_rep2.MCCC1_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 119.1 MB
MCCC1_rep3.MCCC1_CLIP3.umi.r1.fq.genome_mappedSoSo.rmDupSo.… OTHER 66.6 MB
MCCC1_rep3.MCCC1_INPUT3.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 117.2 MB
NR5A1_DomainDeleted_rep1_redo20211215.NR5A1_DD_CLIP1.umi.r1… OTHER 12.4 MB
NR5A1_DomainDeleted_rep1_redo20211215.NR5A1_DD_INPUT1.umi.r… OTHER 10.5 MB
NR5A1_DomainDeleted_rep2_redo20211215.NR5A1_DD_CLIP2.umi.r1… OTHER 6.0 MB
NR5A1_DomainDeleted_rep2_redo20211215.NR5A1_DD_INPUT2.umi.r… OTHER 13.1 MB
NR5A1.NR5A1_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 24.2 MB
NR5A1.NR5A1_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 19.4 MB
NR5A1.NR5A1_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 25.0 MB
NR5A1.NR5A1_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 31.0 MB
PIAS4_DomainDeleted_rep1.PIAS4_DD_CLIP1.umi.r1.fq.genome_ma… OTHER 27.6 MB
PIAS4_DomainDeleted_rep1.PIAS4_DD_INPUT1.umi.r1.fq.genome_m… OTHER 24.9 MB
PIAS4_DomainDeleted_rep2.PIAS4_DD_CLIP2.umi.r1.fq.genome_ma… OTHER 12.4 MB
PIAS4_DomainDeleted_rep2.PIAS4_DD_INPUT2.umi.r1.fq.genome_m… OTHER 13.0 MB
PIAS4.PIAS4_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 76.4 MB
PIAS4.PIAS4_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 77.6 MB
PIAS4.PIAS4_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 74.9 MB
PIAS4.PIAS4_INPUT2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 87.8 MB
YWHAE_293T.XL1_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 32.3 MB
YWHAE_293T.XL1_IP.A01.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 157.8 MB
YWHAE_293T.XL2_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 21.4 MB
YWHAE_293T.XL2_IP.C01.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 141.2 MB
YWHAG_293T.XL1_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 40.6 MB
YWHAG_293T.XL1_IP.A01.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 233.7 MB
YWHAG_293T.XL2_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 37.1 MB
YWHAG_293T.XL2_IP.A03.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 99.0 MB
YWHAH_293T.pcDNA6_input.NIL.r1.fqTrTr.sorted.STARUnmapped.o… OTHER 38.2 MB
YWHAH_293T.pcDNA6_IP.A04.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 89.8 MB
YWHAH_293T.XL1_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 33.6 MB
YWHAH_293T.XL1_IP.A04.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 234.5 MB
YWHAH_293T.XL2_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 41.5 MB
YWHAH_293T.XL2_IP.C01.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 253.2 MB
YWHAZ_293T.XL1_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 31.9 MB
YWHAZ_293T.XL1_IP.A03.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 116.7 MB
YWHAZ_293T.XL2_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 36.5 MB
YWHAZ_293T.XL2_IP.A04.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 135.3 MB
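
The file names above appear to encode the sample label, whether a library is a CLIP/IP or an input control, and whether a domain-deleted ("DD" or "DomainDeleted") construct was used. A minimal sketch of parsing listing rows along those lines is shown below. The regular expressions, field names, and role labels are assumptions inferred from the visible names rather than a documented naming scheme; truncated names (ending in "…") are kept exactly as listed, and the leading token is not a gene symbol for every row (for example the `100_to_225_v5` entries).

```python
import re
from collections import Counter

# A few rows copied verbatim from the listing above.
LISTING = """\
ACTN3_DD_rep1.ACTN3_CLIP1.umi.r1.fq.genome_mappedSoSo.rmDup… OTHER 46.6 MB
ACTN3_rep1.ACTN3_INPUT1.umi.r1.fq.genome_mappedSoSo.rmDupSo… OTHER 79.7 MB
INO80B.INO80B_CLIP2.umi.r1.fq.genome_mappedSoSo.rmDupSo.bam OTHER 16.8 MB
YWHAE_293T.XL1_input.NIL.r1.fqTrTr.sorted.STARUnmapped.out.… OTHER 32.3 MB
YWHAE_293T.XL1_IP.A01.r1.fqTrTr.sorted.STARUnmapped.out.sor… OTHER 157.8 MB
"""

# Assumed row layout: <name> <type> <size> MB
ROW = re.compile(r"^(?P<name>\S+)\s+(?P<type>\S+)\s+(?P<size>[\d.]+)\s+MB$")
# Assumed tags: "_INPUT1" / "_input" mark controls, "_CLIP1" / "_IP" mark pulldowns.
INPUT_TAG = re.compile(r"_input", re.IGNORECASE)
CLIP_TAG = re.compile(r"_(?:CLIP\d*|IP)\b")


def parse_row(line: str):
    """Parse one listing row into a dict, or return None if it does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    name = m["name"]
    if INPUT_TAG.search(name):
        role = "input"
    elif CLIP_TAG.search(name):
        role = "clip"
    else:
        role = "unknown"
    return {
        "name": name,
        # Leading token before the first "." or "_"; often a gene symbol.
        "label": name.split(".")[0].split("_")[0],
        "role": role,
        "domain_deleted": "_DD" in name or "DomainDeleted" in name,
        "size_mb": float(m["size"]),
    }


rows = [parse_row(line) for line in LISTING.splitlines()]
roles = Counter(row["role"] for row in rows)
print(dict(roles))
```

Checking for the input tag before the CLIP/IP tag keeps the two roles from overlapping if a name ever contains both.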

Potentially Related Datasets (1)

These accessions were text-mined from the PMC full text. They may be referenced for comparison, cited from other studies, or otherwise mentioned without being primary data for this paper.

HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence

Analysis Pipelines (1)

eCLIP geo_data_processing GSE221870