Evaluation of novel computational methods to identify RNA-binding protein footprints from structural data.

RNA (New York, N.Y.) · 2025 · Vol. 31 (8) · pp. 1103-1124

Abstract

RNA-binding proteins (RBPs) play diverse roles in mRNA processing and function. However, of the thousands of RBPs encoded in the human genome, a detailed molecular understanding of their interactions with RNA is available for only a small fraction. In most cases, our knowledge of the combination of RNA sequence and structure required for specific RBP binding is insufficient for accurately predicting binding sites transcriptome-wide. In this context, the rapidly expanding collection of transcriptomic data sets that map distinct yet intertwined posttranscriptional marks, such as RNA structure and RBP binding, presents an opportunity for integrative analysis to better characterize RBP binding. A grand challenge faced by our community is that relatively little information on the secondary structure context within and near RBP-binding sites has been gleaned from integrating such data sets, partly due to a lack of suitable computational methods. To engage scientists from diverse backgrounds in addressing this gap, the RNA Society organized the RBP Footprint Grand Challenge in 2021, an international community effort to develop new methods or leverage existing ones for predicting RBP-binding sites through analysis of a growing volume of sequence, structure, and binding data and to experimentally validate select predictions. Here, we report the initiative, analyses, and methods developed by the participants, validation results, and five new in vivo binding data sets generated for validation. We hope our work will inspire additional innovation in computational methods, further utilization of available data resources, and future endeavors to engage the community in collaborating toward closing other critical data-analysis gaps.

Publication Types

Journal Article

MeSH Terms

RNA-Binding Proteins; Binding Sites; Humans; Computational Biology; RNA; Nucleic Acid Conformation; Protein Binding; Transcriptome; RNA, Messenger

Funding

R21 GM148835 NIGMS NIH HHS (United States)

Linked Datasets (1)

GSE262542 (GEO)

Evaluation of novel computational methods that identify RNA-binding protein footprints from structural data

Homo sapiens
40 data files
File Type Size
hnRNPA2B1_HepG2_IN1.fastq.gz RNA-Seq 1.1 GB
hnRNPA2B1_HepG2_IN1.fastq.gz RNA-Seq 1.1 GB
hnRNPA2B1_HepG2_IN2.fastq.gz RNA-Seq 1.3 GB
hnRNPA2B1_HepG2_IN2.fastq.gz RNA-Seq 1.3 GB
hnRNPA2B1_HepG2_IP1.fastq.gz RNA-Seq 1.4 GB
hnRNPA2B1_HepG2_IP1.fastq.gz RNA-Seq 1.4 GB
hnRNPA2B1_HepG2_IP2.fastq.gz RNA-Seq 1.5 GB
hnRNPA2B1_HepG2_IP2.fastq.gz RNA-Seq 1.5 GB
HNRNPC_K562_In1.fastq.gz RNA-Seq 880.9 MB
HNRNPC_K562_In1.fastq.gz RNA-Seq 880.9 MB
HNRNPC_K562_In2.fastq.gz RNA-Seq 890.3 MB
HNRNPC_K562_In2.fastq.gz RNA-Seq 890.3 MB
HNRNPC_K562_IP1.fastq.gz RNA-Seq 829.7 MB
HNRNPC_K562_IP1.fastq.gz RNA-Seq 829.7 MB
HNRNPC_K562_IP2.fastq.gz RNA-Seq 795.7 MB
HNRNPC_K562_IP2.fastq.gz RNA-Seq 795.7 MB
PRPF17_Hek293T_In1.fastq.gz RNA-Seq 5.1 GB
PRPF17_Hek293T_In1.fastq.gz RNA-Seq 5.1 GB
PRPF17_Hek293T_In2.fastq.gz RNA-Seq 602.2 MB
PRPF17_Hek293T_In2.fastq.gz RNA-Seq 602.2 MB
PRPF17_Hek293T_IP1.fastq.gz RNA-Seq 682.7 MB
PRPF17_Hek293T_IP1.fastq.gz RNA-Seq 682.7 MB
PRPF17_Hek293T_IP2.fastq.gz RNA-Seq 567.9 MB
PRPF17_Hek293T_IP2.fastq.gz RNA-Seq 567.9 MB
PRPF17_HeLa_In1.fastq.gz RNA-Seq 466.0 MB
PRPF17_HeLa_In1.fastq.gz RNA-Seq 466.0 MB
PRPF17_HeLa_In2.fastq.gz RNA-Seq 524.0 MB
PRPF17_HeLa_In2.fastq.gz RNA-Seq 524.0 MB
PRPF17_HeLa_IP1.fastq.gz RNA-Seq 991.2 MB
PRPF17_HeLa_IP1.fastq.gz RNA-Seq 991.2 MB
PRPF17_HeLa_IP2.fastq.gz RNA-Seq 1.0 GB
PRPF17_HeLa_IP2.fastq.gz RNA-Seq 1.0 GB
SND1_k562_IN1.fastq.gz RNA-Seq 952.9 MB
SND1_k562_IN1.fastq.gz RNA-Seq 952.9 MB
SND1_k562_IN2.fastq.gz RNA-Seq 1.4 GB
SND1_k562_IN2.fastq.gz RNA-Seq 1.4 GB
SND1_k562_IP1.fastq.gz RNA-Seq 1.3 GB
SND1_k562_IP1.fastq.gz RNA-Seq 1.3 GB
SND1_k562_IP2.fastq.gz RNA-Seq 1.2 GB
SND1_k562_IP2.fastq.gz RNA-Seq 1.2 GB

Analysis Pipelines (1)

eCLIP geo_data_processing GSE262542