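To overcome the challenge of identifying protein-protein interactions across the complex human proteome, the researchers strengthened coevolutionary signals with 7-fold deeper multiple sequence alignments harvested from 30 petabytes of unassembled genomic data, and developed a new deep learning (DL) network trained on augmented datasets of domain-domain interactions from 200 million predicted protein structures. The team systematically screened 200 million human protein pairs and predicted 17,849 interactions with an expected precision of 90%, of which 3,631 had not been identified in previous experimental screens. Three-dimensional models of these predicted interactions provide numerous hypotheses about protein function and the mechanisms of human diseases.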
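By way of background, protein-protein interactions (PPIs) are essential for biological function. Coevolutionary analysis and DL-based protein structure prediction have enabled comprehensive PPI identification in bacteria and yeast, but these approaches have had limited success in the more complex human proteome.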
Appendix: original English text
Title: Predicting protein-protein interactions in the human proteome
Author: Jing Zhang, Ian R. Humphreys, Jimin Pei, Jinuk Kim, Chulwon Choi, Rongqing Yuan, Jesse Durham, Siqi Liu, Hee-Jung Choi, Minkyung Baek, David Baker, Qian Cong
Issue&Volume: 2025-09-25
Abstract: Protein-protein interactions (PPI) are essential for biological function. Coevolutionary analysis and deep learning (DL) based protein structure prediction have enabled comprehensive PPI identification in bacteria and yeast, but these approaches have had limited success for the more complex human proteome. We overcame this challenge by enhancing the coevolutionary signals with 7-fold deeper multiple sequence alignments harvested from 30 petabytes of unassembled genomic data and developing a new DL network trained on augmented datasets of domain-domain interactions from 200 million predicted protein structures. We systematically screened 200 million human protein pairs and predicted 17,849 interactions with an expected precision of 90%, of which 3,631 interactions were not identified in previous experimental screens. Three-dimensional models of these predicted interactions provide numerous hypotheses about protein function and mechanisms of human diseases.
DOI: 10.1126/science.adt1630
Source: https://www.science.org/doi/10.1126/science.adt1630