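Mohammed AlQuraishi of Columbia University in the United States and his collaborators have found that retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. The study was published online in the international journal Nature Methods on May 14, 2024.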
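The researchers report OpenFold, a fast, memory-efficient, and trainable implementation of AlphaFold2. They trained OpenFold from scratch and matched the accuracy of AlphaFold2. Having established parity, they found that OpenFold generalizes remarkably robustly even when the size and diversity of its training set are deliberately limited, including the near-complete elision of entire classes of secondary structure elements. By analyzing intermediate structures produced during training, they also gained insight into the hierarchical manner in which OpenFold learns to fold. In sum, the study demonstrates the power and utility of OpenFold, which the researchers believe will become a crucial resource for the protein modeling community.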
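By way of background, AlphaFold2 revolutionized structural biology by predicting protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, such as protein-ligand complex structure prediction, (2) investigate the process by which the model learns, and (3) assess the model's capacity to generalize to unseen regions of fold space.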
Appendix: original English text
Title: OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
Author: Ahdritz, Gustaf, Bouatta, Nazim, Floristean, Christina, Kadyan, Sachin, Xia, Qinghui, Gerecke, William, O'Donnell, Timothy J., Berenberg, Daniel, Fisk, Ian, Zanichelli, Niccolò, Zhang, Bo, Nowaczynski, Arkadiusz, Wang, Bei, Stepniewska-Dziubinska, Marta M., Zhang, Shang, Ojewole, Adegoke, Guney, Murat Efe, Biderman, Stella, Watkins, Andrew M., Ra, Stephen, Lorenzo, Pablo Ribalta, Nivon, Lucas, Weitzner, Brian, Ban, Yih-En Andrew, Chen, Shiyang, Zhang, Minjia, Li, Conglong, Song, Shuaiwen Leon, He, Yuxiong, Sorger, Peter K., Mostaque, Emad, Zhang, Zhao, Bonneau, Richard, AlQuraishi, Mohammed
Issue&Volume: 2024-05-14
Abstract: AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein–ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model’s capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
DOI: 10.1038/s41592-024-02272-z
Source: https://www.nature.com/articles/s41592-024-02272-z
Nature Methods: founded in 2004 and published by Springer Nature. Latest IF: 47.99
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex