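Kevin K. Yang's group at Microsoft Research investigated the computational scoring and experimental evaluation of enzymes generated by neural networks. The paper was published in Nature Biotechnology on April 23, 2024.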
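The team evaluated a set of 20 diverse computational metrics for assessing the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network, and a protein language model. Focusing on two enzyme families, the team expressed and purified more than 500 natural and generated sequences with 70–90% identity to the most similar natural sequences, in order to benchmark how well the computational metrics predict in vitro enzyme activity.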
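Over three rounds of experiments, the group developed a computational filter that improved the rate of experimental success by 50–150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and by helping to select active variants for experimental testing.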
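By way of background, generative protein sequence models have been developed in recent years to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging.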
Appendix: original English text
Title: Computational scoring and experimental evaluation of enzymes generated by neural networks
Author: Johnson, Sean R., Fu, Xiaozhi, Viknander, Sandra, Goldin, Clara, Monaco, Sarah, Zelezniak, Aleksej, Yang, Kevin K.
Issue&Volume: 2024-04-23
Abstract: In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70–90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50–150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.
DOI: 10.1038/s41587-024-02214-2
Source: https://www.nature.com/articles/s41587-024-02214-2
Nature Biotechnology: launched in 1996 and published by Springer Nature. Latest IF: 68.164
Official website: https://www.nature.com/nbt/
Submission link: https://mts-nbt.nature.com/cgi-bin/main.plex