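A team led by Shuyi Zhang at Tsinghua University has reported a significant advance: they developed a strategy called EvoAI that can drastically compress and reconstruct the protein sequence space. The study was published online in Nature Methods on November 11, 2024.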
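According to the authors, designing proteins with improved functions requires a deep understanding of the relationship between sequence and function, a vast space that is hard to explore. The ability to compress this space efficiently by identifying functionally important features is therefore highly valuable.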
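The researchers established a method called EvoScan that comprehensively segments and scans the high-fitness sequence space to obtain anchor points capturing its essential features, especially in high dimensions. The method is compatible with any biomolecular function that can be coupled to a transcriptional output.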
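The researchers also developed deep learning and large language models to accurately reconstruct the space from these anchors, enabling computational prediction of novel, highly fit sequences without prior homology-derived or structural information.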
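The researchers applied this hybrid experimental-computational method, which they call EvoAI, to a repressor protein and found that only 82 anchors are sufficient to compress the high-fitness sequence space, with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and the understanding of natural evolution.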
Appendix: original English text
Title: EvoAI enables extreme compression and reconstruction of the protein sequence space
Author: Ma, Ziyuan; Li, Wenjie; Shen, Yunhao; Xu, Yunxin; Liu, Gengjiang; Chang, Jiamin; Li, Zeju; Qin, Hong; Tian, Boxue; Gong, Haipeng; Liu, David R.; Thuronyi, B. W.; Voigt, Christopher A.; Zhang, Shuyi
Issue&Volume: 2024-11-11
Abstract: Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here we establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental–computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.
DOI: 10.1038/s41592-024-02504-2
Source: https://www.nature.com/articles/s41592-024-02504-2
Nature Methods: launched in 2004 and published by Springer Nature. Latest impact factor (IF): 47.99
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex