De novo protein design by deep network hallucination
Author: 小柯机器人 (Xiaoke Robot) Published: 2021/12/5 13:10:17

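David Baker's team at the University of Washington (USA) has designed proteins de novo using deep network hallucination. The study was published online in Nature on December 1, 2021.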

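The researchers first tested whether current deep neural networks capture enough information to generate new folded proteins whose sequences are unrelated to those of the natural proteins used to train the models. They generated random amino acid sequences and fed them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, were featureless. They then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points produced novel proteins spanning a wide range of sequences and predicted structures.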
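The sampling loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a structure-prediction network (it is not trRosetta and has no biological meaning), the single uniform background distribution is a simplification, and the Metropolis acceptance rule, the temperature and the step count are assumptions chosen only for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(pred, background):
    """Average KL divergence from the background over all residue pairs."""
    return sum(kl_divergence(p, background) for p in pred) / len(pred)


def toy_predict(seq):
    """Hypothetical stand-in for a network (NOT trRosetta).

    Returns one 4-bin distance distribution per residue pair: peaked when
    both residues are hydrophobic, flat (featureless) otherwise.
    """
    dists = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] in "AILMFVW" and seq[j] in "AILMFVW":
                dists.append([0.85, 0.05, 0.05, 0.05])
            else:
                dists.append([0.25, 0.25, 0.25, 0.25])
    return dists


def hallucinate(predict, background, length=20, steps=500,
                temperature=0.1, seed=0):
    """Monte Carlo search in sequence space that maximizes the mean KL score.

    Returns (best sequence, starting score, best score). Requires length >= 2.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = mean_kl(predict(seq), background)
    start_score = score
    best_seq, best_score = seq.copy(), score
    for _ in range(steps):
        # Propose a single-residue mutation at a random position.
        candidate = seq.copy()
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        new_score = mean_kl(predict(candidate), background)
        # Metropolis rule (an assumption): always accept improvements,
        # accept worse moves with probability exp(delta / temperature).
        if new_score >= score or \
                rng.random() < math.exp((new_score - score) / temperature):
            seq, score = candidate, new_score
            if score > best_score:
                best_seq, best_score = seq.copy(), score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    uniform_background = [0.25, 0.25, 0.25, 0.25]
    sequence, start, best = hallucinate(toy_predict, uniform_background)
    print(sequence, round(start, 3), round(best, 3))
```

Running the search from different seeds gives different starting sequences, which loosely mirrors the "different random starting points" in the text. With a real network, `predict` would return the predicted inter-residue distance distributions and `background` would be estimated from many proteins rather than fixed to a uniform distribution.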

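The researchers obtained synthetic genes encoding 129 of the network-"hallucinated" sequences and expressed and purified the proteins in Escherichia coli. Of these, 27 proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. The researchers determined the three-dimensional structures of three hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physics-based models, to the de novo design of proteins with new functions.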

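By way of background, considerable progress has recently been made in protein structure prediction using deep neural networks to predict distances between amino acid residues.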

Appendix: original English text

Title: De novo protein design by deep network hallucination

Author: Anishchenko, Ivan, Pellock, Samuel J., Chidyausiku, Tamuka M., Ramelot, Theresa A., Ovchinnikov, Sergey, Hao, Jingzhou, Bafna, Khushboo, Norn, Christoffer, Kang, Alex, Bera, Asim K., DiMaio, Frank, Carter, Lauren, Chow, Cameron M., Montelione, Gaetano T., Baker, David

Issue&Volume: 2021-12-01

Abstract: There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences [1,2,3]. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue–residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback–Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-‘hallucinated’ sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.

DOI: 10.1038/s41586-021-04184-w

Source: https://www.nature.com/articles/s41586-021-04184-w

Journal information

Nature: founded in 1869. Part of the Springer Nature publishing group. Latest IF: 43.07
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html