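The researchers showed that proteins with shared functions share amino acid sequence codes that guide them to their respective subcellular compartments. They developed a protein language model, ProtGPS, that predicts with high performance the compartment localization of human proteins held out of the training set.
ProtGPS successfully guided the generation of novel protein sequences that selectively assemble in the nucleolus. ProtGPS also identified pathological mutations that alter this code and thereby change the subcellular localization of proteins.
These results indicate that protein sequences contain not only a folding code but also a previously unrecognized code that governs their distribution to different subcellular compartments.
By way of background, cells have evolved mechanisms to distribute roughly 10 billion protein molecules to subcellular compartments, where diverse proteins involved in shared functions must assemble.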
Appendix: original English text
Title: Protein codes promote selective subcellular compartmentalization
Author: Henry R. Kilgore, Itamar Chinn, Peter G. Mikhael, Ilan Mitnikov, Catherine Van Dongen, Guy Zylberberg, Lena Afeyan, Salman F. Banani, Susana Wilson-Hawken, Tong Ihn Lee, Regina Barzilay, Richard A. Young
Issue&Volume: 2025-02-06
Abstract: Cells have evolved mechanisms to distribute ~10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must assemble. Here, we demonstrate that proteins with shared functions share amino acid sequence codes that guide them to compartment destinations. A protein language model, ProtGPS, was developed that predicts with high performance the compartment localization of human proteins excluded from the training set. ProtGPS successfully guided generation of novel protein sequences that selectively assemble in the nucleolus. ProtGPS identified pathological mutations that change this code and lead to altered subcellular localization of proteins. Our results indicate that protein sequences contain not only a folding code, but also a previously unrecognized code governing their distribution to diverse subcellular compartments.
DOI: 10.1126/science.adq2634
Source: https://www.science.org/doi/10.1126/science.adq2634