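Erwei Zuo's research group at the Chinese Academy of Agricultural Sciences has announced AlphaCD, a machine learning model capable of highly accurate characterization of 21,335 cytidine deaminases. The work was published in Cell Research on August 18, 2025.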
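The team experimentally characterized the catalytic efficiency, target site window, motif preference, and off-target activity of 1,100 apolipoprotein B mRNA-editing enzyme, catalytic polypeptide (APOBEC)-like family cytidine deaminases (CDs) fused with nCas9 in HEK293T cells, generating the largest dataset of experimentally validated functions for a single protein family to date. These data, together with amino acid sequence, three-dimensional structure, and eight additional features, were used to build the machine learning (ML) model AlphaCD, which the authors report achieves high accuracy in predicting catalytic efficiency (0.92), off-target activity (0.84), target windows (0.73), and catalytic motifs (0.78).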
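The team then applied the trained model to predict these catalytic features for 21,335 CDs in UniProt, and a subsample of 28 CDs further validated its prediction accuracy (0.84, 0.87, 0.75, and 0.73, respectively). Alanine scanning-based mutagenesis was then used to reduce off-target activity in one example CD, yielding a cytosine base editor with remarkably high fidelity and high efficiency. According to the authors, this demonstrates AlphaCD's use in high-accuracy, high-throughput protein functional characterization and offers a strategy for accelerating the characterization of other proteins.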
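For readers unfamiliar with alanine scanning, the idea is to replace residues one at a time with alanine and test each variant for a change in function. The sketch below only enumerates such single-substitution variants in silico. It is a minimal illustration, not the authors' pipeline: the function name `alanine_scan`, its `start`/`end` parameters, and the toy peptide are all hypothetical, and the report does not say which positions were scanned or how variants were assayed.

```python
from typing import List, Optional, Tuple


def alanine_scan(
    sequence: str, start: int = 1, end: Optional[int] = None
) -> List[Tuple[str, str]]:
    """Enumerate single alanine substitutions over positions start..end.

    Positions are 1-based and inclusive. Returns (label, mutant_sequence)
    pairs, where label uses the usual notation, e.g. "K2A".
    Hypothetical helper for illustration only.
    """
    if end is None:
        end = len(sequence)
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("require 1 <= start <= end <= len(sequence)")

    variants = []
    for pos in range(start, end + 1):
        residue = sequence[pos - 1]
        if residue == "A":
            continue  # already alanine, so there is nothing to substitute
        mutant = sequence[: pos - 1] + "A" + sequence[pos:]
        variants.append((f"{residue}{pos}A", mutant))
    return variants


# Toy peptide (an assumption for the demo, not a real deaminase sequence).
for label, mutant in alanine_scan("MKAWE"):
    print(label, mutant)
```

In practice each enumerated variant would be built and measured experimentally (here, for on-target efficiency and off-target activity); the enumeration step only defines the candidate list.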
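By way of background, the vast scope of sequence databases, combined with limited supporting evidence, hinders the identification of proteins with specific functions.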
Attached: original English text
Title: AlphaCD: a machine learning model capable of highly accurate characterization for 21,335 cytidine deaminases
Author: Kui Xu, Guoying Hua, Mingdi Wu, Haihang Zhang, Jingda Liu, Hu Feng, Erwei Zuo
Issue&Volume: 2025-08-18
Abstract: The vast scope but limited-supporting evidence in sequence databases hinders identification of proteins with specific functionality. Here, we experimentally characterized catalytic efficiency, target site window, motif preference, and off-target activity of 1100 apolipoprotein B mRNA-editing enzyme, catalytic polypeptide (APOBEC)-like family cytidine deaminases (CDs) fused with nCas9 in HEK293T cells, thereby generating the largest dataset of experimentally validated functions for a single protein family to date. These data, together with amino acid sequence, three-dimensional structure, and eight additional features, were used to construct a machine learning (ML) model, AlphaCD, which showed high accuracy in predicting catalytic efficiency (0.92) and off-target activity (0.84), as well as target windows (0.73) and catalytic motifs (0.78). We applied the trained model to predict the above catalytic features of 21,335 CDs in Uniprot, and subsampling of 28 CDs further validated its prediction accuracy (0.84, 0.87, 0.75, 0.73, respectively). Alanine scanning-based mutagenesis was then employed to reduce off-targets in one example CD, which produced a remarkably high fidelity, high efficiency cytosine base editor, thus demonstrating AlphaCD application in high-accuracy, high-throughput protein functional characterization, and providing a strategy for accelerated characterization of other proteins.
DOI: 10.1038/s41422-025-01164-x
Source: https://www.nature.com/articles/s41422-025-01164-x
Cell Research: founded in 1990 and published by Springer Nature. Latest IF: 20.057
Official website: https://www.nature.com/cr/
Submission link: https://mts-cr.nature.com/cgi-bin/main.plex