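In this study, the team developed complementary machine learning approaches to systematically predict antiphage function from genomic context, protein sequence, or a combination of the two, achieving up to 99% precision and 92% recall. The researchers validated these models experimentally in Escherichia and Streptomyces, discovering 12 antiphage systems. Applied to more than 32,000 bacterial genomes, the models expand the predicted antiphage repertoire, with about 1.5% of bacterial genomes devoted to defense and more than 85% of the predicted protein families remaining uncharacterized. The researchers provide an interactive catalog of more than 19,000 candidate operon families for experimental follow-up. Together, these findings show that most molecular diversity in bacterial immunity remains uncharacterized, and they provide a foundation for its systematic exploration.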
By way of background, the bacterial pangenome contains a vast diversity of antiphage systems, whose overall extent is still unknown.
Appendix: original English text
Title: Protein and genomic language models uncover the unexplored diversity of bacterial immunity
Author: Ernest Mordret, Alexandre Hervé, Florian Tesson, Hugo Vaysset, Tyler Clabby, Arthur Loubat, Helena Shomar, Remi Planel, Rachel Lavenir, Jean Cury, Aude Bernheim
Issue&Volume: 2026-04-02
Abstract: The bacterial pangenome contains a vast diversity of antiphage systems, whose overall extent is still unknown. In this study, we developed complementary machine learning approaches to systematically predict antiphage function from genomic context, protein sequence, or their combination, achieving up to 99% precision and 92% recall. We validated these models experimentally in Escherichia and Streptomyces with the discovery of 12 antiphage systems. Applied to over 32,000 bacterial genomes, these models expand the predicted antiphage repertoire, with ~1.5% of bacterial genomes devoted to defense and more than 85% of predicted protein families remaining uncharacterized. We provide an interactive catalog of more than 19,000 candidate operon families for experimental follow-up. Together, these findings show that most molecular diversity in bacterial immunity remains uncharacterized and provide a foundation for its systematic exploration.
DOI: 10.1126/science.adv8275
Source: https://www.science.org/doi/10.1126/science.adv8275
