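Dimitrios Vitsios and colleagues at AstraZeneca in the UK have found that disease prediction with multi-omics and biomarkers empowers case-control genetic discoveries in the UK Biobank. The study was published online on 11 September 2024 in the leading international journal Nature Genetics.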
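The researchers present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) that uses a range of biomarkers to predict 3,213 diseases in the UK Biobank. Drawing on the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases that were undiagnosed at the time of recruitment, largely outperforming available polygenic risk scores.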
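The researchers also demonstrate MILTON's utility in augmenting genetic association analyses, in a phenome-wide association study of 484,230 genome-sequenced samples along with 46,327 samples with matched plasma proteomics data. This improved the signals for 88 known gene-disease relationships (P < 1 × 10⁻⁸) and revealed 182 gene-disease relationships that had not reached genome-wide significance in the non-augmented baseline cohorts.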
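The researchers validated these findings in the FinnGen biobank and with two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).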
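By way of background, the emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease.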
Appendix: original English text
Title: Disease prediction with multi-omics and biomarkers empowers case–control genetic discoveries in the UK Biobank
Author: Manik Garg, Marcin Karpinski, Dorota Matelska, Lawrence Middleton, Oliver S. Burren, Fengyuan Hu, Eleanor Wheeler, Katherine R. Smith, Margarete A. Fabre, Jonathan Mitchell, Amanda O'Neill, Euan A. Ashley, Andrew R. Harper, Quanli Wang, Ryan S. Dhindsa, Slav Petrovski, Dimitrios Vitsios
Issue&Volume: 2024-09-11
Abstract: The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank’s longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene–disease relationships alongside 182 gene–disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene–disease prioritization. All extracted gene–disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).
DOI: 10.1038/s41588-024-01898-1
Source: https://www.nature.com/articles/s41588-024-01898-1
Nature Genetics: founded in 1992 and published by Springer Nature. Latest IF: 41.307
Official website: https://www.nature.com/ng/
Submission link: https://mts-ng.nature.com/cgi-bin/main.plex