Machine learning and statistical models disagree when predicting clinical risk for individual patients




Author: Xiaoke Robot | Published: 2020/11/8 22:27:12

Article in this issue: The BMJ (British Medical Journal), published online

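The team of Tjeerd Pieter van Staa at the University of Manchester, UK, studied how consistently a variety of machine learning and statistical models predict clinical risk for individual patients. The study was published in The BMJ on 4 November 2020.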

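To assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease, and the effects of censoring on risk predictions, the team carried out a longitudinal cohort study covering 1 January 1998 to 31 December 2018.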

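The team used data on 3.6 million patients registered at 391 general practices in England, with linked hospital admission and mortality records. Model performance was assessed by discrimination, calibration, and the consistency of individual risk predictions for the same patients among models with comparable performance. Nineteen prediction techniques were applied: 12 families of machine learning models, three Cox proportional hazards models, three parametric survival models, and one logistic model.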

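The various models had similar population level performance. However, predictions of individual cardiovascular disease risk varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks. Patients with a QRISK3 predicted risk of 9.5-10.5% had a risk of 2.9-9.2% in a random forest and 2.4-7.2% in a neural network.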

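The differences in predicted risk between QRISK3 and a neural network ranged from -23.2% to 0.1% (95% range). Models that ignored censoring substantially underestimated cardiovascular disease risk. Of the 223815 patients with a QRISK3 cardiovascular disease risk above 7.5%, 57.8% would be reclassified below 7.5% when using another model.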
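The two disagreement measures quoted here, a 95% range of differences in predicted risk and the share of patients reclassified across the 7.5% threshold, can be sketched in a few lines. This is only an illustration and not the study's code: the function name `compare_predictions`, the sign convention (model B minus model A), and the synthetic risks in the demo are all assumptions.

```python
import numpy as np


def compare_predictions(risk_a, risk_b, threshold=0.075):
    """Summarise disagreement between two models scoring the same patients.

    risk_a, risk_b: predicted risks (as fractions) from two models.
    The sign convention (B minus A) is an assumption of this sketch.
    """
    risk_a = np.asarray(risk_a, dtype=float)
    risk_b = np.asarray(risk_b, dtype=float)

    # 95% range of the per-patient difference in predicted risk.
    diff = risk_b - risk_a
    low, high = np.percentile(diff, [2.5, 97.5])

    # Among patients above the threshold in model A, the share that
    # model B places below the threshold.
    above_a = risk_a > threshold
    if above_a.any():
        reclassified = float(np.mean(risk_b[above_a] < threshold))
    else:
        reclassified = float("nan")

    return {"diff_95_range": (float(low), float(high)),
            "reclassified_below": reclassified}


if __name__ == "__main__":
    # Synthetic risks for illustration only; not data from the study.
    rng = np.random.default_rng(0)
    risk_a = rng.beta(1.0, 12.0, size=10_000)
    risk_b = np.clip(risk_a * rng.lognormal(-0.3, 0.5, size=10_000), 0.0, 1.0)
    print(compare_predictions(risk_a, risk_b))
```

Both measures are computed on the same patients, which is why two models with similar population level performance can still score poorly here.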

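The findings show that, despite similar model performance, the various models predicted very different risks for the same patients. Logistic models and commonly used machine learning models should not be directly applied to the prediction of long term risks without considering censoring.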
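Why ignoring censoring underestimates risk can be seen in a small sketch. It compares a naive estimate that counts censored patients as event free with a Kaplan-Meier estimate, which removes censored patients from the risk set once follow-up ends. This is a generic illustration rather than any of the study's 19 techniques; the function names and the synthetic follow-up data are assumptions.

```python
import numpy as np


def naive_risk(time, event, horizon):
    """Share of patients with an observed event by `horizon`.

    Censored patients are counted as event free, which is the
    assumption the article warns against.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    return float(np.mean(event & (time <= horizon)))


def kaplan_meier_risk(time, event, horizon):
    """Cumulative risk by `horizon` as 1 minus the Kaplan-Meier survival."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    survival = 1.0
    # Walk through the distinct event times up to the horizon.
    for t in np.unique(time[event & (time <= horizon)]):
        at_risk = np.sum(time >= t)            # still under follow-up at t
        events_at_t = np.sum(event & (time == t))
        survival *= 1.0 - events_at_t / at_risk
    return 1.0 - survival


if __name__ == "__main__":
    # Synthetic follow-up for illustration only; not data from the study.
    rng = np.random.default_rng(0)
    n = 5_000
    event_time = rng.exponential(scale=100.0, size=n)
    censor_time = rng.uniform(0.0, 20.0, size=n)
    time = np.minimum(event_time, censor_time)
    event = event_time <= censor_time
    print("naive:", naive_risk(time, event, horizon=10.0))
    print("Kaplan-Meier:", kaplan_meier_risk(time, event, horizon=10.0))
```

When no patient is censored the two estimates coincide. The gap opens only as follow-up ends early for part of the cohort.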

Appendix: original English abstract

Title: Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: longitudinal cohort study using cardiovascular disease as exemplar

Author: Yan Li, Matthew Sperrin, Darren M Ashcroft, Tjeerd Pieter van Staa

Issue&Volume: 2020/11/04

Abstract:

Objective To assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease and the effects of censoring on risk predictions.

Design Longitudinal cohort study from 1 January 1998 to 31 December 2018.

Setting and participants 3.6 million patients from the Clinical Practice Research Datalink registered at 391 general practices in England with linked hospital admission and mortality records.

Main outcome measures Model performance including discrimination, calibration, and consistency of individual risk prediction for the same patients among models with comparable model performance. 19 different prediction techniques were applied, including 12 families of machine learning models (grid searched for best models), three Cox proportional hazards models (local fitted, QRISK3, and Framingham), three parametric survival models, and one logistic model.

Results The various models had similar population level performance (C statistics of about 0.87 and similar calibration). However, the predictions for individual risks of cardiovascular disease varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks. A patient with a risk of 9.5-10.5% predicted by QRISK3 had a risk of 2.9-9.2% in a random forest and 2.4-7.2% in a neural network. The differences in predicted risks between QRISK3 and a neural network ranged between –23.2% and 0.1% (95% range). Models that ignored censoring (that is, assumed censored patients to be event free) substantially underestimated risk of cardiovascular disease. Of the 223815 patients with a cardiovascular disease risk above 7.5% with QRISK3, 57.8% would be reclassified below 7.5% when using another model.

Conclusions A variety of models predicted risks for the same patients very differently despite similar model performances. The logistic models and commonly used machine learning models should not be directly applied to the prediction of long term risks without considering censoring. Survival models that consider censoring and that are explainable, such as QRISK3, are preferable. The level of consistency within and between models should be routinely assessed before they are used for clinical decision making.

DOI: 10.1136/bmj.m3919

Source: https://www.bmj.com/content/371/bmj.m3919

Journal information

BMJ-British Medical Journal: founded in 1840 and published by the BMJ Publishing Group. Latest IF: 27.604
Official website: http://www.bmj.com/
Submission link: https://mc.manuscriptcentral.com/bmj