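A team led by Kevin Maik Jablonka at Friedrich Schiller University Jena, Germany, has recently presented a framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. The work was published in Nature Chemistry on 20 May 2025.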
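Large language models (LLMs) have attracted widespread interest because they can process human language and perform tasks they were not explicitly trained for. However, systematic understanding of the chemical capabilities of LLMs remains limited, even though such understanding is needed to improve the models and mitigate potential harm. The team introduces ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. The researchers curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs, and found that, in their study, the best models on average outperformed the best human chemists.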
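However, the models struggle with some basic tasks and make overconfident predictions. These findings reveal the impressive chemical capabilities of LLMs while underscoring the need for further research to improve their safety and usefulness. The team also suggests adapting chemistry education, and the work demonstrates the value of benchmarking frameworks for evaluating LLMs in specific domains.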
Attached: original English abstract
Title: A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists
Author: Mirza, Adrian, Alampara, Nawaf, Kunchapu, Sreekanth, Ríos-García, Martiño, Emoekabu, Benedict, Krishnan, Aswanth, Gupta, Tanya, Schilling-Wilhelmi, Mara, Okereke, Macjonathan, Aneesh, Anagha, Asgari, Mehrdad, Eberhardt, Juliane, Elahi, Amir Mohammad, Elbeheiry, Hani M., Gil, María Victoria, Glaubitz, Christina, Greiner, Maximilian, Holick, Caroline T., Hoffmann, Tim, Ibrahim, Abdelrahman, Klepsch, Lea C., Köster, Yannik, Kreth, Fabian Alexander, Meyer, Jakob, Miret, Santiago, Peschel, Jan Matthias, Ringleb, Michael, Roesner, Nicole C., Schreiber, Johanna, Schubert, Ulrich S., Stafast, Leanne M., Wonanke, A. D. Dinga, Pieler, Michael, Schwaller, Philippe, Jablonka, Kevin Maik
Issue&Volume: 2025-05-20
Abstract: Large language models (LLMs) have gained widespread interest owing to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here we introduce ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2,700 question–answer pairs, evaluated leading open- and closed-source LLMs and found that the best models, on average, outperformed the best human chemists in our study. However, the models struggle with some basic tasks and provide overconfident predictions. These findings reveal LLMs’ impressive chemical capabilities while emphasizing the need for further research to improve their safety and usefulness. They also suggest adapting chemistry education and show the value of benchmarking frameworks for evaluating LLMs in specific domains.
DOI: 10.1038/s41557-025-01815-x
Source: https://www.nature.com/articles/s41557-025-01815-x
Nature Chemistry: founded in 2009, published by Springer Nature; latest impact factor: 24.274
Official website: https://www.nature.com/nchem/
Submission link: https://mts-nchem.nature.com/cgi-bin/main.plex