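Vikram Gadagkar's group at Columbia University in the United States has shown that natural behaviour is learned through dopamine-mediated reinforcement. The study was published in Nature on 12 March 2025.
Their previous work in adult zebra finches showed that dopamine in Area X, the singing-related basal ganglia, encodes performance prediction error: dopamine is suppressed after worse-than-predicted performance (distorted syllables) and activated after better-than-predicted performance (undistorted syllables).
However, it remained unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement.
In this study, the researchers tracked song learning trajectories in juvenile zebra finches and used fibre photometry to monitor concurrent dopamine activity in Area X. They found that dopamine was activated after syllable renditions that were closer to the eventual adult version of the song than recent renditions were, and suppressed after renditions that were further away.
Furthermore, the relationship between dopamine and song fluctuations revealed that dopamine predicted the future evolution of song, suggesting that dopamine drives behaviour. Finally, dopamine activity was explained by the contrast between the quality of the current rendition and the recent history of renditions, consistent with dopamine's hypothesized role in encoding prediction errors in an actor-critic reinforcement-learning model. Reinforcement-learning algorithms have emerged as a powerful class of model for explaining learning in reward-based laboratory tasks and for driving autonomous learning in artificial intelligence. The results suggest that complex natural behaviours in biological systems can also be acquired through dopamine-mediated reinforcement learning.
According to the researchers, many natural motor skills, such as speaking or locomotion, are acquired through trial-and-error learning over the course of development. It has long been hypothesized, motivated by observations in artificial learning experiments, that dopamine has a crucial role in this process. Dopamine in the basal ganglia is thought to guide reward-based trial-and-error learning by encoding reward prediction errors, decreasing after worse-than-predicted reward outcomes and increasing after better-than-predicted ones.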
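The idea of a prediction error computed as the contrast between the current rendition and the recent history can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes a single scalar song feature, a hypothetical quality measure (negative distance to a target), a running average of recent quality standing in for the critic, and a simple perturbation-based update standing in for the actor. All names and parameter values are illustrative assumptions.

```python
import random


def performance_prediction_error(current_quality, recent_qualities):
    """Contrast between the current rendition's quality and the recent average."""
    baseline = sum(recent_qualities) / len(recent_qualities)
    return current_quality - baseline


def simulate_song_learning(target=1.0, start=0.0, n_renditions=2000,
                           window=20, noise_sd=0.1, learning_rate=0.5, seed=0):
    """Toy learner: explore with noise, reinforce variations that beat recent history.

    Assumptions (not from the study): the song is one number, quality is the
    negative distance to `target`, and the critic is a sliding-window mean.
    """
    rng = random.Random(seed)
    actor_mean = start   # current "motor programme" for the song feature
    recent = []          # quality of the most recent renditions (critic memory)
    for _ in range(n_renditions):
        noise = rng.gauss(0.0, noise_sd)       # trial-to-trial variability
        rendition = actor_mean + noise
        quality = -abs(rendition - target)     # closer to target is better
        if recent:
            ppe = performance_prediction_error(quality, recent)
            # Positive error reinforces this variation, negative error discourages it.
            actor_mean += learning_rate * ppe * noise
        recent.append(quality)
        if len(recent) > window:
            recent.pop(0)
    return actor_mean


if __name__ == "__main__":
    print(simulate_song_learning())
```

In this sketch, a rendition that lands closer to the target than recent ones yields a positive error (analogous to dopamine activation) and one that lands further away yields a negative error (analogous to suppression), so the learner drifts towards the target over many renditions.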
Appendix: original English text
Title: Natural behaviour is learned through dopamine-mediated reinforcement
Author: Jonathan Kasdin, Alison Duffy, Nathan Nadler, Arnav Raha, Adrienne L. Fairhall, Kimberly L. Stachenfeld, Vikram Gadagkar
Issue&Volume: 2025-03-12
Abstract: Many natural motor skills, such as speaking or locomotion, are acquired through a process of trial-and-error learning over the course of development. It has long been hypothesized, motivated by observations in artificial learning experiments, that dopamine has a crucial role in this process. Dopamine in the basal ganglia is thought to guide reward-based trial-and-error learning by encoding reward prediction errors [1], decreasing after worse-than-predicted reward outcomes and increasing after better-than-predicted ones. Our previous work in adult zebra finches, in which we changed the perceived song quality with distorted auditory feedback, showed that dopamine in Area X, the singing-related basal ganglia, encodes performance prediction error: dopamine is suppressed after worse-than-predicted (distorted syllables) and activated after better-than-predicted (undistorted syllables) performance [2]. However, it remains unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Here we tracked song learning trajectories in juvenile zebra finches and used fibre photometry [3] to monitor concurrent dopamine activity in Area X. We found that dopamine was activated after syllable renditions that were closer to the eventual adult version of the song, compared with recent renditions, and suppressed after renditions that were further away. Furthermore, the relationship between dopamine and song fluctuations revealed that dopamine predicted the future evolution of song, suggesting that dopamine drives behaviour. Finally, dopamine activity was explained by the contrast between the quality of the current rendition and the recent history of renditions, consistent with dopamine’s hypothesized role in encoding prediction errors in an actor–critic reinforcement-learning model [4,5].
Reinforcement-learning algorithms [6] have emerged as a powerful class of model to explain learning in reward-based laboratory tasks, as well as for driving autonomous learning in artificial intelligence [7]. Our results suggest that complex natural behaviours in biological systems can also be acquired through dopamine-mediated reinforcement learning.
DOI: 10.1038/s41586-025-08729-1
Source: https://www.nature.com/articles/s41586-025-08729-1
Nature: founded in 1869 and published by Springer Nature. Latest impact factor (IF): 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html