Reward prediction error neurons implement an efficient code for reward
Author: 小柯机器人 (Xiaoke Robot)    Published: 2024/6/23 16:09:45

A team led by Heiko H. Schütt at New York University in the United States has made a new advance: they found that reward prediction error neurons implement an efficient code for reward. The paper was published on June 19, 2024 in Nature Neuroscience.

The researchers used efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population for encoding a reward distribution. They found that the responses of dopaminergic reward prediction error neurons in mice and macaques resemble this efficient code in several ways: the neurons' midpoints are broadly distributed and cover the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions, and lower slopes; and the slopes are higher when the reward distribution is narrower.
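As a rough illustration of these population properties, the sketch below builds a small set of sigmoid-like units whose midpoints tile the reward axis, and whose gains rise and slopes fall with their thresholds. All parameter values, names, and the sigmoid form are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rpe_population_response(reward, midpoints, gains, slopes):
    """Hypothetical population of reward prediction error neurons.

    Each unit responds to a scalar reward with a sigmoid tuning curve;
    higher-threshold (higher-midpoint) units are given higher gains and
    shallower slopes, loosely mirroring the qualitative pattern
    described in the article.
    """
    reward = np.asarray(reward, dtype=float)
    # Result shape: (n_neurons, n_rewards)
    return gains[:, None] / (
        1.0 + np.exp(-slopes[:, None] * (reward[None, :] - midpoints[:, None]))
    )

# Illustrative parameters (not fitted to any data): midpoints spread
# across the reward range, gains increase and slopes decrease with the
# midpoint/threshold.
midpoints = np.linspace(0.5, 4.0, 8)
gains = 1.0 + 0.5 * midpoints
slopes = 3.0 / (1.0 + midpoints)

responses = rpe_population_response(np.array([0.0, 1.0, 2.0, 3.0]),
                                     midpoints, gains, slopes)
print(responses.shape)  # (8, 4): 8 units x 4 reward values
```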

In addition, the study derived learning rules that converge to the efficient code. The learning rule for a neuron's position on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, establishing a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
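The resemblance to distributional reinforcement learning can be illustrated with a generic asymmetric-update sketch, shown below. This is an assumed toy example, not the authors' derived rule: units with different ratios of positive to negative learning rates shift their positions on the reward axis until they settle at different expectiles of the reward distribution, so that together they summarize its shape.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_positions(positions, reward, lr_pos, lr_neg):
    """One asymmetric (expectile-like) update step, as used in
    distributional reinforcement learning: each unit's position on the
    reward axis moves up when the reward exceeds it and down otherwise,
    with unit-specific positive/negative learning rates.
    """
    delta = reward - positions                       # per-unit prediction error
    step = np.where(delta > 0, lr_pos, lr_neg) * delta
    return positions + step

n_units = 16
positions = np.zeros(n_units)
# Unit-specific asymmetries spread from pessimistic to optimistic.
taus = (np.arange(n_units) + 0.5) / n_units
lr_pos, lr_neg = 0.05 * taus, 0.05 * (1.0 - taus)

for _ in range(20000):
    r = rng.normal(2.0, 1.0)                         # example reward distribution
    positions = update_positions(positions, r, lr_pos, lr_neg)

# The sorted positions approximate expectiles of the reward distribution.
print(np.round(np.sort(positions), 2))
```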

Appendix: original English abstract

Title: Reward prediction error neurons implement an efficient code for reward

Author: Schütt, Heiko H., Kim, Dongjae, Ma, Wei Ji

Issue&Volume: 2024-06-19

Abstract: We use efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population to encode a reward distribution. We show that the responses of dopaminergic reward prediction error neurons in mouse and macaque are similar to those of the efficient code in the following ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions and lower slopes; and their slope is higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to the efficient code. The learning rule for the position of the neuron on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.

DOI: 10.1038/s41593-024-01671-x

Source: https://www.nature.com/articles/s41593-024-01671-x

Journal information

Nature Neuroscience: founded in 1998; published by the Springer Nature group. Latest IF: 28.771
Official website: https://www.nature.com/neuro/
Submission link: https://mts-nn.nature.com/cgi-bin/main.plex