Multi-timescale reinforcement learning in the brain

Publication type:
Article
Authors:
Masset, Paul; Tano, Pablo; Kim, HyungGoo R.; Malik, Athar N.; Pouget, Alexandre; Uchida, Naoshige
Affiliations:
Harvard University; Harvard University; McGill University; Mila Quebec Artificial Intelligence Institute; University of Geneva; Sungkyunkwan University (SKKU); Institute for Basic Science - Korea (IBS); Brown University; Lifespan Health Rhode Island; Rhode Island Hospital; Harvard University
Journal:
Nature
ISSN/ISBN:
0028-0836
DOI:
10.1038/s41586-025-08929-9
Publication date:
2025-06-19
Keywords:
reward prediction; dopamine signals; time; code
Abstract:
To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behaviour can be learned through reinforcement learning1, a class of algorithms that has been successful at training artificial agents2-5 and at characterizing the firing of dopaminergic neurons in the midbrain6-8. In classical reinforcement learning, agents discount future rewards exponentially according to a single timescale, known as the discount factor. Here we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopaminergic neurons in mice performing two behavioural tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm for understanding functional heterogeneity in dopaminergic neurons and a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations9-12, and open new avenues for the design of more-efficient reinforcement learning algorithms.
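Illustrative note (not part of the published record): the abstract's core idea, that a population of learners each maintaining its own discount factor produces reward prediction errors at many timescales and, when combined, non-exponential discounting, can be sketched in a few lines of code. The chain task, the set of discount factors and the hyperbolic constant k below are arbitrary assumptions for illustration, not values or code from the paper.

```python
# Minimal sketch, assuming a toy chain task and arbitrary discount factors;
# not the authors' implementation.
import numpy as np

N = 10                      # states 0..9; reward on reaching the last state
REWARD_STATE = N - 1

def step(s):
    """Deterministically advance one state; reward 1.0 at the terminal state."""
    s_next = min(s + 1, REWARD_STATE)
    r = 1.0 if s_next == REWARD_STATE else 0.0
    return s_next, r, s_next == REWARD_STATE

# One TD(0) learner per discount factor ("timescale"); values are assumptions.
gammas = np.array([0.5, 0.7, 0.85, 0.95, 0.99])
alpha = 0.1
V = np.zeros((len(gammas), N))   # V[i, s]: value of state s under gammas[i]

for _ in range(2000):            # episodes
    s, done = 0, False
    while not done:
        s_next, r, done = step(s)
        bootstrap = np.zeros(len(gammas)) if done else V[:, s_next]
        rpe = r + gammas * bootstrap - V[:, s]   # one prediction error per timescale
        V[:, s] += alpha * rpe
        s = s_next

# A weighted readout of the exponentially discounting values can approximate
# a hyperbolic discount curve 1 / (1 + k*d), where d is steps until reward.
k = 0.5                          # hyperbolic constant (assumption)
nonterminal = np.arange(N - 1)
d = (REWARD_STATE - nonterminal).astype(float)
hyperbolic = 1.0 / (1.0 + k * d)
weights, *_ = np.linalg.lstsq(V[:, nonterminal].T, hyperbolic, rcond=None)
mixture = V[:, nonterminal].T @ weights

print("steps-to-reward  hyperbolic  mixture-of-exponentials")
for i, dist in enumerate(d):
    print(f"{int(dist):15d}  {hyperbolic[i]:10.3f}  {mixture[i]:23.3f}")
```

In this toy version, each row of V plays the role of one simulated unit with its own discount time constant, and the least-squares readout shows how a weighted mixture of exponential discounts can approximate a hyperbolic curve, echoing the abstract's point that multi-timescale learning offers a mechanistic basis for non-exponential discounting.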