Research project

Signatures and consequences of Distributional Reinforcement Learning

Authors

Publications

Signatures and Consequences of Distributional Reinforcement Learning
Publication. Sousa, Margarida; Paton, Joe; McNamee, Daniel
"Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)"
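A minimal sketch of the core idea in the abstract, not the authors' code: a population of tabular TD(0) learners, each with its own discount factor, trained on a cue that predicts a reward of fixed magnitude after a fixed delay. Each unit's cue value converges to gamma_i**(delay-1) * magnitude, so two units with distinct discounts already suffice to decode reward time and magnitude jointly, illustrating how discount diversity can support a two-dimensional map of future rewards. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def train_population(gammas, delay, magnitude, alpha=0.1, episodes=3000):
    """TD(0) on a chain: cue = state 0; reward arrives on the `delay`-th transition."""
    gammas = np.asarray(gammas, dtype=float)
    V = np.zeros((len(gammas), delay + 1))       # V[:, delay] is terminal, stays 0
    for _ in range(episodes):
        for s in range(delay):
            r = magnitude if s == delay - 1 else 0.0
            delta = r + gammas * V[:, s + 1] - V[:, s]   # per-unit TD error
            V[:, s] += alpha * delta
    return V[:, 0]                                # converged cue values per unit

def decode_time_and_magnitude(cue_values, gammas):
    """Invert v_i = gamma_i**k * r using two units with distinct discounts."""
    (v1, v2), (g1, g2) = cue_values[:2], gammas[:2]
    k = np.log(v1 / v2) / np.log(g1 / g2)         # number of discounted steps
    r = v1 / g1**k                                # reward magnitude
    return k, r
```

For example, with discounts (0.5, 0.9), a delay of 3 transitions, and magnitude 2, the two cue values alone recover both the timing (2 discounted steps) and the magnitude (2.0); a single shared discount would leave the two quantities confounded.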

Organizational units

Description

Keywords

Contributors

Funders

Funding entity

Fundação para a Ciência e a Tecnologia

Funding programme

Grant number

PD/BD/141552/2018

ID