
Signatures and Consequences of Distributional Reinforcement Learning

Use this identifier to reference this record.

Name: INDP_MargaridaSousa_November2024.pdf | Size: 24.57 MB | Format: Adobe PDF

Abstract(s)

"Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)"
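The core idea described in the abstract, a population of TD learners whose diversity in temporal discounting and in magnitude tuning jointly encodes a two-dimensional reward distribution, can be illustrated with a minimal sketch. This is not the thesis's TMRL implementation: the discount factors, asymmetry parameters, learning rate, and task statistics below are all illustrative assumptions, with magnitude tuning modeled expectile-style via asymmetric learning rates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each unit has its own discount factor (temporal
# discounting diversity) and an asymmetry parameter (magnitude tuning).
gammas = np.array([0.6, 0.8, 0.95])   # rows: discount factors
taus = np.array([0.25, 0.5, 0.75])    # cols: expectile-style asymmetries

V = np.zeros((len(gammas), len(taus)))  # value estimates at the cue
alpha = 0.05                            # base learning rate (assumed)

for _ in range(20_000):
    delay = rng.choice([2, 6])            # reward arrives 2 or 6 steps after cue
    magnitude = rng.choice([1.0, 4.0])    # small or large reward
    # Discounted return seen from the cue, one row per discount factor.
    G = np.outer(gammas ** delay, np.full(len(taus), magnitude))
    delta = G - V                         # prediction errors
    # Asymmetric update: positive errors weighted by tau, negative by (1 - tau),
    # so each column converges toward a different expectile of the returns.
    lr = np.where(delta > 0, taus, 1.0 - taus)
    V += alpha * lr * delta

# High-gamma rows weight delayed rewards more heavily; high-tau columns
# settle on upper expectiles of the discounted-return distribution.
```

Because each unit's steady state depends on both its discount factor and its asymmetry, the matrix `V` varies systematically along both axes, which is the kind of population diversity that, in the thesis, permits decoding a joint map over reward time and magnitude.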

Keywords

Reinforcement learning; dopamine; basal ganglia; timing; decision-making
