| Name: | Description: | Size: | Format: |
|---|---|---|---|
| | | 24.57 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
"Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act. (...)"
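The abstract's core idea, a population of value predictors whose heterogeneous discount factors (temporal tuning) and asymmetric, quantile-like updates (magnitude tuning) jointly encode a two-dimensional map of future reward, can be sketched as a toy simulation. Everything below is an illustrative assumption rather than the thesis's actual TMRL model: the two-atom reward distribution, the specific gamma and tau values, and the use of a quantile-regression TD update as the magnitude code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each unit pairs a discount factor (temporal tuning)
# with a quantile level (magnitude tuning). All values here are assumptions.
gammas = np.array([0.7, 0.9])   # per-unit temporal discounting
taus = np.array([0.2, 0.8])     # per-unit reward-magnitude quantiles

delay = 3                        # cue-to-reward delay, in time steps (assumed)
alpha = 0.01                     # learning rate
V = np.zeros((len(gammas), len(taus)))  # learned cue value per (gamma, tau) unit

for _ in range(30000):
    r = rng.choice([1.0, 3.0])   # toy bimodal reward-magnitude distribution
    for i, g in enumerate(gammas):
        target = (g ** delay) * r            # discounted outcome seen at the cue
        for j, t in enumerate(taus):
            err = target - V[i, j]
            # Quantile-regression TD: asymmetric step sizes make V[i, j] settle
            # near the tau-quantile of the discounted reward distribution.
            V[i, j] += alpha * (t - float(err < 0.0))

# Decoding the time axis: with known gammas, comparing values across
# differently-discounting units recovers the cue-reward delay.
d_hat = np.log(V[0, 1] / V[1, 1]) / np.log(gammas[0] / gammas[1])
```

In this sketch the low-tau units settle near the small reward magnitude and the high-tau units near the large one (each scaled by gamma^delay), so the population jointly carries both the magnitude distribution and, via the gamma axis, the reward delay; `d_hat` recovers the true delay of 3 steps.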
Description
Keywords
Reinforcement learning; dopamine; basal ganglia; timing; decision-making
