
Signatures and Consequences of Distributional Reinforcement Learning

Use this identifier to reference this record.

Name: INDP_MargaridaSousa_November2024.pdf | Size: 24.57 MB | Format: Adobe PDF

Abstract(s)

"Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)"
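The core idea described in the abstract, a population of TD learners whose diversity in temporal discounting and in magnitude tuning jointly encodes a two-dimensional reward distribution, can be illustrated with a minimal sketch. This is not the thesis's TMRL implementation: the discount factors, asymmetry parameters, learning rate, and task statistics below are all illustrative assumptions, with magnitude tuning modeled expectile-style via asymmetric learning rates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each unit has its own discount factor (temporal
# discounting diversity) and an asymmetry parameter (magnitude tuning).
gammas = np.array([0.6, 0.8, 0.95])   # rows: discount factors
taus = np.array([0.25, 0.5, 0.75])    # cols: expectile-style asymmetries

V = np.zeros((len(gammas), len(taus)))  # value estimates at the cue
alpha = 0.05                            # base learning rate (assumed)

for _ in range(20_000):
    delay = rng.choice([2, 6])            # reward arrives 2 or 6 steps after cue
    magnitude = rng.choice([1.0, 4.0])    # small or large reward
    # Discounted return seen from the cue, one row per discount factor.
    G = np.outer(gammas ** delay, np.full(len(taus), magnitude))
    delta = G - V                         # prediction errors
    # Asymmetric update: positive errors weighted by tau, negative by (1 - tau),
    # so each column converges toward a different expectile of the returns.
    lr = np.where(delta > 0, taus, 1.0 - taus)
    V += alpha * lr * delta

# High-gamma rows weight delayed rewards more heavily; high-tau columns
# settle on upper expectiles of the discounted-return distribution.
```

Because each unit's steady state depends on both its discount factor and its asymmetry, the matrix `V` varies systematically along both axes, which is the kind of population diversity that, in the thesis, permits decoding a joint map over reward time and magnitude.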

Keywords

Reinforcement learning; dopamine; basal ganglia; timing; decision-making
