Curious by choice or by chance? Computational noise in learning drives decision variability in volatile environments


May 15, 2019 - 12:00pm
Northwest 243
About the Speaker
Valentin Wyart
Speaker Affiliation: Ecole Normale Supérieure, PSL University

When learning the value of actions in volatile environments, humans make a sizable fraction of 'non-greedy' decisions, which do not maximize expected value. Prominent theories describe these decisions as the result of a compromise between choosing a currently well-valued action and exploring more uncertain, possibly better-valued actions, known as the 'exploration-exploitation' trade-off. However, we have recently shown that the variability of perceptual decisions based on multiple cues is bounded neither by sensory errors nor by choice stochasticity, but by computational noise in probabilistic inference.

We thus reasoned that a substantial fraction of non-greedy decisions may be caused by the same kind of noise during reward-guided learning.

We derived a theoretical formulation of reinforcement learning (RL) which allows for random noise in its core computations. In a series of behavioral, neuroimaging, pupillometric and pharmacological experiments, we quantified the fraction of non-greedy decisions driven by learning noise and identified its neurophysiological substrates. At the behavioral level, we show that more than half of non-greedy decisions are triggered by learning noise alone. At the neurophysiological level, the trial-to-trial variability of learning steps and its impact on behavior could be (1) predicted by BOLD responses in the dorsal anterior cingulate cortex (dACC) and by phasic pupillary dilation, and (2) increased by pharmacological manipulation of the locus coeruleus-norepinephrine (LC-NE) system. Together, these findings delineate an important, yet previously unsuspected, role for the noradrenergic system in the precision of learning in volatile environments.
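To make the idea of noise in the core computations of RL concrete, here is a minimal sketch, not the speaker's actual model. It assumes a two-armed bandit whose reward probabilities occasionally swap, a standard delta-rule value update, and Gaussian update noise whose standard deviation is proportional to the size of the prediction error. The function name `simulate` and the parameters `alpha`, `zeta` and `hazard` are hypothetical, chosen for illustration only. The agent always picks the action with the highest noisy value, so any choice that departs from what a noise-free learner would pick is caused by learning noise alone.

```python
import random


def simulate(n_trials=2000, alpha=0.3, zeta=0.5, hazard=0.05, seed=0):
    """Fraction of choices that differ from a noise-free learner's greedy choice.

    Assumptions (illustrative, not taken from the talk):
      - update noise is Gaussian with SD = zeta * |prediction error|
      - the choice rule is purely greedy (argmax), with no softmax or exploration
    """
    rng = random.Random(seed)
    p_reward = [0.8, 0.2]   # reward probability of each action
    q_noisy = [0.5, 0.5]    # values learned with noisy updates
    q_exact = [0.5, 0.5]    # values learned from the same outcomes, without noise
    n_non_greedy = 0

    for _ in range(n_trials):
        # Volatile environment: reward probabilities occasionally swap.
        if rng.random() < hazard:
            p_reward.reverse()

        # The agent is greedy with respect to its own (noisy) values.
        choice = 0 if q_noisy[0] >= q_noisy[1] else 1
        # What a noise-free learner would have chosen after the same history.
        greedy_exact = 0 if q_exact[0] >= q_exact[1] else 1
        if choice != greedy_exact:
            n_non_greedy += 1

        reward = 1.0 if rng.random() < p_reward[choice] else 0.0

        # Exact delta-rule update.
        q_exact[choice] += alpha * (reward - q_exact[choice])

        # Noisy update: same rule plus noise scaled by the prediction error.
        delta = reward - q_noisy[choice]
        noise = rng.gauss(0.0, zeta * abs(delta))
        q_noisy[choice] += alpha * delta + noise

    return n_non_greedy / n_trials


if __name__ == "__main__":
    for z in (0.0, 0.25, 0.5):
        print(f"zeta={z}: fraction of non-greedy choices = {simulate(zeta=z):.3f}")
```

With `zeta=0` the two learners coincide and every choice is greedy. Any non-greedy choices at `zeta > 0` arise without an exploration mechanism in the choice rule, which is the distinction the abstract draws between learning noise and the exploration-exploitation trade-off.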