Reinforcement learning in continuous multi-dimensional spaces: Gradient ascent and the exploration-exploitation tradeoff
April 16, 2014 - 1:00pm
NW 243
About the Speaker
Yohsuke Miyamoto (Smith Lab)

The algorithms that humans use for reward-based learning are largely unknown. Studies have investigated how humans solve categorical (e.g. n-arm bandit) or single-dimensional (e.g. aiming direction) reinforcement learning tasks. Yet little is known about how humans solve reinforcement learning in the continuous, multi-dimensional task spaces that characterize many ecological reinforcement problems, where exhaustive exploration of the task space is impossible. Here we investigate how humans solve reinforcement learning in a continuous two-dimensional task, using a design that allows us to carefully isolate exploration from exploitation and analyze how they interact during learning. A careful trial-by-trial analysis reveals that humans navigate continuous, multi-dimensional reinforcement learning via a gradient-ascent-like exploitation strategy, in which the tradeoff between exploration and exploitation is modulated by the size of the reward gradient. Interestingly, we find that exploring novel dimensions of the task space, rather than more fully exploiting current knowledge to increase performance more rapidly, improves the ability to accurately determine the direction of the reward gradient, and simultaneously explains both inter-individual differences in gradient-following accuracy and block-to-block differences within individual subjects.
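To make the gradient-ascent-like strategy concrete, here is a minimal sketch of one possible reading of the abstract: a learner climbs a continuous 2-D reward landscape by following a local gradient estimate, while adding exploration noise whose magnitude shrinks as the reward gradient grows. The reward function, the finite-difference gradient estimate, and all parameter values below are illustrative assumptions, not details from the talk.

```python
import numpy as np

def reward(x):
    # Hypothetical smooth 2-D reward landscape (a Gaussian bump at [0.6, -0.4]);
    # the actual task's reward function is not specified in the abstract.
    return np.exp(-np.sum((x - np.array([0.6, -0.4])) ** 2))

def estimate_gradient(x, eps=1e-4):
    # Central finite-difference estimate of the reward gradient at x,
    # standing in for whatever local information a learner actually uses.
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2)
        d[i] = eps
        g[i] = (reward(x + d) - reward(x - d)) / (2 * eps)
    return g

def gradient_ascent_with_exploration(x0, steps=200, lr=0.5, noise_scale=0.05, seed=0):
    # Gradient-ascent-like exploitation plus exploration noise that is
    # damped when the gradient is large -- one way to model a tradeoff
    # "modulated by the size of the reward gradient".
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = estimate_gradient(x)
        explore = noise_scale / (1.0 + np.linalg.norm(g)) * rng.standard_normal(2)
        x = x + lr * g + explore
    return x

final = gradient_ascent_with_exploration([-1.0, 1.0])
```

Under these assumptions, exploration dominates where the gradient is weak (the flat parts of the landscape) and exploitation dominates near the peak, so the trajectory typically settles close to the reward maximum.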