The explore-exploit dilemma in human reinforcement learning


February 11, 2014 - 12:00pm
William James Hall 765
About the Speaker
Robert Wilson (Princeton)

When you go to your favorite restaurant, do you always get the same thing, or do you try something new? Sticking with an old favorite ensures a good meal, but exploring other options might yield something better - or something worse. This simple conundrum, choosing between what you know and what you don't, is called the exploration-exploitation dilemma. Whether it's deciding on a meal, a vacation destination or a life partner, this is an important problem for humans and animals to solve.


In this talk I will discuss how humans solve the explore-exploit dilemma. Theory suggests two distinct strategies: a directed strategy, in which choices are biased toward information, and a random strategy, in which exploration is driven by noise. Here I will show that humans use both approaches and that, furthermore, the mixture of random and directed exploration is optimal in that it maximizes reward in the long run. These results have implications for our understanding of how decisions impact learning, the role of exploration in development and mental disorders, and even for choosing what to eat for dinner.
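
For readers who want a concrete picture of the two strategies, here is a minimal Python sketch. It is an illustration only and not the model presented in the talk: the function names, the count-based information bonus, and the softmax noise rule are all assumptions chosen for simplicity.

```python
import math
import random


def directed_choice(means, counts, bonus=1.0):
    """Directed exploration (illustrative assumption, not the speaker's model).

    Each option's estimated reward gets an information bonus that shrinks
    the more often the option has been sampled, so poorly known options
    are favored.
    """
    scores = [m + bonus / math.sqrt(n + 1) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def random_choice(means, temperature=1.0, rng=random):
    """Random exploration (illustrative assumption): softmax decision noise.

    A higher temperature makes choices noisier; a temperature near zero
    almost always picks the option with the highest estimated reward.
    """
    top = max(means)  # subtract the maximum for numerical stability
    weights = [math.exp((m - top) / temperature) for m in means]
    return rng.choices(range(len(means)), weights=weights)[0]


# The old favorite (option 0) has been tried 10 times; the new dish
# (option 1) has never been tried and has a slightly lower estimate.
means, counts = [1.0, 0.9], [10, 0]
print(directed_choice(means, counts))           # bonus favors the unknown dish
print(directed_choice(means, counts, bonus=0))  # no bonus: pure exploitation
print(random_choice(means, temperature=0.5))    # noisy choice between the two
```

In this sketch the two strategies are separate knobs: `bonus` controls how strongly choices are pulled toward information, and `temperature` controls how much noise drives exploration.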