The Explore-Exploit Tradeoff and The Curse of Familiarity

by Suhas

The explore-exploit tradeoff is everywhere you look.

You are given two coins. One is fair and the other has an unknown bias. You know which is which. You are to play a game with 100 tries. In each try, you must choose one of the coins and toss it. If it comes up heads, you get $1. What should be your strategy to maximize your winnings?

This is a well-known problem called the One-armed Bandit [1]. If you knew which coin had the higher probability of coming up heads, you could simply stick with that coin to maximize your winnings (exploit). Instead, you need to flip the biased coin a few times to estimate its bias, even though this uses up valuable tries (explore). It is also clear that this exploration must be done early in the game rather than later, otherwise the early tries may be wasted on the worse coin. The general, and more interesting, version of the problem has n coins, all with unknown biases, and is called the Multi-armed Bandit problem.
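To make the tradeoff concrete, here is a minimal simulation sketch of the two-coin game. It uses a simple "explore first, then commit" heuristic, which is an assumption on my part and not an optimal strategy. The names `play_game`, `average_winnings` and `n_explore`, and the example biases, are all hypothetical and chosen only for illustration.

```python
import random


def play_game(bias, n_tries=100, n_explore=10, rng=None):
    """Play one game with an explore-then-commit heuristic.

    Flip the coin of unknown bias `n_explore` times, then spend the
    remaining tries on whichever coin looks better. Returns winnings in $.
    """
    rng = rng or random.Random()
    # Explore: every head still pays $1, so exploration is not a total loss.
    heads = sum(rng.random() < bias for _ in range(n_explore))
    # Commit: prefer the unknown coin only if it beat the fair coin's 0.5 rate.
    use_unknown = n_explore > 0 and heads / n_explore > 0.5
    p_heads = bias if use_unknown else 0.5
    remaining = n_tries - n_explore
    return heads + sum(rng.random() < p_heads for _ in range(remaining))


def average_winnings(bias, n_explore, n_games=2000, seed=0):
    """Average winnings over many simulated games (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(play_game(bias, n_explore=n_explore, rng=rng) for _ in range(n_games))
    return total / n_games


if __name__ == "__main__":
    # Hypothetical biases: one coin better than fair, one worse.
    for bias in (0.9, 0.3):
        for n_explore in (0, 5, 10, 30, 100):
            print(bias, n_explore, average_winnings(bias, n_explore))
```

With no exploration the heuristic never leaves the fair coin, so it cannot benefit from a favorable bias. When the unknown coin is worse than fair, each exploratory flip has a lower expected payoff than a fair flip, which is the price paid for the information.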

Here are some more examples of the explore-exploit tradeoff:

  1. How should Facebook decide which posts to display on your news feed? In order to explore your preferences, it sometimes needs to show you posts that you may not like, but doing so gives it valuable information.
  2. Pricing a product requires some exploration to figure out the price point that yields the highest profit.
  3. Careers and education. Education = Explore, Work = Exploit. Jobs are lost to technology over time, and it is necessary for workers to renew their skillset. In some fields technology changes so rapidly that workers find it hard to keep up.

I believe familiarity is simply a special case of a suboptimal point in the explore-exploit space, namely not enough exploration. On a day-to-day basis, we make decisions based on what we are most familiar with. Here are some examples:

  1. As this recent article in Inc. explains, familiarity can make us less efficient — we stop questioning things because we get so used to seeing them. This is true not just at the workplace but also at home and in our relationships.
  2. Brand loyalty is basically sticking to what ‘works’ and what one is familiar with, often in the presence of better alternatives.

Why do we not explore enough, even though it may be the rational thing to do? Perhaps it is because exploiting is fast, whereas exploring is slow. Perhaps it is because we follow the simplistic Law of Effect. Or perhaps it’s because we just aren’t perfectly rational at all, as the prospect theory of Kahneman and Tversky suggests: we value the certain gain (of, say, X) associated with exploiting more than the possibility of a greater gain from exploring (of, say, more than aX with probability 1/a).
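The arithmetic behind that last comparison can be checked with a short sketch. It assumes the exploring gamble pays nothing when it fails (the text above does not say). The function name `expected_gain_explore` and the values of X and a are hypothetical.

```python
from fractions import Fraction


def expected_gain_explore(gain, a):
    """Expected value of a gamble paying `gain` with probability 1/a, else 0."""
    return Fraction(gain) * Fraction(1, a)


X = 10                       # hypothetical certain gain from exploiting
a = 4                        # hypothetical: exploring pays off 1 time in 4
break_even = a * X           # payoff at which exploring ties with exploiting
risky_gain = break_even + 4  # any payoff above a*X

print(expected_gain_explore(break_even, a))  # 10, the same as X
print(expected_gain_explore(risky_gain, a))  # 11, more than X
```

Under this assumption, a payoff of exactly aX makes the two options equal in expectation. Anything above aX favors exploring on average, even though it loses most of the time.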

What is even more curious, however, is that we often decide that what is unfamiliar is somehow ‘bad’ or inferior. For example:

  1. It is difficult for a lot of people to accept and appreciate other religions.
  2. The same is true for understanding cultures from half-way around the world, such as how one eats their food.

I have a theory for why humans look down upon the unfamiliar. In a word, I think it’s because we are lazy. Accepting a completely unfamiliar observation requires making space in our minds, by readjusting things that we are already familiar with, and including the new observation in our mental model of the world. This model evolves slowly, requiring significant energy and effort on our part. We take in vast amounts of information, and form a much more compact, compressed representation of the world around us. An unfamiliar object is something not well explained by our compact model of the world. Therefore, we face the choice of either:

  1. Dismissing the new observation, or
  2. Re-adjusting our mental framework to incorporate this new object.

The former requires almost no energy, while the latter requires significant energy. Optimizing for the short term makes us pick the former. In order to justify the choice, we declare the object to be strange or inferior.

Here are a couple of lessons I draw from this:

  1. Vacations are important, ideally to very unfamiliar places.
  2. Since we begin forming our mental model from an early age, we should strive to expose children to as diverse a set of experiences as possible, just as the coin problem above requires exploration early in the game. Thanks to my parents, I was fortunate to have lived in 3 countries and travelled on 4 continents by the time I was 20.


[1] I haven’t found an optimal solution to this problem in the form stated above. If you have a provably optimal solution, please email me.