This page has been inactive since the closure of Microsoft Research Silicon Valley in September 2014.
This is an umbrella project for several related efforts at Microsoft Research Silicon Valley that address various Multi-Armed Bandit (MAB) formulations motivated by web search and ad placement. The MAB problem is a classical paradigm in Machine Learning in which an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies.
The name "multi-armed bandits" comes from a whimsical scenario in which a gambler faces several slot machines, a.k.a. "one-armed bandits", that look identical at first but produce different expected winnings. The crucial issue is the trade-off between acquiring new information (exploration) and capitalizing on the information available so far (exploitation). While MAB problems have been studied extensively in Machine Learning, Operations Research, and Economics, many exciting questions remain open. One aspect we are particularly interested in is modeling and efficiently using various types of side information that may be available to the algorithm.
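To make the exploration/exploitation trade-off concrete, here is a minimal sketch of the classic epsilon-greedy heuristic on Bernoulli slot machines. This is a textbook baseline for illustration only, not one of the algorithms developed in this project; the arm probabilities and parameters below are made up.

```python
import random

def epsilon_greedy(arm_probs, pulls, epsilon=0.1, seed=0):
    """Play `pulls` rounds: with probability epsilon pick a random arm
    (explore), otherwise pick the arm with the best empirical mean (exploit).
    Each arm i pays 1 with probability arm_probs[i], else 0."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)   # times each arm was pulled
    sums = [0.0] * len(arm_probs)   # total reward collected per arm
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon or 0 in counts:
            i = rng.randrange(len(arm_probs))  # explore: uniformly random arm
        else:
            # exploit: arm with highest empirical mean reward so far
            i = max(range(len(arm_probs)), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < arm_probs[i] else 0.0
        counts[i] += 1
        sums[i] += reward
        total += reward
    return total, counts

# Two "one-armed bandits" with win probabilities 0.3 and 0.6 (hypothetical).
total, counts = epsilon_greedy([0.3, 0.6], pulls=5000)
```

After enough rounds, the exploitation steps concentrate the pulls on the better arm, while the epsilon fraction of exploration keeps estimates for every arm up to date.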
Contact: Alex Slivkins.
Yogi Sharma (Cornell -> Facebook; intern at MSR-SV in summer 2008)
Umar Syed (Princeton -> Google; intern at MSR-SV in summer 2008)
Shaddin Dughmi (Stanford -> USC; intern at MSR-SV in summer 2010)
Ashwinkumar Badanidiyuru (Cornell -> Google; intern at MSR-SV in summer 2012)
The best of both worlds: stochastic and adversarial bandits
Sébastien Bubeck and Alex Slivkins (COLT 2012)
Abstract: We present a new bandit algorithm whose regret is optimal for both adversarial and stochastic rewards, achieving square-root regret in the adversarial case and polylogarithmic regret in the stochastic case.