|
About
NeuroCOLT
Papers
Archive
Books
info@neurocolt.org
|
NeuroCOLT
Technical Report NC-TR-98-025
Gambling
in a rigged casino: The adversarial multi-armed bandit problem
Peter Auer
University of Technology Graz
Nicoň Cesa-Bianchi
Universit`a di Milano
Yoav Freund & Robert E. Schapire
AT&T Labs
Florham Park
NJ
Keywords:
Received:
17-AUG-98
Abstract
In the multi-armed bandit problem, a gambler must decide which arm
of K non-identical slot machines to play in a sequence of trials so
as to maximize his reward. This classical problem has received much
attention because of the simple model it provides of the trade-off
between exploration (trying out each arm to find the best one) and
exploitation (playing the arm believed to give the best payoff). Past
solutions for the bandit problem have almost always relied on assumptions
about the statistics of the slot machines. In this work, we
make no statistical assumptions whatsoever about the nature of the
process generating the payoffs of the slot machines. We give a solution
to the bandit problem in which an adversary, rather than a well-behaved
stochastic process, has complete control over the payoffs.
In a sequence of $T$ plays, we prove that the expected per-round payoff
of our algorithm approaches that of the best arm at the rate %PA changed
from T^{-1/3} $O(T^{-1/2})$, and we give an improved rate of convergence
when the best arm has fairly low payoff. We also prove a general matching
lower bound on the best possible performance of any algorithm in our
setting. In addition, we consider a setting in which the player has
a team of 'experts' advising him on which arm to play; here, we give
a strategy that will guarantee expected payoff close to that of the
best expert. Finally, we apply our result to the problem of learning
to play an unknown repeated matrix game against an all-powerful adversary.
Download Compressed Postscript
|