NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

1994 1995
1996 1997
1998 1999
2000 2001

Books

info@neurocolt.org

NeuroCOLT Technical Report NC-TR-95-001

Worst-Case Analysis of the Bandit Problem

Peter Auer
Technische Universiät Graz
Klosterwiesgasse 32/2
A-8010 Graz, Austria.

Nicolò Cesa-Bianchi
DSI, University of Milan
Via Comelico 39,
I-20135 Milano, Italy.

Abstract
The multi-armed bandit is a classical problem in the area of sequential decision theory and has been studied under a variety of statistical assumptions. In this work we investigate the bandit problem from a purely worst-case standpoint. We present a randomized algorithm with an expected total reward of $G-O(G^{4/5}K^{6/5})$ (disregarding log factors), where $K$ is the number of arms and $G$ is the (unknown) total reward of the best arm. Our analysis holds with no assumptions whatsoever on the way rewards are generated, other than being independent of the algorithm's randomization. Our results can also be interpreted as a novel extension of the on-line prediction model, an intensively studied framework in learning theory.

Download Compressed Postscript