NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

1994 1995
1996 1997
1998 1999
2000 2001
2002

Books

info@neurocolt.org

NeuroCOLT Technical Report NC-TR-02-130


2002-130
Using Confidence Bounds for Exploitation - Exploration Trade offs
Peter Auer


ABSTRACT
We show how a standard tool from statistics - namely confidence bounds can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied
when an algorithm has to make exploitation-versus-exploration
decisions based on uncertain information provided by a random process.

We apply our technique to two models with such an
exploitation-exploration trade-off. For the adversarial bandit
problem with shifting our new algorithm suffers only (ST)^(1/2) regret with high probability over T trials with S shifts. Such a regret bound was previously known only in expectation. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from T^(3/4) to T^(1/2).


Download Postscript