|
NeuroCOLT
Technical Report NC-TR-96-037
Exponentiated
Gradient Versus Gradient Descent for Linear Predictors
Jyrki
Kivinen
University of Helsinki
Finland
Manfred
K. Warmuth
University of California Santa Cruz
USA
Abstract
We consider two algorithm for on-line prediction based on a linear
model. The algorithms are the well-known gradient descent ($\GD$)
algorithm and a new algorithm, which we call $\EGpm$. They both maintain
a weight vector using simple updates. For the $\GD$ algorithm, the
update is based on subtracting the gradient of the squared error made
on a prediction. The $\EGpm$ algorithm uses the components of the
gradient in the exponents of factors that are used in updating the
weight vector multiplicatively. We present worst-case loss bounds
for $\EGpm$ and compare them to previously known bounds for the $\GD$
algorithm. The bounds suggest that the losses of the algorithms are
in general incomparable, but $\EGpm$ has a much smaller loss if only
few components of the input are relevant for the predictions. We have
performed experiments, which show that our worst-case upper bounds
are quite tight already on simple artificial data.
Download Compressed
Postscript
|