NeuroCOLT
workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland
Lodge
"Bayes, MDL, SRM and \sqrt{n} in Classification"
Peter Grunwald (CWI, Amsterdam)
We consider different methods for dealing with model complexity in
classification. Different approaches fall into two main categories:
(A) methods directly based on provable bounds on generalization error.
These include Vapnik's SRM, McAllester's PAC-Bayes methods and
Yamanishi's `generalization' of MDL for 0/1-loss; (B) methods based on
minimizing something interpretable as a codelength of the data (labels).
These
include various forms of `ordinary' MDL and `ordinary' Bayes. Both
categories take into account `complexities' of hypotheses or sets of
hypotheses. However, in several typical cases, the effective
`complexity' of a hypothesis H as defined in approaches A is about
\sqrt(n) times as large than its complexity as defined in approaches B.
Since approaches A are explicitly based on minimizing generalization
error, does this imply that approaches B are suboptimal and/or may
consequently overfit? We address this question in detail. The answer
turns out to be quite surprising irrespective of whether one takes the
point of view of either approach A or B. One of the consequences is a
new and strong loss bound for a Bayesian-style prediction algorithm.