NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

Research Areas

Partners

Coordinator

Events

info@neurocolt.org

 

NeuroCOLT workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland Lodge

"Bayes, MDL, SRM and \sqrt{n} in Classification"

Peter Grunwald (CWI, Amsterdam)


We consider different methods for dealing with model complexity in classification. Different approaches fall into two main categories: (A) methods directly based on provable bounds on generalization error. These include Vapnik's SRM, McAllester's PAC-Bayes methods and Yamanishi's `generalization' of MDL for 0/1-loss; (B) methods based on minimizing something interpretable as a codelength of the data (labels). These include various forms of `ordinary' MDL and `ordinary' Bayes. Both categories take into account `complexities' of hypotheses or sets of hypotheses. However, in several typical cases, the effective `complexity' of a hypothesis H as defined in approaches A is about \sqrt(n) times as large than its complexity as defined in approaches B. Since approaches A are explicitly based on minimizing generalization error, does this imply that approaches B are suboptimal and/or may consequently overfit? We address this question in detail. The answer turns out to be quite surprising irrespective of whether one takes the point of view of either approach A or B. One of the consequences is a new and strong loss bound for a Bayesian-style prediction algorithm.