NeuroCOLT
workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland
Lodge
"PAC-Bayesian Bounds for Kernel Classifiers based on Compression and Margin"
Thore Graepel
The PAC-Bayesian framework is a candidate framework for generalisation
error bounds below one because it provides aposteriori bounds that depend
on the training sample observed. The resulting bounds are low if the prior
belief as expressed by a prior distribution over hypothesis space is
supported by the data.
Two basic quantities have been argued to have an influence on the
generalisation performance of classifiers: compression ratio and margin.
The compression ratio is inherently related to example-based
classification algorithms that make use of (a subset of) the training
sample directly for the classification of test examples such as, for
example, the k-nearest-neighbours classifier. The margin characterises
hypothesis-based classifiers that derive the classification from
thresholding a real-valued function. It turns out that kernel classifiers
share important properties of both example-based and hypothesis-based
classification algorithms and can thus be characterised by compression
ratio and margin.
In this talk I give an introduction to the PAC-Bayesian framework and show
how it can be used to derive bounds based on both compression ratio and
margin for kernel classifiers. For the margin bound a PAC-Bayesian prior
is defined over weight space, and the margin appears as a measure of the
fraction of hypotheses with empirical risk below a certain threshold. In
the case of compression the prior is defined over the space of expansion
coefficients of the weight vector in terms of training inputs. Empirical
results indicate that the bounds provide values below one for
realistic sample size and may be used for model selection. Not surprisingly,
however, they are unable to predict the generalisation error accurately
for real-world examples.