NeuroCOLT
workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland
Lodge
"Combining Holdout and Training Set Based Procedures for Sample
Complexity Bounds"
John Langford
There are two common approaches for bounding the true error rate of a
learned classifier: the holdout set technique which is used in
practice and training set based techniques which are often analyzed
theoretically. The reason that holdout set techniques are used
despite all the theoretical work on training set based techniques is
that training set based techniques are not guaranteed to give tight
true error bounds in comparison to holdout set techniques. Despite
this, the training set based techniques are interesting because they
can _sometimes_ report a true error bound which is tighter than what
is derivable from a holdout set technique. I will discuss an approach
using both sources of information for the true error bound which has
the following two properties:
1. The resulting true error bound is never much worse than the best
of the holdout and training set techniques.
2. The resulting true error bound is sometimes better than the best of
both individual bounds.
These two properties together provide a technique by which the large
theoretical literature on training set based bounds can be immediately applied in practice so as to benefit from the best of both approaches.