NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

Research Areas

Partners

Coordinator

Events

info@neurocolt.org

 

NeuroCOLT workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland Lodge

"Combining Holdout and Training Set Based Procedures for Sample Complexity Bounds"

John Langford


There are two common approaches for bounding the true error rate of a learned classifier: the holdout set technique which is used in practice and training set based techniques which are often analyzed theoretically. The reason that holdout set techniques are used despite all the theoretical work on training set based techniques is that training set based techniques are not guaranteed to give tight true error bounds in comparison to holdout set techniques. Despite this, the training set based techniques are interesting because they can _sometimes_ report a true error bound which is tighter than what is derivable from a holdout set technique. I will discuss an approach using both sources of information for the true error bound which has the following two properties:
1. The resulting true error bound is never much worse than the best of the holdout and training set techniques. 2. The resulting true error bound is sometimes better than the best of both individual bounds. These two properties together provide a technique by which the large theoretical literature on training set based bounds can be immediately applied in practice so as to benefit from the best of both approaches.