NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

Research Areas

Partners

Coordinator

Events

info@neurocolt.org

 

NeuroCOLT workshop
on
Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland Lodge

"On-line Confidence Machines are well-calibrated "

Volodya Vovk


Transductive Confidence Machine (TCM) and its computationally efficient modification, Inductive Confidence Machine (ICM), are ways of complementing machine-learning algorithms with practically useful measures of confidence. This talk will be mainly concerned with the performance of TCM and ICM when used in the on-line mode: at each trial n=1,2,..., given the n-th object and the previous n-1 labelled objects, they are required to output a predictive region containing the n-th object's label with probability equal to or exceeding some a priori chosen confidence level 1-delta. It will be shown that when TCM and ICM are used in this mode, they are well-calibrated, in the sense that they will be wrong with relative frequency at most delta (approaching delta in the case of randomised TCM and ICM) in the long run. This is not just an asymptotic phenomenon: actually, the error probability of randomised TCM and ICM is delta at every trial and, crucially, errors happen independently at different trials. Empirical performance of the TCM based on the Nearest Neighbours algorithm on the standard USPS data set (but randomly permuted, to make sure that the digits and their classifications are independent and identically distributed) will also be reported. In contrast to the standard PAC bounds on the error probability, which typically exceed 1, this TCM is able to predict 90% of the 9298 digits with confidence 99% (i.e., for 90% of digits the predictive region contains only one label at the confidence level 99%); in the second half of the data set (4649 digits), 95% of the digits are predicted with confidence 99%.