|
NeuroCOLT
Technical Report NC-TR-97-011
Randomized
Hypotheses and Minimum Disagreement Hypotheses
Nicolò
Cesa-Bianchi
Università degli Studi di Milan, Italy
Paul
Fischer
Universität Dortmund, Germany
Eli
Shamir
Hebrew University, Israel
Hans
Ulrich Simon
Universität Dortmund, Germany
Abstract
In this paper we prove various results about PAC learning in
the presence of malicious and random classification noise. Our main
theme is the use of randomized hypotheses for learning with small
sample sizes and high malicious noise rates. We show an algorithm
that PAC learns any target class of VC-dimension $d$ using randomized
hypotheses and order of $d/\ve$ training examples (up to logarithmic
factors) while tolerating malicious noise rates even slightly larger
than the information-theoretic bound $\ve/(1+\ve)$ for deterministic
hypotheses. Combined with previous results, this implies that a lower
bound $d/\Delta + \ve/\Delta^2$ on the sample size, where $\eta =
\ve/(1+\ve)-\Delta$ is the malicious noise rate, applies only when
using deterministic hypotheses. We then show that the information-theoretic
upper bound on the noise rate for deterministic hypotheses can be
replaced by $2\ve/(1+2\ve)$ if randomized hypotheses are used. Investigating
further the use of randomized hypotheses, we show a strategy for learning
the powerset of $d$ elements using an optimal sample size of order
$d\ve/\Delta^2$ (up to logarithmic factors) and tolerating a noise
rate $\eta = 2\ve/(1+2\ve)-\Delta$. We complement this result by proving
that this sample size is also necessary for any class $\cC$ of VC-dimension
$d$. We then discuss the performance of the minimum disagreement strategy
under both malicious and random classification noise models. For malicious
noise we show an algorithm that, using deterministic hypotheses, learns
unions of $d$ intervals on the continuous domain $[0,1)$ using a sample
size significantly smaller than that needed by the minimum disagreement
strategy. For classification noise we show, generalizing a result
by Laird, that order of $d/(\ve\Delta^2)$ training examples suffice
(up to logarithmic factors) to learn by minimizing disagreements any
target class of VC-dimension $d$ tolerating random classification
noise rate $\eta = 1/2 - \Delta$. Using a lower bound by Simon, we
also prove that this sample size bound cannot be significantly improved.
Download Compressed Postscript
|