|
NeuroCOLT
Technical Report NC-TR-96-034
Theory and Applications
of Agnostic PAC-Learning with Small Decision Trees
Peter
Auer
University of California Santa Cruz
USA
Robert
C. Holte
University of Ottawa
Canada
Wolfgang
Maass
Technische Universitaet Graz
Austria
Abstract
We exhibit a theoretically founded algorithm $\Tii$ for agnostic PAC-learning
of decision trees of at most 2 levels, whose computation time is almost
linear in the size of the training set. We evaluate the performance
of this learning algorithm $\Tii$ on 15 common ``real-world'' datasets,
and show that for most of these datasets $\Tii$ provides simple decision
trees with little or no loss in predictive power (compared with C4.5).
In fact, for datasets with continuous attributes its error rate tends
to be lower than that of C4.5. To the best of our knowledge this is
the first time that a PAC-learning algorithm is shown to be applicable
to ``real-world'' classification problems. Since one can prove
that $\Tii$ is an agnostic PAC-learning algorithm, $\Tii$ is
guaranteed to produce close to optimal 2-level decision
trees from sufficiently large training sets forany (!) distribution
of data. In this regard $\Tii$ differs strongly from all other learning
algorithms that are considered in applied machine learning, for which
no guarantee can be given about their performance on
new datasets. We also demonstrate that this
algorithm $\Tii$ can be used as a diagnostic tool for the investigation
of the expressive limits of 2-level decision trees. Finally, T2, in
combination with new bounds on the VC-dimension of decision trees
of bounded depth that we derive, provides us now for the first time
with the tools necessary for comparing learning curves of decision
trees for ``real-world'' datasets with the theoretical estimates of
PAC-learning theory.
Download Compressed
Postscript
|