NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

1994 1995
1996 1997
1998 1999
2000 2001

Books

info@neurocolt.org

NeuroCOLT Technical Report NC-TR-96-021

T2 - Computing optimal 2-level decision tree

Peter Auer
Institute forTheoretical Computer Science
Technische Universitaet Graz
Austria

Note: This is a C program available in tarred (compressed) format.

Description
This is a short description of the T2 program discussed in:

P. Auer, R.C. Holte, and W. Maass.
Theory and applications of agnostic PAC-learning with small decision trees.
In Proc. 7th Int. Machine Learning Conf.,
Tahoe City (USA), 1995.

Please see the paper for a description of the algorithm and a discussion of the results. (Note: There is a typo in the paper in Table 2: The Sky2 value for HE is 89.0% instead of 91.0%.)

T2 calculates optimal decision trees up to depth 2. T2 accepts exactly the same input as C4.5, consisting of a name-file, a data-file, and an optional test-file. The output of TREE2 is a decision tree similar to the decision trees of C4.5, but there are some differences.  T2 uses two kinds of decision nodes: (1) discrete splits on an discrete attribute where the node has as many branches as there are possible attribute values, and (2) interval splits of continuous attributes. A node which performs an interval split divides the real line into intervals and has as many branches as there are intervals. The number of intervals is restricted to be (a) at most MAXINTERVALS if all the branches of the decision node lead to leaves, and to be (b) at most 2 otherwise. MAXINTERVALS can be set by the user. The attribute value ``unknown'' is treated as a special attribute value. Each decision node (discrete or continuous) has an additional branch which takes care of unknown attribute values.  T2 builds the decision tree satisfying the above constraints and minimizing the number of misclassifications of cases in the data-file.

Download Compressed Program