|
NeuroCOLT |
Neural Networks and Computational Learning Theory |
||||||||
|
|
NeuroCOLT Technical Report NC-TR-96-021 T2 - Computing optimal 2-level decision tree Peter
Auer Note: This is a C program available in tarred (compressed) format. Description P.
Auer, R.C. Holte, and W. Maass. Please see the paper for a description of the algorithm and a discussion of the results. (Note: There is a typo in the paper in Table 2: The Sky2 value for HE is 89.0% instead of 91.0%.) T2 calculates optimal decision trees up to depth 2. T2 accepts exactly the same input as C4.5, consisting of a name-file, a data-file, and an optional test-file. The output of TREE2 is a decision tree similar to the decision trees of C4.5, but there are some differences. T2 uses two kinds of decision nodes: (1) discrete splits on an discrete attribute where the node has as many branches as there are possible attribute values, and (2) interval splits of continuous attributes. A node which performs an interval split divides the real line into intervals and has as many branches as there are intervals. The number of intervals is restricted to be (a) at most MAXINTERVALS if all the branches of the decision node lead to leaves, and to be (b) at most 2 otherwise. MAXINTERVALS can be set by the user. The attribute value ``unknown'' is treated as a special attribute value. Each decision node (discrete or continuous) has an additional branch which takes care of unknown attribute values. T2 builds the decision tree satisfying the above constraints and minimizing the number of misclassifications of cases in the data-file. |