NeuroCOLT
Technical Report NC-TR-98-018
Practical Algorithms for On-line Sampling
Carlos Domingo and Ricard Gavaldà
UPC
Barcelona
Osamu Watanabe
Tokyo Institute of Technology
Keywords:
On-line; knowledge discovery; data mining
Received:
12-JUN-98
Abstract
One of the core applications of machine learning to
knowledge discovery consists of building a function (a hypothesis),
for instance a decision tree or a neural network, from a given amount
of data, so that it can be used afterwards to predict new instances
of the data.
In this paper, we focus on the particular situation in which
the hypothesis we want to use for prediction is very simple,
and thus the hypothesis class is of feasible size. We study the problem
of determining which hypothesis in the class is almost
the best one. We present two on-line sampling algorithms for
selecting hypotheses, give theoretical bounds on the number of
examples they need, and analyze them experimentally. We compare them
with the commonly used simple batch sampling approach and show that
in most situations our algorithms use far fewer examples.
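
As a rough illustration of the batch baseline mentioned in the abstract, the sketch below computes a sample size that suffices for batch selection, using the standard Hoeffding bound combined with a union bound over the class. The function name and its parameters (`num_hypotheses`, `epsilon`, `delta`) are assumptions made for this illustration; they are not taken from the report, whose own bounds may differ.

```python
import math


def batch_sample_size(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Number of examples sufficient so that, with probability >= 1 - delta,
    every hypothesis's empirical accuracy is within epsilon / 2 of its true
    accuracy (Hoeffding bound plus a union bound over the class).

    Hypothetical helper for illustration only; not the report's algorithm.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not 0.0 < epsilon <= 1.0:
        raise ValueError("epsilon must lie in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Solve 2 * n * exp(-m * epsilon^2 / 2) <= delta for m.
    bound = (2.0 / epsilon ** 2) * math.log(2.0 * num_hypotheses / delta)
    return math.ceil(bound)
```

With that many examples, choosing the hypothesis with the highest empirical accuracy gives one whose true accuracy is within `epsilon` of the best in the class, with probability at least `1 - delta`. The bound is fixed in advance and does not depend on the data, which is the kind of worst-case cost an on-line (adaptive) sampler aims to improve on.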