NeuroCOLT
Technical Report NC-TR-99-054
Finding Relevant Variables in PAC Model with Membership Queries
Guijarro, Tarui & Tsukiji
Abstract
A new research frontier in AI and data mining seeks to develop methods
to automatically discover relevant variables among many irrelevant
ones. In this paper, we present four algorithms that output such crucial
variables in the PAC model with membership queries. The first algorithm
executes the task under any unknown distribution by measuring the
distance between virtual and real targets. The second algorithm exhausts
the virtual version space under an arbitrary distribution. The third
algorithm exhausts a universal set under the uniform distribution. The
fourth algorithm measures the influence of variables under the uniform
distribution. Knowing the number r of relevant variables, the first
algorithm runs in time almost linear in r. The second and third
algorithms use fewer membership queries than the first, but run in time
exponential in r. The fourth algorithm enumerates highly influential
variables in time quadratic in r.
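
To make the notion of variable influence under the uniform distribution concrete, here is a minimal sketch of how it could be estimated with membership queries. It is not the report's fourth algorithm: it is a plain sampling estimator, and it assumes the usual definition of the influence of variable i as the probability, over a uniformly random input x, that flipping bit i changes f(x). The names `estimate_influence`, `influential_variables`, and `target`, and the sample size and threshold, are hypothetical choices made for illustration.

```python
import random


def estimate_influence(f, n, i, samples=2000, rng=None):
    """Estimate Pr_x[f(x) != f(x with bit i flipped)] for uniform x in {0,1}^n.

    f is treated as a membership oracle: each call is one query.
    """
    if rng is None:
        rng = random.Random(0)
    flips = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = list(x)
        y[i] ^= 1  # flip only the i-th bit
        if f(x) != f(y):
            flips += 1
    return flips / samples


def influential_variables(f, n, threshold, samples=2000, seed=0):
    """Return the indices whose estimated influence is at least `threshold`."""
    rng = random.Random(seed)
    return [
        i
        for i in range(n)
        if estimate_influence(f, n, i, samples, rng) >= threshold
    ]


# Hypothetical target on n = 10 variables, of which only 1, 4 and 7 are relevant.
def target(x):
    return x[1] ^ (x[4] & x[7])


if __name__ == "__main__":
    print(influential_variables(target, n=10, threshold=0.25))
```

In this toy target, flipping an irrelevant variable never changes the output, so its estimated influence is exactly zero, while variable 1 has influence 1 and variables 4 and 7 have influence 1/2. The sketch spends `2 * samples` queries per variable and makes no attempt to match the query or time bounds stated in the abstract.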