NeuroCOLT workshop on Generalisation Bounds Less than 0.5
Windsor, 29 April - 2 May 2002
Cumberland Lodge
"PAC-Bayesian Bounds for Kernel Classifiers based on Compression and Margin"
Shai Ben-David (Cornell University)
Many enterprises incorporate information gathered from a variety of data
sources into an integrated input for some learning task. For example, to design
an automated diagnostic tool for some disease, one may wish to integrate data
gathered in many different hospitals. A major obstacle to
such endeavors is that different data sources may vary considerably in the way
they choose to represent related data. In practice, the problem is usually
solved by a manual construction of semantic mappings and translations between
the different sources. Recently there have been attempts to introduce automated
algorithms based on machine learning tools for the construction of such
translations.
In this work we propose a theoretical framework for learning from a collection
of different data sources. Our framework allows a precise mathematical analysis
of the complexity of such tasks, and it provides a tool for the development and
comparison of different learning algorithms. Our main objective, at this stage,
is to demonstrate the usefulness of computational learning theory to this
practically important area and to stimulate further theoretical and experimental
research on questions related to this framework.