NeuroCOLT workshop on Applications of Learning to Text and Images
Cumberland Lodge, Windsor, 30 April - 2 May 2001
Kernel Methods for Text and Hypertext
Nello Cristianini, BIOwulf Technologies, nello@cs.rhul.ac.uk
Abstract
Kernel methods are a powerful new class of learning systems that
combine remarkable performance with an elegant theoretical framework.
In particular, they make it possible to modularise the design
of a learning system, separating the learning module from the
feature-extraction module (the kernel), which interfaces it with the data. In
this talk we discuss the design of kernel functions specifically
for text and hypertext data. Such functions can then be used with
any kernel-based algorithm for tasks such as categorisation, clustering,
ranking and retrieval. Not only does the kernel approach provide a
unified framework for describing well-known strategies from Information
Retrieval, but it also motivates novel techniques. Latent Semantic
Indexing (LSI) is a method for selecting informative subspaces of feature
spaces. It was developed in information retrieval to reveal semantic
information from the co-occurrence of terms across documents. The paper demonstrates
how this method can be implemented implicitly in a kernel-defined
feature space and hence adapted to any kernel-based learning
algorithm and any type of data. Experiments with text and UCI
benchmark data show that the technique can improve generalisation performance
by focussing the attention of a Support Vector Machine on informative
subspaces of the feature space. We will also discuss string kernels,
link kernels, and the general problem of combining kernels (e.g.
combining link and word kernels for hypertext).
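The modular design described in this abstract can be illustrated with a small Python sketch, built on our own assumptions: a bag-of-words kernel normalised to cosine similarity (the document similarity of the vector-space model in Information Retrieval) is passed as a plain function to a dual-form kernel perceptron, which never looks at the words directly. Swapping in a different kernel changes the representation without touching the learner. The helper names, the whitespace tokenisation, the toy corpus and the choice of a perceptron as the learning module are all hypothetical, not taken from the talk.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical helper names; naive lowercase/whitespace tokenisation is an assumption.
def bag_of_words(doc: str) -> Counter:
    """Explicit feature map: term -> count."""
    return Counter(doc.lower().split())

def cosine_kernel(a: str, b: str) -> float:
    """Linear kernel on bag-of-words vectors, normalised to cosine similarity."""
    fa, fb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * fb[term] for term, count in fa.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0

def train_kernel_perceptron(xs: Sequence[str], ys: Sequence[int],
                            kernel: Callable[[str, str], float],
                            epochs: int = 10) -> Callable[[str], int]:
    """Dual-form perceptron: the learner touches the data only through kernel(x, x')."""
    alpha = [0] * len(xs)  # mistake count per training example

    def score(x: str) -> float:
        return sum(a * y * kernel(xj, x) for a, xj, y in zip(alpha, xs, ys) if a)

    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(x) <= 0:      # misclassified or on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:              # training set separated, stop early
            break
    return lambda x: 1 if score(x) > 0 else -1

# Toy corpus (hypothetical): +1 = sport, -1 = finance.
docs = ["the striker scored a goal",
        "the goal keeper saved the match",
        "the central bank raised interest rates",
        "stock markets fell on interest rate fears"]
labels = [1, 1, -1, -1]
predict = train_kernel_perceptron(docs, labels, cosine_kernel)
print(predict("a late goal won the match"), predict("interest rates rise"))  # -> 1 -1
```

Because `train_kernel_perceptron` only receives a callable, the same learner could be reused unchanged with any of the other kernels mentioned in the abstract.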
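The idea of performing LSI implicitly in a kernel-defined feature space can be sketched in Python with NumPy. This assumes uncentred LSI and, for the check only, a linear bag-of-words base kernel so the result can be compared with explicit LSI. With the training Gram matrix decomposed as K = V Lambda V^T and a document x represented by its kernel values k_x = (k(x_1, x), ..., k(x_l, x))^T against the l training documents, the kernel k_LSI(x, z) = k_x^T V_k Lambda_k^(-1) V_k^T k_z needs only kernel evaluations; for a linear base kernel it equals the inner product after projecting both documents onto the top k left singular vectors of the term-document matrix. The function names and the toy matrix are hypothetical.

```python
import numpy as np

# Hypothetical helper names; uncentred eigendecomposition, as in classical LSI.
def fit_lsi_kernel(K_train: np.ndarray, k: int):
    """Eigendecompose the training Gram matrix and keep the top-k directions."""
    eigvals, eigvecs = np.linalg.eigh(K_train)   # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    return eigvecs[:, top], eigvals[top]

def lsi_gram(Ka: np.ndarray, Kb: np.ndarray, V_k: np.ndarray, lam_k: np.ndarray) -> np.ndarray:
    """LSI kernel between two document sets, computed from kernel values only.

    Ka[i, a] = k(x_i, a) for training document x_i and document a (likewise Kb).
    Returns Ka^T V_k diag(1 / lam_k) V_k^T Kb.
    """
    Pa = V_k.T @ Ka / np.sqrt(lam_k)[:, None]   # coordinates along the k latent directions
    Pb = V_k.T @ Kb / np.sqrt(lam_k)[:, None]
    return Pa.T @ Pb

# Toy term-document matrix (rows = terms, columns = documents), linear base kernel.
D = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
K = D.T @ D                                      # Gram matrix of the training documents
V_k, lam_k = fit_lsi_kernel(K, k=2)

q = np.array([[1, 0, 1, 0, 0]], dtype=float).T   # a new document as a term-count column
K_lsi_train = lsi_gram(K, K, V_k, lam_k)
k_lsi_query = lsi_gram(D.T @ q, K, V_k, lam_k)   # new document vs training documents

# Cross-check against explicit LSI: project onto the top-2 left singular vectors of D.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
U_k = U[:, :2]
explicit_train = (U_k.T @ D).T @ (U_k.T @ D)
explicit_query = (U_k.T @ q).T @ (U_k.T @ D)
print(np.allclose(K_lsi_train, explicit_train), np.allclose(k_lsi_query, explicit_query))  # -> True True
```

The check relies on the identity D V_k = U_k Sigma_k, which is why the two routes agree; with a non-linear base kernel only the implicit route is available, and that is exactly what makes the technique applicable to any kernel.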
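String kernels compare documents as sequences of characters rather than as bags of words. As a hedged illustration, the Python sketch below implements the p-spectrum kernel, one of the simplest string kernels: it is the inner product of the vectors counting every contiguous substring of length p. It is chosen here for brevity only; the string kernels discussed in the talk may be different ones (for example, kernels over gapped subsequences). Function names and the default p = 3 are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; p-spectrum kernel chosen only as a simple example.
def spectrum(s: str, p: int) -> Counter:
    """Feature map: count of every contiguous substring of length p."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def p_spectrum_kernel(s: str, t: str, p: int = 3) -> int:
    """Inner product of the two p-spectrum feature vectors."""
    fs, ft = spectrum(s, p), spectrum(t, p)
    return sum(c * ft[sub] for sub, c in fs.items() if sub in ft)

def normalised(kernel, s: str, t: str, **kw) -> float:
    """Cosine normalisation k(s,t) / sqrt(k(s,s) k(t,t)) removes the effect of length."""
    denom = math.sqrt(kernel(s, s, **kw) * kernel(t, t, **kw))
    return kernel(s, t, **kw) / denom if denom else 0.0

print(p_spectrum_kernel("kernel", "kernels"))                                # -> 4
print(round(normalised(p_spectrum_kernel, "kernel", "kernels", p=3), 3))     # -> 0.894
```

Because the kernel matches substrings rather than whole tokens, "kernel" and "kernels" are already similar without any stemming step.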
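For hypertext, a page can be described both by its words and by its links. One simple way to combine the two, sketched here under our own assumptions, is a weighted sum K = w * K_word + (1 - w) * K_link with 0 <= w <= 1: a non-negative combination of positive semi-definite kernel matrices is again positive semi-definite, so the result is still a valid kernel. The link kernel used here (two pages are similar if they link to the same targets, i.e. bibliographic coupling), the weight w = 0.7 and the toy matrices are hypothetical choices for illustration, not the combination method from the talk.

```python
import numpy as np

# Hypothetical helper names and data; convex combination is one simple choice among many.
def normalised_linear_gram(X: np.ndarray) -> np.ndarray:
    """Cosine-normalised Gram matrix of the rows of X (one row per page)."""
    G = X @ X.T
    d = np.sqrt(np.diag(G))
    d[d == 0] = 1.0                       # avoid division by zero for empty rows
    return G / np.outer(d, d)

def combined_kernel(words: np.ndarray, links: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Weighted sum of a word kernel and a link kernel."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * normalised_linear_gram(words) + (1.0 - w) * normalised_linear_gram(links)

# Toy data for 3 pages: bag-of-words counts and outgoing-link indicators (hypothetical).
words = np.array([[2, 1, 0], [1, 1, 0], [0, 0, 3]], dtype=float)
links = np.array([[1, 0, 1, 0], [0, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
K = combined_kernel(words, links, w=0.7)
print(np.round(K, 3))
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # combined matrix is still positive semi-definite
```

Normalising each kernel before summing keeps the two sources on the same scale, so w directly controls their relative influence.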