NeuroCOLT

Neural Networks and Computational Learning Theory

NeuroCOLT workshop on Applications of Learning to Text and Images
Cumberland Lodge, Windsor, 30 April - 2 May 2001

Kernel Methods for Text and Hypertext
Nello Cristianini, BIOwulf technologies (nello@cs.rhul.ac.uk)

Abstract

Kernel methods are a powerful new class of learning systems that combine remarkable performance with an elegant theoretical framework. In particular, they make it possible to modularise the design of a learning system, separating the learning module from the feature-extraction module that interfaces it with the data. In this talk we discuss the design of kernel functions specifically for text and hypertext data. Such functions can then be used with any kernel-based algorithm for tasks such as categorisation, clustering, ranking and retrieval. Not only does the kernel approach provide a unified framework for describing well-known strategies from Information Retrieval, but it also motivates novel techniques.

Latent Semantic Indexing (LSI) is a method for selecting informative subspaces of feature spaces. It was developed for information retrieval to reveal semantic information from patterns of term co-occurrence in documents. The paper demonstrates how this method can be implemented implicitly in a kernel-defined feature space, and hence applied with any kernel-based learning algorithm and to any type of data. Experiments with text and UCI data show that the technique can improve generalisation performance by focussing the attention of a Support Vector Machine onto informative subspaces of the feature space.

We will also discuss string kernels, link kernels, and the general problem of combining kernels (e.g. link and word kernels in hypertext).
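The idea of implementing LSI implicitly in a kernel-defined feature space can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: documents are assumed to be term-count dictionaries, the base kernel is assumed to be the plain bag-of-words inner product, and the names `jacobi_eigh`, `linear_kernel` and `latent_semantic_kernel` are hypothetical. The Gram matrix K of the training documents is eigendecomposed as K = V diag(λ) V^T, and a document x is projected onto the i-th latent direction through (1/sqrt(λ_i)) Σ_j V[j][i] k(x_j, x), so only kernel evaluations are ever needed. A small Jacobi eigensolver stands in for a numerical library routine such as NumPy's `numpy.linalg.eigh`, to keep the example dependency-free.

```python
import math
from collections import Counter


def jacobi_eigh(A, sweeps=50, tol=1e-22):
    """Eigendecomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column i of V is the eigenvector for eigenvalue i.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] * A[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def linear_kernel(x, z):
    """Bag-of-words kernel: inner product of two term-count dictionaries."""
    return sum(count * z.get(term, 0) for term, count in x.items())


def latent_semantic_kernel(train_docs, k, base=linear_kernel):
    """Kernel on the top-k eigendirections of the training Gram matrix (hypothetical helper)."""
    m = len(train_docs)
    K = [[base(a, b) for b in train_docs] for a in train_docs]
    eigvals, V = jacobi_eigh(K)
    order = sorted(range(m), key=lambda i: eigvals[i], reverse=True)
    top = [i for i in order[:k] if eigvals[i] > 1e-12]  # skip null directions

    def project(doc):
        # Coordinates of phi(doc) along u_i = sum_j V[j][i] phi(x_j) / sqrt(lambda_i),
        # computed from kernel evaluations only.
        kx = [base(x, doc) for x in train_docs]
        return [sum(V[j][i] * kx[j] for j in range(m)) / math.sqrt(eigvals[i]) for i in top]

    def kernel(x, z):
        return sum(a * b for a, b in zip(project(x), project(z)))

    return kernel


docs = [Counter(s.split()) for s in (
    "kernel methods for text",
    "kernel methods for hypertext",
    "latent semantic indexing for text",
)]
lsk = latent_semantic_kernel(docs, k=2)
query = Counter("semantic kernel".split())
print(lsk(query, docs[0]), linear_kernel(query, docs[0]))
```

With k equal to the number of (linearly independent) training documents, this construction reproduces the base kernel wherever one argument is a training document; a smaller k keeps only the dominant latent directions, which is the subspace selection the abstract describes.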
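The abstract does not say which string kernels are discussed. As one simple, illustrative instance, and not necessarily the one used in the talk, the p-spectrum kernel below compares two strings by counting the contiguous substrings of length p they share. The function names and the default p = 3 are assumptions made for this sketch; normalising by the self-similarities makes strings of different lengths comparable.

```python
import math
from collections import Counter


def spectrum_features(s, p):
    """Counts of every contiguous substring of length p in s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s, t, p=3):
    """p-spectrum kernel: inner product of the substring-count vectors of s and t."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    return sum(count * ft[u] for u, count in fs.items())  # missing keys count as 0


def normalised(kernel):
    """Wrap a kernel so that every string with at least one feature has self-similarity 1."""
    def k(s, t):
        denom = math.sqrt(kernel(s, s) * kernel(t, t))
        return kernel(s, t) / denom if denom > 0 else 0.0
    return k


k3 = normalised(spectrum_kernel)
print(k3("hypertext", "text"), k3("kernel", "text"))
```

A different substring length can be chosen by wrapping, for example `normalised(lambda s, t: spectrum_kernel(s, t, p=2))`.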
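Combining kernels rests on the standard fact that a non-negative weighted sum of valid kernels is again a valid kernel. The sketch below combines a word kernel with a link kernel for hypertext pages. The page representation (a dict with `words` and `links` entries), the cosine-normalised link-overlap kernel, the example link targets, and the equal default weights are all hypothetical choices for illustration, not the method presented in the talk.

```python
import math
from collections import Counter


def cosine(x, z):
    """Cosine-normalised inner product of two sparse vectors stored as dicts."""
    dot = sum(v * z.get(key, 0) for key, v in x.items())
    nx = math.sqrt(sum(v * v for v in x.values()))
    nz = math.sqrt(sum(v * v for v in z.values()))
    return dot / (nx * nz) if nx and nz else 0.0


def word_kernel(a, b):
    """Bag-of-words kernel on the page text."""
    return cosine(a["words"], b["words"])


def link_kernel(a, b):
    """Hypothetical link kernel: overlap of the pages' outgoing link targets."""
    return cosine(dict.fromkeys(a["links"], 1), dict.fromkeys(b["links"], 1))


def combined_kernel(a, b, w_word=0.5, w_link=0.5):
    """Non-negative weighted sum of two kernels, which is again a valid kernel."""
    assert w_word >= 0 and w_link >= 0, "weights must be non-negative"
    return w_word * word_kernel(a, b) + w_link * link_kernel(a, b)


# Hypothetical pages: term counts plus a set of outgoing link targets.
page1 = {"words": Counter("kernel methods for text".split()),
         "links": {"example.org/svm", "example.org/ir"}}
page2 = {"words": Counter("kernel methods for hypertext".split()),
         "links": {"example.org/svm"}}
print(combined_kernel(page1, page2), combined_kernel(page1, page2, w_word=1.0, w_link=0.0))
```

The weights act as extra parameters of the learning system; one option is to choose them by cross-validation.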