NeuroCOLT
Technical Report NC-TR-02-122
2002-122
On the Application of Diffusion Kernels to Text Data
Jaz Kandola
John Shawe-Taylor
Nello Cristianini
ABSTRACT
Kernel methods, such as Support Vector Machines, have been used successfully for
text categorization. A standard choice of kernel function has been the inner
product between the vector-space representation of two documents, in analogy
with classical information retrieval (IR) approaches. In this paper
we consider diffusion kernels (Kondor, 2001) and their suitability for
text data. We motivate their use from a graph-theoretic framework. We propose
an approach based on alignment for selecting the optimal decay parameter
$\lambda$ in these kernels. We provide experimental results demonstrating that
diffusion kernels are attractive choices for modelling text data.
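
As a rough illustration of the idea in the abstract, the following sketch builds a diffusion kernel on a small document graph and picks $\lambda$ by maximising empirical kernel-target alignment over a grid. It is a minimal sketch under stated assumptions, not the report's procedure: the graph construction (cosine similarity between term-count vectors), the form $K = \exp(-\lambda L)$ with $L$ the graph Laplacian, the grid search, and all function names (`diffusion_kernel`, `alignment`, `select_lambda`) and the toy data are assumptions made for illustration.

```python
import numpy as np

def diffusion_kernel(W, lam):
    """Return K = exp(-lam * L), where L = D - W is the graph Laplacian.

    W is a symmetric, non-negative weight matrix (an assumed graph over documents).
    The matrix exponential is computed through the eigendecomposition of L.
    """
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)
    return (evecs * np.exp(-lam * evals)) @ evecs.T

def alignment(K, y):
    """Empirical alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    y = np.asarray(y, dtype=float)
    # <K, yy^T>_F = y^T K y and ||yy^T||_F = y^T y
    return float(y @ K @ y) / (np.linalg.norm(K) * float(y @ y))

def select_lambda(W, y, grid):
    """Pick the grid value whose diffusion kernel has the highest alignment."""
    scores = [alignment(diffusion_kernel(W, lam), y) for lam in grid]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

# Toy data (hypothetical): four documents over four terms, two classes.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 1, 2, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# Assumed graph: cosine similarity between documents, no self-loops.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
W = Xn @ Xn.T
np.fill_diagonal(W, 0.0)

grid = [0.01, 0.1, 0.5, 1.0, 2.0, 5.0]
best_lam, best_score = select_lambda(W, y, grid)
K = diffusion_kernel(W, best_lam)
print(best_lam, best_score)
```

In practice the alignment would be computed on training labels only, and the selected kernel matrix `K` could then be passed to a kernel classifier that accepts precomputed Gram matrices.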