NeuroCOLT

Neural Networks and Computational Learning Theory

 


NeuroCOLT workshop on Applications of Learning to Text and Images
Cumberland Lodge, Windsor, 30 April - 2 May 2001

"Matrix Decomposition Methods in Information Retrieval"

Thomas Hofmann, Brown University

Online Presentation

Many problems in information retrieval and information filtering involve data that can be represented as a sparse matrix of binary values or frequency counts. Examples include document-term frequencies, user ratings on a set of items, and adjacency matrices encoding the hyperlink graph or citation structure of a document repository. A number of generic questions typically arise in this context. Most prominently, one would like to overcome the sparseness problem, i.e., to reliably estimate probabilities for unobserved or rare events. In addition, deriving low-dimensional data representations and identifying latent factors is often of considerable interest, both as a preprocessing step for subsequent analysis and for visualization. This talk will introduce and discuss methods for matrix decomposition and dimension reduction that address these questions. Several example applications from information retrieval will be used to illustrate this class of methods and to demonstrate the effectiveness of decomposition techniques. These applications will include (i) estimating document-specific language models in ad hoc retrieval, (ii) deriving topic-centered document representations for document categorization, (iii) decomposing user preferences for collaborative filtering, and (iv) learning stochastic models for hyperlink and paper citation graphs. Algorithmic and scalability issues will also be discussed in detail.
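
As a rough illustration of the kind of decomposition described above, the following sketch applies a rank-2 truncated SVD to a tiny document-term count matrix. The abstract does not name a specific method, so truncated SVD is used here only as a familiar stand-in; the matrix values, the term labels in the comments, and the helper name `truncated_svd` are all made up for the example.

```python
import numpy as np


def truncated_svd(X, k):
    """Return the k leading singular triplets of X (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


# Toy document-term counts (invented data). Rows are documents; columns
# might stand for the terms: car, engine, wheel, apple, fruit, juice.
# Documents 0-2 use only the first three terms, documents 3-5 only the
# last three, so there are two obvious latent "topics".
X = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 3],
], dtype=float)

U_k, s_k, Vt_k = truncated_svd(X, k=2)

# Low-dimensional document representation: one 2-d point per document.
doc_coords = U_k * s_k

# Rank-2 reconstruction. Zero counts inside a topic block become small
# positive values, while entries across blocks stay near zero.
X_hat = doc_coords @ Vt_k

print(np.round(doc_coords, 2))
print(np.round(X_hat, 2))
```

In this toy case, document 0 never contains the third term, yet its reconstructed entry is positive because other documents on the same topic use that term. This is one simple way to see how a low-rank model can address sparseness. Plain SVD does not produce probabilities, so a probabilistic decomposition would be needed for the language-model application mentioned in the abstract.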