NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

Research Areas

Partners

Coordinator

Events

info@neurocolt.org

 

NeuroCOLT workshop
on
Applications of Learning to Text and Images
Windsor, 30 April - 2 May 2001
Cumberland Lodge

"Machine Learning for Natural Language"

Michael Collins

Online Presentation (Postscript file)

Many problems in natural language processing require learning of a mapping from one discrete structure to another. Examples are parsing (learning a function from strings to trees); named entity extraction (learning a function from strings to an underlying segmentation identifying people, places, locations and so on); and machine translation (learning a function from sentences in a source language to sentences in a target language). This talk will concentrate on problems of this kind. A very common approach to these tasks is to use some kind of generative model, for example a hidden markov model, or a stochastic context-free grammar (SCFG). Another method is to use "history-based models", as introduced by researchers at IBM. These approaches are probabilistic, in that they attempt to model a joint or conditional probability distribution over possible structures.

Methods from machine learning and computational learning theory have had relatively little application in natural language, and are appealing in several ways -- both from a theoretical perspective (e.g., distribution-free properties) and in practical terms (e.g., the opportunities that kernels or boosting can provide in terms of the feature spaces they can utilize). There are, however, several challenges (and research oppotunities!) in applying these methods to natural language. NLP problems involve rich, discrete structures. Feature spaces are often very high dimensional, and very sparse. The tasks I will talk about are not classification problems, instead involving the choice of correct structure from a large number of candidate alternatives. As a case study, I will talk in some detail about natural language parsing. In the early 1990s a rich source of supervised training data was released to the NLP community: the Penn treebank -- roughly 1 million words of Wall Street Journal text annotated for syntactic structure. Since then there has been much work, and steady progress, on machine learning for parsing. I'll explain how simple SCFGs can be applied to the task, and why they perform poorly. I'll then describe models which are still generative in nature, but are more sensitive to lexical information, giving a substantial increase in performance. Finally, I'll describe more recent work on learning methods which promise to be considerably more flexible in the features they can incorporate. I'll describe how three such approaches -- boosting, support vector machines and markov random fields -- can be applied to parsing. In particular, I will describe kernels for various NLP structures such as trees, hidden markov models and dependency structures. To conclude the talk I will return to related problems such as named entity extraction or machine translation, and describe how similar methods might be applied to these tasks. I'll also talk about "question answering" as a motivating application, and explain why I think it's a natural progression from standard Information Retrieval (*document* retrieval) to retrieving results that are more specific to a user's query (*sub-document* retrieval).

 

Joint work with Steve Abney, Nigel Duffy, Rob Schapire and Yoram Singer.