NeuroCOLT

Neural Networks and Computational Learning Theory

 

About NeuroCOLT

Papers Archive

1994 1995
1996 1997
1998 1999
2000 2001

Books

info@neurocolt.org

NeuroCOLT Technical Report NC-TR-94-015

Grammar Inference and the Minimum Description Length Principle

Peter Grünwald
Centrum voor Wiskunde en Informatica
Kruislaan 413, 1098 SJ Amsterdam
The Netherlands

Abstract
We describe a new abstract model for the computational learning of grammars. The model deals with a learning process in which an algorithm is given an input of a large set of training sentences that belong to some grammar $G$. The algorithm then tries to infer this grammar. Our model is based on the well-known {\em Minimum Description Length Principle}. It turns out that our model is, in a certain sense, a more general version of two seemingly different well-known other ones. Also, two other existing models turn out to be very similar to ours. We have made an initial implementation of the algorithm implied by the model.  We have tried this implementation on natural language texts, and we give a short description of the results of these tests.  The results of testing the algorithm in practice are quite interesting, but unfortunately they are neither encouraging nor discouraging enough to indicate whether our method of grammar induction, which hardly makes any use of any linguistic principles and makes no use at all of any semantical information, is really worth pursuing further.

 

Download Compressed Postscript