|
NeuroCOLT
Technical Report NC-TR-94-015
Grammar Inference
and the Minimum Description Length Principle
Peter
Grünwald
Centrum voor Wiskunde en Informatica
Kruislaan 413, 1098 SJ Amsterdam
The Netherlands
Abstract
We describe a new abstract model for the computational learning of
grammars. The model deals with a learning process in which an algorithm
is given an input of a large set of training sentences that belong
to some grammar $G$. The algorithm then tries to infer this grammar.
Our model is based on the well-known {\em Minimum Description Length
Principle}. It turns out that our model is, in a certain sense, a
more general version of two seemingly different well-known other ones.
Also, two other existing models turn out to be very similar to ours.
We have made an initial implementation of the algorithm implied by
the model. We have tried this implementation on natural language
texts, and we give a short description of the results of these tests.
The results of testing the algorithm in practice are quite interesting,
but unfortunately they are neither encouraging nor discouraging enough
to indicate whether our method of grammar induction, which hardly
makes any use of any linguistic principles and makes no use at all
of any semantical information, is really worth pursuing further.
Download Compressed
Postscript
|