Using Morphology and Syntax Together in Unsupervised Learning
Authors
Abstract
Unsupervised learning of grammar matters in many areas, ranging from text preprocessing for information retrieval and classification to machine translation. We describe an MDL-based grammar of a language that contains morphology and lexical categories. We use an unsupervised learner of morphology to bootstrap the acquisition of lexical categories, and we run these two learning processes iteratively so that each helps and constrains the other. To do so, we need to make our existing morphological analysis less fine-grained. We present an algorithm for collapsing morphological classes (signatures) by using syntactic context. Our experiments demonstrate that this collapse preserves the relation between morphology and lexical categories within the new signatures, and thereby minimizes the description length of the model.
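The idea of collapsing signatures by syntactic context can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how context is represented or when two signatures are merged. The sketch assumes each signature comes with a count vector of neighbouring-word contexts, and it uses cosine similarity with a fixed threshold as a stand-in merge criterion. The names `cosine`, `collapse_signatures`, and `threshold`, the context labels, and the toy counts are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def collapse_signatures(contexts: dict[str, Counter],
                        threshold: float = 0.8) -> list[set[str]]:
    """Greedily group signatures whose context vectors are similar.

    Assumption: `contexts` maps a signature label (e.g. "NULL.s") to counts
    of the syntactic contexts in which its words occur. The threshold and
    the greedy first-fit strategy are illustrative choices only.
    """
    clusters: list[tuple[set[str], Counter]] = []
    for sig, ctx in contexts.items():
        for members, pooled in clusters:
            if cosine(pooled, ctx) >= threshold:
                members.add(sig)
                pooled.update(ctx)  # pool counts of the merged signatures
                break
        else:
            # No sufficiently similar cluster exists: start a new one.
            clusters.append(({sig}, Counter(ctx)))
    return [members for members, _ in clusters]


# Toy, made-up context counts for three signatures.
toy_contexts = {
    "NULL.s": Counter({"the_": 8, "a_": 5, "_is": 3}),
    "NULL.s.'s": Counter({"the_": 6, "a_": 4, "_is": 2}),
    "NULL.ed.ing": Counter({"to_": 7, "will_": 4, "_the": 5}),
}
collapsed = collapse_signatures(toy_contexts)
```

With these toy counts, the two noun-like signatures share nearly identical context distributions and end up in one group, while the verb-like signature stays separate. An MDL-driven learner would presumably accept a merge only if it shortens the total description length; that check is omitted here.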
Similar resources
Unsupervised Learning of Morphology by using Syntactic Categories
This paper presents a method for unsupervised learning of morphology that exploits the syntactic categories of words. Previous research [4][12] on learning of morphology and syntax has shown that both kinds of knowledge affect each other, making it possible to use one type of knowledge to help the other. In this work, we make use of syntactic information, i.e. Part-of-Speech (PoS) tags of words t...
Biologically-Motivated Machine Learning of Natural Language and Ontology A Computational Cognitive Model
The individual cognitive science disciplines all have contributions to make to the understanding and modelling of human learning. Our previous research has explored unsupervised learning of phonology, morphology and low-level syntax, as well as basic noun, verb and preposition ontology and semantics, plus musical and speech prosody. Successful applications using a mix of supervised and unsuperv...
Modeling Acquisition of Word Structure with Lexicalized Grammar Learning
This paper introduces a framework for learning structure in natural languages, and reports results from a simple application of it to learning word-syntax of an agglutinative language in an unsupervised manner. Arguably, the learning environment of children acquiring languages provides more information—by means of linguistic interaction and extralinguistic information present in the learning se...
Efficient, Correct, Unsupervised Learning for Context-Sensitive Languages
A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this paper we present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languag...