Massively parallel learning of part-of-speech disambiguation
نویسندگان
چکیده
This paper presents a method for massively parallel learning of part-of-speech disambiguation based on a minmax modular neural network model. The method has three main steps. Firstly, a large-scale tagging problem is decomposed into a number of relatively smaller and simpler subproblems according to the class relations among a given training corpus. Secondly, all of the subproblems are learned by small network modules in parallel. Finally, following two module combination principles, all of the trained network modules are integrated into a modular parallel tagging system that produces solutions to the original tagging problem. The proposed method has several advantages over the existing tagging systems based on multilayer perceptrons. (1) Training times can be drastically reduced and desired learning accuracy can be easily achieved; (2) the method can scale up to larger tagging problems; and (3) the tagging system has quick response and facilitates hardware implementation. In order to demonstrate the effectiveness of the proposed method, we perform simulations on two different language corpora: a Thai corpus and a Chinese corpus, which have 29,028 and 45,595 ambiguous words, respectively. We also compare our method with several existing tagging models such as hidden Markov models, multilayer perceptrons and neuro-taggers. The results show that both the learning accuracy and generalization performance of the proposed tagging model are better than statistical models and multilayer perceptrons, and they are comparable to the most successful tagging models.
منابع مشابه
معرفی رویکردی ماشینی با استفاده از الگوریتم لسک و برچسبدهی نحوی جهت رفع ابهام از معنای کلمات
The present study introduces a machine-based approach for word sense disambiguation (WSD). In Persian, a morphologically complex language, POS tag which lots of homographs are made, one way for doing WSD is allocating the right Part Of Speech (POS) tags to words prior to WSD. Since the frequency of noun and adjective homographs in different Persian POS tag text corpuses is high, POS tag disambi...
متن کاملUnsupervised Learning of Disambiguation Rules for Part of Speech Tagging
In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate t...
متن کاملTAKTAG: Two-phase learning method for hybrid statistical/rule-based part-of-speech disambiguation
Both statistical and rule-based approaches to part-of-speech (POS) disambiguation have their own advantages and limitations. Especially for Korean, the narrow windows provided by hidden markov model (HMM) cannot cover the necessary lexical and longdistance dependencies for POS disambiguation. On the other hand, the rule-based approaches are not accurate and flexible to new tag-sets and language...
متن کاملLearning part of speech disambiguation rules using Inductive Logic Programming
A pilot study on inducing rules for part of speech tagging of unrestricted Swedish text is reported. Using the Progol machine-learning system, Constraint Grammar inspired rules were learnt from the part of speech tagged Stockholm-Ume a Corpus. Several thousand disambiguation rules discarding faulty readings of ambiguously tagged words were induced. When tested on unseen data, 97% of the words r...
متن کاملMassively Parallel Part of Speech Tagging Using Min-Max Modular Neural Networks
This paper presents a massively parallel tagging method for automatically assigning the correct part of speech (POS) tag to each ambiguous word in a sentence in the context of the sentence. This method is based on the min-max modular neural network, an e cient modular neural network model for solving large-scale pattern recognition problems. The method has two attractive features. One is that i...
متن کامل