Supervised PP-Attachment Disambiguation for Swedish (Combining Unsupervised & Supervised Training Data)
Abstract
This paper applies machine learning techniques to the prepositional-phrase (PP) attachment ambiguity problem. Because machine learning requires large numbers of training instances, the paper also reports on acquiring such data through a mixture of unsupervised and restricted supervised methods. Training was performed both on a subset of the Gothenburg Lexical Database (GLDB) and on a combination of instances from large corpora. Testing was performed using a range of different algorithms and metrics. The application language is written Swedish.
Related papers
Combining Unsupervised and Supervised Methods for PP Attachment Disambiguation
Statistical methods for PP attachment fall into two classes according to the training material used: first, unsupervised methods trained on raw text corpora and second, supervised methods trained on manually disambiguated examples. Usually supervised methods win over unsupervised methods with regard to attachment accuracy. But what if only small sets of manually disambiguated material are avail...
The Effect of Corpus Size in Combining Supervised and Unsupervised Training for Disambiguation
We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The supervised component is Collins’ parser, trained on the Wall Street Journal. The unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. We find that the combin...
Improving PP Attachment Disambiguation in a Rule-based Parser
This paper deals with how to enhance the performance of a rule-based parser using statistical information. PP (prepositional phrase) attachment ambiguity is one of the main ambiguities found in parsing. We therefore conducted some experiments on extracting statistical information for PP attachment from a corpus, and on applying such information to a rule-based parser. Two types of information a...
Corpus Based PP Attachment Ambiguity Resolution with a Semantic Dictionary
This paper deals with two important ambiguities of natural language: prepositional phrase attachment and word sense ambiguity. We propose a new supervised learning method for PP attachment based on a semantically tagged corpus. Because no sufficiently large sense-tagged corpus exists, we also propose a new unsupervised context-based word sense disambiguation algorithm which amends the tra...
Unsupervised Learning of Syntactic Knowledge: Methods and Measures
Supervised methods for ambiguity resolution learn in "sterile" environments, in the absence of syntactic noise. However, in many language engineering applications manually tagged corpora are neither available nor easily implemented. On the other hand, the "exportability" of disambiguation cues acquired from a given, noise-free domain (e.g. the Wall Street Journal) to other domains is not obvious. Unsu...