Impact of MWE Resources on Multiword Recognition
نویسندگان
چکیده
In this paper, we demonstrate the impact of Multiword Expression (MWE) resources in the task of MWE recognition in text. We present results based on the Wiki50 corpus for MWE resources, generated using unsupervised methods from raw text and resources that are extracted using manual text markup and lexical resources. We show that resources acquired from manual annotation yield the best MWE tagging performance. However, a more finegrained analysis that differentiates MWEs according to their part of speech (POS) reveals that automatically acquired MWE lists outperform the resources generated from human knowledge for three out of four classes.
منابع مشابه
Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM
A web based Manipuri corpus is developed for identification of reduplicated multiword expression (MWE) and multiword named entity recognition (NER). Manipuri is one of the rarely investigated language and its resources for natural language processing are not available in the required measure. The web content of Manipuri is also very poor. News corpus from a popular Manipuri news website is coll...
متن کاملProjecting Multiword Expression Resources on a Polish Treebank
Multiword expressions (MWEs) are linguistic objects containing two or more words and showing idiosyncratic behavior at different levels. Treebanks with annotated MWEs enable studies of such properties, as well as training and evaluation of MWE-aware parsers. However, few treebanks contain full-fledged MWE annotations. We show how this gap can be bridged in Polish by projecting 3 MWE resources o...
متن کاملPARSEME Survey on MWE Resources
This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 Cost Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multiword datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult to find. In many cases, language resources such a...
متن کاملThe ATILF-LLF System for Parseme Shared Task: a Transition-based Verbal Multiword Expression Tagger
We describe the ATILF-LLF system built for the MWE 2017 Shared Task on automatic identification of verbal multiword expressions. We participated in the closed track only, for all the 18 available languages. Our system is a robust greedy transition-based system, in which MWE are identified through a MERGE transition. The system was meant to accommodate the variety of linguistic resources provide...
متن کاملMWU-Aware Part-of-Speech Tagging with a CRF Model and Lexical Resources
This paper describes a new part-of-speech tagger including multiword unit (MWU) identification. It is based on a Conditional Random Field model integrating language-independent features, as well as features computed from external lexical resources. It was implemented in a finite-state framework composed of a preliminary finite-state lexical analysis and a CRF decoding using weighted finitestate...
متن کامل