Proposal for MWE Annotation in Running Text
نویسندگان
چکیده
We present a proposal for the annotation of multi-word expressions in a 1M corpus of contemporary portuguese. Our aim is to create a resource that allows us to study multi-word expressions (MWEs) in their context. The corpus will be a valuable additional resource next to the already existing MWE lexicon that was based on a much larger corpus of 50M words. In this paper we discuss the problematic cases for annotation and proposed solutions, focusing on the variational properties of MWEs.
منابع مشابه
MWE in Portuguese: Proposal for a Typology for Annotation in Running Text
Based on a lexicon of Portuguese MWE, this presentation focuses on an ongoing work that aims at the creation of a typology that describes these expressions taking into account their semantic, syntactic and pragmatic properties. We also plan to annotate each MWEentry in the mentioned lexicon according to the information obtained from that typology. Our objective is to create a valuable resource,...
متن کاملImpact of MWE Resources on Multiword Recognition
In this paper, we demonstrate the impact of Multiword Expression (MWE) resources in the task of MWE recognition in text. We present results based on the Wiki50 corpus for MWE resources, generated using unsupervised methods from raw text and resources that are extracted using manual text markup and lexical resources. We show that resources acquired from manual annotation yield the best MWE taggi...
متن کاملAnnotation of Multi-Word Expressions in Czech Texts
Multi-word expressions (MWEs) are difficult to define and also difficult to annotate. Some of them cause serious errors in the traditional annotation pipeline tokenization – morphological analysis – morphological disambiguation. Many cases of incorrect annotation in Czech corpora are known. To narrow the research topic, we focus only in fixed MWEs – those with fixed word order and no ellidable ...
متن کاملTED-MWE: a bilingual parallel corpus with MWE annotation Towards a methodology for annotating MWEs in parallel multilingual corpora
English. The translation of Multiword expressions (MWE) by Machine Translation (MT) represents a big challenge, and although MT has considerably improved in recent years, MWE mistranslations still occur very frequently. There is the need to develop large data sets, mainly parallel corpora, annotated with MWEs, since they are useful both for SMT training purposes and MWE translation quality eval...
متن کاملProject proposal Automatic extraction and evaluation of MWE: adapting method to French Language Technology: Research and Development
Our project is based on the theme of Multi Word Expressions (MWE) we will focus on the problem of extraction. This task is important for improving lexical resources used for tasks such as tokenization, parsing or translation. In our study we will work on a French corpus. Our aim will be to not only select but also validate automatically which candidates are the true ones. If we have time we wil...
متن کامل