Automatic Sense Tagging Using Parallel Corpora
Authors
Abstract
This article reports the results of an analysis of translation equivalents in six languages from different language families, extracted from an on-line parallel corpus of George Orwell’s Nineteen Eighty-Four. The goal is to determine sense distinctions that can be used to automatically sense-tag the data. Our results show that sense distinctions derived from cross-lingual information correspond to those made by human annotators, especially at the coarse-grained level. We also show that, at finer-grained levels, sense assignments produced automatically from cross-lingual data are about as reliable as those made by human annotators.
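As a rough illustration of the idea in the abstract, the sketch below groups occurrences of one English word by how they are translated across languages. It is a minimal sketch under stated assumptions, not the article's procedure: the single-link grouping, the `threshold` value, the function names, and the toy French/German/Spanish translations are all hypothetical and are not taken from the Nineteen Eighty-Four corpus.

```python
from itertools import combinations


def translation_distance(a, b):
    """Fraction of languages in which two occurrences are translated differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_by_translations(occurrences, threshold=0.5):
    """Single-link grouping (an assumed choice): occurrences whose distance
    is below `threshold` end up in the same group."""
    parent = list(range(len(occurrences)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(occurrences)), 2):
        if translation_distance(occurrences[i], occurrences[j]) < threshold:
            parent[find(i)] = find(j)

    # Relabel group representatives as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(occurrences))]


# Toy data (invented): four occurrences of English "bank", each with its
# translation in three languages.
occurrences = [
    ("banque", "Bank", "banco"),
    ("banque", "Bank", "banco"),
    ("rive", "Ufer", "orilla"),
    ("rive", "Ufer", "ribera"),
]
groups = group_by_translations(occurrences)
print(groups)
```

Each resulting group can be read as a candidate sense: occurrences that are translated alike in most languages are tagged together, and a lower `threshold` yields finer-grained groups.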
Similar resources
Parallel Corpora for WordNet Construction: Machine Translation vs. Automatic Sense Tagging
In this paper we present a methodology for WordNet construction based on the exploitation of parallel corpora with semantic annotation of the English source text. We are using this methodology for the enlargement of the Spanish and Catalan versions of WordNet 3.0, but the methodology can also be used for other languages. As big parallel corpora with semantic annotation are not usually available...
An Unsupervised Method For Multilingual Word Sense Tagging Using Parallel Corpora
With an increasing number of languages making their way to our desktops every day via the Internet, researchers have come to realize the lack of linguistic knowledge resources for scarcely represented/studied languages. In an attempt to bootstrap some of the required linguistic resources for some of those languages, this paper presents an unsupervised method for automatic multilingual word sense...
An Unsupervised Method for Multilingual Word Sense Tagging Using Parallel Corpora: A Preliminary Investigation
With an increasing number of languages making their way to our desktops every day via the Internet, researchers have come to realize the lack of linguistic knowledge resources for scarcely represented/studied languages. In an attempt to bootstrap some of the required linguistic resources for some of those languages, this paper presents an unsupervised method for automatic multilingual word sense...
Sense Discrimination with Parallel Corpora
This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks. Our results show that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators. Because our approach is fully automated through a...
Bulgarian X-language Parallel Corpus
The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent ...