Pivot-based multilingual dictionary building using Wiktionary
Author
Judit Ács, Research Institute for Linguistics, Hungarian Academy of Sciences
Abstract
We describe a method for expanding existing dictionaries in several languages by discovering previously non-existent links between translations. We call this method triangulation, and we present and compare several variations of it. We assess precision manually, and recall by comparing the extracted dictionaries with independently obtained basic vocabulary sets. We featurize the translation candidates and train a maximum entropy classifier to identify correct translations in the noisy data.
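The abstract does not spell out the triangulation procedure, so the following is only a minimal sketch of the general idea under stated assumptions: two words in different languages that share a translation (a pivot) but are not yet linked become a translation candidate. The function name `triangulate`, the `(language, word)` tuple representation, and the toy word pairs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def triangulate(pairs):
    """Propose new translation links through shared pivot words.

    `pairs` is an iterable of ((lang, word), (lang, word)) tuples that
    stand for known translation links (hypothetical input format).
    Returns a dict mapping each unlinked candidate pair to the set of
    pivot words that connect its two members.
    """
    # Build an undirected graph of known translations.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    candidates = defaultdict(set)
    for pivot, linked in neighbours.items():
        for src in linked:
            for tgt in linked:
                # Skip same-language pairs and keep one order per pair.
                if src[0] >= tgt[0]:
                    continue
                # Skip pairs that are already linked directly.
                if tgt in neighbours[src]:
                    continue
                candidates[(src, tgt)].add(pivot)
    return dict(candidates)


# Toy data (invented for illustration only).
known = [
    (("en", "dog"), ("hu", "kutya")),
    (("en", "dog"), ("de", "Hund")),
    (("fr", "chien"), ("hu", "kutya")),
    (("fr", "chien"), ("de", "Hund")),
]
found = triangulate(known)
```

In this toy graph, `found` proposes a German-Hungarian link (`Hund`, `kutya`) supported by two pivots, `dog` and `chien`. The number of supporting pivots is one plausible signal that could be passed to a classifier as a feature; whether the paper uses it is an assumption here, not something the abstract states.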
Similar resources
Dbnary: Wiktionary as a Lemon Based RDF Multilingual Lexical Resource
Contributive resources, such as Wikipedia, have proved to be valuable to Natural Language Processing or multilingual Information Retrieval applications. This work focusses on Wiktionary, the dictionary part of the resources sponsored by the Wikimedia foundation. In this article, we present our effort to extract multilingual lexical data from Wiktionary data and to provide it to the community as...
DBnary: Wiktionary as a Lemon-based multilingual lexical resource in RDF
Contributive resources, such as Wikipedia, have proved to be valuable to Natural Language Processing or multilingual Information Retrieval applications. This work focusses on Wiktionary, the dictionary part of the resources sponsored by the Wikimedia foundation. In this article, we present our extraction of multilingual lexical data from Wiktionary data and to provide it to the community as a M...
Multilingual Ontology Matching based on Wiktionary Data Accessible via SPARQL Endpoint
Interoperability is a feature required by the Semantic Web. It is provided by the ontology matching methods and algorithms. But now ontologies are presented not only in English, but in other languages as well. It is important to use an automatic translation for obtaining correct matching pairs in multilingual ontology matching. The translation into many languages could be based on the Google Tr...
Dbnary: Wiktionary as a LMF based Multilingual RDF network
Contributive resources, such as Wikipedia, have proved to be valuable in Natural Language Processing or Multilingual Information Retrieval applications. This article focusses on Wiktionary, the dictionary part of the collaborative resources sponsored by the Wikimedia
Semi-automatic enrichment of crowdsourced synonymy networks: the WISIGOTH system applied to Wiktionary
Semantic lexical resources are a mainstay of various Natural Language Processing applications. However, comprehensive and reliable resources are rare and not often freely available. Handcrafted resources are too costly for being a general solution while automatically-built resources need to be validated by experts or at least thoroughly evaluated. We propose in this paper a picture of the curre...