Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv
Abstract
WordNet is a standard semantic resource for several Natural Language Processing tasks and is available for an increasing number of languages. The Croatian Wordnet (CroWN) was a relatively small resource with 10,026 synsets and 31,367 synset-variant pairs, covering only 45.91% of the so-called Core WordNet. Compared with the Princeton WordNet for English version 3.0, which has 117,659 synsets and 206,975 synset-variant pairs, it is clear that CroWN should be expanded. The first experiments in expanding CroWN were performed using the WN-Toolkit, a set of Python programs for wordnet creation and expansion using dictionary, BabelNet and parallel-corpora based strategies. The WN-Toolkit had previously been applied successfully to other languages such as Spanish, Catalan and Galician. After this first expansion, CroWN covered 70.63% of the Core WordNet. In the second step we used CroDeriv, a derivational database for Croatian, together with the manual creation of 1,457 synset-variant pairs, until reaching 100% of the Core WordNet. After the second step was completed, CroWN reached 23,137 synsets and 47,931 synset-lemma pairs.
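As a rough illustration of the dictionary-based expansion and the coverage figures mentioned in the abstract, the following minimal Python sketch proposes target-language variants for synsets and computes coverage of a core synset list. It is not the WN-Toolkit code: the function names `expand_with_dictionary` and `core_coverage`, the restriction to English lemmas that belong to a single synset, and all of the toy data are assumptions made only for this example.

```python
from collections import defaultdict


def expand_with_dictionary(english_variants, bilingual_dict):
    """Propose target-language variants for synsets via a bilingual dictionary.

    english_variants: dict mapping synset id -> list of English lemmas
    bilingual_dict:   dict mapping English lemma -> list of target lemmas

    Assumption for this sketch: only English lemmas that occur in exactly
    one synset are translated, so an ambiguous lemma is never assigned
    to the wrong synset.
    """
    # Record which synsets each English lemma belongs to.
    synsets_per_lemma = defaultdict(set)
    for synset_id, lemmas in english_variants.items():
        for lemma in lemmas:
            synsets_per_lemma[lemma].add(synset_id)

    proposed = defaultdict(set)
    for synset_id, lemmas in english_variants.items():
        for lemma in lemmas:
            if len(synsets_per_lemma[lemma]) == 1:
                for translation in bilingual_dict.get(lemma, []):
                    proposed[synset_id].add(translation)
    return dict(proposed)


def core_coverage(covered_synsets, core_synsets):
    """Percentage of core synsets that have at least one proposed variant."""
    core = set(core_synsets)
    if not core:
        return 0.0
    return 100.0 * len(set(covered_synsets) & core) / len(core)


# Toy data invented for illustration; not taken from CroWN or Princeton WordNet.
english_variants = {
    "s1": ["dog", "domestic_dog"],
    "s2": ["bank"],
    "s3": ["bank", "depository"],
    "s4": ["house"],
}
bilingual_dict = {"dog": ["pas"], "bank": ["banka", "obala"], "house": ["kuća"]}

proposed = expand_with_dictionary(english_variants, bilingual_dict)
coverage = core_coverage(proposed.keys(), ["s1", "s2", "s3", "s4"])
print(proposed)
print(f"{coverage:.2f}%")
```

In this toy run the ambiguous lemma "bank" is skipped, so only two of the four synsets receive a variant. Gaps of this kind are what the later steps described in the abstract (the derivational database and manual creation of pairs) are meant to close.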
Similar resources
Methodology and evaluation of the Galician WordNet expansion with the WN-Toolkit
In this paper the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit are presented. This toolkit allows the creation and expansion of wordnets using the expand model. In our experiments we have used methodologies based on dictionaries and parallel corpora. The evaluation of the results has been performed both in an automatic and in...
WN-Toolkit: un toolkit per a la creació de WordNets a partir de diccionaris bilingües
This paper presents a set of programs to facilitate the creation of WordNet from bilingual dictionaries following the expand model. The programs are written in Python and are therefore multiplatform. The programs are very easy to use although they don’t have a graphical user interface. These programs have been successfully used in the Know2 Project for the creation of Catalan and Spanish WordNe...
WN-Toolkit: Automatic generation of WordNets following the expand model
This paper presents a set of methodologies and algorithms to create WordNets following the expand model. We explore dictionary and BabelNet based strategies, as well as methodologies based on the use of parallel corpora. Evaluation results for six languages are presented: Catalan, Spanish, French, German, Italian and Portuguese. Along with the methodologies and evaluation we present an implemen...
Bootstrapping a Portuguese WordNet from Galician, Spanish and English Wordnets
In this article we exploit the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets using Probabilistic Translation Dictionaries automatically created from parallel corpora. The process generated a total of 56 770 synsets and 97 058 variants. An evaluation of the results using the Brazilian OpenWordNet-PT as a gold standard resulted in a pr...
Syntactic Patterns in Croatian WordNet
The paper presents the detection of syntactic patterns in the Croatian WordNet synset definitions. The detection was performed in order to create unambiguous and consistent synset definitions in the future development of the Croatian WordNet. The rules are implemented in the form of finite-state transducers and tested on an already existing version of the Croatian WordNet. Results are presented using ...