Improving the Basque WordNet by corpus annotation
Abstract
This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better-motivated sense distinctions and a tighter coupling between the two resources. The methodology involves editing, tagging and refereeing tasks. We are currently halfway through the nominal part of the 300,000-word corpus (roughly equivalent to a 500,000-word corpus for English).
Related papers
Methodology and construction of the Basque WordNet
Semantic interpretation of language requires extensive and rich lexical knowledge bases (LKBs). The Basque WordNet is an LKB based on WordNet and its multilingual counterparts EuroWordNet and the Multilingual Central Repository. This paper reviews the theoretical and practical aspects of the Basque WordNet lexical knowledge base, as well as the steps and methodology followed in its construction. ...
ZT Corpus: Annotation and Tools for Basque Corpora
The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, which aims to be a major resource in research and development with respect to written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among other Basque corpora for many reasons: ...
A methodology for the joint development of the Basque WordNet and Semcor
This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better-motivated sense distinctions and a tighter coupling between the two resources. The methodology involves editing, tagging and refereeing tasks. We are currently halfway through the nominal part of the 300,000-word corpus (rou...
Semantic Hand-Tagging of the SenSem Corpus Using Spanish WordNet Senses
This paper presents the semantic annotation of the SenSem Spanish corpus, a research effort focused on annotating the nominal heads of verbal arguments, with the final goal of acquiring semantic preferences for verb senses. We used Spanish WordNet 1.6 senses in the annotation process. This process involves the analysis of the adequacy of WordNet for semantic annotation and, in case...
Structure, Annotation and Tools in the Basque ZT Corpus
The ZT corpus (Basque Corpus of Science and Technology) is a tagged collection of specialized texts in Basque, which aims to be a main resource in research and development on written technical Basque: terminology, syntax and style. It will be the first written corpus in Basque to be distributed by ELDA (at the end of 2006), and it aims to be a methodological and functional reference...