Morphosyntactic structure of terms in Basque for automatic terminology extraction
نویسندگان
چکیده
This paper describes the morphosyntactic patterns of technical terms in Basque and presents an architecture for a term-extracting tool. As Basque is a highly inflected agglutinative language, partof-speech information is not enough to define term patterns. The use of morphological and syntactic information is essential to reduce considerably the number of structures. For example, a noun, an adverbial, a postpositive adjectival, the nominal form of a verb and even a determiner in the genitive case may work as a prepositive adjective; however, they all share the same syntactic function. Therefore, for the term-extracting tool to perform properly, the texts must be morphosyntactically analysed and disambiguated. Then a shallow syntactic parser will identify the previously described patterns.
منابع مشابه
A XML-Based Term Extraction Tool for Basque
This project combines linguistic and statistical information to develop a term extraction tool for Basque. Being Basque an agglutinative and highly inflected language, the treatment of morphosyntactic information is vital. In addition, due to late unification process of the language, texts present more elevated term dispersion than in a highly normalized language. The result is a semiautomatic ...
متن کاملComputational Lexicography and Lexicology Elexbi, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora
We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim ofthis work is to develop some techniques for the automatic extraction ofpairs ofequivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creati...
متن کاملELexBI, A BASIC TOOL FOR BILINGUAL TERM EXTRACTION FROM SPANISH-BASQUE PARALLEL CORPORA
We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a previous monolingual extraction of term candidates in each language, the...
متن کاملCombining Wordnet and Morphosyntactic Information in Terminology Clustering
The paper presents results of clustering terms extracted from economic articles in Polish Wikipedia. First, we describe the method of automatic term extraction supported by linguistic knowledge. Then, we define different types of term similarities used in the clustering experiment. Term similarities are based on Polish Wordnet and morphosyntactic analysis of data. The latter takes into account:...
متن کاملEvaluation of an automatic process for specialized web corpora collection and term extraction for Basque
In this paper we describe the processes for collecting Basque specialized corpora in different domains from the Internet and subsequently extracting terminology out of them, using automatic tools in both cases. We evaluate the results of corpus compiling and term extraction by making use of a specialized dictionary recently updated by experts. We also compare the results of the automatically co...
متن کامل