Structure, Annotation and Tools in the Basque ZT Corpus

Authors

  • Nerea Areta
  • Antton Gurrutxaga
  • Igor Leturia
  • Ziortza Polin
  • Rafa Saiz
  • Iñaki Alegria
  • Xabier Artola
  • Arantza Díaz de Ilarraza
  • Nerea Ezeiza
  • Aitor Sologaistoa
  • Aitor Soroa
  • Andoni Valverde
Abstract

The ZT corpus (Basque Corpus of Science and Technology) is a tagged collection of specialized texts in Basque, intended as a main resource for research and development on written technical Basque: terminology, syntax and style. It will be the first written corpus in Basque to be distributed by ELDA (at the end of 2006), and it aims to serve as a methodological and functional reference for future projects (e.g. a national corpus for Basque). We also present the technology and tools used to build the corpus. These tools, Corpusgile and Eulia, provide a flexible and extensible infrastructure for creating, visualizing and managing corpora and for consulting, visualizing and modifying annotations generated by linguistic tools.

Similar resources

ZT Corpus: Annotation and Tools for Basque Corpora

The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, which aims to be a major resource in research and development with respect to written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among other Basque corpora for many reasons: ...

Improving the Basque WordNet by corpus annotation

This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better motivated sense distinctions, and a tighter coupling between both resources. The methodology involves editing, tagging and refereeing tasks. We are currently halfway through the nominal part of the 300,000-word corpus (roug...

The RST Basque TreeBank: an online search interface to check rhetorical relations

This paper introduces the first Basque discourse TreeBank annotated with rhetorical relations following Rhetorical Structure Theory. We report the main features of the corpus, such as the annotation criteria, inter-annotator agreement and harmonization procedure. We describe an online search system to check the annotation of discourse relations.

Exploiting Semantic Information For Manual Anaphoric Annotation In Cast3LB Corpus

This paper presents the discourse annotation followed in Cast3LB, a Spanish corpus annotated with several information sources (morphological, syntactic, semantic and coreferential) at the syntactic, semantic and discourse levels. The 3LB annotation scheme has been developed for three languages (Spanish, Catalan and Basque). Human annotators have used a set of tagging techniques and protocols. Several to...

Pronominal anaphora in Basque: annotation of a real corpus

This paper describes the process followed in the annotation of pronominal anaphora in the Eus3LB corpus of Basque. Our aim is to use this annotation as the basis for later computational treatment of our language. We present the linguistic analysis carried out, the criteria defined for the tagging and some relevant linguistic conclusions about the features of the antecedents needed to link them ...

Publication date: 2006