Thomas Aquinas in the TüNDRA: Integrating the Index Thomisticus Treebank into CLARIN-D

Authors

  • Scott Martens
  • Marco Passarotti
Abstract

This paper describes the integration of the Index Thomisticus Treebank (IT-TB) into the web-based treebank search and visualization application TüNDRA (Tübingen aNnotated Data Retrieval & Analysis). TüNDRA was originally designed to provide access via the Internet to constituency treebanks and to tools for searching and visualizing them, as well as tabulating statistics about their contents. TüNDRA has now been extended to provide full support for dependency treebanks with non-projective dependencies, in order to integrate the IT-TB and future treebanks with similar properties. These treebanks are queried using an adapted form of the TIGERSearch query language, which can search both hierarchical and sequential information in treebanks in a single query. Because TüNDRA is a web application, making the IT-TB accessible through it makes the treebank and the tools for using it available to a large community without having to distribute software and show users how to install it.
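The abstract's notion of a non-projective dependency can be made concrete with a small sketch. The Python below is an illustration only, not code from TüNDRA or the IT-TB: the function name `non_projective_arcs` and the head-list encoding are assumptions chosen for this example. It treats an arc from head to dependent as projective when every token lying between the two is dominated by the head, and reports the arcs that fail this test.

```python
from typing import List, Tuple


def non_projective_arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return the (head, dependent) arcs that are non-projective.

    Hypothetical encoding: heads[i - 1] is the head of token i
    (tokens are numbered from 1), and 0 marks the root.
    Assumes the input is a well-formed tree (no cycles).
    """
    result = []
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        if head == 0:
            continue  # the root has no incoming arc to check
        left, right = min(head, dep), max(head, dep)
        for between in range(left + 1, right):
            # Climb from the intervening token towards the root,
            # stopping early if we reach the head of the arc.
            ancestor = between
            while ancestor != 0 and ancestor != head:
                ancestor = heads[ancestor - 1]
            if ancestor != head:
                # An intervening token is not dominated by the head,
                # so this arc is non-projective.
                result.append((head, dep))
                break
    return result


# Token 1 depends on token 3, but token 2 (the root) sits between them.
crossing = non_projective_arcs([3, 0, 2, 2])
# A simple tree in which both outer tokens depend on the middle one.
flat = non_projective_arcs([2, 0, 2])
```

In the first call the arc from token 3 to token 1 spans the root, which token 3 does not dominate, so that arc is reported; the second tree has no such arc. Word-order patterns of this kind are the reason a constituency-oriented tool needs explicit extension before it can display and query such treebanks.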


Related resources

The Development of the "Index Thomisticus" Treebank Valency Lexicon

We present a valency lexicon for Latin verbs extracted from the Index Thomisticus Treebank, a syntactically annotated corpus of Medieval Latin texts by Thomas Aquinas. In our corpus-based approach, the lexicon reflects the empirical evidence of the source data. Verbal arguments are induced directly from annotated data. The lexicon contains 432 Latin verbs with 270 valency frames. The lexicon is...


Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin

The creation of language resources for less-resourced languages like the historical ones benefits from the exploitation of language-independent tools and methods developed over the years by many projects for modern languages. Along these lines, a number of treebanks for historical languages started recently to arise, including treebanks for Latin. Among the Latin treebanks, the Index Thomisticu...


The Index Thomisticus Treebank Project: Annotation, Parsing and Valency Lexicon

We present an overview of the Index Thomisticus Treebank project (IT-TB). The IT-TB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11-million-token Latin corpus of the texts by Thomas Aquinas. We briefly describe the annotation guidelines, shared with the Latin Dependency Treebank (LDT). The application of data-driven dependency parsers on IT-TB and LDT data is...


Differentia compositionem facit. A Slower-Paced and Reliable Parser for Latin

The Index Thomisticus Treebank is the largest available treebank for Latin; it contains Medieval Latin texts by Thomas Aquinas. After experimenting on its data with a number of dependency parsers based on different supervised machine learning techniques, we found that DeSR with a multilayer perceptron algorithm, a right-to-left transition, and a tailor-made feature model is the parser providing...


The Annotation Guidelines of the Latin Dependency Treebank and Index Thomisticus Treebank: the Treatment of some specific Syntactic Constructions in Latin

The paper describes the treatment of some specific syntactic constructions in two treebanks of Latin according to a common set of annotation guidelines. Both projects work within the theoretical framework of Dependency Grammar, which has been demonstrated to be an especially appropriate framework for the representation of languages with a moderately free word order, where the linear order of co...




Publication date: 2014