Ontology-Based XQuery'ing of XML-Encoded Language Resources on Multiple Annotation Layers

Authors

  • Georg Rehm
  • Richard Eckart de Castilho
  • Christian Chiarcos
  • Johannes Dellert

Abstract

We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly flexible web-based graphical interface that can be used to query corpora with regard to several different linguistic properties such as, for example, syntactic tree fragments. This interface can also be used for ontology-based querying of multiple corpora simultaneously.
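The idea of ontology-based query expansion mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each ontology concept is linked to the corpus-specific markup that realises it, so that one abstract query is rewritten into one concrete path expression per corpus. All names here (the concepts `Phrase`, `NounPhrase`, `VerbPhrase`, the corpora `corpus_a` and `corpus_b`, and the element and attribute names) are hypothetical and not taken from the paper, and the sketch emits plain XPath strings rather than the authors' actual XQuery.

```python
# Hypothetical subclass relations between ontology concepts (assumption).
SUBCLASSES = {
    "Phrase": ["NounPhrase", "VerbPhrase"],
}

# Hypothetical links from concepts to corpus-specific markup (assumption):
# each entry is (element name, attribute name, attribute value).
ANNOTATIONS = {
    "NounPhrase": {
        "corpus_a": [("node", "cat", "NP")],
        "corpus_b": [("chunk", "type", "nominal")],
    },
    "VerbPhrase": {
        "corpus_a": [("node", "cat", "VP")],
    },
}


def descendants(concept):
    """Return the concept followed by all of its transitive subconcepts."""
    result = [concept]
    for child in SUBCLASSES.get(concept, []):
        result.extend(descendants(child))
    return result


def expand_query(concept):
    """Expand one abstract concept into one XPath union expression per corpus."""
    per_corpus = {}
    for name in descendants(concept):
        for corpus, patterns in ANNOTATIONS.get(name, {}).items():
            for element, attribute, value in patterns:
                path = f'//{element}[@{attribute}="{value}"]'
                per_corpus.setdefault(corpus, []).append(path)
    # Join the alternatives for each corpus with the XPath union operator.
    return {corpus: " | ".join(paths) for corpus, paths in per_corpus.items()}


if __name__ == "__main__":
    for corpus, xpath in expand_query("Phrase").items():
        print(corpus, "->", xpath)
```

Querying the general concept `Phrase` thus reaches every corpus that annotates any of its subconcepts, without the user needing to know the individual tag sets.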

Similar resources

Multidimensional markup and heterogeneous linguistic resources

The paper discusses two topics: firstly an approach of using multiple layers of annotation is sketched out. Regarding the XML representation this approach is similar to standoff annotation. A second topic is the use of heterogeneous linguistic resources (e.g., XML annotated documents, taggers, lexical nets) as a source for semiautomatic multi-dimensional markup to resolve typical linguistic iss...

Cross Document Annotation for Multimedia Retrieval

This paper describes the MUMIS project, which applies ontology based Information Extraction to improve the results of Information Retrieval in multimedia archives. The domain specific ontology, the multilingual lexicons and the information passed between the different processing modules are all encoded in XML. The innovative aspect is the use of a cross document merging algorithm that uses the ...

A Multi-Layered, XML-Based Approach to the Integration of Linguistic and Semantic Annotations

In this paper we present a multi-layered approach to document annotation that allows for the structural integration of linguistic and semantic annotations produced by various language technology tools and using knowledge encoded in different domain ontologies as needed for semantic web applications.

TEITOK: Text-Faithful Annotated Corpora

TEITOK is a web-based framework for corpus creation, annotation, and distribution, that combines textual and linguistic annotation within a single TEI based XML document. TEITOK provides several built-in NLP tools to automatically (pre)process texts, and is highly customizable. It features multiple orthographic transcription layers, and a wide range of user-defined token-based annotations. For ...

Exploring XML-based technologies and procedures for quality evaluation from a real-life case perspective

The use of Extensible Markup Language (XML) for the annotation of Spoken Language Resources (SLR) is becoming increasingly common these days. Therefore the Speech Processing EXpertise centre (SPEX), which is the SLR validation centre of the European Language Resources Association (ELRA), is also being confronted more with XML. The project “Lexica and Corpora for Speech-to-Speech Translation Com...

Publication date: 2008