An Extension of the TIGER Query Language
Author: Torsten Marek
Abstract
In the computational linguistics community, much work goes into creating large, high-quality linguistic resources, often with complex annotation. To make these resources accessible to nontechnical audiences, formalisms for searching and filtering are needed. The TIGER query language can be used to search treebanks with syntactic annotation by describing partial structures. Recently, augmented treebanks have been published, including the SALSA corpus, which features frame semantic annotation on top of syntactic structure. Query languages, however, need to keep up with newly introduced annotation so that it remains searchable and easy to access. We design an extension of the TIGER language that allows searching for frame structures along with syntactic annotation. To achieve this, the TIGER object model is expanded to include frame semantics while remaining fully backwards-compatible. Finally, these extensions have been added to our own implementation of TIGER, which includes novel indexing features not found in the original work of Lezius (2002a).

Zusammenfassung (German summary, in English translation)
A large part of the work in computational linguistics goes into creating high-quality linguistic resources, often with complex annotation. For these resources to remain usable by nontechnical users once they have exceeded a certain size, formalisms for searching and filtering are of great importance. The TIGER query language, which allows the description of partial structures, can be used to search treebanks with syntactic annotation. In recent years, however, extended treebanks have been created, for example the SALSA corpus, which contains annotation of semantic frames in addition to syntactic annotation. Query languages such as TIGER must keep pace with the extension of the annotation, since otherwise the annotation is not searchable and therefore hard to access.
We develop an extension of the TIGER query language that supports frame structures in addition to search over syntax. To achieve this, we extend the TIGER object model with new types for frame semantics while preserving full backwards compatibility. In addition, we have realized these extensions within our own TIGER implementation, which uses graph indexing methods that go beyond the original work of Lezius (2002a).
Similar resources
Extending the TIGER query language with universal quantification
The query language in TIGERSearch is limited due to its lack of universal quantification. This restriction makes it impossible to make simple queries like “Find sentences that do not include a certain word”. We propose an easy way to formulate such queries. We have implemented this extension to the query language in a tool that allows querying parallel treebanks, while including their alignment...
Selecting the Most Appropriate Query Language for Using Outer Joins to Extract Data in Datalog Mode in the Deductive Database System DES
Deductive database systems are designed around a logical data model. Unlike a relational database management system (RDBMS), in which data is stored in tables, a deductive database system stores data as facts. Datalog Educational System (DES) is a deductive database system whose default mode is Datalog. It can extract data to use outer joins with three query la...
A Declarative Formalism for Constituent-to-Dependency Conversion
In this paper, we present a declarative formalism for writing rule sets to convert constituent trees into dependency graphs. The formalism is designed to be independent of the annotation scheme and provides a highly task-related syntax, abstracting away from the underlying graph data structures. We have implemented the formalism in our search tool and used a preliminary version to create a rule...
Refining Queries on a Treebank with XSLT Filters. Approaching the Universal Quantifier
This paper discusses the use of XSLT stylesheets as a filtering mechanism for refining the results of user queries on treebanks. The discussion is within the context of the TIGER treebank, the associated search engine and query language, but the general ideas can apply to any search engine for XML-encoded treebanks. It will be shown that important classes of linguistic phenomena can be accessed...
TIGER: Querying Large Tables through Criterion Extension
Sales on the Internet have increased significantly during the last decade, and so, it is crucial for companies to retain customers on their web site. Among all strategies towards this goal, providing customers with a flexible search tool is a crucial issue. In this paper, we propose an approach, called TIGER, for handling such flexibility automatically. More precisely, if the search criteria of...