A standardized general framework for encoding and exchange of corpus annotations: The Linguistic Annotation Framework, LAF
Abstract
The Linguistic Annotation Framework (LAF) proposes a generic data model for the exchange of linguistic annotations and has recently become an ISO standard (ISO 24612:2012). This paper describes some aspects of LAF, its XML serialization GrAF, and some use cases related to the framework. While GrAF has already been used as an exchange format for corpora with several annotation layers, such as MASC and OANC, the generic LAF data model has also proved useful as the basis for designing data structures for a relational database management system.
Similar resources
Representing Linguistic Corpora and Their Annotations
A Linguistic Annotation Framework (LAF) is being developed within the International Standards Organization Technical Committee 37 Sub-committee on Language Resource Management (ISO TC37 SC4). LAF is intended to provide a standardized means to represent linguistic data and its annotations that is defined broadly enough to accommodate all types of linguistic annotations, and at the same time prov...
Towards International Standards for Language Resources
This paper describes the Linguistic Annotation Framework (LAF) developed by the International Standards Organization TC37 SC4, which is to serve as a basis for harmonizing existing language resources as well as developing new ones. We then describe the use of the LAF to represent the American National Corpus and its linguistic annotations.
LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible
The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general with an extension to process the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as text database in decennium-wide steps. Then we describe how LAF-Fabric may serve as an an...
A LAF/GrAF based Encoding Scheme for underspecified Representations of syntactic Annotations
Data models and encoding formats for syntactically annotated text corpora need to deal with syntactic ambiguity; underspecified representations are particularly well suited for the representation of ambiguous data because they allow for high informational efficiency. We discuss the issue of being informationally efficient, and the trade-off between efficient encoding of linguistic annotations a...
The Linguistic Annotation Framework: a standard for annotation interchange and merging
This paper overviews the International Standards Organization Linguistic Annotation Framework (ISO LAF) developed in ISO TC37 SC4. We describe the XML serialization of ISO LAF, the Graph Annotation Format (GrAF) and discuss the rationale behind the various decisions that were made in determining the standard. We describe the structure of the GrAF headers in detail and provide multiple examples ...