Representing and Accessing Multilevel Linguistic Annotation using the MEANING Format
Abstract
We present the MEANING Annotation Format (MAF), an XML annotation format specifically designed to represent and integrate different levels of linguistic annotation, and the MEANING Browser, a tool that provides flexible access to them. We describe our experience in integrating linguistic annotations from different sources, and the solutions we adopted to implement efficient access to corpora annotated with the MEANING Format.
Similar resources
Discontinuous Constituents: a Problematic Case for Parallel Corpora Annotation and Querying
In this paper, we discuss some linguistic phenomena that pose potential problems for multilevel linguistic annotation of parallel corpora in general and specifically for data encoding with state-of-the-art multilevel corpus querying tools such as CQP. We describe the strategy we use for integrating the standard hierarchical XML representation used to annotate such phenomena in our aligned bilingual...
AnCoraPipe: A tool for multilevel annotation
AnCoraPipe is a corpus annotation tool which allows different linguistic levels to be annotated simultaneously and efficiently, since it uses a single format for all stages. In this way, the required annotation time is reduced and the integration of the work of all annotators is made easier.
Accessing Heterogeneous Linguistic Data — Generic XML-based Representation and Flexible Visualization
Annotation of linguistic data increasingly focuses on information beyond the (morpho-)syntactic level. Moreover, annotated data of less-studied languages is growing in importance. To maximally profit from this data, straightforward and user-friendly access has to be provided. In this paper, we describe a linguistic database that is accessed via a web browser and offers flexible visualization of...
GrAF: A Graph-based Format for Linguistic Annotations
In this paper we describe the Graph Annotation Format (GrAF) and show how it is used to represent not only independent linguistic annotations, but also sets of merged annotations as a single graph. To demonstrate this, we have automatically transduced several different annotations of the Wall Street Journal corpus into GrAF and show how the annotations can then be merged, analyzed, and visualized ...
The NITE Object Model Library for Handling Structured Linguistic Annotation on Multimodal Data Sets
The NITE Object Model Library is an implemented set of routines for loading, accessing, manipulating, and serializing linguistic data. It is similar in spirit to the data handling provided by the Annotation Graph Toolkit, but is aimed at data that is heavily cross-annotated with structured information, and thus chooses higher expressivity at the cost of processing speed. We describe our open-so...