Automatic interlinear glossing as two-level sequence classification
Authors
Abstract
Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic glossing of Chintang. We decompose the task of glossing into steps suitable for statistical processing. We first perform grammatical glossing as standard supervised part-of-speech tagging. We then add lexical glosses from a stand-off dictionary applying context disambiguation in a similar way to word lemmatisation. We obtain the highest accuracy score of 96% for grammatical and 94% for lexi-
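The two-step decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the tagger below is a simple frequency baseline standing in for a supervised part-of-speech tagger, and the training data, the morpheme forms (`ka`, `mo`, `-na`, `-ti`), the tag set, and the `LEXICON` dictionary are all invented placeholders rather than real Chintang data. Step one assigns a grammatical tag to every morpheme; step two looks stems up in a stand-off dictionary and uses the predicted tag to pick among competing lexical glosses, much as a lemmatiser uses a part-of-speech tag.

```python
from collections import Counter, defaultdict

# Invented toy data (NOT real Chintang): each sentence is a list of
# (morpheme form, grammatical tag) pairs.
TRAIN = [
    [("ka", "V"), ("-na", "PST")],
    [("ka", "N"), ("-ti", "LOC")],
    [("mo", "V"), ("-na", "PST")],
]

# Hypothetical stand-off dictionary: stem form -> list of (category, lexical gloss).
LEXICON = {
    "ka": [("V", "go"), ("N", "house")],
    "mo": [("V", "eat")],
}

END = "</s>"  # marker for "no following morpheme"


def train_tagger(sentences):
    """Count tags per form, and per (form, following form) as context."""
    by_form = defaultdict(Counter)
    by_next = defaultdict(Counter)
    for sent in sentences:
        for i, (form, tag) in enumerate(sent):
            nxt = sent[i + 1][0] if i + 1 < len(sent) else END
            by_form[form][tag] += 1
            by_next[(form, nxt)][tag] += 1
    return by_form, by_next


def tag(forms, model):
    """Step 1: grammatical tagging with a context-aware frequency baseline."""
    by_form, by_next = model
    tags = []
    for i, form in enumerate(forms):
        nxt = forms[i + 1] if i + 1 < len(forms) else END
        # Prefer counts seen in this exact context, else fall back to the form alone.
        counts = by_next.get((form, nxt)) or by_form.get(form)
        tags.append(counts.most_common(1)[0][0] if counts else "UNK")
    return tags


def gloss(forms, model, lexicon):
    """Step 2: add lexical glosses, disambiguated by the predicted tag."""
    out = []
    for form, t in zip(forms, tag(forms, model)):
        candidates = lexicon.get(form)
        if candidates is None:
            out.append(t)  # not a dictionary stem: the tag itself is the gloss
        else:
            matches = [g for cat, g in candidates if cat == t]
            out.append(matches[0] if matches else candidates[0][1])
    return out


MODEL = train_tagger(TRAIN)
print(gloss(["ka", "-ti"], MODEL, LEXICON))  # ['house', 'LOC']
```

In this sketch the ambiguous stem `ka` is glossed as "house" before `-ti` and as "go" before `-na`, because the first step assigns it a different category in each context. A real system would replace the frequency baseline with a trained sequence classifier.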
Similar resources
The Effects of Glossing Conventions on L2 Vocabulary Recognition and Production
To investigate the effects of different glossing conventions on vocabulary recognition and recall, 158 participants were given a pre-test to make sure that they did not have any prior knowledge of the target words. Reading passages with four different glossing conventions (interlinear, marginal, pre-text, and post-text) were given to eight groups. Four groups received interlingual glosses and f...
Enriching Interlinear Text using Automatically Constructed Annotators
In this paper, we will demonstrate a system that shows great promise for creating Part-of-Speech taggers for languages with little to no curated resources available, and which needs no expert involvement. Interlinear Glossed Text (IGT) is a resource which is available for over 1,000 languages as part of the Online Database of INterlinear text (ODIN) (Lewis and Xia, 2010). Using nothing more tha...
A Morphological Glossing Assistant
One of the tasks language documenters face is that of assigning glosses to function morphemes, including affixes. These glosses are typically used in marking up interlinear text at a morpheme level. But without a morphological parser, marking up interlinear text is tedious and error-prone. Ideally, a parser will be guided not only by the form and syntagmatic properties of morphemes, but also by...
Interlinear Glossing and its Role in Theoretical and Descriptive Studies of African and other Lesser–Documented Languages
In a manuscript, William Labov (1987) states that although linguistics is a field with a long historical tradition and with a high degree of consensus on basic categories, it experiences a fundamental division concerning the role that quantitative methods should play as part of the research progress. Linguists differ in the role they assign to the use of natural language examples in linguistic r...
Automatic Creation of Interlinear Text for Philological Purposes
Interlinear text presents a collection of interpretations of a manuscript. Whereas such a form is often compiled by a single author or a single team of scholars, we here consider automatic creation of interlinear text out of independently created linguistic resources. In terms of mathematical structures, we investigate the constraints one may want to impose on the rendering and pair-wise alignm...