Morphological Analysis of Sahidic Coptic for Automatic Glossing
نویسندگان
چکیده
We report on the implementation of a morphological analyzer for the Sahidic dialect of Coptic, a now extinct Afro-Asiatic language. The system is developed in the finite-state paradigm. The main purpose of the project is provide a method by which scholars and linguists can semi-automatically gloss extant texts written in Sahidic. Since a complete lexicon containing all attested forms in different manuscripts requires significant expertise in Coptic spanning almost 1,000 years, we have equipped the analyzer with a core lexicon and extended it with a ‘guesser’ ability to capture out-of-vocabulary items in any inflection. We also suggest an ASCII transliteration for the language. A brief evaluation is provided.
منابع مشابه
Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities
This paper motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendent of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evalu...
متن کاملComputational Methods for Coptic
This paper motivates and details the first implementation of a freely available part of speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evalu...
متن کاملSecond Position Clitics and Monadic Second-Order Transduction
The simultaneously phonological and syntactic grammar of second position clitics is an instance of the broader problem of applying constraints across multiple levels of linguistic analysis. Syntax frameworks extended with simple tree transductions can make efficient use of these necessary additional forms of structure. An analysis of Sahidic Coptic second position clitics in a context-free gram...
متن کاملAutomatic interlinear glossing as two-level sequence classification
Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic gloss...
متن کاملA Fault Diagnosis Method for Automaton based on Morphological Component Analysis and Ensemble Empirical Mode Decomposition
In the fault diagnosis of automaton, the vibration signal presents non-stationary and non-periodic, which make it difficult to extract the fault features. To solve this problem, an automaton fault diagnosis method based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD) was proposed. Based on the advantages of the morphological component analysis method i...
متن کامل