Bootstrapping Language Description: the case of Mpiemo (Bantu A, Central African Republic)
Abstract
Linguists have long produced grammatical descriptions of previously undescribed languages. This is a time-consuming process, which has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap the analysis of collected data and speed up manual selection work. To be more precise, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful. We exemplify this on Mpiemo, a so far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.
Similar resources
Morphological function, syllabic and phonetic form of nasal+plosive combinations in the Bantu language Mpiemo
A discussion of how to handle consonant combinations in the Bantu language Mpiemo, spoken in the south-west border region of the Central African Republic, is presented. The question is raised whether nasal+consonant combinations are adequately analysed as single phonological units or as separate ones. Phonetic, syllabic and morphological aspects are taken into consideration.
Acoustic properties of implosives in Bantu Mpiemo
Previous studies on implosives have shown a great diversity in the production of implosives among the languages in the world. In the light of this, this paper seeks to identify the acoustic phonetic properties of a Bantu language, Mpiemo, spoken in the Central African Republic. One of the strong acoustic correlates of implosives is increasing voicing amplitude during occlusion, which contrasts ...
Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs
The existing literature on Bantu verbal semantics has demonstrated that the inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of a semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...
Mitochondrial, Y-chromosomal and autosomal variation in Mbenzele Pygmies from the Central African Republic.
In this paper, we carry out a combined analysis of autosomal (ten microsatellites and an Alu insertion), mitochondrial (HVR-1 sequence, 360 nucleotides) and Y-chromosomal (seven microsatellites) variation in the Mbenzele Pygmies from the Central African Republic. This study focuses on two important questions concerning the admixture and origin of African Pygmies. Ethnographic observations sugge...
Molecular epidemiology of human polyomavirus JC in the Biaka Pygmies and Bantu of Central Africa.
Polyomavirus JC (JCV) is ubiquitous in humans and causes a chronic demyelinating disease of the central nervous system, progressive multifocal leukoencephalopathy which is common in AIDS. JCV is excreted in urine of 30-70% of adults worldwide. Based on sequence analysis of JCV complete genomes or fragments thereof, JCV can be classified into geographically derived genotypes. Types 1 and 2 are o...