A Dependency Treebank for Kurmanji Kurdish
Abstract
This paper describes the development of the first syntactically annotated corpus of Kurmanji Kurdish. The corpus was used as one of the surprise languages in the 2017 CoNLL shared task on parsing Universal Dependencies. We describe how the corpus was prepared, discuss some Kurmanji-specific constructions that required special treatment, and give results for parsing Kurdish with two popular data-driven parsers.
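Universal Dependencies treebanks are distributed in the tab-separated CoNLL-U format, and dependency parsers are commonly scored by unlabeled and labeled attachment score (UAS and LAS). As a rough illustration of what such parsing results measure, the sketch below reads CoNLL-U text and compares predicted heads and relations against gold ones. The function names `read_conllu` and `attachment_scores` are hypothetical, and the three-token English sentence is an invented toy example, not data from the Kurmanji treebank or the evaluation script used in the paper.

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences of (form, head, deprel) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        current.append((cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold, pred):
    """Return (UAS, LAS) over aligned gold and predicted sentences."""
    total = uas = las = 0
    for g_sent, p_sent in zip(gold, pred):
        for (_, g_head, g_rel), (_, p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas += 1
                if g_rel == p_rel:
                    las += 1
    return uas / total, las / total


def rows_to_conllu(rows):
    """Join token rows into tab-separated CoNLL-U lines."""
    return "\n".join("\t".join(r) for r in rows) + "\n"


# Invented toy sentence (English), for illustration only.
GOLD = rows_to_conllu([
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "reads", "read", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "books", "book", "NOUN", "_", "_", "2", "obj", "_", "_"],
])
# Hypothetical parser output: correct heads, one wrong relation label.
PRED = rows_to_conllu([
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "reads", "read", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "books", "book", "NOUN", "_", "_", "2", "nmod", "_", "_"],
])

uas, las = attachment_scores(read_conllu(GOLD), read_conllu(PRED))
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

The sketch assumes gold and predicted files share the same tokenization. Scoring real parser output on raw text would also need to align tokens, which is omitted here.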
Similar resources
Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison
Resource scarcity, along with diversity in both dialect and script, are the two primary challenges in Kurdish language processing. In this paper we aim at addressing these two problems by (i) building a text corpus for Sorani and Kurmanji, the two main dialects of Kurdish, and (ii) highlighting some of the orthographic, phonological, and morphological differences between these two dialects from ...
Kurdish Interdialect Machine Translation
This research suggests a method for machine translation between two Kurdish dialects. We chose the two widely spoken dialects, Kurmanji and Sorani, which are considered to be mutually unintelligible. Also, despite being spoken by about 30 million people in different countries, Kurdish is among the less-resourced languages. The research used bi-dialectal dictionaries and showed that the lack of parall...
MtDNA and Y-chromosome variation in Kurdish groups.
In order to investigate the origins and relationships of Kurdish-speaking groups, mtDNA HV1 sequences, eleven Y chromosome bi-allelic markers, and 9 Y-STR loci were analyzed among three Kurdish groups: Zazaki and Kurmanji speakers from Turkey, and Kurmanji speakers from Georgia. When compared with published data from other Kurdish groups and from European, Caucasian, and West and Central Asian ...
Percentage of Consonants Correct for 3-5 Years Old Kurdish-Speaking Children With Middle Kurmanji-Mukryani Dialect
Objectives: The present research aims to study the normal development of Percentage of Consonants Correct (PCC) in Kurdish-speaking children with the Middle Kurmanji-Mukryani dialect, as an Articulation Competency Index (ACI). PCC was examined in terms of the manner of articulation and position of sound in the word. Methods: In this descriptive-analytical cross-sectional study, 120 Kurdish-speak...
Stemming for Kurdish Information Retrieval
Resource scarcity, along with diversity in both dialect and script, are the two primary challenges in Kurdish language processing. In this paper we aim at addressing these two problems by building stemmers for the two main dialects of the Kurdish language (i.e. Sorani and Kurmanji) and investigating their effectiveness on Kurdish Information Retrieval. More specifically, we build Jedar, the first...