Similarities between Arabic dialects: Investigating geographical proximity
Abstract
The automatic classification of Arabic dialects is an ongoing research challenge, which has been explored in recent work that defines dialects based on increasingly limited geographic areas like cities and provinces. This paper focuses on a related yet relatively unexplored topic: the effects of the geographical proximity of cities located in Arab countries on their dialectical similarity. Our work is twofold, reliant on: 1) comparing the textual similarities between dialects using cosine similarity and 2) measuring the geographical distance between locations. We study MADAR and NADI, two established datasets with Arabic dialects from many cities and provinces. Our results indicate that cities located in different countries may in fact have more dialectical similarity than cities within the same country, depending on their geographical proximity. The correlation between dialectical similarity and city proximity suggests that cities that are closer together are more likely to share dialectical attributes, regardless of country borders. This nuance provides the potential for important advancements in Arabic dialect research because it indicates that a more granular approach to dialect classification is essential to understanding how to frame the problem of Arabic dialect identification.
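The two measurements named in the abstract, cosine similarity between dialect texts and geographical distance between locations, can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words representation, the haversine formula, the Pearson correlation, the helper names, and the toy data are all assumptions chosen for illustration. The token strings are placeholders rather than real dialect text, and the coordinates are approximate.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenised bag-of-words vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


def proximity_correlation(cities):
    """Correlate pairwise text similarity with pairwise distance over all city pairs."""
    sims, dists = [], []
    for (_, (loc_a, txt_a)), (_, (loc_b, txt_b)) in combinations(cities.items(), 2):
        sims.append(cosine_similarity(txt_a, txt_b))
        dists.append(haversine_km(loc_a, loc_b))
    return pearson(sims, dists)


# Hypothetical toy data: approximate coordinates, placeholder tokens (not real text).
toy_cities = {
    "Cairo": ((30.04, 31.24), "t1 t2 t3 t4"),
    "Alexandria": ((31.20, 29.92), "t1 t2 t3 t5"),
    "Amman": ((31.95, 35.91), "t1 t6 t7 t8"),
    "Damascus": ((33.51, 36.28), "t1 t6 t7 t9"),
}
r = proximity_correlation(toy_cities)
print(f"correlation between similarity and distance: {r:.3f}")
```

Under these assumptions, a negative correlation would mean that cities farther apart have less similar text, which is the direction of the relationship the abstract reports. With real corpora the texts would be replaced by per-city samples from a dataset such as MADAR or NADI.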
Similar resources
Parsing Arabic Dialects
The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem ...
Variation in polar interrogative contours within and between Arabic dialects
Fundamental frequency (F0) contours in yes/no questions and coordinated questions are compared quantitatively across eight Arabic dialects, based on scripted role play data from the Intonational Variation in Arabic corpus [1]. Visualisation of the F0 contour of all tokens is used to evaluate how consistently speakers produce a typical contour in each dialect, for each question type. ...
Machine Translation of Arabic Dialects
Arabic dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web text, and translated using Amazon’s Mechanical Tur...
Automatic Identification of Arabic Dialects
In this work, automatic recognition of Arabic dialects is proposed. An acoustic survey of the proportion of vocalic intervals and the standard deviation of consonantal intervals in nine dialects (Tunisia, Morocco, Algeria, Egypt, Syria, Lebanon, Yemen, Gulf countries and Iraq) is performed using the platform Alize and Gaussian Mixture Models (GMM). The results show the complexity of the autom...
Morphological Analysis and Generation for Arabic Dialects
We present MAGEAD, a morphological analyzer and generator for the Arabic language family. Our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. MAGEAD provides an analysis to a root+pattern representation, it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects.
Journal
Journal title: Information Processing and Management
Year: 2022
ISSN: 0306-4573, 1873-5371
DOI: https://doi.org/10.1016/j.ipm.2021.102770