Arabic Tweeps Gender and Dialect Prediction
Authors
Abstract
In this paper, we present our approach to the author profiling task on Arabic content (the Twitter case), one of the tasks set at PAN at CLEF 2017. Author profiling is the process of identifying the traits that make up an author's profile by analysing their writing. In our research, we considered two traits, the author's gender and language variety (dialect), both of which have many useful applications in Arabic social media analysis. To this end, we tried several feature vectors and classifiers to find the best prediction models for these two traits.
Similar resources
Borrowing the Verb “ast” and Its Varieties in Arabic Dialect of Sarab
“Borrowing” is a linguistic process studied in diachronic linguistics, in which one language takes elements from another. It usually occurs in areas where two languages come into contact with each other. Such borrowing occurs in a dialect spoken in South Khorasan. Arabs living in this part of Iran probably immigrated in the early centuries of Islam. In thi...
The Status of [h] and [ʔ] in the Sistani Dialect of Miyankangi
The purpose of this article is to determine the phonemic status of [h] and [ʔ] in the Sistani dialect of Miyankangi. Auditory tests applied to the relevant data show that [ʔ] occurs mainly in word-initial position, where it stands in free variation with Ø. The only place where [h] is heard is in Arabic and Persian loanwords, and only in the pronunciation of some speakers who are educated and/or...
Gender Inference for Arabic Language in Social Media
The widespread use of social media has attracted a new group of researchers seeking information on who, what, and where the users are. Some information retrieval researchers are interested in identifying the gender, age group, and educational level of users. The objective of this work is to identify gender from Arabic posts on social media. Most of the works related t...
A Study of Inflectional Categories of Noun in Sistani Dialect
The present article aims to provide a synchronic study of the inflectional or morpho-syntactic categories of the noun in the Sistani dialect. These categories comprise person, number, gender or noun class, definiteness, case, and possession. Linguistic data were collected by recording free speech and interviewing 30 (15 female, 15 male) illiterate Sistani language consultants of age 40–102 year...
Bilinguality vs. Monolinguality among Kalhuri Kurdish Speakers: Gender, Social Class and English Language Achievement
Today, in multilingual contexts, many parents prefer to raise their children in the dominant language rather than in their mother tongue. This phenomenon is also widespread among native speakers of the Kalhuri dialect of Kurdish in the multilingual context of Iran. Nevertheless, some studies have evidenced the privilege of bilinguals in learning an additional language though some others ...