Explicit Fine grained Syntactic and Semantic Annotation of the Idafa Construction in Arabic
Abstract
Idafa in traditional Arabic grammar is an umbrella construction that covers several phenomena, including what English expresses as noun-noun compounds and as Saxon and Norman genitives. Idafa also participates in other constructions, such as quantifiers, quasi-prepositions, and adjectives. Identifying the various types of the Idafa construction (IC) matters for Natural Language Processing (NLP) applications: noun-noun compounds behave specially in most languages, which affects their semantic interpretation, so distinguishing them could benefit downstream NLP applications. The most comprehensive computational syntactic representation of Arabic is the LDC Arabic Treebank (ATB). Despite its coverage, the ATB does not label ICs explicitly, nor does it clearly distinguish ICs expressing noun-noun relations from other traditional ICs. We therefore devise a detailed syntactic and semantic typification of the IC phenomenon in Arabic, using the ATB as the platform for this classification. We produce a version of the ATB annotated with explicit IC labels and further semantic characterization, which is useful for syntactic, semantic, and cross-language processing. Our typification comprises 3 main syntactic IC types: False Idafas (FIC), Grammatical Idafas (GIC), and True Idafas (TIC), which are further divided into 10 syntactic subclasses. The TIC group is further classified into semantic relations. We devise a method for automatic IC labeling and compare its output against the CATiB Treebank. Our evaluation shows that we achieve the same level of accuracy, with the added benefit of a fine-grained classification into the various syntactic and semantic types.
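The abstract does not spell out the labeling rules, so the following is only a minimal Python sketch of what a rule-based top-level IC labeler and its coarse evaluation might look like. The three type names come from the abstract; the word lists, the `Token` fields, the POS tags, and the rules that map adjectival heads to FIC, quantifier and quasi-preposition heads to GIC, and noun-noun pairs to TIC are hypothetical assumptions, not the authors' method or their 10 subclasses. Scoring collapses fine-grained labels to a yes/no Idafa decision, assuming a reference that marks Idafa without subtypes (CATiB, for instance, uses an IDF dependency label).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ICType(Enum):
    """The three top-level syntactic Idafa types named in the abstract."""
    FIC = "False Idafa"
    GIC = "Grammatical Idafa"
    TIC = "True Idafa"


# Hypothetical trigger lists for illustration only; these are NOT the
# paper's 10 syntactic subclasses or its actual lexicons.
QUANTIFIERS = {"كل", "بعض", "جميع"}
QUASI_PREPOSITIONS = {"فوق", "تحت", "أمام", "خلف", "بين"}


@dataclass
class Token:
    form: str                   # surface form (undiacritized)
    pos: str                    # hypothetical coarse POS tag: "NOUN", "ADJ", ...
    case: Optional[str] = None  # "gen" marks a genitive noun


def label_idafa(first: Token, second: Token) -> Optional[ICType]:
    """Assign a top-level IC type to an adjacent pair, or None if not an IC.

    Assumption: an IC candidate is any word followed by a genitive noun.
    """
    if second.pos != "NOUN" or second.case != "gen":
        return None
    if first.pos == "ADJ":
        return ICType.FIC  # adjectival head, e.g. "طويل القامة" (tall of stature)
    if first.form in QUANTIFIERS or first.form in QUASI_PREPOSITIONS:
        return ICType.GIC  # quantifier or quasi-preposition head
    if first.pos == "NOUN":
        return ICType.TIC  # noun-noun relation, e.g. "باب البيت" (the door of the house)
    return None


def coarse_accuracy(pred: List[Optional[ICType]], gold: List[bool]) -> float:
    """Share of pairs whose IC/non-IC decision matches a binary reference.

    Fine-grained labels are collapsed to "is an IC" so they can be scored
    against a reference that marks Idafa without subtypes.
    """
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and equal in length")
    hits = sum((p is not None) == g for p, g in zip(pred, gold))
    return hits / len(gold)
```

Collapsing labels this way is what makes a fine-grained output comparable with a coarser reference: the FIC/GIC/TIC distinction is kept in the predictions but ignored when scoring.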
Similar resources
A syntactic-semantic analysis of "منصوب به نزع خافض" based on the Holy Quran
One of the important issues in the study of transitivity and intransitivity is "منصوب به نزع خافض" (the accusative resulting from the removal of a preposition). It is a term related to the direct object (مفعول به). Starting from its definition, this paper presents a syntactic-semantic analysis of the construction. It tries to show the relationship between word and meaning and to what extent Arabic syntax focu...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences and produces a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging along with dependency parsing in pipeline mode. Unfortunately, in pipel...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...
Topicalization in English Translation of the Holy Quran: A Comparative Study
The Holy Quran, as an Arabic masterpiece, comprises a wide range of syntactic, phonological, and semantic literary patterns. These patterns act as shackles for translators. This study examined the application of the most common shift strategies in Catford's linguistic model to the translation of topicalization in chapter 29 of the Holy Quran. The topicalized cases were compared to their coun...
Fine-grained Arabic named entity recognition
Named Entity Recognition (NER) is a Natural Language Processing (NLP) task, which aims to extract useful information from unstructured textual data by detecting and classifying Named Entity (NE) phrases into predefined semantic classes. This thesis addresses the problem of fine-grained NER for Arabic, which poses unique linguistic challenges to NER; such as the absence of capitalisation and sho...