Arabic Probabilistic Context Free Grammar Induction from a Treebank
Abstract
Linguistic resources are essential to any natural language processing task. Unfortunately, constructing them manually is laborious and time-consuming. Using annotated corpora as a knowledge base can speed up the construction of a grammar for a given language. In this paper, we present a method for automatically inducing a syntactic grammar, in our case a probabilistic context-free grammar, from an Arabic annotated corpus (the Penn Arabic Treebank). To build this resource, we first induce context-free rules from the trees of the annotated corpus and then calculate a probability for each induced rule. Finally, we present and discuss the resulting grammar.
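The two steps named in the abstract (reading context-free rules off the treebank trees, then assigning each rule a probability) can be sketched as follows. The abstract does not say how the probabilities are computed, so this sketch assumes the usual relative-frequency estimate, count(A -> β) / count(A). The function names, the bracketed toy trees, and the placeholder tokens `w1` and `w2` are hypothetical and are not taken from the Penn Arabic Treebank.

```python
from collections import Counter


def parse_tree(s):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def extract_rules(tree, counts):
    """Count one rule LHS -> RHS per internal node, recursively."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, counts)


def induce_pcfg(bracketed_trees):
    """Return {(lhs, rhs): probability} using relative frequencies (an assumption)."""
    counts = Counter()
    for s in bracketed_trees:
        extract_rules(parse_tree(s), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Hypothetical toy treebank: one verb-initial and one subject-initial sentence.
toy_trees = [
    "(S (VP (VBD w1) (NP (NN w2))))",
    "(S (NP (NN w2)) (VP (VBD w1)))",
]
pcfg = induce_pcfg(toy_trees)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

With this estimate, the probabilities of all rules sharing a left-hand side sum to 1, which is the defining constraint of a probabilistic context-free grammar.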
Related resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although tree banks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...
Treebank-Based Probabilistic Phrase Structure Parsing
The area of probabilistic phrase structure parsing has been a central and active field in computational linguistics. Stochastic methods in natural language processing, in general, have become very popular as more and more resources become available. One of the main advantages of probabilistic parsing is in disambiguation: it is useful for a parsing system to return a ranked list of potential sy...
Automatic Extraction and Evaluation of Arabic LFG Resources
This paper presents the results of an approach to automatically acquire large-scale, probabilistic Lexical-Functional Grammar (LFG) resources for Arabic from the Penn Arabic Treebank (ATB). Our starting point is the earlier work of Tounsi et al. (2009) on automatic LFG f(eature)-structure annotation for Arabic using the ATB. They exploit tree configuration, POS categories, functional tags, lo...
Probabilistic Parsing Using Left Corner Language Models
We introduce a novel parser based on a probabilistic version of a left-corner parser. The left-corner strategy is attractive because rule probabilities can be conditioned on both top-down goals and bottom-up derivations. We develop the underlying theory and explain how a grammar can be induced from analyzed data. We show that the left-corner approach provides an advantage over simple top-down p...
French parsing enhanced with a word clustering method based on a syntactic lexicon
This article evaluates the integration of data extracted from a French syntactic lexicon, the Lexicon-Grammar (Gross, 1994), into a probabilistic parser. We show that by applying clustering methods on verbs of the French Treebank (Abeillé et al., 2003), we obtain accurate performances on French with a parser based on a Probabilistic Context-Free Grammar (Petrov et al., 2006).
Journal:
- Research in Computing Science
Volume 90
Publication date: 2015