Arabic Cross-Document NLP for the Hadith and Biography Literature

Authors

  • Fadi A. Zaraket
  • Jad Makhlouta
Abstract

Cross-document integration and reconciliation of extracted information has recently become of interest to researchers in Arabic natural language processing. Given a set of documents A, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities Na and relations Ra, expressed as the nodes and edges of a graph G = 〈Na, Ra〉. We use the same techniques to extract entities Nb and relations Rb from a separate set of documents B. We use G to disambiguate Nb and Rb, and we integrate the resulting entities back into G by annotating its nodes and edges with elements from Nb. We apply the approach iteratively. Our results show a significant increase in accuracy, from 41% to 93%, after applying this cross-document NLP methodology to hadith and biography documents.
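The disambiguate-then-annotate step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`build_graph`, `disambiguate`, `integrate`), the substring-plus-neighbour-overlap matching rule, and the sample names and notes are all assumptions made for illustration, and the morphological analysis and finite state machine stages are replaced by already-extracted tuples.

```python
from collections import defaultdict

def build_graph(relations):
    """Build G = <Na, Ra> from (source, target) relation pairs (hypothetical input)."""
    nodes = set()
    edges = defaultdict(set)
    for src, dst in relations:
        nodes.update((src, dst))
        edges[src].add(dst)
        edges[dst].add(src)  # stored in both directions for neighbourhood lookup
    return nodes, edges

def disambiguate(mention, context, nodes, edges):
    """Assumed rule: among nodes whose name contains the mention, pick the one
    whose neighbours in G overlap most with the names around the mention."""
    candidates = sorted(n for n in nodes if mention in n)
    if not candidates:
        return None
    return max(candidates, key=lambda n: len(edges[n] & context))

def integrate(graph, mentions_b):
    """Annotate nodes of G with notes attached to mentions found in set B."""
    nodes, edges = graph
    annotations = defaultdict(list)
    for mention, context, note in mentions_b:
        node = disambiguate(mention, set(context), nodes, edges)
        if node is not None:
            annotations[node].append(note)
    return dict(annotations)

# Illustrative data only: narration links standing in for set A,
# and one ambiguous biography mention standing in for set B.
relations_a = [
    ("Malik ibn Anas", "Nafi"),
    ("Nafi", "Ibn Umar"),
    ("Anas ibn Malik", "al-Zuhri"),
]
mentions_b = [("Malik", ["Nafi"], "biography note (placeholder)")]

graph = build_graph(relations_a)
result = integrate(graph, mentions_b)
print(result)
```

Here the mention "Malik" matches two nodes, and the shared neighbour "Nafi" selects "Malik ibn Anas". The sketch performs a single pass; the iterative application mentioned in the abstract would repeat extraction and integration with the annotated graph.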


Similar resources

Created or Nurtured? An Analytical Study of the Narration «نحن صنایع الله و الخلق بعد صنایعنا»

One Imami hadith that calls for investigation is a text attributed to Imam Mahdi: "We are the sanāye' (creation / beneficence / selection) of God, and then people are our creation / beneficence / selection." The hadith contains a complicated and apparently vague phrase. The present study seeks to reach a complete and proper understanding of...


Western Works and Views on Hadith: Beginnings, Nature, and Impact

This is a brief history of the beginnings of Orientalist studies of hadith. It sheds light on the most prominent works and views of Western scholars on hadith, and on the nature and impact of their outcomes on the Muslim and Western worlds. The beginning era of such studies was between 1890 and 1950. In this period, two influential and founding works of Ignatz Goldziher and Josef Schacht...


A New Method for Extracting Named Entities in Classical Arabic

In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on other NLP tasks such as machine translation, information retrieval, question answering, query result...


Author Identification Based on a Hybrid Feature Set Using Machine Learning and Clustering Techniques

Author identification of a document can be performed using computational or statistical methods. In this paper, we try to identify the author of two ancient Arabic religious books dating from the 6th century: the Holy Quran and the Hadith. Authorship identification consists of identifying the author of an anonymous document by using techniques from Natural Language Processing (NLP) and Arti...


A Multilingual Datasets Repository of the Hadith Content

Knowledge extraction from unstructured data is a challenging research problem in Natural Language Processing (NLP). It requires complex NLP tasks such as entity extraction and Information Extraction (IE), but one of the most challenging tasks is to extract all the required entities in a structured format so that data analysis can be applied. Our focus is to exp...



Publication date: 2012