Design and Implementation of Automatic Indexing for Information Retrieval with Arabic Documents
Abstract
We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval system from scratch to handle Arabic data. The system was implemented in the C language using the GCC compiler and runs on IBM/PCs and compatible microcomputers. We have implemented both automatic and manual indexing techniques for this corpus. A long series of experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Since automatic indexing is both cheaper and faster, our results suggest that we can achieve a wider coverage of the literature with less money and produce as good results as with manual indexing. We have also compared the retrieval results using words as index terms versus stems and roots, and confirmed the results obtained by Al-Kharashi and Abu-Salem with smaller corpora that root indexing is more effective than word indexing.

... has been stimulated by the D.O.D. Tipster project (Harman, 1993). Arabic provides a very different context from English, since it is a non-Indo-European language with a complex morphological structure.

Investigation of methods of automatic information retrieval for Arabic is essential to the growth of learning in the Arab world. Expansion of information retrieval systems is the simplest and most cost-effective way to make the resources of large reference libraries available to the increasing numbers of students and researchers in the Arab world.

1.1. Automatic Indexing

In the United States the large bibliographic database maintained by the National Library of Medicine is indexed by hand using the MeSH vocabulary (Salton & McGill, 1983). Two large legal databases, Westlaw (maintained by West Publishing Company) and Lexis...
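The abstract compares word and root indexing using recall and precision. The following is a minimal Python sketch of that kind of comparison, not the authors' C system. The `ROOTS` lookup table, the three-document toy corpus, the query, and the relevance judgments are all hypothetical and chosen only for illustration; a real system would derive roots with a morphological analyzer and would also consider stems.

```python
from collections import defaultdict

# Hypothetical word-to-root table; a real system would derive roots
# with a morphological analyzer rather than a hand-written lookup.
ROOTS = {
    "كتاب": "كتب",    # kitab, "book"
    "كاتب": "كتب",    # katib, "writer"
    "مكتبة": "كتب",   # maktaba, "library"
    "درس": "درس",     # dars, "lesson"
    "مدرسة": "درس",   # madrasa, "school"
}


def as_word(token):
    """Word indexing: the surface form is the index term."""
    return token


def as_root(token):
    """Root indexing: map the word to its root when the table knows it."""
    return ROOTS.get(token, token)


def build_index(docs, normalize):
    """Build an inverted index from index term to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Return every document that shares at least one index term with the query."""
    hits = set()
    for token in query.split():
        hits |= index.get(normalize(token), set())
    return hits


def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall


# Toy corpus and illustrative relevance judgments (assumptions, not data from the paper).
DOCS = {1: "كتاب", 2: "كاتب مكتبة", 3: "مدرسة"}
RELEVANT = {1, 2}
QUERY = "كتاب"

word_hits = search(build_index(DOCS, as_word), QUERY, as_word)
root_hits = search(build_index(DOCS, as_root), QUERY, as_root)
word_scores = precision_recall(word_hits, RELEVANT)
root_scores = precision_recall(root_hits, RELEVANT)
```

In this constructed example, word indexing retrieves only the document containing the exact query word, while root indexing also retrieves the document whose words share the same root. The toy outcome shows the mechanism only and is not evidence for the paper's findings.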
Similar Resources
Barq: distributed multilingual internet search engine with focus on Arabic language
This work was supported financially by Alakhawayn University in Ifrane, Morocco under R&D Grant RPF1/2001 and by CoreSoft SARL. Barq is a distributed multilingual search engine with focus on the Arabic language. The Barq R&D project has involved, over a period of some two years, work on Arabic language processing, Arabic word root extraction, in...
Journal of Emerging Trends in Computing and Information Sciences: Automatic Learning Context Tool for Effective Personal Document Indexing and Retrieval
Managing digital documents has become a time-consuming process due to sheer scale. Most users manage their personal documents by creating logical hierarchical folder structures. This logical structure depends on the user’s assessment of the context of the document. Basic file structuring has not changed for decades, and the hierarchical file structure remains the same. But there has been a surg...
Evaluation of Different Query Expansion Techniques by using Different Similarity Measures in Arabic Documents
Millions of users search the internet and other information stores every day by writing queries. Unfortunately, these queries may fail to retrieve what the users need, a failure known as word mismatch. One way of handling word mismatch is to use a thesaurus, which shows the (usually semantic) relationships between terms. The main goal of this study is to design an...
Exemplary documents: a foundation for information retrieval design
Documents are generally represented for retrieval by either extracting index terms from them or by creating and selecting from an external set of candidate terms. There are many procedures for doing this, but while work continues along these dimensions, there have been relatively few attempts to change this basic process. Of particular importance is the creation of indexing schemes for retrieva...
A Semi-Automatic Approach of old Arabic Documents Indexing
Indexing is a widely used technique in retrieval systems. Its goal is to extract and represent the meaning of a document so that it can be found by the user. We can cite two types of indexing: manual indexing and automatic indexing. Automatic indexing requires character and word recognition engines, which work only on the texts of contemporary documents. In this paper, we p...
Journal title:
- JASIS
Volume 48
Publication date: 1997