Designing a Distributed search engine for Farsi/English web pages

Authors

M. Azadnia

M. Salehi

Abstract

This paper models, designs, and tests a prototype Farsi/English search engine. The engine must cope with the characteristics of the web as a medium: heterogeneity, volatility, and a huge volume of unstructured worldwide information. These characteristics, together with rapid advances in technology, challenge the effectiveness of classical Information Retrieval (IR) techniques. Although a growing number of sites support Farsi, little research has addressed computational linguistics for the language, particularly thesaurus construction and stemming. We draw on past research experience to design a Farsi/English search engine. Unicode appears sufficiently capable of providing a conducive environment for this purpose, especially for indexing and searching web pages. Many Farsi code pages are in common use, and pages in these encodings must be converted into Unicode in order to cover most existing Farsi web pages. To handle the complexity of system analysis and design and to produce a visual, easy-to-scale model, we used the Unified Modeling Language (UML). To ensure scalability, distributed functionality, and reliability, we adopted established industrial solutions: a relational database for managing the web-page indices, and clustering techniques to balance the high workload on the user interface and the index management unit. We chose the Common Object Request Broker Architecture (CORBA) because of the system's distributed object-oriented design and our intended agent-oriented direction in future work. We applied CORBA design ideas to build a scalable, platform-independent framework.
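The code-page conversion step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The list of candidate encodings, the function names `to_unicode` and `normalize_for_index`, and the mapping of Arabic Yeh and Kaf to their Persian forms are all assumptions made for illustration; the abstract does not say which code pages or normalization rules the prototype uses.

```python
import unicodedata
from typing import Optional

# Encodings to try when a page declares none (an assumed, illustrative list).
CANDIDATE_ENCODINGS = ("utf-8", "cp1256", "iso8859_6")

# Assumed normalization: map Arabic Yeh and Kaf to their Persian forms so
# that differently encoded spellings of a word share one index term.
ARABIC_TO_PERSIAN = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})


def to_unicode(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode page bytes, trying the declared encoding first."""
    encodings = ([declared] if declared else []) + list(CANDIDATE_ENCODINGS)
    for enc in encodings:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: fall through to the next candidate.
            continue
    # Last resort: keep what can be decoded and mark the rest.
    return raw.decode("utf-8", errors="replace")


def normalize_for_index(text: str) -> str:
    """Apply NFC normalization, then the Arabic-to-Persian letter mapping."""
    return unicodedata.normalize("NFC", text).translate(ARABIC_TO_PERSIAN)
```

For example, `normalize_for_index(to_unicode(page_bytes, "cp1256"))` would turn a Windows-1256 page into a Unicode string ready for tokenizing and indexing; `page_bytes` here is a hypothetical variable holding the fetched page.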

Similar resources

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results; ranking is performed according to one or more parameters such as backward links, forward links, content, click-through rate, etc. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of a search engine that represents the degree of the vitality...

Web pages search engine based on DNS

A search engine is the main point of access to the largest information source in the world, the Internet. The Internet is now changing every aspect of our lives, and information retrieval may be its most important service. But for the common user, Internet search still falls far short of expectations: too many unrelated search results, outdated information, etc. To solve these problems, a new system, search engine ba...

Evaluation of the Tehran University of Medical Sciences website based on webometrics criteria in 2008

Background and Aim: Nowadays university websites are very important in information services. Therefore, the university has designed a website to categorize a mass of information and make it available. This study aimed to evaluate the Tehran University of Medical Sciences website based on webometrics criteria in 2008. Materials and Methods: This survey used link analysis metho...

Integrating Query Translation and Document Translation in a Cross-language Information Retrieval System

Due to the explosive growth of the WWW, very large multilingual textual resources have motivated research in Cross-Language Information Retrieval and online Web Machine Translation. In this paper, the integration of language translation and a text processing system is proposed to build a multilingual information system. A distributed English-Chinese system on the WWW is introduced to illustrate...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Publication date: 2008
