Full-text Search for Thai Information Retrieval Systems

Authors

THANARUK THEERAMUNKONG

WIRAT CHINNAN

THANASAN TANHERMHONG

VIRACH SORNLERTLAMVANICH

Abstract

Many efficient full-text search algorithms have been developed for English documents, and these algorithms can be applied directly to other languages such as Chinese, Japanese, and Thai. However, because of the idiosyncrasies of each language, applying them directly may not suit the language in question. This paper proposes a simplification of the Boyer-Moore algorithm, called BMT, that reduces computation and makes the algorithm appropriate for Thai full-text search. To investigate its efficiency, BMT is compared with other search algorithms. Moreover, we apply a syllable-like segmentation, called Thai character clusters (TCCs), to improve search efficiency in Thai documents by grouping Thai characters into inseparable units. TCCs are based on the spelling features of the Thai language. Compared with traditional full-text search methods, this approach improves not only search time and memory consumption but also search accuracy. The experimental results show that search methods using TCCs outperform the traditional full-text search methods.
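The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's BMT or its TCC rule set, neither of which is given on this page. As stand-ins, it uses the Boyer-Moore-Horspool variant (a well-known simplification of Boyer-Moore that keeps only a bad-character shift table) and a deliberately simplified clustering rule: above/below vowels and tone marks stay with the preceding character, and a leading vowel stays with the character that follows it. The function names and both character sets are assumptions made for illustration, and the sketch shows only the accuracy effect of cluster boundaries, not the time or memory gains the abstract reports.

```python
from typing import List, Set

# Assumption: simplified set of Thai marks that attach to the preceding
# character (above/below vowels, tone marks, other diacritics).
DEPENDENT = set("\u0E31\u0E34\u0E35\u0E36\u0E37\u0E38\u0E39\u0E3A"
                "\u0E47\u0E48\u0E49\u0E4A\u0E4B\u0E4C\u0E4D\u0E4E")
# Assumption: leading vowels, written before the consonant they belong to.
LEADING = set("\u0E40\u0E41\u0E42\u0E43\u0E44")


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: distance from the last occurrence of each
    # character (excluding the final pattern position) to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the text character aligned with
        # the last pattern position; unseen characters shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return hits


def cluster_starts(text: str) -> Set[int]:
    """Return the positions where a (simplified) character cluster starts."""
    starts = set()
    for i, ch in enumerate(text):
        if ch in DEPENDENT:
            continue  # mark belongs to the previous character
        if i > 0 and text[i - 1] in LEADING:
            continue  # character belongs to the preceding leading vowel
        starts.add(i)
    starts.add(len(text))  # end of text closes the last cluster
    return starts


def cluster_aligned_search(text: str, pattern: str) -> List[int]:
    """Keep only matches that begin and end on cluster boundaries."""
    starts = cluster_starts(text)
    m = len(pattern)
    return [p for p in horspool_search(text, pattern)
            if p in starts and p + m in starts]
```

In this sketch, a plain character-level search for the consonant ม inside มี reports a match, while the cluster-aligned search rejects it because the match would split the consonant from its vowel mark. That is one way inseparable units can improve search accuracy.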


Similar resources

Adopting the Information Retrieval Approach for Storing and Retrieving Thai-text Structured Data

This paper describes an approach to using a full-text search engine for storing and retrieving structured data in the Thai language. It discusses some limitations of database management systems (DBMS) in querying Thai full-text content. These limitations can degrade retrieval performance in terms of both result accuracy and system response time. Information Retrieval (IR) system or...


More Accurate Fuzzy Text Search for Languages Using Abugida Scripts

Text search is a key step in any kind of information access. For doing it effectively, we can use knowledge about the concerned writing systems. Methods based on such knowledge can give significantly better results for searching text, at least for some languages. This can improve information retrieval in particular and information access in general. In this paper, we present a method for fuzzy ...


Overview of the Full-Text Document Retrieval Benchmark

8.1 Introduction For most of recorded history, textual data have existed primarily in hardcopy format, and the related document retrieval process was essentially a manual task, possibly involving the assistance of cross-reference catalogs. By the mid-1960s, work was under way at the University of Pittsburgh to develop computer-assisted legal research systems [Harrington, 1984–85]. Also, during ...


Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. The purpose of this study was therefore to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...


WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data

We describe our experience in developing Web search systems using Oracle’s SQL*TextRetrieval. In the prototype system we store on-line books in HTML and the HTML documents of a web site; SQL*TextRetrieval is used to index full text and other structured data in the ’web space’ and to provide an efficient search engine for free-text search. The Web enables global access to and maximum informa...




Publication date: 2000
