A Survey of Automatic Indexing Techniques for Thai Text Documents
Author
* Faculty of Information Technology, Rangsit University.
Abstract
With the rapidly increasing number of Thai text documents available in digital media and on websites, it is important to find an efficient text indexing technique to facilitate search and retrieval. An efficient index speeds up response time and improves the accessibility of the documents. So far, little research has been conducted on Thai text indexing compared with more widely used languages such as English and other European languages. In Thai text indexing, extracting indexing terms is the main issue: because Thai text is written without explicit word boundaries (non-segmented), the terms cannot be identified automatically from the documents. As a result, indexing Thai text documents poses many challenges. Most Thai text indexing techniques fall into two main categories, language-dependent and language-independent, both of which are described in this paper.
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
An Enhancement of Thai Text Retrieval Efficiency by Automatic Backward Transliteration
Loan words, which are borrowed from foreign languages, are used in many languages such as Japanese, Chinese, Korean and Thai. They affect Thai Text Retrieval (TTR) systems, leading to inaccurate term weights for indexing and text clustering. Therefore, there is a need for automatic backward transliteration that can solve this problem. In this paper, we propose a hybrid model approa...
A Two-Stage Split-and-Select Model for Automatic Indexing of Persian Texts
Purpose: Each language has its own problems, which makes it necessary to consider appropriate models for automatic indexing in each language. These models should address the exhaustivity and specificity of indexing. This paper aims to introduce and evaluate a model suited to Persian automatic indexing. This model proposes to break the text into particles of candidate terms and to c...
A Survey of Indexing and Retrieval of Multimodal Documents: Text and Images
A document conveys information using multiple modalities, including text, layout/style and images. For example, journal articles usually have figures to illustrate experimental results, and the title in a journal article usually has a different font size than the body text. Indexing and retrieval using only text is the traditional way of IR (Information Retrieval). With the development of the I...
Segmentation of Thai Handwritten Text for Automatic Document Retrieval
There is a huge number of documents in Thai government organizations. Although automatic document image retrieval systems for English have been proposed and developed, there is no specific system capable of retrieving relevant information from documents in the Thai language. While word matching or optical character recognition (OCR) can be applied, segmentation of the words and characters i...