Using heterogeneous linguistic knowledge in local coherence identification for information retrieval

نویسنده

  • Samuel W. K. Chan
چکیده

This paper proposes a novel approach to automatic text segmentation without a full semantic understanding. In order to analyse the linguistic bonds and determine the degree of coherence that a text may exhibit, the tremendous diversity of textual relations in a discourse network is represented. A corpus of mutual linguistic knowledge that captures the similarity of meaning and causal relations is encoded in the discourse network, which is then subjected to a cluster algorithm. As a result, segments in the text are segregated into clusters according to their textual similarity. Topic boundaries in a text can be identified by observing the shifts of segments from one cluster to another. The experimental results show that the combination of the heterogeneous knowledge is capable of addressing the topic shifts. Comparison with a related method demonstrates that the algorithm is closely related to the topic boundaries. Given the increasing recognition of text structure in the fields of information retrieval in unpartitioned text, this approach provides a quantitative model and an efficient tool in text segmentation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrated Language Technologies for Multilingual Information Services in the MEMPHIS Project

The MEMPHIS project integrates a large set of NLP technologies. An overview of components, their underlying technologies and resources will be presented: language identification, document classification, linguistic analysis, summarization, information extraction, machine translation, knowledge management and crosslingual retrieval.

متن کامل

Performance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature

Background and Aim: Image, as a kind of information vehicle which can convey a large volume of information, is important especially in medicine field. Existence of different attributes of image features and various search algorithms in medical image retrieval systems and lack of an authority to evaluate the quality of retrieval systems, make a systematic review in medical image retrieval system...

متن کامل

Linguistic Knowledge: An Intelligent Information Retrieval System

This paper describes the results of some experiments using a new approach to information access that combines techniques from natural language processing and knowledge representation with a penalty based technique for relevance estimation and passage retrieval. Unlike many attempts to combine natural language processing with information retrieval, these results show substantial benefit from usi...

متن کامل

A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features

Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we proposed an image retrieval method using global and local features. Firstly, for local features extraction, SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest points and then to use Bag of Visual Words model, a cluster...

متن کامل

Transliterated Word Identification and Application to Query Translation Mining

Query translation mining is a key technique in cross-language information retrieval and machine translation knowledge acquisition. For better performance, the queries are classified into transliterated words and non-transliterated words based on transliterated word identification model, and are further channeled to different mining processes. This paper is a pilot study on query classification ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Information Science

دوره 26  شماره 

صفحات  -

تاریخ انتشار 2000