Paragraph-based nearest neighbour searching in full-text documents
Abstract
This paper discusses the searching of full-text documents to identify paragraphs that are relevant to a user request. Given a natural language query statement, a nearest neighbour search ranks the paragraphs of a full-text document in order of descending similarity with the query, where the similarity for each paragraph is the number of keyword stems that it has in common with the query. This approach is compared with the more conventional Boolean search, which requires the user to specify the logical relationships between the query terms. Comparative searches using 130 queries and 20 full-text documents demonstrate the general effectiveness of the nearest neighbour model for paragraph-based searching. It is shown that the output from a nearest neighbour search can be used to guide a reader to the most appropriate segment of an online full-text document.
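The ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the crude suffix-stripping `stem` function, and the names `keyword_stems` and `rank_paragraphs` are all assumptions made for illustration, and a real system would likely use an established stemmer.

```python
import re

# Hypothetical stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
              "the", "to", "with"}


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_stems(text):
    """Return the set of stems of the non-stop-words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}


def rank_paragraphs(query, paragraphs):
    """Rank paragraphs by the number of keyword stems shared with the query.

    Returns (score, paragraph_index) pairs in descending score order;
    ties keep the original paragraph order.
    """
    query_stems = keyword_stems(query)
    scored = [(len(query_stems & keyword_stems(p)), i)
              for i, p in enumerate(paragraphs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


if __name__ == "__main__":
    paragraphs = [
        "Cats chase mice in the garden.",
        "Searching documents with ranked queries.",
        "Boolean operators link query terms.",
    ]
    for score, index in rank_paragraphs("ranking document searches", paragraphs):
        print(score, paragraphs[index])
```

Unlike a Boolean search, this approach needs no operators from the user: the query is treated as a bag of stems, and the top-ranked paragraph indices can serve as entry points into the document.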
Similar resources
On Document Classification with Self-Organising Maps
This research deals with the use of self-organising maps for the classification of text documents. The aim was to classify documents to separate classes according to their topics. We therefore constructed self-organising maps that were effective for this task and tested them with German newspaper documents. We compared the results gained to those of k nearest neighbour searching and k-means clu...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
Nearest-neighbour Searching in Files of Text Signatures Using Transputer Networks
This paper discusses the implementation of nearest-neighbour document retrieval in serial files using transputer networks. The system uses a two-stage retrieval algorithm in which an initial text-signature search is used to exclude large numbers of documents from the detailed and time-consuming pattern-matching search. The latter is implemented using a processor farm, so that documents which ma...
Natural Language Text Classification and Filtering with Trigrams and Evolutionary Nearest Neighbour Classifiers
N-grams offer fast language-independent multi-class text categorization. Text is reduced in a single pass to n-gram vectors. These are assigned to one of several classes by a) nearest neighbour (KNN) and b) genetic algorithm operating on weights in a nearest neighbour classifier. 91% accuracy is found on binary classification on short multi-author technical English documents. This falls if more cat...