Adjacency Matrix Based Full-Text Indexing Models
Authors
Abstract
With the rapid growth of online text and user accesses, query-processing efficiency has become the major bottleneck of information retrieval (IR) systems. This paper proposes two new full-text indexing models to improve the query-processing efficiency of IR systems. A text string is represented as a directed graph, and the adjacency matrix of the text string is introduced on that basis. Two approaches to implementing this adjacency matrix are proposed, leading to two new full-text indexing models: the adjacency matrix based inverted file and the adjacency matrix based PAT array. Query algorithms for the new models are developed, and their performance is compared with that of the traditional models. Experiments over real-world text collections are conducted to validate the effectiveness and efficiency of the new models. The new models improve query-processing efficiency considerably, at the cost of extra storage overhead that is much smaller than the original text database, and are therefore suitable for large-scale text databases.
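The abstract does not spell out how the adjacency matrix is built, so the following is only a minimal sketch of the general idea, under stated assumptions: tokens are treated as graph vertices, each pair of adjacent tokens as a directed edge, and the sparse "matrix" is stored as a dictionary from token pairs to the positions where the pair occurs. The function names `build_adjacency_index` and `find_phrase` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_adjacency_index(tokens):
    """Map each ordered pair of adjacent tokens (a, b) to the positions of a.

    Assumption: this dictionary stands in for a sparse adjacency matrix
    whose cell [a][b] holds the occurrence positions of the edge a -> b.
    """
    index = defaultdict(list)
    for pos in range(len(tokens) - 1):
        index[(tokens[pos], tokens[pos + 1])].append(pos)
    return index


def find_phrase(index, phrase):
    """Return the sorted start positions of a phrase of two or more tokens."""
    if len(phrase) < 2:
        raise ValueError("this sketch only handles phrases of two or more tokens")
    # Start from the occurrences of the first edge of the phrase.
    candidates = set(index.get((phrase[0], phrase[1]), []))
    # Keep a start position only if every later edge occurs at the right offset.
    for offset in range(1, len(phrase) - 1):
        edge_positions = set(index.get((phrase[offset], phrase[offset + 1]), []))
        candidates = {p for p in candidates if p + offset in edge_positions}
    return sorted(candidates)


tokens = "to be or not to be".split()
index = build_adjacency_index(tokens)
print(find_phrase(index, ["to", "be"]))
print(find_phrase(index, ["not", "to", "be"]))
```

In this sketch a phrase query begins from the occurrence list of one token pair rather than one token, which is typically shorter. Whether this matches the source of the speed-up reported in the paper is an assumption.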
Similar resources
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy on image retrieval via the Google search engine. Methods: This study used an experimental method. The sample consisted of 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
Eigen-Image Based Video Segmentation and Indexing
We present a new approach for automatic video scene segmentation and content based indexing. Our approach detects video shots and builds a collection of key frames and representative frames. Scene segmentation and video indexing are based on a temporally windowed principal component analysis of a subsampled version of the video sequence. Two discriminants are derived from the principal componen...
A Two-Stage Split-and-Select Model for Automatic Indexing of Persian Texts
Purpose: Each language poses its own problems, so automatic indexing calls for models suited to each language. These models should address the exhaustivity and specificity of indexing. This paper introduces and evaluates a model suited to automatic indexing of Persian. The model proposes to break the text into particles of candidate terms and to c...
Using Deep Learning For Title-Based Semantic Subject Indexing To Reach Competitive Performance to Full-Text
For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competiti...
Reflections on Image Indexing: A Picture Is Worth a Thousand Words
Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a review of the literature, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...