Indexing Incomplete Databases
Abstract
Incomplete databases, that is, databases with missing data, arise in many research domains, so techniques for accessing them efficiently are important. We first show that the performance of known indexing techniques for multi-dimensional data search breaks down when indexed attributes contain missing data. This paper uses two widely employed indexing techniques, bitmaps and quantization, to answer queries correctly and efficiently in the presence of missing data. Query execution and interval evaluation are formalized for both indexing structures according to whether missing data is considered a query match. The performance of bitmap indexes and quantization-based indexes is evaluated and compared over a variety of analysis parameters on real and synthetic data sets, and insights are provided into the conditions under which each technique should be used.
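As a rough illustration of the two query semantics mentioned in the abstract, the following Python sketch builds an equality-encoded bitmap index for one attribute and keeps a separate bitmap for rows whose value is missing. The encoding, the function names (`build_bitmap_index`, `range_query`, `matching_rows`), and the use of `None` for a missing value are assumptions made for this example; the paper's actual index layout and interval evaluation may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: one bitmap per attribute value plus one bitmap for
# missing values. Python ints serve as bitmaps; bit i refers to row i.
BitmapIndex = Tuple[List[int], int]


def build_bitmap_index(column: Sequence[Optional[int]], cardinality: int) -> BitmapIndex:
    """Build value bitmaps for values 0..cardinality-1 and a missing-value bitmap."""
    value_bitmaps = [0] * cardinality
    missing_bitmap = 0
    for row, value in enumerate(column):
        if value is None:
            missing_bitmap |= 1 << row      # row has no value for this attribute
        else:
            value_bitmaps[value] |= 1 << row
    return value_bitmaps, missing_bitmap


def range_query(index: BitmapIndex, low: int, high: int, missing_is_match: bool) -> int:
    """Return a bitmap of rows whose value lies in [low, high].

    If missing_is_match is True, rows with a missing value also qualify.
    """
    value_bitmaps, missing_bitmap = index
    result = 0
    for value in range(low, high + 1):
        result |= value_bitmaps[value]      # OR together the bitmaps in the interval
    if missing_is_match:
        result |= missing_bitmap
    return result


def matching_rows(bitmap: int) -> List[int]:
    """Decode a bitmap into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


column = [0, 2, None, 1, None, 3]
index = build_bitmap_index(column, cardinality=4)
strict = matching_rows(range_query(index, 1, 2, missing_is_match=False))
lenient = matching_rows(range_query(index, 1, 2, missing_is_match=True))
```

In this sketch a multi-attribute query would AND the per-attribute result bitmaps, so the choice of semantics is applied independently to each attribute before the results are combined.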
Similar resources
Fast High-Dimensional Data Search in Incomplete Databases
We propose and evaluate two indexing schemes for improving the efficiency of data retrieval in high-dimensional databases that are incomplete. These schemes are novel in that the search keys may contain missing attribute values. The first is a multi-dimensional index structure, called the Bitstring-augmented R-tree (BR-tree), whereas the second comprises a family of multiple one-dimensional one...
Incomplete evidence: the inadequacy of databases in tracing published adverse drug reactions in clinical trials
BACKGROUND: We would expect information on adverse drug reactions in randomised clinical trials to be easily retrievable from specific searches of electronic databases. However, complete retrieval of such information may not be straightforward, for two reasons. First, not all clinical drug trials provide data on the frequency of adverse effects. Secondly, not all electronic records of trials inc...
Survey on Various Methods and Techniques for Searching Dimension in Incomplete Database
Nowadays, the dimension-incomplete problem is a fundamental research problem in multidimensional databases. Missing dimension information poses great computational challenges. The similarity query problem in multidimensional databases occurs in numerous database applications, such as data mining and information retrieval. Due to various practical issues like remote data accessing ...
A comparison of the thesaurus structure of the PubMed and Embase databases with the thesaurus construction standard of the U.S. National Information Standards Organization, and a review of the indexing methods of the two databases
Introduction: According to mortality rates in Iran, cardiovascular diseases, neoplasms, perinatal mortality, and respiratory tract diseases were the leading causes of death in 2003 (1382). To reduce the mortality rate, the Iranian medical community needs to know more about recent therapeutic regimens. The two main medical databases are PubMed and Embase. Researching PubMed and Embase indexing methods and comparing Me...
The state of information retrieval in the Namayeh and Nama databases and an assessment of the effectiveness of controlled vocabulary in indexing these two databases
Purpose: This study was carried out to determine the level of precision, recall, and searching time for “Nama” and “Namayeh” databases, as well as to find out which of the indexing tools (thesaurus and Dewey decimal classification) helps us more in improvement of information retrieval. Methodology: This study is an analytical survey in which the necessary data was collected by direct observati...