Rank-based strategies for cleaning inconsistent spatial databases
Abstract
A spatial dataset is consistent if it satisfies a set of integrity constraints. Although consistency is a desirable property of databases, enforcing the satisfaction of integrity constraints might not always be feasible. In such cases, the presence of inconsistent data may have a negative effect on the results of data analysis and processing; consequently, there is an important need for data-cleaning tools that detect and, if possible, remove inconsistencies in large datasets. This work proposes strategies to support data cleaning of spatial databases with respect to a set of integrity constraints that impose topological relations between spatial objects. The basic idea is to rank the geometries in a spatial dataset that should be modified in order to improve the quality of the data (in terms of consistency). An experimental evaluation validates the proposal and shows that the order in which geometries are modified affects both the overall quality of the database and the final number of geometries to be processed to restore consistency.
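The ranking idea can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not state which ranking measures are used, so the sketch assumes a simple one (the number of violations each geometry takes part in), a single assumed constraint (interiors of any two geometries must not intersect), and axis-aligned rectangles in place of real geometries. The names `Rect`, `interiors_intersect`, `conflicts`, `rank_by_conflicts`, and `clean_greedily` are all hypothetical, and "modifying" a geometry is simplified to removing it.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a spatial geometry (hypothetical type)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def interiors_intersect(a: Rect, b: Rect) -> bool:
    """True when the rectangles overlap in more than a shared boundary."""
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def conflicts(rects):
    """Pairs violating the assumed constraint: interiors must not intersect."""
    return [(a, b) for a, b in combinations(rects, 2) if interiors_intersect(a, b)]


def rank_by_conflicts(rects):
    """Order geometries by how many violations they take part in, highest first."""
    counts = {r.name: 0 for r in rects}
    for a, b in conflicts(rects):
        counts[a.name] += 1
        counts[b.name] += 1
    # Ties are broken by name so the ranking is deterministic.
    return sorted(rects, key=lambda r: (-counts[r.name], r.name))


def clean_greedily(rects):
    """Process the top-ranked geometry repeatedly until no violation remains.

    Processing is simplified here to removal; returns the processed names.
    """
    remaining = list(rects)
    processed = []
    while conflicts(remaining):
        worst = rank_by_conflicts(remaining)[0]
        remaining.remove(worst)
        processed.append(worst.name)
    return processed


# Illustrative data: A overlaps both B and C; B and C do not overlap; D is isolated.
parcels = [
    Rect("A", 0, 0, 4, 4),
    Rect("B", 3, 3, 6, 6),
    Rect("C", 3, 0, 6, 2),
    Rect("D", 10, 10, 12, 12),
]
ranking = [r.name for r in rank_by_conflicts(parcels)]
processed = clean_greedily(parcels)
print(ranking, processed)
```

In this toy dataset, handling A first resolves both violations with a single geometry, whereas handling B and then C would touch two. That is the kind of order dependence the abstract reports, shown here only under the simplifying assumptions above.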
Similar resources
Exploiting Conflict Structures in Inconsistent Databases
Given an inconsistent database that violates a set of (conditional) functional dependencies, a minimal attribute-based repair is a database that satisfies the dependencies, and minimally differs from the original database in the set of attribute values that have been changed. For an inconsistent database, we define a basic conflict as a minimal set of attribute values, of which at least one nee...
Spatial Design for Knot Selection in Knot-Based Low-Rank Models
Analysis of large geostatistical data sets usually entails expensive matrix computations. This creates challenges in implementing statistical inference for traditional Bayesian models. In addition, researchers often face multiple spatial data sets with complex spatial dependence structures that are difficult to analyze. This is a problem for MCMC sampling algorith...
Quantitative Data Cleaning for Large Databases
Data collection has become a ubiquitous function of large organizations – not only for record keeping, but to support a variety of data analysis tasks that are critical to the organizational mission. Data analysis typically drives decision-making processes and efficiency optimizations, and in an increasing number of settings is the raison d’etre of entire agencies or firms. Despite the importan...
Bayesian Data Cleaning for Web Data
Data cleaning is a long-standing problem, which is growing in importance with the mass of uncurated web data. State-of-the-art approaches for handling inconsistent data are systems that learn and use conditional functional dependencies (CFDs) to rectify data. These methods learn data patterns (CFDs) from a clean sample of the data and use them to rectify the dirty/inconsistent data. While getting...
Preference-Driven Querying of Inconsistent Relational Databases
One of the goals of cleaning an inconsistent database is to remove conflicts between tuples. Typically, the user specifies how the conflicts should be resolved. Sometimes this specification is incomplete, and the cleaned database may still be inconsistent. At the same time, data cleaning is a rather drastic approach to conflict resolution: It removes tuples from the database, which may lead to ...
Journal title:
- International Journal of Geographical Information Science
Volume 29
Publication date: 2015