Extending ER models to capture database transformations to build data sets for data mining
Abstract
In a data mining project developed on a relational database, significant effort is required to build a data set for analysis. The main reason is that the database generally consists of a collection of normalized tables that must be joined, aggregated, and transformed to build the required data set. Such a scenario results in many complex SQL queries that are written independently of each other, in a disorganized manner. As a result, the database accumulates many tables and views that are not present as entities in the ER model, and similar SQL queries are written multiple times, creating problems for database evolution and software maintenance. In this paper, we classify potential database transformations, extend an ER diagram with entities that capture these transformations, and introduce an algorithm that automates the creation of such an extended ER model. We present a case study with a public database that illustrates the database transformations needed to build a data set for computing a typical data mining model.
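The kind of transformation the abstract describes (joining and aggregating normalized tables into one analysis-ready data set) can be illustrated with a minimal sketch. It assumes a hypothetical two-table schema (`customer`, `sales`) and a hypothetical `TransformationEntity` record. This is not the authors' classification, ER extension, or algorithm. It only shows a single join-plus-aggregation step and how its lineage (source tables, operations, defining SQL) could be recorded so the derived table does not end up as an undocumented object outside the ER model.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class TransformationEntity:
    """Hypothetical record of a derived table, to be shown as an entity in an extended ER model."""
    name: str
    sources: list[str]      # source entities (tables) the derived table depends on
    operations: list[str]   # kinds of transformation applied
    sql: str                # the query that materializes the derived table


def build_customer_data_set(conn: sqlite3.Connection) -> TransformationEntity:
    """Join and aggregate normalized tables into one row per customer."""
    sql = """
        CREATE TABLE customer_ds AS
        SELECT c.customer_id,
               c.region,
               COUNT(s.sale_id)           AS n_sales,
               COALESCE(SUM(s.amount), 0) AS total_amount
        FROM customer AS c
        LEFT JOIN sales AS s ON s.customer_id = c.customer_id
        GROUP BY c.customer_id, c.region
    """
    conn.execute(sql)
    # Record the lineage instead of leaving customer_ds as an undocumented table.
    return TransformationEntity(
        name="customer_ds",
        sources=["customer", "sales"],
        operations=["left outer join", "aggregation (COUNT, SUM)"],
        sql=sql.strip(),
    )


# Hypothetical normalized schema with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customer (customer_id),
                        amount REAL);
    INSERT INTO customer VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO sales VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

entity = build_customer_data_set(conn)
rows = conn.execute("SELECT * FROM customer_ds ORDER BY customer_id").fetchall()
print(entity.name, entity.sources)  # customer_ds ['customer', 'sales']
print(rows)  # [(1, 'north', 2, 12.5), (2, 'south', 1, 3.0), (3, 'north', 0, 0)]
```

The `LEFT JOIN` keeps customers with no sales (customer 3 gets `n_sales = 0`), so the data set has exactly one row per entity of interest, which is the shape most data mining algorithms expect.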
Similar resources
Prediction of global sea cucumber capture production based on the exponential smoothing and ARIMA models
Sea cucumber catch has followed “boom-and-bust” patterns over the 60-year period from 1950 to 2010, and sea cucumber fisheries have had important ecological, economic and societal roles. However, sea cucumber fisheries have not been explored systematically, especially in terms of catch change trends. Sea cucumbers are relatively sedentary species. An attempt was made to explore whether the tim...
HeteroClass: A Framework for Effective Classification from Heterogeneous Databases
Classification is an important data mining task and it has been studied from different perspectives. Recently, multi-relational classification algorithms have been studied due to many real-world applications. However, current work has generally assumed that all the data needed to build an accurate prediction model resides in a single database. Many practical settings, however, require that we com...
Extended SQL Aggregation for Database Transformation
Preparing a normalized data set from a relational database for analysis requires significant effort and is a time-consuming task. The main reason is that, in general, the database grows with many tables and views that must be joined, aggregated and transformed in order to build the required data set. As a result, most of the SQL queries are written independently multiple times and in disorganize ...
A Three-phase Hybrid Times Series Modeling Framework for Improved Hospital Inventory Demand Forecast
Background and Objectives: Efficient cost management in hospitals’ pharmaceutical inventories has the potential to contribute substantially to optimizing overall hospital expenditures. To this end, reliable forecasting models for accurate prediction of future pharmaceutical demands are instrumental. While the linear methods are frequently used for forecasting purposes chiefly due to their si...
Journal title:
- Data Knowl. Eng.
Volume 89, Issue
Pages: -
Publication date: 2014