Efficient integration of data mining techniques in DBMSs

Authors

  • Fadila Bentayeb
  • Jérôme Darmont
  • Cédric Udréa
Abstract

We propose in this paper a new approach for applying data mining algorithms, and more particularly supervised machine learning algorithms, to large databases with acceptable response times. This goal is achieved by integrating these algorithms within a Database Management System (DBMS). We are thus limited only by disk capacity, not by available main memory. However, the disk accesses needed to scan the database induce long response times. Hence, we propose an original method that reduces the size of the learning set by building its contingency table. The machine learning algorithms are then adapted to operate on this contingency table. To validate our approach, we implemented the ID3 decision tree construction method and showed that using the contingency table helped us obtain response times equivalent to those of classical, in-memory software.
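The idea of learning from a contingency table rather than from the raw learning set can be illustrated with a minimal sketch. It is not the authors' implementation, which runs inside a DBMS; it is an in-memory Python approximation under stated assumptions: the learning set is a list of tuples of categorical values, the contingency table is taken to be the count of each distinct tuple, and the helper names (`contingency_table`, `entropy`, `information_gain`) and the sample data are hypothetical. The information gain used by ID3 is then computed from the counts alone.

```python
from collections import Counter
from math import log2


def contingency_table(rows):
    """Collapse identical rows into {row: count} (assumed table layout)."""
    return Counter(rows)


def entropy(class_counts):
    """Shannon entropy (base 2) of a {class_value: count} mapping."""
    total = sum(class_counts.values())
    return -sum(
        (n / total) * log2(n / total) for n in class_counts.values() if n > 0
    )


def information_gain(table, attr_index, class_index):
    """ID3 information gain of one attribute, using only the counts."""
    class_counts = Counter()
    by_value = {}  # attribute value -> class distribution
    for row, n in table.items():
        class_counts[row[class_index]] += n
        by_value.setdefault(row[attr_index], Counter())[row[class_index]] += n
    total = sum(class_counts.values())
    # Weighted entropy of the partitions induced by the attribute.
    remainder = sum(
        sum(dist.values()) / total * entropy(dist) for dist in by_value.values()
    )
    return entropy(class_counts) - remainder


# Hypothetical learning set: (outlook, windy, play), with "play" as the class.
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rain", "yes", "no"),
    ("rain", "yes", "no"),
    ("rain", "no", "yes"),
]
table = contingency_table(rows)
gain_outlook = information_gain(table, attr_index=0, class_index=2)
gain_windy = information_gain(table, attr_index=1, class_index=2)
```

Here six rows collapse to four distinct combinations, and ID3 would split on `windy` first because its gain is higher. On a real learning set with many repeated combinations, the table can be much smaller than the data it summarizes, which is the effect the abstract relies on.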

Similar resources

ATLaS: A Native Extension of SQL for Data Mining

A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support data mining applications. Thus, there is a pressing need for more general mechanisms for extending DBMSs to efficiently support database-centric data mining applications. To satisfy this need, we propose a new extensibility mechanism for SQL-compliant D...

A System Architecture for Database Mining Applications

The problem of enhancing a database management system (DBMS) to support mining applications is twofold. First, today's DBMSs have limited functionality for supporting mining applications. Second, scaling traditional knowledge discovery techniques to large data sets is not straightforward. Our goal is to propose a system architecture for future DBMSs that incorporate interactive modules for data...

Applying Data Mining Techniques in Property/Casualty Insurance

This paper addresses the issues and techniques for Property/Casualty actuaries using data mining techniques. Data mining means the efficient discovery of previously unknown patterns in large databases. It is an interactive information discovery process that includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the ...

ATLaS: A Native Extension of SQL for Data Mining and Stream Computations

A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support new application domains. Considerable efforts by database researchers and commercial DBMS vendors have led to major extensions; yet there remain important applications—particularly data mining—that are not supported well in SQL-3. Thus, there is a pressi...

Automated detection of coronavirus disease (COVID-19) by using data-mining techniques: a brief report

Background: The clinical field holds a vast amount of patient data that has not been analyzed. Finding a way to analyze this raw data and turn it into a source of useful information can save many lives. Data mining methods are an efficient way to analyze this large amount of raw data. They can support predictions based on accurate knowledge of the past, providing new insights into disease diagnosis and prevention. S...

Publication date: 2004