Techniques for Estimating the Computation and Communication Costs of Distributed Data Mining
Abstract
Distributed Data Mining (DDM) is the process of mining distributed and heterogeneous datasets. DDM is widely seen as a means of addressing the scalability issue of mining large data sets. Consequently, there is an emerging focus on optimisation of the DDM process. In this paper we present cost formulae for estimating the communication and computation time for different distributed data mining scenarios.
Similar resources
Presented a method for estimating the cost of software using PCA to reduce the size and with the help of data mining
These days, data mining is one of the most significant issues. Data mining is a field that mixes computer science and statistics, which is considerably limited due to the increase in digital data and the growth of the computational power of computers. One of the domains of data mining is the software cost estimation category. In this article, classifying techniques of learning algorithm of machine ...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing is an idea to eliminate the burden of database management from organizations. Since data is a critical asset of organizations, preserving its privacy from outside adversary and untrusted server should be warranted. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
Design and Analysis of a Dynamic Load Balancing Strategy for Large-Scale Distributed Association Rule Mining
Association rule mining is one of the most important data mining techniques. Algorithms of this technique search a large space, considering numerous different alternatives and scanning the data repeatedly. Parallelism seems to be the natural solution in order to be able to work with industrial-sized databases. Large-scale computing systems, such as Grid computing environments, are recently rega...
Buffer-Safe Communication Optimization based on Data Flow Analysis and Performance
This paper presents a novel approach to reduce communication costs of programs for distributed memory machines. Our techniques are based on uni-directional bit-vector data flow analysis that enable vectorizing and coalescing communication, overlapping communication with computation, eliminating redundant messages and amount of data being transferred both within and across loop nests. Our data flow ...