A Data Cleansing Method for Clustering Large-Scale Transaction Databases
نویسندگان
چکیده
In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.
منابع مشابه
ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases
We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering ...
متن کاملAn Effective Clustering Algorithm for Transaction Databases Based on K-Mean
Clustering is an important technique in machine learning, which has been successfully applied in many applications such as text and webpage classifications, but less in transaction database classification. A large organization usually has many branches and accumulates a huge amount of data in their branch databases called multidatabases. At present, the best way of mining multidatabases is, fir...
متن کاملConceptual Clustering of Heterogeneous Distributed Databases
With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...
متن کاملA hybrid approach for database intrusion detection at transaction and inter-transaction levels
Nowadays, information plays an important role in organizations. Sensitive information is often stored in databases. Traditional mechanisms such as encryption, access control, and authentication cannot provide a high level of confidence. Therefore, the existence of Intrusion Detection Systems in databases is necessary. In this paper, we propose an intrusion detection system for detecting attacks...
متن کاملA partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEICE Transactions
دوره 93-D شماره
صفحات -
تاریخ انتشار 2010