Vertical Mining with Incomplete Data

Authors

  • FARIS ALQADAH
  • ZHEN HU
  • LAWRENCE J. MAZLACK
Abstract

Mining frequent patterns is essential in many data mining methods. Frequent patterns lead to the discovery of association rules, strong rules, sequential episodes, and multi-dimensional patterns. Patterns should be discovered in a time- and space-efficient manner. The key advantage of vertical mining algorithms is that they can outperform their horizontal counterparts in both time and space efficiency. Little work has addressed how incomplete data influences vertical data mining. The quality and utility of the results of vertical mining algorithms therefore remain uncertain, as real data sets often contain incomplete data. This paper considers establishing methodologies that deal with incomplete data in vertical mining.

Key-Words: incomplete data, vertical, data mining, efficiency, privacy preserving, data sensitivity

1 Overview and Objectives

Mining frequent patterns is one of the essentials in many data mining applications. Frequent patterns lead to the discovery of association rules, strong rules, sequential episodes, and multi-dimensional patterns. All of these applications play a critical role in allowing corporate and scientific institutions to further understand and analyze the data that they have gathered. In today's dynamic world it is essential for these patterns to be discovered in both a time- and space-efficient manner. The real value of the discovered patterns derives from the fact that they accurately describe trends in the data and do not simply reflect noise or chance encounters.

Vertical mining algorithms have been proposed that move away from the traditional horizontal transactional database format. The key advantage of vertical mining algorithms is that they have been shown to outperform their horizontal counterparts in both time and space efficiency. However, to the best of our knowledge, no work has addressed how incomplete data influences the vertical data mining process.
Therefore, the quality and utility of the patterns and rules discovered via vertical mining algorithms remain ambiguous for real data sets that contain incomplete data. The purpose of this work is to determine several different methodologies that deal with incomplete data in vertical mining. Furthermore, we wish to develop strategies for determining the maximal utilization that can be obtained from a dataset based on how much and what data is missing. Although both vertical mining and incomplete data have been studied extensively on their own, no comprehensive study combining the two is available. The long-term goal of our work is to efficiently mine incomplete data and to provide the user with quality measures on the results of the mining. This long-term goal entails mining any form of data, be it transactional, observational, spatial, etc., with any form of data mining. This paper's focus is restricted to vertical mining techniques in stationary transactional data. We believe this short-term goal will contribute significantly towards the long-term goal, because many data types and techniques have their origins in stationary transac-
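
To make the vertical layout discussed in the overview concrete, the following is a minimal Python sketch. It is not the authors' method, which the preview above does not describe. It assumes a tidset representation (each item maps to the set of transaction ids that contain it) and one simple, hypothetical way of accounting for missing values: transactions where an item's value is unknown are tracked separately, which yields a lower and an upper bound on an itemset's support. The names `MISSING`, `to_vertical`, and `support_bounds` are illustrative only.

```python
from collections import defaultdict

MISSING = None  # hypothetical marker for an unknown value


def to_vertical(transactions):
    """Convert horizontal rows (item -> True/False/MISSING) to tidsets.

    Returns two mappings: item -> ids of rows known to contain it,
    and item -> ids of rows where its value is unknown.
    """
    present = defaultdict(set)
    unknown = defaultdict(set)
    for tid, row in enumerate(transactions):
        for item, value in row.items():
            if value is MISSING:
                unknown[item].add(tid)
            elif value:
                present[item].add(tid)
    return present, unknown


def support_bounds(itemset, present, unknown):
    """Return (lower, upper) support counts for an itemset.

    Lower bound: rows where every item is known to be present.
    Upper bound: rows where every item is present or unknown.
    Both are computed by intersecting tidsets, as in vertical mining.
    """
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    lower = set(present[items[0]])
    upper = present[items[0]] | unknown[items[0]]
    for item in items[1:]:
        lower &= present[item]
        upper &= present[item] | unknown[item]
    return len(lower), len(upper)
```

With complete data the two bounds coincide, so the gap between them gives a rough indication of how much the missing values could affect a reported support count.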


Similar resources

A New Algorithm for Mining Frequent Itemsets from Evidential Databases

The association rule mining (ARM) problem has been extensively tackled in the context of perfect data. However, real applications have shown that data are often imperfect (incomplete and/or uncertain), which creates the need for ARM algorithms that process imperfect databases. In this paper we propose a new algorithm for mining frequent itemsets from evidential databases. We introduce a new structure ca...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

An Api for Transparent Distributed Vertical Data Mining

New data mining tools and algorithms are available to the vertical data mining community for scalable and efficient discovery of hidden nuggets in huge repositories of data. Most traditional data mining algorithms do not scale to these huge datasets. This is due to the insufficient computational resources currently available on a single machine for running these applications...

Vertical Data Mining on Very Large Data Sets

Due to the rapid growth in the volume of available data, it is both important and challenging to develop scalable methodologies and frameworks that can be used to perform efficient and effective data mining on large data sets. The vertical data mining strategy aims at addressing the scalability issues by organizing data in vertical layouts and conducting logical operations on vertical partiti...

Ontology for Data Mining and its Application to Mining Incomplete Data

Ontology has recently received considerable attention in the knowledge management community. This article discusses the needs of ontology development for data mining. Based on a domain analysis of knowledge representations in data mining, it proposes a generic structure of ontologies for data mining. Furthermore, this article specifies the unique ontology resources of the subdomain of innovativ...

Publication date: 2008