Fast attribute-based table clustering using Predicate-Trees: A vertical data mining approach

Authors

  • Arjun G. Roy
  • Arijit Chatterjee
  • Mohammad K. Hossain
  • William Perrizo
Abstract

With technological advancements, massive amounts of data are being collected in various domains. For instance, since the advent of digital image technology and remote sensing imagery (RSI), NASA and the U.S. Geological Survey, through the Landsat Data Continuity Mission, have been capturing images of Earth down to 15-meter resolution. Likewise, consider the Internet, where the growth of social media, blog Web sites, etc. generates an exponentially growing amount of textual data on a daily basis. Since clustering of data is time-consuming, much of this data is archived before it is properly analyzed. In this paper, we propose two novel and extremely fast algorithms: imgFAUST (Fast Attribute-based Unsupervised and Supervised Table Clustering) for images, and a variation called docFAUST for textual data. Both algorithms are based on Predicate-Trees, which are compressed, lossless, and data-mining-ready data structures. Without compromising much on accuracy, our algorithms are fast and can be used effectively for high-speed image data and document analysis.
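The abstract does not spell out how a Predicate-Tree is built, so the following is only a rough sketch of the general idea of a vertical, lossless, compressed representation. It assumes a column of unsigned integers is split into one bit vector per bit position and that each bit vector is compressed by recursively halving it until a half is all 0s or all 1s. The function names (`bit_slices`, `rebuild`, `build_tree`, `count_ones`) and the tuple-based tree layout are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple, Union

# Hypothetical tree node: ("pure0", n), ("pure1", n), or ("mixed", left, right).
Tree = Union[Tuple[str, int], Tuple[str, "Tree", "Tree"]]


def bit_slices(values: List[int], width: int) -> List[List[int]]:
    """Split a column into `width` vertical bit vectors, most significant bit first."""
    return [[(v >> b) & 1 for v in values] for b in range(width - 1, -1, -1)]


def rebuild(slices: List[List[int]]) -> List[int]:
    """Recombine the bit vectors into the original values (the split is lossless)."""
    width = len(slices)
    rows = len(slices[0]) if slices else 0
    return [
        sum(slices[i][r] << (width - 1 - i) for i in range(width))
        for r in range(rows)
    ]


def build_tree(bits: List[int]) -> Tree:
    """Compress one bit vector: stop at runs that are purely 0 or purely 1."""
    if not any(bits):
        return ("pure0", len(bits))
    if all(bits):
        return ("pure1", len(bits))
    mid = len(bits) // 2
    return ("mixed", build_tree(bits[:mid]), build_tree(bits[mid:]))


def count_ones(tree: Tree) -> int:
    """Count the 1 bits without expanding the tree back into a flat vector."""
    if tree[0] == "pure1":
        return tree[1]
    if tree[0] == "pure0":
        return 0
    return count_ones(tree[1]) + count_ones(tree[2])


# Assumed example data: one 8-bit image band with eight pixel values.
values = [200, 13, 130, 255, 64, 192, 7, 128]
slices = bit_slices(values, 8)

# The predicate "value >= 128" is just the most significant bit slice.
high = count_ones(build_tree(slices[0]))

# The predicate "value >= 192" is the bitwise AND of the top two slices.
top_two = [a & b for a, b in zip(slices[0], slices[1])]
very_high = count_ones(build_tree(top_two))
```

In this sketch a range predicate on an attribute reduces to bitwise operations on a few bit vectors plus a count, which is one plausible reading of "data-mining-ready"; the structures and clustering steps actually used by imgFAUST and docFAUST may differ.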

Similar resources

Fast Attribute-based Unsupervised and Supervised Table Clustering using P-Trees

Since the advent of digital image technology and remote sensing imagery (RSI), massive amounts of image data have been collected worldwide. For example, since 1972, NASA and the U.S. Geological Survey, through the Landsat Data Continuity Mission, have been capturing images of Earth down to 15-meter resolution. Since image clustering is time-consuming, much of this data is archived even before analysis....

Predicate-Tree Based Pretty Good Privacy of Data

Growth of the Internet has led to an exponential rise in data communication over the World Wide Web. Several applications and entities such as online banking transactions, stock trading, e-commerce Web sites, etc. are at constant risk of eavesdropping and hacking. Hence, security of data is of prime concern. Recently, vertical data have gained a lot of focus because of their significant performance be...

Efficient Quantitative Frequent Pattern Mining Using Predicate Trees

Database information is not limited to categorical attributes, but also contains much quantitative data. The typical approach to quantitative association rule mining uses intervals. Such an approach involves many interval generations and merges, and requires reconstruction of tree structures, such as hash trees, R-trees, and FP-trees. When intervals change, these tree structures a...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is convenient only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Retaining Customers Using Clustering and Association Rules in Insurance Industry: A Case Study

This study clusters customers and finds the characteristics of different groups in a life insurance company in order to find a way to predict customer behavior based on payment. The approach is to use clustering and association rules based on the CRISP-DM methodology in data mining. The researcher could classify customers of each policy in three different clusters, using association rules. A...

Journal title:
  • J. Comput. Meth. in Science and Engineering

Volume 12

Publication date: 2012