Interactive Data Mining by Using Multidimensional Scaling
Authors
Abstract
Blind choice and parameterization of data mining tools often yield vague or completely misleading results. Interactive visualization enables not only extensive exploration of data but also better matching of clustering/classification schemes to the type of data being analyzed. Multidimensional scaling (MDS), which employs particle dynamics to minimize the error function, is a good candidate for the computational engine of interactive data mining. However, the main disadvantages of MDS are its high memory and time complexity. We developed a novel SUBSET algorithm of lower complexity, which is competitive with the best MDS algorithms currently in use in terms of efficiency and accuracy. SUBSET employs a reduced dissimilarity matrix whose structure allows efficient use of both multi-core CPU and SIMD GPU processor architectures. Consequently, SUBSET enables visualization of datasets consisting of on the order of 10 data items on a standard personal computer or laptop. We compare several strategies for reducing the dissimilarity matrix and present typical timings obtained by the respective MDS algorithms on selected multithreaded CPU and GPU architectures. © 2013 The Authors. Published by Elsevier B.V. Selection and/or peer-review under responsibility of the organizers of the 2013 International Conference on Computational Science.
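To make the idea of "particle dynamics minimizing the MDS error function over a reduced dissimilarity matrix" concrete, here is a minimal sketch. It assumes a plain squared-error stress and a hypothetical reduction in which each item keeps only `k` randomly chosen partners instead of all n-1; the names `reduced_pairs`, `mds_particles`, and `stress`, and all parameter values, are illustrative assumptions. This is not the authors' SUBSET algorithm, whose reduction strategies are compared in the paper itself.

```python
import numpy as np


def reduced_pairs(n, k, rng):
    """Hypothetical reduction: each item i keeps k random partners j != i.

    Only n*k pairs are stored instead of the n*(n-1)/2 of the full matrix.
    """
    i = np.repeat(np.arange(n), k)
    j = rng.integers(0, n - 1, size=n * k)  # values in [0, n-2]
    j = j + (j >= i)                        # shift to skip self-pairs
    return i, j


def mds_particles(D, k=10, dim=2, steps=500, dt=0.1, damping=0.9, seed=0):
    """Damped particle dynamics minimizing sum over kept pairs of (|x_i - x_j| - D_ij)^2."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    i, j = reduced_pairs(n, k, rng)
    target = D[i, j]                 # only the reduced dissimilarities are used
    x = rng.normal(size=(n, dim))    # random initial particle positions
    v = np.zeros_like(x)
    for _ in range(steps):
        diff = x[i] - x[j]
        dist = np.linalg.norm(diff, axis=1) + 1e-12
        # Spring force along each pair: negative gradient of the pair's error term
        # (constant factor 2 absorbed into dt).
        f = -((dist - target) / dist)[:, None] * diff
        force = np.zeros_like(x)
        np.add.at(force, i, f)       # accumulate forces on both endpoints
        np.add.at(force, j, -f)
        v = damping * v + dt * force # friction makes the system settle
        x = x + dt * v
    return x


def stress(X, D):
    """Normalized stress over ALL pairs (full matrix used only for evaluation)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sum(np.triu(dist - D, 1) ** 2) / np.sum(np.triu(D, 1) ** 2)
```

Usage: build `D` as pairwise Euclidean distances of some points, call `mds_particles(D)`, and compare `stress(X, D)` with the stress of the random starting layout (`mds_particles(D, steps=0)`). The per-step cost here is proportional to n·k rather than n², which illustrates why a reduced dissimilarity matrix lowers memory and time requirements; the pair-wise force loop is also the kind of regular, data-parallel work that maps well to multi-core CPUs and GPUs.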
Similar resources
Interactive Metric Learning-Based Visual Data Exploration: Application to the Visualization of a Scientific Social Network
Data visualization is a core approach for understanding data specifics and extracting useful information in a simple and intuitive way. Visual data mining proceeds by projecting multidimensional data onto two-dimensional (2D) or three-dimensional (3D) data, e.g., through mathematical optimization and topology preserved in multidimensional scaling (MDS). However, this projection does not necessa...
Using of Two Analyzing Methods Multidimensional Scaling and Hierarchical Cluster for Pattern Recognition via Data Mining
The present study aims at making a comparison between two analyzing methods, multidimensional scaling and hierarchical clustering methods, and, on the other hand, to imply data mining by using these two analyzing methods to classify and discriminate twenty-five samples of technician pieces (Prehistoric goblets) through the study of Engineering shapes and special figures for every sample separately...
Interactive Visualization and Navigation in Large Data Collections using the Hyperbolic Space
We propose the combination of two recently introduced methods for the interactive visual data mining of large collections of data. Both Hyperbolic Multi-Dimensional Scaling (HMDS) and Hyperbolic Self-Organizing Maps (HSOM) employ the extraordinary advantages of the hyperbolic plane (H2): (i) the underlying space grows exponentially with its radius around each point, ideal for embedding high-dim...
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Visualization of large data sets using MDS combined with LVQ
A common task in data mining is the visualization of multivariate objects using various methods, allowing human observers to perceive subtle inter-relations in the dataset. Multidimensional scaling (MDS) is a well-known technique used for this purpose, but due to its computational complexity there are limitations on the number of objects that can be displayed. Combining MDS with a clustering...