Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Authors

  • R. Boostani Electrical & Computer Department, Shiraz University, Shiraz, Iran.
  • Z. Sedighi Electrical & Computer Department, Shiraz University, Shiraz, Iran.
Abstract:

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper proposes an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised one. A Weibull Mixture Model (WMM) is used to estimate the density distribution of the data owing to its high flexibility. Another contribution of this study is a new hill-and-valley seeking algorithm that derives the constraints for the semi-supervised algorithm. It is assumed that each density peak corresponds to a cluster center; therefore, neighboring samples of each center are treated as must-link samples, while near-centroid samples belonging to different clusters are treated as cannot-link samples. The proposed approach is applied to a standard image dataset (designed for clustering evaluation) along with several UCI datasets. The results on both benchmarks demonstrate the superiority of the proposed method over conventional clustering methods.
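The following is a minimal sketch of the constraint-extraction idea described in the abstract, not the authors' implementation. It assumes a Gaussian mixture as a stand-in for the Weibull Mixture Model, uses mixture component means as a proxy for the density peaks found by the hill-and-valley seeking step, and introduces a hypothetical `n_neighbors` parameter.

```python
# Hedged sketch: derive must-link / cannot-link pairs from an estimated density.
# Assumptions (not from the paper): GaussianMixture stands in for the WMM,
# and component means stand in for the density peaks of the hill-and-valley step.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

def extract_constraints(X, n_clusters, n_neighbors=5):
    # 1) Estimate the data density with a mixture model (stand-in for WMM).
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X)
    peaks = gmm.means_                      # proxy for density peaks / centers

    # 2) The samples nearest to each peak are assumed to share that cluster.
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    _, idx = nn.kneighbors(peaks)           # idx[c] = samples closest to peak c

    must_link, cannot_link = [], []
    for c, members in enumerate(idx):
        # must-link: all pairs of samples near the same peak
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                must_link.append((members[i], members[j]))
        # cannot-link: samples near this peak vs. samples near other peaks
        for other in idx[c + 1:]:
            cannot_link.extend((i, j) for i in members for j in other)
    return must_link, cannot_link
```

The resulting constraint pairs could then be passed to a constrained clustering routine such as COP-KMeans (not shown here).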


Similar resources

From Linguistics to Literature: A Linguistic Approach to the Study of Linguistic Deviations in the Turkish Divan of Shahriar

Chapter I provides an overview of structural linguistics and touches upon the Saussurean dichotomies, with the final goal of exploring their relevance to the stylistic study of literature. To provide evidence for the significance of the study, Chapter II deals with the controversial issue of linguistics and literature, and presents opposing views which, at the same time, have been central to t...


Semi-Supervised Clustering with Limited Background Knowledge

In many machine learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate. Consequently, semi-supervised learning, learning from a combination of both labeled and unlabeled data, has become a topic of significant recent interest. Our research focus is on semi-supervised clustering, which uses a small amount of supervised data in the...


Extracting Knowledge from Incomplete Data

Decision-makers often face situations where optimal decisions must be made in the presence of missing information. To facilitate such work, we propose the application of Armstrong's axioms. Keywords: decision support systems, uncertainty management, inference axioms


Semi-supervised Pattern Learning for Extracting Relations from Bioscience Texts

A variety of pattern-based methods have been exploited to extract biological relations from the literature. Many of them require significant domain-specific knowledge to build the patterns by hand, or a large amount of labeled data to learn the patterns automatically. In this paper, a semi-supervised model is presented to combine both unlabeled and labeled data for the pattern learning procedure. F...


A Variational Approach to Semi-Supervised Clustering

We present a variational inference scheme for semi-supervised clustering in which data is supplemented with side information in the form of common labels. There is no mutual exclusion of classes assumption and samples are represented as a combinatorial mixture over multiple clusters. The method has other advantages such as the ability to find the most probable number of soft clusters in the dat...


Semi-supervised incremental clustering of categorical data

Abstract. Semi-supervised clustering combines supervised and unsupervised learning to produce better clusterings. In the initial supervised phase of the algorithm, a training sample is produced by random selection. The examples in the training sample are assumed to be labeled with a class attribute. Then, an incremental algorithm developed for...




Journal title

Volume 6, Issue 2

Pages 287-295

Publication date: 2018-07-01

