Disjunctive Learning with a Soft-Clustering Method

Authors

  • Guillaume Cleuziou
  • Lionel Martin
  • Christel Vrain
Abstract

In concept learning from positive and negative examples, it is rarely possible to find a single discriminating conjunctive rule; in most cases, a disjunctive description is needed. This problem, known as disjunctive learning, is mainly solved by greedy methods that iteratively add rules until all positive examples are covered. Each rule is determined by discriminating properties, where the discriminating power is computed from the learning set. With these methods, each rule defines a subconcept of the concept to be learned. The final set of subconcepts is therefore highly dependent on both the learning set and the learning method. In this paper, we propose a different strategy: we first build clusters of similar examples, thus defining subconcepts, and then we characterize each cluster by a unique conjunctive definition. The clustering method relies on a similarity measure designed for examples described in first-order logic. The main particularity of our clustering method is that it builds “soft clusters”, i.e., it allows some objects to belong to several groups. Once the clusters have been built, we learn first-order rules defining the clusters, using a general-to-specific method: each step consists of adding a literal that covers all examples of a group and rejects as many negative examples as possible. This strategy limits some drawbacks of greedy algorithms and induces a strong reduction of the hypothesis space: for each group (subconcept), the search space is reduced to the set of rules that cover all the examples of the group and reject the negative examples of the concept.
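The general-to-specific step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a propositional simplification in which each example is a set of ground literals (strings), whereas the paper works in first-order logic, and the names `learn_cluster_rule` and `covers` are hypothetical. For one cluster, only literals shared by every example of the cluster are candidates, and each step adds the candidate that rejects the most negative examples still covered.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# Assumption: an example is a set of ground literals (propositional simplification).
Example = FrozenSet[str]


def covers(rule: Set[str], example: Example) -> bool:
    """A conjunctive rule covers an example if every literal of the rule holds in it."""
    return rule <= example


def learn_cluster_rule(
    cluster: Sequence[Example], negatives: Sequence[Example]
) -> Tuple[Set[str], List[Example]]:
    """Build one conjunctive rule for a cluster, from general to specific.

    Returns the rule and the negative examples it still covers (ideally none).
    """
    # Only literals true in every cluster example keep the whole cluster covered.
    candidates = set.intersection(*(set(e) for e in cluster))
    rule: Set[str] = set()            # empty rule = most general hypothesis
    remaining = list(negatives)       # negatives still covered by the rule

    while remaining and candidates:
        # Choose the literal that rejects the most remaining negatives
        # (sorted() makes tie-breaking deterministic).
        best = max(sorted(candidates),
                   key=lambda lit: sum(lit not in n for n in remaining))
        if all(best in n for n in remaining):
            break                     # no candidate rejects anything more
        rule.add(best)
        candidates.discard(best)
        remaining = [n for n in remaining if best in n]

    return rule, remaining


cluster = [frozenset({"red", "round", "small"}),
           frozenset({"red", "round", "large"})]
negatives = [frozenset({"red", "square", "small"}),
             frozenset({"blue", "round", "small"})]
rule, still_covered = learn_cluster_rule(cluster, negatives)
print(sorted(rule), still_covered)
```

Under this simplification, running the function once per soft cluster yields one conjunctive rule per subconcept, and the disjunction of those rules forms the learned description. An example that belongs to two clusters simply contributes to the candidate literals of both.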


Related articles

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

STATIC AND DYNAMIC OPPOSITION-BASED LEARNING FOR COLLIDING BODIES OPTIMIZATION

Opposition-based learning was first introduced as a solution for machine learning; however, it is being extended to other artificial intelligence and soft computing fields including meta-heuristic optimization. It not only utilizes an estimate of a solution but also enters its counter-part information into the search process. The present work applies such an approach to Colliding Bodies Optimiz...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Nonlinear disjunctive kriging for the estimating and modeling of a vein copper deposit

Estimation of mineral resources and reserves with low values of error is essential in mineral exploration. The aim of this study is to estimate and model a vein type deposit using disjunctive kriging method. Disjunctive Kriging (DK) as an appropriate nonlinear estimation method has been used for estimation of Cu values. For estimation of Cu values and modelling of the distributio...

Scaling Author Name Disambiguation with CNF Blocking

An author name disambiguation (AND) algorithm identifies a unique author entity record from all similar or same publication records in scholarly or similar databases. Typically, a clustering method is used that requires calculation of similarities between each possible record pair. However, the total number of pairs grows quadratically with the size of the author database making such clustering...


Publication date: 2003