Diverse reduct subspaces based co-training for partially labeled data

Authors

  • Duoqian Miao
  • Can Gao
  • Nan Zhang
  • Zhifei Zhang
Abstract

Keywords: Rough set theory; Markov blanket; Attribute reduction; Rough co-training; Partially labeled data

Rough set theory is an effective supervised learning model for labeled data. However, practical problems often involve both labeled and unlabeled data, which is outside the realm of traditional rough set theory. In this paper, the problem of attribute reduction for partially labeled data is first studied. With a new definition of the discernibility matrix, a Markov blanket based heuristic algorithm is put forward to compute the optimal reduct of partially labeled data. A novel rough co-training model is then proposed, which capitalizes on the unlabeled data to improve the performance of a rough classifier learned from only a few labeled data. The model employs two diverse reducts of the partially labeled data to train its base classifiers on the labeled data, and then makes the base classifiers learn from each other on the unlabeled data iteratively. The classifiers constructed in the different reduct subspaces benefit from their diversity on the unlabeled data and significantly improve the performance of the rough co-training model. Finally, the rough co-training model is theoretically analyzed, and an upper bound on its performance improvement is given. The experimental results show that the proposed model outperforms other representative models in terms of accuracy and even compares favorably with a rough classifier trained with all training data labeled.
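For intuition, the following is a minimal sketch of the co-training loop described in the abstract. It is not the paper's implementation: the two reducts are assumed to be given as feature index lists (the Markov blanket based reduction step is not shown), a scikit-learn DecisionTreeClassifier stands in for the rough classifier, and the names co_train, reduct1, reduct2, and per_round are illustrative assumptions.

```python
# Sketch of co-training over two reduct subspaces (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def co_train(X_lab, y_lab, X_unlab, reduct1, reduct2, rounds=10, per_round=5):
    """Each classifier is trained on its own reduct subspace, then labels the
    unlabeled examples it is most confident about and hands them to the other
    classifier as extra training data."""
    X1, y1 = X_lab[:, reduct1], y_lab.copy()
    X2, y2 = X_lab[:, reduct2], y_lab.copy()
    U = X_unlab.copy()
    clf1, clf2 = DecisionTreeClassifier(), DecisionTreeClassifier()
    for _ in range(rounds):
        if len(U) == 0:
            break
        clf1.fit(X1, y1)
        clf2.fit(X2, y2)
        # Confidence of each classifier on the unlabeled pool, restricted
        # to its own reduct subspace.
        conf1 = clf1.predict_proba(U[:, reduct1]).max(axis=1)
        conf2 = clf2.predict_proba(U[:, reduct2]).max(axis=1)
        pick1 = np.argsort(-conf1)[:per_round]   # clf1 teaches clf2
        pick2 = np.argsort(-conf2)[:per_round]   # clf2 teaches clf1
        X2 = np.vstack([X2, U[pick1][:, reduct2]])
        y2 = np.concatenate([y2, clf1.predict(U[pick1][:, reduct1])])
        X1 = np.vstack([X1, U[pick2][:, reduct1]])
        y1 = np.concatenate([y1, clf2.predict(U[pick2][:, reduct2])])
        # Remove the pseudo-labeled examples from the unlabeled pool.
        U = np.delete(U, np.union1d(pick1, pick2), axis=0)
    clf1.fit(X1, y1)
    clf2.fit(X2, y2)
    return clf1, clf2
```

At prediction time the two subspace classifiers would typically be combined, for example by averaging their class probabilities; the key point of the approach is that the diversity of the two reducts lets each classifier supply the other with informative pseudo-labeled examples.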


Similar articles

Selection of Relevant and Non-Redundant Feature Subspaces for Co-training

On high dimensional data sets, choosing subspaces randomly, as in the RASCO algorithm (Random Subspace Method for Co-training, Wang et al. 2008), may produce diverse but inaccurate classifiers for co-training. In order to remedy this problem, we introduce two algorithms for selecting relevant and non-redundant feature subspaces for co-training. The first algorithm, relevant random subspaces (Rel-RASCO), p...


A co-training algorithm for multi-view data with applications in data fusion

In several scientific applications, data are generated from two or more diverse sources (views) with the goal of predicting an outcome of interest. Often it is the case that the outcome is not associated with any single view. However, the synergy of all measurements from each view may yield a more predictive classifier. For example, consider a drug discovery application in which individual mole...


Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition

Semi-supervised learning reduces the cost of labeling the training data of a supervised learning algorithm by using unlabeled data together with labeled data to improve performance. Co-training is a popular semi-supervised learning algorithm that requires multiple redundant and independent sets of features (views). In many real-world application domains, this requirement cannot be sa...


Semi-supervised Subspace Co-Projection for Multi-class Heterogeneous Domain Adaptation

Heterogeneous domain adaptation aims to exploit labeled training data from a source domain for learning prediction models in a target domain under the condition that the two domains have different input feature representation spaces. In this paper, we propose a novel semi-supervised subspace co-projection method to address multiclass heterogeneous domain adaptation. The proposed method projects...


Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing

This paper proposes a simple yet effective framework for semi-supervised dependency parsing at the entire-tree level, referred to as ambiguity-aware ensemble training. Instead of only using 1-best parse trees as in previous work, our core idea is to utilize the parse forest (ambiguous labelings) to combine multiple 1-best parse trees generated from diverse parsers on unlabeled data. With a conditional rand...



Journal:
  • Int. J. Approx. Reasoning

Volume 52, Issue -

Pages -

Publication date: 2011