Theoretical Comparisons of Learning from Positive-Negative, Positive-Unlabeled, and Negative-Unlabeled Data
Authors
Abstract
In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. Although N data is missing, PU learning sometimes outperforms PN learning (i.e., ordinary supervised learning). Hitherto, neither theoretical nor experimental analysis has been given to explain this phenomenon. In this paper, we theoretically compare PU (and NU) learning against PN learning based on upper bounds on the estimation errors. We find simple conditions under which PU and NU learning are likely to outperform PN learning, and we prove that, in terms of these upper bounds, either PU or NU learning (depending on the class-prior probability and the sizes of the P and N data) will improve on PN learning given infinite U data. Our theoretical findings agree well with experimental results on artificial and benchmark data, even when the experimental setup does not exactly match the theoretical assumptions.
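As a brief reminder of why training without N data is possible at all, the risk that PN learning estimates from P and N samples can be rewritten so that its negative part involves only P and U data. The sketch below uses generic notation (class prior \pi, positive and negative class-conditional densities p_P and p_N, marginal density p, loss \ell), chosen here for illustration rather than taken verbatim from the paper:

\begin{align*}
R(g) &= \pi\,\mathbb{E}_{x \sim p_{\mathrm{P}}}\!\bigl[\ell(g(x), +1)\bigr]
      + (1-\pi)\,\mathbb{E}_{x \sim p_{\mathrm{N}}}\!\bigl[\ell(g(x), -1)\bigr], \\
(1-\pi)\,\mathbb{E}_{x \sim p_{\mathrm{N}}}\!\bigl[\ell(g(x), -1)\bigr]
     &= \mathbb{E}_{x \sim p}\!\bigl[\ell(g(x), -1)\bigr]
      - \pi\,\mathbb{E}_{x \sim p_{\mathrm{P}}}\!\bigl[\ell(g(x), -1)\bigr],
\end{align*}

where the second line follows from p(x) = \pi p_P(x) + (1-\pi) p_N(x). Replacing the expectations by empirical averages over the P and U samples gives a PU risk estimator; swapping the roles of the two classes gives the NU counterpart, and the estimation-error bounds compared in the paper quantify how the accuracy of each estimator depends on \pi, n_P, n_N, and n_U.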
Similar resources
Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning
In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. Although N data is missing, it sometimes outperforms PN learning (i.e., ordinary supervised learning). Hitherto, neither theoretical nor experimental analysis has been given to explain this phenomenon. In this paper, we theoretically compare PU (and NU) learning against PN learning...
Multi-Graph Learning with Positive and Unlabeled Bags
In this paper, we formulate a new multi-graph learning task with only positive and unlabeled bags, where labels are only available for bags but not for the individual graphs inside each bag. This problem setting raises significant challenges because the bag-of-graph setting has no features to directly represent graph data, and no negative bags exist for deriving discriminative classification mode...
Positive unlabeled learning via wrapper-based adaptive sampling
Learning from positive and unlabeled data frequently occurs in applications where only a subset of positive instances is available while the rest of the data are unlabeled. In such scenarios, the goal is often to create a discriminant model that can accurately classify both positive and negative data by modelling from labeled and unlabeled instances. In this study, we propose an adaptive sampli...
Learning to Rank Biomedical Documents with only Positive and Unlabeled Examples: A Case Study
In the text mining field, obtaining training data requires human experts' labeling effort, which is often time-consuming and expensive. Supervised learning with only a small number of positive examples and a large amount of unlabeled data, which is easy to obtain, has attracted considerable interest in the field. A recently proposed relabeling method, which treats unlabeled data as negative data for...
Cool Blog Classification from Positive and Unlabeled Examples
We address the problem of cool blog classification using only positive and unlabeled examples. We propose an algorithm, called PUB, that exploits the information in unlabeled data together with the positive examples to predict whether unseen blogs are cool or not. The algorithm uses a weighting technique to assign a weight to each unlabeled example, which is assumed to be negative in the t...
Journal: CoRR
Volume: abs/1603.03130
Publication year: 2016