Positive-Unlabeled Learning with Non-Negative Risk Estimator

نویسندگان

  • Ryuichi Kiryo
  • Gang Niu
  • Marthinus Christoffel du Plessis
  • Masashi Sugiyama
چکیده

From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Convex Formulation for Learning from Positive and Unlabeled Data

We discuss binary classification from only positive and unlabeled data (PU classification), which is conceivable in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such a...

متن کامل

On Presentation a new Estimator for Estimating of Population Mean in the Presence of Measurement error and non-Response

Introduction According to the classic sampling theory, errors that are mainly considered in the estimations are sampling errors.  However, most non-sampling errors are more effective than sampling errors in properties of estimators. This has been confirmed by researchers over the past two decades, especially in relation to non-response errors that are one of the most fundamental non-immolation...

متن کامل

Positive-unlabeled learning for disease gene identification

BACKGROUND Identifying disease genes from human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (non-disease gene set does not...

متن کامل

Analysis of Learning from Positive and Unlabeled Data

Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundar...

متن کامل

Estimation of Squared-Loss Mutual Information from Positive and Unlabeled Data

Capturing input-output dependency is an important task in statistical data analysis. Mutual information (MI) is a vital tool for this purpose, but it is known to be sensitive to outliers. To cope with this problem, a squared-loss variant of MI (SMI) was proposed, and its supervised estimator has been developed. On the other hand, in real-world classification problems, it is conceivable that onl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017