β-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Abstract
In recent years, the machine learning community has devoted growing attention to methods for learning from weakly labeled data. This field covers settings such as semi-supervised learning, learning with label proportions, multi-instance learning, and noise-tolerant learning. This paper presents a generic framework for handling these weakly labeled scenarios. We introduce the β-risk, a generalized formulation of the standard empirical risk built on margin-based surrogate loss functions. This risk lets us express how reliable the labels are and derive different kinds of learning algorithms. We focus specifically on SVMs and propose a soft-margin β-SVM algorithm that performs better than the state of the art.
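The abstract does not state the β-risk formula, so the following is only a minimal sketch under an explicit assumption: each example i carries confidences β_i^{+1} and β_i^{-1} = 1 − β_i^{+1} in its two candidate labels, and the risk averages the margin-based surrogate loss under both labels, weighted by those confidences. The hinge loss stands in for the surrogate. The names `hinge`, `beta_risk`, `objective`, `fit_linear_beta` and `beta_pos` are hypothetical, and `fit_linear_beta` is plain subgradient descent on a regularized β-weighted hinge risk for a linear model, not the paper's soft-margin β-SVM.

```python
import numpy as np


def hinge(margin):
    """Margin-based surrogate loss F(z) = max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - margin)


def beta_risk(scores, beta_pos, loss=hinge):
    """Hypothetical beta-weighted empirical risk (assumed form, see lead-in).

    scores:   h(x_i) for each example, shape (m,)
    beta_pos: assumed confidence that example i has label +1, shape (m,);
              the confidence in label -1 is taken to be 1 - beta_pos.
    """
    scores = np.asarray(scores, dtype=float)
    beta_pos = np.asarray(beta_pos, dtype=float)
    # Each example pays the loss under both candidate labels,
    # weighted by how much we trust each label.
    per_example = beta_pos * loss(scores) + (1.0 - beta_pos) * loss(-scores)
    return per_example.mean()


def objective(X, beta_pos, w, b, lam):
    """L2-regularized beta-weighted hinge risk of a linear model x -> w.x + b."""
    return 0.5 * lam * (w @ w) + beta_risk(X @ w + b, beta_pos)


def fit_linear_beta(X, beta_pos, lam=0.1, lr=0.1, epochs=200):
    """Subgradient descent on objective(); an illustrative trainer only."""
    m, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        s = X @ w + b
        # d/ds of beta*hinge(s) + (1 - beta)*hinge(-s), one subgradient per example.
        g = beta_pos * np.where(s < 1.0, -1.0, 0.0) + (1.0 - beta_pos) * np.where(
            s > -1.0, 1.0, 0.0
        )
        w -= lr * (lam * w + X.T @ g / m)
        b -= lr * g.mean()
    return w, b
```

Under this assumed form, setting `beta_pos = (y + 1) / 2` for fully trusted labels y ∈ {−1, +1} reduces `beta_risk` to the ordinary empirical hinge risk, while `beta_pos = 0.5` encodes a point whose label is completely unknown, as in semi-supervised learning. Intermediate values express partial trust, for example in a noisy-label setting.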
Related articles
Surrogate Losses in Passive and Active Learning by Steve Hanneke
Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the 0-1 loss, ideally using fewer label requests than the number of random labeled data points sufficie...
Weakly supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine?
Motivation: Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the methods, results or conclusions of the study in question. Several approaches have been developed to identify such information in scientific journal articles. The best of these have yielded promising results and proved useful for biomedical text mini...
An Ecological Study of the Association between Opiate Use and Incidence of Cancers
Background: Cancer is the second leading cause of death after cardiovascular disease. In recent years it has been hypothesized that opiate use could be a risk factor for cancer. This study aimed to evaluate a possible association between opiate use and common cancers using ecological statistics from around the world. Methods: To investigate the association we used ordinary linear regression mode...