Confidence-Rated Discriminative Partial Label Learning
Abstract
Partial label learning aims to induce a multi-class classifier from training examples, each of which is associated with a set of candidate labels, among which only one is valid. The common discriminative solution to learning from partial label examples assumes one parametric model for each class label, whose predictions are aggregated to optimize specific objectives such as likelihood or margin over the training examples. Nonetheless, existing discriminative approaches treat the predictions from all parametric models equally, without differentiating the confidence of each candidate label being the ground-truth label. In this paper, a boosting-style partial label learning approach is proposed to enable confidence-rated discriminative modeling. Specifically, the ground-truth confidence of each candidate label is maintained in each boosting round and utilized to train the base classifier. Extensive experiments on artificial as well as real-world partial label data sets validate the effectiveness of the confidence-rated discriminative modeling.
Introduction
Partial label learning deals with the problem where each training example is associated with a set of candidate labels, among which only one corresponds to the ground-truth label (Cour, Sapp, and Taskar, 2011; Zhang, 2014). Formally, let X = R^d denote the d-dimensional instance space and Y = {y1, y2, ..., yq} denote the label space consisting of q class labels. The task of partial label learning is to induce a multi-class classifier f : X → Y from the partial label training set D = {(xi, Si) | 1 ≤ i ≤ m}. Here, xi ∈ X is a d-dimensional feature vector and Si ⊆ Y is the set of candidate labels associated with xi. In particular, the ground-truth label yi for xi is confined within Si but is not directly accessible to the learning algorithm.
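To make the formal setting concrete, the following is a minimal sketch of how a partial label training set D = {(xi, Si)} might be represented. The names `PartialLabelExample`, `validate`, and `candidate_mask` are hypothetical helpers introduced only for illustration, and labels are assumed to be encoded as integer indices 0, ..., q-1 into Y.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class PartialLabelExample:
    """One training pair (x_i, S_i); the ground-truth label is not stored."""
    x: Sequence[float]          # feature vector x_i in R^d
    candidates: FrozenSet[int]  # candidate label set S_i (assumed integer-coded)


def validate(dataset: List[PartialLabelExample], d: int, q: int) -> None:
    """Check that each x_i has d features and each S_i is a non-empty subset of Y."""
    label_space = set(range(q))
    for i, ex in enumerate(dataset):
        if len(ex.x) != d:
            raise ValueError(f"example {i}: expected {d} features, got {len(ex.x)}")
        if not ex.candidates or not ex.candidates <= label_space:
            raise ValueError(f"example {i}: invalid candidate set {set(ex.candidates)}")


def candidate_mask(ex: PartialLabelExample, q: int) -> List[int]:
    """Encode S_i as a 0/1 indicator vector of length q."""
    return [1 if j in ex.candidates else 0 for j in range(q)]


# Toy data set with d = 2 features and q = 3 class labels.
toy = [
    PartialLabelExample((0.2, 1.5), frozenset({0, 2})),  # ambiguous: label 0 or 2
    PartialLabelExample((1.1, -0.3), frozenset({1})),    # unambiguous example
]
validate(toy, d=2, q=3)
```

An example whose candidate set has size one reduces to an ordinary supervised example, which is why the same container can hold both ambiguous and fully labeled data.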
The need for partial label learning arises in a number of real-world scenarios where only weak labeling information can be acquired during training data collection, such as automatic face naming (Cour et al., 2009; Zeng et al., 2013), web mining (Jie and Orabona, 2010), and ecoinformatics (Liu and Dietterich, 2012). In some literature, partial label learning is also termed ambiguous label learning (Hüllermeier and Beringer, 2006; Chen et al., 2014) or superset label learning (Liu and Dietterich, 2014).
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
To learn from partial label examples, the common discriminative solution is to assume one parametric model g(yj | x;θ) for each class label yj, whose modeling outputs are aggregated to optimize specific objectives such as likelihood or margin over the training examples (Jin and Ghahramani, 2003; Nguyen and Caruana, 2008; Cour, Sapp, and Taskar, 2011; Liu and Dietterich, 2012; Chen et al., 2014; Yu and Zhang, 2016). Existing discriminative approaches conduct aggregation by treating the modeling outputs from all parametric models equally, without differentiating the confidence of each candidate label being the ground-truth label. This strategy might be suboptimal, as each candidate label should contribute differently to the learning process, especially the contribution from the ground-truth label (i.e. yi) against those from the false positive labels (i.e. Si \ {yi}) (Zhang, Zhou, and Liu, 2016).
To overcome this potential drawback of the existing strategy, a novel partial label learning approach named CORD, i.e. COnfidence-Rated Discriminative partial label learning, is proposed in this paper. CORD learns from partial label examples by adapting popular boosting techniques, where the weights over training examples and the ground-truth confidences of candidate labels are maintained in each boosting round.
Accordingly, the discriminative base classifier is trained by utilizing the currently available weight and ground-truth confidence information. Empirical studies on a broad range of controlled UCI data sets and real-world partial label data sets clearly verify the effectiveness of the proposed confidence-rated discriminative learning approach.
We start the rest of this paper by briefly reviewing related work on partial label learning. Then, we present technical details of the proposed CORD approach and report experimental results of the comparative studies. Finally, we conclude the paper and indicate future research issues.
Related Work
In partial label learning, the labeling information conveyed by the training examples is weak, as the ground-truth label is not accessible to the learning algorithm. It is worth noting that partial label learning is related to other well-studied weakly-supervised learning frameworks, including semi-supervised learning (Zhu and Goldberg, 2009), multi-instance learning (Amores, 2013) and multi-label learning (Zhang and Zhou, 2014), although the weak supervision scenarios they deal with are different. Semi-supervised learning aims to induce a classifier f : X → Y from a few labeled examples along with abundant unlabeled examples, where the ground-truth label of an unlabeled example may be any label in the whole label space, whereas that of a partial label example is confined to its candidate label set. Multi-instance learning aims to induce a classifier f : 2^X → Y from training examples each represented by a bag of instances, where the label is assigned at the bag level for a multi-instance example but at the instance level for a partial label example. Multi-label learning aims to induce a classifier f : X → 2^Y from examples each associated with multiple labels, where the associated labels are all valid for a multi-label example but are only candidates for a partial label example.
Discriminative modeling is the most common solution to learn from partial label examples, where one parametric model g(yj | x;θ) is assumed for each class label yj (1 ≤ j ≤ q). Correspondingly, model parameters are trained by optimizing specific objectives J(D;θ) over the training examples. One popular instantiation of the objective function is to aggregate the modeling output of each parametric model via the maximum likelihood criterion (Jin and Ghahramani, 2003; Liu and Dietterich, 2012):
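The formula that follows this sentence is not reproduced in this excerpt. As a rough illustration only, a commonly used form of likelihood-based aggregation sums the model outputs over each candidate set and maximizes the sum over all examples of log Σ_{yj ∈ Si} g(yj | xi;θ). The sketch below evaluates that quantity under the assumption that g is a linear softmax model with one weight vector per class; the softmax parameterization and the function names are assumptions made for the example, not details taken from the paper.

```python
import math
from typing import List, Sequence, Set


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def model_outputs(theta: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """g(y_j | x; theta) for every class j, assuming a linear softmax model.

    theta[j] is the (assumed) weight vector of the model for class y_j.
    """
    scores = [sum(w * v for w, v in zip(theta_j, x)) for theta_j in theta]
    return softmax(scores)


def partial_label_log_likelihood(
    theta: Sequence[Sequence[float]],
    xs: Sequence[Sequence[float]],
    candidate_sets: Sequence[Set[int]],
) -> float:
    """Sum over examples of log( sum of g(y_j | x_i; theta) over y_j in S_i )."""
    total = 0.0
    for x, cands in zip(xs, candidate_sets):
        g = model_outputs(theta, x)
        # Every candidate label contributes equally to the aggregate.
        total += math.log(sum(g[j] for j in cands))
    return total


# With all-zero weights the model is uniform over q = 3 classes, so a
# candidate set of size 2 contributes log(2/3) to the objective.
theta0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
value = partial_label_log_likelihood(theta0, [(0.2, 1.5)], [{0, 2}])
```

Because the inner sum weights every candidate label identically, this objective cannot distinguish the ground-truth label from the false positive labels within Si, which is the limitation the confidence-rated approach is motivated by.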