Evidence-Based Uncertainty Sampling for Active Learning
Data Min Knowl Disc
Abstract
Active learning methods select informative instances to effectively learn a suitable classifier. Uncertainty sampling, a frequently used active learning strategy, selects instances about which the model is uncertain, but it does not consider why the model is uncertain. In this article, we present an evidence-based framework that can uncover the reasons a model is uncertain about a given instance. Using this framework, we discuss two reasons for a model's uncertainty: a model can be uncertain about an instance because it has strong but conflicting evidence for both classes, or because it does not have enough evidence for either class. Our empirical evaluations on several real-world datasets show that distinguishing between these two types of uncertainty has a drastic impact on learning efficiency. We further provide empirical and analytical justifications for why distinguishing between the two types of uncertainty matters.
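The distinction between the two kinds of uncertainty can be illustrated with a small sketch. The abstract does not say how evidence is computed, so everything here is an assumption made for illustration: a linear binary classifier, per-feature contributions `w_j * x_j` treated as evidence (positive contributions for one class, negative for the other), and the product of the two evidence totals used to rank the most uncertain instances. The names `evidence` and `pick_uncertain` are hypothetical.

```python
import numpy as np

def evidence(weights, x):
    """Split a linear score w.x into evidence for each class (assumed definition)."""
    contrib = weights * x
    pos = contrib[contrib > 0].sum()    # total evidence for the positive class
    neg = -contrib[contrib < 0].sum()   # total evidence for the negative class
    return pos, neg

def pick_uncertain(weights, bias, pool, top_k=2, mode="conflicting"):
    """Among the top_k most uncertain pool instances, pick one by evidence strength.

    mode="conflicting":  strong evidence for both classes (largest pos * neg)
    mode="insufficient": little evidence for either class (smallest pos * neg)
    """
    scores = pool @ weights + bias
    # Most uncertain = closest to the decision boundary (score near 0).
    candidates = np.argsort(np.abs(scores))[:top_k]
    strength = np.array([np.prod(evidence(weights, pool[i])) for i in candidates])
    j = strength.argmax() if mode == "conflicting" else strength.argmin()
    return int(candidates[j])

# Toy pool: rows 0 and 1 both score exactly 0 (maximally uncertain),
# but for different reasons; row 2 is a confident prediction.
weights = np.array([1.0, -1.0, 0.5])
bias = 0.0
pool = np.array([
    [3.0, 3.0, 0.0],   # strong evidence for both classes
    [0.1, 0.1, 0.0],   # weak evidence for both classes
    [2.0, 0.0, 1.0],   # clearly positive
])

conflicting_idx = pick_uncertain(weights, bias, pool, mode="conflicting")
insufficient_idx = pick_uncertain(weights, bias, pool, mode="insufficient")
print(conflicting_idx, insufficient_idx)
```

Plain uncertainty sampling would treat rows 0 and 1 as equally good queries, since both lie on the decision boundary; the evidence split is what separates them in this sketch.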
Related resources
Multi-Criteria-Based Strategy to Stop Active Learning for Data Annotation
In this paper, we address the issue of deciding when to stop active learning for building a labeled training corpus. First, this paper presents a new stopping criterion, classification-change, which considers the potential of each unlabeled example to change the decision boundaries. Second, a multi-criteria-based combination strategy is proposed to solve the problem of predefining an a...
Active Learning based on Random Forest and Its Application to Terrain Classification
In the machine learning literature, many supervised algorithms have been proposed to perform pattern classification tasks. However, in many pattern recognition tasks labels are expensive to obtain, while vast amounts of unlabeled data are readily available. Moreover, redundant samples are often included in the training set, slowing down the training of the classifier without improving cl...
Paired Sampling in Density-Sensitive Active Learning
Active learning consists of principled on-line sampling over unlabeled data to optimize supervised learning rates as a function of the number of labels requested from an external oracle. A new sampling technique for active learning is developed based on two key principles: 1) Balanced sampling on both sides of the decision boundary is more effective than sampling one side disproportionately, an...
Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification
This paper addresses two issues of active learning. First, to address the problem that uncertainty sampling often fails by selecting outliers, this paper presents a new selective sampling technique, sampling by uncertainty and density (SUD), in which a k-Nearest-Neighbor-based density measure is adopted to determine whether an unlabeled example is an outlier. Second, a technique of sampli...
Comparing the Influence of Three Educational Methods on the Epidemiology of Occupational Diseases' learning Qualities
Background: Teaching epidemiology of occupational diseases is an important course for occupational health students. If these courses are taught with problem based learning or other new educational methods they will be more beneficial. The objective of this study was the determination of the effects of three educational methods on learning of epidemiology of occupational diseases. Methods: This ...