Active Learning from Crowds
Authors
Abstract
Obtaining labels can be expensive or time-consuming, but unlabeled data is often abundant and easier to obtain. Most learning tasks can be made more efficient, in terms of labeling cost, by intelligently choosing specific unlabeled instances to be labeled by an oracle. The general problem of optimally choosing these instances is known as active learning. As it is usually set in the context of supervised learning, active learning relies on a single oracle playing the role of a teacher. We focus on the multiple-annotator scenario where an oracle who knows the ground truth no longer exists; instead, multiple labelers, with varying expertise, are available for querying. This paradigm poses new challenges for active learning: we can now ask both which data sample should be labeled next and which annotator should be queried to benefit our learning model the most. In this paper, we employ a probabilistic model for learning from multiple annotators that can also learn annotator expertise even when that expertise is not consistently accurate across the task domain. We then provide a criterion and formulation for selecting both a sample and the annotator(s) to query for its labels.
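The joint selection idea described above can be illustrated with a toy sketch. This is not the paper's probabilistic model: the annotators, their error rates, the majority-vote expertise estimate, and the distance-to-boundary uncertainty proxy below are all illustrative assumptions, standing in for the model-based expertise and selection criteria the paper actually derives.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy setup (all values here are illustrative assumptions) ---
# Instances are points in [-1, 1]; the true label is 1 iff x > 0.
X_pool = rng.uniform(-1.0, 1.0, size=300)
y_true = (X_pool > 0).astype(int)

# Three simulated annotators with different (hidden) error rates.
error_rates = np.array([0.05, 0.25, 0.45])

def annotate(a, idx):
    """Annotator a labels instance idx, flipping the truth w.p. error_rates[a]."""
    flip = rng.random() < error_rates[a]
    return int(y_true[idx]) ^ int(flip)

# --- Step 1: estimate annotator expertise without ground truth ---
# Ask every annotator to label a small seed set, then score each one by
# agreement with the majority vote -- a crude stand-in for a learned
# probabilistic expertise model.
seed = rng.choice(len(X_pool), size=100, replace=False)
labels = np.array([[annotate(a, i) for i in seed] for a in range(3)])
majority = (labels.sum(axis=0) >= 2).astype(int)
reliability = (labels == majority).mean(axis=1)

# --- Step 2: joint sample/annotator selection ---
# Uncertainty proxy: distance to the decision boundary x = 0
# (closer means less certain). Query the most uncertain instance and
# route it to the annotator we currently trust most.
next_instance = int(np.argmin(np.abs(X_pool)))
next_annotator = int(np.argmax(reliability))
```

The point of the sketch is the two coupled decisions: the instance is chosen by model uncertainty, while the annotator is chosen by estimated expertise, so a noisy labeler is consulted less even though no ground truth was ever observed.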
Similar Papers
Active Learning from Crowds with Unsure Option
Learning from crowds, where the labels of data instances are collected via crowdsourcing, has attracted much attention during the past few years. In contrast to a typical crowdsourcing setting, where all data instances are assigned to annotators for labeling, active learning from crowds actively selects a subset of data instances and assigns them to the annotators, thereby reducing the c...
Multi-Label Active Learning from Crowds
Multi-label active learning reduces labeling cost by optimally choosing the most valuable instance and querying its label from an oracle. In this paper, we consider pool-based multi-label active learning under the crowdsourcing setting, where during the active query process, instead of resorting to a high-cost oracle for the ground truth, multiple low-cost imperfect annotator...
Minimizing Queries for Active Labeling with Sequential Analysis
When building datasets for supervised machine learning problems, data is often labelled manually by human annotators. In domains like medical imaging, acquiring labels can be prohibitively expensive. Both active learning and crowdsourcing have emerged as ways to frugally label datasets. In active learning, there has been recent interest in algorithms that exploit the data’s structure to direct ...
Training Agents by Crowds
On-line learning algorithms are particularly suitable for developing interactive computational agents. These algorithms can be used to teach agents the abilities needed to engage in social interactions with humans. If humans are used as teachers in the context of on-line learning algorithms, a serious challenge arises: their lack of commitment and availability during the required extensive...
Clustering Crowds
We present a clustered personal classifier method (CPC method) that jointly estimates a classifier and clusters of workers in order to address the learning-from-crowds problem. Crowdsourcing allows us to create a large but low-quality data set at very low cost. The learning-from-crowds problem is to learn a classifier from such a low-quality data set. From some observations, we notice that worke...