Error Rate Bounds in Crowdsourcing Models

Authors

  • Hongwei Li
  • Bin Yu
  • Dengyong Zhou
Abstract

Crowdsourcing is an effective tool for human-powered computation on many tasks that are challenging for computers. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules under the Dawid-Skene crowdsourcing model. The bounds can be applied to analyze many common prediction methods, including majority voting and weighted majority voting. These bounds could be useful for controlling the error rate and designing better algorithms. We show that the oracle Maximum A Posteriori (MAP) rule approximately optimizes our upper bound on the mean error rate for any hyperplane binary labeling rule, and we propose a simple data-driven weighted majority voting (WMV) rule (called one-step WMV) that attempts to approximate the oracle MAP and has a provable theoretical guarantee on the error rate. Moreover, we use simulated and real data to demonstrate that the data-driven EM-MAP rule is a good approximation to the oracle MAP rule, and that the mean error rate of the data-driven EM-MAP rule is also bounded by the mean error rate bound of the oracle MAP rule with estimated parameters plugged into the bound.

1. Introduction

There are many tasks that can be easily carried out by people but tend to be hard for computers, e.g. image annotation and visual design. When these tasks require large-scale data processing, outsourcing them to experts or well-trained people may be too expensive. Crowdsourcing has recently emerged as a powerful alternative. It outsources tasks to a distributed group of people (usually called workers) who might be inexperienced in these tasks. However, if we can appropriately aggregate the outputs from a crowd, the aggregated results could be as good as the results by an expert.

The flaws of crowdsourcing are apparent. Each worker is paid purely based on how many tasks he/she has completed (for example, one cent for labeling one image). No ground truth is available to evaluate how well he/she has performed in the tasks. So some workers may randomly submit answers independent of the questions when the tasks assigned to them...
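The aggregation rules named in the abstract, majority voting and weighted majority voting, can both be viewed as hyperplane rules on labels in {-1, +1}. The following is a minimal sketch under stated assumptions, not the paper's one-step WMV or EM-MAP procedure: the function names, the example accuracies, and the log-odds weights log(p / (1 - p)) are illustrative choices. The log-odds form is the MAP weighting for a symmetric (one-coin) worker model with a balanced prior on the true label, and it assumes the accuracies are known.

```python
import math

def majority_vote(labels):
    """Unweighted vote over labels in {-1, +1}; ties are broken toward +1."""
    return 1 if sum(labels) >= 0 else -1

def weighted_majority_vote(labels, weights, offset=0.0):
    """Hyperplane rule: sign of the weighted label sum minus an offset."""
    score = sum(w * z for w, z in zip(weights, labels)) - offset
    return 1 if score >= 0 else -1

def log_odds_weights(accuracies):
    """Assumed weighting: log-odds of each worker's accuracy p in (0, 1)."""
    return [math.log(p / (1.0 - p)) for p in accuracies]

# Hypothetical example: one reliable worker disagrees with two weak ones.
accuracies = [0.9, 0.6, 0.6]
labels = [-1, 1, 1]

mv_label = majority_vote(labels)
wmv_label = weighted_majority_vote(labels, log_odds_weights(accuracies))
print(mv_label, wmv_label)
```

In this hypothetical item the two rules disagree: the unweighted vote follows the two weaker workers, while the weighted vote follows the single more reliable one.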

Similar resources

Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing

Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to aggregate the labels in order to obtain results of high quality. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectat...

Error Rate Analysis of Labeling by Crowdsourcing

Crowdsourcing label generation has been a crucial component for many real-world machine learning applications. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules for the Dawid-Skene (and Symmetric Dawid-Skene) crowdsourcing model. The bounds can be applied to analyze many commonly used prediction m...

Theoretical Analysis and Efficient Algorithms for Crowdsourcing

Theoretical Analysis and Efficient Algorithms for Crowdsourcing, by Hongwei Li. Doctor of Philosophy in Statistics, University of California, Berkeley. Professor Bin Yu, Chair. Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to a...

Exact Exponent in Optimal Rates for Crowdsourcing

In many machine learning applications, crowdsourcing has become the primary means for label collection. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent mI(π) in which m is the number of workers and I(π) the average Chernoff infor...

Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model

While crowdsourcing has become an important means to label data, crowdworkers are not always experts; sometimes they can even be adversarial. Therefore, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoret...

Journal:
  • CoRR

Volume: abs/1307.2674

Publication date: 2013