Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons

Authors

James Y. Zou

Kamalika Chaudhuri

Adam Tauman Kalai

Abstract

We introduce an unsupervised approach to efficiently discover the underlying features in a data set via crowdsourcing. Our queries ask crowd members to articulate a feature common to two out of three displayed examples. In addition, we ask the crowd to provide binary labels for these discovered features on the remaining examples. The triples are chosen adaptively based on the labels of the previously discovered features on the data set. This approach is motivated by a formal framework of feature elicitation that we introduce and analyze in this paper. In two natural models of features, hierarchical and independent, we show that a simple adaptive algorithm recovers all features with less labor than any nonadaptive algorithm. The savings come from automatically avoiding the elicitation of redundant features or synonyms. Experimental results validate the theoretical findings and the usefulness of this approach.
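The adaptive loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not say how triples are chosen, so the sketch assumes that a triple is picked only among examples whose labels agree on every feature discovered so far, and it replaces the crowd with a hypothetical `simulated_crowd` function that reads from a hidden ground-truth table. The names `discover_features`, `EXAMPLES`, and `HIDDEN` are likewise invented for the example.

```python
from itertools import combinations


def simulated_crowd(triple, hidden):
    """Hypothetical stand-in for a crowd worker.

    Returns the name of a hidden feature held by exactly two of the
    three examples, or None if no such feature exists.
    """
    for name in sorted(hidden):
        if sum(ex in hidden[name] for ex in triple) == 2:
            return name
    return None


def discover_features(examples, hidden):
    """Elicit features one at a time from adaptively chosen triples."""
    discovered = {}  # feature name -> {example: binary label}
    while True:
        # Group examples by their labels on the features found so far.
        groups = {}
        for ex in examples:
            signature = tuple(labels[ex] for labels in discovered.values())
            groups.setdefault(signature, []).append(ex)

        # Assumed selection rule: query only triples that no discovered
        # feature separates, so any answer must be a new distinction.
        new_feature = None
        for members in groups.values():
            for triple in combinations(members, 3):
                new_feature = simulated_crowd(triple, hidden)
                if new_feature is not None:
                    break
            if new_feature is not None:
                break

        if new_feature is None:
            return discovered  # no remaining triple yields a feature

        # Simulated labeling step: binary label of every example.
        discovered[new_feature] = {
            ex: ex in hidden[new_feature] for ex in examples
        }


# Toy data (made up for illustration). "has feathers" is a synonym of "bird".
EXAMPLES = ["sparrow", "eagle", "penguin", "ostrich", "salmon", "shark", "pike"]
HIDDEN = {
    "bird": {"sparrow", "eagle", "penguin", "ostrich"},
    "has feathers": {"sparrow", "eagle", "penguin", "ostrich"},
    "flies": {"sparrow", "eagle"},
    "predator": {"eagle", "shark", "pike"},
}

found = discover_features(EXAMPLES, HIDDEN)
print(list(found))  # ['flies', 'bird', 'predator']
```

Under this assumed selection rule, the synonym "has feathers" is never elicited on the toy data: once "bird" has been labeled, every queried triple agrees on it, so no feature with the same labels can be the answer. The sketch makes no claim to recover all features in general.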


Similar resources

Adaptively Learning the Crowd Kernel

We introduce an algorithm that, given n objects, learns a similarity matrix over all n pairs, from crowdsourced data alone. The algorithm samples responses to adaptively chosen triplet-based relative-similarity queries. Each query has the form “is object a more similar to b or to c?” and is chosen to be maximally informative given the preceding responses. The output is an embedding of the objec...


Perform Three Data Mining Tasks with Crowdsourcing Process

In data mining studies, performing feature selection by hand is complex, so some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems that lack sufficient knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...


Learning to Top-K Search using Pairwise Comparisons

Given a collection of N items with some unknown underlying ranking, we examine how to use pairwise comparisons to determine the top ranked items in the set. Resolving the top items from pairwise comparisons has application in diverse fields ranging from recommender systems to image-based search to protein structure analysis. In this paper we introduce techniques to resolve the top ranked items ...


Crowdstore: A Crowdsourcing Graph Database

Existing crowdsourcing database systems fail to support complex, collaborative or responsive crowd work. These systems implement human computation as independent tasks that are published online and subsequently chosen by individual workers. Such a pull model does not support worker collaboration, and its expertise matching relies on workers’ subjective self-assessment. An extension to graph query language...


Automated feature discovery via sentence selection and source code summarization

Programs are, in essence, a collection of implemented features. Feature Discovery in software engineering is the task of identifying key functionalities that a program implements. Manual feature discovery can be time-consuming and expensive, leading to automatic feature discovery tools being developed. However, these approaches typically only describe features using lists of keywords, which can...



Publication date: 2015
