Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons
Abstract
We introduce an unsupervised approach to efficiently discover the underlying features in a data set via crowdsourcing. Our queries ask crowd members to articulate a feature common to two out of three displayed examples. In addition, we ask the crowd to provide binary labels for these discovered features on the remaining examples. The triples are chosen adaptively based on the labels of the previously discovered features on the data set. This approach is motivated by a formal framework of feature elicitation that we introduce and analyze in this paper. In two natural models of features, hierarchical and independent, we show that a simple adaptive algorithm recovers all features with less labor than any nonadaptive algorithm. The savings come from automatically avoiding the elicitation of redundant features or synonyms. Experimental results validate the theoretical findings and the usefulness of this approach.
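The abstract does not spell out the rule for choosing triples. The sketch below is a minimal Python simulation under one assumed rule, not the authors' algorithm: a triple is queried only if no already-discovered feature is shared by exactly two of its three examples, because otherwise a worker could answer with that feature (or a synonym of it) and the query would be wasted. The crowd is replaced by a hypothetical oracle over a hidden feature table, and the follow-up labeling task is assumed to be noiseless. The names `discover`, `is_informative`, `crowd_answer`, and `hidden` are illustrative only.

```python
import itertools
import random
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical ground truth: each hidden feature maps a name to the set of
# example indices on which it is true. In a real study these are unknown.
Features = Dict[str, FrozenSet[int]]
Triple = Tuple[int, int, int]


def count_in(triple: Triple, members: FrozenSet[int]) -> int:
    """Number of the three examples that have the feature."""
    return sum(i in members for i in triple)


def crowd_answer(triple: Triple, hidden: Features,
                 rng: random.Random) -> Optional[str]:
    """Simulated worker: name any feature shared by exactly two of the three."""
    candidates = sorted(name for name, members in hidden.items()
                        if count_in(triple, members) == 2)
    return rng.choice(candidates) if candidates else None


def is_informative(triple: Triple, known: Features) -> bool:
    """Assumed adaptive rule: skip a triple if some already-discovered feature
    is shared by exactly two of its examples, since the worker could simply
    repeat that feature (or a synonym with the same labels)."""
    return all(count_in(triple, members) != 2 for members in known.values())


def discover(n: int, hidden: Features, adaptive: bool,
             seed: int = 0) -> Tuple[Features, int, int]:
    """One pass over all triples of n examples in random order.

    Returns (discovered features, queries issued, redundant answers)."""
    rng = random.Random(seed)
    triples = list(itertools.combinations(range(n), 3))
    rng.shuffle(triples)
    known: Features = {}
    queries = redundant = 0
    for triple in triples:
        if adaptive and not is_informative(triple, known):
            continue  # cannot yield a new feature under the assumed rule
        queries += 1
        name = crowd_answer(triple, hidden, rng)
        if name is None:
            continue  # no hidden feature splits this triple two-versus-one
        if name in known:
            redundant += 1
        else:
            # Second crowd task (assumed noiseless here): binary labels for
            # the new feature on every example.
            known[name] = hidden[name]
    return known, queries, redundant
```

With the hypothetical table `hidden = {"f": frozenset({0, 1})}` and `n = 4`, the nonadaptive pass issues all 4 queries and receives `f` twice, while the adaptive pass skips the second triple containing both 0 and 1 and issues 3. Adding a synonym such as `"f_synonym": frozenset({0, 1})` leaves the adaptive count at 3, and only one of the two names is ever elicited.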
Similar resources
Adaptively Learning the Crowd Kernel
We introduce an algorithm that, given n objects, learns a similarity matrix over all n pairs, from crowdsourced data alone. The algorithm samples responses to adaptively chosen triplet-based relative-similarity queries. Each query has the form “is object a more similar to b or to c?” and is chosen to be maximally informative given the preceding responses. The output is an embedding of the objec...
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because performing the feature selection process by hand is complex, we need to outsource some of the labeling to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without sufficient knowledge of the users' age or geographic location. Uncertainty about the performance of virtual user...
Learning to Top-K Search using Pairwise Comparisons
Given a collection of N items with some unknown underlying ranking, we examine how to use pairwise comparisons to determine the top ranked items in the set. Resolving the top items from pairwise comparisons has application in diverse fields ranging from recommender systems to image-based search to protein structure analysis. In this paper we introduce techniques to resolve the top ranked items ...
Crowdstore: A Crowdsourcing Graph Database
Existing crowdsourcing database systems fail to support complex, collaborative or responsive crowd work. These systems implement human computation as independent tasks published online, and subsequently chosen by individual workers. Such a pull model does not support worker collaboration and its expertise matching relies on workers’ subjective self-assessment. An extension to graph query language...
Automated feature discovery via sentence selection and source code summarization
Programs are, in essence, a collection of implemented features. Feature Discovery in software engineering is the task of identifying key functionalities that a program implements. Manual feature discovery can be time-consuming and expensive, leading to automatic feature discovery tools being developed. However, these approaches typically only describe features using lists of keywords, which can...