Crowdsourcing Participatory Evaluation of Medical Pictograms Using Amazon Mechanical Turk
Abstract
BACKGROUND: Consumer and patient participation has proved to be an effective approach to medical pictogram design, but it can be costly and time-consuming. We proposed and evaluated an inexpensive alternative that crowdsources the pictogram evaluation task to Amazon Mechanical Turk (MTurk) workers, usually referred to as "turkers".
OBJECTIVE: To answer two research questions: (1) Is the turkers' collective effort effective for identifying design problems in medical pictograms? and (2) Do the turkers' demographic characteristics affect their performance in medical pictogram comprehension?
METHODS: We designed a Web-based survey (open-ended testing) that asked 100 US turkers to type in their guesses of the meaning of 20 US Pharmacopeial pictograms. Two judges independently coded the turkers' guesses into four categories: correct, partially correct, wrong, and completely wrong. The comprehensibility of a pictogram was measured as the percentage of correct guesses, with each partially correct guess counted as 0.5 correct. We then conducted a content analysis of the turkers' interpretations to identify misunderstandings and to assess whether those misunderstandings were common. We also conducted a statistical analysis to examine the relationship between the turkers' demographic characteristics and their pictogram comprehension performance.
RESULTS: The survey was completed within 3 days of our posting the task on MTurk, and the collected data are available for download in the multimedia appendix. The comprehensibility of the 20 tested pictograms ranged from 45% to 98%, with an average of 72.5%. The comprehensibility scores of 10 pictograms were strongly correlated with the scores reported for the same pictograms in another study that used oral-response open-ended testing with local participants. The turkers' misinterpretations shared common errors that exposed design problems in the pictograms. Participant performance was positively correlated with educational level.
CONCLUSIONS: The results confirm that crowdsourcing can serve as an effective and inexpensive approach to participatory evaluation of medical pictograms. Through Web-based open-ended testing, the crowd can effectively identify problems in pictogram designs. The results also confirm that education has a significant effect on the comprehension of medical pictograms. Because low-literate people are underrepresented in the turker population, further investigation is needed to examine to what extent turkers' misunderstandings overlap with those elicited from low-literate people.
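To illustrate the scoring described in METHODS, the following is a minimal sketch (not the authors' code) of how a pictogram's comprehensibility could be computed from judge-coded guesses. The category names and the example counts are hypothetical; only the weighting rule (a partially correct guess counts as 0.5 correct) comes from the abstract.

```python
# Minimal sketch: comprehensibility of one pictogram from coded guesses.
# Assumed (hypothetical) codes: "correct", "partially_correct", "wrong",
# "completely_wrong". Weighting per the abstract: partial = 0.5 correct.

WEIGHTS = {
    "correct": 1.0,
    "partially_correct": 0.5,
    "wrong": 0.0,
    "completely_wrong": 0.0,
}

def comprehensibility(coded_guesses):
    """Return the weighted percentage of correct guesses for one pictogram."""
    if not coded_guesses:
        raise ValueError("no guesses to score")
    total = sum(WEIGHTS[code] for code in coded_guesses)
    return 100.0 * total / len(coded_guesses)

# Hypothetical example: 100 coded guesses for a single pictogram.
guesses = (["correct"] * 60 + ["partially_correct"] * 20
           + ["wrong"] * 15 + ["completely_wrong"] * 5)
print(f"comprehensibility = {comprehensibility(guesses):.1f}%")  # -> 70.0%
```

Under this weighting, 60 correct and 20 partially correct guesses out of 100 yield a score of 70%, which falls within the 45% to 98% range reported in the abstract.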
Similar articles
Opportunities for Crowdsourcing Research on Amazon Mechanical Turk
Many crowdsourcing studies have been conducted using Amazon Mechanical Turk, a crowdsourcing marketplace platform. The Amazon Mechanical Turk team proposes that comprehensive studies in the areas of HIT design, workflow and reviewing methodologies, and compensation strategies will benefit the crowdsourcing field by establishing a standard library of repeatable patterns and protocols.
Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment
Recently, Amazon Mechanical Turk has gained a lot of attention as a tool for conducting different kinds of relevance evaluations. In this paper we show a series of experiments on TREC data, evaluate the outcome, and discuss the results. Our position, supported by these preliminary experimental results, is that crowdsourcing is a viable alternative for relevance assessment.
Crowdsourcing Music Similarity Judgments using Mechanical Turk
Collecting human judgments for music similarity evaluation has always been a difficult and time-consuming task. This paper explores the viability of Amazon Mechanical Turk (MTurk) for collecting human judgments for audio music similarity evaluation tasks. We compared the similarity judgments collected from Evalutron6000 (E6K) and MTurk using the Music Information Retrieval Evaluation eXchange 2...
Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk
This paper describes a framework for evaluation of spoken dialogue systems. Typically, evaluation of dialogue systems is performed in a controlled test environment with carefully selected and instructed users. However, this approach is very demanding. An alternative is to recruit a large group of users who evaluate the dialogue systems in a remote setting under virtually no supervision. Crowdso...
You’re Hired! An Examination of Crowdsourcing Incentive Models in Human Resource Tasks
Many human resource tasks, such as screening a large number of job candidates, are labor-intensive and rely on subjective evaluation, making them excellent candidates for crowdsourcing. We conduct several experiments on the Amazon Mechanical Turk platform to perform resume reviews. We then apply several incentive-based models and examine their effects. Next, we assess the accuracy measures o...