Widening the Evaluation Net
Authors
Abstract
Intelligent Virtual Agent (IVA) systems are notoriously difficult to evaluate, particularly because of the subjectivity involved. Of the various efforts to develop standard evaluation schemes for IVA systems, the scheme proposed by Isbister & Doyle, which evaluates systems across five categories, seems particularly appropriate. To examine how these categories are being used, the evaluations presented in the proceedings of IVA ’07 and IVA ’08 are summarised, and the extent to which the five categories in the Isbister & Doyle scheme are used is highlighted.
1 IVA Evaluations and IVA ’07 and IVA ’08
As Intelligent Virtual Agent (IVA) research has matured, evaluation has become more important. However, evaluation of IVA systems is notoriously difficult, as there is a whole range of issues that must be considered (e.g. are the behaviours of agents believable? are agents socially capable? does the system run efficiently in real time?), and these issues tend to be quite subjective. Without good evaluations, though, it is very difficult to compare competing systems and to track the development of the field as a whole.
Fortunately, there are a number of proposed standard evaluation schemes for IVA research. One scheme that seems particularly useful is that proposed by Isbister & Doyle [1] for evaluating pedagogical conversational agents, which evaluates systems under five categories: Believability, Social Interface, Application Domains, Agency & Computational Issues, and Production.
To examine the state of the art in evaluation in IVA research, the evaluations described in the proceedings of IVA ’07 [2] and IVA ’08 [3] were summarised. Each full paper published (31 and 45 in IVA ’07 and IVA ’08 respectively) was examined, and the evaluations described were categorised under the five categories in the Isbister & Doyle scheme. Papers for which evaluation is simply inappropriate are placed under the category N/A. Finally, those papers that do not describe any evaluations are placed in the category None. Figure 1 shows, first, how many of the papers in each year evaluate under each of the categories in the scheme, along with the N/A and None categories, and, second, histograms of how many of the categories are covered in the evaluations presented each year.
2 Conclusions & Future Work
The points to notice from the graphs in Figure 1 are: there are a large number of papers in which no evaluation is described; it is clear that some of the evaluation
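The categorisation described in the abstract (each paper labelled with the Isbister & Doyle categories its evaluation covers, or with N/A, or with None) amounts to two simple tallies per year. The following is a minimal Python sketch of that counting, not the authors' own procedure. The function name tally, the data layout (one set of labels per paper) and the example labels are all hypothetical, and the sketch assumes that the coverage histogram counts only papers that describe at least one evaluation.

```python
from collections import Counter

# The five Isbister & Doyle categories named in the text.
CATEGORIES = (
    "Believability",
    "Social Interface",
    "Application Domains",
    "Agency & Computational Issues",
    "Production",
)


def tally(papers):
    """Tally one year's papers (hypothetical helper).

    `papers` is a list with one set of labels per paper: the categories
    its evaluation covers, {"N/A"} if evaluation is inappropriate, or an
    empty set if no evaluation is described.
    Returns (papers per category, histogram of categories covered).
    """
    per_category = Counter()
    coverage = Counter()
    for labels in papers:
        if "N/A" in labels:
            per_category["N/A"] += 1
            continue
        covered = [c for c in CATEGORIES if c in labels]
        if not covered:
            per_category["None"] += 1
            continue
        per_category.update(covered)
        # Assumption: only papers with an evaluation enter the histogram.
        coverage[len(covered)] += 1
    return per_category, coverage


# Hypothetical placeholder labels, NOT data from the IVA proceedings.
example = [
    {"Believability", "Social Interface"},
    {"Believability"},
    set(),
    {"N/A"},
]
per_category, coverage = tally(example)
print(dict(per_category))
print(dict(coverage))
```

Running the same function once per proceedings volume would give the two kinds of per-year counts that the text says Figure 1 reports.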
Similar resources
A Petri-net based modeling tool, for analysis and evaluation of computer systems
The Petri net is one of the most popular methods for modeling and evaluating concurrent and event-based systems. Different tools have been created to support modeling and simulation of different Petri net extensions in different applications. Each tool supports some extensions and some features. In this work, a Petri net based modeling and evaluation tool is presented that not only supports dif...
Deterministic Measurement of Reliability and Performance Using Explicit Colored Petri Net in Business Process Execution Language and Eflow
Today there are many techniques for web service composition. Evaluation of quality parameters has a great impact on the evaluation of the final product. BPEL is one such technique, and several studies have addressed its evaluation. However, there is little research on the evaluation of QoS in eflow. This research tries to evaluate performance and reliability of eflow and BPEL through mapping them...
Widening the Net: Considerations in Interpreting “Literacy Skills, Non-Cognitive Skills and Earnings: An Economist’s Perspective”
Formal approach on modeling and predicting of software system security: Stochastic petri net
To evaluate and predict component-based software security, this paper proposes a two-dimensional model of software security based on a Stochastic Petri Net. In this approach, software security is modeled using the graphical presentation ability of Petri nets, and the quantitative prediction is provided by the evaluation capability of the Stochastic Petri Net and the computing power of Markov chains. Each...
Educational Effects of Widening Access to the Academic Track: A Natural Experiment
It is difficult to know whether widening access to schools which provide a more academically oriented general education makes a difference to average educational achievement. We make use of reforms affecting admission to the ‘high ability’ track in Northern Ireland, but not England. The comparison of educational ...
The Paradox of Probation: Community Supervision in the Age of Mass Incarceration.
After four decades of steady growth, U.S. states' prison populations finally appear to be declining, driven by a range of sentencing and policy reforms. One of the most popular reform suggestions is to expand probation supervision in lieu of incarceration. However, the classic socio-legal literature suggests that expansions of probation instead widen the net of penal control and lead to higher ...