Task-Based Evaluation of NLG Systems: Control vs Real-World Context
Abstract
Currently there is little agreement about, or even discussion of, methodologies for task-based evaluation of NLG systems. I discuss one specific issue in this area, namely the importance of control vs the importance of ecological validity (real-world context), and suggest that perhaps we need to put more emphasis on ecological validity in NLG evaluations.
Similar papers
Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation
Today’s NLG efforts should be compared against actual human performance, which is fluent and varies randomly and with context. Consequently, evaluations should not be done against a fixed ‘gold standard’ text, and shared task efforts should not assume that they can stipulate the representation of the source content and still let players generate the diversity of texts that the real world calls ...
Evaluation of NLG: Some Analogies and Differences with Machine Translation and Reference Resolution
This short paper first outlines an explanatory model that contrasts the evaluation of systems for which human language appears in their input with systems for which language appears in their output, or in both input and output. The paper then compares metrics for NLG evaluation with those applied to MT systems, and then with the case of reference resolution, which is the reverse task of generat...
Real World Modeling and Nonlinear Control of an Electrohydraulic Driven Clutch
In this paper, a complete model of an electrohydraulic driven dry clutch, along with its performance evaluation, is elucidated. Through precise modeling, a complete nonlinear physical and full-order sketch of the clutch is drawn. The strong nonlinearities present in the system prevent it from being controlled by conventional linear control algorithms, and to compensate the behavior of the sy...
Validating the web-based evaluation of NLG systems
The GIVE Challenge is a recent shared task in which NLG systems are evaluated over the Internet. In this paper, we validate this novel NLG evaluation methodology by comparing the Internet-based results with results we collected in a lab experiment. We find that the results delivered by both methods are consistent, but the Internet-based approach offers the statistical power necessary for more fi...
Reuse and Challenges in Evaluating Language Generation Systems: Position Paper
Although there is an increasing shift towards evaluating Natural Language Generation (NLG) systems, there are still many NLG-specific open issues that hinder effective comparative and quantitative evaluation in this field. The paper starts off by describing a task-based, i.e., black-box evaluation of a hypertext NLG system. Then we examine the problem of glass-box, i.e., module specific, evalua...