The DET curve in assessment of detection task performance
Abstract
We introduce the DET Curve as a means of representing performance on detection tasks that involve a tradeoff of error types. We discuss why we prefer it to the traditional ROC Curve and offer several examples of its use in speaker recognition and language recognition. We explain why it is likely to produce approximately linear curves. We also note special points that may be included on these curves, how they are used with multiple targets, and possible further applications.

INTRODUCTION

Detection tasks can be viewed as involving a tradeoff between two error types: missed detections and false alarms. Examples in speech processing include recognizing the person who is speaking and recognizing the language being spoken. A recognition system may fail to detect a target speaker or language known to the system, or it may declare such a detection when the target is not present.

When there is a tradeoff of error types, a single performance number is inadequate to represent the capabilities of a system. Such a system has many operating points, and is best represented by a performance curve.

The ROC Curve traditionally has been used for this purpose. Here ROC has been taken to denote either the Receiver Operating Characteristic [2,3,4] or, alternatively, the Relative Operating Characteristic [1]. Generally, false alarm rate is plotted on the horizontal axis, while correct detection rate is plotted on the vertical axis.

We have found it useful in speech applications to use a variant of this which we call the DET (Detection Error Tradeoff) Curve, described below. In the DET curve we plot error rates on both axes, giving uniform treatment to both types of error. We also use a scale for both axes that spreads out the plot, better distinguishes different well-performing systems, and usually produces plots that are close to linear. Figure 1 gives an example of DET curves, while Figure 2 contrasts this with traditional ROC type curves for the same data.
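To make the idea of "many operating points" concrete, here is a minimal sketch. It assumes each trial yields a numeric score where larger values are more target-like, and that the system declares a detection when the score reaches a threshold. The function names and the scores are hypothetical, chosen only for illustration. Each threshold gives one operating point. An ROC plot would use the pair (p_fa, p_det), while a DET plot would use the two error rates (p_fa, p_miss).

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (miss_rate, false_alarm_rate), deciding 'target' iff score >= threshold."""
    misses = sum(1 for s in target_scores if s < threshold)
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)


def operating_points(target_scores, nontarget_scores):
    """Compute one operating point per distinct observed score used as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        p_miss, p_fa = error_rates(target_scores, nontarget_scores, t)
        points.append({
            "threshold": t,
            "p_fa": p_fa,            # horizontal axis of both ROC and DET plots
            "p_miss": p_miss,        # vertical axis of a DET plot
            "p_det": 1.0 - p_miss,   # vertical axis of an ROC plot
        })
    return points


# Made-up scores, for illustration only (not data from the evaluations).
targets = [2.1, 1.4, 0.9, 3.0]
nontargets = [0.2, -0.5, 1.0, 0.1]

for point in operating_points(targets, nontargets):
    print(point)
```

Raising the threshold lowers the false alarm rate and raises the miss rate, which is the tradeoff that a single performance number cannot capture.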
Note the near linearity of the curves in the DET plot and how much better spread out they are, permitting easy observation of system contrasts.

Figure 1: Plot of DET Curves for a speaker recognition evaluation.

GENERAL EVALUATION PROTOCOL

Our evaluations of speech processing systems are comparable to fundamental detection tasks. Participants are given a set of known targets (speakers or languages) for which their systems have trained models and a set of unknown speech segments. During the evaluation the speech processing system must determine whether or not the unknown segment is one of the known targets. The system output is a likelihood that the segment is an instance of the target. The scale of the likelihood is arbitrary, but should be consistent across all decisions, with larger values indicating greater likelihood of being a target. These likelihoods are used to generate the performance curve displaying the range of possible operating characteristics.

Figure 2 shows a traditional ROC curve for a NIST coordinated speaker recognition evaluation task. The abscissa shows the false alarm rate and the ordinate shows the detection rate, both on linear scales. The optimal point is at the upper left of the plot, and the curves of well-performing systems tend to bunch together near this corner. (In the figures we omit the keys identifying the individual systems.)

Figure 2: Plot of ROC Curves for the same evaluation data as in Figure 1.

NORMAL DEVIATE SCALE

Let us suppose that the likelihood distributions for nontargets and targets are both normally distributed with respective means u0 and u1. This is illustrated in Figure 3, where the variances of the distributions are taken to be equal.

Figure 3: Normal Distributions. The choice of an operating point c is shown by a bold line, and the two error types are represented by the areas of the shaded regions.
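The two shaded areas can be written down directly under this model. The sketch below assumes u0 < u1 and a shared standard deviation (called sigma here, a name not used in the text). The miss probability is the target mass below c and the false alarm probability is the nontarget mass above c. The function name and the numeric values are illustrative assumptions.

```python
from statistics import NormalDist


def model_error_rates(u0, u1, sigma, c):
    """Return (p_miss, p_fa) for equal-variance normal score distributions.

    Nontarget scores ~ N(u0, sigma^2), target scores ~ N(u1, sigma^2),
    and a detection is declared when the score exceeds the operating point c.
    """
    nontarget = NormalDist(mu=u0, sigma=sigma)
    target = NormalDist(mu=u1, sigma=sigma)
    p_miss = target.cdf(c)            # target mass to the left of c
    p_fa = 1.0 - nontarget.cdf(c)     # nontarget mass to the right of c
    return p_miss, p_fa


# Illustrative values only: means two standard deviations apart.
for c in (0.5, 1.0, 1.5):
    p_miss, p_fa = model_error_rates(u0=0.0, u1=2.0, sigma=1.0, c=c)
    print(f"c={c:.1f}  p_miss={p_miss:.4f}  p_fa={p_fa:.4f}")
```

With c placed midway between the two means, the two error probabilities are equal. Moving c toward u1 increases misses and decreases false alarms.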
Now suppose that when we go to plot the miss versus the false alarm probabilities, rather than plotting the probabilities themselves, we plot instead the normal deviates that correspond to the probabilities. This is displayed in Figure 4. In Figure 4, we show probabilities on the bottom and left, and standard deviations on the top and right. The standard deviations are omitted from subsequent plots.

Figure 4: Normal Deviate Scale.

Note that the linearity of the plot is a result of the assumed normality of the likelihood distributions. The unit slope is a consequence of the equal variances. Also note that on the diagonal scale indicated we have
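The linearity claim can be checked numerically. The sketch below assumes normal score distributions and maps each error probability to its normal deviate with the inverse standard normal CDF. The names det_point and slope, the standard deviations sigma0 and sigma1, and all numeric values are assumptions introduced for illustration. Under this model the points fall on a straight line whose slope is -sigma0/sigma1. The sign is negative because the two errors trade off, and the magnitude is one when the variances are equal.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def det_point(u0, u1, sigma0, sigma1, c):
    """Return (deviate of p_fa, deviate of p_miss) at operating point c."""
    p_miss = NormalDist(u1, sigma1).cdf(c)
    p_fa = 1.0 - NormalDist(u0, sigma0).cdf(c)
    # inv_cdf maps a probability in (0, 1) to its normal deviate.
    return STANDARD_NORMAL.inv_cdf(p_fa), STANDARD_NORMAL.inv_cdf(p_miss)


def slope(p, q):
    """Slope of the line through two points on the normal deviate plot."""
    return (q[1] - p[1]) / (q[0] - p[0])


# Equal variances (illustrative values): slope of magnitude one.
a = det_point(0.0, 2.0, 1.0, 1.0, c=0.5)
b = det_point(0.0, 2.0, 1.0, 1.0, c=1.5)
print("equal variances, slope:", round(slope(a, b), 6))

# Unequal variances: still a straight line, with slope -sigma0/sigma1.
d = det_point(0.0, 2.0, 1.0, 2.0, c=0.5)
e = det_point(0.0, 2.0, 1.0, 2.0, c=1.5)
print("unequal variances, slope:", round(slope(d, e), 6))
```

Real score distributions are only approximately normal, so measured DET curves are close to linear rather than exactly so.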