Comparative Analysis of Bradley-Terry and Thurstone-Mosteller Paired Comparison Models for Image Quality Assessment

Author

  • John C. Handley
Abstract

In image quality assessment, preference among image processing algorithms or treatments is often determined using paired comparisons. In this experimental design, pairs of images processed by different algorithms or “treatments” are presented to a judge. The judge selects the preferred treatment, and a tally is kept of the number of times each treatment is preferred to another. With this method it is possible to estimate an interval scale for treatments in a hypothetical psychological space. Two statistical models dominate paired comparison analysis: Thurstone-Mosteller Case V (TM), corresponding to Thurstone’s Law of Comparative Judgment, Case V, and Bradley-Terry (BT). Although TM is used almost exclusively in the imaging literature, the BT formulation is more mathematically developed. Owing to its parsimony, it provides tractable maximum-likelihood estimators for scales, simultaneous confidence intervals, and hypothesis tests for model fit, uniformity, and differences among populations of judges. In practice, TM and BT yield nearly identical scale estimates for complete data. In some experimental designs, many treatments are compared. Owing to the large number of possible treatment pairs, not every comparison is made, leading to an incomplete matrix of preference counts. Unlike TM, the BT model applies directly to incomplete data under mild restrictions. We compare and critique the TM and BT models. Statistical analyses, many not available under TM, are demonstrated. An argument is made that BT offers overwhelming advantages to the imaging community and should be used instead of TM.

Introduction

This paper compares two well-known paired comparison models: the Thurstone-Mosteller (TM) model (by which we mean Thurstone’s Law of Comparative Judgment, Case V) and the Bradley-Terry (BT) model. (Mosteller’s name is included in TM because of his work on the statistical analysis of Thurstone’s model.)
We argue here that the BT model should be used in place of TM because the former is at present more developed mathematically than the latter. In particular, simple formulas exist for maximum likelihood estimates (MLEs) of scale parameters. The asymptotic theory of MLEs yields estimators for confidence regions and likelihood-ratio test statistics for hypothesis testing. TM is privileged within the imaging community, ostensibly owing to its origins in psychophysics. Yet it is universally acknowledged that TM and BT yield similar scale estimates. The theory (and software) for generalized linear models can produce MLEs, yet BT, with its roots in experimental design and consumer choice modeling, offers numerically easier statistical procedures. We present no new research, although we do show an alternative analysis of previously published data. Our intent is to provide the imaging community with a general context for paired comparisons, compare and contrast the two models, and demonstrate the advantages of BT.

The Linear Model

The TM and BT models are both linear models of paired comparisons. In such models, probabilities of preference can be mapped to scales. Formally (following David, 1988 [4]), let $V_i$ and $V_j$ represent the “merits” of objects $A_i$ and $A_j$, respectively. In a psychophysics setting, the $V_i$ might represent sensation magnitudes on a scale. Because of observation-to-observation variation, we represent the observed merit of object $A_i$ by a random variable $X_i$. A linear model takes the form

$$\pi_{ij} \equiv P(X_i > X_j) = H(V_i - V_j) \tag{1}$$

where $H$ is a monotonic, increasing function such that $H(-\infty) = 0$, $H(+\infty) = 1$, and $H(-x) = 1 - H(x)$. There are infinitely many choices for the function $H$; the two of concern here are the Thurstone-Mosteller model, where $H$ is the normal cumulative distribution function with zero mean, and the Bradley-Terry model, where

$$H(x) = \frac{1}{2}\left[1 + \tanh(x/2)\right] \tag{2}$$

The task is to produce estimates $v_i$ of $V_i$, $i = 1, \ldots, m$.
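To make the two choices of $H$ concrete, here is a minimal Python sketch of equations (1) and (2). The function names (`h_tm`, `h_bt`, `preference_probability`) and the example merits are hypothetical, and the TM link assumes unit variance, which the text above does not specify.

```python
import math
from statistics import NormalDist

# Assumption: unit variance for the TM link; the text only fixes the mean at zero.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def h_tm(x: float) -> float:
    """Thurstone-Mosteller link: normal CDF with zero mean (unit variance assumed)."""
    return _STD_NORMAL.cdf(x)


def h_bt(x: float) -> float:
    """Bradley-Terry link from equation (2); algebraically equal to 1 / (1 + exp(-x))."""
    return 0.5 * (1.0 + math.tanh(x / 2.0))


def preference_probability(v_i: float, v_j: float, h) -> float:
    """Equation (1): probability that object i is preferred to object j."""
    return h(v_i - v_j)


# Hypothetical merits, invented for illustration only.
v_a, v_b = 0.8, 0.2
for name, h in (("TM", h_tm), ("BT", h_bt)):
    p_ab = preference_probability(v_a, v_b, h)
    p_ba = preference_probability(v_b, v_a, h)
    # H(-x) = 1 - H(x) implies the two probabilities sum to one.
    print(f"{name}: P(A over B) = {p_ab:.4f}, P(B over A) = {p_ba:.4f}, sum = {p_ab + p_ba:.4f}")
```

The two links place merits on differently scaled axes, so the same numerical merit difference gives different probabilities under each; the sketch only illustrates the shared symmetry property, not a comparison of model fit.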
If the function $H$ has additional parameters, we need to estimate those as well. Assume without loss of generality that $\sum_{i=1}^{m} V_i = 0$, and define $\delta_{ij} = V_i - V_j$. Estimation proceeds by tallying $\alpha_{ij}$, the number of times object $A_i$ is preferred to object $A_j$ in $n_{ij}$ comparisons. A sample estimate of $\pi_{ij}$ is $p_{ij} = \alpha_{ij} / n_{ij}$. We define $d_{ij}$ through $H(d_{ij}) = p_{ij}$ and compute merit or scale estimates $v_i$ from $d_{ij} = v_i - v_j$, $i \neq j$, $i, j = 1, \ldots, m$. It can be shown that a least squares estimate of $V_i$ is

$$v_i = \frac{1}{m} \sum_{\substack{j=1 \\ j \neq i}}^{m} d_{ij} \tag{3}$$

This estimate holds regardless of $H$ and is the usual method for Thurstone’s Case V model.

Assume that each pair is observed a fixed (but possibly unequal) number of times. That is, the sums $n_{ij}$ are fixed and the tallies $\alpha_{ij}$ are binomial random variables:

$$P(\alpha_{ij}) = \binom{n_{ij}}{\alpha_{ij}} \pi_{ij}^{\alpha_{ij}} (1 - \pi_{ij})^{n_{ij} - \alpha_{ij}}, \qquad \alpha_{ij} = 0, 1, \ldots, n_{ij} \tag{4}$$

Owing to independence, the likelihood function is
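As a rough illustration of the least squares estimate in equation (3), the following Python sketch turns a matrix of preference counts into scale values under either link. The helper names and the count matrix are hypothetical. The sketch assumes complete data, forced choices (so $n_{ij} = \alpha_{ij} + \alpha_{ji}$), unit variance for the TM link, and proportions strictly between 0 and 1 so that the inverse of $H$ is finite.

```python
import math
from statistics import NormalDist


def h_inv_bt(p: float) -> float:
    """Inverse of the BT link in equation (2): the logit of p."""
    return math.log(p / (1.0 - p))


def h_inv_tm(p: float) -> float:
    """Inverse of the TM link: standard normal quantile (unit variance assumed)."""
    return NormalDist().inv_cdf(p)


def least_squares_scale(wins, h_inv):
    """Equation (3): v_i = (1/m) * sum over j != i of d_ij, where H(d_ij) = p_ij.

    wins[i][j] is alpha_ij, the number of times object i was preferred to j.
    Assumes every pair was compared and no proportion equals 0 or 1.
    """
    m = len(wins)
    scale = []
    for i in range(m):
        total = 0.0
        for j in range(m):
            if i == j:
                continue
            n_ij = wins[i][j] + wins[j][i]  # forced choice: no ties recorded
            p_ij = wins[i][j] / n_ij        # sample estimate of pi_ij
            total += h_inv(p_ij)            # d_ij
        scale.append(total / m)
    return scale


# Hypothetical counts for three treatments, 10 comparisons per pair.
example_wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
print("BT scale:", [round(v, 3) for v in least_squares_scale(example_wins, h_inv_bt)])
print("TM scale:", [round(v, 3) for v in least_squares_scale(example_wins, h_inv_tm)])
```

Because $d_{ij} = -d_{ji}$ under both links, the resulting scale values sum to zero, matching the constraint placed on the $V_i$.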

Related resources

How to Analyze Paired Comparison Data

Thurstone’s Law of Comparative Judgment provides a method to convert subjective paired comparisons into one-dimensional quality scores. Applications include judging quality of different image reconstructions, or different products, or different web search results, etc. This tutorial covers the popular Thurstone-Mosteller Case V model and the Bradley-Terry logistic variant. We describe three app...

Introductory note to 1928

Zermelo’s 1928 paper on measuring participants’ playing strengths in chess tournaments is a remarkable work in the history of paired comparison modeling. Apart from several contemporary papers by Thurstone (1927a, 1927b, 1927c), Zermelo’s paper was an isolated excursion into paired comparison methods that was far ahead of its time. After this paper, the field remained mostly dormant for about 2...

Fitting loglinear Bradley-Terry models (LLBT) for paired comparisons using the R package prefmod

This paper aims at introducing the R package prefmod (Hatzinger, 2009) which allows the user to fit various models to paired comparison data. These models give estimated overall rankings of objects or items where each subject (respondent/judge) makes one or more comparisons between pairs of objects (items). The focus is on the loglinear Bradley-Terry (LLBT) model, the loglinear formulation of t...

When is it Better to Compare than to Score?

When eliciting judgements from humans for an unknown quantity, one often has the choice of making direct-scoring (cardinal) or comparative (ordinal) measurements. In this paper we study the relative merits of either choice, providing empirical and theoretical guidelines for the selection of a measurement scheme. We provide empirical evidence based on experiments on Amazon Mechanical Turk that i...

Inductive Pairwise Ranking: Going Beyond the n log(n) Barrier

We study the problem of ranking a set of items from nonactively chosen pairwise preferences where each item has feature information with it. We propose and characterize a very broad class of preference matrices giving rise to the Feature Low Rank (FLR) model, which subsumes several models ranging from the classic Bradley–Terry–Luce (BTL) (Bradley and Terry 1952) and Thurstone (Thurstone 1927) m...

Publication date: 2001