Running Head: Expressing Preferences in a Principal-Agent Task

Expressing Preferences in a Principal-Agent Task: A Comparison of Choice, Rating and Matching
Abstract
One of the more disturbing yet important findings in the social sciences is the observation that alternative tasks result in different expressed preferences among choice alternatives. We examine this problem not from the perspective of an individual making personal decisions, but from the perspective of an agent trying to follow the known values of a principal. In two studies, we train people to evaluate outcomes described by specific attributes and then examine their ability to express these known values with three common tasks: ratings of individual alternatives, choices among triples of alternatives, and matching pairs of alternatives to indifference. We find that each preference assessment method has distinct strengths and weaknesses. Ratings are quick, robust at following known values, and are perceived as an easy task by respondents. However, because ratings require projection to an imprecise response scale, respondents have difficulty when applying them to more complex preference structures. Further, they place too much weight on negative information, a result that is consistent with reference-dependent loss aversion. Choice is perceived as the most realistic task and the one about which people feel the most confident. However, choices exhibit the most negativity, which, in addition to flowing from the same perceptual bias as ratings, may be exacerbated by a screening strategy that excludes alternatives possessing the lowest level of an attribute. Finally, the matching task takes the most time and is perceived to be the most difficult. It shows minimal biases, except for one glaring flaw: a substantial overweighting of the matching variable. This bias is consistent with a well-known compatibility bias and suggests that agents can learn to use a matching task appropriately for all attributes except the matching variable itself. The paper concludes with a discussion of the theoretical mechanisms by which these biases infiltrate different elicitation modes and a summary of managerial implications of these results.

Research in the field of judgment and decision making has generated convincing evidence that people construct their preferences in the light of demands produced by the situation and the response task (Payne, 1982; Kahneman & Tversky, 1984; Slovic, 1995). As predicted by this constructive view of preferences, different elicitation tasks evoke systematically different preferences. For example, studies of “task effects” have clearly demonstrated that a matching task, specifying the amount of an attribute required to make alternatives equal, results in quite different preference orderings than choice (Tversky, Sattath & Slovic, 1988; Fischer & Hawkins, 1993; Hawkins, 1994; Ordóñez, Mellers, Chang & Roberts, 1995). Related research shows that rating is different from choice (Bazerman, Loewenstein & White, 1992; Fischer & Hawkins, 1993; Schkade & Johnson, 1989; Nowlis & Simonson, 1997; Delquié, 1993; Ahlbrecht & Weber, 1997), and that matching is different from rating (Fischer & Hawkins, 1993; Hsee, 1996). While there has been active debate on the mechanisms behind these phenomena, there is little doubt that the preferences revealed depend on the questions asked. In this work, we examine whether similar preference shifts occur for agents, in the context of three tasks – ratings of individual options, matching of pairs to indifference, and choice among triples.
Studying these methods within an agent task is important because it can tell us not only how the methods differ from each other but also how they differ from the true preference structure which the agent seeks to emulate. Our use of an agent is similar to its use in multiple cue probability learning (Hammond, Summers & Deane, 1973), but with a different goal. Whereas that stream of research is concerned with how people learn probabilistic cues in the environment, our focus is on the consistency and biases associated with human ability to transmit known values using different preference elicitation tasks. A strong point of difference is that the multiple cue probability learning paradigm requires subjects to infer policy from noisy feedback on the evaluation of profiles. In our agent tasks, there is no need to learn the “partworth values” – they are always displayed with graphs such as shown in Figure 2. At issue is the extent to which people can correctly apply a given set of partworths under different tasks. Our approach shares kinship with the work of Klein & Bither (1987), who used an analogous agency task to explore cutoff use in simplifying choices. Similarly, Stone & Kadous (1997) used an agent task to estimate the impact of ambient affect and task difficulty on choice accuracy. In contrast to both of these papers, our focus is less on estimating accuracy than on identifying consistent biases that arise.

An additional advantage of the agent task is that it isolates those biases that occur in the expression of preferences. If we consider three stages in the general value judgment problem as comprehending the information, understanding appropriate tradeoffs, and expressing those tradeoffs through a specific task, then our study focuses on the last stage. Our task thereby provides an upper bound on decision-makers’ ability to express value through different tasks.

Theoretical Differences Among the Tasks

What kinds of biases would one expect to emerge among the choice, rating, and matching tasks studied here? The theoretical framework that we adopt is that people develop strategies that enable them to minimize effort while preserving accuracy (Bettman, Johnson & Payne, 1990). As Peter Wright (1974) and Hillel Einhorn (1971) suggested more than twenty-five years ago, simplification can be achieved by focusing on the more important pieces of information. It is useful to distinguish between two ways in which this simplification can occur. First, attribute focusing occurs when more important attributes receive exaggerated attention. By contrast, level focusing occurs when it is the levels within attributes that get exaggerated attention.

Attribute focusing. Attribute focusing minimizes effort by ignoring less important attributes. Russo & Dosher (1983) called this process dimensional reduction. To illustrate the way attribute focusing would be realized, imagine a target pattern of partworth values for ski trips such as those shown in Figure 1A. In this hypothetical case, the full range of price is most important, accounting for 45% of the sum of the attribute ranges, followed by ski slope quality with 35%, and then probability of good snow with 20%. Attribute focusing, pictured in Figure 1B, increases the weight of the most important attribute, price, by 22%, while decreasing the weights of slope quality and snow probability by 14% and 25% respectively.
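To make these attribute-importance figures concrete, the following sketch computes importance weights as each attribute's partworth range divided by the sum of ranges (the same definition used for Figure 7 later in the paper) and then applies the Figure 1B shifts quoted above. The partworth numbers are hypothetical stand-ins chosen only to reproduce the 45%/35%/20% target pattern; Python is used purely for illustration.

```python
# Hypothetical target partworths (in utiles) reproducing the Figure 1A importance
# pattern; levels are ordered worst -> middle -> best.
target = {
    "price":            [0.0, 36.0, 45.0],   # range 45 -> 45% of total importance
    "slope quality":    [0.0, 17.5, 35.0],   # range 35 -> 35%
    "snow probability": [0.0,  4.0, 20.0],   # range 20 -> 20%
}

def importance_weights(partworths):
    """Attribute importance = that attribute's utility range / sum of all ranges."""
    ranges = {a: max(v) - min(v) for a, v in partworths.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}

print(importance_weights(target))
# {'price': 0.45, 'slope quality': 0.35, 'snow probability': 0.2}

# Attribute focusing (Figure 1B): price gains 22% in weight, while slope quality
# and snow probability lose 14% and 25% -- the shifts quoted in the text.
shifts = {"price": 1.22, "slope quality": 0.86, "snow probability": 0.75}
focused = {a: w * shifts[a] for a, w in importance_weights(target).items()}
print(focused, "sum =", round(sum(focused.values()), 3))
# price ~0.549, slope quality ~0.301, snow probability ~0.150; sum = 1.0
```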
Two mechanisms, prominence and scale compatibility, have been identified as leading to attribute focus. The prominence effect reflects the empirical generalization that people are more likely to prefer an alternative that is superior on the more prominent attribute when making choices than when making judgments (Tversky, Sattath & Slovic, 1988; Fischer & Hawkins, 1993; Hawkins, 1994). Contrasting Figures 1A and 1B, the prominence effect predicts a greater slope for the most important attribute, price, relative to the other two. Scale compatibility is a second well-known attribute-focusing process in which people give greater weight to attributes represented in units similar to those of the response variable (Delquié, 1993; Fischer & Hawkins, 1993; Slovic, Griffin, & Tversky, 1990; Delquié, 1997; Borcherding, Eppel & von Winterfeldt, 1991). This distortion arises when a stimulus coded in units similar to those of the response scale is more “compatible” with that response and therefore receives greater weight. For example, an attribute with a 0-100 coding will have a greater slope if the evaluation scale shares the same metric, presumably because it is easier to transfer comparable units.

Level focusing. A second simplification mechanism involves giving exaggerated attention to particular level differences within attributes. We define level focusing in terms of an attribute’s low-end weight: the proportion of the attribute’s total utility range accounted for by the difference between its lowest and middle levels, (Vmid − Vlow)/Vtot. Thus, the target in Figure 1A shows price with 80% of its weight in the low end, demonstrating diminishing returns to better (lower) price. Slope quality has constant returns, so its low-end weight is 50%. Finally, probability of snow has increasing returns, evidenced by a low-end weight of 20%. We examine two mechanisms that can lead to shifts in level focusing – negativity and utility-dependent cutoff strategies.

Negativity involves giving greater attention to less preferred attribute levels. The contrast between Figures 1A and 1C illustrates this process, whereby the differences between the high and middle levels diminish, and those between the middle and low levels increase. In particular, the low-end weight of price increases by 13% (80% to 90%), slope quality by 40% (50% to 70%), and snow probability by 150% (20% to 50%). Negativity effects have been demonstrated in a large number of domains (Kanouse & Hansen, 1972; Wright, 1974; Taylor, 1991; Wedell & Senter, 1997). We test whether negativity also occurs in an agent task, and whether its magnitude changes across the three different elicitation tasks.
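The low-end weight measure and the negativity shifts just described can be verified with a short sketch. The partworth values below are hypothetical, chosen only so that the target attributes reproduce the 80%, 50%, and 20% low-end weights of Figure 1A and the distorted attributes reproduce the 90%, 70%, and 50% values attributed to Figure 1C.

```python
def low_end_weight(levels):
    """(Vmid - Vlow) / (Vhigh - Vlow): the share of an attribute's utility range
    that lies between its lowest and middle levels."""
    v_low, v_mid, v_high = levels          # partworths ordered worst -> middle -> best
    return (v_mid - v_low) / (v_high - v_low)

# Hypothetical partworths reproducing the Figure 1A targets.
target = {
    "price":            [0.0, 36.0, 45.0],   # 36/45 = 0.80: diminishing returns
    "slope quality":    [0.0, 17.5, 35.0],   # 0.50: constant returns
    "snow probability": [0.0,  4.0, 20.0],   # 0.20: increasing returns
}
print({a: round(low_end_weight(v), 2) for a, v in target.items()})

# A negativity-distorted version in the spirit of Figure 1C: low-end differences
# expand and high-end differences shrink, raising every low-end weight.
distorted = {
    "price":            [0.0, 40.5, 45.0],   # 0.90 (up 13% from 0.80)
    "slope quality":    [0.0, 24.5, 35.0],   # 0.70 (up 40% from 0.50)
    "snow probability": [0.0, 10.0, 20.0],   # 0.50 (up 150% from 0.20)
}
print({a: round(low_end_weight(v), 2) for a, v in distorted.items()})
```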
Reference dependence is a largely accepted theoretical driver of negativity. Following Kahneman & Tversky’s (1984) prospect theory, value functions are steeper below the reference point than above it. This loss aversion around a reference point predicts negativity as long as the reference point is near the middle level of an attribute. Reference dependence should have a differential impact for rating, choice, and matching. Rating tasks are likely to evoke anchoring around the middle levels of an attribute, leading to lower valuations of alternatives containing low attribute levels. For choice, this reference dependence will be further exacerbated if options are more likely to be eliminated when one or more attributes fall below minimum acceptable reference levels, producing an apparent kink in the value function at that reference point. By contrast, negativity is least likely when matching pairs, since the pairs provide their own reference, lessening the need for or availability of an external reference point.

Klein & Bither (1987) suggest a different form of level focusing. Under their utility-dependent cutoff mechanism, people simplify judgment tasks by selectively ignoring less valued attribute differences. This mechanism is important because its focus on large utility differences is a justifiable simplification heuristic from a cost-benefit perspective. That is, if one has to ignore differences among levels, it is most efficient to ignore small differences that will minimally impact preferences. As illustrated in the contrast between Figures 1A and 1D, this process expands the larger value differences within an attribute and diminishes the smaller ones, thereby exaggerating any initial curvature. Klein & Bither produced evidence that cutoffs follow a utility-dependent model, but were not able to separate utility dependence from negativity. We develop experiments that expand their work by testing contexts in which negativity and utility dependence produce conflicting predictions.

Below we examine how these distortions can be expected to differ among the three tasks. Table 1 displays the particular tasks used: ratings of individual alternatives, choices among triples of alternatives, and matching pairs of alternatives to indifference.

|INSERT FIGURE 1 AND TABLE 1 ABOUT HERE|

Choice involves the selection of one alternative from a set, where each alternative is defined as a collection of different attribute levels. Contrasting choices among triples with the monadic rating and binary matching tasks, our choice task gives agents the most information to process. Further, because a respondent’s goal is to select one alternative, rather than to rate or evaluate each, there is value in heuristics that facilitate a reasonable decision without too much effort (Wedell & Senter, 1997). For choice, the confluence of a large amount of information with a task that encourages heuristics leads to the expectation that choice will be the most susceptible to both attribute and level simplification. Previous research leads us to predict two specific forms of simplification in choice. First, consistent with the prominence effect, we expect choice to put the greatest weight on the most important attribute. Second, with respect to level focus, we anticipate that choice will focus on negative attribute levels as respondents use the less preferred levels of attributes as a convenient way to screen out or quickly devalue alternatives.

The rating task, in contrast to choice or matching, focuses on individual alternatives, and thereby requires the processing of the fewest pieces of information (see Table 1A). Since it generates the lowest information load, it should be the fastest and evoke the least simplification. In particular, people should be able to process more attributes, leading to less attribute focusing. Another differentiating characteristic of ratings is that they are made relative to implicit norms.
That is, in choice and matching, the alternatives are directly compared with one another, while in a rating task each alternative is evaluated by itself, with the references to past alternatives largely being carried in memory. Thus, for ratings, the upper and lower bounds of the attribute levels across alternatives offer a frame of reference, while moderate attribute levels provide a natural reference point. This reference dependence, combined with loss aversion, leads to a prediction of a negativity bias for ratings.

Matching between pairs combines the self-anchoring qualities of choice with the relative simplicity of a rating task. Instead of focusing on the value of an alternative, attention is on the value of differences between alternatives. Thus, in Table 1C, a person might first evaluate the value of a 20-point difference in slope quality, followed by a 40-percentage-point difference in the probability of good snow. To simplify the difficult process of valuing cross-attribute differences, we expect respondents to focus first on the salient attributes, giving them greater weight. Another likely attribute bias for matching comes from scale compatibility. We predict that the matching attribute will receive too much emphasis. For example, if price is the matching variable, assessing the dollar value that makes the two alternatives equal in value draws attention to price relative to other attributes. Further, if the respondent anchors on the price given and then insufficiently adjusts for the other attribute differences, the anchoring and adjustment process leads to an overestimation of the importance of the matching variable (Tversky, Sattath & Slovic, 1988). For example, in the matching task in Table 1C, anchoring on, and insufficient adjustment from, the price of $300 will result in an increase in the derived value of price.

Borcherding, Eppel & von Winterfeldt (1991) demonstrate the distorting power of scale compatibility in a matching context. They compare various attribute importance estimates – “ratio,” “tradeoff,” and “swing” weights – all asking for judgments of the value differences between attributes, where the matching variable rotates across the different attributes. A fourth method, “pricing out,” is similar to our matching task in that price consistently serves as both an attribute and the response scale. Borcherding, Eppel & von Winterfeldt (1991) find that the derived importance of price is 10 times greater for pricing out compared with the other three methods. The magnitude of this difference suggests that agents in our matching task will put too much weight onto the matching attribute.

In contrast to attribute focus, the pairwise nature of the matching task leads us to expect minimal level focus. The “concreteness principle” asserts that “information that has to be held in memory, inferred or transformed in any but the simplest ways, will be discarded” (Slovic & MacPhillamy, 1974). Applying this principle suggests that people will tend to focus on differences (e.g., the two-hour difference between four and six hours) but ignore the average level, since that takes extra work. To the extent that the information about the general level of the pair is discarded, the matching task can be expected to show less differential level focusing compared with choice and rating.
For that reason, if any bias is likely for a pair task, it is to “over-linearize” value tradeoffs by establishing a constant rate of substitution between a given pair of attributes, regardless of the level of each.

In this paper, we present two studies that test these expectations. In the first study, the relationship between the target attribute levels is linear – the value of going from the lowest to the middle level is equivalent to the shift from the middle to the highest level. This linear partworth study provides a test of level and attribute distortions where it is relatively easy for respondents to understand and translate the differential tradeoffs between attributes. In the second study, the relationship of levels within attributes is nonlinear, sometimes increasing and other times decreasing with improvements in an attribute. This nonlinear partworth study tests the generality of our results in a more cognitively demanding context and better discriminates among rival theoretical mechanisms.

The Linear Partworths Study

Eighty MBA students participated in a study administered entirely by personal computers. We asked respondents to imagine working for a company that selects and markets ski vacations. Bar graphs, such as shown in Figure 2, displayed the values for different levels of attributes of ski vacations. Respondents were then challenged to apply these values to the selection and evaluation of ski trips the company might offer. They received $10 for participating and an additional monetary reward of around $5 depending on how accurately their judgments matched the displayed values. The exercise had three parts: first, an introductory and training section; second, the actual choice, rating and matching tasks; and third, a section that assessed subjects’ own attitudes towards the tasks.

|INSERT FIGURE 2 ABOUT HERE|

Training. To help respondents understand how to apply the company’s values to decisions, they participated in training tasks involving simple choices and matching to indifference. For example, the first training task, shown in Figure 3A, requires a choice between a $300 plan with “poor” (70) slope quality and a $900 plan with “good” (90) slope quality. In this case, the correct choice is the inexpensive plan, since the length of the bar in Figure 2 reflecting the $900–$300 price difference is greater than the bar reflecting the poor–good quality difference. We congratulated those making the correct response and moved them to the next choice. An incorrect response evoked an explanation for why the low-cost alternative is preferred, saying, “the importance of $300 vs. $900 in total cost is greater than the importance of 70 vs. 90 in quality.” Analogous feedback continued for the next six choice training tasks.

Training with nine matching tasks followed. In these exercises, respondents estimated the level on one attribute that would make two alternatives equally valued. For example, they had to estimate the price of a plan with a 90% chance of snow that would equal a $300 plan with a 50% chance of snow (see Figure 3B). After generating their estimates, respondents learned the correct answer ($570) and received praise appropriate to the accuracy of their responses.
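The logic behind such a matching question reduces to solving for the price that equates the total partworth values of the two plans. The sketch below illustrates this with hypothetical partworths standing in for the bar lengths of Figure 2; the specific numbers (a linear price value and a 27-utile bonus for the higher snow chance) are assumptions chosen only so that the answer lands on the $570 quoted above – respondents, of course, judged the bars by eye rather than computing.

```python
# Assumed, roughly linear value for price between $300 and $900: cheaper is better.
def price_value(price):
    return (900 - price) * 0.10          # utiles per dollar saved (hypothetical slope)

snow_value = {0.50: 0.0, 0.90: 27.0}     # hypothetical partworths for chance of good snow

# Reference plan: $300 with a 50% chance of good snow.
reference_total = price_value(300) + snow_value[0.50]          # 60 utiles

# Matched price p solves: price_value(p) + snow_value[0.90] == reference_total
# (900 - p) * 0.10 + 27 = 60  ->  p = 900 - (60 - 27) / 0.10 = 570
matched_price = 900 - (reference_total - snow_value[0.90]) / 0.10
print(matched_price)   # 570.0, the feedback value given to respondents
```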
An answer within 10% of the correct value elicited a “Very good” response; errors of 10%–20% produced an “OK”; and errors greater than 20% evoked “That’s not very accurate.”

|INSERT FIGURE 3 ABOUT HERE|

We designed this training program to enable respondents to associate the values of attribute levels with the lengths of the lines. However, by providing neither a ruler nor numbers, we intentionally made it difficult for respondents to apply a mechanical rule. Further, the subsequent tasks differed on five attributes, rather than on two as in the training tasks, requiring that subjects generalize the idea of compensatory attribute tradeoffs to a far more complex task. The choice and matching training tasks were designed to enable respondents to understand the meaning of relatively simple tradeoffs between attributes. There were no training tasks for rating because the rating values change in complex ways as the number of attributes changes. To help subjects become acquainted with the rating task with five attributes, we described the best and worst alternatives and indicated that they should be rated 9 (best) and 1 (worst). In this way, subjects could understand both the range of products and how they mapped onto the possible responses.

Preference elicitation tasks. Following the training session, each subject completed 18 rating, 18 matching and 18 choice judgments corresponding to those shown in Table 1. The rating judgments each described one alternative and asked the subject to assign a rating from 1 (worst) to 9 (best). The matching tasks each had two stages. In the first stage, a respondent chose between two alternatives defined on all attributes but price. In the second stage, the computer supplied the price of the less preferred alternative and asked what price for the preferred alternative would make the two equally valued. Finally, the choice tasks required a simple selection of the best of three alternatives. While respondents performed these tasks, the partworths shown in Figure 2 were always in view. Across respondents, we randomized the order of the three tasks.

We generated stimuli using related, but differing, methods. The rating task came from an 18 x 5 orthogonal array (Addelman, 1962), which permits all main effects for the five attributes, each at three levels, to be estimated with maximum efficiency. For the matching task, we built a pair design from the same array with the following recoding: we replaced all level 1’s with a pair having level 1 on the left and 2 on the right, all level 2’s with a 2 on the left and 3 on the right, and all level 3’s with a 3 on the left and a 1 on the right. Finally, for the choice task, we used the following cyclic rule to generate choices: an attribute with level 1 generated three choices with levels 1, 2, 3; one with level 2 generated levels 2, 3, 1; and level 3 translated into 3, 1, 2.
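This pair-and-triple recoding is easy to express compactly. The sketch below applies it to a small stand-in for the orthogonal array (the actual Addelman design is not reproduced here); only the recoding rules themselves are taken from the text.

```python
# Stand-in rows for the 18 x 5 orthogonal array, attribute levels coded 1-3.
orthogonal_rows = [
    [1, 2, 3, 1, 2],
    [2, 3, 1, 2, 3],
    [3, 1, 2, 3, 1],
]

# Matching pairs: level 1 -> (1 left, 2 right), level 2 -> (2, 3), level 3 -> (3, 1).
PAIR_RECODE = {1: (1, 2), 2: (2, 3), 3: (3, 1)}

def make_pair(row):
    left = [PAIR_RECODE[level][0] for level in row]
    right = [PAIR_RECODE[level][1] for level in row]
    return left, right

# Choice triples: a level-k attribute contributes the cyclic sequence k, k+1, k+2
# (wrapping from 3 back to 1) across the three alternatives.
def make_triple(row):
    shift_level = lambda level, shift: (level - 1 + shift) % 3 + 1
    return [[shift_level(level, shift) for level in row] for shift in range(3)]

for row in orthogonal_rows:
    print("pair:", make_pair(row), "triple:", make_triple(row))
```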
An additional aspect of this study investigated whether different attribute labels would affect the results. As Table 2 shows, the 80 respondents were randomly assigned to one of four conditions with different labels attached to the first- and second-most important attributes. Condition 1 reflects the labels shown in Figure 2, with the five attributes, in order of importance, being total cost, slope quality, probability of good snow, travel time and night life. In condition 2, total cost changes position with slope quality. Similarly, in conditions 3 and 4, waiting time at the lift replaces total cost in conditions 1 and 2. Across labeling conditions, the target partworth utilities stayed the same; only the labels changed. Matching was always done in terms of the first (most important) attribute. Somewhat to our surprise, we found that the derived partworths and accuracy differed little despite these substantial labeling differences. Subjects were able to learn the appropriate tradeoffs despite heterogeneous prior orientations to the labels. Thus, for our purposes here, we will treat the labeling conditions as four independent replications of the experiment. To the extent that the results hold across these different labeling conditions, we can feel confident that they hold generally.

|INSERT TABLE 2 ABOUT HERE|

To assess consistent biases among the methods, we estimate coefficients within each of the tasks from data pooled across respondents. These coefficients estimate the values that respondents actually applied within each of the tasks. Biases can then be estimated by comparing the derived and target (true) partworths. For the ratings task, a dummy-variable regression estimated an additive model that best predicted the ratings. For matching, a similar regression on level differences (e.g., the difference between high and low slope quality) predicted the value of the differences on the matching variable. Finally, for choice, multinomial logit (Maddala, 1983) produced analogous coefficients that maximized the likelihood of the choices made. The resulting scales differ with respect to the zero points for each attribute and their general metric. Adding a different constant for each attribute makes no difference for predicting choices, since those constants are added to each alternative. Thus, for display purposes, the lowest (least preferred) level of each attribute is set to zero. Then, to put the outputs from the three tasks in the same metric, each is multiplied by a positive constant that best reproduces the target partworths. This affine transformation was determined by a simple regression through the origin of the true against the predicted partworths. These transformations of origin and scale permit a focus on the relative partworths in a way that preserves their rank order. More important, the transformations do not affect our two critical measures, attribute importance and low-end weight.
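The rescaling step can be illustrated in a few lines: a regression through the origin of the target partworths on the derived partworths yields the single positive constant used to place each task's estimates on the target metric. The numbers below are hypothetical placeholders (e.g., logit coefficients for the choice task), not values from the study.

```python
import numpy as np

# Hypothetical derived partworths for one task, and the corresponding target
# partworths, both expressed with each attribute's worst level set to zero.
derived = np.array([3.1, 4.0, 1.2, 2.4, 0.4, 1.0, 0.5, 1.3, 0.2, 0.7])
target  = np.array([36., 45., 14., 28.,  5., 12.,  6., 15.,  2.,  8.])

# Least-squares regression through the origin: one scaling constant, no intercept.
k = (derived @ target) / (derived @ derived)
rescaled = k * derived

# Ratios such as attribute importance and low-end weight are unchanged by this
# rescaling, which is why the bias measures do not depend on it.
print(round(k, 2), np.round(rescaled, 1))
```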
Results from the Linear Study

Figure 4 presents the partworths for the three methods against the target values, and Table 3 summarizes the biases of the three tasks with respect to attribute and level focus, decision time, and attitudes. The tests of significance use the four labeling conditions and three tasks as factors in a two-way ANOVA. Throughout, the contrasts between the four labeling conditions are not significant (p > 0.10) and will not be discussed further.

Consider first the shifts in attribute focus for the most important attribute displayed in Figure 4. For choice, the most important attribute drops in weight by 7%, while attributes with moderate importance gain. Ratings present the same pattern, with a 9% drop in the importance of the most important attribute. By contrast, matching displays a very different pattern, with the most important attribute increasing by a striking 46%. The drop in weight for the most important attribute is not significant for choices or ratings, in contrast to a significant positive gain for matching. Thus, these results provide no evidence for a prominence effect in choice but substantial evidence for a compatibility effect in matching.

Looking for biases within attributes, Figure 4 demonstrates consistent shifts in low-end weight for choice and ratings. This negativity is visually apparent for choice and ratings in the downward curvature, but is hard to detect visually in the case of matching. Indeed, as Table 3 indicates, choice overweights the low-end levels by an average of 39%, while ratings overweight them by 18% and matching by only 6%. The biases for both choice and ratings are significantly greater than zero (p < .05), while that for matching is not significant (p > .10). Thus, as predicted, choices, and to a lesser extent ratings, put unjustified emphasis on negative information, while the matching task, with its focus on differences between attribute levels, appears less affected by this bias.

Finally, we note the time taken and attitudes towards the tasks. Rating is fastest, consuming an average of 11 seconds for each of the 18 judgments. Choice among triples is next at around 19 seconds, followed by matching at 26 seconds. One of the reasons matching takes so long is that it involves two separate tasks; the initial choice among a stimulus pair averages 12 seconds, and then matching to indifference takes another 14 seconds. The three tasks also differ with respect to respondent attitudes. Choice is rated easiest; respondents are more confident that they are correct, and the task is seen as most realistic. Ratings are in the middle, and matching performs least well on these perceptions of ease, confidence and realism.

|INSERT TABLE 3 AND FIGURE 4 ABOUT HERE|

Discussion

The results from the first study were quite surprising. The prominence effect suggested that choice would put too much weight on the prominent attribute (relative to the target value), whereas the compatibility effect would put too much weight on the matching attribute, which in our design was also the most important attribute. Extrapolating from past findings, we had expected the prominence bias to be the larger of the two, leading to greater overweighting of the most prominent attribute in choice compared to matching. Instead, we found the opposite – a slight underweighting of the most important attribute for choice and rating along with a substantial overweighting for the matching task.

In addition, we found a negativity bias of nearly 40% in choice and nearly 20% in ratings. These differences are large enough to affect the rank ordering of the partworths. If we rank order the expressed partworths for choice and ratings, we find that the low-end partworths have consistently higher rank importance compared with those reflecting the high end. Furthermore, since no significant negativity bias is apparent in the matching judgments, it is unlikely that the negativity bias for choice and ratings could have arisen from an internal re-evaluation of the input data. Instead, the negativity bias appears to reflect the ways the given values are expressed in the tasks. In choices and ratings, people act as if they automatically treat differences on the low end of each attribute as mattering more than comparable differences on the high end, whereas in matching the value difference is quite independent of the level. This lack of a negativity bias in matching may be due to two factors.
First, since the target partworths were linear, matching may simply be better at approximating these true values. Alternatively, by focusing on differences, matching may be biased towards the linear, equal spacing of level differences regardless of the true level differences. To discriminate between these two accounts, we designed a second study with attributes whose target partworths either displayed negativity (decreasing returns to a fixed improvement in the variable) or positivity (increasing returns). If matching is biased towards producing equally spaced partworths, this bias should be apparent in these non-linear conditions.

Having target attributes whose partworths show both increasing and decreasing returns offers a further theoretical advantage. It enables us to distinguish between a simple negativity bias and the Klein and Bither (1987) utility-dependent cutoff mechanism. Under negativity, the lowest levels of an attribute will increase in importance. However, under a utility-dependent model, only the larger utility differences (whether between positive or negative levels of an attribute) will be inflated. Thus, attributes with increasing returns (such as snow probability in Figure 1A) should see greater curvature if utility-dependent focusing is correct (Figure 1D), but should see that upward curvature moderated if negativity is more salient (Figure 1C).

Non-linear Tradeoffs Study

The experimental procedure was similar to that of the linear study except for three changes. First, rather than rotate labels, all subjects experienced the labeling condition that had price as the most important attribute. Second, two different curvatures of target partworths were manipulated between participants. Third, since the utility structure underlying this curvature was more complex, the training expanded from 7 to 11 choices and from 9 to 12 matching tasks. Sixty MBAs were randomly assigned to one of two conditions. Condition 1 placed greater weight on the negative levels of the second and fifth attributes and less weight on the negative levels of the third and fourth attributes, as shown in Figure 5A. Condition 2, shown in Figure 5B, reversed the curvature of condition 1 for each attribute except the first, which was linear in both conditions. The conditions were designed so that the average of the two conditions was equivalent to the linear partworths in the earlier study.

|INSERT FIGURES 5A and 5B ABOUT HERE|

Results from the Nonlinear Study

Table 4 summarizes the bias and attitude statistics, while Figure 6 graphs the derived partworths averaged across the two initial curvature conditions. These results are remarkably parallel to those in the linear study. We consider first biases in attribute focus, followed by those related to level focus.

|INSERT FIGURE 6 AND TABLE 4 ABOUT HERE|

In terms of attribute focus, Table 4 shows that choice and ratings again give less weight to the first attribute than is appropriate, while matching again gives it more. Both choice and ratings display an attribute focus bias that is in a direction opposite to that of a prominence effect, but not significantly so. In contrast, the matching task displays a strong and significant scale compatibility bias that overvalues the matching attribute. This 23% overvaluation of the matching variable may be substantially less than the 46% found in the linear study, but it still remains a substantial problem for the matching task.
Thus far we have emphasized the weight given to the most important attribute. However, given the unanticipated lack of a prominence effect in the choice data, it is appropriate to examine the weights given to the less important attributes as well. Defining attribute weight as the utility range for each attribute divided by the sum of those ranges, Figure 7 graphs target importance weights against expressed attribute importance weights for the three tasks. The diagonal shows where weights would be if they were perfectly expressed. Both panels in Figure 7 display the aforementioned overweighting of the matching variable and a somewhat smaller underweighting for choice and ratings. The new insight from these graphs is that for both choice and ratings the position of the middle attributes above the diagonal indicates that they are given more weight than is appropriate. An equivalent way to interpret this result is that the three most important attributes are weighted more equally than is justified. That is, if we compute the slopes between the expressed and the target weights for the three most important attributes, they average m = .68 for the linear and m = .57 for the nonlinear study. Both are significantly (p < 0.05) lower than the 1.0 they would be if attribute weights were correctly expressed. This equal-weight bias could be driven by the fact that the true values for each attribute are prominently displayed in our agent task, making it less reasonable to focus on just one attribute and encouraging simplification by giving equal weight to each attribute considered. The equal-weight bias generalizes a result found by Russo & Dosher (1983). They found an equal weighting bias in paired comparisons, whereas we show it also occurs in choices among triples and for monadic ratings.

|INSERT FIGURE 7 ABOUT HERE|

Turning attention to level focus, Figure 6 shows that choices and ratings again produce a visually apparent negativity bias. Table 4 shows that low-end weight increases an average of 30% for choices and 21% for ratings, in contrast to a non-significant 5% decrease for matching. Thus, the nonlinear study replicates the linear study, showing that choice and ratings produce significant negativity, while matching does not.

The partworths shown in Figure 6 are appropriate for estimating the general tendency to give too much weight to low-end levels, but it is important to recall that they reflect averages across target conditions that differ in their low-end levels. Figure 8 groups attributes that share the same target curvature. The left panel shows attributes with positive targets, reflecting likelihood of excellent snow and travel time in condition 1 and ski slope and night life in condition 2 (see Figure 5). The right panel displays the curvature for these same attributes with negative targets. The contrast in expressed curvature between positive and negative target attributes is important because it permits a test of negativity against utility-dependent distortions (Klein & Bither, 1987). Recall that under utility dependence large differences are given greater weight, while smaller ones are given less weight. In terms of weight given to low-end levels, utility dependence predicts that the small low-end weight of the positive target will become even smaller. By contrast, negativity predicts an increase in low-end weights in all conditions.
If both processes operate, we would expect to see more moderate biases for the positive targets, because utility dependence would cancel negativity, but greater biases given negative targets, because both processes operate to increase negativity. As Figure 8 shows, the reverse occurs. With positive attributes that place minimal weight on the low end (29%), both choice and rating display an appreciable negativity bias. By contrast, when the target already has strong negativity (72%), these biases are moderated or even reversed. Put differently, the general negativity bias for choices and ratings comes largely from the condition in which the initial target has increasing returns, a result that offers virtually no support for the utility dependence model.

Figure 8 is also useful in contrasting the sensitivity of the three tasks to different target conditions. For example, matching is quite accurate in tracking the correct curvature, while choice appears to display consistent negativity. Ratings, by contrast, show the least impact from target curvature. The curvature expressed by ratings is muted, differing very little across the two conditions.

Finally, Table 4 gives other measures of differences between the tasks. For the more complex study, decision time increased by 29 seconds for choice (from 19 to 48 seconds) and by 28 seconds for matching (from 25 to 53 seconds), but by only 9 seconds for the already fast ratings (from 11 to 20 seconds). These differences suggest that extra time may be more valuable in choice and matching compared to ratings. In choice and matching it is possible to know what makes a good decision, whereas for ratings, additional effort may not be expended because of uncertainty in projecting values onto an arbitrary rating scale. In terms of attitudes towards the tasks, choice still dominates in being perceived as the most realistic and remains the easiest task and the one about which respondents feel most confident, but matching now surpasses it in terms of being perceived as the most interesting.

Discussion and Conclusions

The purpose of this paper has been to examine the impact of task on the degree to which agents can consistently express the known values of the principal whose interests they represent. Such agent tasks are important not only because they allow us to have veridical measurements of judgment accuracy, but also because there are many contexts in which decision makers express the values of others through choices, ratings, or matching judgments. While our tasks are admittedly simple and somewhat stylized, the biases evident in these simple cases could portend even greater biases in cases where policies are not as well defined. After all, our experiments minimize distortions from understanding or learning values, and focus on distortions arising from the final expression of values in choices, ratings and matching.

Possible alternative explanations

Before examining the implications of these results, it is important to consider whether they could have been generated by other mechanisms. Specifically, it is important to consider whether they could have arisen either through a rank order transformation of the original partworths or through noisiness in our subjects’ responses. The rank order explanation assumes that respondents encode only the rank order information from the original partworth bar graphs. Under this assumption, the expressed and true partworths should be related only by their rank orders.
However, an examination of the rank order of the target against the expressed partworths reveals consistent, rather than random, deviations from the initial rank orders. In particular, the expressed orderings of partworths for choices and ratings favor negativity in more than 80% of the cases, a result incompatible with an account based on a rank order transformation of the original partworths.

A second hypothesis that initially seemed feasible is that these results could have been an outcome of noise (variability) either within or between subjects. To test that possibility, we ran a series of analyses simulating responses with different levels of noise. In the noisy conditions, differences between the partworths became muted and less consistent, but we were unable with noise to produce either negativity or the pattern of attribute weights we found. Of course, differential variability applied to different attributes or attribute levels could produce our results. For example, we could simulate negativity by injecting greater precision into the evaluation of the low-end attribute levels. However, it would be difficult, and probably not productive, to pit such a noise-based model against one that simply applies greater weight to these attributes.
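The noise simulations themselves are not reproduced in the paper, but the core of the argument can be sketched as follows: when symmetric noise is added to ratings generated from linear target partworths, the recovered partworths become noisier yet remain, on average, linear – no negativity emerges. Everything below (the target values, the noise level, the number of simulated judgments) is an illustrative assumption rather than the study's actual simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target partworths: 5 attributes x 3 levels (worst, middle, best), linear within
# each attribute, with ranges loosely echoing the linear study's declining importance.
true_pw = np.array([[0.0, r / 2, r] for r in (45, 35, 20, 15, 10)])

def low_end_weight(pw):
    return (pw[:, 1] - pw[:, 0]) / (pw[:, 2] - pw[:, 0])

n_profiles, noise_sd = 5000, 10.0
levels = rng.integers(0, 3, size=(n_profiles, 5))              # random profiles
true_value = true_pw[np.arange(5), levels].sum(axis=1)         # additive target value
ratings = true_value + rng.normal(0.0, noise_sd, n_profiles)   # symmetric response noise

# Dummy-variable regression: intercept plus middle- and best-level indicators for
# each attribute, so coefficients are partworths relative to the worst level.
X = np.ones((n_profiles, 1 + 5 * 2))
for a in range(5):
    X[:, 1 + 2 * a] = levels[:, a] == 1
    X[:, 2 + 2 * a] = levels[:, a] == 2
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
est_pw = np.column_stack([np.zeros(5), coef[1::2], coef[2::2]])

print(low_end_weight(true_pw))               # [0.5 0.5 0.5 0.5 0.5]
print(np.round(low_end_weight(est_pw), 2))   # stays near 0.5: noise alone does not
                                             # create the negativity observed in the data
```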