On Simpson’s Paradox for Discrete Lifetime Distributions

Author

  • Daniel Lebowitz
Abstract

In probability and statistics, Simpson’s paradox is an apparent paradox in which a trend is present in different groups but is reversed when the groups are combined. Joel Cohen (1986) has shown that continuously distributed lifetimes can never exhibit Simpson’s paradox. We investigate whether Simpson’s paradox is possible for discrete random variables. We first look at discrete random variables with equally spaced values and show that Simpson’s paradox does not occur. Next, for discrete lifetimes that are unequally spaced but have identical supports, we find that Simpson’s paradox still cannot occur. When the two random variables do not have identical supports, which gives the flexibility to compare a broad range of different random variables, we find that Simpson’s paradox can occur.

1 Halley’s Life Table

In 1662, John Graunt developed one of the first life tables. He gathered his data from London’s bills of mortality, but the data lacked consistency, since there was no set structure for recording births and deaths. Because Graunt’s data was so unorganized and incomplete, he had no choice but to guess all of the unknown entries. Graunt’s data was also less desirable because London’s population growth was largely affected by migration (Halley, 1693).

Casper Neumann (1648-1715), a German minister in Breslau, Silesia, had complete records of the births and the ages at death of people from Breslau for the years 1687-1691. Unlike the data Graunt used in 1662, this data was accurate and complete, and it also had several properties that made it well suited to a life table: the numbers of births and deaths were approximately equal, there was very little migration, and death rates at all ages remained approximately constant.
With these characteristics, it is fair to assume that Breslau had a nearly stationary population. Neumann sent these demographic records to Gottfried Leibniz, who then sent them to the Royal Society in London. The Royal Society asked Edmund Halley (1656-1742) to analyze the data. Halley published his analysis in 1693 in the Philosophical Transactions (Ciecka, 2008).

Before discussing Halley’s uses of a life table, it is necessary to define a few variables, for which we employ standard actuarial notation. The youngest age at which everyone in the population has died is denoted by $\omega$, the population at age $x \in \{0, 1, 2, \ldots, \omega\}$ is $l_x$, and $L_x = 0.5(l_x + l_{x+1})$ (Ciecka, 2008). The table below was produced by Halley and represents the combined male and female survivors. Halley adjusted and smoothed the data, including setting $L_0 = 1000$, where the actual value from the Breslau data would have been $L_0 = 0.5(1238 + 890) = 1064$. It seems that this rounded value was chosen for the convenience of having a radix of 1000 (Ciecka, 2008).

 Age x   L_{x-1}     Age x   L_{x-1}     Age x   L_{x-1}     Age x   L_{x-1}
     1      1000        23       579        45       397        67       172
     2       855        24       573        46       387        68       162
     3       798        25       567        47       377        69       152
     4       760        26       560        48       367        70       142
     5       732        27       553        49       357        71       131
     6       710        28       546        50       346        72       120
     7       692        29       539        51       335        73       109
     8       680        30       531        52       324        74        98
     9       670        31       523        53       313        75        88
    10       661        32       515        54       302        76        78
    11       653        33       507        55       292        77        68
    12       646        34       499        56       282        78        58
    13       640        35       490        57       272        79        49
    14       634        36       481        58       262        80        41
    15       628        37       472        59       252        81        34
    16       622        38       463        60       242        82        28
    17       616        39       454        61       232        83        23
    18       610        40       445        62       222        84        20
    19       604        41       436        63       212
    20       598        42       427        64       202    85-100       107
    21       592        43       417        65       192
    22       586        44       407        66       182     Total     34000

The following observations, and their explanations, were obtained by Halley and published in his classic papers (Ciecka, 2008).

1. “Proportion of men able to bear arms” (ages 18 to 56).
Assuming that half of the population consists of men, Halley derived

$$\frac{L_{17} + L_{18} + \cdots + L_{55}}{2 \cdot 34000} = 0.265,$$

where the population of Breslau was 34,000.

2. The odds of survival between two ages. This compares the number who survived between two ages with the number who died:

$$\frac{L_{x+t}}{L_x - L_{x+t}}.$$

Example: the odds of surviving through the teenage years are

$$\frac{L_{19}}{L_{12} - L_{19}} = \frac{598}{640 - 598} = 598 : 42 = 14.2 : 1.$$

3. The age at which it is an “even wager” that a person at a current age would survive or die.
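Halley’s first two calculations can be checked numerically against the table values. The following is a minimal Python sketch, not code from the source: the names `L`, `OLDEST_GROUP`, `midpoint_population`, `proportion_able_to_bear_arms`, and `survival_odds` are illustrative assumptions, and list index k is assumed to hold $L_k$, that is, the entry the table lists at age k + 1.

```python
# Halley's table values, transcribed from the text. Index k holds L_k, which the
# table lists in the row "Age x = k + 1" (the column is headed L_{x-1}).
L = [
    1000, 855, 798, 760, 732, 710, 692, 680, 670, 661,
    653, 646, 640, 634, 628, 622, 616, 610, 604, 598,
    592, 586, 579, 573, 567, 560, 553, 546, 539, 531,
    523, 515, 507, 499, 490, 481, 472, 463, 454, 445,
    436, 427, 417, 407, 397, 387, 377, 367, 357, 346,
    335, 324, 313, 302, 292, 282, 272, 262, 252, 242,
    232, 222, 212, 202, 192, 182, 172, 162, 152, 142,
    131, 120, 109, 98, 88, 78, 68, 58, 49, 41,
    34, 28, 23, 20,
]

# The table reports a single combined entry for ages 85-100.
OLDEST_GROUP = 107

# Total population implied by the table (the text gives 34,000).
POPULATION = sum(L) + OLDEST_GROUP


def midpoint_population(l_x, l_next):
    """L_x = 0.5 * (l_x + l_{x+1}), the definition quoted in the text."""
    return 0.5 * (l_x + l_next)


def proportion_able_to_bear_arms(L, population, first=17, last=55):
    """(L_first + ... + L_last) / (2 * population), assuming half are men."""
    return sum(L[first:last + 1]) / (2 * population)


def survival_odds(L, x, t):
    """Survivors to index x + t per death between indices x and x + t."""
    return L[x + t] / (L[x] - L[x + t])


if __name__ == "__main__":
    print(POPULATION)
    print(midpoint_population(1238, 890))
    print(f"{proportion_able_to_bear_arms(L, POPULATION):.3f}")
    print(f"{survival_odds(L, 12, 7):.1f} : 1")
```

Under these indexing assumptions the sketch reproduces the figures quoted in the text: a total of 34,000, an unsmoothed $L_0$ of 1064, a proportion of about 0.265, and odds of about 14.2 : 1.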


Similar resources

The ubiquity of the Simpson’s Paradox

Department of Mathematics and Statistics, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada. Abstract: Simpson’s Paradox is the phenomenon that appears in some datasets, where subgroups with a common trend (say, all negative trend) show the reverse trend when they are aggregated (say, positive trend). Even if this issue has ...


How Likely is Simpson's Paradox in Path Models?

Simpson’s paradox is a phenomenon arising from multivariate statistical analyses that often leads to paradoxical conclusions, in the field of e-collaboration as well as in many other fields where multivariate methods are employed. We derive a general inequality for the occurrence of Simpson’s paradox in path models with or without latent variables. The inequality is then used to estimate the proba...


Computational Social Scientist Beware: Simpson's Paradox in Behavioral Data

Observational data about human behavior is often heterogeneous, i.e., generated by subgroups within the population under study that vary in size and behavior. Heterogeneity predisposes analysis to Simpson’s paradox, whereby the trends observed in data that has been aggregated over the entire population may be substantially different from those of the underlying subgroups. I illustrate Simpson’s...


Comment: Understanding Simpson’s Paradox

I thank the editor, Ronald Christensen, for the opportunity to discuss this important topic and to comment on the article by Armistead. Simpson’s paradox is often presented as a compelling demonstration of why we need statistics education in our schools. It is a reminder of how easy it is to fall into a web of paradoxical conclusions when relying solely on intuition, unaided by rigorous statist...


Simpson’s Paradox and the implications for medical trials

This paper describes Simpson’s paradox, and explains its serious implications for randomised control trials. In particular, we show that for any number of variables we can simulate the result of a controlled trial which uniformly points to one conclusion (such as ‘drug is effective’) for every possible combination of the variable states, but when a previously unobserved confounding variable is ...




Publication date: 2013