First Two Laws (arXiv, September 2013)
Abstract
I argue here that both the 1st and 2nd Laws of thermodynamics, generally understood to be quintessentially physical in nature, can be equally well described as being about certain types of information, without the need to invoke physical manifestations for information. In particular, I show that the statistician's familiar likelihood principle is a general conservation principle on a par with the 1st Law, and that likelihood itself involves a form of irrecoverable information loss that can be expressed in the form of (one version of) the 2nd Law. Each of these principles involves a particular type of information, and each requires its own form of bookkeeping to properly account for information accumulation. I illustrate both sets of books with a simple coin-tossing (binomial) experiment. In thermodynamics, absolute temperature T is the link that relates energy-based and entropy-based bookkeeping systems. I consider the information-based analogue of this link, denoted here as E, and show that E has a meaningful interpretation in its own right in connection with statistical inference. These results contribute to a growing body of theory at the intersection of thermodynamics, information theory and statistical inference, and suggest a novel framework in which E itself for the first time plays a starring role.

Introduction

As is well known, important connections exist between statistical mechanics and information theory. It has been widely recognized for some time that the concepts of entropy developed by Gibbs and others have, at least under some well-circumscribed conditions, the same form as Shannon's [1] measure of average information, or Shannon entropy. The information-entropy connection has been further developed in connection with Gibbs' paradox [2], Maxwell's demon [3], and elsewhere.
The link between physics and information theory can be made by treating information in its physical embodiment, that is, by recognizing that the extraction of information requires physical action, and then describing that action in familiar physical terms [4]. This is one way to reconcile the observation of identical bits of mathematics governing two topics – physics and information theory – that seem to be very distinct metaphysically. Jaynes [5] seemed to be getting at something more abstract, namely, that certain aspects of physics ought themselves to be understood in less physical terms. In particular, he argued that it is the probabilistic aspects of entropy, rather than the physical aspects of matter, that give rise to the phenomena encapsulated by the second law of thermodynamics (henceforth simply the 2nd Law). (See also [6,7].) More recently, Duncan and Semura [8,9] have argued that the most fundamental form of the 2nd Law is wholly information-based. They too, however, justify this statement of the 2nd Law in part by assuming physical embodiment for information loss. Here I postulate the utility of decoupling Duncan & Semura's statement of the 2nd Law from its physical embodiment, and simultaneously extend this type of reasoning to consider thermodynamics as a whole from a purely informational (non-physical) point of view. I add to their statement of the 2nd Law an information-based version of the 1st Law of thermodynamics (henceforth simply the 1st Law), and I develop a methodological framework for considering the dynamics of certain kinds of information flow adherent to these laws.
Insofar as this new framework is coherent, it suggests that the dichotomy between what is physical and what exists purely in the realm of information is an unnecessary one from the start – not because information must have physical embodiment, but because as a matter of mathematics these two representations – physical and informational – are identical, at least over certain domains, without the need to posit physical existence for any of the constituent terms. It is worth noting up front that in other information-theoretic treatments involving elements of thermodynamics, (the analogue of) temperature itself plays little or no role. E.g., in the work of authors like Caticha [6] and Bialek [10] employing maximum entropy as an inferential device, the Lagrange multiplier as it appears in the Boltzmann distribution is given no particular interpretation beyond its role as a calculational device. This is in no way a criticism of this very interesting body of work, but any attempt to recapitulate thermodynamics as a whole on a purely information-based footing must include a salient role for an informational equivalent of absolute temperature T: after all, it is T that puts the "thermo" in "thermodynamics." In a previous paper [11], my coauthors and I set out to derive an absolute scale for the measurement of statistical evidence. We posited that this could be accomplished by harnessing the mathematical underpinnings of T, and constructed a proof of principle for a particularly simple (binomial) statistical model. The current paper traverses some of the same terrain, however, with a very different orientation: while [11] started with basic physical quantities and relied heavily on analogy to link these to informational counterparts, here I build up an "information dynamics" framework in purely mathematical terms.
Again, however, in the current treatment the concept of statistical evidence plays a central role, mathematically equivalent to the central role of temperature in thermodynamics.

Results

Because the notion of statistical evidence ends up playing so central a role, I begin with (i) a brief account of the sense in which the term "evidence" is invoked throughout the paper. I then develop the general framework in three steps. In (ii) I derive a formal (purely mathematical) connection between the familiar likelihood principle in statistics and the 1st Law. In (iii) I similarly relate a principle concerning information loss inherent in the use of likelihood to (a form of) the 2nd Law. Then in (iv) I consider the link between these two principles. If we assume that this link has the same form in the new informational framework as it does in thermodynamics, then we arrive at the very surprising conclusion that thermodynamic T has a purely information-based interpretation in addition to its interpretation as temperature, viz., as the strength of statistical evidence, in a sense to be made clear. Throughout, I develop the theory with respect to a single, and particularly simple, binomial statistical setup. In part this is simply to facilitate the exposition through concrete examples and calculations. But in part, aspects of this setup are fundamental to the framework as currently developed, a point I return to in the Discussion. Details of calculations used to generate the figures are given in Methods, below.

(i) Properties of evidence

In much of the literature relating information theory, thermodynamics and statistical inference, the term "evidence" is used casually and without explicit definition. This is true even of work that is otherwise extremely rigorous; it reflects the fact that evidence per se is simply not being considered as a central concept in this literature.
When the term is defined, it is sometimes treated as synonymous with "data," or in other instances used to refer to the (marginal) probability of the data (the normalizing constant in the Boltzmann distribution). Here I am using the term in neither of those senses, but in a sense that is more closely related to some of the statistical literature (see, e.g., [12]). Therefore, I begin with an account of evidence in the sense in which it plays a role in the current framework. I do so by considering the behavior of evidence informally, in a simple setting in which our intuitions are clear. This allows us to enumerate key properties of what we mean by evidence, properties that any formal treatment of evidence would need to recapitulate. Consider a series of n independent coin tosses of which x land heads (H). And consider the two hypotheses "coin is biased towards tails (H̄)" versus "coin is fair" (i.e., the coin lands H and H̄ with equal probabilities; I use the notation H̄ to designate tails, since the more obvious choice "T" is already being used for absolute temperature). What follows are some thought experiments appealing to the reader's statistical intuition. Suppose that on repeated tosses the coin consistently lands H (approximately) 5% of the time. Clearly, as the number of tosses n increases, if x/n remains around 5%, the evidence that the coin is biased increases. Now consider a fixed number of tosses n, but allow x/n to increase from 5% upwards. Here the evidence in favor of bias clearly decreases as the proportion x/n increases in the direction of 1/2. These two features together entail a third, somewhat more abstract property. Suppose now that we hold the evidence constant (without defining it or saying how to calculate it). If x/n starts at 0.05, what must happen as n increases in order to prevent the evidence itself from increasing?
If the previous two properties hold, then it follows that in this scenario x/n must increase; otherwise the evidence would increase as n increases rather than remaining constant. Thus the three quantities n, x, and evidence e enter into an equation of state, in which holding any one of the three constant while allowing a second to change necessitates a compensatory change in the third. Here e itself is simply defined as the third fundamental entity in the set. I note two additional and very important properties of e up front. First, for given n, as x/n increases from some very small value towards 1/2, the evidence for "bias" at first decreases, but the evidence remains in favor of bias up to some particular value of x/n. Beyond that value, however, as x/n continues to approach 1/2 from the left, the evidence switches to favoring "no bias." I refer to the value of x/n at which this switch occurs as the "transition point" (TrP). To the right of the TrP, as x/n continues to increase beyond this point up to x = n/2, the evidence for "no bias" increases. Figure 1 illustrates this pattern. (Statistical intuition also suggests that the TrP itself ought to move towards x/n = 1/2 as n increases, so that even a small deviation from x = n/2 would correspond to evidence against θ = 1/2 for very large n.) A mirror-image set of properties occurs as x/n increases to the right of θ = 1/2. In what follows, I will focus on the one-sided hypothesis contrast "coin is biased towards H̄" (θ < 1/2) versus "coin is fair," so that we need not consider θ > 1/2. Finally, note that the same quantity of new data (n, x) seems to have a smaller impact on our intuitive assessment of the evidence the larger the starting value of n is, or equivalently, the stronger the evidence is before consideration of the new data. E.g., suppose we have already tossed the coin 1,000 times and observed x = 0. We will all agree that this is overwhelming evidence of a biased coin.
If we toss the coin an additional 5 times and observe 0 H, the new data (5, 0) change the strength of the evidence hardly at all. However, had we started with just 2 tosses with 0 H, then added the same new data (5, 0), the evidence in favor of bias would have changed quite a bit. The initial set (2, 0) gives only very slight evidence for bias, but by the time we have observed (7, 0), we would be far more suspicious that the coin is not fair. Despite the purely qualitative nature of this example, it illustrates the key point that evidence is not equivalent to, or even inherent in, data per se: the evidence conveyed by new data depends on the context in which they are observed. Table 1 summarizes these key properties of evidence. They play no particular role in §(ii) and §(iii) below, but they become critically important in §(iv). In the meantime, they serve to motivate the use of the qualifier "evidential" in some of the following text.

(ii) The Likelihood Principle and the 1st Law of Thermodynamics

The familiar likelihood principle in statistics states that all of the information conveyed by a set of data regarding a parameter (vector) is contained in the likelihood (see, e.g., [12]). Because below I will need to distinguish this type of information from another, I refer to the information conveyed by the likelihood as "evidential information." The other type of information, which I'll call "combinatoric information," is introduced in §(iii). To be concrete, again consider an experiment comprising n independent coin tosses with x = the number of H, and P(H) = θ. The likelihood is defined as

L(θ | n, x) ∝ P_n(x | θ) = k θ^x (1 − θ)^(n−x),  Eq 1

where k is an arbitrary scaling constant. We are interested in comparing the hypotheses θ < 1/2 (coin is biased toward H̄) vs. θ = 1/2 (coin is fair).
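For concreteness, Eq 1 can be evaluated directly. The following is a minimal sketch, with the arbitrary scaling constant k set to 1:

```python
def likelihood(theta, n, x, k=1.0):
    """Binomial likelihood of Eq 1: L(theta | n, x) = k * theta^x * (1 - theta)^(n - x)."""
    return k * theta**x * (1.0 - theta)**(n - x)

# Example: n = 4 tosses of which x = 1 landed heads.
print(likelihood(0.25, 4, 1))  # 0.25 * 0.75**3 = 0.10546875
print(likelihood(0.5, 4, 1))   # (1/2)**4 = 0.0625
```

Because k is arbitrary, only ratios of likelihood values carry meaning, which is what motivates the likelihood ratio introduced next.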
Then the likelihood principle further implies that all of the information distinguishing these hypotheses for given data (n, x) is contained in the likelihood ratio (LR) between them:

LR(θ | n, x) = θ^x (1 − θ)^(n−x) / (1/2)^n = 2^n θ^x (1 − θ)^(n−x).  Eq 2

I will refer to this extrapolation of the likelihood principle from the likelihood itself to the LR as the extended likelihood principle; again, see, e.g., [12]. Note that Eq 2 is frequently treated in the statistical literature as expressing the evidence for (or against) "coin is biased" vs. "coin is fair." I return to this interpretation of the LR in the Discussion. Now consider two sets of data. Let D1 comprise n1 tosses of which x1 have landed H. Denote the graph of the corresponding LR(θ | n, x), plotted over θ = [0, 1/2] on the x-axis, as LR(A). We are interested in the effects on LR(A) of a second data set, D2 = (n2, x2). Let the corresponding graph, considering both D1 and D2, be LR(B). The consideration of D2 results in a transformation of the graph from its initial state A to a new state B. Even prior to considering the nature or mechanism of that transformation, one requirement is that it must reflect the effects of the new data and nothing but the effects of the new data. Otherwise, the transformation would lead to a violation of the extended likelihood principle. Thus we will need to characterize transformations of the LR graph in terms of an underlying state variable, which I'll denote by U. Requiring U to be a state variable means that, for any given set of data (n, x), U must depend only on the LR for the given data, and not, for example, on anything related to the history of data collection. I now seek to formally characterize the change in U from state A to state B, or ΔU, corresponding to the change ΔLR. The binomial LR graph for given n, x can be uniquely specified in terms of two quantities, but there is leeway regarding which two we choose.
For example, we could use (n, x) itself, but this turns out to be not particularly revealing in the current context. Alternatively, we can uniquely specify the LR in terms of properties of the LR graph. Here I use the area under the LR curve, denoted by V, and a second quantity denoted by P. P is chosen such that the pair (V, P) uniquely determines a particular LR graph, and such that for given U, P is inversely proportional to V. (The existence of a quantity P fulfilling these conditions in the binomial case is shown in [11]. Note that throughout I treat V and P as continuous rather than discrete; see Methods.) For the moment there is no special significance to the choice of variable names for these quantities. However, as a mnemonic device, the reader may consider them as counterparts of volume (V) and pressure (P), respectively, while U is the counterpart of internal energy. Note that the proportionality requirement for P enforces an "ideal gas" relationship to V. To consider a concrete example of the bookkeeping involved in quantifying ΔU, suppose that D1 = (n, x) = (4, 1) and D2 = (2, 1). Thus LR(A) corresponds to (4, 1) while LR(B) corresponds to (6, 2). There are two possibilities regarding how we may have gotten from LR(A) to LR(B): either the 5th toss landed H̄ and the 6th toss landed H, or the 5th toss landed H and the 6th toss landed H̄. That is, the transformation from LR(A) to LR(B) could go through the intermediate states (5, 1) or (5, 2). Call these two possibilities Path 1 and Path 2, respectively. Figure 2 shows the two paths plotted both in terms of the LR and on a classical PV diagram. Since both paths begin and end with identical LR graphs, our state variable U must change by the same amount in both cases. Thus we would like to find a function of V, P such that dU(V, P) is a perfect differential. By stipulation, VP ∝ U, so that dU = c(V dP + P dV) has the correct form, where c is the constant of proportionality.
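The state-variable idea can be illustrated numerically by computing V, the area under the LR curve of Eq 2, for the states along both paths. This is a minimal sketch; the area is obtained here by simple midpoint quadrature rather than by any method used in the paper:

```python
def lr(theta, n, x):
    """Likelihood ratio of Eq 2: LR = 2^n * theta^x * (1 - theta)^(n - x)."""
    return 2.0**n * theta**x * (1.0 - theta)**(n - x)

def area_V(n, x, steps=20000):
    """V: area under the LR curve over theta in [0, 1/2], by midpoint quadrature."""
    h = 0.5 / steps
    return h * sum(lr((i + 0.5) * h, n, x) for i in range(steps))

# Path 1 passes through the intermediate state (5, 1); Path 2 through (5, 2).
path1 = [(4, 1), (5, 1), (6, 2)]
path2 = [(4, 1), (5, 2), (6, 2)]
print([round(area_V(n, x), 4) for n, x in path1])
print([round(area_V(n, x), 4) for n, x in path2])
# The endpoints agree because V, like U, depends only on the state (n, x);
# the intermediate values differ with the path taken.
```

The same computation underlies the PV representation of Figure 2: any quantity defined purely from the LR graph must agree at the shared endpoints of the two paths.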
Define the quantities dW = P dV and dQ = cV dP + (c + 1)P dV. Note that both W and Q are readily verified to be path-dependent; that is, their values will be different for the two transformations. Then we have

dU = c(V dP + P dV) = [cV dP + (c + 1)P dV] − P dV = dQ − dW.  Eq 3

Beyond the choice of variable names, nothing whatsoever in the derivation makes any reference to physical quantities. Here W is simply one aspect of the transformation from LR(A) to LR(B), and Q is the compensatory quantity required to express changes in U (a state variable) in terms of W (a path-dependent variable). Then Eq 3 tells us that the amount of evidential information "received" by the LR as a result of new data is (Q − W). Armed with this new notation, we now have a concise way to express the principle that all of the evidential information is contained in the LR: ΔU = Q − W, or equivalently, Q = ΔU + W. The extended likelihood principle can therefore be reformulated to state that the variation in evidential information of a system during any transformation (ΔU) is equal to the amount of evidential information that the system receives from new data (Q − W). (This statement is paraphrased from Fermi's general statement of the 1st Law [13], p. 11.) Thus the extended likelihood principle, which requires that transformations of the LR graph (up to allowable scale transformations) reflect all and only changes in the data, turns out to be a fundamental conservation principle on a par with the 1st Law.

(iii) Information loss and the 2nd Law of Thermodynamics

I continue with the coin-tossing example. But whereas §(ii) dealt with what I called "evidential information," I now consider the other type of information, or "combinatoric information." In a sequence of n independent tosses of a single coin, each toss can land either H or H̄. Hence if we know the actual sequence of tosses we have

I_SEQ = ln 2^n = n ln 2  Eq 4

units of information, corresponding to the logarithm of the number of possible sequences.
(For present purposes the base of the logarithm, which further defines the units of information, is unimportant. Note that Eq 4 is more commonly thought of as the amount of uncertainty or entropy associated with having no knowledge of the sequence; but it will be more convenient here to consider this as the information corresponding to complete knowledge of the sequence. See also [1].) Obviously I_SEQ increases linearly in n. This makes sense: all other things being equal, the amount of information goes up with the number of tosses. Now suppose that, rather than knowing the full sequence of H and H̄, we know only the number x of H (hence the number (n − x) of H̄). In this case, we know only that the actual sequence was one of C(n, x) = n!/(x!(n − x)!) possibilities, and the amount of information we have is

I_(n,x) = ln C(n, x) < I_SEQ.  Eq 5

The change in information in going from I_SEQ to I_(n,x), written with a negative sign to indicate lost information, is

−ΔI ≜ I_SEQ − I_(n,x).  Eq 6

Clearly the behavior of −ΔI is a function of both x and n. It is not guaranteed to increase as n increases, because the inherent increase in information as n increases is offset by a corresponding increase in the amount of information lost in going from I_SEQ to I_(n,x), which is further mediated by the value of x/n. Thus the underlying dynamics here requires an accounting system that simultaneously takes into account changes in n and changes in x. I would argue that this particular type of combinatoric information loss is a ubiquitous feature of abstract reasoning, or of the generalization from "raw" information to explanatory principles. For instance, suppose that what we are really interested in is information about the probability θ that the coin lands H. In order to extract this information about θ, we need to rewrite the sequence of H and H̄ as an expression in terms of θ.
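The combinatoric bookkeeping of Eqs 4–6 can be sketched numerically, using natural logarithms throughout:

```python
import math

def info_seq(n):
    """I_SEQ = ln 2^n = n ln 2 (Eq 4): full knowledge of the toss sequence."""
    return n * math.log(2)

def info_counts(n, x):
    """I_(n,x) = ln C(n, x) (Eq 5): knowledge of the counts (n, x) only."""
    return math.log(math.comb(n, x))

def info_lost(n, x):
    """-delta I = I_SEQ - I_(n,x) (Eq 6): combinatoric information erased."""
    return info_seq(n) - info_counts(n, x)

print(round(info_lost(4, 1), 4))  # ln(16/4)  = ln 4 ~ 1.3863
print(round(info_lost(8, 2), 4))  # ln(256/28) ~ 2.2130
print(round(info_lost(8, 4), 4))  # ln(256/70) ~ 1.2967: for fixed n the loss
                                  # is smallest near x = n/2, where C(n, x) peaks
```

As the last two lines show, the loss depends on x/n as well as on n, which is why the accounting must track both quantities at once.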
Rewriting the sequence in terms of θ entails a compression of the original sequence into the expression θ^x (1 − θ)^(n−x), or in logarithmic terms,

g(θ) = x ln θ + (n − x) ln(1 − θ) = ln L(θ | n, x),  Eq 7

which is simply the ordinary ln likelihood for given data, i.e., the ln of Eq 1. (Thus information about θ appears to return us to the "evidential information" conveyed by the likelihood as discussed in the previous section, while the current section deals with some other type of information; see, e.g., [14] for a related distinction.) But the combinatoric information associated with Eq 7 is I_(n,x) < I_SEQ, because once the sequence of H, H̄ is reduced to the form of Eq 7, the full sequence can no longer be reconstructed. The lost combinatoric information has been permanently erased. This illustrates a very general principle: data reduction for purposes of gleaning information about underlying parameters always comes at a price. Any process that extracts information regarding an underlying parameter (vector) from a set of data entails irrecoverable loss of information, that is, the permanent erasure of some of the information associated with the full data prior to compression. This is in essence the form of the 2nd Law proposed by [8], to which I have added a specific context in which information is routinely erased, a context that makes it clear that we can consider irrevocable information loss in accordance with the 2nd Law without having to postulate physical existence for the information or for the information lost through data reduction.

(iv) The information-theoretic analogue of T

Finally, I consider the connection between transformations of the LR graph characterized in terms of changes in evidential information (ΔU from §(ii)) and changes in combinatoric information (ΔI from §(iii)). A picture emerges of a complex information-dynamic system. (Thinking in terms of dynamics seems useful here, even if transformations are not considered as functions of time.)
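Both sets of books can be tracked side by side for concrete data. The sketch below makes two illustrative assumptions not fixed by the text: the evidential ledger is summarized by the area V under the LR curve of §(ii), computed by midpoint quadrature, and the combinatoric ledger by −ΔI of §(iii):

```python
import math

def lr(theta, n, x):
    """Likelihood ratio of Eq 2."""
    return 2.0**n * theta**x * (1.0 - theta)**(n - x)

def area_V(n, x, steps=20000):
    """Area V under the LR curve over theta in [0, 1/2]: the evidential ledger."""
    h = 0.5 / steps
    return h * sum(lr((i + 0.5) * h, n, x) for i in range(steps))

def info_lost(n, x):
    """-delta I of Eq 6: combinatoric information erased in reducing to (n, x)."""
    return n * math.log(2) - math.log(math.comb(n, x))

# Doubling the data from (4, 0) in two different ways:
for n, x in [(4, 0), (8, 0), (8, 4)]:
    print((n, x), round(area_V(n, x), 4), round(info_lost(n, x), 4))
```

Going from (4, 0) to (8, 0), both ledgers grow; going from (4, 0) to (8, 4), the combinatoric loss and the area under the LR both shrink even though n has doubled. The interplay between the two ledgers is exactly what requires a linking quantity.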
All other things being equal, the more data we have, the more evidential information we have. But this information gain is mediated by a corresponding increase in the amount of combinatoric information erased in the process, which is a function (in the binomial case) of both n and x. At the same time, while the amount of evidential information is increasing, the evidence itself is not necessarily increasing, because it too is a function of both n and x. E.g., evidence in favor of the hypothesis that the coin is biased could go down in going, say, from (n = 4, x = 0) to (n = 8, x = 4), despite the doubling of the sample size. Both evidential information and combinatoric information are in play, and this suggests that we need a way to link the bookkeeping expressed in terms of combinatoric information with the bookkeeping expressed in terms of evidential information. The only remaining step, then, is to articulate this link. Following [8], I do this by introducing a quantity E, the analogue of thermodynamic temperature T, as the proportionality factor linking the two sets of books. That is, I postulate that E relates U, W and ΔI, as these quantities are defined above in terms of information, through the same equation used to relate T to the corresponding quantities in thermodynamics. Thus following [8] but writing E instead of T we have

E = (ΔU + W) / (k(−ΔI)) = Q / (k(−ΔI)),  Eq 8

where k is a constant (not necessarily equal to Boltzmann's constant). Eq 8 relates the incoming evidential information transferred in the form of Q to the net loss of combinatoric information. But what reason do we have for thinking that the relationship expressed between the two sides of Eq 8 has any useful meaning, given that we are not interpreting W as mechanical work or Q as physical heat? In the current context, E plays a purely abstract role, as the link relating evidential and combinatoric information.
Of course, Kelvin's derivation of T was also quite abstract, arguably a matter more of calculus than of physics, and it historically predates our understanding of thermal energy in terms of statistical mechanics. (For a fascinating account of exactly how difficult it was, and remains to this day, to directly relate T to actual physical phenomena, see [15], especially Chapter 4.) Notwithstanding, T itself has a critically important physical interpretation in the theory of thermodynamics. It behooves us, therefore, to seek a corresponding information-based interpretation of E. Figure 3 illustrates the behavior of E, as defined by Eq 8, as a function of (n, x). It is readily confirmed that in its behavior E recapitulates the principal properties ascribed to evidence in §(i) above (per Table 1). Thus E can be understood to be the evidence measured on an absolute scale. (That is, E is the analogue of T, while the quantity e in §(i) above is the analogue of thermodynamic t, or temperature measured on an arbitrary scale. See also [11] for additional detail on E as a formal measure of evidence.) It would appear, then, that a complete alignment of thermodynamics with this new purely information-based framework must entail both the new interpretations given above for the 1st and 2nd Laws of thermodynamics, as well as pride of place for this new quantity E, which appears in the end as the thing the new theory is actually about: the evidence. I want to stress that in terms of the theory as it has been derived here, the relationship between E and evidence is an empirical discovery. E was introduced as the link connecting two types of information bookkeeping, assuming a system governed by purely information-based versions of the 1st and 2nd Laws, and under the postulate that E would have the same mathematical form as its analogue T in thermodynamics. E need not have turned out to have any recognizable behavior or meaningful interpretation.
The fact that it turns out to have an interpretation as evidence, a concept so seemingly fundamental to inference, strongly supports the idea that thermodynamics is serving here as more than mere analogy. The mathematical underpinnings of thermodynamics appear to relate to the “dynamics” of information flow just as directly as they relate to heat flow.