From basepairs to birdsongs: phylogenetic data in the age of genomics
Authors
Abstract
Given the quantity of molecular data now available, including complete genomes for some organisms, one can ask whether phylogenetic analysis needs any data beyond complete genomic sequences. One reason to look beyond the genome is that not all character information is encoded in organismal genomes. We propose a hierarchy of characters that ranges from biologically transmitted but nongenomically encoded characters, such as bird songs, to characters that are genomically encoded. All of these characters can retain historical information and are potentially useful for phylogenetic analysis. In addition, a number of phenotypic levels that are expressions of the genome can be identified. Because these levels raise issues of redundancy, the question arises whether any of them is worth including once all of the underlying sequence data have been collected. Utilization of phenotypic levels that are ultimately based on sequences may facilitate reconstructing homologies that are not evident from sequence data alone. We propose the use of simultaneous analysis of sequence data and as many levels of phenotypic characters as possible to take advantage of homology information that may be more easily recovered from the latter. A method that eliminates redundancy to the degree that it can be detected is proposed. © 2003 The Willi Hennig Society. Published by Elsevier Inc. All rights reserved.

Genomics has been described as revolutionizing microbiology (Nierman et al., 2000), transforming biomedical research (Miller, 2000), and causing "an intellectual and experimental sea change" in biology as a whole (Vukmirovic and Tilghman, 2000). The first genome of a free-living organism was completed for Haemophilus influenzae in 1995 (Fleischmann et al., 1995). Since then, roughly one microbial genome has been completed every 2 months and the pace is expected to accelerate (Fraser et al., 2000; Nierman et al., 2000).
About one vertebrate genome per year is expected to be completed (Miller, 2000). As of March 2003 on the EMBL website, complete genomes were available for 873 viruses, 308 organelles, 112 phages, 97 Bacteria, 16 Archaea, and 14 Eukaryota (lists of completed genomes are available through GenBank (http://www.ncbi.nlm.nih.gov/) and EMBL (www.ebi.ac.uk/genomes/)).

* Corresponding author. Fax: +614-292-3009. E-mail address: [email protected] (J.V. Freudenstein).
1 Present address: Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
2 The order of authorship is alphabetical; all authors contributed equally.
0748-3007/$ see front matter © 2003 The Willi Hennig Society. Published by Elsevier Inc. All rights reserved. doi:10.1016/S0748-3007(03)00067-7
J.V. Freudenstein et al. / Cladistics 19 (2003) 333–347

Genomics is also revolutionizing phylogenetics (Brown, 1996). Phylogenetic analyses of entire genomes are commonly conducted for viruses (e.g., Lindstrom et al., 1998; Bollyky and Holmes, 1999; Vrati et al., 1999; Smith et al., 2000) and have been performed using the mitochondrial genome in chordates (Naylor and Brown, 1998), mammals (Allard et al., 1999), and birds and related vertebrates (Mindell et al., 1999).

The availability of entire genomes does not in itself ensure satisfactory reconstruction of phylogenetic relationships. Establishing orthology among genes is often difficult for distantly related taxa. Sequence similarity, which generally is used alone (e.g., Makarova et al., 1999; Nelson et al., 1999), is insufficient to establish orthology and will often lead to misleading results (Thornton and DeSalle, 2000). Because only orthologous genes can be analyzed to infer phylogenetic relationships, only a minority of the genome may be suitable for phylogenetic analysis. For example, Huynen and Bork's (1998) analysis of the first 9 sequenced Archaea and Bacteria was limited to 34 genes. Nelson et al.'s (1999) analysis of 17 microbial genomes was restricted to 33 putative orthologs. Govan et al.'s (2000) analysis of positive-stranded RNA viruses was limited to the conserved RdRp domain of 118 amino acids. The problem of orthology is particularly acute for viruses, the genomes of which are mostly shorter than 10 kb (e.g., Bollyky and Holmes's (1999) phylogenetic analysis of mammalian hepadnaviruses was based on genomes of about 3.2 kb). This is shorter than the 12,234 protein-coding nucleotide sites of the mitochondrial genome in chordates, which were found to support an "incorrect" tree, regardless of the character coding (nucleotide, base, or amino acid) or tree construction method (parsimony, distance, or maximum-likelihood) employed (Naylor and Brown, 1998). Analyzing the entire viral genome, Bollyky and Holmes (1999) recovered phylogenetic trees whose topologies were highly dependent on the maximum-likelihood model used. Likewise, using the entire mitochondrial genome of birds and relatives, Mindell et al. (1999) found that their ingroup was ambiguously rooted because the resolution depended on the maximum-likelihood model used.

Fig. 1. Hierarchic depiction of organismal attributes that may be considered for phylogenetic analysis.

If whole genomic sequences may be insufficient for phylogenetic analyses, then where are other characters to come from? If the information from morphology and other phenotypic characters has already been captured when whole genomes are analyzed, recommendations to combine data would seem moot, because nothing apparently remains to combine with the sequence data. We argue that there are two sources of characters, additional to the genome, that may be useful. The first source is the phenotypic characters just mentioned.
Phenotype in its traditional sense can be construed quite broadly: Johannsen's (1909) definition as the expression of a genotype in interaction with its environment is sometimes broadened even further to mean all of the characteristics of an individual. We focus here on the determining factor for character states: if the character state as expressed in the individual is determined by (i.e., can be completely predicted by examination of) the genome, then we consider that expression to be phenotypic. However, if the particular expression of a state is determined by some other agent (such as the environment), even though the basis for the character is in the genome, we would consider this to be nonphenotypic. Examples of phenotypic characters would be amino acids and morphological structures (to the extent that their final state is determined by the genome). We argue that such characters have a useful role to play in improving the reconstruction of character state transformations when analyzed in conjunction with genomic characters, as described below.

The second source of characters comprises those features of organisms not encoded by the genomes. Examples of such features are centrosomes, prions, and behaviors, even though, at least for the latter two, the ultimate basis for the character resides in the genome. The key is that the particular state exhibited is not encoded in the genome.

In what follows we describe a way of viewing the different types of characters potentially useful for phylogenetic analysis and present a method that maximizes use of data.

Hierarchy of information

If we consider the range of possible phylogenetic indicators (those features that can store information relevant to reconstructing the history of a clade), the list is long and includes some elements that are not commonly used for this purpose. We conceive of these features as forming a hierarchic, nested set defined by a series of properties (depicted as a Venn diagram; Fig. 1).
This hierarchy is neither perfect nor the only way to organize this information, but it is a useful structure in which to focus on the features that are important in the selection of phylogenetic markers. For each level we describe the defining property of that level and then give some examples that fall within that level but outside of the next lower level.

The most inclusive level comprises all features of a species and its environment. The key question is which of these features retain historical information about the species relationships. Some features are intrinsic to the organisms while others are not. A specific example of the latter would be the geographic location in which a species occurs. To the extent that a species and its descendants remain in the same place, location becomes an historical attribute of a clade. However, this attribute is purely circumstantial, being nonintrinsic and inherently nonassessable from examination of specimens themselves. Such biogeographic information is often mapped onto phylogenetic diagrams a posteriori, but rarely has been used in their construction (but see Dressler, 1990, p. 122).

A subset of all features are those that are intrinsic properties of the organism itself in some way. These include both biologically and nonbiologically determined features. An example of the latter would be variation due to phenotypic plasticity. Such features have a genetic basis, in that different phenotypes are possible depending upon the environment in which the organisms occur, corresponding to the classical concept of "norm of reaction." A common example is the relationship of plant height to altitude, such as was shown for various species in the classic experiments of Clausen et al. (1940). To the extent that a species persists under particular environmental conditions, a specific phenotype will persist and retain historical information.
However, if the environment in which a descendant species finds itself has changed relative to that of an ancestor and plasticity has been retained, the features of the descendant species could revert to an alternate phenotype. Hence, the specific phenotype need not be transmitted to descendants. This includes, for example, the feature of cell adhesion discussed in a hypothetical example by Newman and Müller (2000).

The remaining levels are sharply distinguished from the previous levels in that the agent of information transmission is biological. This is a crucial distinction because biologically transmitted attributes are more strongly associated with taxa than those that are nonbiologically transmitted or merely circumstantial. Hence, it is at this level that we speak of characters, which we define simply as biologically transmitted attributes of a species.

At the next level are systems in which only one species is involved in the transmission of the information that forms the character. Such characters include behaviors that are learned in each generation from preceding generations.

The next level delimits those features that are transmitted via cellular processes. These include structural features such as centrioles and prions, whose form is not determined by the genome, and features encoded by the genome. Although prions are ultimately encoded by the genome, their alternate conformations, which we would code as states, are not genetically determined (see below).

The final level includes those features that are encoded by any of the genomes of the organism. These include nuclear, mitochondrial, and plastid sequences and any phenotypic characters that are derived from those sequences, including morphological characters. This least inclusive class includes the great majority of characters that are commonly employed in current phylogenetic analyses. In what follows we focus in more detail on some key levels.
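The nested levels just described can be read as a series of tests applied from the outside in. The following is a minimal sketch of that reading, not a procedure given by the authors: the class names, field names, and the rule that a feature stops at the first property it fails are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Nested levels, from most inclusive (0) to least inclusive (5)."""
    ANY_FEATURE = 0    # e.g. geographic location
    INTRINSIC = 1      # e.g. a plastic phenotype set by the environment
    BIOLOGICAL = 2     # e.g. obligate symbionts; "characters" start here
    INTRASPECIFIC = 3  # e.g. learned song dialects
    CELLULAR = 4       # e.g. centriole form, prion conformation
    GENOMIC = 5        # e.g. nucleotide sequences, morphology


@dataclass
class Feature:
    """Hypothetical record of the defining properties of one feature."""
    name: str
    intrinsic: bool = False
    biologically_transmitted: bool = False
    single_species: bool = False
    cellular: bool = False
    genome_encoded: bool = False


def innermost_level(f: Feature) -> Level:
    """Return the least inclusive level whose defining property f satisfies.

    The sets are nested, so properties are tested outside-in and the
    walk stops at the first property that fails.
    """
    checks = [
        f.intrinsic,
        f.biologically_transmitted,
        f.single_species,
        f.cellular,
        f.genome_encoded,
    ]
    level = Level.ANY_FEATURE
    for passed in checks:
        if not passed:
            break
        level = Level(level + 1)
    return level


def is_character(f: Feature) -> bool:
    """Characters are defined in the text as biologically transmitted attributes."""
    return innermost_level(f) >= Level.BIOLOGICAL


if __name__ == "__main__":
    examples = [
        Feature("geographic location"),
        Feature("plant height at altitude", intrinsic=True),
        Feature("learned song dialect", intrinsic=True,
                biologically_transmitted=True, single_species=True),
    ]
    for feature in examples:
        print(feature.name, innermost_level(feature).name, is_character(feature))
```

Under this sketch a learned song dialect lands at the intraspecific level and counts as a character, while geographic location stays at the outermost level and does not.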
Biological transmission

Phylogenetic study is oriented toward biological transmission of information among taxa and specifically to information encoded in the genome. The evolutionary biology community has a long-standing aversion to Lamarckian views on "inheritance of acquired characteristics," because of their failure to provide a mechanism for inheritance. Accordingly, objection should disappear when a mechanism is evident. Mutations, after all, are simply acquired changes of the genome. DNA replication is the mechanism of genetic inheritance, but there are many other features of an organism that are not encoded in the genome, yet are replicated. Even Richard Dawkins (1976, pp. 191–192) has argued that replicators need not be genetic: "I am an enthusiastic Darwinian, but I think Darwinism is too big a theory to be confined to the narrow context of the gene. The gene will enter my thesis as an analogy, nothing more. What, after all, is so special about genes?" Given the possibility of extragenomic replication, are there nongenomically encoded features that can still be inherited, show variation, and therefore serve as suitable characters for phylogenetic analysis? If there are such features, they would be missed in an analysis based solely on the genome. We argue that these characters exist and present examples below.

Many examples of biologically transmitted attributes that are not encoded in the genome can be described. One broad category would contain tightly linked symbioses, which themselves can give rise to useful phylogenetic characters. Well-known examples of symbioses include lichens composed of fungi and algae, termites harboring wood-digesting flagellates in their guts (Cleveland, 1926), ants that feed exclusively from fungus gardens that they grow (Chapela et al., 1994), and parasitic wasps that rely on polydnaviruses to overwhelm the host immune system (Whitfield, 2000).
Attributes of one of the partners can be used to inform us about the phylogenetic history of the other, just as with parasites. Indeed, features of parasites are sometimes used as characters of their hosts in systematic studies (e.g., Eichler, 1941; Brooks, 1981). If the symbiotic association is very close, it may result in a dependence in which one or both partners cannot live without the other. For example, some leaf-cutter ants have apparently lost ordinary digestive enzymes because of their reliance upon fungi to process plant matter (Martin, 1987). In extreme cases, the association may produce novel features that are something more than the sum of the parts. For example, lichen relationships are known to result in the production of novel chemical substances not present when the members of the association are free-living (see review in Brodo et al., 2001, pp. 42–43). Protracted dependency and synergism may result in such well-known relationships as eukaryotic cells containing mitochondria and plastids. This demonstrates that symbiosis can lead eventually to our most fundamental character level, genomic transmission (Fig. 1). Interesting intermediates may exist for which placement in our proposed hierarchy is not immediately clear. An example is the bacterium Wolbachia, which is transmitted in the cells of other taxa while remaining an independent organism (Yen and Barr, 1971).

In some cases, details about characteristics of location, such as "lives at deep sea vents" for a kind of crustacean, serve as plausible surrogates for heritable characters of physiology, whereas alternative descriptions are less clearly related to informative biological aspects (Miller and Wenzel, 1995). Perhaps "lives at the mid-Atlantic ridge" is also correct, but is meaningful geographically rather than biologically.
The distinction between these cases may be vague, and perhaps a good rule of thumb is whether the organism would be expected to survive if it were kept in a similar but alternative environment. If the crustacean would not be expected to survive in the Atlantic benthos away from the vent, then "lives at deep sea vents" seems to be a useful biological characteristic. By contrast, kangaroos do not cease to be kangaroos in any meaningful way when they are removed from Australia. This can be considered a question of sine qua non; if the property in question is essential and indispensable to the individuals of a species, then it surely meets the criterion that we use for phylogenetic characters. The criterion can apply to other characters, such as those of symbiosis: If depriving a termite of its symbionts causes the termite to cease vital function, then the symbionts can be considered characteristics of the termite. Not all individuals of a species will necessarily bear parasites at all times that are characteristic of the species; thus, the criterion of indispensability is sufficient, but not necessary.

Intraspecific transmission

The boundary between intrinsic and extrinsic influences can be indistinct when learning plays a role. A continuum seems to exist from situations where genetic determinism is a reasonable assumption to those where transmission of information from one generation to the next is clearly extragenetic. For example, animals may have a genetic predilection for performing a kind of behavior and a genetic template or rule-of-thumb for evaluating correct form, but nonetheless the behavior itself is shaped through repeated trials and progressive learning of the skills necessary to complete the desired product. The first and last steps are predetermined and the animal simply fills in the intermediate links, each individual learning to do so independently.
An example of this might be nest building in weaver birds, which learn to tie the knots that are necessary to make their stereotypical nests (Collias and Collias, 1984). The first step toward extragenetic transmission comes when birds that observe other weavers learn to build their own nests more rapidly, perhaps in 4 months rather than 5 (Collias and Collias, 1984, p. 222). The ability to learn from other individuals permits a cultural lineage to provide historical data independent of genetic data.

Perhaps the best example of persistent culture outside of humans is found in bird song. In some sparrows, songs are learned and consolidated by young individuals, and then the song remains unchanged throughout the bird's adult life and experience (Marler and Tamura, 1964; Marler and Peters, 1981). Call learning may permit incorporation of elements not just from other individuals, but even from other species (Gaunt et al., 1994). Mundinger (1979) specifically discussed eliminating learned elements to find phylogenetically informative elements, although he also concluded that learning itself is a useful character in finches. More critically, studies of various groups have found repeatedly that persistent local dialects are not related to genetic structure of the populations and therefore that the distinction between dialects themselves is not likely to be genetic. This has been shown in sparrows (Lougheed and Handford, 1992), cowbirds (Fleischer and Rothstein, 1988), and parrots (Wright and Wilkinson, 2001) and is inferred in hummingbirds (S.L. Gaunt, pers. comm., 2001). These authors and others (e.g., Zink and Barrowclough, 1984; Zink, 1985) comment specifically that the call dialects are conserved in the face of high gene flow across dialect boundaries.
In other species, genetic distinctions are found at some but not all dialect boundaries (Kroodsma et al., 1985; Balaban, 1988), one possible explanation being that the dialects form first, and genetic distinctions between populations sometimes follow. Indeed, Wright and Wilkinson (2001) compared closely related species and deduced that "propensity to form [temporally stable] dialects can be inferred to have been present in the common ancestor of this clade." Bird song presents us with an example where there is vertical transmission of information extrinsic to the genome, where variation in this information can be more stable than the genotypes of the individuals themselves, and where subsequent genetic variation might be structured according to the extrinsic, extragenetic attributes. Thus, it is possible that some variation in external attributes precedes genetic variation, the exact opposite of what is generally assumed. The examples used here span five families in three orders [Emberizidae, Fringillidae, and Icteridae (Passeriformes); Trochilidae (Apodiformes); Psittacidae (Psittaciformes)], so this phenomenon is not restricted to certain peculiar lineages.

Whether this type of vertical transmission in learning is found in other systems will not be known until other systems are as thoroughly explored as bird songs are, but we can expect to find analogies. For example, host choice may be one, and it is already known from cross-fostering experiments that certain parasitic wasps favor the hosts that their mothers chose (Turlings et al., 1993). The same may apply to nest site selection in wasps (Wenzel, 1996). Perhaps animal migration will be similar, as suggested by the situation known widely among laymen of a researcher flying an ultra-light airplane to direct eager flocks of naïve birds.
The animals inherit an instinct to migrate but the details of migration, which may remain stable over evolutionary time, are nonetheless transmitted biologically and extragenetically.

We realize that some may find the use of these characters controversial. One common claim is that the "character tree" resulting from certain kinds of character data will not be the same as the "taxon tree" (see review in Doyle, 1991). We assert that all phylogenetic hypotheses are character phylogenies and that the validity of any character or suite of characters is best evaluated in simultaneous analysis with other characters (Kluge, 1989; Doyle, 1991; Nixon and Carpenter, 1996).

Cellular transmission

Centrosomes, and the mechanism by which they are inherited, have been called "a central enigma of cell biology" (Wheatley, 1982). This enigma stems from the centrosome's function in the cell and its apparent autoreplication. Centrosomes are organelles composed of two paired centrioles surrounded by the dense, amorphous pericentriolar material. The centrosome plays an important role in maintaining the structure of the cell by generating the kinetochore fibers of the mitotic spindle apparatus (Nicklas, 1971; Wheatley, 1982), which are responsible for chromosome movement during metaphase (Nicklas and Koch, 1972) and cell cleavage (Rappaport, 1986). Centrosomes are also responsible for a number of phylogenetically useful traits, including the creation of retinal rods, the motility of epithelial cells, and proper antigen reorganization in lymphocytes (Brown et al., 1992). For a cell to form a cilium, it must have at least one centrosome; protistologists have long used the presence of cilia as a diagnostic character in their phylogenetic studies (Corliss, 1979).
In biparental, diploid animal species, centrosomes are disassembled in both male and female gametes and differentially reassemble at the time of fertilization: sperm-derived centrioles are assembled with egg-derived pericentriolar material to form the daughter centrosome (Schatten, 1994; Callaini et al., 1999; Palazzo et al., 2000). In haplodiploid species, centrosome inheritance is maternal when males are formed and paternal when females are formed (Tram and Sullivan, 2000). The mechanisms driving these inheritance phenomena are unknown. Although at least one centrosome protein, centrosomin, has been shown to derive from nuclear DNA, when this protein is mutated in Drosophila, the centrosome organizes normally, but the individual's sperm are without flagella and so are rendered ineffective (Li et al., 1998). It is not clear, however, whether all centrosome proteins are produced by the cell. Indeed, as Sluder (1992, p. 254) stated, strong evidence suggests that this is not the case: "Neither transcription, translation, nor nuclear DNA synthesis are required for the repeated reproduction of sperm centrosome." The inherited protein components appear to be used in the replication of a new centriole, and when the necessary components are present in vitro, protein microtubules are formed (Weisenberg, 1972), although centrioles themselves are not, implying that another centriole need be present to serve as an information-bearing template. At a minimum, critical inquiry has failed to demonstrate the redundancy of phenotypes and genotype in the case of centrosomes, though not for want of trying. Therefore, inclusion of these extragenetic characters in a simultaneous analysis with characters that are genomically transmitted is warranted.

Another source of characters that are not genomically transmitted is prions.
The term prion originally described a particular kind of cellular protein that is the etiologic agent of the transmissible spongiform encephalopathies (Prusiner, 1994; Weissmann, 1994). However, most identified prions are not disease causing, but are functional (for a review of these, see Cox, 1965; Lacroute, 1971; Wickner, 1994; Tuite and Lindquist, 1996). These proteins are curious for at least five reasons: they can alter their own conformation in many ways, resulting in multiple types that evoke new phenotypes in the organism (variation; Parchi et al., 1996; Collinge et al., 1996); the resulting types (or novel character states) can confer special properties on the organism that alter (enhance) its survivability (fitness; Magasanik, 1992; Hofstetter et al., 1974; Lindquist et al., 1995); they can alter surrounding proteins, converting them to prions (within-organism transformation; Prusiner, 1991; Weissmann, 1994); they are transmissible between organisms (heritability; Cox, 1965; Lacroute, 1971; Tuite and Lindquist, 1996); and once formed, they change without a change in nucleic acid sequence (nongenetic inheritance; Lindquist et al., 2001; Serio and Lindquist, 2000; Tuite, 2000). Additionally, the alternate conformational states of prions (Parchi et al., 1996; Collinge et al., 1996; Lindquist et al., 2001) are maintained across generations in yeast (Sondheimer et al., 2001), showing no evidence of reversion to the wild-type form. This fixation is an important property of any replicator if it is to reflect phylogeny. If prions change randomly from wild-type to mutant and back again rapidly within a population, we might not expect the generation of synapomorphy.

Prions exhibit heritable variation, potentially generating synapomorphy and marking ancient divergences.
However, the primary protein structure of prions does not encode their conformational structure; this means that not all of their heritable information is encoded in genomes and that this variation will need to be coded separately.

Genomic transmission

Genomic sequences have become well established as a source of phylogenetic information; the preponderance of systematic studies employing these data speaks to their importance. Phenotypic characters are encoded by the genome, but reflect underlying genomic changes to a greater or lesser degree. The most familiar phenotypic character type is morphology, which was the first source of systematic data for most taxa. Some authors have questioned the usefulness of morphological data for phylogenetic analysis as compared to molecular data (e.g., Sibley and Ahlquist, 1987; Gottlieb, 1988; Sytsma et al., 1991; Graur, 1993; Hedges and Sibley, 1994; Hedges and Maxson, 1996; Givnish and Sytsma, 1997a,b). The argument essentially reduces to one of ability to assess homology (the perception that molecular data have less homoplasy than morphological data), but homology hypotheses clearly can be problematic with both types of data, as witnessed by the problem of sequence alignment (Gatesy et al., 1993; Lutzoni et al., 2000). While Hillis (1987) suggested that at deep levels it might be more difficult to homologize morphological characters than molecular ones, Lanyon (1988) argued the opposite, indicating that at least in some cases, morphological characters may be more easily reconstructed as synapomorphies than more variable molecular states. Philippe and Adoutte (1998) suggested that molecular sequences might be insufficient to resolve eukaryote phylogeny and argued for careful selection of morphological and biochemical characters.
Examples of analyses in which combined morphological and molecular datasets yield better-supported trees than either dataset alone are common (e.g., Freudenstein, 1999; Simmons et al., 2001), suggesting that morphological characters often reflect a pattern similar to that seen with molecules. Greater numbers of molecular characters can overwhelm morphological characters (Hillis, 1987), depending on relative numbers of informative characters, but such a threat is not as dangerous as it seems (Wenzel and Siddall, 1999). Even small numbers of morphological characters can contribute significantly to the results of a combined analysis (Barrett et al., 1991); as Donoghue and Sanderson (1992) pointed out, the sheer numbers of characters are not as important as character interaction and distribution of homoplasy. Gatesy and Arctander (2000) found that morphological characters provided over half of the partitioned branch support in their simultaneous analysis of five datasets.

Goodman et al. (1987, p. 147) stated that, "A more basic problem with morphological characters as indicators of genealogical relationships is that there is no direct correspondence between the characters and heritable information encoded in genomic DNA." Clearly, this statement is questionable when so broadly framed, as there are many simple characters (flower color, for example) whose genetic basis is known to be straightforward (see Gottlieb (1984) for examples in plants). It is particularly in more complex (= polygenic) structures that the directly observable correspondence with underlying genetics may diminish (Roth, 1994).
The issue of complexity has been considered in the systematic context of homology hypotheses (Riedl, 1978; McShea, 1991; Donoghue, 1992; Donoghue and Sanderson, 1994; Janies and DeSalle, 1999; Bang et al., 2000)—in particular the idea that more complex characters are likely to exhibit less homoplasy, which might argue for their use when compared to simple sequence characters. Most of these discussions of molecules and morphology predate the era in which sequencing of whole genomes was a possibility. The fact that most combined analyses comprise very small portions of the genome means that the independence of these characters is a safe assumption, such that the decision about whether to analyze both morphological and molecular data is one of efficacy rather than redundancy of information. Morphology represents one end of a range of phenotypic expressions of the genome, which also includes nucleotide class (purine vs pyrimidine), amino acid, amino acid class, and higher-level structures in proteins and RNA. Given that they are all encoded by the genome, is there any benefit in coding potentially redundant, nonindependent phenotypic characters in addition to genomic characters in a phylogenetic analysis? We argue that there is, precisely because homology can be expressed at multiple different levels, from nucleotides to genes, gene functions, gene networks, embryonic origins, and morphological structures (Dickinson, 1995; Abouheif, 1999). We further argue that such a coding scheme is in fact a type of total-evidence approach (Kluge, 1989), because it takes advantage of homology hypotheses (characters) at all levels. To the extent that characters are genomically determined, their state transformations will be marked by changes in the underlying gene sequence. This means that the genome should represent a record of all phenotypic changes exhibited by the organism and changes in the DNA sequence that may not cause any detectable change in the phenotype. 
Therefore, the question is really not whether the genome contains all of the information reflected in the phenotype, but whether it can be recovered, which is fundamentally an issue of homology and transformation.

Relative rates of change in the genome and phenotype are important here. Because not all changes that occur in the genome are reflected in the phenotype, the rate of change in the genome is expected to be higher, such that saturation of changes (i.e., multiple changes for characters along individual branches) may be exhibited in the genome relative to the phenotype. Saturation is most often discussed with reference to particular base positions, such as third codon base positions in coding sequences (e.g., Hillis, 1991), but the same effect can be observed with reference to any character if it changes often enough. The concern is that "noise" will result from characters that have changed so rapidly that it is difficult to reconstruct their transformations (though noise is relative; Wenzel and Siddall, 1999).

While model-based phylogeny reconstruction methods use specific transformational assumptions to approach this problem, the ability to reconstruct the transformation of characters in a parsimony framework depends in large part on sufficient taxon sampling, because real specimens represent real character combinations that help to exclude some of the universe of possible character transformations between divergent taxa. If all intermediates were available and taxon sampling were complete, reconstruction of transformations would be more straightforward. When taxon sampling is sparse, transformation reconstruction becomes more difficult and is facilitated by the addition of characters that change more slowly and less ambiguously partition the taxa (e.g., Davis et al., 1998).
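The saturation effect described above can be made concrete with a small calculation. The sketch below assumes the Jukes-Cantor substitution model, which is not used in the text and is chosen here only because it has a simple closed form; the function name is hypothetical. Under that assumption, two sequences separated by d substitutions per site are expected to differ at a proportion p = 3/4 (1 - exp(-4d/3)) of their sites, so observed differences level off as actual changes accumulate.

```python
import math


def expected_observed_difference(d: float) -> float:
    """Expected proportion of differing sites between two sequences.

    d is the actual number of substitutions per site separating them.
    Assumes the Jukes-Cantor model (equal base frequencies, equal rates),
    an illustrative choice and not a model taken from the text.
    """
    if d < 0:
        raise ValueError("d must be non-negative")
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    # Observed differences track actual change when d is small, then
    # flatten toward the ceiling of 0.75 as multiple hits overwrite
    # earlier changes at the same sites.
    for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"d = {d:<5} observed = {expected_observed_difference(d):.4f}")
```

The flattening of the curve is the loss of recoverable information: past a certain amount of change, very different histories produce nearly the same observed difference, which is why slowly changing characters remain informative where rapidly changing ones do not.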
Slowly changing or "conservative" characters are often prized for their quality as phylogenetic markers (e.g., Lloyd and Calder, 1991; Lanyon, 1988). Felsenstein's (1981) proposed weighting of characters based on their improbability is a codification of what has long been practiced intuitively. The absence of a lockstep correspondence between a complex phenotype and underlying genetics can be used to advantage in systematic studies, since phenotypic characters may retain evidence for homology when the underlying genotypic characters do not (de Beer, 1971; Meyer, 1999; Wray, 1999; but see Doyle, 1996, p. 59). As Meyer (1999, p. 144) noted, "nonhomologous genes, gene networks and developmental mechanisms can make structures that are typically considered to be homologues." This may occur, for instance, in a biosynthetic pathway when one protein is substituted for another protein that is coded by a different gene. Kjer (1995) noted the more highly conserved nature of ribosomal RNA secondary structure relative to nucleotide sequence. This retention of homology is particularly important when distantly related taxa are sampled (due to extinction or undersampling). As one moves up the hierarchy of these levels (from nucleotide to morphological structure), one may generally expect the higher-level characters to evolve more slowly than the lower-level characters, because each dependent higher-level character and/or character state may include multiple dependent lower-level characters and/or character states, as noted above. Hence, synapomorphies obscured by multiple changes at the lower-level character(s) may be retained by the higher-level character(s). In such cases, phylogenetic signal may be more easily recoverable from the phenotypic than from the genotypic characters (Lanyon, 1988; Naylor and Brown, 1998).
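The many-to-one relationship between lower-level and higher-level states can be sketched with codons and the amino acids they encode. The example below uses a small hand-picked part of the standard genetic code; the taxon names, the codon assignments, and the helper `group_by_state` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hand-picked subset of the standard genetic code (not the full table).
GENETIC_CODE_SUBSET = {
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}

# Hypothetical codons observed at one site in five taxa.
codons = {"T1": "GAT", "T2": "GAC", "T3": "GAT", "T4": "GAA", "T5": "GAG"}


def group_by_state(character):
    """Map each observed state to the sorted list of taxa that show it."""
    groups = defaultdict(list)
    for taxon, state in character.items():
        groups[state].append(taxon)
    return {state: sorted(taxa) for state, taxa in groups.items()}


# Lower-level character: the nucleotide at the third codon position.
third_position = {taxon: codon[2] for taxon, codon in codons.items()}
# Higher-level character: the encoded amino acid.
amino_acid = {taxon: GENETIC_CODE_SUBSET[codon] for taxon, codon in codons.items()}

nucleotide_groups = group_by_state(third_position)
amino_acid_groups = group_by_state(amino_acid)
print(nucleotide_groups)
print(amino_acid_groups)
```

In this invented case the third position shows four states and no nucleotide is shared by T4 and T5, whereas the amino acid character assigns both the same state (Glu). This is the kind of grouping information that a higher-level character may retain when the lower-level character has changed repeatedly.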
Phenotypic characters can serve the function of "guiding" the genotypic characters as transformations are reconstructed by providing reinforcement for key genomic synapomorphies. Hence, like increasing taxon sampling (Hendy and Penny, 1989; Hillis, 1996; Phillipe et al., 1996; Graybeal, 1998; Zwickl and Hillis, 2002), addition of phenotypic characters to analyses including genotypic characters on which they are based can help to clarify character state transformations. Although this approach may seem unusual to phylogeneticists, developmental biologists have recognized the need to bridge the gap between sequence and morphology and have begun to comment on the phylogenetic utility of simultaneous analysis of characters derived from different developmental levels (Janies and DeSalle, 1999; Bang et al., 2000). The level of genomic transmission may also include some epigenetic changes, although the importance of such changes in marking phylogenetic pattern remains unclear. Various meanings exist for the term "epigenetic" (Bird, 1998); Waddington (1939) defined the term to mean the complex interactions that comprise the development of a multicellular organism. Some of these interactions lead to changes in the genome that may be heritable, but that are not due to a change in base sequence (Jablonka and Lamb, 1998; Cubas et al., 1999; Wolffe and Matzke, 1999); they are due instead to methylation or changes in chromatin structure (Wolffe and Matzke, 1999). Epigenetics is now often used to refer to the study of such heritable but nonsequence-based changes (Bird, 1998). When such changes are heritable and fixed in taxa, they are eligible to be used as characters. To the extent that they are not fixed or heritable, they may appear as polymorphisms, which are problematic for phylogenetic analysis (Nixon and Davis, 1991; Mabee and Humphries, 1993).
Analysis of phenotype and genotype
The example provided in Fig. 2 compares the information contributed by phenotypic and genotypic levels, namely amino acids and their underlying sequences. In this example, the third position of a codon varies among six taxa. Assuming that the states in Taxon 1 are plesiomorphic, cladistic analysis of just the nucleotide sequences (under Fitch parsimony) would result in the trees shown in Figs. 2A–D and the strict consensus polytomy in Fig. 2E. However, if the amino acid character is added to the analysis, the trees in Figs. 2B–D and consensus tree in Fig. 2F result. The amino acid character has a clear synapomorphy, whereas the sequence data set does not. There is no incongruence among the trees, but addition of the amino acid character allows the exclusion of some possible trees from the results, thus clarifying the transformation.
Fig. 2. Use of genomic and phenotypic data. A simple data matrix with four sequences is translated into amino acids. (A–D) Individual trees resulting from cladistic analysis of the data. Trees A–D are obtained when only nucleotides are analyzed; trees B–D result from analysis of nucleotides plus the amino acid character. (E) Strict consensus of trees A–D. (F) Strict consensus of trees B–D.
Agosti et al. (1996) introduced the simultaneous use of nucleotide and amino acid characters derived from the same sequence in phylogenetic analysis of protein-coding genes. By using nucleotide and amino acid characters, information may be incorporated from both methods of coding the sequence data. Agosti et al. (1996, p. 67) recognized three possible outcomes when nucleotide and amino acid characters are coded: First, each amino acid and the information in its associated triplet may be entirely congruent and equally informative with respect to each other. Second, one of the sources of information will show no informativeness while the other will.
Finally, the two sources of information—a nucleic acid triplet and its amino acid—may be informative but incongruent. There is also a fourth possibility: both sources of information will be congruent, but show different char-