Species and Speciation

One means of reconciling the apparent conflict between gene-based phylogenies and organismal phylogenies – that is, when what appear to be polyphylies stem instead from horizontal gene transfer – is to blame the concepts of species and speciation. It is not, in other words, that the organisms, or the phylogenies, are incorrect, but instead that our insistence on classifying organisms as species can represent an overly narrow world view. Towards better understanding the problem with species concepts, and therefore with microbial phylogenies based on the species model, it is helpful to take a few moments to consider just what species are and/or how species can be defined.

The basis of the species concept, as originally envisaged, is what is known as the biological species concept. In the biological species concept, species are seen as reproductively isolated but nonetheless inter-mating populations. That is, sex is rampant while mating is selective. The biological species concept is powerful not just because it provides a fairly unambiguous view of what a species is, i.e., a reproductively isolated, sexually reproducing population, but also because it provides a fairly robust view of how speciation itself can occur: first comes reproductive isolation, and only then can significant phenotypic divergence occur.

The problem with the biological species concept, for microorganisms, is twofold. The first problem stems from the relative lack of sexuality among microorganisms, especially bacteria, i.e., many populations of microorganisms consist of individuals that do not regularly mate and that therefore can be described as clones. Without the sex that binds most non-microbe populations together, even the concept of population must be altered, since a population can no longer be defined primarily in terms of mating and reproductive isolation. The other problem with the biological species concept, for microorganisms, stems from their relative promiscuity. That is, with bacteria, for example, sex is not rampant, but at the same time "mating" is not especially selective. This is truly the opposite of the assumptions underlying the biological species concept, and it is no wonder therefore that its applicability to bacteria is limited.

Notwithstanding this concern, bacterial "species" do display a genetic cohesiveness. This cohesiveness is not as great as that seen, for example, in animal species, and presumably the difference can be explained in terms of the above-noted differences in quantity and "quality" of sex. The question then becomes, given the only partial similarity between microbe and non-microbe species concepts, what can one say about what is and what is not a microbe species or, especially, what is and what is not a bacterial species, and why do such species exist at all? One way of dealing with this uncertainty is to focus less on what is and is not a species and to concentrate instead on how genomes evolve from the perspective of their genes.

Orthologs and Paralogs

Though it is easy to blame horizontal gene transfer for complicating phylogenetic analyses, in fact there is an alternative means by which gene-based phylogenies can differ from organismal phylogenies, one that does not involve genetic migration between lineages. This other concern can be viewed as follows. Consider three species, A, B, and C. Let's say that species B and C cluster based upon ssu rRNA sequence (let's use the notation, A|BC, to indicate this closer clustering of B and C relative to the more distantly related A). Notwithstanding this clustering, let's assume that species B contains a version of a gene, an allele, that is more closely related to one found in species A than to the equivalent found in species C (A_A|B_A C_BC, where the subscript written after each underscore refers to the specific allele in question). The horizontal gene transfer explanation is that species B obtained the allele in question from species A, but that species B otherwise is more closely related to species C. That process is called orthologous replacement, where an orthologous gene is one found at a common locus in two species and also in their most-recent common ancestor. That is, A_A|B_BC C_BC is the pre-horizontal gene transfer phylogeny, where the "BC" subscripts represent orthologous genes found in B and C's common ancestor, while A_A|B_A C_BC is the post-horizontal gene transfer phylogeny, where the subscript "A" describes an allele instead found in the A lineage:


It is possible alternatively that a gene duplication event within a single genome, one predating the divergence of all three species, resulted in paralogs, that is, two very similar genes that otherwise are in the process of evolutionary divergence from each other (two genes are considered to be paralogous if they occupy distinct loci but nonetheless once occupied the same locus, i.e., were in fact the same gene in an ancestor). Now assume the extinction of one of the two paralogs following the divergence of all three lineages, where the paralog retained happens to be the same in species A and B but different from the one retained by species C. That is:


The problem with the latter explanation, of course, is that it requires three separate events, i.e., loss of a locus three times (once in each of the three lineages, as indicated with the three separated arrows, above). By contrast, the horizontal gene transfer explanation requires only a single event. How to tell the difference? Perhaps in terms of gene location within genomes. More importantly, these scenarios point to the difficulty of teasing out evolutionary relationships based solely upon the sequences seen in extant organisms, and they illustrate that horizontal gene transfer not only greatly complicates phylogenetic reconstruction but also can be difficult to prove. Finally, note that species C may very well differ from species A and B because the allele in species C has experienced rapid diversification.


That is, even without either hypothetical gene duplication events or horizontal gene transfer, gene-based phylogenies can differ from organismal phylogenies, with polyphyletic taxa envisaged (A with B but excluding C) that are not even a consequence of convergent evolution!
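The A|BC notation used above can be made concrete with a minimal Python sketch. It assumes that a three-taxon grouping is read off simply as "the two taxa with the smallest pairwise distance cluster together"; the distance values, the dictionary names, and the helper functions `closest_pair` and `as_split` are all hypothetical and invented for illustration, not taken from any dataset or from the works cited here.

```python
def closest_pair(distances):
    """Return the pair of taxa separated by the smallest pairwise distance."""
    return min(distances, key=distances.get)


def as_split(distances, taxa=("A", "B", "C")):
    """Write a three-taxon grouping in the A|BC style: outgroup | clustered pair."""
    pair = closest_pair(distances)
    outgroup = next(t for t in taxa if t not in pair)
    return f"{outgroup}|{''.join(sorted(pair))}"


# Hypothetical pairwise distances (invented values, for illustration only).
ssu_rrna = {
    frozenset("BC"): 0.02,
    frozenset("AB"): 0.10,
    frozenset("AC"): 0.11,
}
focal_gene = {
    frozenset("AB"): 0.03,
    frozenset("BC"): 0.15,
    frozenset("AC"): 0.16,
}

organismal = as_split(ssu_rrna)    # grouping suggested by the marker gene
gene_based = as_split(focal_gene)  # grouping suggested by the gene of interest
discordant = organismal != gene_based
```

With these invented numbers the marker gene groups B with C while the focal gene groups A with B. Note that the comparison only detects the disagreement; it cannot say whether horizontal gene transfer, differential loss of paralogs, or rapid diversification in one lineage produced it.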

Lerat et al. (2005) describe two genes that are related by horizontal gene transfer, including by orthologous replacement, as xenologs (A_A|B_A C_BC). Alternatively, synolog is employed to describe genes that appear to be similar by descent where that similarity may or may not be a consequence of horizontal gene transfer (A_A|B_A? C_BC). Thus, genes that are similar as a consequence of orthologous replacement would be xenologous (A_A|B_A C_BC), but given uncertainty about their HGT status the two genes would instead have to be described as synologous (A_A|B_A? C_BC).

For those with an interest, here is additional discussion of terms, borrowed (and quoted) heavily from Fitch (2000):

From p. 227: "Homology is the relationship of two characters that have descended, usually with divergence, from a common ancestral character… Characters can be any genic, structural or behavioral feature of an organism. Analogy is distinguished from homology in that its characters, although similar, have descended convergently from unrelated ancestral characters. The cenancestor is the most recent common ancestor of the taxa being considered."

From p. 228: "It is worth repeating here that homology, like pregnancy, is indivisible. You either are homologous (pregnant) or you are not. Thus, if what one means to assert is that 80% of the character states are identical one should speak of 80% identity, and not 80% homology."

From p. 229: "There is a tendency to assume that if two characters are significantly similar they must be homologous. This assumption has been proven to be untrue many times when the characters were morphological or behavioral. For nucleotide and amino acid sequences, the situation is different. Most of the time, the degree of similarity is so great that one (including me) will say that convergence could not have caused this much similarity."

From p. 230: "We must recognize that not all parts of a gene have the same history and thus, in such cases, that the gene is not the unit to which the terms orthology, paralogy, etcetera apply."

From p. 228: "There are three disjoint subtypes of homology. Orthology is that relationship where sequence divergence follows speciation, that is, where the common ancestor of the two genes lies in the cenancestor of the taxa from which the two sequences were obtained. This gives rise to a set of sequences whose true phylogeny is exactly the same as the true phylogeny of the organisms from which the sequences were obtained. Only orthologous sequences have this property."

From p. 228: "Paralogy is defined as that condition where sequence divergence follows gene duplication. Such genes might descend and diverge while existing side by side in the same lineage. Mixing paralogous with orthologous sequences can lead to a tree that has the correct phylogeny for the sequences but not for the taxa from which they derive; a gene tree is not necessarily a species [here 'organismal'] tree."

From p. 228: "Xenology is defined as that condition (horizontal transfer) where the history of the gene involves an interspecies transfer of genetic material. It does not include transfer between organelles and the nucleus. It is the only form of homology in which the history has an episode where the descent is not from parent to offspring but, rather, from one organism to another. Unrecognized xenology has the greatest negative impact causing bizarre taxon phylogenies; however, it is that very bizarreness that alerts us to recent xenology."

Fitch then discusses how chloroplast genes that have migrated to the nucleus are "orthologous within plants", meaning that these genes were already present at the point where plants came into existence. Not discussed is whether these genes are paralogous if they are found in the nuclear genome in one plant and in the chloroplast genome in another. He also notes that (p. 228) "The acquisition of chloroplasts by a eukaryote was a xenologous (in this case, symbiotic) event…" If one species contains a paralogous pair of genes, but the duplication event occurred following divergence from a second species, then both paralogs are orthologous to the same gene found in that other species. Furthermore, even a deleted paralog is still a paralog, albeit one whose allelic form is a gap in the sequence. An additional and important point is found on p. 230: "We must recognize that not all parts of a gene have the same history and thus, in such cases, that the gene is not the unit to which the terms orthology, paralogy, etcetera apply." Fitch also provides a distinction between convergent evolution (bestowing analogies) and parallel evolution (which does not produce analogies): two identical sequences that undergo identical nucleotide replacements are undergoing parallel evolution, whereas two non-identical sequences that become identical through mutational events are displaying convergent evolution. Above all, it is always a good idea to cite the source of one's definition of a potentially contentious term, and also to state that definition explicitly (ideally matching the one found in the cited reference!), so as to properly guide the reader as to one's intended meaning.
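Fitch's three subtypes of homology amount to a small decision rule, which the following Python sketch spells out. It is only an illustration of the definitions quoted above, under the assumption that the relevant history of the two genes is already known; the `Homology` enum, the `classify` function, and the event labels "speciation" and "duplication" are hypothetical names introduced here, not part of any library or of Fitch's text.

```python
from enum import Enum


class Homology(Enum):
    ORTHOLOG = "ortholog"
    PARALOG = "paralog"
    XENOLOG = "xenolog"


def classify(divergence_event, interspecies_transfer=False):
    """Classify two homologous genes from their (assumed known) history.

    divergence_event: the event after which the two sequences began to
        diverge, either "speciation" or "duplication".
    interspecies_transfer: True if the history of either gene includes
        transfer of genetic material between species.
    """
    if interspecies_transfer:
        return Homology.XENOLOG
    if divergence_event == "speciation":
        return Homology.ORTHOLOG
    if divergence_event == "duplication":
        return Homology.PARALOG
    raise ValueError(f"unknown divergence event: {divergence_event!r}")


# A duplication that occurred after divergence from a second species:
# the two copies are paralogs of each other, while each copy began to
# diverge from the other species' gene at the speciation event.
between_copies = classify("duplication")
copy_vs_other_species = classify("speciation")
after_replacement = classify("speciation", interspecies_transfer=True)
```

In practice the history is exactly what is unknown, so a rule like this is useful mainly for checking that terms are being applied consistently once a scenario has been proposed.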

Mosaic Evolution

One can view… gene exchange… as effectively blurring the lines between microbial taxa, making it difficult to delineate microbial 'species' or groupings at higher taxonomic levels. It is difficult to apply the biological species concept… to groups of strains that are reproductively isolated at some loci and not others. Similarly, the variable domains of exchange among taxa…, as well as the variable rates of exchange among different genes, makes higher ordered taxonomic classification difficult… [I]f higher ordered taxonomy is dictated by both the presence of ancestral genes (as is the case in eukaryotes) as well as biased HGT within taxonomic groups, then bacterial taxonomy reflects both history (the patterns of speciation events) as well as ongoing processes (HGT). Hence, the conclusions of Zuckerkandl and Pauling (1965) , that genes are documents of evolutionary history, becomes far more complex as we integrate patterns of gene exchange—and lineage-specific gene loss—with histories of vertical inheritance. — Jeffrey G. Lawrence & Heather Hendrickson (2003)

After Zuckerkandl and Pauling (1965) , biologists came to think that the universal tree could be reduced to a tree based on sequences of orthologous genes, any of which (practical considerations aside) could serve as a marker for an entire genome, organism, or species. If, however, different genes give different trees, and there is no fair way to suppress this disagreement, then a species (or phylum) can "belong" to many genera (or kingdoms) at the same time: There really can be no universal phylogenetic tree of organisms based on such a reduction to genes… [Even small subunit rRNA phylogenies may be suspect since, for example,] the SSU rRNA of E. coli can be completely replaced by that of Proteus vulgaris (and the ribosomal protein L11 binding domain of E. coli 23S can be replaced by the homologous region of yeast 28S) without reducing growth rate by more than 10 to 30%. — W. Ford Doolittle (1999)

Mosaic evolution explicitly is the idea that different regions of genomes can and do have different evolutionary histories. Mosaic evolution is observable via an approach described as comparative genomics. In comparative genomics, one sequences the genomes from multiple organisms (i.e., different species or different isolates) and then aligns nucleotide sequences of the respective genomes. The result is that some regions will more closely align between one set of two (or more) species whereas a different region will more closely align between a different set of two (or more) species. It is common in these comparisons to state that different regions "come" from different organisms. Note though that it is a bit of a stretch to make claims as to the direction of movement from one specific organism to another since it is much more difficult to ascertain ultimate origin than it is to identify genetic homologies, and typically only a relatively limited number of genomic sequences are available, constraining knowledge even of extant genomes.
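The comparison just described can be sketched in a few lines of Python. This is a toy illustration only: it assumes the genomes have already been aligned to equal length without gaps, it scores similarity as simple percent identity, and the sequences, genome names, region boundaries, and function names (`percent_identity`, `closest_pair_by_region`) are all invented for the example.

```python
from itertools import combinations


def percent_identity(seq1, seq2):
    """Percentage of aligned positions at which two sequences are identical."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


def closest_pair_by_region(alignment, regions):
    """For each region, return the pair of genomes with the highest identity."""
    names = sorted(alignment)
    result = {}
    for label, (start, end) in regions.items():
        scores = {
            (x, y): percent_identity(alignment[x][start:end],
                                     alignment[y][start:end])
            for x, y in combinations(names, 2)
        }
        result[label] = max(scores, key=scores.get)
    return result


# Hypothetical pre-aligned genomes: two regions of ten columns each.
alignment = {
    "X": "ACGTACGTAC" + "GGGTTTCCCA",
    "Y": "ACGTACGTAA" + "TTGACCGTAT",
    "Z": "TCGAACTTGC" + "TTGACCGTAA",
}
regions = {"region_1": (0, 10), "region_2": (10, 20)}

closest = closest_pair_by_region(alignment, regions)
```

Here genome Y aligns best with X in the first region but with Z in the second, which is the signature of a mosaic genome. As the paragraph above cautions, such a result identifies the shared similarity but not the direction in which the region moved.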

In one sense, mosaic evolution is just another way of saying that gene phylogenies are not organismal phylogenies. It is possible for mosaic evolution to be so extensive, however, that one no longer can speak of organismal phylogenies at all but instead only, at best, of gene-group (i.e., gene-cluster) phylogenies. Such "triumph" of mosaicism over cladism can be seen especially in the tailed bacteriophages. The genomes of these organisms are small enough that extensive genome sequencing has been possible since the mid-1990s, and full-genome sequencing has even come to be considered a minimal requirement for isolate characterization. As a consequence, hundreds of alignable full-genome sequences are available. Nonetheless, there is no way to produce an organismal phylogeny with tailed phages, though such phages still can be differentiated into basic types (Rohwer and Edwards, 2002). How these types may be further differentiated into species, indeed just what a phage "species" might represent, however, are completely unresolved questions (Abedon, 2009c).

Phages and viruses in general – unlike bacteria and other cellular organisms – lack a core set of genes. Most obviously, for bacteria, these are genes such as ssu rRNA, which are lacking in viruses, in this case, simply because viruses don't encode ribosomes. While phages display mosaic evolution to the point where it overwhelms both species concepts and organismal phylogenies, bacterial phylogenies exist at a more intermediate state. This intermediate state is found between that of phages, at one end of the spectrum of genomic conservatism (the highly non-conservative end), and that of obligately sexual organisms such as most animals, which are found at the other end of this spectrum. With bacteria, extensive genomic mosaicism thus exists, along with a great deal of highly promiscuous horizontal gene transfer, but at the same time bacteria still seem to possess organismal phylogenies.

See Levin and Bergstrom (2000) for discussion of bacterial mosaicism especially from the perspective of accessory genetic elements. For phages, see Abedon (2009c) for references to a number of reviews considering the subject of mosaicism, plus for a glimpse at the ecological context within which phage mosaic evolution occurs.