Generally what is retained within genomes are functions that are both useful and not redundant. This can be seen both in the gaining of new genes and in the losing of old genes. New genes typically must supply some new functionality to be retained, while old genes may be lost when something else replaces their functionality, whether that is a new gene found within the genome or instead a loss of the requirement for the function, such as when a metabolite becomes consistently present within an organism's environment. Such a loss of requirement can be seen, for example, in the course of an organism's entering into a symbiosis with another species. As always, losses of both new and old functions, if not too essential, can occur also as a consequence of genetic drift.

Zone of Paralogy

The basic requirements of a functional module are two: first, that it carry out its biological function effectively; second, that it retain its interchangeability, both in terms of its ability to be placed into other genomes and in terms of its functional compatibility with a variety of different combinations of modules carrying out the other essential biological functions. — David Botstein (1980)

The question of just what genes are most likely to be retained following horizontal gene transfer events leads to the concept of a zone of paralogy (Lawrence and Hendrickson, 2003). Such a zone is seen, perhaps somewhat hypothetically, in graphs of the frequency of successful gene transfer versus the phylogenetic distance between donor and recipient organisms. For organisms that are too closely related, it is difficult to recognize gene swaps, even if such swaps occur rampantly. This would be equivalent to the swapping of alleles by obligately sexual organisms: As the frequency of such swaps increases, greater conservation is required to sustain fitness in the face of gene exchange, and therefore differences among alleles become (or remain) more subtle and difficult to detect, particularly in terms of distinguishing whether they are products of vertical versus horizontal inheritance (variation that is due to mutation within a lineage versus mutation in a different lineage that is followed by migration into the lineage in question).

As phylogenetic distance increases, the potential to detect gene swaps also should increase due to increasing sequence differences. At the same time, however, there is a decline in both the likelihood of gene swaps occurring and the potential for such swaps to be harmless. What is left are fortuitous swaps that are retained because they provide a benefit. The acquisition of a gene that replaces an already existing gene presumably has a relatively low likelihood of being an improvement on that existing gene, however. This is because the already existing gene presumably is well adapted to the organism within which it resides. Thus, increasing genetic (phylogenetic) distance results in decreased opportunities for transfer as well as decreased potential for incoming genes to serve as improvements on previously existing genes, and so a decline in the likelihood of such swaps is expected with increasing evolutionary distance between genetic donors and recipients. At the same time, however, those swaps that do survive are particularly noticeable, and phylogenies based on such swaps will tend to be gene-based rather than reflecting organismal phylogenies.

Swaps of one gene for another are described as orthologous replacement events. They are driven predominantly by homologous recombination. That is, two genes with similar functions found in similar locations in otherwise similar organisms likely will have sufficiently similar nucleotide sequences that swapping of one gene for the other becomes likely following standard mechanisms of horizontal gene transfer (that is, upon generation of a pre-genetic recombination state). Thus, as genetic distances increase, three things decline: opportunities for transfer (i.e., in terms of uptake of genes such as via transduction), the potential for improving upon existing functions (due to both random and deterministic changes in the rest of an organism's genetic endowment), and the potential for homologous recombination (given divergence in nucleotide sequence). On the other hand, if homologous recombination is less likely, then gene acquisition becomes likely only given non-homologous forms of recombination, which in turn give rise to insertions of new genetic material rather than a swapping of genes.

The zone of paralogy describes a conjecture that, in contrast to the decline in the likelihood of orthologous replacement with increasing phylogenetic distance, there should be an increase in the likelihood of acquisition of novel genes via insertion events. Insertion rather than gene swapping becomes more likely, as noted, because of a decreased potential for homologous recombination with greater phylogenetic distance. In addition, however, with increasing phylogenetic distance perhaps also comes an increased potential for a gene to be sufficiently divergent that it can supply a novel and potentially beneficial function to the recipient. On the other hand, as phylogenetic distances increase even further, both opportunities for transfer and, perhaps, the potential for incoming genes to provide a useful function decline. The zone of paralogy, as a consequence of these various tendencies, thus is a region, in terms of phylogenetic distance, where the likelihood of gene acquisition by insertion is still reasonably high: Reduced phylogenetic distance shifts gene acquisitions toward homologous recombination, while greater phylogenetic distance reduces access and, potentially, also utility.
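The shape of this argument can be sketched numerically. The following is a toy illustration only, not a model taken from Lawrence and Hendrickson (2003): the exponential forms, the two decay constants, and the 0-to-1 distance scale are all assumptions chosen simply to make the opposing tendencies visible.

```python
import math

# Hypothetical decay constants, chosen only to make the shape visible.
K_RECOMB = 8.0   # how quickly homologous recombination fails with distance
K_ACCESS = 3.0   # how quickly access and utility decline with distance

def replacement_likelihood(distance):
    """Relative likelihood of orthologous replacement (assumed exponential decay)."""
    return math.exp(-K_RECOMB * distance)

def insertion_likelihood(distance):
    """Relative likelihood of acquiring a novel gene by insertion.

    Rises as homologous recombination becomes unlikely, then falls as
    access and utility decline (both terms are assumptions).
    """
    return (1.0 - math.exp(-K_RECOMB * distance)) * math.exp(-K_ACCESS * distance)

# Phylogenetic distance on an arbitrary 0-to-1 scale.
distances = [i / 100 for i in range(101)]
zone_peak = max(distances, key=insertion_likelihood)

for d in (0.0, zone_peak, 1.0):
    print(f"distance={d:.2f}  replacement={replacement_likelihood(d):.3f}  "
          f"insertion={insertion_likelihood(d):.3f}")
```

Under these assumed curves, replacement is most likely at the smallest distances, while insertion is negligible for near-identical organisms, peaks at an intermediate distance, and declines again toward the far end of the scale. That intermediate hump is what the zone of paralogy refers to; where it falls in real organisms is not something this sketch can say.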

An additional and important consideration is that horizontal gene transfer simply is easier to recognize given greater phylogenetic distance between donor and recipient. Thus, the greatest degree of horizontal gene transfer likely goes on among very closely related individuals, perhaps even clones, but the genetic similarity among these individuals is so great that it is difficult to recognize from nucleotide sequences that gene transfer events have even occurred. Instead, it is paralogous replacements that are detectable with high probability as horizontal gene transfer events.

One can generally view these ideas as distinctions between new things being a good fit (as in when purchasing or wearing clothing) and things instead being useful or fashionable. If something both fits and is useful, or is otherwise desirable, then that's great. If something fits but isn't useful, then you are unlikely to buy or wear it (orthologous replacement with otherwise neutral alleles). If something is useful, however, then even if it doesn't fit (let's say that it is too large for you) or is otherwise uncomfortable or inconvenient to use, it still might be tempting to own it, putting up with its shortcomings such as by mostly keeping it in the closet, and wearing the item only as needed (beneficial products of illegitimate recombination or, alternatively, as associated with plasmid acquisition). A party costume or sports clothing (ski boots!) might be good examples of the latter. Thus, changing your day-to-day clothing might be viewed as an equivalent to orthologous replacement, that is, genes acquired that are sufficiently similar to pre-existing genes that they do not fall within the zone of paralogy. Alternatively, putting on an outfit to go to a party, or to go skiing (over your everyday core, your underwear!), might be viewed instead as equivalent to the acquisition of new genetic material other than via homologous recombination, i.e., these would be genes acquired within the zone of paralogy. Of course, not all clothing is useful to all individuals, nor necessarily has the potential to be worn by a given individual. These latter items then would represent genes that in all likelihood extend beyond the zone of paralogy.

Functional Redundancy

The expansion of gene families by duplication and divergence of single genes within a single genome is an old idea, yet fraught with difficulty. Foremost among the difficulties is the problem of maintaining selection on both copies, thus preventing loss of the duplicated gene, until each gene develops functionally distinct roles. While clever schemes have been devised to circumvent these problems…, differential function may arise while genes reside in different cytoplasms and experience different selective constraints. HGT would then reunite previous orthologues in the same genome, where they would appear as paralogues [a.k.a., "xenologs"]; this process alleviates the need for a period of co-existence of multiple copies of the same gene without selection for differential function… Therefore, one must consider carefully the mechanisms by which 'gene genesis' … occurs. Is HGT also playing a role here? — Jeffery G. Lawrence & Heather Hendrickson (2003)

The concept of the zone of paralogy rests on a combination of the likelihoods of homologous versus illegitimate recombination, on the one hand, and that of functional redundancy on the other. Functional redundancy basically means that one adaptation is equivalent to another adaptation in terms of its contribution to organismal fitness, such that one adaptation in the presence of the other is superfluous. The simplest way to think about such adaptations is in terms of individual alleles, and the simplest such situation occurs upon gene duplication such that a haploid genome possesses two identical and identically functional copies of the same gene. In this situation, unless there is utility to having multiple copies of a gene, such as for the sake of gene dosage as seen with rRNA genes, selection acting on one of the copies of the gene is at best neutral. If the genes truly are equivalent, then which gene is beneficial with regard to selection and which is neutral is arbitrary. A single mutation in either gene, however, will tend to be neutral in terms of the genome as a whole even if that locus' function is eliminated, and so too will any additional mutations in the now knocked-out gene, even if, or especially if, those mutations result in further genetic degradation. Indeed, what is created is a pseudogene, and this reflects a key constraint on genome evolution by gene duplication: an absence of selection for the continued presence of both genes.
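A minimal simulation can illustrate why which copy becomes the pseudogene is arbitrary. It assumes a single lineage carrying two redundant copies, labeled A and B here purely for illustration, each of which suffers a knockout mutation with the same hypothetical per-generation probability. Because the first knockout is neutral, nothing in the simulation opposes it; the surviving copy is then assumed to be maintained by selection.

```python
import random

def first_knockout(mutation_rate, rng):
    """Return (generation, copy) for the first neutral knockout of a redundant pair."""
    generation = 0
    while True:
        generation += 1
        hit_a = rng.random() < mutation_rate
        hit_b = rng.random() < mutation_rate
        if hit_a and hit_b:
            # Tie: selection is assumed to preserve one copy, chosen arbitrarily.
            return generation, rng.choice("AB")
        if hit_a:
            return generation, "A"
        if hit_b:
            return generation, "B"

rng = random.Random(1)   # fixed seed so the run is repeatable
MUTATION_RATE = 0.01     # hypothetical knockout probability per copy per generation
RUNS = 5000

results = [first_knockout(MUTATION_RATE, rng) for _ in range(RUNS)]
mean_wait = sum(generation for generation, _ in results) / RUNS
fraction_a_lost = sum(1 for _, copy in results if copy == "A") / RUNS

print(f"mean generations until one copy is knocked out: {mean_wait:.1f}")
print(f"fraction of runs in which copy A became the pseudogene: {fraction_a_lost:.2f}")
```

With two equivalent copies the wait is about half what it would be for a single unprotected gene, and each copy is lost in roughly half of the runs. The mutation rate is far higher than realistic values and serves only to keep the run short.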

Functional redundancy can be seen at levels of organization that are present above the level of the gene. Indeed, a common area of functional redundancy is seen when organisms invade new niches, such as when going from free-living to symbiotic lifestyles. In a symbiont's new environment many things that were unavailable or difficult to control in the environment where it was free-living may now be available or well controlled. The symbiont now may no longer need to synthesize various factors, to engulf and digest food, to regulate its internal chemistry in a manner that perturbs it away from its external chemistry, etc. That is, many internal metabolic processes may now be redundant to external conditions or resources. The genes responsible for effecting the associated functions in the symbiont thus become redundant, and mutations affecting their function, in the symbiont, at worst are likely to be neutral, and may even be beneficial to the extent that knocking out gene functions reduces metabolic burdens. The result is a loss of redundancy, which we perceive as a loss of functionality in the symbiont. These losses in turn preclude reacquisition of a free-living lifestyle. Indeed, under certain circumstances even genes that are beneficial to the symbiotic lifestyle may be lost, as considered under the heading of genome erosion.

Genome Erosion

Another consideration is gene loss through deletion, which must occur since otherwise organisms could not continue to acquire genes through evolutionary time without also having ever-expanding genome sizes. Indeed, it seems reasonable to envisage an equilibrium, of sorts, where genetic acquisitions (i.e., insertion events) are balanced by genetic deletion. What then would drive deletion? The answers, in an ultimate sense, are relatively limited: genetic drift (resulting from small population sizes along with little access to gene exchange, i.e., Muller's ratchet), selection for increased physiological streamlining (i.e., reduction in costs associated with otherwise carrying additional genes and gene sequences), and active selection against actively detrimental gene sequences. The latter, in fact, has been posited as a mechanism that drives bacterial tendencies toward streamlining: a higher than expected potential to delete genetic regions, with that potential retained (possibly) for the sake of deleting detrimental gene sequences, such as those associated with prophages (Lawrence et al., 2001).
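The idea of an equilibrium between acquisition and deletion can be made concrete with a very simple bookkeeping sketch. It assumes, purely for illustration, that genes are gained at a constant rate per generation and that each resident gene has the same small chance of being deleted per generation, so that losses scale with genome size. Neither the functional form nor the numbers come from the sources cited here.

```python
def genome_size_trajectory(start_genes, gain_per_generation, loss_rate, generations):
    """Iterate genome size under constant gain and size-proportional loss."""
    sizes = [float(start_genes)]
    for _ in range(generations):
        current = sizes[-1]
        sizes.append(current + gain_per_generation - loss_rate * current)
    return sizes

# Hypothetical parameters: 2 genes gained per generation, 0.1% of genes lost.
GAIN = 2.0
LOSS_RATE = 0.001
expected_equilibrium = GAIN / LOSS_RATE  # size at which gain equals loss

small_start = genome_size_trajectory(500, GAIN, LOSS_RATE, 10_000)
large_start = genome_size_trajectory(5000, GAIN, LOSS_RATE, 10_000)

print(f"expected equilibrium: {expected_equilibrium:.0f} genes")
print(f"from 500 genes:  {small_start[-1]:.1f}")
print(f"from 5000 genes: {large_start[-1]:.1f}")
```

Under these assumptions a small genome grows and a large genome shrinks until both settle near the same size, the ratio of the gain rate to the per-gene loss rate. Anything that raises the deletion rate, such as selection for streamlining, lowers that equilibrium, and anything that raises the acquisition rate lifts it.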

This same propensity could just as easily be framed as a means of limiting ongoing genome expansion (Isambert and Stein, 2009), however, and indeed that might represent simply a failure of the deleting organism to maximize some sort of replicative fidelity. Generally, then, bacteria will tend to accumulate new genetic sequence especially when possessing such sequences is advantageous to the bacterium, even if only conditionally or temporarily advantageous, whereas bacteria will tend to lose sequences when selection either is not sufficiently strong to retain them or, alternatively, when selection tends to favor those individuals that have deleted genes. We'll consider genome erosion again in the chapter titled, Virulence.

Note that in terms of the microevolutionary processes directly involved in gene loss, typically it is drift or natural selection that gives rise to the fixation of lost "alleles" within populations, that is, deletions. Drift can serve as a default explanation for this fixation if the genetic material lost otherwise does not impact organism fitness. Thus, mutations that are detrimental to gene function accumulate in these genes because natural selection is powerless to prevent such accumulation (i.e., though detrimental to the gene's function, they are effectively neutral with regard to natural selection). Genes thus are either deleted or, instead, become pseudogenes prior to their deletional loss. Since these mechanisms, as so described, are driven by stochastic processes, that is, by genetic drift, they are inefficient. Natural selection can more efficiently reduce genome sizes, though with significant limitations. The first limitation is that, though in aggregate the loss of unused gene sequences may provide significant selective benefits, the loss of any individual gene should provide much less benefit. The variation upon which natural selection acts nonetheless is primarily found in such isolated changes (individual mutations) rather than in aggregates of such mutations. Since natural selection cannot anticipate the long-term utility of losing DNA, DNA is not efficiently lost even via natural selection.
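How little help selection provides when the benefit of losing a single gene is small can be sketched using Kimura's diffusion approximation for the probability that a single new mutation reaches fixation. The haploid form of that approximation is brought in here as an outside assumption, since the text above does not specify a model, and the population size and selection coefficients are hypothetical.

```python
import math

def fixation_probability(selection_coefficient, population_size):
    """Kimura's diffusion approximation for one new mutant in a haploid population.

    Assumes a constant, idealized population; a coefficient of zero gives
    the neutral (drift-only) value of 1 / population_size.
    """
    if selection_coefficient == 0:
        return 1.0 / population_size
    s = selection_coefficient
    # expm1 keeps precision when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * population_size * s)

POPULATION_SIZE = 10**6          # hypothetical
neutral = fixation_probability(0, POPULATION_SIZE)

# Advantage relative to drift for a tiny benefit (one gene) and a larger one.
ratio_single_gene = fixation_probability(1e-7, POPULATION_SIZE) / neutral
ratio_large_benefit = fixation_probability(1e-3, POPULATION_SIZE) / neutral

print(f"neutral deletion:         {neutral:.2e}")
print(f"tiny benefit (s = 1e-7):  {ratio_single_gene:.2f} x neutral")
print(f"large benefit (s = 1e-3): {ratio_large_benefit:.0f} x neutral")
```

In this sketch a deletion with a very small benefit fixes hardly more often than a neutral one, so its fate is governed mostly by drift, whereas a deletion with a substantial benefit fixes far more often than drift alone would allow. That contrast is the sense in which selection acting on isolated, individually minor losses is inefficient.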

An additional issue is that it is only via deletions that DNA is actually lost, thereby providing potential metabolic gains in terms of requirements for DNA replication and maintenance. Many mutations that are detrimental to gene function are not deletions, however, and so do nothing to reduce the burden of carrying DNA. Further, deletion mutations themselves are not without cost to organisms since it is not only detrimental alleles that may be randomly lost to deletion.

Natural selection can act fairly efficiently if a gene's function has detrimental impacts on organism fitness while the same gene otherwise provides little or no utility in return. One such detriment occurs when genes are transcribed and their transcriptional products are subsequently translated. Mutation events that have the effect of reducing this metabolic burden should be fairly strongly selected (i.e., via positive/directional selection) if the resulting gene product otherwise is of little use. The kinds of mutations that can give rise to transcriptional or translational down-regulation, however, can be relatively rare since transcriptional or translational control sequences are relatively minor constituents of individual genes, plus gene expression can be controlled in association with that of other genes in prokaryotic systems. The result is that gene deletion events may be relatively less rare in comparison, resulting in loss of genes from genomes not so much because those genes are costly to possess but instead because they can be otherwise difficult to turn off transcriptionally. See Lawrence (2001) for further discussion of why bacteria tend to delete underutilized genes.