Given the idea of a core set of genes, one can sort genes into three or perhaps four categories, and even provide a biochemical and/or genetic basis for the categorization. The core genes, the first category, give rise to products that interact with a large number of other gene products within a cell. Modification of one of these genes, such as through horizontal gene transfer, would disrupt a large number of interactions and therefore would be less likely to be immediately beneficial (see the complexity hypothesis). Presumably ssu rRNA genes fall into this category since, as the RNA cores of the large RNA-protein ribosome complexes, their products interact physically with the products of multiple additional genes, i.e., ribosomal proteins as well as other rRNAs.
Interestingly, the phylogeny of core genes need not be equivalent to a "true phylogeny" for an organismal phylogeny to still exist. Nor must core genes be confined to whole organisms: they can also exist as horizontally transferred gene clusters that produce physically interacting gene products, such as one sees in the genomes of tailed bacteriophages (i.e., the capsid genes tend to cluster in these genomes). In both cases what is being described is a form of linkage disequilibrium that is driven by epistatic interactions. Indeed, one can go further and argue that the very idea of a species, at least in terms of the biological species concept, is more or less synonymous with the idea of genetic linkage of coadapted, that is, epistatically interacting, genes, whether those genes are found as a core component of whole organisms or instead as core components of horizontally transferable gene clusters.
The second category of genes consists of those that are found within an organism but for which epistatic interactions, perhaps especially physical interactions with other gene products (particularly protein-protein interactions), are less pronounced. These are the genes for which gene phylogenies can be readily constructed but which nonetheless are less well correlated with organismal phylogenies than are core gene phylogenies. That is, these are the genes that are more likely to be swapped in the course of horizontal gene transfer, i.e., that are more likely to be subject to orthologous replacement.
The third category of genes consists of those that may be swapped into organisms and that are retained because they provide a novel and useful function. Such acquisitions can include clusters of epistatically interacting genes or genes provided by plasmids. Specifically, these are genes that are not acquired via orthologous replacement but that instead, upon acquisition, represent new loci rather than simply new alleles.
A possible fourth category might also be considered, i.e., those genes that are retained because they are linked with newly acquired genes found in the proposed third category, above. Among these latter genes we might include genes that are carried by infectious or parasitic genetic elements such as plasmids or latently infecting viruses. Together these latter categories, i.e., three and four, can be described as accessory genes or the accessory genome. Note that this fourth category perhaps can be broadened to include all genes that do not supply a selective benefit to the carrying organism and that therefore can be thought of as hitchhiking alleles in a periodic selection sense.
Yet another way of thinking about the genes found within organisms, particularly as applied to bacteria, is the concept of the pan-genome (a.k.a. species genome or supragenome). The pan-genome consists of all of the genes that are found in association with a bacterial species. Leaving aside the problems of defining just what a bacterial species is, this superset of genes includes a combination of the various genes indicated above, with diversity increasing as one ranges from core genes to those genes that are in the process of degradation due to lack of use. The ideas expressed earlier in this section, however, tell a different story from that of pan-genomes. Specifically, a pan-genome provides a quantitative way of thinking about which gene types are found in different bacterial types, rather than explicitly about which genes are undergoing noticeable and/or difficult-to-detect gene exchange events. Thus, the utility of thinking about genes in terms of a pan-genome is the potential to think in terms of their frequency among the members of a species. Extended core genes in this case have functions that are present in something approaching 100% of the members of the species, whereas character genes provide the unique phenotypic characteristics associated with individual groupings of bacteria such as individual species. Accessory genes are found at much reduced prevalence and are typically associated with prophages, though more generally one perhaps can think of these as products of illegitimate recombination (category 3, above). Lastly, some genes are present at very low frequency within a species, perhaps in only a single sequenced strain. These, hypothetically, may include genes that do not contribute positively to bacterial fitness and therefore may be present in genomes only somewhat transiently (e.g., category 4, hitchhiking genes).
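The frequency-based view of the pan-genome described above can be illustrated with a minimal sketch in Python. It is not a published method: the strain names, gene labels, and numeric cut-offs are all hypothetical. The text gives no thresholds other than "something approaching 100%" for extended core genes, and it defines character genes by the phenotype they confer rather than by frequency, so treating them as a prevalence band here is purely an illustrative assumption.

```python
from collections import Counter

# Hypothetical cut-offs (assumptions, not taken from the text).
EXTENDED_CORE_MIN = 0.95  # "something approaching 100%" of strains
CHARACTER_MIN = 0.50      # illustrative band for character genes

# Toy presence/absence data: strain name -> set of gene labels (all made up).
TOY_STRAINS = {
    "strain_A": {"rrs", "gyrB", "lacZ", "int"},
    "strain_B": {"rrs", "gyrB", "lacZ"},
    "strain_C": {"rrs", "gyrB", "lacZ", "int", "orfX"},
    "strain_D": {"rrs", "gyrB", "lacZ"},
    "strain_E": {"rrs", "gyrB"},
}


def classify_genes(strains):
    """Assign each gene in a pan-genome to a frequency class.

    `strains` maps a strain name to the collection of genes it carries.
    Returns a dict mapping gene -> class label.
    """
    n_strains = len(strains)
    # Count the number of strains in which each gene occurs.
    counts = Counter(g for genes in strains.values() for g in set(genes))

    classes = {}
    for gene, count in counts.items():
        freq = count / n_strains
        if freq >= EXTENDED_CORE_MIN:
            classes[gene] = "extended core"
        elif freq >= CHARACTER_MIN:
            classes[gene] = "character"
        elif count > 1:
            classes[gene] = "accessory"
        else:
            # Present in only a single strain.
            classes[gene] = "rare"
    return classes


classes = classify_genes(TOY_STRAINS)
for gene in sorted(classes):
    print(gene, classes[gene])
```

Note that a classification of this kind records only how widely a gene is distributed; it says nothing about whether the gene arrived by orthologous replacement, as a new locus, or as a hitchhiker.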
Importantly, though, there is only incomplete overlap between these pan-genome ideas and those that try to consider horizontal gene transfer events more explicitly. Note also the idea of a "hologenome" as discussed by Zilber-Rosenberg and Rosenberg (2008): "The hologenome is defined as the sum of the genetic information of the host and its microbiota."
Table: Combining Consideration of Genomes with that of Horizontal Gene Transfer
| Horizontal gene transfer-based gene concept | Overlaps with… | Discussion |
| --- | --- | --- |
| Core | Organismal phylogenies, organismal classification, subset of pan-genome extended core, survival of recombinants only given high homology, physical interaction with numerous additional gene products, epistatic interactions, biological species concept as applied to bacteria, ssu rRNA genes, equivalent may not exist in phages or other viruses | May be relatively rare within bacterial genomes; recombination may not be that rare, but recognizing such recombination may be; replacement is conservative just as we see with sex in obligately sexual organisms; to some extent equivalent on an organismal level to co-adapted gene parts or co-adapted genes within metabolic pathways, which are much more easily acquired, swapped, or retained as wholes rather than as parts |
| Orthologously replaceable | Different evolutionary histories from core genes, homologous recombination but does not require as high homology, subject to identifiable orthologous replacement events, few or no physical interactions with other gene products, possibly fewer epistatic interactions, character genes but also potentially included among pan-genome extended core genes, mosaic evolution (particularly noticeable in phages and other viruses but also as seen in bacteria) | Replacement declines with reductions in homology because of a combination of reduced likelihood of homologous recombination and reduced likelihood of benefit relative to the replaced gene once acquired; these genes along with the core genes together make up those genes that are consistently present within a bacterial species (i.e., together these are the pan-genome core genes); these are the bulk of the genes found in phage genomes (with the caveat that many phage genes are transferred between phages as co-adapted groups rather than as individual genes) |
| New, beneficial loci | Different evolutionary histories from both core and orthologously replaceable genes, accessory genes, products of illegitimate recombination, zone of paralogy, prophage genes, plasmid genes, in terms of gene functions these make up the bulk of differences between strains making up individual bacterial species, includes unique genes, mosaic evolution (particularly noticeable in bacteria but also as seen in phages and other viruses) | These are a subset of the genes that belong to gene families that differ between different strains of the same bacterial species (that is, the beneficial subset, with the remainder representing hitchhiking genes); the non-core genes in the pan-genome of a given species; these genes make up the bulk of the rest of phage genomes, i.e., in addition to the orthologously replaceable genes |
| Hitchhiking | Pseudogenes, genome erosion, unique genes, genes acquired along with beneficial loci but which are not themselves beneficial, prophage genes that contribute to phage fitness but not to that of lysogens, parasitic (selfish) DNA, no longer useful genes, products of gene duplication events (paralogs), products of perhaps most loci-acquisition events as viewed prior to exposure to natural selection, maintained by drift | These genes do not contribute to the fitness of their carriers in the sense that their deletion will have no (or very little) negative fitness consequences; these genes are retained due to their linkage to beneficial alleles, just as one sees with periodic selection and hitchhiking in asexual organisms (though in this latter example the genes do not necessarily have zero fitness impact) |