Mutations come in a relatively large number of varieties (Table, below). They can be changes in individual nucleotides or instead modifications of multiple nucleotides. They may be products of chemical modification of bases or, alternatively, a consequence of errors in recombination. They can represent completely random changes to sequences or can result from processes that are less random, and single-nucleotide changes may be described as transitions versus transversions. Mutations may or may not impact gene reading frames, i.e., as read by ribosomes, or may not involve reading frames at all. They also can be detrimental, neutral, or even beneficial in their impact on organism fitness, that is, on the potential for bearers of specific mutations ultimately to reproduce. Here I provide a brief introduction especially to the molecular nature of mutational products, as considered from the perspective of modification of genotype information content. This perspective is taken in place of one that emphasizes where mutations come from, how they may be repaired, or how they may impact organism-level phenotype.
Individual DNA nucleotides come in four varieties. These varieties are distinguished in terms of their associated nitrogenous bases: guanine (G), adenine (A), cytosine (C), and thymine (T). In RNA, thymine is replaced by uracil (U). Both guanine and adenine are purines, which are double-ringed nitrogenous bases. C, T, and U, by contrast, are pyrimidines, which are single-ringed nitrogenous bases. Together a purine-pyrimidine pair, i.e., either C=G or T=A (or U=A in RNA), produces a rung in the nucleic acid double helix structure, one which has a proper width. (Note that the horizontal lines between bases, as found in the previous sentence, represent hydrogen bonds.)
It is easier for a purine to mutate into another purine (i.e., A → G or G → A), or for a pyrimidine to mutate into another pyrimidine (i.e., C → T, T → C, or equivalent with U for RNA), than it is for a purine to mutate to a pyrimidine (i.e., A or G → T or C) or for a pyrimidine to mutate to a purine (i.e., T or C → A or G). The former, purine-to-purine or pyrimidine-to-pyrimidine conversions, are called transitions. By contrast, purine-to-pyrimidine conversions, or vice versa, are called transversions. Since transitions generally are more likely than transversions in terms of random mutation, the spectrum of possible mutations for a single nucleotide, such as a T, may be described as T ▬► C, T → G, or T → A, where the bolder arrow, in T ▬► C, is intended to indicate that among possible mutations a transition is more likely. More generally, simply in terms of the chemistry of individual nucleotide bases, some mutations are more likely than others. Mutations thus are random events, though individual mutations nonetheless are not all equally probable. Indeed, in addition to differences in the probability of different types of mutations, there also exist different likelihoods of mutation between different organisms, between different environments, and between different regions of what otherwise is the same genome.
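The distinction between transitions and transversions can be expressed as a simple rule: a substitution is a transition if both bases belong to the same chemical class and a transversion otherwise. The following is a minimal Python sketch of that rule; the function name classify_substitution and the two set names are hypothetical ones introduced here for illustration only.

```python
# Base classes as described in the text: purines are double-ringed,
# pyrimidines are single-ringed.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution (hypothetical helper).

    Returns "transition" when both bases are purines or both are
    pyrimidines, and "transversion" when the classes differ.
    """
    ref, alt = ref.upper(), alt.upper()
    known = PURINES | PYRIMIDINES
    if ref not in known or alt not in known:
        raise ValueError(f"unknown base in {ref!r} -> {alt!r}")
    if ref == alt:
        raise ValueError("reference and mutant bases are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# The spectrum of possible mutations for a T, as in the text.
for alt in "CGA":
    print(f"T -> {alt}: {classify_substitution('T', alt)}")
```

Note that the sketch says nothing about how much more likely a transition is than a transversion; that bias varies and is not quantified here.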
The terms transition and transversion refer to point mutations, that is, modifications of individual nucleotides within a genome. Another category of point mutation consists of single-base insertions and deletions, which represent the gain or loss of single nucleotides rather than a change in type. While transitions and transversions have only limited potential to modify more than a single codon during translation, i.e., polymerization of amino acids into proteins, single-base insertions or deletions can give rise to profound as well as highly disruptive modifications of the amino acid sequences of proteins. The latter occurs because nucleotides in messenger RNA are read by ribosomes in groups of three, called codons, with no punctuation between them. Insertions or deletions as a consequence can literally change the reading frame of all downstream codons, resulting in what is known as a frameshift mutation.
Individual transitions and transversions can modify only a single nucleotide of the three making up a codon. As a consequence, they have much less potential to disrupt the subsequent (downstream) codons that define a gene's reading frame. Single-base deletions and insertions, by contrast, can convert what previously was a three-nucleotide codon into nucleotide strings of length two or four. The result is a shift in the downstream reading frame, as read by the ribosome, by one nucleotide. This shift in turn modifies all downstream codons and therefore all downstream amino acids placed by the ribosome into the resulting polypeptide/protein. The effect is equivalent to shifting the starting letters of words in sentences such that, for example, "See her run fast" is converted to "Seh err unf ast", that is, more often than not, to gibberish.
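The sentence analogy above can be reproduced directly, as a rough sketch of how a ribosome-like reader that takes letters three at a time, with no punctuation, responds to the loss of a single letter. The helper names read_in_triplets and delete_letter are hypothetical. Because "fast" has four letters, the unmutated sentence leaves a trailing "t" when read strictly in threes.

```python
def read_in_triplets(text: str) -> list[str]:
    """Ignore spaces and read the letters three at a time."""
    letters = text.replace(" ", "")
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


def delete_letter(text: str, index: int) -> str:
    """Remove the single letter at the given position (spaces ignored)."""
    letters = text.replace(" ", "")
    return letters[:index] + letters[index + 1:]


sentence = "See her run fast"

# Reading frame before the deletion.
print(" ".join(read_in_triplets(sentence)))

# Delete the second "e" (index 2); every downstream "word" changes.
mutant = delete_letter(sentence, 2)
print(" ".join(read_in_triplets(mutant)))
```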
Insertions or deletions within the reading frames of genes can be highly disruptive, and are described as frameshift mutations. An exception, however, occurs when insertions or deletions are made in multiples of three nucleotides. In this case there still is local disruption but much less downstream loss of information. Alternatively, if insertions or deletions are not found within the reading frames of genes, then whether the number of nucleotides involved is a multiple of three can be much less important.
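The rule that multiples of three preserve the downstream reading frame can be checked with a short Python sketch. The sequence used below is an arbitrary, made-up example rather than a real gene, and the helper names indel_effect and codons are hypothetical.

```python
def indel_effect(length: int, in_reading_frame: bool = True) -> str:
    """Describe an insertion or deletion of the given length in nucleotides."""
    if length <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    if not in_reading_frame:
        return "outside reading frame"
    return "in-frame" if length % 3 == 0 else "frameshift"


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, dropping any partial codon at the end."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Made-up coding sequence: ATG GCC AAA GGG TTT TAA
seq = "ATGGCCAAAGGGTTTTAA"

three_deleted = seq[:3] + seq[6:]  # remove the whole second codon (GCC)
one_deleted = seq[:3] + seq[4:]    # remove only the first G of GCC

print(indel_effect(3), codons(three_deleted))
print(indel_effect(1), codons(one_deleted))
```

With the three-nucleotide deletion, one codon is lost but the remaining codons are unchanged; with the single-nucleotide deletion, the codons downstream of the deletion are read from a shifted frame.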
Insertions or deletions also can be much larger than one or a few nucleotides, encompassing in some cases multiple genes. The loss of genes through deletion can be tolerated only if the functions involved are either redundant or otherwise unnecessary under specific circumstances (i.e., so-called deletable regions of genomes). The gain of one or multiple genes can result in additional metabolic burden, in terms of genome replication, transcription, or translation loads on organisms, or simply in new protein structures and functions being present within cells that can interfere with existing proteins and functions. Conversely, gene losses due to deletions can serve to relieve metabolic burdens associated with carrying as well as expressing those genes. (By metabolic burden, I mean physiological costs to an organism where the organism, for example, must expend more energy due to the presence or expression of a given gene or gene variant than would be the case absent that gene or gene variant. Alternatively, genes or gene variants exist that can lower metabolic burdens and/or compensate for whatever burden is imposed, either directly or indirectly.) Larger insertions typically arise through some form of molecular recombination, either homologous recombination involving DNA in which an insertion is already present or recombination involving little or no homology (so-called micro-homologous recombination or illegitimate recombination). So too, large deletions can stem from recombinative processes.
Extending these analogies, just as deletions represent a loss of genetic material along with potentially associated losses of organism functions, insertions may instead provide additional functions to organisms. These additional functions, however, must be accessory functions, just as deletable regions or functions by definition must not be essential to an organism under selecting conditions, since presumably a receiving, viable organism would already possess all of the genes it needs in order to survive within its normal environmental range. Thus, mutations involve either the gain or loss of information, and most likely the latter, while in either case ongoing organism functionality requires that those changes not result in the loss of "essential" functions (with the quotes here indicating that this is by definition, that is, that "essential" is being defined as something that viability is dependent upon).
Whether or not a mutation will impact the phenotype of an organism will depend upon the specific nature of the mutation as well as its location within an organism's genome. The susceptibility of a specific metabolic process to disruption by mutation in turn will be a function of the aggregate target sizes of the genes involved as well as the robustness of the process to disruption (robustness to mutation also can be described as low functional constraint on the part of processes or underlying molecular machinery, such as in terms of protein function; change thus can occur without substantially modifying functionality). The characteristics of the encoding genes also will contribute to the potential for a mutation to improve a function. The likelihood that such positive changes may occur can be viewed roughly as a combination of the potential for a mutation to occur within a specific gene or genes (i.e., target size) and the potential for the mutation to modify the gene (or genes) without grossly disrupting either the chemistry or the conformation of the resulting molecule or molecules (typically proteins). Mutations also can impact gene transcription promoters, gene reading frames, or a gene's ribosome binding sites, the latter as found in the resulting messenger RNA.
The degeneracy of codons, i.e., the fact that many amino acids are encoded by more than one codon, has the consequence that many mutations (termed synonymous substitutions, below) have no impact on amino acid sequence, that is, on protein primary structure (codon degeneracy, it should be noted, is the same thing as genetic code redundancy). Diploidy, polyploidy, and other forms of genetic redundancy also limit the phenotypic impact of many mutations, especially those that destroy function, e.g., thus forming recessive alleles. By contrast, mutations in these genetically redundant systems that either give rise to new functions or restore lost ones may be described as producing dominant alleles. Consideration of the dominance of alleles is much less relevant in microbial systems, however, since most of these organisms are haploid.
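Codon degeneracy can be made concrete by counting how many codons encode each amino acid. The following Python sketch assumes the standard genetic code, written with DNA letters and one-letter amino acid symbols ("*" marks a stop codon); other genetic codes exist and would give somewhat different counts. The names CODON_TABLE and degeneracy are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed), laid out in the conventional order:
# first, second, then third base, each cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (or per stop signal).
degeneracy = Counter(CODON_TABLE.values())

for aa, count in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(aa, count)
```

Amino acids with higher counts offer more opportunities for a nucleotide substitution to leave the encoded amino acid unchanged.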
Mutations such as those that give rise to synonymous substitutions, i.e., ones that give rise to no change at least in the structure or production of gene products, can be described as silent. In addition to synonymous substitutions, there are nucleotide-substitution mutations that lead to amino acid changes. These are termed missense mutations (which in turn give rise to nonsynonymous substitutions). Alternatively, substitution mutations can result in the exchange of an amino acid-encoding codon for one that codes instead for translation termination (i.e., a stop codon). These are described as nonsense mutations, which can result in significant protein disruption through polypeptide truncation. Frameshift mutations, too, can result in polypeptide truncation due to the presence of stop codons within what previously were alternative reading frames.
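The distinctions among synonymous, missense, and nonsense substitutions can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive tool: it assumes the standard genetic code written with DNA letters, it assumes the original codon encodes an amino acid rather than a stop, and the names CODON_TABLE and classify_codon_change are hypothetical ones introduced here.

```python
from itertools import product

# Standard genetic code (assumed); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution within one amino acid-encoding codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == "*":
        raise ValueError("original codon is a stop codon; not handled in this sketch")
    if ref_codon == alt_codon:
        raise ValueError("codons are identical; there is no substitution")
    if ref_aa == alt_aa:
        return "synonymous"  # silent with respect to amino acid sequence
    if alt_aa == "*":
        return "nonsense"    # new stop codon, truncated polypeptide
    return "missense"        # different amino acid, i.e., nonsynonymous


print(classify_codon_change("CTT", "CTC"))  # leucine to leucine
print(classify_codon_change("GAG", "GTG"))  # glutamate to valine
print(classify_codon_change("TGG", "TGA"))  # tryptophan to stop
```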
Nonsynonymous substitutions are mutations that give rise to a change in the amino acid encoded by a codon. Nonsynonymous substitutions within populations are often taken as an indication of positive selection, that is, of changes in genomes that are beneficial and which therefore tend to increase in frequency within populations. This assumption is made because, even though random changes in information are unlikely to be beneficial, only those changes that impact phenotype can possibly be beneficial, and mutations that change the amino acid sequence of polypeptides are more likely to impact phenotype than mutations that do not.
An important means by which a signature of positive selection is detected within genomes is by determining the ratio of nonsynonymous to synonymous substitutions (dN/dS), where nonsynonymous substitutions are assumed to be beneficial whereas synonymous substitutions are assumed to be neutral, that is, supplying neither a benefit nor a cost to an organism (below). These assumptions of course are oversimplifications since nonsynonymous substitutions can be neutral whereas synonymous substitutions can be beneficial. Nevertheless, on average nonsynonymous substitutions are taken as an indication that a population has been subject to positive selection while synonymous substitutions provide an indication of overall mutation or mutation-fixation rates within populations. Greater numbers of synonymous substitutions, in contrast to greater numbers of nonsynonymous substitutions, are taken as a signature of genetic drift.
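As a rough illustration of how such a ratio might be computed, the following Python sketch compares two aligned coding sequences codon by codon. It is a deliberately simplified, proportion-based calculation (pN/pS) and not a full dN/dS method: it assumes the standard genetic code, skips codons that differ at more than one position, and applies no correction for multiple substitutions at the same site. The two short sequences are made up, and all function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code (assumed); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon: str) -> float:
    """Fraction-weighted count of synonymous sites in one codon (0 to 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3  # one of three possible changes at this site
    return total


def pn_ps(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Return (pN, pS): differences per nonsynonymous and per synonymous site."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of three long")
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diff = sum(x != y for x, y in zip(a, b))
        if n_diff > 1:
            continue  # simplification: ignore codons with multiple differences
        s = (synonymous_sites(a) + synonymous_sites(b)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


# Made-up aligned sequences with one synonymous and one nonsynonymous difference.
p_n, p_s = pn_ps("ATGCTTGAGAAA", "ATGCTCGTGAAA")
print(f"pN = {p_n:.3f}, pS = {p_s:.3f}, pN/pS = {p_n / p_s:.3f}")
```

Normalizing by the number of available sites matters because most single-nucleotide changes within codons are nonsynonymous; raw counts alone would therefore overstate the relative frequency of nonsynonymous change.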