BioSystems 54 (1999) 47–64
Protein evolution drives the evolution of the genetic code and vice versa
Miguel A. Jiménez-Montaño a,b
a Innovationskolleg Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstrasse 43, D-10115 Berlin, Germany
b Departamento de Física y Matemáticas, Universidad de las Américas-Puebla, Sta. Catarina Mártir, 72820 Puebla, Mexico
Received 30 March 1999; received in revised form 9 July 1999; accepted 12 July 1999
Abstract
A model for the developmental pathway of the genetic code, grounded on group theory and the thermodynamics of codon–anticodon interaction, is presented. At variance with previous models, it takes into account not only the optimization with respect to amino acid attributes but also physicochemical constraints and initial conditions. A ‘simple-first’ rule is introduced after ranking the amino acids with respect to two current measures of chemical complexity. It is shown that a primeval code of only seven amino acids is enough to build functional proteins. It is assumed that these proteins drive the further expansion of the code. The proposed primeval code is compared with randomly generated surrogate codes and with another proposal for a primeval code found in the literature. The departures from the ‘universal’ code, observed in many organisms and cellular compartments, fit naturally in the proposed evolutionary scheme. A strong correlation is found between, on one side, the two classes of aminoacyl-tRNA synthetases and, on the other, the amino acids grouped by end-atom type and by codon type. An inverse of Davydov’s rules, to associate the amino acid end atoms (O/N and non-O/non-N) of 18 amino acids with codons containing a weak base (A/U), extended to the 20 amino acids, is derived. © 1999 Elsevier Science Ireland Ltd. All rights reserved.
Keywords: Protein evolution; Developmental pathway; Primeval code; Codon reassignments; Codon–anticodon interaction; Aminoacyl-tRNA synthetases
www.elsevier.com/locate/biosystems
“It is not the number of available signals but rather their distinguishability that matters in communication” (Schumacher, 1991).
1. Introduction
The problem of the origin and evolution of protein synthesis constitutes one of the major transitions in evolution, and it is far from being solved at the present time. In the sensible and acute statement of Smith and Szathmáry (1995), “The origin of the code is perhaps the most perplexing problem in evolutionary biology. The existing translational machinery is at the same time so complex, so universal, and so essential that it is hard to see how it could have come into existence, or how life could have existed without it”.

Tel.: +52-28-29-2676; fax: +52-28-29-2045. E-mail address: jimm@mail.udlap.mx (M.A. Jiménez-Montaño). PII: S0303-2647(99)00058-1
Several authors have approached this problem by building possible scenarios in which the genetic code could have originated; see the book by Smith and Szathmáry (1995) and the fast-growing literature on the RNA world (Gesteland and Atkins, 1993). Most of these models are concerned with the catalytic properties of ribozymes, RNA molecules assumed to play the role of enzymes in a primordial self-replicating system from which, it is presumed, the modern translational machinery originated. Another approach is the search for ancestors of transfer RNA (Eigen et al., 1989; Rodin et al., 1996).
The regularities of the codon catalogue were recognized from the very beginning (Sonneborn, 1965; Epstein, 1966; Goldberg and Wittes, 1966; Woese, 1967; Alff-Steinberg, 1969). However, the use of a not completely appropriate mathematical framework for their description had been the main obstacle to understanding the code’s possible evolution (Karasev and Sorokin, 1997). The customary expression ‘organization of the code’ really refers to two different problems: (i) the distribution of redundancy in the code, i.e. the ‘block structure’ of synonymous codons and the positions of the three stop codons (Goldman, 1993), and (ii) the amino acid assignments. The answer to the first question is independent of the answer to the second one. It depends on the codon–anticodon interaction energy (Jiménez-Montaño, 1994; Jiménez-Montaño et al., 1995) (see footnote 1).

The standard approach to these problems employs a three-dimensional sequence space of codons, equipped with a Hamming distance (the number of positions where the nucleotides differ in a codon pair), because, implicitly or explicitly, it is assumed that ‘minimum change’ is synonymous with single-nucleotide change (Goldberg and Wittes, 1966). Further, in a recent paper (Xia and Li, 1998) the authors say that “it has long been proposed that the genetic code might have been arranged in such a way as to reduce the effect of non-synonymous mutations involving single nucleotide changes”. This approach is inexact because not all single-nucleotide changes are equivalent: mutational transitions and transversions (complementary and non-complementary) are not equal, and the three positions in a codon have different thermodynamic stability and mutation frequency.
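The limitation of the plain Hamming distance can be illustrated with a minimal Python sketch. The function names and the labels for the kinds of substitution are illustrative assumptions, not part of the formalism developed in this paper; the sketch only shows that codon pairs at Hamming distance 1 can differ by chemically distinct kinds of change.

```python
# Illustrative sketch (names are assumptions, not the paper's notation).
PURINES = {"A", "G"}
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def hamming(codon1, codon2):
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon1, codon2))


def substitution_type(a, b):
    """Classify the single-nucleotide change a -> b."""
    if a == b:
        return "identity"
    # Purine <-> purine or pyrimidine <-> pyrimidine.
    if (a in PURINES) == (b in PURINES):
        return "transition"
    # Purine <-> pyrimidine between Watson-Crick partners.
    if COMPLEMENT[a] == b:
        return "complementary transversion"
    return "non-complementary transversion"


def describe(codon1, codon2):
    """List (position, substitution type) for every differing position."""
    return [
        (pos + 1, substitution_type(a, b))
        for pos, (a, b) in enumerate(zip(codon1, codon2))
        if a != b
    ]


# Three neighbours of GAU, all at Hamming distance 1.
for other in ("GAC", "GUU", "UAU"):
    print("GAU ->", other, hamming("GAU", other), describe("GAU", other))
```

All three neighbours are equidistant from GAU in the Hamming sense, yet they correspond to a transition in the third position, a complementary transversion in the second and a non-complementary transversion in the first.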
An appropriate mathematical framework to represent the code, as a six-dimensional Boolean hypercube, was first proposed by Jiménez-Montaño and De la Mora-Basañez (1992). This result was obtained by building on pioneering algebraic approaches by Danckwerts and Neubert (1975), Bertman and Jungck (1979) and Swannson (1984). The related early work of Rumer (1968) was unknown to the authors. More complete presentations of the formalism appeared in later publications (Jiménez-Montaño et al., 1995, 1996). Independently, Klump (1993) and Karasev and Sorokin (1997) made similar proposals, which illuminate different aspects of the problem. A comparison of the three geometrical representations of the code will be discussed in a forthcoming publication (Jiménez-Montaño and Klump, in preparation). In the present contribution I further develop this approach, grounded on thermodynamics and group theory, and propose a model for the evolution of the code.
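As a rough illustration of the hypercube idea, the Python sketch below maps each codon to a vertex of the six-dimensional Boolean hypercube by assigning two bits to each nucleotide. The particular bit assignment (first bit: pyrimidine or purine; second bit: weak or strong base) and all function names are assumptions made for this example; the encoding used in the cited publications may differ.

```python
from itertools import product

# Hypothetical two-bit encoding per base: (is_purine, is_strong).
BITS = {"U": (0, 0), "C": (0, 1), "A": (1, 0), "G": (1, 1)}


def codon_to_vertex(codon):
    """Concatenate the two-bit codes of the three bases into a 6-bit tuple."""
    return tuple(bit for base in codon for bit in BITS[base])


def hypercube_distance(codon1, codon2):
    """Number of hypercube edges separating the two codon vertices."""
    v1, v2 = codon_to_vertex(codon1), codon_to_vertex(codon2)
    return sum(x != y for x, y in zip(v1, v2))


codons = ["".join(p) for p in product("UCAG", repeat=3)]
vertices = {codon_to_vertex(c) for c in codons}

# The 64 codons occupy the 64 distinct vertices of the 6-cube.
print(len(codons), len(vertices))
# U -> C in the third position changes one bit.
print(hypercube_distance("GAU", "GAC"))
# G -> U in the first position changes two bits under this encoding.
print(hypercube_distance("GAU", "UAU"))
```

Under this assumed encoding, a single-nucleotide change corresponds to either one or two edges of the hypercube, so the representation separates changes that the nucleotide-level Hamming distance treats as equal.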
Besides the precursor–product relationships between amino acids or nucleotides in biosynthetic pathways (Wong, 1975, 1976; Dillon, 1978; Taylor and Coates, 1989; Jiménez-Sánchez, 1995), the only evidence we have from the time the code originated consists of ‘molecular fossils’. Among these is the structure of the ‘universal’ code itself (Woese, 1965, 1967; Taylor and Coates, 1989; Jiménez-Montaño, 1994), now supplemented with well-documented deviations (Jukes, 1990; Wolstenholme, 1992), which are far from random (Schultz and Yarus, 1996). Therefore, any theory to infer prior codes should be consistent with this extant evidence. As we shall see, our model satisfies this requirement. Moreover, by its very nature, our phenomenological approach has the advantage of circumventing the difficulties that have plagued the proposals on the origin of the genetic code (Cedegren and Miramontes, 1997).

Footnote 1: It is important to notice that in the present contribution we refer to the codon–anticodon interaction only in the strictest sense of the concept (Langerkvist, 1978). In a wider perspective, it is well known that other factors, such as the conformation of the whole tRNA molecule, are of great importance for the specificity of codon–anticodon recognition (Kurland et al., 1975).
As Eigen (1971) first pointed out, the origin of the code should have been the result of a highly non-linear selection process. Therefore, it was strongly dependent on initial conditions, but initial conditions cannot be derived from dynamics of any kind. It is an essential characteristic of the evolutionary process to involve a certain degree of contingency at one or more points. However, in our model the initial conditions are not assumed on the basis of the abundance of pre-biotically synthesized amino acids, nor on precursor–product relations in the biosynthetic pathways of pyrimidines (Jiménez-Sánchez, 1995). This is so because I do not appeal to prebiotic scenarios. From the point of view developed here, the question of the initial conditions is not the question of which ingredients appeared first, but the question of which were the first amino acids to be incorporated into a primordial code. I assume, as others did before, that ‘simple’ amino acids were introduced first. To have a comparison criterion, the amino acids are ranked according to two current measures to estimate chemical complexity: (i) the shortest description of structural formulas (Papentin, 1982), and (ii) the size/complexity score (Dufton, 1997); see Table 1.
The structure of the code suggests that it evolved following a minimum change coding pathway. The development of an already-working system should happen by changes in its least significant features, without disturbing the major lines of the system (Swannson, 1984). Apart from this general assumption, the only additional assumption I make is that the codon–anticodon Gibbs free-energy of interaction induces a partial order (see Eq. 1 below) in the set of codon classes. This partial order defines a ‘time arrow’, in the sense that specific classes (e.g. NRN, CGY, etc.) correspond to the various stages of the progressive differentiation of the code. These codon categories define amino acid groups: the amino acids belonging to a given category are the leaves from the node of that category in the developmental tree (Fig. 1). This approach gives a formal expression to pioneering ideas of Woese (1965, 1973) and coworkers (Woese et al., 1966), who envisioned a gradual development from a ‘simplest’ code. According to these authors, the first code was imprecise because the ancestors of transfer RNAs were only able to recognize classes of similar codons (an extreme form of wobble) and classes of similar amino acids. As aptly summarized by Haig and Hurst (1991), “in this view, the modern version of the code evolved through a gradual increase in the discrimination of tRNA
Table 1. Ranking of amino acids in isolation and in side-chains of proteins, according to two measures to estimate chemical complexity
Column headings: shortest description of amino acid (a); shortest description of side-chain (a); size/complexity score of amino acid (b)
G
c
G
c
G
c
A
c
A
c
A
c
D
c
V
c
S P
L –
C S
I S
V C
T N
K M
E K
P K
T D
c
L V
N D
c
Q T
E I
M N
Q L
F E
I R
Q H
Y R
H C
F R
F H
M Y
Y W
W W
a Papentin (1982).
b Dufton (1997).
c The assumed four amino acids that first entered the code.
Fig. 1. Proposed developmental pathway of the genetic code. (a) Starting partition into Y/R branches. (b) Pyrimidine branch. (c) Purine branch. In the first column appear amino acid reassignments in non-standard codes. The second and third columns show reassignments from randomly generated codes. The number of codons is given in brackets. (1) Amirnovin (1997); (2) Goldman (1993).
for specific amino acids and specific codons within these ancestral sets”.
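The codon classes mentioned above (NRN, CGY, and so on) are written with the standard IUPAC nucleotide codes, in which N stands for any base, R for a purine and Y for a pyrimidine. A minimal Python sketch, with illustrative function names, expands such a class into its member codons and counts them. The ordering of the classes in the developmental tree depends on the model's Eq. 1 and is not reproduced here.

```python
from itertools import product

# Standard IUPAC ambiguity codes for RNA bases (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",    # purine
    "Y": "CU",    # pyrimidine
    "S": "CG",    # strong
    "W": "AU",    # weak
    "N": "ACGU",  # any base
}


def expand(codon_class):
    """Return all concrete codons matched by a three-letter IUPAC class."""
    choices = (IUPAC[symbol] for symbol in codon_class)
    return ["".join(p) for p in product(*choices)]


# NYN is included only as the counterpart of NRN for illustration.
for cls in ("NRN", "NYN", "CGY"):
    members = expand(cls)
    print(cls, len(members), members[:4])
```

A class with many ambiguous positions covers many codons, while a class such as CGY covers only two; in this sense, moving from broad to narrow classes corresponds to increasing discrimination between codons.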
By comparing a primeval code, coding for only seven amino acids, with the five-letter alphabet employed by Riddle et al. (1997) to build a functional protein, it is shown that our proposed primeval code is enough to enable a primitive cell to produce functional proteins. These early proteins, in turn, drive the further evolution of the code. Similar simplified alphabets are obtained from randomly generated codes, under different optimization criteria, only after imposing the initial conditions assumed in the model. However, a different outcome is obtained with the primeval code proposed by Jiménez-Sánchez (1995), which assumes radically different initial conditions.
It is shown that the developmental pathway of the genetic code is compatible with proposals about the co-evolution of the code and amino acid synthetic pathways (Wong, 1975; Dillon, 1978; Taylor and Coates, 1989), without invoking non-testable assumptions about the temporal appearance of nucleotides or amino acids. The observed departures from the ‘universal’ code, found in many organisms and cellular compartments (Jukes, 1990; Wolstenholme, 1992), fit naturally in the proposed evolutionary scheme. They can be explained in a way consistent with the “ambiguous intermediate” theory of Schultz and Yarus (1996).
According to our model, not only does genomic evolution drive the evolution of the translation system (Andersson and Kurland, 1995), but the converse is also true: the evolution of the translation system drives genomic evolution. From this point of view, both the non-random nature of codon reassignments (they mainly occur between codons differing in one or two features) and the differences in codon usage between prokaryotes and eukaryotes (Klump and Maeder, 1991) reveal the different strategies followed by ancestral cells and extant organisms (see Discussion).
In conclusion, the proposed model provides a simple and coherent scheme for the development of a coding system. It differs from previous models in that it emphasizes the importance of physicochemical constraints and initial conditions in delimiting the possible developmental pathways.
2. The model