In conclusion, the proposed model provides a simple and coherent scheme for the development
of a coding system. It differs from previous models in that it emphasizes the importance of
physicochemical constraints and initial conditions in delimiting the possible developmental pathways.
2. The model
The model starts from the following established fact: the genetic code is the biochemical system
for gene expression. Therefore, the genetic code is both a physico-chemical and a communication
system. On the physical side, molecular recognition depends on complementary molecular surfaces
held together by weak interactions; on the informational side, a prerequisite for defining a code
is the concept of distinguishability. Both aspects of the code are equally important for understanding
its structure and evolution.
As mentioned above, in previous publications we have shown that the structure of the code (the
relationship between codons) may be represented as a six-dimensional Boolean hypercube
(Jiménez-Montaño et al., 1995, 1996). Accordingly, each base is determined by two independent
dichotomic variables: chemical type (purine–pyrimidine) and H-bonding (weak–strong). Each
codon corresponds to a node of the cube and is adjacent to six nodes representing the codons that
differ from it in a single property. Therefore, the hypercube simultaneously represents the whole set
of codons and the corresponding amino acids (and termination signal), and keeps track of which
codons are one-bit neighbors of each other (see Figs. 2–4 in Jiménez-Montaño et al., 1996).
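The hypercube representation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the assignment of 0 and 1 to purine/pyrimidine and weak/strong is an assumed convention chosen here, and the function names (`codon_to_vertex`, `vertex_to_codon`, `one_bit_neighbors`) are hypothetical. The sketch shows that the two-bits-per-base encoding is a bijection between the 64 codons and the vertices of the 6-cube, and that every codon has exactly six one-bit neighbors.

```python
from itertools import product

# Assumed bit convention (illustrative, not necessarily the paper's):
#   first bit  = chemical type: 0 = purine (A, G), 1 = pyrimidine (C, U)
#   second bit = H-bonding:     0 = weak (A, U),   1 = strong (C, G)
BASE_TO_BITS = {"A": (0, 0), "G": (0, 1), "U": (1, 0), "C": (1, 1)}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def codon_to_vertex(codon: str) -> tuple[int, ...]:
    """Map a codon to a vertex of the six-dimensional Boolean hypercube."""
    return tuple(bit for base in codon for bit in BASE_TO_BITS[base])


def vertex_to_codon(vertex: tuple[int, ...]) -> str:
    """Inverse of codon_to_vertex: read the vertex two bits (one base) at a time."""
    return "".join(BITS_TO_BASE[vertex[i:i + 2]] for i in range(0, 6, 2))


def one_bit_neighbors(codon: str) -> list[str]:
    """Return the six codons that differ from `codon` in a single property."""
    vertex = codon_to_vertex(codon)
    neighbors = []
    for i in range(6):
        flipped = list(vertex)
        flipped[i] ^= 1  # change exactly one dichotomic feature
        neighbors.append(vertex_to_codon(tuple(flipped)))
    return neighbors


# All 64 codons land on distinct vertices, so the encoding is a bijection.
all_codons = ["".join(p) for p in product("UCAG", repeat=3)]
assert len({codon_to_vertex(c) for c in all_codons}) == 64

print(one_bit_neighbors("AAA"))  # ['UAA', 'GAA', 'AUA', 'AGA', 'AAU', 'AAG']
```

Note that A and C differ in both features under any such encoding, so AAA and CAA are not one-bit neighbors.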
The distribution of redundancy depends on the local symmetries of different codons with respect
to the H-bonding categorization of the bases (Jiménez-Montaño et al., 1996; Zhang, 1997). For
example, the code separates into two almost identical codes, with 32 codons each, according to the
H-bonding character of the third base: NNW and NNS, where W = {A, U} are the weak and
S = {C, G} the strong bases. The symmetry is complete if allowance is made for two codon
reassignments, AUA: I → M and UGA: STOP → W, both of which have been observed in mitochondria.
These symmetries, in turn, have their physical origin in the Gibbs free energy of the codon–anticodon
interaction. Therefore, the two aspects of the code converge: it is the physical indistinguishability of
some codon–anticodon interaction energies that makes the codons synonymous, and the code
degenerate and redundant. This conclusion is supported by the thermodynamic measurements of
Klump and Maeder (1991). The thermodynamic approach to explaining the origin of the distribution
of redundancy in the genetic code has the advantage of being independent of microscopic
assumptions.
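The NNW/NNS symmetry can be checked directly against the standard code table. In this sketch we assume that "almost identical" means: each NNW codon is paired with the NNS codon obtained by flipping only the H-bonding property of its third base (U ↔ C, A ↔ G), and the two codons should carry the same meaning. The helper name `asymmetric_pairs` is hypothetical; the table is the standard genetic code, and the two reassignments are the ones listed in the text.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order (first, second, third base);
# '*' marks a termination (STOP) codon.
AA_STRING = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
             "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Flipping the H-bonding property of a weak third base gives the strong base
# of the same chemical type: U -> C (pyrimidines), A -> G (purines).
WEAK_TO_STRONG = {"U": "C", "A": "G"}


def asymmetric_pairs(code):
    """Return the (NNW, NNS) codon pairs whose assignments differ."""
    pairs = []
    for codon, meaning in code.items():
        if codon[2] in WEAK_TO_STRONG:  # codon belongs to the NNW half
            partner = codon[:2] + WEAK_TO_STRONG[codon[2]]
            if code[partner] != meaning:
                pairs.append((codon, partner))
    return pairs


print(asymmetric_pairs(STANDARD_CODE))  # [('UGA', 'UGG'), ('AUA', 'AUG')]

# Apply the two mitochondrial reassignments: AUA: I -> M, UGA: STOP -> W.
reassigned = dict(STANDARD_CODE, AUA="M", UGA="W")
print(asymmetric_pairs(reassigned))  # []
```

Under this pairing, the only asymmetries in the standard code are exactly the two codons mentioned in the text, and they disappear once the mitochondrial reassignments are applied.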
Explanations based on the wobble rules (Crick, 1966) imply the existence of modified bases,
which, in turn, require the existence of specific enzymes. Modified bases have an important effect
on the codon–anticodon interaction; for example, pseudouridine has a very strong stabilizing effect
on double-stranded base-pairing interactions when the modification is located within a base-paired
region (Davies et al., 1998). However, all such fine-tuning effects are most probably later refinements
of the translation apparatus and, for this reason, presumably played no role in primordial codes.
The discovery of ‘four-way’ wobble in mitochondria led to a revision of the wobble rules
(Heckman et al., 1980; Jukes, 1990).
Goldberg and Wittes (1966) had already noticed that “for the different codon sets the base
composition and the extent of degeneracy are closely related”. They further observed that codon sets
with maximum GC content are fourfold degenerate, all those lacking GC are twofold degenerate, and
the sets with an intermediate GC content have an average degeneracy of about three. From these
observations they made the following important remark: “the basis of this correlation between GC
content and degeneracy is not clear, but a crucial factor is probably the additional hydrogen bond
linking GC pairs, as compared with AT pairs. The importance of GC content for thermal stability of
the DNA helix is well known. Possibly the GC content of the triplet set may actually determine
the degeneracy, the greater the affinity of GC pairs obviating reading of the third nucleotide in
those triplets containing only GC in the first two places”.
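The correlation reported by Goldberg and Wittes can be reproduced approximately from the standard code table. The measure used below is our own assumption, not necessarily their definition: the degeneracy of a doublet box XYN is taken as 4 divided by the number of distinct meanings (amino acids or STOP) in that box, and boxes are grouped by the GC count of their first two bases. The helper names are hypothetical.

```python
from itertools import product
from statistics import mean

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks STOP.
AA_STRING = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
             "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}


def box_degeneracy(doublet):
    """Assumed measure: 4 codons divided by the number of distinct meanings in the box."""
    meanings = {STANDARD_CODE[doublet + base] for base in BASES}
    return 4 / len(meanings)


def gc_count(doublet):
    """Number of G or C bases among the first two positions."""
    return sum(base in "GC" for base in doublet)


# Group the 16 doublet boxes by GC content (0, 1 or 2).
by_gc = {0: [], 1: [], 2: []}
for first, second in product(BASES, repeat=2):
    doublet = first + second
    by_gc[gc_count(doublet)].append(box_degeneracy(doublet))

for gc, values in by_gc.items():
    print(gc, round(mean(values), 2))
# 0 2.0
# 1 2.92
# 2 4.0
```

With this measure, the WW boxes average exactly 2, the SS boxes exactly 4, and the mixed boxes about 2.92, in line with the “about three” quoted above.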
This conjecture was investigated experimentally by Lagerkvist's group (Lagerkvist, 1978;
Samuelsson et al., 1980), who postulated a “two out of three” reading: under the conditions of in vitro
protein synthesis, a codon can be read by recognition of only its first two nucleotides, the third
position of the codon being disregarded. These authors demonstrated their hypothesis only for codons
of the SSN class. Jiménez-Montaño et al. (1996) suggested a generalization of the hypothesis,
based on the group-theoretical analysis of codon doublets by Danckwerts and Neubert (1975).
The main result was a classification of the codons of the ‘mixed-type’ classes WSN and SWN,
with respect to the sets $M_1$ and $M_2$ of fourfold and less-than-fourfold degenerate doublets,
respectively. It was shown in that paper that the third-base degeneracy of a codon does not depend
on the exact base at the first position, but only on its H-bond character. Hasegawa and Miyata
(1980) also underlined the importance of the codon–anticodon interaction energy for understanding
the pattern of degeneracy. These authors noticed a strong correlation between codon composition
and the molecular weight of the coded amino acid. The further correlation between molecular weight
and the size/complexity score employed here (Table 1) has been fully discussed by Dufton (1997).
Thus, our results extend their previous findings.
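The first-position claim can be checked against the standard code. The sketch below builds the two doublet sets by direct counting (a doublet is placed in $M_1$ if all four codons XYN have the same meaning, otherwise in $M_2$); this is a plain enumeration, not the group-theoretical analysis of Danckwerts and Neubert. It then verifies that replacing the first base by the other base of the same H-bond class (A ↔ U, C ↔ G) never moves a doublet between the sets. All helper names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks STOP.
AA_STRING = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
             "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Swap a base for the other base with the same H-bond character
# (weak: A <-> U, strong: C <-> G); only the chemical type changes.
SAME_HBOND_PARTNER = {"A": "U", "U": "A", "C": "G", "G": "C"}


def is_fourfold(doublet):
    """True if all four codons sharing this doublet have the same meaning."""
    return len({STANDARD_CODE[doublet + base] for base in BASES}) == 1


doublets = ["".join(p) for p in product(BASES, repeat=2)]
M1 = sorted(d for d in doublets if is_fourfold(d))      # fourfold degenerate
M2 = sorted(d for d in doublets if not is_fourfold(d))  # less than fourfold

# Membership should not change when the first base is swapped within its H-bond class.
invariant = all(
    is_fourfold(d) == is_fourfold(SAME_HBOND_PARTNER[d[0]] + d[1]) for d in doublets
)

print("M1:", M1)  # ['AC', 'CC', 'CG', 'CU', 'GC', 'GG', 'GU', 'UC']
print("M2:", M2)  # ['AA', 'AG', 'AU', 'CA', 'GA', 'UA', 'UG', 'UU']
print("first-position invariance:", invariant)  # True
```

All four SSN doublets fall in $M_1$, which is consistent with the “two out of three” reading demonstrated for that class.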
As already mentioned, the structure of the code suggests that it evolved following a course of
minimal differentiation in diversifying its objects. In the context of the formalism we are employing,
this means changing a single distinctive feature of the codon at a time. From this assumption, a
dynamical evolutionary pattern of the code emerges naturally, envisioned as a refinement
process.
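A refinement process of this kind can be illustrated as successive partitions of the 64 codons, where each step makes one more binary feature distinguishable and therefore splits every existing class into two. This is only a sketch under stated assumptions: the bit encoding is an illustrative convention, and the order in which features are introduced (`FEATURE_ORDER`) is purely hypothetical. It is not the pathway proposed in this paper.

```python
from itertools import product

# Illustrative 2-bit encoding per base (assumed convention):
#   first bit: 0 = purine (A, G), 1 = pyrimidine (C, U)
#   second bit: 0 = weak (A, U), 1 = strong (C, G)
BASE_TO_BITS = {"A": (0, 0), "G": (0, 1), "U": (1, 0), "C": (1, 1)}
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]


def codon_bits(codon):
    """Six binary features of a codon (two per base)."""
    return tuple(bit for base in codon for bit in BASE_TO_BITS[base])


# Hypothetical order in which the six features become distinguishable,
# given as indices into the 6-bit vector; chosen purely for illustration.
FEATURE_ORDER = [1, 0, 3, 2, 5, 4]


def refinement_stages(order):
    """Return the partition of the codons after 0, 1, ..., 6 distinguished features."""
    stages = []
    for k in range(len(order) + 1):
        classes = {}
        for codon in CODONS:
            bits = codon_bits(codon)
            # Codons are indistinguishable if they agree on the first k features.
            key = tuple(bits[i] for i in order[:k])
            classes.setdefault(key, []).append(codon)
        stages.append(list(classes.values()))
    return stages


stages = refinement_stages(FEATURE_ORDER)
print([len(partition) for partition in stages])  # [1, 2, 4, 8, 16, 32, 64]
```

Each stage refines the previous one, and the two classes produced by every split differ in exactly one feature, which is one way of reading “changing a single distinctive feature of the codon at a time”.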
3. The group-theoretical foundation