Biosystems, Vol. 54, Issues 1-2, 1999

In conclusion, the proposed model provides a simple and coherent scheme for the development of a coding system. It differs from previous models in that it emphasizes the importance of physicochemical constraints and initial conditions, to delimit the possible developmental pathways.

2. The model

The model starts from the following established fact: the genetic code is the biochemical system for gene expression. Therefore, the genetic code is both a physico-chemical and a communication system. On the physical side, molecular recognition depends on complementary molecular surfaces by means of weak interactions; on the informational side, a prerequisite for defining a code is the concept of distinguishability. Both aspects of the code are equally important for understanding its structure and evolution.

As mentioned above, in previous publications we have shown that the structure of the code (the relationship between codons) may be represented as a six-dimensional Boolean hypercube (Jiménez-Montaño et al., 1995, 1996). Accordingly, each base is determined by two independent dichotomic variables: chemical type (purine/pyrimidine) and H-bonding (weak/strong). Each codon corresponds to a node in the cube, and it is adjacent to six nodes representing codons that differ in a single property. Therefore, the hypercube simultaneously represents the whole set of codons and the corresponding amino acids and termination signal, and keeps track of which codons are one-bit neighbors of each other (see Figs. 2-4 in Jiménez-Montaño et al., 1996).

The distribution of redundancy depends on the local symmetries of different codons with respect to the H-bonding categorization of the bases (Jiménez-Montaño et al., 1996; Zhang, 1997). For example, the code separates into two almost identical codes, with 32 codons each, according to the hydrogen bonding of the third base: NNW and NNS, where W (A, U) denotes the weak bases and S (C, G) the strong bases. The symmetry is complete if allowance is made for two codon reassignments, AUA: I → M and UGA: STOP → W, both of which have been observed in mitochondria. These symmetries, in turn, have their physical origin in the codon-anticodon Gibbs free energy of interaction.
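The hypercube picture and the NNW/NNS symmetry described above can be checked with a short Python sketch. The bit assignment for the bases (0 = pyrimidine or weak, 1 = purine or strong) and all function names are assumptions chosen for illustration, not the encoding used in the cited papers; the codon table is the standard genetic code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the termination signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed bit assignment (illustrative, not taken from the paper):
# (chemical type, H-bonding) with 0 = pyrimidine / weak, 1 = purine / strong.
BASE_BITS = {"U": (0, 0), "C": (0, 1), "A": (1, 0), "G": (1, 1)}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def codon_to_node(codon):
    """Map a codon to its 6-bit node of the Boolean hypercube."""
    return tuple(bit for base in codon for bit in BASE_BITS[base])


def node_to_codon(node):
    """Inverse of codon_to_node: read the six bits two at a time."""
    return "".join(BITS_BASE[node[i:i + 2]] for i in (0, 2, 4))


def neighbors(codon):
    """The six codons that differ from `codon` in exactly one property."""
    node = codon_to_node(codon)
    return [node_to_codon(node[:i] + (1 - node[i],) + node[i + 1:])
            for i in range(6)]


def third_base_mismatches(code):
    """NNW codons whose NNS partner (U->C, A->G) has a different meaning."""
    partner = {"U": "C", "A": "G"}  # same chemical type, other H-bond class
    return [(d + w, d + s)
            for d in map("".join, product(BASES, repeat=2))
            for w, s in partner.items()
            if code[d + w] != code[d + s]]


print(neighbors("AUG"))
# ['UUG', 'GUG', 'AAG', 'ACG', 'AUC', 'AUA']
print(third_base_mismatches(STANDARD))
# [('UGA', 'UGG'), ('AUA', 'AUG')]

# The two reassignments mentioned in the text remove both mismatches.
reassigned = {**STANDARD, "AUA": "M", "UGA": "W"}
print(third_base_mismatches(reassigned))
# []
```

Under this assumed encoding, flipping the chemical-type bit exchanges A with U or C with G, and flipping the H-bond bit exchanges A with G or U with C; the exchanges A/C and G/U change both bits and are therefore two steps apart on the cube.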
Therefore, the two aspects of the code converge: it is the physical indistinguishability of some codon-anticodon interaction energies that makes the codons synonymous, and the code degenerate and redundant. This conclusion is supported by thermodynamic measurements made by Klump and Maeder (1991). The thermodynamic approach to explaining the origin of the distribution of redundancy in the genetic code has the advantage of being independent of microscopic assumptions.

The explanations based on wobble rules (Crick, 1966) imply the existence of modified bases, which, in turn, require the existence of specific enzymes. Modified bases have an important effect on the codon-anticodon interaction; for example, pseudouridine has a very strong stabilizing effect on double-stranded base-pairing interactions when the modification is located within a base-paired region (Davies et al., 1998). However, all fine-tuning effects are most probably later refinements of the translation apparatus and, for this reason, presumably did not play any role in primordial codes. The discovery of ‘four-way’ wobble (Jukes, 1990) in mitochondria led to a revision of the wobble rules (Heckman et al., 1980).

Goldberg and Wittes (1966) had already noticed that ‘‘for the different codon sets the base composition and the extent of degeneracy are closely related’’. Furthermore, they observed that codon sets with maximum GC content are four-times degenerate, all those lacking GC are twice degenerate, and the sets with an intermediate GC content have an average degeneracy of about three. From these observations they made the following important remark: ‘‘the basis of this correlation between GC content and degeneracy is not clear, but a crucial factor is probably the additional hydrogen bond linking GC pairs, as compared with AT pairs. The importance of GC content for thermal stability of the DNA helix is well known.
Possibly the GC content of the triplet set may actually determine the degeneracy, the greater the affinity of GC pairs obviating reading of the third nucleotide in those triplets containing only GC in the first two places’’.

This conjecture was investigated experimentally by Langerkvist’s group (Langerkvist, 1978; Samuelsson et al., 1980), who postulated a ‘‘two out of three reading’’: under the conditions of in vitro protein synthesis, a codon can be read by recognition of only its first two nucleotides, the third position of the codon being disregarded. These authors proved their hypothesis only for codons of the SSN class. Jiménez-Montaño et al. (1996) suggested a generalization of the hypothesis, based on the group-theoretical analysis of codon doublets made by Danckwerts and Neubert (1975). The main result was a classification of the codons of ‘mixed type’ (classes WSN and SWN) with respect to the sets M1 and M2 of four-fold and less than four-fold degenerate doublets, respectively. It was shown in that paper that the third-base degeneracy of a codon does not depend on the exact base at the first position, but only on its H-bond character.

Hasegawa and Miyata (1980) also underlined the importance of the codon-anticodon interaction energy for understanding the pattern of degeneracy. These authors noticed a strong correlation between codon composition and the molecular weight of the coded amino acid. The further correlation, between molecular weight and the size/complexity score employed here (Table 1), has been fully discussed by Dufton (1997). Thus, our results extend the earlier finding of Hasegawa and Miyata.

As already mentioned, the structure of the code suggests that it evolved following a course of minimal differentiation to diversify objects. In the context of the formalism we are employing, this means changing a single distinctive feature of the codon at a time.
From this assumption a dynamical evolutionary pattern of the code emerges naturally, envisioned as a refinement process.
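
The relation between doublet composition and degeneracy discussed in this section can be checked against the standard genetic code with a minimal Python sketch. As a simplifying assumption, each doublet (the first two bases) is classified only as four-fold degenerate or not, which is coarser than the average degeneracy quoted from Goldberg and Wittes; the function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the termination signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

STRONG = set("CG")
DOUBLETS = ["".join(p) for p in product(BASES, repeat=2)]
# Other base of the same H-bond class (A<->U weak, C<->G strong).
SAME_HBOND = {"A": "U", "U": "A", "C": "G", "G": "C"}


def is_fourfold(doublet, code=STANDARD):
    """True if all four third bases give the same meaning."""
    return len({code[doublet + b] for b in BASES}) == 1


def fourfold_by_gc(code=STANDARD):
    """Map GC count of the doublet -> (four-fold doublets, all doublets)."""
    tally = {0: [0, 0], 1: [0, 0], 2: [0, 0]}
    for d in DOUBLETS:
        gc = sum(b in STRONG for b in d)
        tally[gc][0] += is_fourfold(d, code)
        tally[gc][1] += 1
    return {gc: tuple(t) for gc, t in tally.items()}


def first_base_invariant(code=STANDARD):
    """True if swapping the first base within its H-bond class never
    changes whether the doublet is four-fold degenerate."""
    return all(
        is_fourfold(d, code) == is_fourfold(SAME_HBOND[d[0]] + d[1], code)
        for d in DOUBLETS
    )


print(fourfold_by_gc())        # {0: (0, 4), 1: (4, 8), 2: (4, 4)}
print(first_base_invariant())  # True
```

With this classification, all four SS doublets are four-fold degenerate, none of the four WW doublets is, and half of the eight mixed doublets are, which is in line with the SSN result and with the dependence on the H-bond character of the first base stated above.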

3. The group-theoretical foundation