The rules of long DNA‐sequences and tetra‐groups of oligonucleotides

Sergey V. Petoukhov
Head of Laboratory of Biomechanical System, Mechanical Engineering Research Institute of the Russian Academy of Sciences, Moscow
spetoukhovgmail.com, http:petoukhov.com

Comment: Some materials of this article were presented by the author in the keynote speech at the congress on energy and information medicine «Energiemedizin 2017», 10‐11 June 2017, Bad Soden, Germany, http:dgeim.deenergiemedizin‐kongress‐2017 and in the keynote speech at the international conference “Artificial Intelligence, Medical Engineering, Education”, Moscow, Russia, 21‐23 August 2017, http:www.ruscnconf.orgaimee2017index.html .

Abstract. The article is devoted to hidden symmetries in long sequences of oligonucleotides of single stranded DNA. The notions of tetra‐groups of oligonucleotides, and of collective frequencies and collective probabilities of members of the tetra‐groups, are introduced to study hidden tetra‐group regularities in these sequences. Each such tetra‐group contains 4 members, each of which combines 4^(n‐1) oligonucleotides of the same length n in accordance with certain attributes of these oligonucleotides. Results of a comparative analysis of the collective probabilities of separate members of the tetra‐groups are shown for a representative set of long nucleotide sequences. These results provide evidence in favor of the existence of the supposed tetra‐group rules of oligonucleotides in single stranded DNA, in addition to the second Chargaff’s parity rule. An algebraic approach to modeling the described genetic phenomena is proposed.

Key words. Chargaff’s rules, symmetry, long nucleotide sequence, tetra‐group of oligonucleotides, probabilities, tensor product.

1. Introduction

Two Chargaff’s parity rules are well known in genetics. They are important because they point to a kind of “grammar of biology” (these words were used by E. Chargaff in the title of his article [Chargaff, 1971]): a set of hidden rules that govern the structure of DNA. The first Chargaff’s parity rule states that in any double‐stranded DNA segment, the numbers of occurrences (frequencies) of adenine A and thymine T are equal, and so are the frequencies of cytosine C and guanine G [Chargaff, 1951, 1971]. The rule was an important clue that J. Watson and F. Crick used to develop their model of the double helix structure of DNA.

The second Chargaff’s parity rule states that both A ≅ T and G ≅ C are approximately valid in single stranded DNA for long nucleotide sequences. Many works of different authors are devoted to confirmations and discussions of this second Chargaff’s rule [Albrecht‐Buehler, 2006, 2007; Baisnee, Hampson, Baldi, 2002; Bell, Forsdyke, 1999; Chargaff, 1971, 1975; Dong, Cuticchia, 2001; Forsdyke, 1995, 2002, 2006; Forsdyke, Bell, 2004; Mitchell, Bridge, 2006; Okamura, Wei, Scherer, 2007; Prabhu, 1993; Rapoport, Trifonov, 2012; Sueoka, 1995; Yamagishi, Herai, 2011]. Originally, Chargaff’s second parity rule (CSPR) was meant to apply only to mononucleotide frequencies, that is, to the quantities of monoplets in single stranded DNA. “But, it occurs that oligonucleotide frequencies follow a generalized Chargaff’s second parity rule (GCSPR) where the frequency of an oligonucleotide is approximately equal to its complement reverse oligonucleotide frequency [Prabhu, 1993]. This is known in the literature as the Symmetry Principle” [Yamagishi, Herai, 2011, p. 2]. The work [Prabhu, 1993] shows that the Symmetry Principle holds in long DNA‐sequences for complementary reverse n‐plets at least with n = 2, 3, 4, 5. In the literature, a few synonyms of the term n‐plets are used: n‐tuples, n‐words or n‐mers.
These parity rules, including the generalized Chargaff’s second parity rule for n‐plets in long nucleotide sequences, concern the equality of frequencies of two separate mononucleotides or two separate oligonucleotides, for example: the equality of frequencies of adenine and thymine; the equality of frequencies of the doublet CA and its complement‐reverse doublet TG; the equality of frequencies of the triplet CAT and its complement‐reverse triplet ATG, etc. By contrast, to study hidden symmetries in long sequences of oligonucleotides of single stranded DNA, we apply a comparative analysis of equalities not for frequencies of separate oligonucleotides but for aggregated (collective) frequencies of oligonucleotides from separate members of certain tetra‐groups. Below we explain the notion of these tetra‐groups of oligonucleotides and present new tetra‐group rules of long sequences of oligonucleotides for single stranded DNA.
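
As an illustration of the generalized second parity rule recalled in this section, the following minimal Python sketch compares the frequency of each n‐plet in a single strand with the frequency of its complement‐reverse n‐plet. It is not the procedure used in the cited works: the function names (reverse_complement, nplet_counts, parity_table) are hypothetical, the sequence is assumed to contain only the uppercase letters A, C, G, T, and n‐plets are counted with an overlapping sliding window of step 1, which is an assumption of this sketch.

```python
from collections import Counter

# Watson-Crick complement map: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(oligo: str) -> str:
    """Return the complement-reverse n-plet, e.g. CA -> TG, CAT -> ATG."""
    return oligo.translate(COMPLEMENT)[::-1]


def nplet_counts(seq: str, n: int) -> Counter:
    """Count n-plets with an overlapping window of step 1 (assumption)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def parity_table(seq: str, n: int) -> dict:
    """Map each observed n-plet to a pair:
    (its relative frequency, relative frequency of its complement-reverse).
    Under the generalized second parity rule, the two numbers are expected
    to be approximately equal for long sequences.
    """
    counts = nplet_counts(seq, n)
    total = sum(counts.values())
    return {
        oligo: (count / total, counts[reverse_complement(oligo)] / total)
        for oligo, count in counts.items()
    }


if __name__ == "__main__":
    # Toy sequence for demonstration only; the rule concerns long sequences.
    toy = "CATGCATGCATG"
    for oligo, (f, f_rc) in sorted(parity_table(toy, 2).items()):
        print(oligo, reverse_complement(oligo), round(f, 3), round(f_rc, 3))
```

For a real check, the toy string would be replaced by a long single‐stranded sequence, since the rule is stated only as an approximate equality for long sequences.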

2. Tetra‐groups of oligonucleotides and probabilities of tetra‐group members