DNA and Group theory - Part III

Here’s a question that sits underneath a lot of mathematical biology, usually unasked: why this mathematical object and not another? Why differential equations for population dynamics, why information theory for neural coding, why networks for protein interactions? The answer is rarely arbitrary. The best mathematical frameworks don’t get imposed on biological systems from outside. They get recognized, pulled out of structure that was already there, waiting for the right language.

The pairing of group theory and DNA is one of the cleaner examples of this recognition. I want to explain why.

A group, in the mathematical sense, is a set of transformations that can be composed and reversed, and that includes doing nothing at all. The integers under addition form a group. Rotations of a sphere form a group. What groups measure, in the deepest sense, is symmetry: the structure of what stays the same when something changes.

This turns out to be exactly the right question to ask about DNA.

Consider what every molecular biologist knows about the four nucleotides, A, T, G, C. They are not an unstructured alphabet. They come with relationships baked in.

Chargaff’s rules: in double-stranded DNA, the frequency of A equals the frequency of T, and the frequency of G equals the frequency of C. Watson-Crick base pairing explains why: A pairs with T and G pairs with C. This is not a coincidence of chemistry. It reflects a pairing symmetry, a map that sends each base to its complement:
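To make the link between pairing and base frequencies concrete, here is a minimal sketch. It builds the second strand of a duplex from the first by complementation and counts bases across both strands. The function names and the toy sequence are made up for illustration; nothing here is real sequence data.

```python
from collections import Counter

# Watson-Crick pairing written as a map on single bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[b] for b in reversed(strand))


def duplex_counts(strand):
    """Count bases across both strands of the duplex built from `strand`."""
    return Counter(strand + reverse_complement(strand))


# Arbitrary toy sequence (an assumption for illustration, not real data).
counts = duplex_counts("AAGCGTTAC")

# Every A on one strand faces a T on the other, and likewise for G and C,
# so the duplex-wide counts must match pairwise.
chargaff_holds = counts["A"] == counts["T"] and counts["G"] == counts["C"]
```

The equality holds for any input strand, because the counts are taken over a duplex that was constructed by pairing.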

A ↔ T, G ↔ C

Apply this map twice and you return to where you started. That is already a group element, an involution, a symmetry of order two.

Then there is the purine/pyrimidine distinction. Adenine and guanine are purines, double-ringed. Cytosine and thymine are pyrimidines, single-ringed. This gives us another partition, another symmetry:

{A, G} ↔ {C, T}

And finally, the mutation structure of DNA distinguishes transitions from transversions. Transitions swap bases within the same chemical class, A to G, C to T. Transversions swap across classes. This is a third axis of symmetry running through the four-letter alphabet.

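Three involutions. Three Z₂ symmetries. Written out as maps on {A, T, G, C}, they are complementation (A ↔ T, G ↔ C), the transition swap (A ↔ G, C ↔ T), and the remaining transversion swap (A ↔ C, G ↔ T), which is what you get by applying the first two in succession. Any two of them generate the third, and together with the identity they form a group of four elements. Mathematicians call this the Klein four-group, written V₄. It has a simple multiplication table, no element of order greater than two, and the property that every element is its own inverse.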
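The group structure can be checked directly. The sketch below represents each symmetry as a permutation of the four bases, composes them, and builds the full multiplication table. The element names ("e", "c", "t", "ct") and the helper functions are labels chosen for this illustration.

```python
from itertools import product

BASES = "ACGT"

# Each symmetry is a permutation of the four bases, stored as a dict.
IDENTITY = {b: b for b in BASES}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}  # Watson-Crick pairing
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}  # swap within each chemical class


def compose(f, g):
    """Return the permutation 'apply g first, then f'."""
    return {b: f[g[b]] for b in BASES}


# The third swap is the product of the first two: A<->C, G<->T.
THIRD = compose(COMPLEMENT, TRANSITION)

GROUP = {"e": IDENTITY, "c": COMPLEMENT, "t": TRANSITION, "ct": THIRD}


def name_of(perm):
    """Return the label of `perm`; raise if it is not one of the four elements."""
    for name, p in GROUP.items():
        if p == perm:
            return name
    raise ValueError("permutation is not in the group")


# Multiplication table: table[(x, y)] is the label of x composed with y.
# Building it without a ValueError already shows the set is closed.
table = {
    (x, y): name_of(compose(GROUP[x], GROUP[y]))
    for x, y in product(GROUP, repeat=2)
}

# Every element is its own inverse, and the order of composition does not matter.
all_involutions = all(table[(x, x)] == "e" for x in GROUP)
is_abelian = all(table[(x, y)] == table[(y, x)] for x, y in product(GROUP, repeat=2))
```

Four elements, all of order at most two, commuting with one another: that is the Klein four-group.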

V₄ did not need to be invented for DNA. It was already there, encoded in Chargaff’s rules, in the purine/pyrimidine split, in the transition/transversion distinction. These are biological facts that every textbook records. The group is just what you get when you ask what all three facts have in common, stated precisely.

Wigner wrote about the unreasonable effectiveness of mathematics, the strange fact that structures developed with no empirical application in mind turn out to describe reality with uncanny precision. Group theory was born in the early nineteenth century, in Galois’s work on polynomial equations. It later became the language of crystallography, then particle physics. The symmetry groups of elementary particles were not fitted to the data after the fact. They were found in the data, the way V₄ is found in DNA: as the precise name for structure that was already there.

The reason this keeps happening is that groups measure something real. They measure structure-preserving transformations, the changes a system can undergo while remaining, in some essential sense, itself. Whenever a system has that property, group theory is not an imposition. It is a description.

DNA has that property in abundance. Base pairing is preserved under complementation. Chemical class is preserved under transition. Regulatory function, in some cases, under some transformations, is preserved in ways we are only beginning to map precisely. The symmetries are biological facts. The group is the grammar that holds them together.

Once you have the group, you can ask questions you could not ask before.

You can decompose sequence space into orbits, sets of sequences related by group symmetry, and ask which orbits are over- or under-represented in functional regions. You can define a metric on the group that encodes biological costs, transitions cheaper than transversions, and ask how sequence variation is distributed with respect to that metric. You can write down coupling matrices that respect the group structure and use them to model long-range interactions in a way that is constrained by the biology, not just the data.
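The first two of these questions can be sketched in a few lines. Two assumptions are worth stating loudly. First, the group is taken to act on a sequence letter by letter, with no strand reversal; a reverse-complement action would be a different choice. Second, the costs of 1.0 for a transition and 2.0 for a transversion are placeholders, not measured values. All function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Letterwise translation tables for two generators of V4.
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A<->T, C<->G
TRANSITION = str.maketrans("ACGT", "GTAC")  # A<->G, C<->T

PURINES = set("AG")


def orbit(kmer):
    """All images of `kmer` under the four letterwise symmetries (assumed action)."""
    c = kmer.translate(COMPLEMENT)
    t = kmer.translate(TRANSITION)
    ct = c.translate(TRANSITION)
    return frozenset({kmer, c, t, ct})


def orbits(k):
    """Partition all 4**k sequences of length k into orbits."""
    return {orbit("".join(p)) for p in product(BASES, repeat=k)}


def substitution_cost(x, y, transition_cost=1.0, transversion_cost=2.0):
    """Cost of replacing base x with base y; default costs are placeholders."""
    if x == y:
        return 0.0
    same_class = (x in PURINES) == (y in PURINES)
    return transition_cost if same_class else transversion_cost


def weighted_distance(s, t, **costs):
    """Sum of per-position substitution costs between equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(substitution_cost(x, y, **costs) for x, y in zip(s, t))


dinucleotide_orbits = orbits(2)
example_distance = weighted_distance("AAGT", "GACT")  # one transition + one transversion
```

Under this letterwise action every non-identity element moves every base, so each orbit has exactly four members and the 16 dinucleotides fall into 4 orbits. The distance is also unchanged when the same group element is applied to both sequences, which is the sense in which it respects the group structure.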

None of this requires inventing new mathematics. It requires recognizing that the mathematics already fits, that V₄ was always the right object, and that the evidence for it had been sitting in Chargaff’s rules for decades before the group was commonly named.

That is what I find compelling about group theory as a framework for DNA. Not that it is sophisticated. Not that it is fashionable. But that the symmetries were already there, and the group is just what it looks like when you take them seriously all at once.
