[Figure 13.5: (a) topology map; (b) graph of log_2(β_1^{l,p} + 1). Axes: l, p, and log_2(β_1^{l,p} + 1).]
Fig. 13.5. The persistent Betti numbers of 1hck.
[Figure 13.6: two histograms, (a) BOG and (b) 1hck. Axes: persistence and log_2(Number of Cycles + 1).]
Fig. 13.6. Persistence histograms. BOG's histogram (a) shows some grouping, but 1hck's (b) does not.
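The histograms in Fig. 13.6 plot log_2(Number of Cycles + 1) against persistence. As a minimal sketch of how such a histogram could be binned, assume the persistence pairs are available as (birth, death) filtration indices. The function name persistence_histogram, the bin width, and the sample pairs are hypothetical and are not taken from BOG or 1hck.

```python
import math
from collections import Counter

def persistence_histogram(pairs, bin_width):
    """Bin index-based persistence (death - birth) and return
    {bin_start: log2(count + 1)} for every non-empty bin."""
    counts = Counter()
    for birth, death in pairs:
        persistence = death - birth
        counts[(persistence // bin_width) * bin_width] += 1
    return {start: math.log2(n + 1) for start, n in sorted(counts.items())}

# Hypothetical (birth, death) filtration indices, not data from the text.
pairs = [(0, 3), (1, 4), (2, 9), (5, 5007), (10, 5012), (20, 5021), (7, 12000)]
hist = persistence_histogram(pairs, bin_width=5000)
```

Adding 1 before taking the logarithm keeps empty bins at zero, and the logarithmic scale keeps the many low-persistence cycles from hiding the few long-lived ones.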
13.1.2 Knotting
We also wish to detect whether proteins are knotted or have linking in their structures. I have already described algorithms for detecting linking in Chapter 10.
The linking number algorithms give us a signature function for a protein. We may also look for alternate signature functions for describing the
topology of a protein. The approach here is to exploit the fast combinatorial representation to compute other knot and link invariants. Future directions
include computing polynomial invariants, such as the Alexander polynomial for detecting knots (Adams, 1994).
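To make the notion of a linking number concrete, here is a minimal sketch for two closed polygonal curves. It is not the algorithm of Chapter 10. It uses the standard signed-crossing count on a projection to the xy-plane, and it assumes the curves are in general position with respect to that projection (no parallel overlapping segments and no crossings through vertices). The function name linking_number and the two sample curves are hypothetical.

```python
def linking_number(curve_a, curve_b):
    """Half the sum of signed crossings between two closed polylines,
    each given as a list of (x, y, z) vertices, projected to the xy-plane."""
    def cross2(u, v):
        return u[0] * v[1] - u[1] * v[0]

    total = 0
    for i in range(len(curve_a)):
        p0, p1 = curve_a[i], curve_a[(i + 1) % len(curve_a)]
        d1 = (p1[0] - p0[0], p1[1] - p0[1])
        for j in range(len(curve_b)):
            q0, q1 = curve_b[j], curve_b[(j + 1) % len(curve_b)]
            d2 = (q1[0] - q0[0], q1[1] - q0[1])
            denom = cross2(d1, d2)
            if denom == 0:
                continue  # parallel in projection; excluded by assumption
            w = (q0[0] - p0[0], q0[1] - p0[1])
            s = cross2(w, d2) / denom  # parameter along the segment of A
            t = cross2(w, d1) / denom  # parameter along the segment of B
            if 0 <= s < 1 and 0 <= t < 1:
                za = p0[2] + s * (p1[2] - p0[2])
                zb = q0[2] + t * (q1[2] - q0[2])
                sign = 1 if denom > 0 else -1
                # The crossing sign is sign(cross2(over, under)).
                total += sign if za > zb else -sign
    return total // 2

# Hypothetical test curves: a square in the plane z = 0 and a loop through it.
square = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0)]
loop = [(0, -0.2, -1), (2, 0.3, -1), (2.1, 0.4, 1), (0.1, -0.1, 1)]
far_loop = [(x + 10, y, z) for x, y, z in loop]
lk_linked = linking_number(square, loop)
lk_unlinked = linking_number(square, far_loop)
```

The sign of the result depends on the orientation chosen for each curve, so only its absolute value is meaningful as a signature when the orientations are arbitrary.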
13.1.3 Structure Determination
One method used for determining the architecture of a protein is X-ray crystallography (Rhodes, 2000). After forming a high-quality crystal of a protein,
we analyze the diffraction pattern produced by X-irradiation to generate an electron density map. The sequence of amino acids in the protein must be
known independently. We then fit the atoms of the residues into the computed electron density map via a series of refinements. The result is a set of Cartesian
coordinates for every non-hydrogen atom in the molecule.
Usually, we use these coordinates, augmented with van der Waals radii, to produce filtrations for proteins, the input to the algorithms in this book.
We wish to use persistence also as a tool for refining the resolved protein. We guide modifications to the structure of the protein and the radii of the atoms
by using persistent complexes. We then produce a synthetic electron density map for the new coordinates and radii, and compare it to the original density map.
We may also construct three-dimensional MS complexes of the electron density data for denoising using persistence. I will discuss general denoising
of density functions in Section 13.3.
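The comparison step in the refinement loop above can be sketched as follows. This is only an illustration under stated assumptions: each atom contributes an isotropic Gaussian whose width equals its radius, and the two maps are compared by Pearson correlation on a shared grid. Neither choice comes from the text, and the names synthetic_density and correlation, the grid, and all coordinates are hypothetical.

```python
import math

def synthetic_density(atoms, grid_points):
    """Sum one isotropic Gaussian per atom (assumed model), with the
    Gaussian width set to the atom's radius. atoms: [((x, y, z), radius)]."""
    values = []
    for gx, gy, gz in grid_points:
        total = 0.0
        for (x, y, z), radius in atoms:
            d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
            total += math.exp(-d2 / (2.0 * radius ** 2))
        values.append(total)
    return values

def correlation(a, b):
    """Pearson correlation between two maps sampled on the same grid."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical 8 x 8 x 8 grid with 0.5 spacing and made-up atom positions.
grid = [(i * 0.5, j * 0.5, k * 0.5)
        for i in range(8) for j in range(8) for k in range(8)]
original = [((1.0, 1.0, 1.0), 0.8), ((2.5, 2.0, 1.5), 0.7)]
refined = [((1.1, 1.0, 1.0), 0.8), ((2.4, 2.0, 1.5), 0.7)]   # small change
shifted = [((3.0, 3.0, 3.0), 0.8), ((0.5, 3.0, 0.5), 0.7)]   # large change

observed = synthetic_density(original, grid)  # stand-in for the measured map
score_refined = correlation(observed, synthetic_density(refined, grid))
score_shifted = correlation(observed, synthetic_density(shifted, grid))
```

A modification suggested by the persistent complexes would be kept only if it does not lower the agreement score with the original map.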
13.2 Hierarchical Clustering
In Chapter 2, we looked at α-shapes as a method for describing the connectivity of a space. As we increase α, the centers of the balls in our data sets are
connected via edges and triangles. We may view the connections as a hierarchical clustering mechanism. Persistence adds another dimension to
α-shapes, giving us a two-parameter family of shapes for describing the clustering of point sets.
Edelsbrunner and Mücke (1994) first noted the possibility of using α-shapes as a method for studying the distribution of galaxies in our universe.
Dyksterhouse (1992) took initial steps in this direction. Persistence gives us additional tools for examining the clustering of galaxies in the universe. Figure 13.7
displays a simulated data set due to Marc Dyksterhouse. Each of the 1,717 vertices represents a galaxy and is a component (0-cycle) of the complex. The
figure also displays the manifolds of the 0-cycles: the path through which galaxies will be connected in the future. We may use this information to construct
a hierarchical description of the galaxies. In addition, we can examine the persistent topological features of the filtration of the universe. Voids, for
example, correspond to empty areas of space.
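The hierarchy of components can be sketched with a union-find structure. The example below is a simplification, not the α-complex filtration itself: it sorts all pairwise distances and records each merge of two components, which is single-linkage clustering. For unweighted points this should give the same sequence of component merges as the α-complex, since the Euclidean minimum spanning tree is a subgraph of the Delaunay triangulation. The function name single_linkage_merges and the sample points are hypothetical.

```python
import math
from itertools import combinations

def single_linkage_merges(points):
    """Return merge events (distance, surviving_root, absorbed_root)
    in the order in which components join."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    merges = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # The component with the smaller root index survives.
            parent[max(ri, rj)] = min(ri, rj)
            merges.append((length, min(ri, rj), max(ri, rj)))
    return merges

# Hypothetical points: two tight pairs and one distant outlier.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
merges = single_linkage_merges(points)
```

Each merge event marks the death of one component, so the merge distances play the role of the lifetimes of the 0-cycles, and a large gap between consecutive distances suggests a natural level at which to cut the hierarchy.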
Fig. 13.7. A simulated universe, its 0-cycles, and manifolds.
[Figure 13.8: axes l and β.]
Fig. 13.8. Graph of β^{l,p} projected on the (l, β) plane for new data set 1mct: Trypsin complexed with inhibitor from bitter.
Another instance of using persistence for hierarchical clustering is to classify proteins according to their hydrophobic surfaces. Here, we sample
hydrophobic points along the surface of a protein. We then compute an α-complex filtration from these points and examine the persistent components.
Figure 13.8 shows the graph of β^{l,p}
for this data set. The graph is projected