Topological Feature Detection Computational Structural Biology

scriptor of shapes for pattern recognition, solid modeling, mesh generation, and pocket machining. This descriptor, however, is ill-conditioned, as a small perturbation in the data changes the description radically. I illustrate the sensitivity of the medial axis with an example in Figure 13.10. By restating the problem in terms of persistence, we may be able to denoise the data and, in turn, simplify the medial axis, obtaining a robust description of the data. We can extend the definition of the medial axis to n-dimensional manifolds by using n-dimensional spheres instead of circles. The definition remains sensitive to noise in all dimensions and therefore still requires a method for simplification.

Fig. 13.10. The (dashed) medial axis of the solid polygon (a) is ill-conditioned, as a small perturbation changes the resulting axis dramatically (b).

13.4 Surface Reconstruction

Another direction for future work is using persistence for surface reconstruction. I introduced this problem as an example of a topological question in Chapter 1. We may employ the control persistence gives us over the topology of a space to reconstruct surfaces from sampled points. Figure 13.11 shows a single-click reconstruction of the bunny surface. Note that I selected a complex with a tunnel. The bunny was not sampled on its base across the two black felts it rests on, as a laser range-finder scanner was used for acquiring the samples. A good reconstruction, therefore, has two holes or a single tunnel. Such knowledge, however, is not always available. I believe that a successful reconstruction algorithm must be interactive, iterative, and adaptive. Abstractly, we wish to identify a coordinate $(l, p)$ such that the complex $K^{l,p}$ contains a reconstruction of the point set. We may enrich the solution space by computing radii for the points. For example, we can estimate the local curvature at each point, assigning the inverse curvature as the radius of the point. We then recompute the filtration with the new radii. Statistical analysis of persistence values can give us candidate persistence cutoffs. We use these values to simplify the complex in each dimension independently. Persistence may also guide modifications to the computed radii, giving us a multi-stage refinement algorithm.

Fig. 13.11. Surface reconstruction with CView. The selected coordinates on the topology map (a) give a good approximation (b).
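As one illustration of how a candidate persistence cutoff might be derived statistically, the following minimal sketch sorts the persistence values of the intervals in a single dimension and cuts at the midpoint of the widest gap. The largest-gap heuristic, the function names `persistence_cutoff` and `simplify`, and the sample intervals are all assumptions made for illustration; they are not a method proposed in the text.

```python
def persistence_cutoff(intervals):
    """Suggest a cutoff separating short-lived from long-lived intervals.

    `intervals` is a list of (birth, death) pairs for one dimension.
    Heuristic (an assumption, not the author's method): sort the
    persistence values and cut at the midpoint of the widest gap.
    """
    values = sorted(death - birth for birth, death in intervals)
    if len(values) < 2:
        return None  # nothing to separate
    # Pair each gap with the index of its lower endpoint.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    _, i = max(gaps)
    return 0.5 * (values[i] + values[i + 1])


def simplify(intervals, cutoff):
    """Keep only the intervals whose persistence exceeds the cutoff."""
    return [(b, d) for b, d in intervals if d - b > cutoff]


# Hypothetical intervals: three short-lived (noise) and two long-lived.
intervals = [(0.0, 0.1), (0.2, 0.35), (0.1, 0.3), (0.0, 5.0), (1.0, 7.0)]
cutoff = persistence_cutoff(intervals)
kept = simplify(intervals, cutoff)
```

Because each dimension is simplified independently, such a function would be called once per dimension, and an interactive tool could present the suggested cutoff as a starting point rather than a final answer.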

13.5 Shape Description

In Section 12.3, we saw that persistence intervals could be used as a compact and general shape descriptor for a space. We are motivated, therefore, to explore shape classification with persistent homology. Homology used in this manner, however, is a crude invariant. It cannot distinguish between circles and ovals, between circles and rectangles, or even between Euclidean spaces of different dimensions. Further, it cannot identify singular points, such as corners, edges, or cone points, as their neighborhoods are homeomorphic to each other. A solution to this apparent weakness of homology is to apply it not to a space X itself, but rather to spaces constructed out of X using tangential information about X as a subset of $\mathbb{R}^n$ (Carlsson et al., 2004). For example, the line in Figure 13.12 has a tangent complex with two components. The “V” shape, on the other hand, has a singular point, resulting in a tangent complex with four components. In practice, we wish to obtain information about a shape when we only have a finite set of samples from that shape. We are faced, therefore, with the additional difficulty of recovering the underlying shape topology, as well as approximating the tangential spaces that we define (Collins et al., 2004).

Fig. 13.12. The line (a) has a tangent complex with two components. The “V” space (b) has a tangent complex with four components.
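The component counts for the line and the “V” can be checked on sampled data with a toy computation. The sketch below pairs each sample point with both of its unit tangent directions, treats the result as a point cloud in the product of the plane with the circle of directions, and counts connected components of a neighborhood graph. It assumes exact tangents are supplied with the samples, uses the Euclidean distance on the lifted points, and picks a hypothetical threshold `eps`; it is not the construction of Carlsson et al., only an illustration of why the corner separates the components.

```python
import math


def tangent_points(samples):
    """Lift each (point, unit tangent) sample to (x, y, tx, ty), in both directions."""
    lifted = []
    for (x, y), (tx, ty) in samples:
        lifted.append((x, y, tx, ty))
        lifted.append((x, y, -tx, -ty))
    return lifted


def count_components(points, eps):
    """Count connected components of the eps-neighborhood graph via union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# A horizontal line segment with tangent (1, 0) at every sample.
line = [((k / 10, 0.0), (1.0, 0.0)) for k in range(-10, 11)]

# A "V" with its corner at the origin; each arm carries its own tangent,
# so the corner is sampled once per arm with different directions.
s = 1 / math.sqrt(2)
v_shape = ([((k / 10, k / 10), (s, s)) for k in range(11)]
           + [((-k / 10, k / 10), (-s, s)) for k in range(11)])

eps = 0.2  # hypothetical threshold, larger than the sample spacing
line_components = count_components(tangent_points(line), eps)
v_components = count_components(tangent_points(v_shape), eps)
```

On the line, the two directions give two copies of the segment that never come close in the lifted space. On the “V”, the two arms meet at the corner in position but not in direction, so each arm contributes two components of its own.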

13.6 I/O Efficient Algorithms

Most of the applications I have described so far in this chapter are only practical if the algorithms can process massive amounts of data. In recent years, advances in computer technology and acquisition devices have made high-resolution data available to the scientific community. For instance, the Digital Michelangelo Project at Stanford University sampled the statue David using a 0.25-millimeter laser scanner. The reconstructed surface consists of more than two billion triangles (Levoy et al., 2000). Similarly, detailed terrain data for much of the earth’s surface is publicly available at a 10-meter resolution from the U.S. Geological Survey. At this scale, data sets for even small portions of the planet will be at least hundreds of megabytes in size. Internal memory algorithms are often unable to handle such massive data, even when executing on fast machines with large memories. It becomes critical, therefore, to design I/O efficient external memory algorithms to analyze massive data (Arge et al., 2000).

Bibliography

C. C. Adams. The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots. W. H. Freeman and Company, New York, 1994.

S. I. Adyan. The algorithmic unsolvability of problems concerning recognition of certain properties of groups. In Doklady Academy Nauk SSSR, volume 103, pages 533–535. Soviet Academy of Sciences, 1955.

S. I. Adyan and G. S. Makanin. Investigations on algorithmic questions of algebra. In Proceedings of the Steklov Institute of Mathematics, volume 3, pages 209–219, 1986.

N. Amenta and M. Bern. Surface reconstruction by Voronoi filtering. Discrete Comput. Geom., 22:481–504, 1999.

L. Arge, L. Toma, and J. S. Vitter. I/O-efficient algorithms for problems on grid-based terrains. In Proc. Workshop Algor. Engin. Exper., 2000.

C. L. Bajaj, V. Pascucci, and D. R. Schikore. Visualization of scalar topology for structural enhancement. In Proc. 9th Ann. IEEE Conf. Visualization, pages 18–23, 1998.

T. F. Banchoff. Critical points and curvature for embedded polyhedral surfaces. Am. Math. Monthly, 77:475–485, 1970.

H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The protein data bank. Nucleic Acids Research, 28:235–242, 2000.

R. L. Bishop and S. I. Goldberg. Tensor Analysis on Manifolds. Dover Publications, Inc., New York, 1980.

R. A. Bissell, E. Córdova, A. E. Kaifer, and J. F. Stoddart. A chemically and electrochemically switchable molecular shuttle. Nature, 369:133–137, 1994.

A. Blum. A transformation for extracting new descriptors of shape. In Proc. Symp. Models for Perception of Speech and Visual Form, pages 362–380, 1967.

W. M. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press, San Diego, CA, second edition, 1986.

J. W. Bruce and P. J. Giblin. Curves and Singularities. Cambridge University Press, New York, second edition, 1992.

G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas. Persistence barcodes for shapes, 2004. To be published in Proc. Symp. Geom. Process.

H. Carr, J. Snoeyink, and U. Axen. Computing contour trees in all dimensions. In Proc. 11th Ann. Sympos. Discrete Alg., pages 918–926, 2000.

CATH. Protein structure classification, 2003. http://www.biochem.ucl.ac.uk/bsm/cath.

C. P. Collier, E. W. Wong, Belohradský, F. M. Raymo, J. F. Stoddart, P. J. Kuekes, R. S.