
Theorem 7.3 (collision) Let $e$ be a collision-free $k$-cycle in $K^{j-1}$ homologous to $d$. Then the index of the youngest positive simplex in $e$ is $i = y(d)$.

Proof Let $\sigma_g$ be the youngest positive simplex in $e$, and let $f$ be the sum of basis cycles that is homologous to $d$. By definition, the youngest positive simplex of $f$ is $\sigma_i$, where $i = y(d)$. This implies that there are no cycles homologous to $d$ in $K^{i-1}$ or earlier complexes; therefore $g \geq i$. We show $g \leq i$ by contradiction. If $g > i$, then $e = f + c$, where $c$ bounds in $K^{j-1}$. Since $\sigma_i$ is the youngest positive simplex of $f$ and $g > i$, we have $\sigma_g \notin f$, which implies $\sigma_g \in c$. As $\sigma_g$ is the youngest positive simplex in $e$, it is also the youngest positive simplex in $c$. Since $e$ is collision-free, $T[g]$ is unoccupied. In other words, the cycle created by $\sigma_g$ is still a nonbounding cycle in $K^{j-1}$. Hence this cycle cannot be $c$. Also, the cycle cannot belong to $c$'s homology class at the time $c$ becomes a boundary. It follows that the negative $(k+1)$-simplex that converts $c$ into a boundary pairs with a positive $k$-simplex in $c$ that is younger than $\sigma_g$, a contradiction. Hence $g = i$.

The cycle search continues until it finds a collision-free cycle $e$ homologous to $d$, and the collision theorem implies that $e$ has the correct youngest positive simplex. This proves the correctness of the cycle search. We may now substitute $i = $ YOUNGEST($\sigma_j$) for the corresponding line in function PAIR-SIMPLICES.
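The cycle search over $\mathbb{Z}_2$ can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's pseudocode:

- The input `boundaries` and the function name `pair_simplices` are hypothetical.
- Simplices are identified by their filtration index, and `boundaries[j]` lists the indices of the facets of $\sigma_j$.
- Adding two lists mod 2 is a set symmetric difference.
- As a simplification of this sketch, each list keeps only positive simplices. This does not change the youngest positive simplex found, because the youngest simplex of any nonzero cycle is positive.

```python
def pair_simplices(boundaries):
    """Pair the simplices of a Z_2 filtration by cycle search (hypothetical helper).

    boundaries[j] holds the filtration indices of the facets of simplex j,
    so every facet index is smaller than j.
    Returns the (i, j) persistence pairs and the set of positive simplices.
    """
    positive = set()
    table = {}  # table[i] = (j, lam): positive simplex i was paired with j via list lam
    pairs = []
    for j, facets in enumerate(boundaries):
        # Start from the boundary of sigma_j, keeping only positive simplices.
        lam = {s for s in facets if s in positive}
        while lam:
            i = max(lam)                # youngest positive simplex in the list
            if i not in table:          # T[i] unoccupied: no collision, pair found
                table[i] = (j, lam)
                pairs.append((i, j))
                break
            lam = lam ^ table[i][1]     # collision: add the stored list mod 2
        else:
            positive.add(j)             # no positive simplex left: sigma_j is positive
    return pairs, positive


# Filled triangle: vertices a, b, c (0-2), edges ab, bc, ac (3-5), triangle abc (6).
pairs, positive = pair_simplices([set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}])
print(pairs)  # [(1, 3), (2, 4), (5, 6)]
```

Edge `ac` (index 5) collides twice, at $T[2]$ and then $T[1]$, before its list empties, so it is positive. Vertex `a` stays unpaired. Python sets stand in for the sorted lists of the text here, so `max` costs linear time instead of reading the head of a sorted list. The bounds in the analysis that follows assume the merging approach.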

7.2.3 Analysis

Let us now examine the running time of the cycle search algorithm. Let $d = \partial_{k+1}\sigma_j$, and let $\sigma_i$ be the youngest positive $k$-simplex in $\Gamma(d)$. The persistence of the cycle created by $\sigma_i$ and destroyed by $\sigma_j$ is $p_i = j - i - 1$. The search for $\sigma_i$ proceeds from right to left, starting at $T[j]$ and ending at $T[i]$. The number of collisions is at most the number of positive $k$-simplices strictly between $\sigma_i$ and $\sigma_j$, which is less than $p_i$. A collision happens at $T[g]$ only if $\sigma_g$ already forms a pair, which implies that its $k$-interval $[g, h)$ is contained inside $[i, j)$. We use this nesting property to prove by induction that the $k$-cycle defined by $\Lambda_i$ is the sum of fewer than $p_i$ boundaries of $(k+1)$-simplices. Hence, $\Lambda_i$ contains fewer than $(k+2)p_i$ $k$-simplices. Similarly, $\Lambda_g$ contains fewer than $(k+2)p_g < (k+2)p_i$ $k$-simplices. A collision requires adding the two lists and finding the youngest simplex in the new list. We do this by merging, which keeps the lists sorted by age. A single collision therefore takes time at most $O(p_i)$, and the entire search for $\sigma_i$ takes time at most $O(p_i^2)$. The total algorithm runs in time at most $O(\sum_i p_i^2)$. Since there are at most $m$ pairs and each $p_i < m$, this is at most $O(m^3)$. As we will see in Chapter 12, the algorithm is quite fast in practice, as both the average number of collisions and the average length of the simplex lists are small constants.

The running time of cycle search can be improved to almost constant for dimensions $k = 0$ and $k = 2$ using a union-find data structure. This structure represents a system of disjoint sets and supports union and find operations (Cormen et al., 1994). For $k = 0$, each set is the vertex set of a connected component. Each set has exactly one yet unpaired vertex, namely the oldest one in the component. We modify standard union-find implementations so that this vertex represents the set. Given a vertex, the find operation returns the representative of the set that contains this vertex.
Given an edge whose endpoints lie in different sets, the union operation merges the two sets into one. At the same time, it pairs the edge with the younger of the two representatives and retains the older one as the representative of the merged set. In this modified algorithm, a cycle search is replaced by two find operations, possibly followed by a union operation. If we use union by rank and path compression for find, the amortized time per operation is $O(A^{-1}(m))$, where $A^{-1}(m)$ is the notoriously slowly growing inverse of the Ackermann function (Cormen et al., 1994). We may use symmetry to accelerate the cycle search for 2-cycles, using the union-find data structure for a system of sets of tetrahedra (Delfinado and Edelsbrunner, 1995). We cannot achieve the same acceleration for 1-cycles using this method, however, as there can be multiple unpaired positive edges at any time. The additional complication seems to require the more cautious and therefore slower algorithm described above.
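For $k = 0$, the union-find variant can be sketched in Python as follows. This is a minimal sketch, not the implementation from Cormen et al.:

- The input format `boundaries` (facet indices of each simplex, in filtration order) and the function name `vertex_edge_pairs` are hypothetical.
- Union by rank may choose a root that is not the oldest vertex. To handle this, the sketch stores the oldest vertex of each set next to its root. This is one way, assumed here, to let the unpaired vertex act as the representative.

```python
def vertex_edge_pairs(boundaries):
    """0-dimensional persistence pairs via union-find (hypothetical helper).

    boundaries[j] holds the facet indices of simplex j; vertices have no facets
    and edges have two. Higher-dimensional simplices are ignored.
    """
    parent, rank, oldest = {}, {}, {}  # oldest[root] = unpaired vertex of that set

    def find(v):
        root = v
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:          # path compression
            parent[v], v = root, parent[v]
        return root

    pairs = []
    for j, facets in enumerate(boundaries):
        if len(facets) == 0:              # vertex: a new singleton set
            parent[j], rank[j], oldest[j] = j, 0, j
        elif len(facets) == 2:            # edge
            u, w = sorted(facets)
            ru, rw = find(u), find(w)
            if ru == rw:
                continue                  # same component: the edge is positive
            ou, ow = oldest[ru], oldest[rw]
            pairs.append((max(ou, ow), j))  # edge pairs with the younger representative
            if rank[ru] < rank[rw]:       # union by rank
                ru, rw = rw, ru
            parent[rw] = ru
            if rank[ru] == rank[rw]:
                rank[ru] += 1
            oldest[ru] = min(ou, ow)      # older representative survives the merge
    return pairs


# Vertices 0-2, then edges 12, 01, 02: the root of {0, 1, 2} is 1, but 0 represents it.
print(vertex_edge_pairs([set(), set(), set(), {1, 2}, {0, 1}, {0, 2}]))  # [(2, 3), (1, 4)]
```

In the example, the last edge finds both endpoints in the same set and is left unpaired by this routine. In the full algorithm it would be a positive edge.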

7.2.4 Canonization

The persistence algorithm halts when it finds the matching positive simplex $\sigma_i$ for a negative simplex $\sigma_j$, often generating a cycle $z$ with several positive simplices. We have shown that even though this cycle is not canonical, the algorithm computes the correct persistence pairs. In order to compute linking numbers, however, we need to convert $z$ into a canonical cycle. We do so by eliminating all positive simplices in $z$ except for $\sigma_i$. We call this process canonization (Edelsbrunner and Zomorodian, 2003). To canonize a cycle, we add cycles associated with unnecessary positive simplices to $z$ successively, until $z$ is composed of $\sigma_i$ and some negative simplices, as shown in Figure 7.9 for 1-cycles. Canonization amounts to replacing one homology basis element with a linear combination of other elements in order to reach the unique canonical basis, defined in Section 6.3.3. A cycle undergoing canonization changes homology classes, but the rank of the basis never changes.

Fig. 7.9. Canonization of 1-cycles. Starting from the boundary of the negative triangle $\sigma_j$, the persistence algorithm finds a matching positive edge $\sigma_i$ by finding the dashed 1-cycle. We modify this 1-cycle further to find the solid canonical 1-cycle and a spanning surface.

For each canonical 1-cycle, we also need a spanning surface in order to compute linking numbers. Again, we may compute such “surfaces” for cycles of all dimensions by simply maintaining the spanning surfaces while computing the cycles. For a 0-cycle, the spanning manifold is a connected path of edges. For a 2-cycle, the spanning manifold is the set of tetrahedra that fill the void. We generalize this concept by the following definition.

Definition 7.1 (spanning manifold) A spanning manifold for a $k$-cycle is a set of simplices whose sum has the cycle as its boundary.

Recall that, initially, a cycle representative is the boundary of a negative simplex $\sigma_j$. We use $\sigma_j$ as the initial spanning manifold for $z$.
Every time we add a cycle $y$ to $z$ in the persistence algorithm, we also add the surface that $y$ bounds to $z$'s surface. We continue this process through canonization to produce both canonical cycles and their spanning manifolds. Here, we are using a crucial property of α-complex filtrations: the final complex is always the Delaunay complex of the set of weighted points and contains no nonbounding 1-cycles. Therefore, all 1-cycles are eventually turned into boundaries and have spanning manifolds.
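Canonization with spanning manifolds can be sketched in Python for $\mathbb{Z}_2$ coefficients. The function names and the `boundaries` and `dims` inputs are hypothetical. The sketch makes these assumptions:

- The table stores full cycles (positive and negative simplices), so that each spanning manifold can be carried along with its cycle.
- Every positive $k$-simplex that appears in a cycle is eventually paired, as the text argues for 1-cycles in α-complex filtrations.
- Pairs are canonized in increasing order of the positive index. Each cycle added to $z$ is then already canonical, and adding it removes exactly one unnecessary positive simplex.

```python
def search_with_surfaces(boundaries):
    """Z_2 cycle search storing full cycles and their spanning manifolds (hypothetical).

    boundaries[j] holds the filtration indices of the facets of simplex j.
    Returns table[i] = (j, cycle, surface) for each pair (i, j), and the positive set.
    """
    positive, table = set(), {}
    for j, facets in enumerate(boundaries):
        cycle, surface = set(facets), {j}     # z = boundary of sigma_j, spanned by sigma_j
        while True:
            pos = [s for s in cycle if s in positive]
            if not pos:                       # no positive simplex left: sigma_j is positive
                positive.add(j)
                break
            i = max(pos)                      # youngest positive simplex in z
            if i not in table:                # no collision: record cycle and surface
                table[i] = (j, cycle, surface)
                break
            _, c, s = table[i]                # collision: add the stored cycle y to z
            cycle, surface = cycle ^ c, surface ^ s  # and the surface y bounds to z's
    return table, positive


def canonize(table, positive, dims, k):
    """Canonical cycles and spanning manifolds of all paired positive k-simplices."""
    canon = {}
    for i in sorted(p for p in table if dims[p] == k):  # older positive simplices first
        _, cycle, surface = table[i]
        for g in [s for s in cycle if s in positive and s != i]:
            if g not in canon:
                raise ValueError(f"positive simplex {g} is never paired")
            c, s = canon[g]                   # g plus negative simplices only
            cycle, surface = cycle ^ c, surface ^ s
        canon[i] = (cycle, surface)
    return canon


# Square abcd with diagonal ac, then triangles acd and abc.
sq = [set(), set(), set(), set(),               # a, b, c, d
      {0, 1}, {1, 2}, {2, 3}, {0, 3}, {0, 2},   # ab, bc, cd, da, ac
      {6, 7, 8}, {4, 5, 8}]                     # acd, abc
dims = [0] * 4 + [1] * 5 + [2] * 2
table, positive = search_with_surfaces(sq)
canon = canonize(table, positive, dims, 1)
print(sorted(table[8][1]), sorted(canon[8][0]), sorted(canon[8][1]))  # [6, 7, 8] [4, 5, 8] [10]
```

In this example the search pairs the diagonal $ac$ with triangle $acd$ through the cycle $\{cd, da, ac\}$. That cycle also contains the positive edge $da$. Canonization adds the canonical cycle of $da$ (the square, spanned by both triangles) and obtains $\{ab, bc, ac\}$, spanned by the triangle $abc$ alone.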

7.3 Algorithm for Fields

In this section, we devise an algorithm for computing persistent homology over an arbitrary field (Zomorodian and Carlsson, 2004). Given the theoretical development of Section 6.1.5, our approach is rather simple: we simplify the standard reduction algorithm using the properties of the persistence module. Our arguments give an algorithm for computing the P-intervals for a filtered complex directly over the field $F$, without the need for constructing the persistence module. The algorithm is, in fact, a generalized version of the cycle search algorithm shown in the previous section.

Fig. 7.10. A chain complex with its internals: chain, cycle, and boundary groups, and their images under the boundary operators.

7.3.1 Reduction

The standard method for computing homology is the reduction algorithm. We describe this method for integer coefficients, as $\mathbb{Z}$ is the more familiar ring. The method extends, however, to modules over arbitrary PIDs. Recall the chain complex and its related groups, as shown in Figure 7.10 for a complex in an arbitrary dimension. As $C_k$ is free, the oriented $k$-simplices form the standard basis for it. We represent the boundary operator $\partial_k : C_k \to C_{k-1}$ relative to the standard bases of the chain groups as an integer matrix $M_k$ with entries in $\{-1, 0, 1\}$. The matrix $M_k$ is called the standard matrix representation of $\partial_k$. It has $m_k$ columns and $m_{k-1}$ rows (the numbers of $k$- and $(k-1)$-simplices, respectively). The null-space of $M_k$ corresponds to $Z_k$ and its range-space to $B_{k-1}$, as manifested in Figure 7.10.

The reduction algorithm derives alternate bases for the chain groups, relative to which the matrix for $\partial_k$ is diagonal. The algorithm utilizes the following elementary row operations on $M_k$:

1. exchange row $i$ and row $j$;
2. multiply row $i$ by $-1$;
3. replace row $i$ by (row $i$) $+\, q\,$(row $j$), where $q$ is an integer and $j \neq i$.

The algorithm also uses elementary column operations that are similarly defined. Each column (row) operation corresponds to a change in the basis for $C_k$ ($C_{k-1}$). For example, let $e_i$ and $e_j$ be the $i$th and $j$th basis elements for $C_k$, respectively. A column operation of type 3 amounts to replacing $e_i$ with $e_i + q e_j$. A similar row operation on basis elements $\hat{e}_i$ and $\hat{e}_j$ for $C_{k-1}$, however, replaces $\hat{e}_j$ by $\hat{e}_j - q \hat{e}_i$. We shall make use of this fact in Section 7.3.3. The algorithm systematically modifies the bases of $C_k$ and $C_{k-1}$ using elemen-