An efficient algorithm Strongly connected components

S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani 103 can be linearized. If we want finer detail, we can look inside one of the nodes of this dag and examine the full-fledged strongly connected component within.

3.4.2 An efficient algorithm

The decomposition of a directed graph into its strongly connected components is very infor- mative and useful. It turns out, fortunately, that it can be found in linear time by making further use of depth-first search. The algorithm is based on some properties we have already seen but which we will now pinpoint more closely. Property 1 If the explore subroutine is started at node u , then it will terminate precisely when all nodes reachable from u have been visited. Therefore, if we call explore on a node that lies somewhere in a sink strongly connected component a strongly connected component that is a sink in the meta-graph, then we will retrieve exactly that component. Figure 3.9 has two sink strongly connected components. Starting explore at node K, for instance, will completely traverse the larger of them and then stop. This suggests a way of finding one strongly connected component, but still leaves open two major problems: A how do we find a node that we know for sure lies in a sink strongly con- nected component and B how do we continue once this first component has been discovered? Let’s start with problem A. There is not an easy, direct way to pick out a node that is guaranteed to lie in a sink strongly connected component. But there is a way to get a node in a source strongly connected component. Property 2 The node that receives the highest post number in a depth-first search must lie in a source strongly connected component. This follows from the following more general property. Property 3 If C and C ′ are strongly connected components, and there is an edge from a node in C to a node in C ′ , then the highest post number in C is bigger than the highest post number in C ′ . Proof. In proving Property 3, there are two cases to consider. If the depth-first search visits component C before component C ′ , then clearly all of C and C ′ will be traversed before the procedure gets stuck see Property 1. Therefore the first node visited in C will have a higher post number than any node of C ′ . On the other hand, if C ′ gets visited first, then the depth- first search will get stuck after seeing all of C ′ but before seeing any of C, in which case the property follows immediately. Property 3 can be restated as saying that the strongly connected components can be lin- earized by arranging them in decreasing order of their highest post numbers . This is a gen- eralization of our earlier algorithm for linearizing dags; in a dag, each node is a singleton strongly connected component. Property 2 helps us find a node in the source strongly connected component of G. How- ever, what we need is a node in the sink component. Our means seem to be the opposite of 104 Algorithms Figure 3.10 The reverse of the graph from Figure 3.9. A D E C F B H G K L J I A B,E C,F D J,K,L G,H,I our needs But consider the reverse graph G R , the same as G but with all edges reversed Figure 3.10. G R has exactly the same strongly connected components as G why?. So, if we do a depth-first search of G R , the node with the highest post number will come from a source strongly connected component in G R , which is to say a sink strongly connected component in G. We have solved problem A Onward to problem B. How do we continue after the first sink component is identified? The solution is also provided by Property 3. Once we have found the first strongly connected component and deleted it from the graph, the node with the highest post number among those remaining will belong to a sink strongly connected component of whatever remains of G. Therefore we can keep using the post numbering from our initial depth-first search on G R to successively output the second strongly connected component, the third strongly connected component, and so on. The resulting algorithm is this. 1. Run depth-first search on G R . 2. Run the undirected connected components algorithm from Section 3.2.3 on G, and dur- ing the depth-first search, process the vertices in decreasing order of their post numbers from step 1. This algorithm is linear-time, only the constant in the linear term is about twice that of straight depth-first search. Question: How does one construct an adjacency list represen- tation of G R in linear time? And how, in linear time, does one order the vertices of G by decreasing post values? S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani 105 Let’s run this algorithm on the graph of Figure 3.9. If step 1 considers vertices in lex- icographic order, then the ordering it sets up for the second step namely, decreasing post numbers in the depth-first search of G R is: G, I, J, L, K, H, D, C, F, B, E, A. Then step 2 peels off components in the following sequence: {G, H, I, J, K, L}, {D}, {C, F }, {B, E}, {A}. Crawling fast All this assumes that the graph is neatly given to us, with vertices numbered 1 to n and edges tucked in adjacency lists. The realities of the World Wide Web are very different. The nodes of the Web graph are not known in advance, and they have to be discovered one by one during the process of search. And, of course, recursion is out of the question. Still, crawling the Web is done by algorithms very similar to depth-first search. An explicit stack is maintained, containing all nodes that have been discovered as endpoints of hyperlinks but not yet explored. In fact, this “stack” is not exactly a last-in, first-out list. It gives highest priority not to the nodes that were inserted most recently nor the ones that were inserted earliest, that would be a breadth-first search, see Chapter 4, but to the ones that look most “interesting”—a heuristic criterion whose purpose is to keep the stack from overflowing and, in the worst case, to leave unexplored only nodes that are very unlikely to lead to vast new expanses. In fact, crawling is typically done by many computers running explore simultaneously: each one takes the next node to be explored from the top of the stack, downloads the http file the kind of Web files that point to each other, and scans it for hyperlinks. But when a new http document is found at the end of a hyperlink, no recursive calls are made: instead, the new vertex is inserted in the central stack. But one question remains: When we see a “new” document, how do we know that it is indeed new, that we have not seen it before in our crawl? And how do we give it a name, so it can be inserted in the stack and recorded as “already seen”? The answer is by hashing. Incidentally, researchers have run the strongly connected components algorithm on the Web and have discovered some very interesting structure. 106 Algorithms Exercises 3.1. Perform a depth-first search on the following graph; whenever there’s a choice of vertices, pick the one that is alphabetically first. Classify each edge as a tree edge or back edge, and give the pre and post number of each vertex. A B C D E F G H I 3.2. Perform depth-first search on each of the following graphs; whenever there’s a choice of vertices, pick the one that is alphabetically first. Classify each edge as a tree edge, forward edge, back edge, or cross edge, and give the pre and post number of each vertex. a F A C B E D G H b F C B A H G E D 3.3. Run the DFS-based topological ordering algorithm on the following graph. Whenever you have a choice of vertices to explore, always pick the one that is alphabetically first. A C E D F B G H a Indicate the pre and post numbers of the nodes. b What are the sources and sinks of the graph? c What topological ordering is found by the algorithm? d How many topological orderings does this graph have? 3.4. Run the strongly connected components algorithm on the following directed graphs G. When doing DFS on G R : whenever there is a choice of vertices to explore, always pick the one that is alphabetically first. S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani 107 i A B E G H I C D F J ii A B C D E F G H I In each case answer the following questions. a In what order are the strongly connected components SCCs found? b Which are source SCCs and which are sink SCCs? c Draw the “metagraph” each meta-node is an SCC of G. d What is the minimum number of edges you must add to this graph to make it strongly connected? 3.5. The reverse of a directed graph G = V, E is another directed graph G R = V, E R on the same vertex set, but with all edges reversed; that is, E R = {v, u : u, v ∈ E}. Give a linear-time algorithm for computing the reverse of a graph in adjacency list format. 3.6. In an undirected graph, the degree du of a vertex u is the number of neighbors u has, or equiv- alently, the number of edges incident upon it. In a directed graph, we distinguish between the indegree d in u, which is the number of edges into u, and the outdegree d out u, the number of edges leaving u. a Show that in an undirected graph, P u∈V du = 2 |E|. b Use part a to show that in an undirected graph, there must be an even number of vertices whose degree is odd. c Does a similar statement hold for the number of vertices with odd indegree in a directed graph? 3.7. A bipartite graph is a graph G = V, E whose vertices can be partitioned into two sets V = V 1 ∪V 2 and V 1 ∩ V 2 = ∅ such that there are no edges between vertices in the same set for instance, if u, v ∈ V 1 , then there is no edge between u and v. a Give a linear-time algorithm to determine whether an undirected graph is bipartite. b There are many other ways to formulate this property. For instance, an undirected graph is bipartite if and only if it can be colored with just two colors. Prove the following formulation: an undirected graph is bipartite if and only if it contains no cycles of odd length. c At most how many colors are needed to color in an undirected graph with exactly one odd- length cycle? 108 Algorithms 3.8. Pouring water. We have three containers whose sizes are 10 pints, 7 pints, and 4 pints, re- spectively. The 7-pint and 4-pint containers start out full of water, but the 10-pint container is initially empty. We are allowed one type of operation: pouring the contents of one container into another, stopping only when the source container is empty or the destination container is full. We want to know if there is a sequence of pourings that leaves exactly 2 pints in the 7- or 4-pint container. a Model this as a graph problem: give a precise definition of the graph involved and state the specific question about this graph that needs to be answered. b What algorithm should be applied to solve the problem? c Find the answer by applying the algorithm. 3.9. For each node u in an undirected graph, let twodegree [u] be the sum of the degrees of u’s neigh- bors. Show how to compute the entire array of twodegree [ ·] values in linear time, given a graph in adjacency list format. 3.10. Rewrite the explore procedure Figure 3.3 so that it is non-recursive that is, explicitly use a stack. The calls to previsit and postvisit should be positioned so that they have the same effect as in the recursive procedure. 3.11. Design a linear-time algorithm which, given an undirected graph G and a particular edge e in it, determines whether G has a cycle containing e. 3.12. Either prove or give a counterexample: if {u, v} is an edge in an undirected graph, and during depth-first search post u post v, then v is an ancestor of u in the DFS tree. 3.13. Undirected vs. directed connectivity. a Prove that in any connected undirected graph G = V, E there is a vertex v ∈ V whose removal leaves G connected. Hint: Consider the DFS search tree for G. b Give an example of a strongly connected directed graph G = V, E such that, for every v ∈ V , removing v from G leaves a directed graph that is not strongly connected. c In an undirected graph with 2 connected components it is always possible to make the graph connected by adding only one edge. Give an example of a directed graph with two strongly connected components such that no addition of one edge can make the graph strongly con- nected. 3.14. The chapter suggests an alternative algorithm for linearization topological sorting, which re- peatedly removes source nodes from the graph page 101. Show that this algorithm can be implemented in linear time. 3.15. The police department in the city of Computopia has made all streets one-way. The mayor con- tends that there is still a way to drive legally from any intersection in the city to any other intersection, but the opposition is not convinced. A computer program is needed to determine whether the mayor is right. However, the city elections are coming up soon, and there is just enough time to run a linear-time algorithm. a Formulate this problem graph-theoretically, and explain why it can indeed be solved in linear time. S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani 109 b Suppose it now turns out that the mayor’s original claim is false. She next claims something weaker: if you start driving from town hall, navigating one-way streets, then no matter where you reach, there is always a way to drive legally back to the town hall. Formulate this weaker property as a graph-theoretic problem, and carefully show how it too can be checked in linear time. 3.16. Suppose a CS curriculum consists of n courses, all of them mandatory. The prerequisite graph G has a node for each course, and an edge from course v to course w if and only if v is a prerequisite for w. Find an algorithm that works directly with this graph representation, and computes the minimum number of semesters necessary to complete the curriculum assume that a student can take any number of courses in one semester. The running time of your algorithm should be linear. 3.17. Infinite paths. Let G = V, E be a directed graph with a designated “start vertex” s ∈ V , a set V G ⊆ V of “good” vertices, and a set V B ⊆ V of “bad” vertices. An infinite trace p of G is an infinite sequence v v 1 v 2 · · · of vertices v i ∈ V such that 1 v = s, and 2 for all i ≥ 0, v i , v i+1 ∈ E. That is, p is an infinite path in G starting at vertex s. Since the set V of vertices is finite, every infinite trace of G must visit some vertices infinitely often. a If p is an infinite trace, let Infp ⊆ V be the set of vertices that occur infinitely often in p. Show that Infp is a subset of a strongly connected component of G. b Describe an algorithm that determines if G has an infinite trace. c Describe an algorithm that determines if G has an infinite trace that visits some good vertex in V G infinitely often. d Describe an algorithm that determines if G has an infinite trace that visits some good vertex in V G infinitely often, but visits no bad vertex in V B infinitely often. 3.18. You are given a binary tree T = V, E in adjacency list format, along with a designated root node r ∈ V . Recall that u is said to be an ancestor of v in the rooted tree, if the path from r to v in T passes through u. You wish to preprocess the tree so that queries of the form “is u an ancestor of v?” can be answered in constant time. The preprocessing itself should take linear time. How can this be done? 3.19. As in the previous problem, you are given a binary tree T = V, E with designated root node. In addition, there is an array x[·] with a value for each node in V . Define a new array z[·] as follows: for each u ∈ V , z[u] = the maximum of the x-values associated with u’s descendants. Give a linear-time algorithm which calculates the entire z-array. 3.20. You are given a tree T = V, E along with a designated root node r ∈ V . The parent of any node v 6= r, denoted pv, is defined to be the node adjacent to v in the path from r to v. By convention, pr = r. For k 1, define p k v = p k−1 pv and p 1 v = pv so p k v is the kth ancestor of v. Each vertex v of the tree has an associated non-negative integer label lv. Give a linear-time algorithm to update the labels of all the vertices in T according to the following rule: l new v = lp lv v. 3.21. Give a linear-time algorithm to find an odd-length cycle in a directed graph. Hint: First solve this problem under the assumption that the graph is strongly connected. 110 Algorithms 3.22. Give an efficient algorithm which takes as input a directed graph G = V, E, and determines whether or not there is a vertex s ∈ V from which all other vertices are reachable. 3.23. Give an efficient algorithm that takes as input a directed acyclic graph G = V, E, and two vertices s, t ∈ V , and outputs the number of different directed paths from s to t in G. 3.24. Give a linear-time algorithm for the following task. Input: A directed acyclic graph G Question: Does G contain a directed path that touches every vertex exactly once? 3.25. You are given a directed graph in which each node u ∈ V has an associated price p u which is a positive integer. Define the array cost as follows: for each u ∈ V , cost [u] = price of the cheapest node reachable from u including u itself. For instance, in the graph below with prices shown for each vertex, the cost values of the nodes A, B, C, D, E, F are 2, 1, 4, 1, 4, 5, respectively. A B C D E F 1 5 4 6 2 3 Your goal is to design an algorithm that fills in the entire cost array i.e., for all vertices. a Give a linear-time algorithm that works for directed acyclic graphs. Hint: Handle the vertices in a particular order. b Extend this to a linear-time algorithm that works for all directed graphs. Hint: Recall the “two-tiered” structure of directed graphs. 3.26. An Eulerian tour in an undirected graph is a cycle that is allowed to pass through each vertex multiple times, but must use each edge exactly once. This simple concept was used by Euler in 1736 to solve the famous Konigsberg bridge problem, which launched the field of graph theory. The city of Konigsberg now called Kaliningrad, in western Russia is the meeting point of two rivers with a small island in the middle. There are seven bridges across the rivers, and a popular recreational question of the time was to determine whether it is possible to perform a tour in which each bridge is crossed exactly once. Euler formulated the relevant information as a graph with four nodes denoting land masses and seven edges denoting bridges, as shown here. S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani 111 Southern bank Northern bank Small island Big island Notice an unusual feature of this problem: multiple edges between certain pairs of nodes. a Show that an undirected graph has an Eulerian tour if and only if all its vertices have even degree. Conclude that there is no Eulerian tour of the Konigsberg bridges. b An Eulerian path is a path which uses each edge exactly once. Can you give a similar if-and-only-if characterization of which undirected graphs have Eulerian paths? c Can you give an analog of part a for directed graphs? 3.27. Two paths in a graph are called edge-disjoint if they have no edges in common. Show that in any undirected graph, it is possible to pair up the vertices of odd degree and find paths between each such pair so that all these paths are edge-disjoint. 3.28. In the 2SAT problem, you are given a set of clauses, where each clause is the disjunction OR of two literals a literal is a Boolean variable or the negation of a Boolean variable. You are looking for a way to assign a value true or false to each of the variables so that all clauses are satisfied – that is, there is at least one true literal in each clause. For example, here’s an instance of 2SAT: x 1 ∨ x 2 ∧ x 1 ∨ x 3 ∧ x 1 ∨ x 2 ∧ x 3 ∨ x 4 ∧ x 1 ∨ x 4 . This instance has a satisfying assignment: set x 1 , x 2 , x 3 , and x 4 to true , false , false , and true , respectively. a Are there other satisfying truth assignments of this 2SAT formula? If so, find them all. b Give an instance of 2SAT with four variables, and with no satisfying assignment. The purpose of this problem is to lead you to a way of solving 2SAT efficiently by reducing it to the problem of finding the strongly connected components of a directed graph. Given an instance I of 2SAT with n variables and m clauses, construct a directed graph G I = V, E as follows. • G I has 2n nodes, one for each variable and its negation. • G I has 2m edges: for each clause α ∨ β of I where α, β are literals, G I has an edge from from the negation of α to β, and one from the negation of β to α. Note that the clause α ∨ β is equivalent to either of the implications α ⇒ β or β ⇒ α. In this sense, G I records all implications in I. c Carry out this construction for the instance of 2SAT given above, and for the instance you constructed in b. 112 Algorithms d Show that if G I has a strongly connected component containing both x and x for some variable x, then I has no satisfying assignment. e Now show the converse of d: namely, that if none of G I ’s strongly connected components contain both a literal and its negation, then the instance I must be satisfiable. Hint: As- sign values to the variables as follows: repeatedly pick a sink strongly connected component of G I . Assign value true to all literals in the sink, assign false to their negations, and delete all of these. Show that this ends up discovering a satisfying assignment. f Conclude that there is a linear-time algorithm for solving 2SAT. 3.29. Let S be a finite set. A binary relation on S is simply a collection R of ordered pairs x, y ∈ S × S. For instance, S might be a set of people, and each such pair x, y ∈ R might mean “x knows y.” An equivalence relation is a binary relation which satisfies three properties: • Reflexivity: x, x ∈ R for all x ∈ S • Symmetry: if x, y ∈ R then y, x ∈ R • Transitivity: if x, y ∈ R and y, z ∈ R then x, z ∈ R For instance, the binary relation “has the same birthday as” is an equivalence relation, whereas “is the father of” is not, since it violates all three properties. Show that an equivalence relation partitions set S into disjoint groups S 1 , S 2 , . . . , S k in other words, S = S 1 ∪ S 2 ∪ · · · ∪ S k and S i ∩ S j = ∅ for all i 6= j such that: • Any two members of a group are related, that is, x, y ∈ R for any x, y ∈ S i , for any i. • Members of different groups are not related, that is, for all i 6= j, for all x ∈ S i and y ∈ S j , we have x, y 6∈ R. Hint: Represent an equivalence relation by an undirected graph. 3.30. On page 102, we defined the binary relation “connected” on the set of vertices of a directed graph. Show that this is an equivalence relation see Exercise 3.29, and conclude that it partitions the vertices into disjoint strongly connected components. 3.31. Biconnected components Let G = V, E be an undirected graph. For any two edges e, e ′ ∈ E, we’ll say e ∼ e ′ if either e = e ′ or there is a simple cycle containing both e and e ′ . a Show that ∼ is an equivalence relation recall Exercise 3.29 on the edges. The equivalence classes into which this relation partitions the edges are called the biconnected components of G. A bridge is an edge which is in a biconnected component all by itself. A separating vertex is a vertex whose removal disconnects the graph. b Partition the edges of the graph below into biconnected components, and identify the bridges and separating vertices. S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani 113 C D A B E F G O N M L K J I H Not only do biconnected components partition the edges of the graph, they also almost partition the vertices in the following sense. c Associate with each biconnected component all the vertices that are endpoints of its edges. Show that the vertices corresponding to two different biconnected components are either disjoint or intersect in a single separating vertex. d Collapse each biconnected component into a single meta-node, and retain individual nodes for each separating vertex. So there are edges between each component-node and its sep- arating vertices. Show that the resulting graph is a tree. DFS can be used to identify the biconnected components, bridges, and separating vertices of a graph in linear time. e Show that the root of the DFS tree is a separating vertex if and only if it has more than one child in the tree. f Show that a non-root vertex v of the DFS tree is a separating vertex if and only if it has a child v ′ none of whose descendants including itself has a backedge to a proper ancestor of v. g For each vertex u define: low u = min pre u pre w where v, w is a backedge for some descendant v of u Show that the entire array of low values can be computed in linear time. h Show how to compute all separating vertices, bridges, and biconnected components of a graph in linear time. Hint: Use low to identify separating vertices, and run another DFS with an extra stack of edges to remove biconnected components one at a time. 114 Algorithms Chapter 4 Paths in graphs

4.1 Distances