
S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani

In this case, ingredients 2 and 3 go together pretty well whereas 1 and 5 clash badly. Notice that this matrix is necessarily symmetric, and that the diagonal entries are always 0.0. Any set of ingredients incurs a penalty which is the sum of all discord values between pairs of ingredients. For instance, the set of ingredients {1, 3, 5} incurs a penalty of 0.2 + 1.0 + 0.5 = 1.7. We want this penalty to be small.

EXPERIMENTAL CUISINE
Input: n, the number of ingredients to choose from; D, the n × n “discord” matrix; some number p ≥ 0
Output: The maximum number of ingredients we can choose with penalty ≤ p.

Show that if EXPERIMENTAL CUISINE is solvable in polynomial time, then so is 3SAT.
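To make the penalty definition concrete, here is a minimal Python sketch that recomputes the worked example for the set {1, 3, 5}. The names `DISCORD`, `discord`, and `penalty` are hypothetical, and the table holds only the three entries implied by the example; which value belongs to which pair is an assumption (it follows the order of the sum and the remark that 1 and 5 clash badly).

```python
from itertools import combinations

# Hypothetical partial discord table. Only the three entries implied by the
# worked example are filled in; the pairing of values to pairs is an assumption.
DISCORD = {
    (1, 3): 0.2,
    (1, 5): 1.0,
    (3, 5): 0.5,
}


def discord(i, j, table=DISCORD):
    """Look up the discord of a pair: symmetric, and 0.0 on the diagonal."""
    if i == j:
        return 0.0
    return table[(min(i, j), max(i, j))]


def penalty(ingredients, table=DISCORD):
    """Sum the discord values over all unordered pairs of chosen ingredients."""
    return sum(discord(i, j, table) for i, j in combinations(sorted(ingredients), 2))


print(round(penalty({1, 3, 5}), 2))  # 1.7
```

A set of k ingredients contributes k(k − 1)/2 pairs, so evaluating the penalty of one candidate set is cheap; the difficulty of the problem lies in choosing the set.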

8.17. Show that for any problem Π in NP, there is an algorithm which solves Π in time $O(2^{p(n)})$, where n is the size of the input instance and p(n) is a polynomial (which may depend on Π).

8.18. Show that if P = NP then the RSA cryptosystem (Section 1.4.2) can be broken in polynomial time.

8.19. A kite is a graph on an even number of vertices, say 2n, in which n of the vertices form a clique and the remaining n vertices are connected in a “tail” that consists of a path joined to one of the vertices of the clique. Given a graph and a goal g, the KITE problem asks for a subgraph which is a kite and which contains 2g nodes. Prove that KITE is NP-complete.

8.20. In an undirected graph G = (V, E), we say D ⊆ V is a dominating set if every v ∈ V is either in D or adjacent to at least one member of D. In the DOMINATING SET problem, the input is a graph and a budget b, and the aim is to find a dominating set in the graph of size at most b, if one exists. Prove that this problem is NP-complete.

8.21. Sequencing by hybridization. One experimental procedure for identifying a new DNA sequence repeatedly probes it to determine which k-mers (substrings of length k) it contains. Based on these, the full sequence must then be reconstructed.
Let’s now formulate this as a combinatorial problem. For any string x (the DNA sequence), let Γ(x) denote the multiset of all of its k-mers. In particular, Γ(x) contains exactly |x| − k + 1 elements.
The reconstruction problem is now easy to state: given a multiset of k-length strings, find a string x such that Γ(x) is exactly this multiset.
(a) Show that the reconstruction problem reduces to RUDRATA PATH. (Hint: Construct a directed graph with one node for each k-mer, and with an edge from a to b if the last k − 1 characters of a match the first k − 1 characters of b.)
(b) But in fact, there is much better news. Show that the same problem also reduces to EULER PATH. (Hint: This time, use one directed edge for each k-mer.)

8.22. In task scheduling, it is common to use a graph representation with a node for each task and a directed edge from task i to task j if i is a precondition for j. This directed graph depicts the precedence constraints in the scheduling problem.
Clearly, a schedule is possible if and only if the graph is acyclic; if it isn’t, we’d like to identify the smallest number of constraints that must be dropped so as to make it acyclic.
Given a directed graph G = (V, E), a subset $E' \subseteq E$ is called a feedback arc set if the removal of edges $E'$ renders G acyclic.

FEEDBACK ARC SET (FAS): Given a directed graph G = (V, E) and a budget b, find a feedback arc set of ≤ b edges, if one exists.

(a) Show that FAS is in NP.

FAS can be shown to be NP-complete by a reduction from VERTEX COVER. Given an instance (G, b) of VERTEX COVER, where G is an undirected graph and we want a vertex cover of size ≤ b, we construct an instance $(G', b)$ of FAS as follows. If G = (V, E) has n vertices $v_1, \ldots, v_n$, then make $G' = (V', E')$ a directed graph with 2n vertices $w_1, w'_1, \ldots, w_n, w'_n$, and n + 2|E| directed edges:
• $(w_i, w'_i)$ for all i = 1, 2, . . . , n.
• $(w'_i, w_j)$ and $(w'_j, w_i)$ for every $(v_i, v_j) \in E$.

(b) Show that if G contains a vertex cover of size b, then $G'$ contains a feedback arc set of size b.
(c) Show that if $G'$ contains a feedback arc set of size b, then G contains a vertex cover of size at most b. (Hint: given a feedback arc set of size b in $G'$, you may need to first modify it slightly to obtain another one which is of a more convenient form, but is of the same size or smaller. Then, argue that G must contain a vertex cover of the same size as the modified feedback arc set.)

8.23. In the NODE-DISJOINT PATHS problem, the input is an undirected graph in which some vertices have been specially marked: a certain number of “sources” $s_1, s_2, \ldots, s_k$ and an equal number of “destinations” $t_1, t_2, \ldots, t_k$. The goal is to find k node-disjoint paths (that is, paths which have no nodes in common) where the ith path goes from $s_i$ to $t_i$. Show that this problem is NP-complete.
Here is a sequence of progressively stronger hints.
(a) Reduce from 3SAT.
(b) For a 3SAT formula with m clauses and n variables, use k = m + n sources and destinations. Introduce one source-destination pair $(s_x, t_x)$ for each variable x, and one source-destination pair $(s_c, t_c)$ for each clause c.
(c) For each 3SAT clause, introduce 6 new intermediate vertices, one for each literal occurring in that clause and one for its complement.
(d) Notice that if the path from $s_c$ to $t_c$ goes through some intermediate vertex representing, say, an occurrence of variable x, then no other path can go through that vertex. What vertex would you like the other path to be forced to go through instead?

Chapter 9
Coping with NP-completeness

You are the junior member of a seasoned project team. Your current task is to write code for solving a simple-looking problem involving graphs and numbers. What are you supposed to do?

If you are very lucky, your problem will be among the half-dozen problems concerning graphs with weights (shortest path, minimum spanning tree, maximum flow, etc.) that we have solved in this book. Even if this is the case, recognizing such a problem in its natural habitat, grungy and obscured by reality and context, requires practice and skill. It is more likely that you will need to reduce your problem to one of these lucky ones, or to solve it using dynamic programming or linear programming.

But chances are that nothing like this will happen. The world of search problems is a bleak landscape. There are a few spots of light, brilliant algorithmic ideas, each illuminating a small area around it (the problems that reduce to it); two of these areas, linear and dynamic programming, are in fact decently large. But the remaining vast expanse is pitch dark: NP-complete. What are you to do?

You can start by proving that your problem is actually NP-complete. Often a proof by generalization (recall the discussion on page 270 and Exercise 8.10) is all that you need; and sometimes a simple reduction from 3SAT or ZOE is not too difficult to find.
This sounds like a theoretical exercise, but, if carried out successfully, it does bring some tangible rewards: now your status in the team has been elevated, you are no longer the kid who can’t do, and you have become the noble knight with the impossible quest.

But, unfortunately, a problem does not go away when proved NP-complete. The real question is, What do you do next? This is the subject of the present chapter and also the inspiration for some of the most important modern research on algorithms and complexity. NP-completeness is not a death certificate; it is only the beginning of a fascinating adventure.

Your problem’s NP-completeness proof probably constructs graphs that are complicated and weird, very much unlike those that come up in your application. For example, even though SAT is NP-complete, satisfying assignments for HORN SAT (the instances of SAT that come up in logic programming) can be found efficiently (recall Section 5.3). Or, suppose the graphs that arise in your application are trees. In this case, many NP-complete problems, such as INDEPENDENT SET, can be solved in linear time by dynamic programming (recall Section 6.7).

Unfortunately, this approach does not always work. For example, we know that 3SAT is NP-complete. And the INDEPENDENT SET problem, along with many other NP-complete problems, remains so even for planar graphs (graphs that can be drawn in the plane without crossing edges). Moreover, often you cannot neatly characterize the instances that come up in your application. Instead, you will have to rely on some form of intelligent exponential search: procedures such as backtracking and branch and bound, which take exponential time in the worst case but, with the right design, could be very efficient on typical instances that come up in your application. We discuss these methods in Section 9.1.

Or you can develop an algorithm for your NP-complete optimization problem that falls short of the optimum but never by too much.
For example, in Section 5.4 we saw that the greedy algorithm always produces a set cover whose size is no more than log n times that of the optimal set cover. An algorithm that achieves such a guarantee is called an approximation algorithm. As we will see in Section 9.2, such algorithms are known for many NP-complete optimization problems, and they are some of the most clever and sophisticated algorithms around. And the theory of NP-completeness can again be used as a guide in this endeavor, by showing that, for some problems, there are even severe limits to how well they can be approximated (unless, of course, P = NP).

Finally, there are heuristics, algorithms with no guarantees on either the running time or the degree of approximation. Heuristics rely on ingenuity, intuition, a good understanding of the application, meticulous experimentation, and often insights from physics or biology, to attack a problem. We see some common kinds in Section 9.3.
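
As a reminder of what an approximation algorithm looks like in practice, here is a minimal Python sketch of the greedy set cover rule recalled above: repeatedly pick the subset that covers the most still-uncovered elements. The function name `greedy_set_cover` and the small instance are illustrative assumptions, not taken from the text.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset covering the most still-uncovered elements.

    `subsets` is a list of Python sets. Returns the indices of the chosen
    subsets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest overlap with what is left.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Illustrative instance: the optimum uses 2 subsets (indices 0 and 1),
# but the greedy rule is lured by the larger subset at index 2 first.
subsets = [{1, 2, 3}, {4, 5, 6}, {1, 2, 4, 5}]
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, subsets))  # [2, 0, 1]
```

On this instance the greedy rule uses three subsets where two suffice, which illustrates the kind of bounded shortfall that an approximation guarantee permits.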

9.1 Intelligent exhaustive search