
S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani

time O(nW), which we noted is exponential in the input size, since it involves W rather than log W. And we have the usual exhaustive algorithm as well, which looks at all subsets of items, all 2^n of them. Is there a polynomial algorithm for KNAPSACK? Nobody knows of one.

But suppose that we are interested in the variant of the knapsack problem in which the integers are coded in unary, for instance, by writing IIIIIIIIIIII for 12. This is admittedly an exponentially wasteful way to represent integers, but it does define a legitimate problem, which we could call UNARY KNAPSACK. It follows from our discussion that this somewhat artificial problem does have a polynomial algorithm.

A different variation: suppose now that each item's value is equal to its weight (all given in binary), and, to top it off, the goal g is the same as the capacity W. (To adapt the silly break-in story whereby we first introduced the knapsack problem, the items are all gold nuggets, and the burglar wants to fill his knapsack to the hilt.) This special case is tantamount to finding a subset of a given set of integers that adds up to exactly W. Since it is a special case of KNAPSACK, it cannot be any harder. But could it be polynomial? As it turns out, this problem, called SUBSET SUM, is also very hard.

At this point one could ask: If SUBSET SUM is a special case that happens to be as hard as the general KNAPSACK problem, why are we interested in it? The reason is simplicity. In the complicated calculus of reductions between search problems that we shall develop in this chapter, conceptually simple problems like SUBSET SUM and 3SAT are invaluable.
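To make the pseudo-polynomial behavior concrete, here is a minimal sketch of the standard dynamic program for SUBSET SUM. It runs in O(nW) time, polynomial in n and W but exponential in the bit length of W, which is exactly the distinction drawn above between KNAPSACK and UNARY KNAPSACK. Function and variable names are ours, not the text's.

```python
def subset_sum(items, W):
    """Decide whether some subset of `items` sums to exactly W."""
    reachable = [False] * (W + 1)
    reachable[0] = True                    # the empty subset sums to 0
    for w in items:
        # traverse right-to-left so each item is used at most once
        for s in range(W, w - 1, -1):
            if reachable[s - w]:
                reachable[s] = True
    return reachable[W]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5 = 9
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False: no subset reaches 30
```

Note that the table `reachable` has W + 1 entries; when W is written in binary with b bits, the table can have up to 2^b entries, which is why this algorithm does not settle the question of a polynomial algorithm for KNAPSACK or SUBSET SUM.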

8.2 NP-complete problems

Hard problems, easy problems

In short, the world is full of search problems, some of which can be solved efficiently, while others seem to be very hard. This is depicted in the following table.

    Hard problems (NP-complete)         Easy problems (in P)
    3SAT                                2SAT, HORN SAT
    TRAVELING SALESMAN PROBLEM          MINIMUM SPANNING TREE
    LONGEST PATH                        SHORTEST PATH
    3D MATCHING                         BIPARTITE MATCHING
    KNAPSACK                            UNARY KNAPSACK
    INDEPENDENT SET                     INDEPENDENT SET on trees
    INTEGER LINEAR PROGRAMMING          LINEAR PROGRAMMING
    RUDRATA PATH                        EULER PATH
    BALANCED CUT                        MINIMUM CUT

This table is worth contemplating. On the right we have problems that can be solved efficiently. On the left, we have a bunch of hard nuts that have escaped efficient solution over many decades or centuries.

The various problems on the right can be solved by algorithms that are specialized and diverse: dynamic programming, network flow, graph search, greedy. These problems are easy for a variety of different reasons. In stark contrast, the problems on the left are all difficult for the same reason! At their core, they are all the same problem, just in different disguises! They are all equivalent: as we shall see in Section 8.3, each of them can be reduced to any of the others, and back.

P and NP

It's time to introduce some important concepts. We know what a search problem is: its defining characteristic is that any proposed solution can be quickly checked for correctness, in the sense that there is an efficient checking algorithm C that takes as input the given instance I (the data specifying the problem to be solved), as well as the proposed solution S, and outputs true if and only if S really is a solution to instance I. Moreover, the running time of C(I, S) is bounded by a polynomial in |I|, the length of the instance. We denote the class of all search problems by NP.

We've seen many examples of NP search problems that are solvable in polynomial time.
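The defining property of NP is the existence of such an efficient checker C. As an illustration, here is a minimal sketch of a checking algorithm for SAT: C(I, S) takes a CNF formula I and a proposed assignment S and verifies it in time linear in the size of the formula. The encoding (a clause is a list of nonzero integers, where k stands for variable k and -k for its negation) is our choice for illustration, not the text's.

```python
def check(formula, assignment):
    """Return True iff `assignment` satisfies every clause of `formula`."""
    def literal_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    # a clause is satisfied when at least one of its literals is true
    return all(any(literal_true(lit) for lit in clause) for clause in formula)

# I = (x1 or not x2) and (x2 or x3)
I = [[1, -2], [2, 3]]
print(check(I, {1: True, 2: True, 3: False}))   # True: both clauses satisfied
print(check(I, {1: False, 2: True, 3: False}))  # False: the first clause fails
```

Note what the checker does not do: it never searches for an assignment. Verifying a proposed solution is easy; the hard part of SAT is finding one.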
In such cases, there is an algorithm that takes as input an instance I and has a running time polynomial in |I|. If I has a solution, the algorithm returns such a solution; and if I has no solution, the algorithm correctly reports so. The class of all search problems that can be solved in polynomial time is denoted P. Hence, all the search problems on the right-hand side of the table are in P.

Why P and NP?

Okay, P must stand for "polynomial." But why use the initials NP (the common chatroom abbreviation for "no problem") to describe the class of search problems, some of which are terribly hard? NP stands for "nondeterministic polynomial time," a term going back to the roots of complexity theory. Intuitively, it means that a solution to any search problem can be found and verified in polynomial time by a special and quite unrealistic sort of algorithm, called a nondeterministic algorithm. Such an algorithm has the power of guessing correctly at every step.

Incidentally, the original definition of NP (and its most common usage to this day) was not as a class of search problems, but as a class of decision problems: algorithmic questions that can be answered by yes or no. Example: "Is there a truth assignment that satisfies this Boolean formula?" But this too reflects a historical reality: at the time the theory of NP-completeness was being developed, researchers in the theory of computation were interested in formal languages, a domain in which such decision problems are of central importance.

Are there search problems that cannot be solved in polynomial time? In other words, is P ≠ NP? Most algorithms researchers think so. It is hard to believe that exponential search can always be avoided, that a simple trick will crack all these hard problems, famously unsolved for decades and centuries. And there is a good reason for mathematicians to believe that P ≠ NP: the task of finding a proof for a given mathematical assertion is a search problem and is therefore in NP (after all, when a formal proof of a mathematical statement is written out in excruciating detail, it can be checked mechanically, line by line, by an efficient algorithm). So if P = NP, there would be an efficient method to prove any theorem, thus eliminating the need for mathematicians! All in all, there are a variety of reasons why it is widely believed that P ≠ NP. However, proving this has turned out to be extremely difficult, one of the deepest and most important unsolved puzzles of mathematics.

Reductions, again

Even if we accept that P ≠ NP, what about the specific problems on the left side of the table? On the basis of what evidence do we believe that these particular problems have no efficient algorithm (besides, of course, the historical fact that many clever mathematicians and computer scientists have tried hard and failed to find any)? Such evidence is provided by reductions, which translate one search problem into another. What they demonstrate is that the problems on the left side of the table are all, in some sense, exactly the same problem, except that they are stated in different languages. What's more, we will also use reductions to show that these problems are the hardest search problems in NP: if even one of them has a polynomial-time algorithm, then every problem in NP has a polynomial-time algorithm. Thus if we believe that P ≠ NP, then all these search problems are hard.

We defined reductions in Chapter 7 and saw many examples of them. Let's now specialize this definition to search problems. A reduction from search problem A to search problem B is a polynomial-time algorithm f that transforms any instance I of A into an instance f(I) of B, together with another polynomial-time algorithm h that maps any solution S of f(I) back into a solution h(S) of I; see the following diagram.
If f(I) has no solution, then neither does I. These two translation procedures f and h imply that any algorithm for B can be converted into an algorithm for A by bracketing it between f and h.

[Diagram: an instance I of A is transformed by f into the instance f(I) of B; the algorithm for B either finds a solution S of f(I), which h maps back to the solution h(S) of I, or reports that f(I) has no solution, in which case I has no solution either.]

And now we can finally define the class of the hardest search problems. A search problem is NP-complete if all other search problems reduce to it. This is a very strong requirement indeed. For a problem to be NP-complete, it must be useful in solving every search problem in the world! It is remarkable that such problems exist. But they do, and the first column of the table we saw earlier is filled with the most famous examples. In Section 8.3 we shall see how all these problems reduce to one another, and also why all other search problems reduce to them.

Figure 8.6 The space NP of all search problems, assuming P ≠ NP. [Diagram: P and the NP-complete problems appear as two disjoint regions within NP, at opposite ends of an axis of increasing difficulty.]

The two ways to use reductions

So far in this book the purpose of a reduction from a problem A to a problem B has been straightforward and honorable: we know how to solve B efficiently, and we want to use this knowledge to solve A. In this chapter, however, reductions from A to B serve a somewhat perverse goal: we know A is hard, and we use the reduction to prove that B is hard as well.

If we denote a reduction from A to B by

    A → B

then we can say that difficulty flows in the direction of the arrow, while efficient algorithms move in the opposite direction. It is through this propagation of difficulty that we know NP-complete problems are hard: all other search problems reduce to them, and thus each NP-complete problem contains the complexity of all search problems. If even one NP-complete problem is in P, then P = NP.

Reductions also have the convenient property that they compose.

    If A → B and B → C, then A → C.
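The bracketing of an algorithm for B between f and h can be sketched in a few lines. This is a generic illustration of the mechanism, with a toy problem pair of our own invention standing in for A and B; none of the names come from the text.

```python
def solve_A(I, f, h, solve_B):
    """Solve an A-instance using a reduction (f, h) and an algorithm for B."""
    S = solve_B(f(I))      # map the A-instance forward and solve it in B
    if S is None:          # f(I) has no solution, hence neither does I
        return None
    return h(S)            # map the B-solution back to a solution of I

# Toy example: A asks for x with x + a = 100; B asks for the root of x - t = 0.
f = lambda a: 100 - a      # instance a of A  ->  instance t = 100 - a of B
solve_B = lambda t: t      # the root of x - t = 0 is t itself
h = lambda S: S            # here the solution maps back unchanged
print(solve_A(30, f, h, solve_B))  # 70, and indeed 70 + 30 = 100
```

The essential point is that `solve_A` does no problem-specific work of its own: all the difficulty of A has been shipped off to `solve_B`, which is why an efficient algorithm for B immediately yields one for A.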
To see this, observe first of all that any reduction is completely specified by the pre- and postprocessing functions f and h (see the reduction diagram). If (f_AB, h_AB) and (f_BC, h_BC) define the reductions from A to B and from B to C, respectively, then a reduction from A to C is given by compositions of these functions: f_BC ∘ f_AB maps an instance of A to an instance of C, and h_AB ∘ h_BC sends a solution of C back to a solution of A. This means that once we know a problem A is NP-complete, we can use it to prove that a new search problem B is also NP-complete, simply by reducing A to B. Such a reduction establishes that all problems in NP reduce to B, via A.

Factoring

One last point: we started off this book by introducing another famously hard search problem: FACTORING, the task of finding all prime factors of a given integer. But the difficulty of FACTORING is of a different nature than that of the other hard search problems we have just seen. For example, nobody believes that FACTORING is NP-complete. One major difference is that, in the case of FACTORING, the definition does not contain the now familiar clause "or report that none exists." A number can always be factored into primes. Another difference (possibly not completely unrelated) is this: as we shall see in Chapter 10, FACTORING succumbs to the power of quantum computation, while SAT, TSP and the other NP-complete problems do not seem to.

Figure 8.7 Reductions between search problems. [Diagram of the chain of reductions among: all of NP, SAT, 3SAT, INDEPENDENT SET, VERTEX COVER, CLIQUE, 3D MATCHING, ZOE, SUBSET SUM, ILP, RUDRATA CYCLE, and TSP.]
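The composition of reductions described earlier in this section can also be sketched generically: instances flow forward through f_BC ∘ f_AB, solutions flow backward through h_AB ∘ h_BC. The numeric stand-ins below are ours, chosen only to make the data flow visible.

```python
def compose(reduction_AB, reduction_BC):
    """Compose a reduction from A to B with one from B to C."""
    f_AB, h_AB = reduction_AB
    f_BC, h_BC = reduction_BC
    f_AC = lambda I: f_BC(f_AB(I))   # A-instance -> B-instance -> C-instance
    h_AC = lambda S: h_AB(h_BC(S))   # C-solution -> B-solution -> A-solution
    return f_AC, h_AC

# Toy numeric stand-ins for the four translation functions:
f_AC, h_AC = compose((lambda I: I + 1, lambda S: S - 1),
                     (lambda I: 2 * I, lambda S: S // 2))
print(f_AC(3))   # 8: (3 + 1) * 2
print(h_AC(8))   # 3: (8 // 2) - 1
```

Since f_AB, f_BC, h_AB, and h_BC each run in polynomial time, so do their compositions, which is why A → B and B → C together give a legitimate reduction A → C.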

8.3 The reductions