S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani
time O(nW), which we noted is exponential in the input size, since it involves W rather than log W. And we have the usual exhaustive algorithm as well, which looks at all subsets of items—all 2^n of them. Is there a polynomial algorithm for KNAPSACK? Nobody knows of one.

But suppose that we are interested in the variant of the knapsack problem in which the integers are coded in unary—for instance, by writing IIIIIIIIIIII for 12. This is admittedly an exponentially wasteful way to represent integers, but it does define a legitimate problem, which we could call UNARY KNAPSACK. It follows from our discussion that this somewhat artificial problem does have a polynomial algorithm.
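For concreteness, here is a sketch (in Python; the function name is our own, not the text's) of the standard O(nW) dynamic program referred to above. The table it fills has W + 1 entries, which is why the running time is polynomial in the *value* of W but exponential in the number of bits needed to write W down:

```python
def knapsack_max_value(weights, values, W):
    """O(nW) dynamic program: best[w] is the maximum value achievable
    with knapsack capacity w. The table has W + 1 entries, so the running
    time scales with the value of W, not with log W: pseudo-polynomial."""
    best = [0] * (W + 1)
    for wt, val in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for w in range(W, wt - 1, -1):
            best[w] = max(best[w], best[w - wt] + val)
    return best[W]
```

If W is given in unary, the input itself has length at least W, so this same algorithm is polynomial in the input size—which is exactly why UNARY KNAPSACK is easy.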
A different variation: suppose now that each item's value is equal to its weight (all given in binary), and to top it off, the goal g is the same as the capacity W. (To adapt the silly break-in story whereby we first introduced the knapsack problem, the items are all gold nuggets, and the burglar wants to fill his knapsack to the hilt.) This special case is tantamount to finding a subset of a given set of integers that adds up to exactly W. Since it is a special case of KNAPSACK, it cannot be any harder. But could it be polynomial? As it turns out, this problem, called SUBSET SUM, is also very hard.

At this point one could ask: If SUBSET SUM is a special case that happens to be as hard as the general KNAPSACK problem, why are we interested in it? The reason is simplicity. In the complicated calculus of reductions between search problems that we shall develop in this chapter, conceptually simple problems like SUBSET SUM and 3SAT are invaluable.
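To make the specialization concrete, here is a short pseudo-polynomial sketch of the decision version of SUBSET SUM (the function name is our own): values equal weights, and we ask whether some subset of the given integers hits the target W exactly.

```python
def subset_sum(nums, W):
    """SUBSET SUM as a special case of KNAPSACK: can some subset of nums
    sum to exactly W? Tracks the set of reachable sums; like the knapsack
    dynamic program, this runs in time pseudo-polynomial in W."""
    reachable = {0}
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= W}
    return W in reachable
```

As with KNAPSACK, this is exponential in the binary length of W, so it does not contradict the hardness claimed above.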
8.2 NP-complete problems
Hard problems, easy problems
In short, the world is full of search problems, some of which can be solved efficiently, while others seem to be very hard. This is depicted in the following table.
Hard problems (NP-complete)          Easy problems (in P)
3SAT                                 2SAT, HORN SAT
TRAVELING SALESMAN PROBLEM           MINIMUM SPANNING TREE
LONGEST PATH                         SHORTEST PATH
3D MATCHING                          BIPARTITE MATCHING
KNAPSACK                             UNARY KNAPSACK
INDEPENDENT SET                      INDEPENDENT SET on trees
INTEGER LINEAR PROGRAMMING           LINEAR PROGRAMMING
RUDRATA PATH                         EULER PATH
BALANCED CUT                         MINIMUM CUT
This table is worth contemplating. On the right we have problems that can be solved efficiently. On the left, we have a bunch of hard nuts that have escaped efficient solution over
many decades or centuries.
The various problems on the right can be solved by algorithms that are specialized and diverse: dynamic programming, network flow, graph search, greedy. These problems are easy for a variety of different reasons.

In stark contrast, the problems on the left are all difficult for the same reason: at their core, they are all the same problem, just in different disguises! They are all equivalent: as we shall see in Section 8.3, each of them can be reduced to any of the others—and back.
P and NP
It’s time to introduce some important concepts. We know what a search problem is: its defining characteristic is that any proposed solution can be quickly checked for correctness, in the sense that there is an efficient checking algorithm C that takes as input the given instance I (the data specifying the problem to be solved), as well as the proposed solution S, and outputs true if and only if S really is a solution to instance I. Moreover the running time of C(I, S) is bounded by a polynomial in |I|, the length of the instance. We denote the class of all search problems by NP.
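As an illustration of such a checker, here is a sketch of C(I, S) for SAT (the function name and encoding are our own): the instance is a CNF formula given as a list of clauses, the proposed solution is a truth assignment, and the check runs in time linear in the length of the formula.

```python
def check_sat(clauses, assignment):
    """Efficient checker C(I, S) for SAT. The instance I is a list of
    clauses; each clause is a list of nonzero integers, where k denotes
    variable k and -k its negation. S is a dict mapping variable -> bool.
    Returns True iff every clause contains at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```

Note that the checker never searches for an assignment; it only verifies the one it is handed, which is exactly what membership in NP requires.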
We’ve seen many examples of NP search problems that are solvable in polynomial time.
In such cases, there is an algorithm that takes as input an instance I and has a running time polynomial in |I|. If I has a solution, the algorithm returns such a solution; and if I has no
solution, the algorithm correctly reports so. The class of all search problems that can be solved in polynomial time is denoted P.
Hence, all the search problems on the right-hand side of the
table are in P.
Why P and NP?
Okay, P must stand for “polynomial.” But why use the initials NP (the common chatroom abbreviation for “no problem”) to describe the class of search problems, some of which are terribly hard?
NP stands for “nondeterministic polynomial time,” a term going back to the roots of
complexity theory. Intuitively, it means that a solution to any search problem can be found and verified in polynomial time by a special and quite unrealistic sort of algorithm, called a
nondeterministic algorithm. Such an algorithm has the power of guessing correctly at every step.
Incidentally, the original definition of NP (and its most common usage to this day) was not as a class of search problems, but as a class of decision problems: algorithmic questions that can be answered by yes or no. Example: “Is there a truth assignment that satisfies this Boolean formula?” But this too reflects a historical reality: at the time the theory of NP-completeness was being developed, researchers in the theory of computation were interested in formal languages, a domain in which such decision problems are of central importance.
Are there search problems that cannot be solved in polynomial time? In other words,
is P ≠ NP? Most algorithms researchers think so. It is hard to believe that exponential search can always be avoided, that a simple trick will crack all these hard problems, famously unsolved for decades and centuries. And there is a good reason for mathematicians to believe
that P ≠ NP—the task of finding a proof for a given mathematical assertion is a search problem and is therefore in NP (after all, when a formal proof of a mathematical statement is written out in excruciating detail, it can be checked mechanically, line by line, by an efficient algorithm). So if P = NP, there would be an efficient method to prove any theorem, thus eliminating the need for mathematicians! All in all, there are a variety of reasons why it is widely believed that P ≠ NP. However, proving this has turned out to be extremely difficult, one of the deepest and most important unsolved puzzles of mathematics.
Reductions, again
Even if we accept that P ≠ NP, what about the specific problems on the left side of the table? On the basis of what evidence do we believe that these particular problems have no efficient algorithm (besides, of course, the historical fact that many clever mathematicians and computer scientists have tried hard and failed to find any)? Such evidence is provided by reductions, which translate one search problem into another. What they demonstrate is that the problems on the left side of the table are all, in some sense, exactly the same problem, except that they are stated in different languages. What’s more, we will also use reductions to show that these problems are the hardest search problems in NP—if even one of them has a polynomial time algorithm, then every problem in NP has a polynomial time algorithm. Thus if we believe that P ≠ NP, then all these search problems are hard.
We defined reductions in Chapter 7 and saw many examples of them. Let’s now specialize this definition to search problems. A reduction from search problem A to search problem B is a polynomial-time algorithm f that transforms any instance I of A into an instance f(I) of B, together with another polynomial-time algorithm h that maps any solution S of f(I) back into a solution h(S) of I; see the following diagram. If f(I) has no solution, then neither does I. These two translation procedures f and h imply that any algorithm for B can be converted into an algorithm for A by bracketing it between f and h.
    Instance I --f--> Instance f(I) --> [Algorithm for B] --> Solution S of f(I) --h--> Solution h(S) of I
                                                          --> No solution to f(I)  ==>  No solution to I

(The whole pipeline, consisting of f, then the algorithm for B, then h, constitutes the algorithm for A.)
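As a concrete instance of this diagram, here is a sketch (names are our own) of the classic reduction from INDEPENDENT SET to CLIQUE, both of which appear in Figure 8.7: f complements the graph’s edge set, a brute-force solver stands in for the “algorithm for B,” and h is simply the identity, since a clique of size k in the complement graph is exactly an independent set of size k in the original.

```python
from itertools import combinations

def complement(n, edges):
    """f: map an INDEPENDENT SET instance (graph on vertices 0..n-1)
    to a CLIQUE instance by complementing the edge set."""
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return all_pairs - {frozenset(e) for e in edges}

def brute_force_clique(n, edges, k):
    """Stand-in 'algorithm for B': find a clique of size k, or None."""
    edges = {frozenset(e) for e in edges}
    for subset in combinations(range(n), k):
        if all(frozenset(p) in edges for p in combinations(subset, 2)):
            return set(subset)
    return None

def independent_set_via_clique(n, edges, k):
    """Algorithm for A, obtained by bracketing the CLIQUE solver
    between f (complement) and h (the identity)."""
    S = brute_force_clique(n, complement(n, edges), k)
    return S  # h(S) = S: the same vertices are independent in the original graph
```

The brute-force solver is exponential, of course; the point of the reduction is only that f and h themselves run in polynomial time.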
And now we can finally define the class of the hardest search problems.
A search problem is NP-complete if all other search problems reduce to it.

This is a very strong requirement indeed. For a problem to be NP-complete, it must be useful in solving every search problem in the world! It is remarkable that such problems exist. But they do, and the first column of the table we saw earlier is filled with the most famous
examples. In Section 8.3 we shall see how all these problems reduce to one another, and also why all other search problems reduce to them.
Figure 8.6 The space NP of all search problems, assuming P ≠ NP. [The figure shows P and the NP-complete problems as two disjoint regions inside NP, with difficulty increasing from P toward the NP-complete problems.]
The two ways to use reductions
So far in this book the purpose of a reduction from a problem A to a problem B has been straightforward and honorable: We know how to solve B efficiently, and we want to use this
knowledge to solve A. In this chapter, however, reductions from A to B serve a somewhat perverse goal: we know A is hard, and we use the reduction to prove that B is hard as well!

If we denote a reduction from A to B by

    A --> B

then we can say that difficulty flows in the direction of the arrow, while efficient algorithms move in the opposite direction. It is through this propagation of difficulty that we know NP-complete problems are hard: all other search problems reduce to them, and thus each NP-complete problem contains the complexity of all search problems. If even one NP-complete problem is in P, then P = NP.
Reductions also have the convenient property that they compose:

    if A --> B and B --> C, then A --> C.

To see this, observe first of all that any reduction is completely specified by the pre- and postprocessing functions f and h (see the reduction diagram). If (f_AB, h_AB) and (f_BC, h_BC) define the reductions from A to B and from B to C, respectively, then a reduction from A to C is given by compositions of these functions: f_BC ∘ f_AB maps an instance of A to an instance of C, and h_AB ∘ h_BC sends a solution of C back to a solution of A.
This means that once we know a problem A is NP-complete, we can use it to prove that a new search problem B is also NP-complete, simply by reducing A to B. Such a reduction
establishes that all problems in NP reduce to B, via A.
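The composition argument above can be captured in a few lines of code (a sketch; the function names are our own):

```python
def compose_reductions(f_ab, h_ab, f_bc, h_bc):
    """Given reductions A -> B and B -> C, each specified by its
    pre-processing f and post-processing h, build the reduction A -> C."""
    f_ac = lambda instance: f_bc(f_ab(instance))   # f_BC o f_AB
    h_ac = lambda solution: h_ab(h_bc(solution))   # h_AB o h_BC
    return f_ac, h_ac
```

Note the order reversal: instances flow forward through the f’s, while solutions flow backward through the h’s, and the composition of two polynomial-time maps is again polynomial-time.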
Factoring
One last point: we started off this book by introducing another famously hard search problem: FACTORING, the task of finding all prime factors of a given integer. But the difficulty of FACTORING is of a different nature than that of the other hard search problems we have just seen. For example, nobody believes that FACTORING is NP-complete. One major difference is that, in the case of FACTORING, the definition does not contain the now familiar clause “or report that none exists.” A number can always be factored into primes.
Another difference (possibly not completely unrelated) is this: as we shall see in Chapter 10, FACTORING succumbs to the power of quantum computation—while SAT, TSP and the other NP-complete problems do not seem to.
Figure 8.7 Reductions between search problems. [The figure depicts the reductions developed in this chapter, among: All of NP, SAT, 3SAT, INDEPENDENT SET, VERTEX COVER, CLIQUE, 3D MATCHING, ZOE, SUBSET SUM, ILP, RUDRATA CYCLE, and TSP.]
8.3 The reductions