A definitive reference

P, NP, and Completeness Theory

Having established the fundamental framework for measuring computational resources, we now turn to one of the most profound questions in theoretical computer science: which problems can be efficiently solved, and which problems appear to be inherently difficult?

The theory of NP-completeness provides a powerful lens for understanding computational hardness. It gives us a formal way to classify problems according to their inherent difficulty, establish relationships between seemingly different problems, and identify the boundary between tractability and intractability.

At the heart of this theory lies the celebrated P versus NP question, widely regarded as the most important open problem in theoretical computer science. This question asks whether problems whose solutions can be efficiently verified can also be efficiently solved. The answer has profound implications for mathematics, cryptography, artificial intelligence, and our fundamental understanding of computation.

This module develops the rich theory of NP-completeness, introduces the framework of polynomial-time reductions, explores the seminal Cook-Levin theorem, and surveys the landscape of important NP-complete problems that arise throughout computer science and mathematics.

Decision Problems Framework

Optimization vs. Decision Formulations

Complexity theory primarily deals with decision problems—problems with yes/no answers. This focus provides a clean mathematical framework while preserving the essential difficulty of computational problems.

Definition: Decision Problem

A decision problem L is a set of strings over some alphabet Σ:

L ⊆ Σ*

An algorithm solves L if for any input x ∈ Σ*:

  • If x ∈ L, the algorithm outputs "yes" or 1
  • If x ∉ L, the algorithm outputs "no" or 0

The strings in L are called positive instances or yes-instances, while strings not in L are negative instances or no-instances.

Definition: Optimization Problem

An optimization problem P consists of:

  • A set I of instances
  • For each instance x ∈ I, a set S(x) of feasible solutions
  • For each instance x and solution y ∈ S(x), a cost function f(x, y)
  • A goal to either minimize or maximize f(x, y)

The objective is to find y* ∈ S(x) such that f(x, y*) is optimal (either minimum or maximum) among all y ∈ S(x).

Definition: Optimization to Decision Conversion

Given an optimization problem P to minimize f(x, y), we can define a corresponding decision problem D:

D = {(x, k) | ∃y ∈ S(x) such that f(x, y) ≤ k}

Similarly, for a maximization problem, the decision version would be:

D = {(x, k) | ∃y ∈ S(x) such that f(x, y) ≥ k}

If we can solve the decision problem efficiently for any value k, we can use binary search to find the optimal value of the original optimization problem.

Certificate Verification Paradigm

A key insight in complexity theory is the distinction between finding a solution and verifying a proposed solution. This verification perspective provides one of the most intuitive characterizations of the class NP.

Definition: Certificate

A certificate (or witness) for a decision problem instance is a string c that helps verify that the instance is a yes-instance.

Formally, for a decision problem L, a certificate scheme consists of:

  • For each x ∈ L, a non-empty set C(x) of valid certificates
  • A verification algorithm V that, given x and c, accepts if and only if c ∈ C(x)

For complexity theory, we're particularly interested in cases where certificates are of polynomial length and can be verified in polynomial time.

Definition: Efficient Verification

A decision problem L has efficient verification if there exists:

  • A polynomial p(n)
  • A polynomial-time algorithm V (the verifier)

Such that for all x:

x ∈ L ⟺ ∃c, |c| ≤ p(|x|) such that V(x, c) = 1

In other words, yes-instances have short, efficiently verifiable certificates.

Theorem: NP Characterization via Certificates

A language L is in NP if and only if it has efficient verification. That is, there exists a polynomial p(n) and a polynomial-time algorithmV such that for all x:

x ∈ L ⟺ ∃c, |c| ≤ p(|x|) such that V(x, c) = 1

Foundational Complexity Classes

P (Deterministic Polynomial Time)

The class P represents problems that can be solved efficiently by deterministic algorithms. It serves as our formal model of tractable computation.

Definition: The Class P

The class P (polynomial time) is the set of all decision problemsL for which there exists a deterministic polynomial-time algorithm Asuch that:

  • If x ∈ L, then A(x) = 1
  • If x ∉ L, then A(x) = 0

Equivalently, P = ⋃k≥1 DTIME(nk).

The class P is robust under different reasonable computational models (multi-tape Turing machines, RAMs, etc.) and is widely accepted as capturing the notion of efficient computation.

Definition: Examples of Problems in P

The following problems are in P:

  • PRIMES = {n | n is a prime number}
    Algorithm: AKS primality test, running in O(log6 n) time
  • PATH = {(G,s,t) | graph G has a path from vertex s to vertex t}
    Algorithm: Breadth-first search, running in O(|V| + |E|) time
  • 2-SAT = {φ | φ is a satisfiable 2-CNF formula}
    Algorithm: Construct and analyze implication graph, running in O(n+m) time
  • LINEAR-PROGRAMMING = {(A,b,c,k) | ∃x: Ax ≤ b and c·x ≥ k}
    Algorithm: Ellipsoid method or interior-point methods, running in polynomial time

NP (Nondeterministic Polynomial Time)

The class NP captures problems whose solutions can be efficiently verified. It includes many important problems for which no efficient solution algorithm is known.

Definition: The Class NP (via Nondeterministic TMs)

The class NP (nondeterministic polynomial time) is the set of all decision problems L for which there exists a nondeterministic polynomial-time Turing machine M such that:

  • If x ∈ L, then M accepts x on some computation path
  • If x ∉ L, then M rejects x on all computation paths

Equivalently, NP = ⋃k≥1 NTIME(nk).

Definition: The Class NP (via Certificates)

A language L is in NP if and only if there exist:

  • A polynomial p(n)
  • A polynomial-time deterministic algorithm V (the verifier)

Such that for all strings x:

x ∈ L ⟺ ∃c, |c| ≤ p(|x|) such that V(x, c) = 1

This characterization defines NP in terms of efficiently verifiable certificates.

Definition: The Class NP (via Relations)

A binary relation R ⊆ Σ* × Σ* is polynomial-time decidable if there is a polynomial-time algorithm that decides, for any x, y, whether (x, y) ∈ R.

R is polynomially bounded if there exists a polynomialp such that for all (x, y) ∈ R, we have |y| ≤ p(|x|).

A language L is in NP if and only if there exists a polynomial-time decidable, polynomially bounded relation R such that:

L = {x | ∃y such that (x, y) ∈ R}

Definition: Examples of Problems in NP

The following problems are in NP:

  • SAT = {φ | φ is a satisfiable Boolean formula}
    Certificate: A satisfying assignment to the variables
  • HAMPATH = {(G,s,t) | graph G has a simple path from s to t visiting every vertex}
    Certificate: The sequence of vertices in the Hamiltonian path
  • CLIQUE = {(G,k) | graph G has a clique of size at least k}
    Certificate: The set of vertices forming a k-clique
  • SUBSET-SUM = {(S,t) | there exists a subset of S that sums to t}
    Certificate: The subset of elements that sum to t
  • 3-COLORING = {G | graph G can be colored with 3 colors with no adjacent vertices sharing a color}
    Certificate: A valid 3-coloring of the vertices

Relationship Between P and NP

Understanding the relationship between P and NP is fundamental to complexity theory. While we know that P is contained in NP, determining whether this containment is strict remains one of the greatest open questions in computer science.

Theorem: P ⊆ NP

Every language in P is also in NP. That is, P ⊆ NP.

Proof: Proof of P ⊆ NP

Let L ∈ P. Then there exists a polynomial-time algorithmA that decides L.

We need to show that L ∈ NP by providing a polynomial-time verifier V and a polynomial bound p(n) on certificate length.

Define verifier V(x, c) as follows:

  1. Ignore the certificate c
  2. Run algorithm A on input x
  3. Output whatever A(x) outputs

Since A runs in polynomial time, so does V. We can set p(n) = 0 since we don't actually need any certificate.

For all x:

  • If x ∈ L, then A(x) = 1, so V(x, c) = 1 for any certificate c
  • If x ∉ L, then A(x) = 0, so V(x, c) = 0 for any certificate c

Therefore, L ∈ NP, and we have shown that P ⊆ NP.

Definition: The P vs. NP Question

The P vs. NP question asks whether P = NP or P ≠ NP.

Formally, the question is whether every language L with a polynomial-time verification algorithm also has a polynomial-time solution algorithm.

This question remains open despite decades of research. Most complexity theorists believe that P ≠ NP, but no proof or disproof has been found.

The P vs. NP Question

Historical Development

The P vs. NP problem emerged in the early 1970s and quickly became recognized as a fundamental question about the nature of computation.

Definition: Historical Timeline

Key developments in the history of the P vs. NP question:

  • 1956: Gödel writes a letter to von Neumann discussing the complexity of finding proofs, anticipating aspects of the P vs. NP question
  • 1971: Stephen Cook's paper "The Complexity of Theorem-Proving Procedures" introduces the concept of NP-completeness and proves that SAT is NP-complete
  • 1971-1972: Leonid Levin independently discovers NP-completeness in the Soviet Union
  • 1972: Richard Karp's paper "Reducibility Among Combinatorial Problems" identifies 21 NP-complete problems, demonstrating the pervasiveness of NP-completeness
  • 1979: The P vs. NP problem begins to gain recognition outside theoretical computer science
  • 1990s: Development of various barriers to proving P ≠ NP (relativization, natural proofs, algebrization)
  • 2000: The Clay Mathematics Institute designates P vs. NP as one of the seven Millennium Prize Problems, offering a $1 million prize for its solution
  • 2000s-present: Continued research, including numerous claimed proofs (both for P=NP and P≠NP), all of which have been found to contain errors

Consequences of P=NP

If P=NP were proven, it would revolutionize our understanding of computation and have far-reaching practical implications.

Definition: Implications of P=NP

If P=NP, the following would be true:

  • Algorithmic Implications: Any problem with efficiently verifiable solutions would have an efficient solution algorithm
  • Self-Optimization: For many optimization problems, we could efficiently find optimal solutions rather than approximations
  • Cryptographic Impact: Most current cryptographic systems would be broken, as they rely on the presumed hardness of certain NP problems
  • Mathematical Consequences: Automated theorem proving would be dramatically more powerful, potentially revolutionizing mathematical practice
  • Learning and AI: Many learning problems would become tractable, potentially leading to dramatic advances in artificial intelligence
  • Complexity Collapse: Many complexity classes would collapse: NP = co-NP = P = BPP, and the polynomial hierarchy would collapse

However, even if P=NP, the polynomial-time algorithm might have a prohibitively high exponent or constant factors, limiting practical impact.

Consequences of P≠NP

If P≠NP were proven, it would establish fundamental limitations on computation and formalize the intuition that some problems require exponential time to solve.

Definition: Implications of P≠NP

If P≠NP, the following would be established:

  • Formal Hardness: There would exist problems whose solutions can be verified efficiently but cannot be found efficiently
  • Cryptographic Security: The theoretical foundation for modern cryptography would be strengthened, as it relies on the assumed hardness of certain problems
  • Inherent Creativity: There would be mathematical confirmation that finding solutions can be inherently more difficult than verifying them
  • Approximation Necessity: For many optimization problems, we would know that we must settle for approximate solutions rather than optimal ones
  • Algorithmic Barriers: Certain algorithmic tasks would be proven to require exponential time in the worst case
  • Proof Techniques: New mathematical techniques developed for proving P≠NP might lead to progress on other complexity separations

Current Approaches and Barriers

Despite decades of research, the P vs. NP question remains open. Several major barriers have been identified that explain why simple proof techniques are insufficient.

Definition: Barriers to Resolving P vs. NP

Three major barriers to resolving the P vs. NP question have been identified:

  • Relativization (Baker, Gill, Solovay, 1975): Techniques that relativize (remain valid when both classes have access to the same oracle) cannot resolve P vs. NP, because there exist oracles A and B such that PA = NPA and PB ≠ NPB
  • Natural Proofs (Razborov, Rudich, 1994): Any "natural" proof technique (with certain constructivity and largeness properties) for showing circuit lower bounds would break widely believed cryptographic assumptions
  • Algebrization (Aaronson, Wigderson, 2008): Even techniques that avoid relativization cannot resolve P vs. NP if they algebrize (remain valid under certain algebraic extensions)

These barriers suggest that resolving P vs. NP will require fundamentally new mathematical techniques.

Definition: Evidence for P≠NP

While no proof exists, the following constitute evidence suggesting P≠NP:

  • Algorithmic Experience: Decades of research have yielded no polynomial-time algorithms for NP-complete problems, despite strong incentives to find them
  • Lower Bounds: Proven exponential lower bounds for restricted models of computation (e.g., constant-depth circuits)
  • Oracle Results: For random oracles A, PA ≠ NPA with probability 1
  • Time-Space Tradeoffs: Proven time-space tradeoffs for SAT algorithms in restricted models
  • Logical Implications: P=NP would have consequences that seem implausible, such as the effective automatability of mathematical creativity

Reduction and Completeness

Polynomial-Time Reductions

Reductions allow us to compare the difficulty of problems and establish relationships between them. Polynomial-time reductions preserve efficiency, making them the appropriate tool for studying NP-completeness.

Definition: Polynomial-Time Reduction

A polynomial-time reduction (or Karp reduction) from a languageA to a language B, denoted A ≤p B, is a polynomial-time computable function f : Σ* → Σ* such that for all x:

x ∈ A ⟺ f(x) ∈ B

Intuitively, A ≤p B means that problem A is "no harder than" problem B, because any instance of A can be efficiently transformed into an equivalent instance of B.

Theorem: Properties of Polynomial-Time Reductions

Polynomial-time reductions have the following properties:

  1. Reflexivity: For any language A,A ≤p A
  2. Transitivity: If A ≤p Band B ≤p C, then A ≤p C
  3. Preservation of Polynomial-Time Solvability: If A ≤p Band B ∈ P, then A ∈ P
  4. Preservation of NP Membership: If A ≤p Band B ∈ NP, then A ∈ NP

Proof: Proof of Transitivity

Suppose A ≤p B via polynomial-time functionf and B ≤p Cvia polynomial-time function g.

Let's define a function h(x) = g(f(x)). We'll show that h is a polynomial-time reduction from Ato C.

First, h is computable in polynomial time becausef and g each run in polynomial time, and the composition of polynomials is still a polynomial.

Second, for all x:

x ∈ A ⟺ f(x) ∈ B ⟺ g(f(x)) ∈ C ⟺ h(x) ∈ C

Therefore, A ≤p C via the function h.

The Concept of Completeness

Completeness identifies the "hardest" problems within a complexity class. A complete problem captures the full difficulty of the class.

Definition: NP-Completeness

A language L is NP-complete if:

  1. L ∈ NP (L is in NP)
  2. For all A ∈ NP, A ≤p L (L is NP-hard)

A problem is NP-hard if it satisfies condition 2, even if it is not in NP.

Theorem: Properties of NP-Completeness

The following properties hold for NP-completeness:

  1. If L is NP-complete and L ∈ P, then P = NP
  2. If L is NP-complete and L ≤p Mfor some M ∈ NP, then M is also NP-complete
  3. All NP-complete problems are polynomial-time reducible to each other

Cook-Levin Theorem

The Cook-Levin theorem establishes that NP-complete problems exist by proving that Boolean satisfiability (SAT) is NP-complete. This seminal result laid the foundation for the theory of NP-completeness.

Definition: Boolean Satisfiability (SAT)

The Boolean satisfiability problem (SAT) is defined as:

SAT = {φ | φ is a satisfiable Boolean formula}

A Boolean formula φ is in conjunctive normal form (CNF) if:

  • It is a conjunction (AND) of clauses: C₁ ∧ C₂ ∧ ... ∧ Cm
  • Each clause Ci is a disjunction (OR) of literals: l₁ ∨ l₂ ∨ ... ∨ lk
  • Each literal lj is either a variable xj or its negation ¬xj

The formula is satisfiable if there exists an assignment of true/false values to the variables that makes the entire formula evaluate to true.

Theorem: Cook-Levin Theorem

The Boolean satisfiability problem (SAT) is NP-complete.

Proof: Proof of Cook-Levin Theorem

To prove that SAT is NP-complete, we need to show:

  1. SAT ∈ NP
  2. For all L ∈ NP, L ≤p SAT

SAT ∈ NP: Given a formula φ and an assignment to its variables, we can evaluate φ in polynomial time to verify if the assignment satisfies φ. Thus, SAT ∈ NP.

L ≤p SAT for all L ∈ NP: Let L ∈ NP. By definition, there exists a nondeterministic polynomial-time Turing machine M that decides L.

For an input x, we'll construct a Boolean formula φx that is satisfiable if and only if M accepts x. This formula will encode the computation tableau of M on input x:

  • Let p(|x|) be a polynomial bound on the running time of M. The tableau is a p(|x|) × p(|x|) grid representing the computation of M on x.
  • For each cell (i, j) in the tableau, create variables representing:
    • The state of M at time i
    • The symbol in tape position j at time i
    • The position of the tape head at time i
  • Create clauses that ensure:
    • The initial configuration correctly represents M starting on input x
    • Each configuration follows from the previous one according to M's transition function
    • The final configuration is an accepting state

The formula φx is satisfiable if and only if there exists a sequence of configurations leading from the initial configuration to an accepting configuration, which happens if and only if M accepts x.

The size of φx is polynomial in |x|, and the construction takes polynomial time. Thus, we have a polynomial-time reduction from L to SAT.

Since this works for any L ∈ NP, SAT is NP-complete.

Techniques for Proving NP-Completeness

After the Cook-Levin theorem established the first NP-complete problem, subsequent proofs typically use reduction from known NP-complete problems.

Definition: Standard Approach to NP-Completeness Proofs

To prove that a problem L is NP-complete:

  1. Show L ∈ NP: Describe a polynomial-time verification algorithm for L
  2. Show L is NP-hard: Choose a known NP-complete problem L' and construct a polynomial-time reduction from L' to L

By transitivity of reductions, if L' is NP-complete and L' ≤p L, then L is NP-hard. Combined with L ∈ NP, this proves that L is NP-complete.

Definition: Common Reduction Techniques

Several techniques are commonly used in NP-completeness reductions:

  • Variable Gadgets: Create components that represent variables and their truth values
  • Clause Gadgets: Create components that represent clauses and ensure they are satisfied
  • Consistency Gadgets: Ensure that variable representations remain consistent throughout the construction
  • Local Replacement: Replace each element of the source problem with a gadget in the target problem
  • Forced Structure: Force the solution to have a particular structure, within which the original problem is embedded
  • Restriction: Show that a special case of your problem is already NP-complete

Definition: Choosing the Right Source Problem

When proving a problem L is NP-complete, the choice of source problem L' for the reduction (L' ≤p L) can significantly affect the complexity of the proof. Consider:

  • Structural Similarity: Choose L' with a structure similar to L
  • Constraint Type: Match boolean constraints with boolean problems, numeric constraints with numeric problems, etc.
  • Common Source Problems:
    • 3-SAT: Good for problems involving choice or constraint satisfaction
    • CLIQUE or INDEPENDENT SET: Good for selection problems
    • 3-COLORING: Good for assignment or labeling problems
    • SUBSET-SUM or PARTITION: Good for numeric or partitioning problems
    • HAMILTONIAN-PATH: Good for sequencing or path problems

NP-Complete Problem Catalog

Boolean Satisfiability Problems

Boolean satisfiability problems form the foundation of NP-completeness theory, starting with Cook and Levin's seminal result.

Definition: 3-SAT

The 3-SAT problem is a restricted version of SAT where:

3-SAT = {φ | φ is a satisfiable Boolean formula in CNF with exactly 3 literals per clause}

Despite the restriction to exactly 3 literals per clause, 3-SAT remains NP-complete. This makes it a particularly useful problem for reductions.

Theorem: 3-SAT is NP-Complete

The 3-SAT problem is NP-complete.

Proof: Proof of 3-SAT NP-Completeness

3-SAT ∈ NP: Given a 3-CNF formula φ and an assignment, we can verify in polynomial time whether the assignment satisfies φ.

SAT ≤p 3-SAT: We transform an arbitrary SAT formula into a 3-SAT formula that is satisfiable if and only if the original formula is satisfiable.

For each clause C = (l₁ ∨ l₂ ∨ ... ∨ lk) in the original formula:

  • If k = 1, replace C = (l₁) with (l₁ ∨ y ∨ z) ∧ (l₁ ∨ y ∨ ¬z) ∧ (l₁ ∨ ¬y ∨ z) ∧ (l₁ ∨ ¬y ∨ ¬z), where y and z are new variables
  • If k = 2, replace C = (l₁ ∨ l₂) with (l₁ ∨ l₂ ∨ y) ∧ (l₁ ∨ l₂ ∨ ¬y), where y is a new variable
  • If k = 3, keep C as is
  • If k > 3, introduce new variables y₁, y₂, ..., yk-3 and replace C = (l₁ ∨ l₂ ∨ ... ∨ lk) with:
    (l₁ ∨ l₂ ∨ y₁) ∧ (¬y₁ ∨ l₃ ∨ y₂) ∧ (¬y₂ ∨ l₄ ∨ y₃) ∧ ... ∧ (¬yk-4 ∨ lk-2 ∨ yk-3) ∧ (¬yk-3 ∨ lk-1 ∨ lk)

This transformation produces a 3-CNF formula φ' that is satisfiable if and only if the original formula φ is satisfiable. The transformation runs in polynomial time, and the size of φ' is linear in the size of φ.

Since SAT is NP-complete and SAT ≤p 3-SAT, 3-SAT is NP-hard. Combined with 3-SAT ∈ NP, we conclude that 3-SAT is NP-complete.

Definition: Other SAT Variants

Several important variants of SAT have been studied:

  • k-SAT: Each clause has exactly k literals
    • NP-complete for k ≥ 3
    • In P for k = 2 (2-SAT is solvable in linear time)
    • Trivial for k = 1 (1-SAT is solvable in linear time)
  • MAX-SAT: Find an assignment that satisfies the maximum number of clauses
    • NP-hard optimization problem
    • Even MAX-2-SAT is NP-hard
  • NAE-SAT (Not-All-Equal SAT): Each clause must have at least one true and one false literal
    • NP-complete even for 3-NAE-SAT
  • XOR-SAT: Each clause is an XOR of literals (exactly one must be true)
    • In P (solvable using Gaussian elimination)
  • HORN-SAT: Each clause has at most one positive literal
    • In P (solvable in linear time)

Graph Problems

Many important NP-complete problems involve graphs, capturing a wide range of computational challenges in network design, scheduling, and optimization.

Definition: CLIQUE

A clique in an undirected graph G = (V, E) is a subset of vertices C ⊆ V such that every two vertices in C are connected by an edge in E. The CLIQUE problem is:

CLIQUE = {(G, k) | graph G has a clique of size at least k}

Theorem: CLIQUE is NP-Complete

The CLIQUE problem is NP-complete.

Proof: Proof Sketch of CLIQUE NP-Completeness

CLIQUE ∈ NP: Given a graph G and a subset C of vertices, we can verify in polynomial time that C is a clique of size at least k by checking that |C| ≥ k and that every pair of vertices in C is connected by an edge.

3-SAT ≤p CLIQUE: Given a 3-CNF formula φ with m clauses, we construct a graph G:

  • For each clause Cj and each literal li,j in Cj, create a vertex vi,j
  • Connect two vertices vi,j and vi',j' by an edge if:
    • j ≠ j' (they are from different clauses), and
    • li,j and li',j' are not negations of each other

Set k = m, the number of clauses in φ.

Claim: φ is satisfiable if and only if G has a clique of size at least m.

(⇒) If φ is satisfiable, let α be a satisfying assignment. For each clause Cj, choose one literal li,j that is true under α. The corresponding vertices vi,j form a clique of size m because:

  • We select exactly one vertex from each clause
  • No two selected literals can be negations of each other (both can't be true under α)

(⇐) If G has a clique C of size m, it must include exactly one vertex from each clause (since vertices from the same clause are not connected). Set variables to make the corresponding literals true. This is consistent because no two literals in the clique are negations of each other. This assignment satisfies φ because at least one literal in each clause is true.

The reduction is polynomial-time, and since 3-SAT is NP-complete, CLIQUE is NP-complete.

Definition: INDEPENDENT-SET and VERTEX-COVER

An independent set in a graph G = (V, E) is a subset S ⊆ V such that no two vertices in S are adjacent. The INDEPENDENT-SET problem is:

INDEPENDENT-SET = {(G, k) | graph G has an independent set of size at least k}

A vertex cover in a graph G = (V, E) is a subset C ⊆ V such that every edge in E has at least one endpoint in C. The VERTEX-COVER problem is:

VERTEX-COVER = {(G, k) | graph G has a vertex cover of size at most k}

These problems are closely related: S is an independent set in G if and only if V-S is a vertex cover in G.

Definition: HAMILTONIAN-PATH and HAMILTONIAN-CYCLE

A Hamiltonian path in a graph G = (V, E) is a simple path that visits every vertex exactly once. The HAMILTONIAN-PATH problem is:

HAMILTONIAN-PATH = {G | graph G has a Hamiltonian path}

A Hamiltonian cycle is a simple cycle that visits every vertex exactly once (except the start/end vertex). The HAMILTONIAN-CYCLE problem is:

HAMILTONIAN-CYCLE = {G | graph G has a Hamiltonian cycle}

Both problems are NP-complete, even for special classes of graphs like planar graphs.

Definition: GRAPH-COLORING

A k-coloring of a graph G = (V, E) is an assignment of k colors to vertices such that no adjacent vertices have the same color. The GRAPH-COLORING problem is:

GRAPH-COLORING = {(G, k) | graph G can be colored with at most k colors}

For any fixed k ≥ 3, the k-COLORING problem is NP-complete:

k-COLORING = {G | graph G can be colored with at most k colors}

2-COLORING is in P (a graph is 2-colorable if and only if it is bipartite), but 3-COLORING is already NP-complete.

Partition and Covering Problems

Many NP-complete problems involve partitioning or covering elements in optimal ways, with applications in resource allocation, scheduling, and logistics.

Definition: SUBSET-SUM

The SUBSET-SUM problem is:

SUBSET-SUM = {(S, t) | there exists a subset S' ⊆ S such that Σ(S') = t}

where S is a set of integers and Σ(S') denotes the sum of elements in S'.

SUBSET-SUM is a well-known NP-complete problem, but it is pseudopolynomial: it can be solved in time O(n·t) where n is the number of elements and t is the target sum. This is polynomial in the numeric value of t, but exponential in the input length log t.

Definition: PARTITION

The PARTITION problem is:

PARTITION = {S | there exists a partition of S into S₁ and S₂ such that Σ(S₁) = Σ(S₂)}

where S is a set of positive integers, and the partition (S₁, S₂) satisfies S₁ ∪ S₂ = S and S₁ ∩ S₂ = ∅.

PARTITION is a special case of SUBSET-SUM where the target t = Σ(S)/2. It is NP-complete but also has a pseudopolynomial-time algorithm.

Definition: BIN-PACKING

The BIN-PACKING decision problem is:

BIN-PACKING = {(S, c, k) | the items in S can be packed into k or fewer bins of capacity c}

where S is a set of items with sizes s₁, s₂, ..., sn, c is the bin capacity, and k is the target number of bins.

The optimization version asks for the minimum number of bins needed. BIN-PACKING is NP-complete, and the optimization version is NP-hard.

Definition: SET-COVER

The SET-COVER problem is:

SET-COVER = {(U, S, k) | there exists a subset S' ⊆ S such that |S'| ≤ k and ⋃(S') = U}

where U is a universe of elements, S is a collection of subsets of U, and k is the target size of the cover.

The optimization version asks for the smallest subset S' that covers all elements in U. SET-COVER is NP-complete, and the optimization version is NP-hard.

Scheduling and Optimization Problems

Many practical optimization problems from operations research and scheduling are NP-complete, highlighting the prevalence of computational hardness in real-world applications.

Definition: JOB-SCHEDULING

In the JOB-SCHEDULING problem, we have:

  • A set of n jobs, each with processing time ti
  • m identical machines
  • A deadline D

The decision problem asks if there exists a schedule assigning each job to one machine such that all jobs complete by time D.

JOB-SCHEDULING = {(J, m, D) | jobs J can be scheduled on m machines to complete by deadline D}

This problem is NP-complete for m ≥ 2, but polynomial-time solvable for m = 1.

Definition: TRAVELING-SALESMAN

The TRAVELING-SALESMAN decision problem (TSP) is:

TSP = {(G, c, k) | graph G has a Hamiltonian cycle of total cost at most k}

where G is a complete graph, c assigns a cost to each edge, and k is the budget.

The optimization version asks for the minimum-cost Hamiltonian cycle. TSP is NP-complete, and the optimization version is NP-hard. It remains NP-complete even when the costs satisfy the triangle inequality.

Definition: KNAPSACK

The KNAPSACK decision problem is:

KNAPSACK = {(S, v, w, W, V) | there exists S' ⊆ S such that Σ(w(S')) ≤ W and Σ(v(S')) ≥ V}

where S is a set of items, v(i) is the value of item i, w(i) is the weight of item i, W is the weight capacity, and V is the target value.

The optimization version asks for the maximum achievable value within the weight constraint. KNAPSACK is NP-complete, but like SUBSET-SUM, it has a pseudopolynomial-time algorithm running in O(n·W) time.

Beyond NP

co-NP and Complementary Problems

Beyond NP, complexity theory examines problems whose complements are in NP, forming the class co-NP.

Definition: The Class co-NP

The class co-NP is defined as:

co-NP = {L | L̄ ∈ NP}

where L̄ = Σ* - L is the complement of language L.

Equivalently, L ∈ co-NP if there exists a polynomial-time verifier V such that:
x ∈ L if and only if for all certificates c of polynomial length, V(x, c) = 0.

Definition: NP ∩ co-NP

The class NP ∩ co-NP contains languages that are in both NP and co-NP.

For a language L ∈ NP ∩ co-NP:

  • There exists a polynomial-time verifier Vyes that verifies "yes" instances
  • There exists a polynomial-time verifier Vno that verifies "no" instances

It is believed that NP ∩ co-NP contains problems that are unlikely to be NP-complete. Example problems in NP ∩ co-NP include integer factorization and parity games.

Definition: co-NP-Complete Problems

A language L is co-NP-complete if:

  1. L ∈ co-NP
  2. For all L' ∈ co-NP, L' ≤p L

Examples of co-NP-complete problems:

  • TAUTOLOGY = {φ | φ is a Boolean formula that evaluates to true for all assignments}
  • UNSAT = {φ | φ is an unsatisfiable Boolean formula}
  • VALID = {φ | φ is a valid first-order formula (true in all models)}

If any co-NP-complete problem is in NP, then NP = co-NP.

The Polynomial Hierarchy

The polynomial hierarchy extends beyond NP and co-NP to capture even more complex problems involving alternating quantifiers.

Definition: The Polynomial Hierarchy

The polynomial hierarchy is a sequence of complexity classes defined inductively:

  • Σ0P = Π0P = P
  • Σi+1P = NPΣiP (problems solvable by an NP machine with oracle access to ΣiP)
  • Πi+1P = co-NPΣiP (complements of problems in Σi+1P)

In particular:

  • Σ1P = NP
  • Π1P = co-NP
  • Σ2P = NPNP
  • Π2P = co-NPNP

The entire hierarchy is denoted PH = ⋃i≥0 ΣiP.

Definition: Complete Problems for Higher Levels

Examples of complete problems for higher levels of the polynomial hierarchy:

  • Σ2P-complete: QSAT2 = {φ | ∃x₁...∃xₙ∀y₁...∀yₘ φ(x₁,...,xₙ,y₁,...,yₘ) = 1}
    (Quantified Boolean formulas with one quantifier alternation, starting with ∃)
  • Π2P-complete: QSAT2' = {φ | ∀x₁...∀xₙ∃y₁...∃yₘ φ(x₁,...,xₙ,y₁,...,yₘ) = 1}
    (Quantified Boolean formulas with one quantifier alternation, starting with ∀)
  • PSPACE-complete: QSAT = {φ | φ is a true quantified Boolean formula with any number of alternations}

Theorem: Collapse of the Polynomial Hierarchy

The following statements are equivalent:

  1. P = NP
  2. The polynomial hierarchy collapses to the first level: PH = Σ1P = NP

More generally, if ΣiP = ΠiP for some i, then the hierarchy collapses to the i-th level: PH = ΣiP.

It is widely believed that the polynomial hierarchy does not collapse, which would imply P ≠ NP.

FP and NP Optimization Problems

Many real-world problems involve optimization rather than decision. Complexity theory extends to these optimization problems as well.

Definition: Function Problems and FP

A function problem is specified by a relation R(x, y) and asks: given input x, find a y such that R(x, y) holds (if such a y exists).

The class FP (Function Polynomial-time) consists of all function problems solvable in polynomial time:

FP = {f : Σ* → Σ* | f is computable in polynomial time}

For decision problems in P, the corresponding search problems are in FP.

Definition: NP Optimization Problems

An NP optimization problem (NPO) consists of:

  • A set of problem instances I, recognizable in polynomial time
  • For each instance x ∈ I, a set of feasible solutions S(x)
  • A polynomial-time computable cost function f(x, y) for x ∈ I and y ∈ S(x)
  • A goal to either maximize or minimize f(x, y)

An NPO problem is in PO (Polynomial-time Optimizable) if it can be solved optimally in polynomial time.

The relationship P = NP would imply PO = NPO, meaning all optimization problems with efficiently verifiable solutions could be solved optimally in polynomial time.

Definition: Approximation Classes

For NP-hard optimization problems, we often study approximation algorithms with guaranteed performance ratios:

  • APX: Problems that can be approximated within some constant factor c
  • PTAS (Polynomial-Time Approximation Scheme): Problems that can be approximated within a factor (1+ε) for any ε > 0, with running time polynomial in the input size (but possibly exponential in 1/ε)
  • FPTAS (Fully Polynomial-Time Approximation Scheme): Like PTAS, but with running time polynomial in both the input size and 1/ε

There exist approximation-preserving reductions and APX-complete problems, analogous to NP-completeness for decision problems.

The study of approximation algorithms and the limits of approximability represents a major area of contemporary complexity theory, offering practical approaches to dealing with NP-hard optimization problems in the real world.

Summary

  • Decision Problems Framework: Complexity theory primarily studies decision problems (yes/no questions), which provide a clean mathematical framework while preserving the essential difficulty of computational challenges. Optimization problems can be converted to decision versions through threshold parameters.
  • P and NP: The class P captures problems solvable in polynomial time, while NP captures problems whose solutions can be verified in polynomial time. Whether P equals NP remains the central open question in complexity theory, with profound implications for computation, mathematics, and cryptography.
  • NP-Completeness: The Cook-Levin theorem established that Boolean satisfiability (SAT) is NP-complete, meaning it is both in NP and can represent any other problem in NP through polynomial-time reductions. This led to the identification of thousands of NP-complete problems across diverse domains.
  • Reduction Techniques: Polynomial-time reductions allow us to compare problem difficulty and establish hardness results. Key techniques include variable gadgets, clause gadgets, and forced structure constructions.
  • Beyond NP: The theory extends to co-NP (complements of NP problems), the polynomial hierarchy (capturing quantifier alternation), and optimization variants of NP problems. These extensions provide a richer understanding of computational complexity beyond the P vs. NP question.

Suggested Reading

  • Garey, M. R., Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness – The classic reference on NP-completeness with a comprehensive catalog of NP-complete problems
  • Arora, S., Barak, B. Computational Complexity: A Modern Approach – Graduate-level treatment of complexity theory including P, NP, and beyond
  • Sipser, M. Introduction to the Theory of Computation – Accessible introduction to complexity theory in the context of computability
  • Cook, S. A. The Complexity of Theorem-Proving Procedures – The original paper introducing NP-completeness and the Cook-Levin theorem
  • Karp, R. M. Reducibility Among Combinatorial Problems – Landmark paper identifying 21 NP-complete problems
  • Papadimitriou, C. H. Computational Complexity – Comprehensive treatment of complexity classes including the polynomial hierarchy
  • Vazirani, V. V. Approximation Algorithms – In-depth coverage of approximation techniques for NP-hard optimization problems
  • Fortnow, L. The Golden Ticket: P, NP, and the Search for the Impossible – Accessible general-audience book on the P vs. NP problem and its implications