Pushdown automata (PDAs) are a class of computational models that extend finite automata by adding a stack memory. This addition allows PDAs to recognize context-free languages, a strictly larger class of languages than regular languages.
PDAs bridge the gap between finite automata and Turing machines in the Chomsky hierarchy. They are powerful enough to handle nested structures like balanced parentheses and recursive patterns, but still have limitations compared to more general models of computation.
In this section, we'll explore the formal definition of PDAs, their relationship to context-free grammars, conversion techniques, and applications in parsing and compiler design. We'll also provide interactive tools to experiment with PDAs and gain intuition about their behavior.
Introduction to Pushdown Automata
Finite automata, while useful for recognizing regular languages, have a fundamental limitation: they lack memory. This limitation means they cannot count or match patterns that require remembering an arbitrary amount of information. For example, a finite automaton cannot recognize the language of balanced parentheses, as it would need to remember how many opening parentheses it has seen to match with closing ones.
Pushdown automata (PDAs) extend finite automata by adding a stack memory. A stack is a data structure that follows the Last-In-First-Out (LIFO) principle: the last item pushed onto the stack is the first one to be popped off. This addition of memory allows PDAs to recognize a broader class of languages: context-free languages.
A common real-world analogy for a stack is a stack of plates. You can only add or remove plates from the top of the stack. Similarly, the stack in a PDA can only be manipulated from the top: you can push new symbols onto it, pop symbols off it, or read the topmost symbol without removing it.
A classic example of a language that can be recognized by a PDA but not by a finite automaton is the language of balanced parentheses, containing strings such as (())(). To check whether parentheses are balanced, you push an opening parenthesis onto the stack when you see it, and pop one off the stack when you see a closing parenthesis. If no closing parenthesis ever arrives on an empty stack and the stack is empty at the end, the parentheses are balanced.
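The push/pop discipline just described can be sketched in a few lines of Python (a direct simulation of the idea, not a formal PDA construction):

```python
def balanced(s: str) -> bool:
    """Check whether a string of '(' and ')' is balanced, PDA-style."""
    stack = []
    for ch in s:
        if ch == '(':
            stack.append(ch)          # push on an opening parenthesis
        elif ch == ')':
            if not stack:             # a closing parenthesis with nothing to match
                return False
            stack.pop()               # pop the matching opening parenthesis
    return not stack                  # accept iff the stack is empty at the end

print(balanced("(())()"))   # True
print(balanced("(()"))      # False
```

Because the stack only ever holds one kind of symbol here, it behaves like a counter; the general PDA stack can hold arbitrary symbols.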
Formal Definition
Definition: Pushdown Automaton
A pushdown automaton (PDA) is a 7-tuple (Q, Σ, Γ, δ, q₀, Z₀, F) where:
Q is a finite set of states
Σ is a finite input alphabet
Γ is a finite stack alphabet
δ: Q × (Σ ∪ {ε}) × Γ → P(Q × Γ*) is the transition function, where P is the power set and Γ* is the set of all strings of stack symbols
q₀ ∈ Q is the start state
Z₀ ∈ Γ is the initial stack symbol
F ⊆ Q is the set of accepting states
The transition function δ takes the current state, an input symbol (or ε), and the current top of the stack, and maps to a set of possible next states and strings to push onto the stack (after popping the top symbol).
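To make the 7-tuple concrete, here is a small self-contained simulator (an illustration written for this section, not a standard library). It encodes δ as a dictionary from (state, input symbol or ε, stack top) to a set of (next state, string to push) pairs, and explores all reachable configurations breadth-first; the state and symbol names (q0, q1, q2, Z, A) are chosen for illustration:

```python
from collections import deque

def npda_accepts(delta, start, Z0, finals, w):
    # A configuration is (state, position in w, stack string with the top first).
    # To guarantee termination we cap the stack at len(w) + 2 symbols, which is
    # safe for the example below; a general simulator needs more care here.
    limit = len(w) + 2
    seen = set()
    frontier = deque([(start, 0, Z0)])
    while frontier:
        state, pos, stack = frontier.popleft()
        if (state, pos, stack) in seen:
            continue
        seen.add((state, pos, stack))
        if pos == len(w) and state in finals:
            return True                      # accept by final state
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        moves = []
        if pos < len(w):
            moves += [(pos + 1, t) for t in delta.get((state, w[pos], top), ())]
        moves += [(pos, t) for t in delta.get((state, '', top), ())]   # ε-moves
        for new_pos, (new_state, push) in moves:
            new_stack = push + rest          # push replaces the popped top symbol
            if len(new_stack) <= limit:
                frontier.append((new_state, new_pos, new_stack))
    return False

# PDA for {aⁿbⁿ | n ≥ 0}: push A for each a, pop A for each b, accept on bare Z.
delta = {
    ('q0', 'a', 'Z'): {('q0', 'AZ')},
    ('q0', 'a', 'A'): {('q0', 'AA')},
    ('q0', 'b', 'A'): {('q1', '')},
    ('q1', 'b', 'A'): {('q1', '')},
    ('q0', '', 'Z'): {('q2', 'Z')},
    ('q1', '', 'Z'): {('q2', 'Z')},
}
print(npda_accepts(delta, 'q0', 'Z', {'q2'}, 'aabb'))  # True
print(npda_accepts(delta, 'q0', 'Z', {'q2'}, 'aab'))   # False
```

Acceptance here is by final state; acceptance by empty stack (discussed below) would only change the test at the top of the loop.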
Stack Operations
The stack in a PDA supports three basic operations:
Push: Add a symbol to the top of the stack
Pop: Remove the topmost symbol from the stack
Read: Examine the topmost symbol without removing it
These operations are integral to how the PDA processes input and makes transitions. In formal notation, transitions are often written as (q, a, X) → (p, α), meaning "from state q, reading input a, and with X on top of the stack, move to state p and replace X with the string α on the stack." If α is empty (ε), this effectively pops X without pushing anything new.
Types of PDAs
Deterministic PDAs (DPDAs)
A Deterministic Pushdown Automaton (DPDA) is one where, for any configuration, there is at most one possible transition. This means that for any state, input symbol, and top stack symbol, there is at most one possible next state and stack operation. Additionally, if there's an ε-transition available, there can't be any other transitions for that configuration.
Non-deterministic PDAs (NPDAs)
A Non-deterministic Pushdown Automaton (NPDA) allows multiple possible transitions from a single configuration. From a given state, input symbol, and top stack symbol, the automaton might have multiple options for its next move, including multiple ε-transitions.
Differences in Expressive Power
Unlike finite automata, where deterministic and non-deterministic versions have equivalent power, DPDAs are strictly less powerful than NPDAs. There are context-free languages that can be recognized by NPDAs but not by any DPDA. A classic example is the language of palindromes, strings that read the same forwards and backwards (like racecar).
This means that deterministic and non-deterministic PDAs recognize different classes of languages: DPDAs recognize the deterministic context-free languages (DCFLs), while NPDAs recognize all context-free languages (CFLs).
Theorem: DPDA ⊊ NPDA: Strict Hierarchy
There exist context-free languages that can be recognized by non-deterministic PDAs but not by any deterministic PDA. Specifically, the class of languages recognized by DPDAs is strictly contained in the class of languages recognized by NPDAs.
A canonical example is L = {ww^R | w ∈ {a,b}*}, the language of even-length palindromes. This language is context-free and can be recognized by an NPDA, but cannot be recognized by any DPDA.
Proof: Proof that Palindromes Require Non-determinism
We prove that L = {ww^R | w ∈ {a,b}*} cannot be recognized by any DPDA by contradiction.
Suppose there exists a DPDA M that recognizes L, and consider inputs that begin with the prefix aⁿbⁿ for large n.
After reading this prefix, M must be in some configuration (q, γ), where q is the current state and γ is the stack contents.
Since M is deterministic, its behavior on every continuation of aⁿbⁿ is completely determined by (q, γ). Among those continuations, M must accept bⁿaⁿ (because aⁿbⁿbⁿaⁿ ∈ L) and reject almost all others, and it must do so for every n with only finitely many states; a careful counting argument over such configurations yields the contradiction.
The key insight is that a DPDA cannot "know" where the middle of a palindrome occurs without being able to see the entire remaining input, which violates the finite-state constraint.
Therefore, no DPDA can recognize L, but an NPDA can recognize it by non-deterministically guessing the midpoint.
Example: NPDA for Palindromes
Here's how an NPDA recognizes L = {ww^R | w ∈ {a,b}*}:
Key insight: The ε-transition from q₀ to q₁ represents the non-deterministic guess of when we've reached the middle. A DPDA cannot make this guess without additional information.
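The non-deterministic guess can be mimicked in code by trying every candidate midpoint — each value of k below corresponds to one branch of the NPDA's computation (an illustrative sketch, not the automaton itself):

```python
def accepts_even_palindrome(s: str) -> bool:
    """Accept s iff s = w·wᴿ for some w, i.e. s is an even-length palindrome.
    Branching over every split point k plays the role of the ε-transition that
    guesses the middle; in branch k the stack holds s[:k], and the branch
    accepts iff popping it matches the rest of the input exactly."""
    return any(s[:k][::-1] == s[k:] for k in range(len(s) + 1))

print(accepts_even_palindrome("abba"))   # True: w = "ab"
print(accepts_even_palindrome("abab"))   # False
```

A deterministic machine gets no such branching: it must commit to a single k while reading, which is exactly what the theorem above rules out.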
Definition: Visibly Pushdown Automaton (VPA)
A Visibly Pushdown Automaton (VPA) is a pushdown automaton where the input alphabet is partitioned into three disjoint sets that determine stack operations:
Σ = Σ_c ∪ Σ_r ∪ Σ_l where Σ_c (call), Σ_r (return), and Σ_l (local) are disjoint
On a ∈ Σ_c: must push exactly one symbol onto the stack
On a ∈ Σ_r: must pop exactly one symbol from the stack (if non-empty)
On a ∈ Σ_l: stack remains unchanged
Formally, M = (Q, Σ_c ∪ Σ_r ∪ Σ_l, Γ, δ, q₀, Z₀, F) where the transition function δ respects the input-driven constraint.
Theorem: VPA Closure Under Boolean Operations
The class of visibly pushdown languages (languages recognized by VPAs) is closed under union, intersection, and complement. This is a remarkable property not shared by general context-free languages.
Furthermore, all of these closure operations can be performed with polynomial increases in the number of states, making VPAs algorithmically tractable for verification applications.
Proof: Proof of VPA Closure Under Intersection
Given VPAs M₁ = (Q₁, Σ, Γ₁, δ₁, q₁⁰, Z₁⁰, F₁) and M₂ = (Q₂, Σ, Γ₂, δ₂, q₂⁰, Z₂⁰, F₂), we construct M = (Q, Σ, Γ, δ, q⁰, Z⁰, F) recognizing L(M₁) ∩ L(M₂).
Construction:
Q = Q₁ × Q₂
Γ = (Γ₁ × Γ₂) ∪ {Z⁰}
q⁰ = (q₁⁰, q₂⁰)
F = F₁ × F₂
Transition function: For a ∈ Σ_c (call symbols), a push transition depends only on the current state, not on the stack top:
If δ₁(q₁, a) = (p₁, Y₁) and δ₂(q₂, a) = (p₂, Y₂), then δ((q₁, q₂), a) = ((p₁, p₂), (Y₁, Y₂)); that is, the product automaton pushes the paired symbol (Y₁, Y₂).
Similar constructions work for return and local symbols. The key insight is that since stack operations are input-driven, both automata perform the same stack operations synchronously, allowing us to track both computations in parallel.
Correctness: A string w is accepted by M if and only if the parallel simulation results in both M₁ and M₂ accepting, which occurs if and only if w ∈ L(M₁) ∩ L(M₂).
Example: XML Tag Matching as VPA
Consider validating XML with tags <tag> and </tag>. This is naturally modeled as a VPA:
Alphabet partition:
Σ_c = {<tag>} (call alphabet - opening tags)
Σ_r = {</tag>} (return alphabet - closing tags)
Σ_l = {text} (local alphabet - content between tags)
VPA behavior:
• On <tag>: must push tag info onto stack
• On </tag>: must pop and verify matching tag
• On text: stack unchanged, just advance
Example valid string:
<tag>text</tag>
VPA: push on <tag>, unchanged on text, pop/match on </tag>
Combining constraints:
If we have VPA₁ for "balanced tags" and VPA₂ for "no nested tags of same type", we can build VPA₃ = VPA₁ ∩ VPA₂ to enforce both constraints simultaneously.
Key advantage: Unlike general CFLs, we can intersect VPA languages and still get a VPA, making complex validation tasks tractable.
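The input-driven discipline can be sketched directly: the token's class alone (call, return, or local) decides the stack operation. The token names and the match table below are illustrative assumptions for a single tag type:

```python
def vpa_accepts(tokens, calls, returns, locals_, match):
    """Simulate an input-driven (visibly pushdown) acceptor.
    match maps each return token to the call token it must close."""
    stack = []
    for t in tokens:
        if t in calls:
            stack.append(t)                      # Σ_c: must push exactly one symbol
        elif t in returns:
            if not stack or stack.pop() != match[t]:
                return False                     # Σ_r: must pop; tag must match
        elif t in locals_:
            pass                                 # Σ_l: stack unchanged
        else:
            return False                         # token outside the partition
    return not stack                             # accept iff every call was returned

# XML-like token streams with one tag type, as in the example above.
calls, returns, locals_ = {"<tag>"}, {"</tag>"}, {"text"}
match = {"</tag>": "<tag>"}
print(vpa_accepts(["<tag>", "text", "</tag>"], calls, returns, locals_, match))  # True
print(vpa_accepts(["<tag>", "<tag>", "</tag>"], calls, returns, locals_, match)) # False
```

Because the stack operation is forced by the token class, two such acceptors can be run in lockstep on the same input — this is exactly what makes the product construction in the intersection proof work.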
Computational Model
Instantaneous Descriptions
To track the execution of a PDA, we use the concept of an instantaneous description (ID), also called a configuration. A configuration captures the complete state of the PDA at a given moment during its execution. It consists of three components:
The current state of the PDA
The remaining unread input
The current contents of the stack (with the topmost symbol on the right)
Formally, a configuration is written as (q, w, γ) where q is the current state, w is the remaining input, and γ is the stack contents.
Acceptance Criteria
There are two common definitions of acceptance for PDAs:
Acceptance by final state: A PDA accepts an input string if, after reading the entire input, it ends in an accepting state. The contents of the stack are irrelevant for this acceptance criterion.
Acceptance by empty stack: A PDA accepts an input string if, after reading the entire input, the stack is empty. The final state is irrelevant for this acceptance criterion.
Theorem: Equivalence of Acceptance Criteria
For every PDA that accepts by final state, there exists a PDA that accepts by empty stack and recognizes the same language, and vice versa. That is, the class of languages accepted by PDAs with final state acceptance is identical to the class of languages accepted by PDAs with empty stack acceptance.
Theorem: DPDA Minimization Complexity
The problem of minimizing a deterministic pushdown automaton is EXPTIME-complete. Given a DPDA M, determining the minimal DPDA recognizing L(M) requires exponential time in the worst case.
This contrasts sharply with DFA minimization, which can be performed in polynomial time using algorithms like Hopcroft's algorithm. The exponential complexity arises from the unbounded stack, which creates exponentially many distinct configurations.
Proof: EXPTIME-hardness of DPDA Minimization
We prove EXPTIME-hardness by reduction from the acceptance problem for exponential-space Turing machines.
Given an exponential-space Turing machine T and input x, we construct a DPDA M such that the minimal DPDA equivalent to M has exponentially fewer states if and only if T accepts x.
Construction idea: The DPDA M simulates T on x using its stack to store the tape contents. If T accepts, then M has many "useless" states that can be merged during minimization. If T rejects, these states remain necessary.
The key insight is that the stack allows the DPDA to have exponentially many distinct configurations, each potentially corresponding to a different equivalence class in the minimal automaton.
Since determining whether T accepts x is EXPTIME-complete, and our reduction runs in polynomial time, DPDA minimization is EXPTIME-hard.
The EXPTIME upper bound follows from the fact that two DPDA states are equivalent if and only if they have the same future behavior on all possible stack contents, which can be checked in exponential time.
Theorem: NPDA to DPDA Conversion Bounds
For any n-state NPDA recognizing a deterministic context-free language, the equivalent minimal DPDA may require up to 2^(2^(O(n))) states.
This double-exponential bound is tight: there exists a family of n-state NPDAs recognizing deterministic context-free languages such that the minimal equivalent DPDA requires Ω(2^(2^n)) states.
Proof: Double-Exponential Lower Bound Construction
We construct a family of NPDAs that demonstrate the double-exponential lower bound.
For each n, consider the language L_n = {w#w^R | w ∈ {0,1}*, |w| ≤ 2^n} where # is a separator symbol.
NPDA construction: An O(n)-state NPDA can recognize L_n by:
Using n states to count the length of w (in binary)
Non-deterministically guessing when w ends and # begins
Pushing w onto the stack
After seeing #, popping and matching w^R
DPDA lower bound: Any DPDA recognizing L_n must be able to distinguish between all possible prefixes of strings in L_n up to the # symbol.
There are roughly 2^(2^n) strings w of length at most 2^n, and the DPDA must in effect remember which one it has read in order to verify the subsequent w^R. Since the DPDA cannot guess in advance where # will occur, it must track all possibilities simultaneously.
Using a fooling set argument, one can show that any DPDA recognizing L_n requires at least 2^(2^n) states.
Example: State Blowup in Practice
Consider this simple NPDA with 4 states that recognizes palindromes of length ≤ 4:
NPDA (4 states):
q₀: Push symbols onto stack
q₁: Guess middle, switch to popping
q₂: Pop and match symbols
q₃: Accept state
Equivalent DPDA needs to track:
All possible prefixes: ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa, aaab, ...
For each prefix, what suffixes would make it a palindrome
This requires states for each possible "commitment" about the first half
State explosion:
NPDA advantages:
Guesses the middle point
Only needs to remember current phase
4 states total
DPDA challenges:
Must track all possible midpoints
Needs separate state for each commitment
Exponentially many states
Key insight: The NPDA's ability to "guess" eliminates the need to track multiple possibilities simultaneously, leading to exponential savings in state complexity.
Theorem: Decidability Landscape for PDAs
The decidability of fundamental problems for PDAs exhibits a complex landscape:
Decidable for NPDAs: Emptiness (PTIME), Finiteness (PTIME), Membership (PTIME)
Undecidable for NPDAs: Equivalence, Inclusion, Universality, Disjointness, Regularity
Undecidable for both: Ambiguity, Inherent ambiguity, Minimality
The key dividing line is determinism: many problems that are decidable for DPDAs become undecidable for NPDAs due to the increased expressiveness of non-deterministic computation.
Proof: DPDA Equivalence is Decidable
We prove that equivalence of deterministic PDAs is decidable by constructing an algorithm that determines whether L(M₁) = L(M₂) for DPDAs M₁ and M₂.
Algorithm outline:
Convert both DPDAs to equivalent CFGs G₁ and G₂ using the standard construction
Use the decidability of DPDA-CFG equivalence: construct a DPDA M' from G₂
Check if L(M₁) = L(M') using the DPDA equivalence algorithm
Core insight: For DPDAs, we can effectively compute a canonical form that allows comparison. The key lemma is:
Lemma: Two DPDA configurations (q₁, γ₁) and (q₂, γ₂) are equivalent if and only if for all strings w, the computations starting from these configurations have the same acceptance behavior.
This equivalence can be checked by constructing a finite automaton that tracks the "reachable configuration differences" between the two DPDAs. Since DPDAs are deterministic, this finite automaton has a bounded number of states.
Complexity: The algorithm runs in exponential time due to the conversion between DPDAs and CFGs, and the need to explore exponentially many configuration pairs.
Correctness: The algorithm terminates because the set of reachable configuration pairs is finite (up to stack equivalence), and it correctly identifies all cases where the DPDAs differ in their acceptance behavior.
Theorem: NPDA Equivalence is Undecidable
The equivalence problem for non-deterministic PDAs is undecidable. That is, there is no algorithm that can determine, for arbitrary NPDAs M₁ and M₂, whether L(M₁) = L(M₂).
This result follows from the undecidability of the Post Correspondence Problem (PCP) and demonstrates a fundamental computational limitation for non-deterministic context-free languages.
Proof: Reduction from Post Correspondence Problem
We prove undecidability by reducing the Post Correspondence Problem (PCP) to NPDA equivalence.
PCP Instance: Given pairs (u₁, v₁), (u₂, v₂), ..., (uₙ, vₙ) where uᵢ, vᵢ ∈ Σ*, does there exist a sequence i₁, i₂, ..., iₖ such that uᵢ₁uᵢ₂...uᵢₖ = vᵢ₁vᵢ₂...vᵢₖ?
Construction: For each PCP instance, we construct NPDAs M₁ and M₂ such that L(M₁) = L(M₂) if and only if the PCP instance has no solution.
NPDA M₁: Accepts every string over the combined alphabet Δ = Σ ∪ {1, ..., n} ∪ {#}; that is, L(M₁) = Δ*.
NPDA M₂: Accepts every string that fails to encode a PCP solution. A candidate solution is a string of the form iₖ...i₂i₁#w; M₂ non-deterministically guesses why its input is not such an encoding — the input is malformed, or w ≠ uᵢ₁uᵢ₂...uᵢₖ, or w ≠ vᵢ₁vᵢ₂...vᵢₖ — and verifies the chosen failure using its stack, so L(M₂) is context-free.
Key insight: If the PCP instance has a solution, its encoding is accepted by M₁ but not by M₂, so L(M₁) ≠ L(M₂). If it has no solution, every string fails to encode a solution, and L(M₂) = Δ* = L(M₁).
Correctness: The PCP instance has a solution if and only if L(M₁) ≠ L(M₂). Since PCP is undecidable, NPDA equivalence must also be undecidable.
Example: Decidability in Practice
Consider these two simple DPDAs that we want to check for equivalence:
DPDA M₁: Recognizes {a^n b^n | n ≥ 0}
q₀ →ᵃ q₀ (push A for each a)
q₀ →ᵇ q₁ (switch to popping on first b)
q₁ →ᵇ q₁ (pop A for each b)
q₁ →ᵋ q₂ (accept when stack empty)
DPDA M₂: Different construction for same language
p₀ →ᵃ p₁ (first a, push A)
p₁ →ᵃ p₁ (subsequent a's, push more A's)
p₁ →ᵇ p₂ (first b, pop A)
p₂ →ᵇ p₂ (subsequent b's, pop A's)
p₂ →ᵋ p₃ (accept when balanced)
Equivalence checking process:
Algorithm steps:
Convert both to CFGs
Compare derivation capabilities
Check all configuration pairs
Verify identical acceptance
Key insight:
Both DPDAs are deterministic
Can systematically compare behavior
Finite algorithm terminates
Result: L(M₁) = L(M₂)
Contrast with NPDAs:
If these were NPDAs with multiple computation paths, the equivalence problem would become undecidable. The deterministic nature is crucial for the decidability result.
Key insight: Determinism enables effective comparison algorithms, while non-determinism makes equivalence fundamentally uncomputable.
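As a sanity check (a bounded testing heuristic, not the decision procedure discussed above), the two example DPDAs can be implemented as deterministic membership functions and compared on every string up to a fixed length:

```python
from itertools import product

def dpda1(s):
    """M₁-style check for {aⁿbⁿ}: push for each a, pop for each b."""
    stack, seen_b = 0, False
    for ch in s:
        if ch == 'a':
            if seen_b:
                return False      # an a after a b: the DPDA is stuck
            stack += 1
        elif ch == 'b':
            seen_b = True
            if stack == 0:
                return False      # pop on an empty stack: stuck
            stack -= 1
        else:
            return False
    return stack == 0

def dpda2(s):
    """A second deterministic implementation of the same language (mirrors M₂)."""
    i = 0
    while i < len(s) and s[i] == 'a':
        i += 1
    j = i
    while j < len(s) and s[j] == 'b':
        j += 1
    return j == len(s) and j - i == i

# Compare the two machines on every string over {a, b} up to length 8.
agree = all(
    dpda1(''.join(w)) == dpda2(''.join(w))
    for n in range(9)
    for w in product('ab', repeat=n)
)
print(agree)  # True
```

Agreement up to a bound is evidence, not proof; the point of the theorem above is that for DPDAs the comparison can actually be made exhaustive and terminating.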
Theorem: DPDA Closure Properties
The class of languages recognized by deterministic PDAs (deterministic context-free languages) has the following closure properties:
Closed under: Complement, intersection with regular languages, inverse homomorphism
Not closed under: Union, intersection, concatenation, Kleene star, homomorphism
The closure under complement is a remarkable property that distinguishes DPDAs from NPDAs and is crucial for many verification applications.
Proof: DPDA Closure Under Complement
Given a DPDA M = (Q, Σ, Γ, δ, q₀, Z₀, F), we construct a DPDA M' = (Q', Σ, Γ', δ', q₀', Z₀', F') such that L(M') = Σ* - L(M).
Key challenge: Unlike DFAs, we cannot simply flip the accepting states because a DPDA might reject a string by getting stuck (no valid transition) rather than reaching a non-accepting state.
Construction:
Completion: Add a sink state q_⊥ and transitions to make M total (never gets stuck)
State doubling: Create Q' = Q × {0, 1} ∪ {q_⊥}, where the second component records whether M has passed through an accepting state since consuming the last input symbol
ε-loop handling: Ensure M' cannot run forever on ε-moves, so that every input symbol is eventually consumed
Acceptance flip: After the input is exhausted, M' accepts if and only if the second component shows that M never entered an accepting state
Detailed construction (sketch):
δ'((q, b), a, X) = {((p, 0), α)} whenever δ(q, a, X) = (p, α) for an input symbol a — consuming an input symbol resets the tracking bit — while ε-moves set the bit to 1 upon entering any accepting state
Correctness: The construction ensures that M' accepts exactly when M rejects, handling both explicit rejection (reaching non-accepting state) and implicit rejection (getting stuck).
Theorem: DPDA Non-Closure Under Union
The class of deterministic context-free languages is not closed under union. There exist deterministic context-free languages L₁ and L₂ such that L₁ ∪ L₂ is not deterministic context-free.
A canonical counterexample is L₁ = {a^n b^n c^m | n, m ≥ 0} and L₂ = {a^n b^m c^m | n, m ≥ 0}. Both are deterministic context-free, but L₁ ∪ L₂ requires non-deterministic choice.
Proof: Proof of Non-Closure Under Union
We prove that L₁ = {a^n b^n c^m | n, m ≥ 0} and L₂ = {a^n b^m c^m | n, m ≥ 0} are both deterministic context-free, but L₁ ∪ L₂ is not.
Step 1: L₁ is deterministic context-free.
DPDA for L₁: Push each a, pop for each b until balanced, then freely read any number of c's.
Step 2: L₂ is deterministic context-free.
DPDA for L₂: Freely read any number of a's, then push each b, pop for each c until balanced.
Step 3: L₁ ∪ L₂ is not deterministic context-free.
Suppose, for contradiction, that there exists a DPDA M recognizing L₁ ∪ L₂. Consider the string aᵏbᵏcᵏ for large k.
This string is in both L₁ and L₂. However, after reading the prefix aᵏ, the DPDA M must decide deterministically how to process the remaining bᵏcᵏ.
Dilemma: To accept strings in L₁, M should start matching a's with b's immediately. To accept strings in L₂, M should ignore the a's and start matching b's with c's.
Since M is deterministic, it cannot make both choices simultaneously. A formal proof shows that any DPDA attempting to recognize L₁ ∪ L₂ will fail on some inputs.
Example: Closure Properties in Action
Let's see how closure properties work with concrete DPDA languages:
Base language:L = {a^n b^n | n ≥ 0} (DPDA-recognizable)
Simple balanced parentheses-like language, easily recognized by a DPDA
Complement operation (✓ Closed):
L^c = Σ* - L = all strings that aren't perfectly balanced
Result: Still DPDA-recognizable using complement construction
Examples in L^c: a, b, ba, aab, abbb, bab
Union attempt (✗ Not closed):
L₁ = {a^n b^n c^m | n,m ≥ 0} (DPDA)
L₂ = {a^n b^m c^m | n,m ≥ 0} (DPDA)
L₁ ∪ L₂ = Not DPDA-recognizable!
Why union fails:
String aabbcc:
In L₁: a²b²c² ✓
In L₂: a²b²c² ✓
Both patterns match!
DPDA dilemma:
Must choose strategy early
Can't pursue both simultaneously
Determinism breaks down
Intersection with regular (✓ Closed):
L ∩ R where R = {strings with even length}
Result: {a^n b^n | n ≥ 0, 2n is even} = {a^n b^n | n ≥ 0}
Still DPDA-recognizable (actually same language in this case)
Key insight: DPDA closure properties are much more restrictive than general CFL closure properties, reflecting the limitations of deterministic computation.
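The languages used above can be captured as simple membership predicates (predicates over strings, not automata — this illustrates the languages and the dilemma string, not the DPDA constructions themselves):

```python
import re

def in_L1(s):
    """Membership in {aⁿbⁿcᵐ | n, m ≥ 0}."""
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_L2(s):
    """Membership in {aⁿbᵐcᵐ | n, m ≥ 0}."""
    m = re.fullmatch(r'(a*)(b*)(c*)', s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

# The dilemma string from the example sits in both languages:
print(in_L1("aabbcc"), in_L2("aabbcc"))   # True True
# A string in the union that belongs to only one of them:
print(in_L1("aabbc"), in_L2("aabbc"))     # True False
```

Checking both predicates is easy for a program that may run two passes; the non-closure result says precisely that no single deterministic one-pass stack machine can do the same.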
Relationship to Context-Free Languages
Theorem: Equivalence of PDAs and CFGs
A language L is context-free if and only if there exists a pushdown automaton that accepts L. That is, the class of languages accepted by pushdown automata is precisely the class of context-free languages.
Proof: CFG to PDA Conversion
Given a context-free grammar G = (V, Σ, R, S), we construct a PDA M = (Q, Σ, Γ, δ, q₀, Z₀, F) that accepts L(G) as follows:
Γ = V ∪ Σ ∪ {Z₀} (stack symbols include variables, terminals, and stack bottom)
q₀ is the start state
Z₀ is the initial stack symbol
F = {q₂} (one accept state)
The transition function δ is defined as follows:
δ(q₀, ε, Z₀) = {(q₁, SZ₀)} (initialize by pushing start symbol onto stack)
For each variable A ∈ V and each production A → α in R:
δ(q₁, ε, A) = {(q₁, α)} (replace A with α on stack)
For each terminal a ∈ Σ:
δ(q₁, a, a) = {(q₁, ε)} (match and consume terminal symbols)
δ(q₁, ε, Z₀) = {(q₂, Z₀)} (move to accept state when done)
To prove that L(M) = L(G), we show:
If w ∈ L(G), then w ∈ L(M): For any string w derivable in G, there exists a leftmost derivation S ⇒* w. The PDA M simulates this derivation by first pushing S onto the stack, then repeatedly replacing variables with their production right-hand sides (non-deterministically choosing which production to apply), and matching terminals in the input with terminals on top of the stack.
If w ∈ L(M), then w ∈ L(G): We can show by induction on the length of computation that if M accepts w, then there must be a derivation S ⇒* w in G. The sequence of variable replacements made by M corresponds directly to a leftmost derivation in G.
Therefore, the PDA M accepts exactly the language generated by the grammar G.
Proof: PDA to CFG Conversion
Given a PDA P = (Q, Σ, Γ, δ, q₀, Z₀, F), we construct a CFG G = (V, Σ, R, S) that generates L(P) using the triple construction method:
V consists of:
A start symbol S
Variables of the form [q, A, p] for all q, p ∈ Q and A ∈ Γ
R contains the following productions:
For each accept state q_f ∈ F:
S → [q₀, Z₀, q_f]
For each transition δ(q, a, A) that includes (p, ε) (pop without pushing):
[q, A, p] → a if a ∈ Σ, or [q, A, p] → ε if a = ε
For each transition δ(q, a, A) that includes (p, B) (replace top symbol):
For all r ∈ Q: [q, A, r] → a [p, B, r] if a ∈ Σ, or [q, A, r] → [p, B, r] if a = ε
For each transition δ(q, a, A) that includes (p, BC) (push two symbols):
For all r, s ∈ Q: [q, A, s] → a [p, B, r] [r, C, s] if a ∈ Σ, or [q, A, s] → [p, B, r] [r, C, s] if a = ε
For longer stack pushes, we can extend the above pattern, or decompose them into multiple transitions that push at most two symbols at a time.
The variable [q, A, p] represents the following: "starting in state q with A on top of the stack, the PDA can consume some input string and end in state p with A removed from the stack (and nothing added back)."
To prove that L(G) = L(P), we show:
If w ∈ L(P), then w ∈ L(G): If PDA P accepts string w, there must be an accepting computation starting from the initial configuration (q₀, w, Z₀) and ending at (q_f, ε, γ) for some q_f ∈ F and stack contents γ. We can show by induction on the length of the computation that each subcomputation can be represented by a derivation of the corresponding variable [q, A, p] in G.
If w ∈ L(G), then w ∈ L(P): If w can be derived from S in G, then there must be a derivation S ⇒* w that starts with S → [q₀, Z₀, q_f] for some q_f ∈ F. We can show by induction on the length of the derivation that if [q, A, p] ⇒* v in G, then P can go from configuration (q, v, Aα) to (p, ε, α) for any stack suffix α.
Therefore, the CFG G generates exactly the language accepted by the PDA P.
Converting CFG to PDA
Given a context-free grammar, we can construct a pushdown automaton that recognizes the same language. The construction follows a simple idea: use the stack to keep track of the grammar symbols we expect to see, and match terminals in the input against terminals on the stack.
Algorithm
Create a PDA with three states: q₀ (start), q₁ (working), and q₂ (accept)
Initialize the stack with the start symbol of the grammar on top of the stack's bottom marker Z₀
For each terminal a in the grammar, add a transition from q₁ to q₁ that consumes input a when a is on top of the stack and pops a
For each production A → α in the grammar, add an ε-transition from q₁ to q₁ that replaces A on top of the stack with α (pushing the symbols of α in reverse order, so that the first symbol of α ends up on top)
Add an ε-transition from q₀ to q₁ that pushes the start symbol on top of Z₀
Add an ε-transition from q₁ to q₂ when Z₀ is on top of the stack, leaving Z₀ on the stack
Example Conversion
Consider the context-free grammar for the language {aⁿ bⁿ | n ≥ 0}:
S → aSb | ε
We construct a PDA with:
States: q₀ (start), q₁ (working), q₂ (accept)
Input alphabet: {a, b}
Stack alphabet: {Z₀, S, a, b}
Initial stack symbol: Z₀
Transitions:
(q₀, ε, Z₀) → (q₁, SZ₀) - Initialize by pushing S on top of Z₀
(q₁, ε, S) → (q₁, aSb) - Replace S with aSb (production S → aSb)
(q₁, ε, S) → (q₁, ε) - Replace S with ε (production S → ε)
(q₁, a, a) → (q₁, ε) - Consume a from input and pop a from stack
(q₁, b, b) → (q₁, ε) - Consume b from input and pop b from stack
(q₁, ε, Z₀) → (q₂, Z₀) - Move to accept state when Z₀ is on top
This PDA will simulate leftmost derivations in the grammar, accepting strings that can be derived from the start symbol.
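The construction can be carried out mechanically. The sketch below builds the three-state PDA for a grammar given as a dictionary and tests it with a bounded breadth-first simulation (the names and the stack-size cap are illustrative choices, not part of the formal construction):

```python
from collections import deque

def cfg_to_pda(grammar, start_symbol):
    """Build the three-state PDA transitions for a grammar given as
    {variable: [right-hand sides]}; terminals are lowercase characters."""
    delta = {('q0', '', 'Z'): {('q1', start_symbol + 'Z')},   # push start symbol
             ('q1', '', 'Z'): {('q2', 'Z')}}                  # accept on bare Z
    terminals = {ch for rhss in grammar.values() for rhs in rhss
                 for ch in rhs if ch not in grammar}
    for var, rhss in grammar.items():                         # expand variables
        delta.setdefault(('q1', '', var), set()).update(('q1', rhs) for rhs in rhss)
    for t in terminals:                                       # match terminals
        delta[('q1', t, t)] = {('q1', '')}
    return delta

def accepts(delta, w):
    # Breadth-first search over configurations (state, position, stack);
    # the stack cap 2·len(w) + 4 is a simplification that suffices here.
    limit = 2 * len(w) + 4
    seen, frontier = set(), deque([('q0', 0, 'Z')])
    while frontier:
        conf = frontier.popleft()
        if conf in seen:
            continue
        seen.add(conf)
        state, pos, stack = conf
        if state == 'q2' and pos == len(w):
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        nxt = [(pos, t) for t in delta.get((state, '', top), ())]
        if pos < len(w):
            nxt += [(pos + 1, t) for t in delta.get((state, w[pos], top), ())]
        for p, (st, push) in nxt:
            if len(push) + len(rest) <= limit:
                frontier.append((st, p, push + rest))
    return False

pda = cfg_to_pda({'S': ['aSb', '']}, 'S')
print(accepts(pda, 'aabb'))  # True
print(accepts(pda, 'aaab'))  # False
```

Each run of the simulator traces exactly the leftmost derivations described above: expansions happen when a variable is on top of the stack, matches when a terminal is.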
Converting PDA to CFG
Converting a PDA to a CFG is more complex than the reverse. The key idea is to create variables that represent how the PDA can process portions of the input while manipulating its stack.
Triple Construction Method
The method creates variables of the form [p,A,q] for states p, q and stack symbol A, representing "starting in state p with A on top of the stack, the PDA can end in state q with A removed from the stack."
Algorithm
Create a variable S as the start symbol of the CFG
For each pair of states p, q and each stack symbol A, create a variable [p,A,q]
For each accept state q, add a production S → [q₀,Z₀,q], where q₀ is the start state and Z₀ is the initial stack symbol
For each transition (p,a,A) → (q,ε) (i.e., that pops A without pushing anything), add a production [p,A,q] → a
For each transition (p,a,A) → (q,B) (i.e., that replaces A with B), and for each state r, add a production [p,A,r] → a[q,B,r]
For each transition (p,a,A) → (q,BC) (i.e., that replaces A with BC), and for all states r, s, add a production [p,A,s] → a[q,B,r][r,C,s]
For transitions that push more than two symbols, (p,a,A) → (q,B₁B₂...Bₖ), either first decompose them into transitions that push at most two symbols at a time, or add, for all states r₁, ..., rₖ₋₁, s, the production [p,A,s] → a[q,B₁,r₁][r₁,B₂,r₂]...[rₖ₋₁,Bₖ,s]
This construction creates a grammar where each variable [p,A,q] generates all strings that allow the PDA to go from state p to state q while removing A from the top of the stack (possibly after pushing and popping other symbols).
The construction can be complex for PDAs with many states and transitions. The resulting grammar often has many variables and productions, and may benefit from simplification.
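The triple construction can likewise be generated mechanically for transitions that push at most two symbols. The sketch below emits the productions for an empty-stack-style version of the aⁿbⁿ PDA (the state names and the extra final state q2, reached by popping Z₀, are illustrative):

```python
from itertools import product

def pda_to_cfg(states, delta, start, Z0, finals):
    """Triple construction. delta maps (q, a, A) -> set of (p, push) with
    len(push) in {0, 1, 2}; '' as the input symbol means ε. Returns
    productions as (head, body) pairs, where variables are (q, A, p) triples."""
    prods = [('S', [(start, Z0, qf)]) for qf in finals]
    for (q, a, A), moves in delta.items():
        sym = [a] if a else []                 # terminal prefix of the body
        for (p, push) in moves:
            if len(push) == 0:                 # pop:      [q,A,p] -> a
                prods.append(((q, A, p), sym))
            elif len(push) == 1:               # replace:  [q,A,r] -> a [p,B,r]
                B = push[0]
                for r in states:
                    prods.append(((q, A, r), sym + [(p, B, r)]))
            else:                              # push two: [q,A,s] -> a [p,B,r] [r,C,s]
                B, C = push
                for r, s in product(states, states):
                    prods.append(((q, A, s), sym + [(p, B, r), (r, C, s)]))
    return prods

# {aⁿbⁿ} PDA that pops its bottom marker Z on the way to the final state q2.
delta = {
    ('q0', 'a', 'Z'): {('q0', 'AZ')},
    ('q0', 'a', 'A'): {('q0', 'AA')},
    ('q0', 'b', 'A'): {('q1', '')},
    ('q1', 'b', 'A'): {('q1', '')},
    ('q0', '', 'Z'): {('q2', '')},   # ε is in the language
    ('q1', '', 'Z'): {('q2', '')},
}
prods = pda_to_cfg(['q0', 'q1', 'q2'], delta, 'q0', 'Z', ['q2'])
print(prods[0])   # the start production S -> [q0, Z, q2]
```

Even this tiny PDA yields a grammar with many useless triples — which is why the resulting grammar usually benefits from the simplification mentioned above.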
Limitations of PDAs
Despite being more powerful than finite automata, pushdown automata still have limitations. There are many languages that PDAs cannot recognize.
Languages PDAs Cannot Recognize
A classic example of a language that PDAs cannot recognize is:
{aⁿ bⁿ cⁿ | n ≥ 1}
This language consists of strings with equal numbers of a, b, and c, all grouped together. While a PDA can match the count of a with b (using its stack), it cannot simultaneously match the count of b with c, because once it has popped symbols to match b with a, that information is lost.
The Pumping Lemma for CFLs
Just as there is a pumping lemma for regular languages, there is a pumping lemma for context-free languages (CFLs). This lemma provides a necessary condition for a language to be context-free.
Lemma: Pumping Lemma for Context-Free Languages
If L is a context-free language, then there exists a constant p ≥ 1 (the pumping length) such that any string s in L with |s| ≥ p can be divided into five parts s = uvxyz such that:
|vy| ≥ 1 (v and y are not both empty)
|vxy| ≤ p (the middle portion is not too long)
For all i ≥ 0, the string uvⁱxyⁱz is in L (v and y can be "pumped" any number of times)
Example: Pumping Lemma Application
We can use the pumping lemma to prove that {aⁿ bⁿ cⁿ | n ≥ 1} is not context-free:
Assume L = {aⁿ bⁿ cⁿ | n ≥ 1} is context-free
Let p be the pumping length given by the pumping lemma
Consider the string s = aᵖ bᵖ cᵖ, which is in L
By the pumping lemma, s can be written as uvxyz with the properties above
Since |vxy| ≤ p, the substring vxy cannot contain both a's and c's, because every a is separated from every c by a block of p b's
This means v and y together contain at most two of the three symbol types
Pumping (for instance, taking i = 0) then changes the counts of at most two symbol types while leaving the third unchanged, so the resulting string no longer has equal numbers of a's, b's, and c's
This contradicts the assumption that L is context-free
This proof technique is powerful for showing that certain languages are beyond the capabilities of PDAs and context-free grammars.
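For a small pumping length the argument can even be checked exhaustively. The sketch below tries every decomposition s = uvxyz of aᵖbᵖcᵖ with |vxy| ≤ p and |vy| ≥ 1, and confirms that pumping breaks membership in every case (p = 4 is an arbitrary small choice for illustration):

```python
def in_L(s):
    """Membership in {aⁿbⁿcⁿ | n ≥ 1}."""
    n = len(s) // 3
    return n >= 1 and s == 'a' * n + 'b' * n + 'c' * n

def all_decompositions_fail(s, p):
    """For every split s = uvxyz with |vxy| ≤ p and |vy| ≥ 1, check that
    some pumping exponent i pushes the string out of the language."""
    for start in range(len(s)):                        # vxy occupies s[start:end]
        for end in range(start + 1, min(start + p, len(s)) + 1):
            window = s[start:end]
            for lv in range(len(window) + 1):          # |v|
                for lx in range(len(window) - lv + 1): # |x|
                    v, x, y = window[:lv], window[lv:lv + lx], window[lv + lx:]
                    if not v and not y:
                        continue                       # |vy| ≥ 1 is required
                    u, z = s[:start], s[end:]
                    # The lemma would need u v^i x y^i z ∈ L for ALL i ≥ 0:
                    if all(in_L(u + v * i + x + y * i + z) for i in (0, 2)):
                        return False                   # this split survives pumping
    return True

p = 4
s = 'a' * p + 'b' * p + 'c' * p
print(all_decompositions_fail(s, p))  # True: no split of aᵖbᵖcᵖ can be pumped
```

A finite check for one p is of course not the proof itself — the lemma quantifies over all decompositions for the (unknown) pumping length — but it makes the case analysis above concrete.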
Applications of PDAs
Parsing Programming Languages
The most significant application of pushdown automata is in parsing programming languages. Most programming languages have syntax that can be described by context-free grammars, and PDAs provide the theoretical foundation for parsing these languages.
Parsers for programming languages often implement algorithms based on pushdown automata, such as LL parsers and LR parsers. These parsers read the source code and build a parse tree or abstract syntax tree, which is then used for compilation or interpretation.
Recognizing Nested Structures
PDAs are particularly good at recognizing nested structures, such as:
Balanced parentheses, brackets, and braces in programming languages
Nested tags in markup languages like HTML and XML
Block structure in languages with begin/end or opening/closing delimiters
The stack in a PDA naturally matches these nested structures, pushing when an opening delimiter is encountered and popping when a closing delimiter is matched.
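This push/pop discipline is easy to express in code. The checker below (a minimal sketch; the name balanced is ours) accepts exactly the strings whose round, square, and curly brackets nest properly, ignoring all other characters:

```python
def balanced(s):
    """Stack-based check for balanced (), [], {} delimiters:
    push on an opening delimiter, pop-and-match on a closing one."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)            # opening delimiter: push
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False            # mismatched or unmatched close
    return not stack                    # accept by "empty stack"
```

Note the final test: accepting only when the stack is empty mirrors the PDA's acceptance-by-empty-stack criterion.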
Implementation in Compilers
In compiler design, pushdown automata concepts are used in several phases:
Lexical analysis: While this phase typically uses finite automata, some complex token structures might require the power of a PDA
Syntax analysis (parsing): This phase often uses algorithms based on PDAs to check if the program conforms to the grammar of the language
Type checking: Some aspects of type checking, especially in languages with complex type systems, may use PDA-like mechanisms
Understanding PDAs provides insights into the theoretical foundations of compiler construction and the limitations of what can be parsed efficiently.
Lemma: Interchange Lemma for Context-Free Languages
Let L be a context-free language. Then there exist constants c, d ≥ 1 such that for any string z ∈ L with |z| ≥ c, if z contains two disjoint substrings u and v each of length at least d, then there exist strings u' and v' of the same respective lengths such that the string obtained by replacing u with u' and v with v' is also in L.
This lemma is more powerful than the standard pumping lemma because it allows "interchange" of disjoint substrings rather than just repetition of single substrings.
Proof: Proof of Interchange Lemma
Let G be a CFG for L in Chomsky Normal Form with k variables. Set c = 2²ᵏ and d = 2ᵏ.
Consider any string z ∈ L with |z| ≥ c and disjoint substrings u, v each of length at least d.
Parse tree analysis: Let T be a parse tree for z. Since |z| ≥ 2²ᵏ, the tree T has height at least 2k.
Subtree identification: Let Tᵤ and Tᵥ be the minimal subtrees of T that contain the substrings u and v respectively. Since u and v are disjoint, Tᵤ and Tᵥ are also disjoint.
Variable repetition: Since each of Tᵤ and Tᵥ has a yield of length at least 2ᵏ, each has height greater than k, so some root-to-leaf path in each contains more than k variables. By the pigeonhole principle, there exist variables A and B that appear at least twice on some root-to-leaf paths in Tᵤ and Tᵥ respectively.
Interchange construction: The repeated variables allow us to "extract" the derivation patterns for generating substrings of the same length as u and v. By swapping these derivation patterns between the subtrees, we obtain strings u' and v' such that replacing u with u' and v with v' in z yields another string in L.
Formal interchange: If A ⇒* u₁Au₂ and A ⇒* u₃ in Tᵤ, and B ⇒* v₁Bv₂ and B ⇒* v₃ in Tᵥ, then we can construct derivations where the roles of these patterns are interchanged.
Lemma: Ogden's Lemma for Context-Free Languages
Let L be a context-free language. Then there exists a constant p ≥ 1 such that for any string z ∈ L with at least p marked positions, z can be written as z = uvwxy where:
vwx has at most p marked positions
vx has at least one marked position
For all i ≥ 0, uvⁱwxⁱy ∈ L
The power of Ogden's lemma lies in the ability to selectively mark positions in the string, forcing the "pumpable" parts to involve the marked positions.
Proof: Application: Using Ogden's Lemma
We use Ogden's lemma to prove that L' = {aⁱ bʲ cʲ dʲ | i, j ≥ 1} is not context-free.
Strategy: One might hope to attack the related language L = {aⁱ bʲ cᵏ dˡ | i = 0 or j ≠ k or k ≠ l} via its complement, but since CFLs are not closed under complement, non-context-freeness does not transfer between a language and its complement. We therefore work with L' directly.
Note why the standard pumping lemma is awkward here: a decomposition that pumps only a's keeps every pumped string inside L', so no contradiction arises. Ogden's marking lets us force the pumped parts into the b/c/d region.
Proof: Assume L' is context-free, and let p be the constant given by Ogden's lemma. Consider the string s = aᵖ bᵖ cᵖ dᵖ.
Strategic marking: Mark all positions in the bᵖ cᵖ dᵖ portion, leaving the a's unmarked. This gives us 3p marked positions.
By Ogden's lemma, s = uvwxy where vwx contains at most p marked positions and vx contains at least one marked position.
Case analysis: Since vwx has at most p marked positions and the marked region spans 3p positions across three different symbol types, vwx cannot span all three symbol types {b, c, d}.
Therefore, vx affects at most two of the three counts {|s|_b, |s|_c, |s|_d}, and, since vx contains at least one marked position, it affects at least one. Pumping with i = 2 then either increases some of these counts while leaving another unchanged, or breaks the a…b…c…d ordering, violating the constraint that all three counts must be equal.
This contradicts the assumption that L' is context-free.
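As with the ordinary pumping lemma, this case analysis can be verified exhaustively for a small constant. The sketch below (the names in_Lp and refute_ogden are ours) enumerates every decomposition s = uvwxy of aᵖbᵖcᵖdᵖ allowed by Ogden's lemma under this marking and checks that pumping with i = 0 or i = 2 always leaves L':

```python
import re

def in_Lp(s):
    # Membership in L' = {a^i b^j c^j d^j | i, j >= 1}.
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    return bool(m) and len(m.group(2)) == len(m.group(3)) == len(m.group(4))

def refute_ogden(p):
    """Mark the b/c/d block of s = a^p b^p c^p d^p and check that every
    decomposition s = uvwxy allowed by Ogden's lemma pumps out of L'."""
    s = "a" * p + "b" * p + "c" * p + "d" * p
    n = len(s)

    def marked(lo, hi):
        # Number of marked positions (index >= p) in s[lo:hi].
        return max(0, hi - max(lo, p))

    for i in range(n + 1):
        for j in range(i, n + 1):
            for k in range(j, n + 1):
                for l in range(k, n + 1):
                    if marked(i, l) > p:                 # vwx: <= p marked
                        continue
                    if marked(i, j) + marked(k, l) < 1:  # vx: >= 1 marked
                        continue
                    u, v, w, x, y = s[:i], s[i:j], s[j:k], s[k:l], s[l:]
                    # Pumping down (t = 0) or up (t = 2) must exit L'.
                    if all(in_Lp(u + v * t + w + x * t + y) for t in (0, 2)):
                        return False
    return True
```

Running refute_ogden(2) confirms that, under the marking, every legal decomposition can be pumped out of L'.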
Example: Comparing Pumping Tools
Let's compare the power of different tools for proving non-context-freeness:
Language: L = {aⁿ bⁿ cⁿ | n ≥ 0}
Classic example of a non-context-free language
Standard Pumping Lemma:
Approach: Take the string aᵖ bᵖ cᵖ
Key fact: The vwx region spans at most 2 symbol types
Result: ✓ Works, since pumping then breaks the equality of the three counts
Ogden's Lemma:
Approach: Mark strategic positions to force specific behavior
Advantage: More control over where pumping occurs
Result: ✓ Also works, with more precision
Interchange Lemma:
Approach: Consider swapping disjoint substrings
Power: Can handle cases where pumping lemma fails
Result: ✓ Most powerful tool
When the standard pumping lemma fails:
Complex language: {w#w#w | w ∈ {a,b}*}, three copies of the same string
Solution: the interchange lemma works here, since it can swap corresponding parts between the copies, whereas standard pumping arguments fail
Key insight: Each tool has its strengths. Ogden's lemma provides control through marking, while the interchange lemma handles complex structural constraints that defeat simpler approaches.
Summary
Pushdown Automaton (PDA): A 7-tuple (Q, Σ, Γ, δ, q₀, Z₀, F) that extends finite automata with a stack memory
Stack: A LIFO data structure that provides memory beyond what finite automata offer
Transition Function: δ: Q × (Σ ∪ {ε}) × Γ → P(Q × Γ*) maps state-input-stack combinations to new states and stack operations
Acceptance Criteria: A string can be accepted either by final state or by empty stack, and these methods are equivalent in power
Deterministic vs. Non-deterministic: DPDAs are strictly less powerful than NPDAs (unlike with finite automata)
Equivalence with CFGs: PDAs and context-free grammars describe exactly the same class of languages
Conversion Algorithms: We can convert CFGs to PDAs (by simulating leftmost derivations) and PDAs to CFGs (using the triple construction)
Limitations: PDAs cannot recognize languages like {aⁿ bⁿ cⁿ | n ≥ 1}, which require two independent counts to be compared at once; a single stack loses the first count while checking the second
Pumping Lemma for CFLs: A necessary condition for a language to be context-free, useful for proving certain languages are not context-free
Applications: Parsing programming languages, recognizing nested structures, and compiler construction
Suggested Reading
Sipser, Introduction to the Theory of Computation – Chapter 2.2: Pushdown Automata
Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation – Chapter 6: Pushdown Automata
Kozen, Automata and Computability – Lecture 23: Pushdown Automata
Kozen, Automata and Computability – Lecture 24: PDAs and CFGs
Kozen, Automata and Computability – Lecture 25: Simulating NPDAs by CFGs