Properties of Regular Languages
Having explored finite automata and regular expressions, we now turn to the fundamental properties that characterize regular languages. These properties not only deepen our understanding of regular languages but also provide powerful tools for proving whether a language is regular or not, and for constructing new regular languages from existing ones.
Closure Properties of Regular Languages
Closure Properties Overview
A language class is closed under an operation if applying that operation to language(s) in the class always results in another language in the same class. Regular languages are closed under all the following operations:
Operation | Closed? | Definition |
---|---|---|
Union | ✓ | L₁ ∪ L₂ = {w | w ∈ L₁ or w ∈ L₂} |
Intersection | ✓ | L₁ ∩ L₂ = {w | w ∈ L₁ and w ∈ L₂} |
Complement | ✓ | L̄ = Σ* - L = {w ∈ Σ* | w ∉ L} |
Concatenation | ✓ | L₁·L₂ = {w | w = xy, where x ∈ L₁ and y ∈ L₂} |
Kleene Star | ✓ | L* = {w | w = w₁w₂...wₖ, where each wᵢ ∈ L and k ≥ 0} |
Reversal | ✓ | Lᴿ = {wᴿ | w ∈ L} where wᴿ is the reverse of w |
Regular languages possess important closure properties, meaning that when we apply certain operations to regular languages, the result is always another regular language. These properties are fundamental both for theoretical understanding and for practical applications in language processing.
Union
Theorem: Closure Under Union

If L₁ and L₂ are regular languages, then their union L₁ ∪ L₂ = {w | w ∈ L₁ or w ∈ L₂} is also a regular language.

Proof:
Let M₁ = (Q₁, Σ, δ₁, q₀₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₀₂, F₂) be DFAs that recognize L₁ and L₂ respectively.

We construct a new NFA M = (Q, Σ, δ, q₀, F) that recognizes L₁ ∪ L₂:

- Q = {q₀} ∪ Q₁ ∪ Q₂ (where q₀ is a new state)
- δ(q₀, ε) = {q₀₁, q₀₂}
- For all q ∈ Q₁, a ∈ Σ: δ(q, a) = δ₁(q, a)
- For all q ∈ Q₂, a ∈ Σ: δ(q, a) = δ₂(q, a)
- F = F₁ ∪ F₂

This NFA accepts exactly the strings accepted by either M₁ or M₂. Since NFAs recognize exactly the regular languages, L₁ ∪ L₂ is regular.
Example: Union Construction in Action

Let's construct the union of L₁ = {strings ending with a} and L₂ = {strings with even length}.

Original DFAs:

DFA for L₁:
- q0: start; last symbol was not a
- q1: accept; last symbol was a
- Transitions: δ₁(q0, a) = q1, δ₁(q0, b) = q0, δ₁(q1, a) = q1, δ₁(q1, b) = q0

DFA for L₂:
- p0: start/accept; even length
- p1: odd length
- Transitions: δ₂(p0, a) = p1, δ₂(p0, b) = p1, δ₂(p1, a) = p0, δ₂(p1, b) = p0

Union NFA Construction:
- Create a new start state qnew
- Add ε-transitions: δ(qnew, ε) = {q0, p0}
- Include all original states and transitions
- Accept states: F = {q1, p0}

Test Cases:
- a: ✓ ends with a; in L₁, so in the union
- bb: ✓ even length; in L₂, so in the union
- ba: ✓ satisfies both conditions; in both languages
- b: ✗ satisfies neither condition; not in the union

Key insight: The union automaton accepts if either original automaton would accept, achieved by branching to both start states with ε-transitions.
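The example can be simulated directly. Below is a minimal Python sketch (my illustration, not part of the original notes; the dictionary encoding and the name `union_accepts` are ad hoc choices) that runs both component DFAs together as the union NFA:

```python
# Union construction from the example: L1 = strings ending with 'a',
# L2 = strings of even length. State names follow the text.

# Component DFAs as (state, symbol) -> state tables
d1 = {("q0", "a"): "q1", ("q0", "b"): "q0",
      ("q1", "a"): "q1", ("q1", "b"): "q0"}
d2 = {("p0", "a"): "p1", ("p0", "b"): "p1",
      ("p1", "a"): "p0", ("p1", "b"): "p0"}

def union_accepts(w):
    """Simulate the union NFA: the ε-moves from qnew put us in both machines."""
    current = {"q0", "p0"}              # ε-closure of the new start state qnew
    for ch in w:
        nxt = set()
        for s in current:
            if (s, ch) in d1:
                nxt.add(d1[(s, ch)])
            if (s, ch) in d2:
                nxt.add(d2[(s, ch)])
        current = nxt
    return bool(current & {"q1", "p0"})  # F = {q1, p0}

for w in ["a", "bb", "ba", "b"]:
    print(w, union_accepts(w))
```

The set `current` plays the role of the NFA's set of active states; accepting means at least one active state lies in F.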
Intersection
Theorem: Closure Under Intersection

If L₁ and L₂ are regular languages, then their intersection L₁ ∩ L₂ = {w | w ∈ L₁ and w ∈ L₂} is also a regular language.

Proof:
Let M₁ = (Q₁, Σ, δ₁, q₀₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₀₂, F₂) be DFAs that recognize L₁ and L₂ respectively.

We construct a new DFA M = (Q, Σ, δ, q₀, F) that recognizes L₁ ∩ L₂:

- Q = Q₁ × Q₂ (the Cartesian product of states)
- q₀ = (q₀₁, q₀₂)
- δ((q₁, q₂), a) = (δ₁(q₁, a), δ₂(q₂, a))
- F = F₁ × F₂ = {(q₁, q₂) | q₁ ∈ F₁ and q₂ ∈ F₂}

This DFA simulates M₁ and M₂ in parallel, and accepts a string if and only if both original DFAs accept it. Since DFAs recognize exactly the regular languages, L₁ ∩ L₂ is regular.
Example: Product Construction for Intersection

Let's intersect L₁ = {strings containing ab} and L₂ = {strings with an even number of a's}.

Product DFA Construction:

States: Q = Q₁ × Q₂
- (q0, p0): start; no ab yet, even number of a's
- (q0, p1): no ab yet, odd number of a's
- (q1, p0): just saw a, even number of a's
- (q1, p1): just saw a, odd number of a's
- (q2, p0): found ab, even number of a's ✓
- (q2, p1): found ab, odd number of a's

Sample Transitions:
- δ((q0, p0), a) = (q1, p1): saw a, count now odd
- δ((q1, p1), b) = (q2, p1): completed ab, count still odd
- δ((q2, p1), a) = (q2, p0): another a, count now even (once ab has been found, the first component stays in q2)

Accepting States: Only (q2, p0), since a string must satisfy both conditions.

Test Cases:
- ab: ✗ contains ab, but only 1 a (odd)
- abaa: ✗ contains ab, but 3 a's (odd)
- aabaa: ✓ contains ab and 4 a's (even)

Key insight: The product construction tracks both machines simultaneously, accepting only when both original conditions are satisfied.
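The product construction is especially easy to code because the result is again deterministic. A short Python sketch (illustrative; state names follow the example):

```python
# Product construction: L1 = contains 'ab', L2 = even number of 'a's.

d1 = {("q0", "a"): "q1", ("q0", "b"): "q0",
      ("q1", "a"): "q1", ("q1", "b"): "q2",
      ("q2", "a"): "q2", ("q2", "b"): "q2"}
d2 = {("p0", "a"): "p1", ("p0", "b"): "p0",
      ("p1", "a"): "p0", ("p1", "b"): "p1"}

def intersection_accepts(w):
    s1, s2 = "q0", "p0"                        # start state (q0, p0)
    for ch in w:
        s1, s2 = d1[(s1, ch)], d2[(s2, ch)]    # run both DFAs in lockstep
    return (s1, s2) == ("q2", "p0")            # the only accepting product state

for w in ["ab", "abaa", "aabaa"]:
    print(w, intersection_accepts(w))
```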
Complement
Theorem: Closure Under Complement

If L is a regular language over an alphabet Σ, then its complement L̄ = Σ* - L = {w ∈ Σ* | w ∉ L} is also a regular language.

Proof:
Let M = (Q, Σ, δ, q₀, F) be a DFA that recognizes L.

We construct a new DFA M' = (Q, Σ, δ, q₀, Q - F) that recognizes L̄.

The only difference is the set of accepting states: M' accepts exactly when M rejects, and vice versa. Note that this argument requires a deterministic automaton with a total transition function; to complement an NFA, first convert it to a DFA. Since DFAs recognize exactly the regular languages, L̄ is regular.
Example: Complement by State Flipping

Let's find the complement of L = {strings containing aa}.

Original DFA for L:
- q0: start; haven't seen any a
- q1: just saw one a
- q2: accept; found aa ✓
- Transitions:
  - δ(q0, a) = q1, δ(q0, b) = q0
  - δ(q1, a) = q2, δ(q1, b) = q0
  - δ(q2, a) = q2, δ(q2, b) = q2

Complement DFA for L̄:
- q0: accept; no aa yet ✓
- q1: accept; still no aa ✓
- q2: reject; found aa ✗
- Same transitions, but F' = {q0, q1} instead of F = {q2}

Language Comparison:
- Original L: aa ✓, baab ✓, aabb ✓, ab ✗, ba ✗, aba ✗
- Complement L̄: aa ✗, baab ✗, aabb ✗, ab ✓, ba ✓, aba ✓

Key insight: Complementation is trivial for DFAs: just flip which states are accepting. Every string goes to exactly one state, so we accept exactly when the original automaton rejects.
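That complementation is only a change of accept set shows up clearly in code. A small Python sketch (my illustration; `run` is an ad hoc helper name) using the contains-aa DFA above:

```python
# Complement by flipping accept states, using the 'contains aa' DFA.

delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q2", ("q1", "b"): "q0",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}

def run(w, accepting):
    state = "q0"
    for ch in w:
        state = delta[(state, ch)]
    return state in accepting

def in_L(w):            # original:   F  = {q2}
    return run(w, {"q2"})

def in_complement(w):   # complement: F' = Q - F = {q0, q1}
    return run(w, {"q0", "q1"})

# Every string lands in exactly one of L and its complement:
for w in ["aa", "baab", "ab", "aba"]:
    assert in_L(w) != in_complement(w)
    print(w, in_L(w))
```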
Concatenation
Theorem: Closure Under Concatenation

If L₁ and L₂ are regular languages, then their concatenation L₁·L₂ = {w | w = xy, where x ∈ L₁ and y ∈ L₂} is also a regular language.

Proof:
Let M₁ = (Q₁, Σ, δ₁, q₀₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₀₂, F₂) be NFAs recognizing L₁ and L₂ respectively.

We construct a new NFA M = (Q, Σ, δ, q₀, F) for L₁·L₂:

- Q = Q₁ ∪ Q₂
- q₀ = q₀₁
- For all q ∈ Q₁, a ∈ Σ: δ(q, a) = δ₁(q, a)
- For all q ∈ F₁: δ(q, ε) = {q₀₂} ∪ δ₁(q, ε)
- For all q ∈ Q₂, a ∈ Σ: δ(q, a) = δ₂(q, a)
- F = F₂

This NFA simulates M₁ until it reaches an accepting state, at which point it can either continue in M₁ or transition to M₂'s start state. The NFA accepts if it finishes in an accepting state of M₂. Since NFAs recognize exactly the regular languages, L₁·L₂ is regular.
Example: NFA Construction for Concatenation

Let's concatenate L₁ = {a, aa} and L₂ = {b, bb}.

Original NFAs:

NFA for L₁:
- q0: start
- q1: accept, after reading a
- q2: accept, after reading aa
- Transitions: δ₁(q0, a) = q1, δ₁(q1, a) = q2

NFA for L₂:
- p0: start
- p1: accept, after reading b
- p2: accept, after reading bb
- Transitions: δ₂(p0, b) = p1, δ₂(p1, b) = p2

Concatenation NFA Construction:
- Keep all states: Q = {q0, q1, q2, p0, p1, p2}
- Start state: q0 (from L₁)
- Add ε-transitions from the accept states of L₁ to the start of L₂:
  - δ(q1, ε) = {p0}: after a, can start L₂
  - δ(q2, ε) = {p0}: after aa, can start L₂
- Accept states: F = {p1, p2} (only from L₂); the original L₁ accept states q1 and q2 are no longer accepting

Resulting Language L₁·L₂:
- ab = a + b
- abb = a + bb
- aab = aa + b
- aabb = aa + bb
Trace: How aab is accepted:

- Start at state q0
- Read a: transition to state q1
- Read a: transition to state q2 (completed aa from L₁)
- Take the ε-transition δ(q2, ε) = {p0} (jump to the start of L₂)
- Read b: transition to state p1 (accept state) ✓
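The trace generalizes to a small simulator for the whole concatenation NFA (an illustrative sketch, not part of the original notes; here the ε-closure simply adds p0 whenever an L₁ accept state is active):

```python
# Concatenation NFA for L1 = {a, aa} and L2 = {b, bb}, as constructed above.

delta = {("q0", "a"): {"q1"}, ("q1", "a"): {"q2"},
         ("p0", "b"): {"p1"}, ("p1", "b"): {"p2"}}

def eps_close(states):
    """From an accept state of the L1 machine we may ε-jump to p0."""
    out = set(states)
    if out & {"q1", "q2"}:
        out.add("p0")
    return out

def concat_accepts(w):
    current = eps_close({"q0"})
    for ch in w:
        nxt = set()
        for s in current:
            nxt |= delta.get((s, ch), set())
        current = eps_close(nxt)
    return bool(current & {"p1", "p2"})   # F = accept states of the L2 machine

print([w for w in ["ab", "abb", "aab", "aabb", "aa", "b"] if concat_accepts(w)])
# → ['ab', 'abb', 'aab', 'aabb']
```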
Key insight: ε-transitions connect the accepting states of L₁ to the start state of L₂, allowing the NFA to "jump" between automata precisely when the first part is complete.

Kleene Star
Theorem: Closure Under Kleene Star

If L is a regular language, then its Kleene star L* = {w | w = w₁w₂...wₖ, where each wᵢ ∈ L and k ≥ 0} is also a regular language.
Proof:

Let M = (Q, Σ, δ, q₀, F) be an NFA recognizing L.

We construct a new NFA M' = (Q', Σ, δ', q₀', F') for L*:

- Q' = Q ∪ {q₀'}, where q₀' is a new start state
- δ'(q₀', ε) = {q₀} (ε-transition to the original start state)
- For all q ∈ Q, a ∈ Σ: δ'(q, a) = δ(q, a)
- For all qf ∈ F: δ'(qf, ε) = {q₀} (ε-transition from accepting states back to the start)
- F' = F ∪ {q₀'} (the new start state is also accepting)

This NFA can match the empty string (by accepting immediately in q₀'), a single string from L (by following the original NFA), or multiple concatenated strings from L (by returning to the start state after each accepted piece). Since NFAs recognize exactly the regular languages, L* is regular.
Example: Kleene Star Construction

Let's construct the Kleene star of L = {ab, ba}.

Original NFA for L:

States: q0 (start), q1 (after a), q2 (after b), q3 (accept ab), q4 (accept ba)

Transitions:
- δ(q0, a) = q1
- δ(q0, b) = q2
- δ(q1, b) = q3 (completes ab)
- δ(q2, a) = q4 (completes ba)

Accept states: F = {q3, q4}

Kleene Star NFA Construction:
- Add a new start state qnew
- Make qnew an accept state (handles ε ∈ L*)
- Add an ε-transition δ(qnew, ε) = {q0} (can start the original machine)
- Add ε-transitions from accept states back to the original start:
  - δ(q3, ε) = {q0}: after ab, can repeat
  - δ(q4, ε) = {q0}: after ba, can repeat
- Final accept states: F' = {qnew, q3, q4}

Resulting Language L*:
- ε (0 repetitions)
- ab, ba (1 repetition)
- abab, abba, baba, ... (2 repetitions)
- and infinitely many more combinations

Trace: How abba is accepted:
- Start at the new state qnew
- Take the ε-transition δ(qnew, ε) = {q0}
- Read ab: reach state q3 (first ab complete)
- Take the ε-transition δ(q3, ε) = {q0} (loop back for a repetition)
- Read ba: reach state q4 (second string ba complete) ✓

Key insight: The Kleene star construction creates loops via ε-transitions that allow any number of repetitions, including zero (handled by making the new start state accepting). The ε-transitions provide seamless "glue" between repetitions.
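The star NFA can be simulated the same way as the concatenation NFA (illustrative sketch, my names; the ε-closure adds q0 whenever an accept state is active):

```python
# Kleene star NFA for L = {ab, ba}, following the construction above.

delta = {("q0", "a"): {"q1"}, ("q0", "b"): {"q2"},
         ("q1", "b"): {"q3"}, ("q2", "a"): {"q4"}}
ACCEPTS = {"q3", "q4"}            # accept states of the original NFA

def eps_close(states):
    """Accept states ε-loop back to q0 so another repetition can start."""
    out = set(states)
    if out & ACCEPTS:
        out.add("q0")
    return out

def star_accepts(w):
    if w == "":                   # qnew is accepting: ε ∈ L*
        return True
    current = eps_close({"q0"})   # ε-move qnew -> q0
    for ch in w:
        nxt = set()
        for s in current:
            nxt |= delta.get((s, ch), set())
        current = eps_close(nxt)
    return bool(current & ACCEPTS)

for w in ["", "ab", "abba", "abab", "aba"]:
    print(repr(w), star_accepts(w))
```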
Reversal
Theorem: Closure Under Reversal

If L is a regular language over alphabet Σ, then its reversal Lᴿ = {wᴿ | w ∈ L} is also a regular language, where wᴿ denotes the reversal of string w.
Proof:

Let M = (Q, Σ, δ, q₀, F) be an NFA recognizing L.

We construct a new NFA M' for Lᴿ as follows:

- Use the same set of states Q
- Make the original start state q₀ the only accepting state of M'
- Make the original accepting states F the start states of M' (multiple start states can be simulated with a new start state and ε-transitions)
- Reverse all transitions: for each transition from q to p on symbol a in M, add a transition from p to q on a in M'

This NFA processes strings in reverse: it starts from what were the accepting states and accepts if it reaches what was the start state. Since NFAs recognize exactly the regular languages, Lᴿ is regular.
Example: Language Reversal Construction

Let's reverse L = {strings starting with a and ending with b}.

Original NFA for L:

States: q0 (start), q1 (after the leading a), q2 (accept, after the final b)

Transitions:
- δ(q0, a) = {q1}: must start with a
- δ(q1, a) = {q1}: can see more a's
- δ(q1, b) = {q1, q2}: a b can stay in the middle or, nondeterministically, be guessed as the final symbol

Accept states: F = {q2}. Note that q2 has no outgoing transitions: the NFA reaches it only by guessing that the b just read is the last symbol of the input.

Reversal Construction Steps:

1. Reverse all transitions:
   - δ'(q1, a) = {q0, q1} (from δ(q0, a) = {q1}, plus the preserved self-loop δ(q1, a) = {q1})
   - δ'(q1, b) = {q1} (self-loop preserved)
   - δ'(q2, b) = {q1} (from q2 ∈ δ(q1, b))
2. Swap start and accept states:
   - New start states: {q2} (the old accept states)
   - New accept states: {q0} (the old start state)
Reversed NFA for Lᴿ:

The reversed NFA recognizes strings starting with b and ending with a. Start states: {q2}; accept state: q0. A typical accepting run is q2 (start) → q1 (on the leading b) → q0 (accept, on the final a).
Language Transformation:

Original L (pattern a...b): ab ✓, aabb ✓, abab ✓, ba ✗, bb ✗, aba ✗

Reversed Lᴿ (pattern b...a): ba ✓, bbaa ✓, baba ✓, ab ✗, aa ✗
Trace: How baba is accepted in Lᴿ:

- Start at state q2 (the new start state)
- Read b: transition to q1 (via δ'(q2, b) = {q1})
- Read a: stay at q1 (choosing the self-loop in δ'(q1, a) = {q0, q1})
- Read b: stay at q1 (self-loop δ'(q1, b) = {q1})
- Read a: transition to q0 (accept state) ✓
Key insight: Reversal flips the automaton's "temporal direction" — the machine now processes strings as if reading the original strings backwards. By reversing transitions and swapping start/accept roles, we achieve exactly this effect.
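Reversal can be implemented generically for any NFA. The sketch below (my code, using the NFA from this example; the helper names are ad hoc) flips every transition and swaps the start/accept roles:

```python
# Generic NFA reversal: flip transitions, swap start/accept roles.

def reverse_nfa(delta, start, accepts):
    rdelta = {}
    for (q, ch), targets in delta.items():
        for p in targets:
            rdelta.setdefault((p, ch), set()).add(q)
    # returns (new delta, new start states, new accept states)
    return rdelta, set(accepts), {start}

def nfa_accepts(delta, starts, accepts, w):
    current = set(starts)
    for ch in w:
        current = {p for s in current for p in delta.get((s, ch), set())}
    return bool(current & accepts)

# NFA from the example: starts with 'a', ends with 'b'
delta = {("q0", "a"): {"q1"},
         ("q1", "a"): {"q1"},
         ("q1", "b"): {"q1", "q2"}}

rdelta, rstarts, raccepts = reverse_nfa(delta, "q0", {"q2"})
# The reversed NFA recognizes strings starting with 'b' and ending with 'a':
for w in ["ba", "baba", "ab", "b"]:
    print(w, nfa_accepts(rdelta, rstarts, raccepts, w))
```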
The Pumping Lemma for Regular Languages
While the closure properties show us how to build new regular languages, the Pumping Lemma gives us a powerful tool for proving that certain languages are not regular. It exploits a fundamental limitation of finite automata: their inability to "count" or "remember" an arbitrary amount of information.
Lemma: Pumping Lemma for Regular Languages

If L is a regular language, then there exists a constant p ≥ 1 (the "pumping length") such that every string w ∈ L with |w| ≥ p can be decomposed into three parts, w = xyz, satisfying:

1. |y| > 0 (y is not empty)
2. |xy| ≤ p (x and y lie within the first p characters)
3. For all i ≥ 0, xyⁱz ∈ L (pumping y any number of times keeps the string in the language)
Proof:

Let L be a regular language, and let M = (Q, Σ, δ, q₀, F) be a DFA that recognizes L. Set p = |Q| (the number of states of M).

Consider any string w ∈ L with |w| ≥ p. When M processes w, it passes through a sequence of |w|+1 states r₀, r₁, ..., r|w|, where r₀ = q₀ and r|w| ∈ F.

Since |w| ≥ p = |Q|, the Pigeonhole Principle guarantees that some state appears more than once among the first p+1 states of this sequence; say rᵢ = rⱼ for some 0 ≤ i < j ≤ p.

We can decompose w as follows:

- x = the first i symbols of w (which take M from r₀ to rᵢ)
- y = the symbols in positions i+1 through j (which take M from rᵢ back to rⱼ = rᵢ)
- z = the remaining symbols (which take M from rⱼ to an accepting state)

This decomposition satisfies all three conditions:

1. |y| = j - i > 0 (since i < j)
2. |xy| = j ≤ p
3. For any k ≥ 0, xyᵏz ∈ L, because x takes M from r₀ to rᵢ, each repetition of y takes M from rᵢ back to rᵢ, and z then takes M from rᵢ to an accepting state. (We write the pumping exponent as k here because i already names a position in the state sequence.)
Using the Pumping Lemma to Prove Non-Regularity

The Pumping Lemma is typically used to prove that a language is not regular, via proof by contradiction:

1. Assume that the language L is regular
2. Then the Pumping Lemma applies to L with some pumping length p
3. Find a string w ∈ L with |w| ≥ p
4. Show that for every decomposition w = xyz satisfying conditions 1 and 2, there exists some i ≥ 0 such that xyⁱz ∉ L
5. This contradicts condition 3 of the Pumping Lemma
6. Therefore, L cannot be regular
Example: Proving {aⁿbⁿ | n ≥ 0} is Not Regular

Let's prove that the language L = {aⁿbⁿ | n ≥ 0} is not regular.

Step 1: Assume, for contradiction, that L is regular.

Step 2: By the Pumping Lemma, there exists a pumping length p ≥ 1.

Step 3: Consider the string w = aᵖbᵖ ∈ L.

Step 4: Any decomposition w = xyz with |y| > 0 and |xy| ≤ p must have y = aᵏ for some k > 0, since the first p characters of w are all a's.

Step 5: Consider xy²z. This string has p+k a's and p b's. Since k > 0, the number of a's exceeds the number of b's, so xy²z ∉ L.

Step 6: This contradicts condition 3 of the Pumping Lemma.

Conclusion: Therefore, L is not regular.
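For a concrete pumping length the argument can be checked mechanically. The sketch below (illustrative; `in_L`, `pumps`, and `max_i` are my choices) tries every legal decomposition of aᵖbᵖ and confirms that none of them pumps:

```python
def in_L(w):
    """Membership in L = { a^n b^n | n >= 0 }."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def pumps(w, p, max_i=3):
    """True if SOME decomposition w = xyz with |xy| <= p and |y| > 0
    keeps x y^i z in L for all tested exponents i."""
    for len_x in range(p + 1):                     # |x|
        for len_xy in range(len_x + 1, p + 1):     # |xy|
            x, y, z = w[:len_x], w[len_x:len_xy], w[len_xy:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False

p = 5
w = "a" * p + "b" * p
print(pumps(w, p))   # False: every decomposition breaks when pumped
```

The code only tests finitely many exponents for one particular p, of course; the proof above covers every p at once.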
Decision Problems for Regular Languages
An important aspect of formal language theory is understanding what questions about languages can be algorithmically decided. For regular languages, many fundamental problems have efficient solutions.
Decision Problems Overview
Problem | Decidability | Complexity |
---|---|---|
Membership | Decidable | O(|w|) for DFA |
Emptiness | Decidable | O(|Q| + |δ|) |
Finiteness | Decidable | O(|Q| + |δ|) |
Equivalence | Decidable | O(|Q₁|·|Q₂|) |
Minimization | Decidable | O(|Σ|·|Q|·log|Q|) |
Membership
Problem: Given a string w and a regular language L (represented by a DFA, NFA, or regex), is w ∈ L?

Algorithm: Run the automaton on w, or use a regex matching algorithm.

Complexity: O(|w|) time for a DFA, O(|w|·|Q|) for an NFA, where |Q| is the number of states.
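For a DFA this is a single pass with one table lookup per symbol. A minimal sketch (illustrative machine and names; the DFA is the contains-aa automaton used earlier in this chapter):

```python
# O(|w|) membership test for a DFA.

delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q2", ("q1", "b"): "q0",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}

def dfa_accepts(delta, start, accepts, w):
    state = start
    for ch in w:
        state = delta[(state, ch)]   # one lookup per input symbol
    return state in accepts

print(dfa_accepts(delta, "q0", {"q2"}, "baab"))   # True
print(dfa_accepts(delta, "q0", {"q2"}, "abab"))   # False
```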
Emptiness
Problem: Is L = ∅?

Algorithm: Check whether any accepting state is reachable from the start state of the automaton.

Complexity: O(|Q| + |δ|) time using breadth-first search.
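A reachability-based emptiness test might look like the following sketch (illustrative; scanning the whole transition table per state is simpler but looser than the O(|Q| + |δ|) bound, which an adjacency list would achieve):

```python
from collections import deque

def is_empty(delta, start, accepts):
    """BFS from the start state: L(M) = ∅ iff no accept state is reachable."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepts:
            return False
        for (s, _ch), t in delta.items():
            if s == q and t not in seen:
                seen.add(t)
                queue.append(t)
    return True

# The accept state q2 is unreachable here, so the language is empty:
delta = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q1",
         ("q1", "b"): "q1", ("q2", "a"): "q2", ("q2", "b"): "q2"}
print(is_empty(delta, "q0", {"q2"}))   # True
```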
Finiteness
Problem: Is L a finite language?

Algorithm: Check whether the automaton has a cycle that is reachable from the start state and from which an accepting state is reachable.

Complexity: O(|Q| + |δ|) time using a modified depth-first search.
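One way to code the cycle test is sketched below (illustrative, my names; a "useful" state is one reachable from the start and able to reach an accept state, and the language is infinite exactly when a useful state lies on a cycle):

```python
def reachable(sources, edges):
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for t in edges.get(q, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def is_finite(delta, start, accepts):
    edges, redges = {}, {}
    for (q, _ch), t in delta.items():
        edges.setdefault(q, set()).add(t)
        redges.setdefault(t, set()).add(q)
    useful = reachable({start}, edges) & reachable(set(accepts), redges)
    sub = {q: ts & useful for q, ts in edges.items() if q in useful}
    # a useful state lies on a cycle iff it can reach itself
    return not any(q in reachable(sub.get(q, set()), sub) for q in useful)

# DFA for the finite language {a} (state 2 is a dead state):
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2,
         (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
print(is_finite(delta, 0, {1}))   # True
```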
Equivalence
Problem: Given two regular languages L₁ and L₂, is L₁ = L₂?

Algorithm: Check whether both L₁ - L₂ and L₂ - L₁ are empty. This works because regular languages are closed under difference and emptiness is decidable.

Complexity: If the languages are represented by DFAs with n and m states, the time complexity is O(n·m).
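In practice the two difference checks can be fused into one search over the product automaton: explore reachable state pairs and look for a pair that disagrees on acceptance. This product-search variant (my sketch, not the difference-based algorithm stated above) still runs in O(n·m):

```python
from collections import deque

def equivalent(d1, s1, f1, d2, s2, f2, alphabet):
    """L(M1) != L(M2) iff some reachable product state disagrees on acceptance."""
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False          # a distinguishing string leads here
        for ch in alphabet:
            nxt = (d1[(p, ch)], d2[(q, ch)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Two DFAs for 'even length' over {a, b}: length mod 2 vs length mod 4
even2 = {(s, c): (s + 1) % 2 for s in range(2) for c in "ab"}
even4 = {(s, c): (s + 1) % 4 for s in range(4) for c in "ab"}
print(equivalent(even2, 0, {0}, even4, 0, {0, 2}, "ab"))   # True
```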
Minimization
Problem: Given a DFA M, find the minimal DFA M' that recognizes the same language.

Algorithm: Use Hopcroft's algorithm to partition the states into equivalence classes, where two states are equivalent if no input string distinguishes them.

Complexity: O(|Σ|·|Q|·log|Q|) time, where |Σ| is the size of the alphabet and |Q| is the number of states.

Key property: For any regular language, there exists a unique minimal DFA (up to isomorphism) that recognizes it.
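The refinement idea can be sketched compactly. The code below is the simpler Moore-style O(n²) variant, not Hopcroft's O(|Σ|·|Q|·log|Q|) algorithm (which gains its speed by always splitting against the smaller half), but the partition-refinement principle is the same:

```python
def minimize_blocks(states, alphabet, delta, accepts):
    """Refine the accepting/rejecting partition until it is stable.
    The final blocks are the states of the minimal DFA."""
    blocks = [b for b in (set(accepts), set(states) - set(accepts)) if b]
    while True:
        block_of = {q: i for i, b in enumerate(blocks) for q in b}
        new_blocks = []
        for b in blocks:
            groups = {}
            for q in b:
                # signature: which block each input symbol leads to
                key = tuple(block_of[delta[(q, ch)]] for ch in alphabet)
                groups.setdefault(key, set()).add(q)
            new_blocks.extend(groups.values())
        if len(new_blocks) == len(blocks):
            return new_blocks            # stable partition reached
        blocks = new_blocks

# 'Ends with a' DFA with a redundant state: 0 and 2 behave identically
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1,
         (1, "b"): 2, (2, "a"): 1, (2, "b"): 2}
print(minimize_blocks([0, 1, 2], "ab", delta, {1}))   # two blocks: {1} and {0, 2}
```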
Myhill-Nerode Theorem and Minimal DFAs
The Myhill-Nerode theorem provides a beautiful characterization of regular languages in terms of equivalence relations, and it gives us a method for constructing the smallest possible DFA for a given regular language.
Theorem: Myhill-Nerode Theorem

A language L ⊆ Σ* is regular if and only if the relation R_L defined by

x R_L y if and only if for all z ∈ Σ*, xz ∈ L ⇔ yz ∈ L

has a finite number of equivalence classes.

Moreover, the number of states in the minimal DFA for L equals the number of equivalence classes of R_L.
Proof:

(⇒) If L is regular, then R_L has finitely many equivalence classes:

Let M = (Q, Σ, δ, q₀, F) be a DFA for L. For each state q ∈ Q, define S_q to be the set of all strings that lead from q₀ to q in M; that is, S_q = {w ∈ Σ* | δ̂(q₀, w) = q}.

If x, y ∈ S_q for the same q, then x R_L y: for any suffix z, both xz and yz end up in the same state after processing z from q, so either both are accepted or both are rejected.

Since M has finitely many states, there are finitely many sets S_q, and therefore finitely many equivalence classes of R_L.

(⇐) If R_L has finitely many equivalence classes, then L is regular:

Let the equivalence classes of R_L be [x₁], [x₂], ..., [xₙ]. We construct a DFA M = (Q, Σ, δ, q₀, F) as follows:

- Q = {[x₁], [x₂], ..., [xₙ]} (the set of equivalence classes)
- q₀ = [ε] (the equivalence class of the empty string)
- F = {[x] | x ∈ L} (the equivalence classes containing strings in L)
- δ([x], a) = [xa] for all [x] ∈ Q and a ∈ Σ

The transition function is well-defined: if x R_L y, then xa R_L ya for any a ∈ Σ.

We can then prove that L(M) = L by showing that for any string w, δ̂(q₀, w) = [w], and that w ∈ L if and only if [w] ∈ F.

Thus, L is recognized by a DFA and is therefore regular.

Minimality: The constructed DFA is minimal: if we merged two states [x] and [y] with x and y not equivalent under R_L, there would be some string z with xz ∈ L but yz ∉ L (or vice versa), so the merged DFA would not correctly recognize L.
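The equivalence classes can also be observed experimentally. The sketch below (my illustration; the function names are ad hoc) approximates R_L for L = {strings ending with ab} by grouping all short strings according to their acceptance behaviour on all short extensions; because that behaviour depends only on the minimal-DFA state, exactly three groups appear:

```python
from itertools import product

def in_L(w):
    return w.endswith("ab")      # L = strings ending with 'ab'

def approx_classes(max_len=4, ext_len=3):
    """Group all strings up to max_len by their behaviour on every
    extension up to ext_len (a finite approximation of R_L)."""
    exts = ["".join(t) for k in range(ext_len + 1)
            for t in product("ab", repeat=k)]
    groups = {}
    for k in range(max_len + 1):
        for t in product("ab", repeat=k):
            w = "".join(t)
            signature = tuple(in_L(w + z) for z in exts)
            groups.setdefault(signature, []).append(w)
    return list(groups.values())

print(len(approx_classes()))   # 3, matching the minimal DFA for 'ends with ab'
```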
Brzozowski's Algorithm
One of the most elegant results in automata theory shows that DFA minimization can be achieved through a surprising combination of reversal and determinization operations.
Theorem: Brzozowski's Minimization Algorithm

Let M be any NFA. Then

minimize(M) = determinize(reverse(determinize(reverse(M))))

where reverse(M) produces an NFA recognizing L(M)ᴿ and determinize applies the subset construction, keeping only the reachable subset states.

This algorithm produces the minimal DFA for L(M) after just two determinization steps, regardless of the structure of the input automaton.
Proof sketch:

The proof relies on the relationship between reversal, determinization, and the quotients (derivatives) of a language.

Key lemma: if D is a DFA in which every state is reachable, then determinize(reverse(D)) is the minimal DFA for L(D)ᴿ. Intuitively, each reachable subset state of determinize(reverse(D)) is precisely the set of D-states from which some fixed suffix leads to acceptance, and distinct subsets induce distinct Myhill-Nerode classes of L(D)ᴿ.

Step 1: reverse(M) gives an NFA for L(M)ᴿ; determinizing it yields a DFA D₁ for L(M)ᴿ whose states are all reachable.

Step 2: By the lemma applied to D₁, determinize(reverse(D₁)) is the minimal DFA for (L(M)ᴿ)ᴿ = L(M).

Since the Myhill-Nerode equivalence classes define the minimal DFA uniquely, the result is guaranteed to be minimal.
Example: Brzozowski's Algorithm in Action
Let's trace Brzozowski's algorithm on the language L = {strings ending with ab}.

Original NFA: States {q0, q1, q2} where:
- q0: start state, self-loops on a and b
- q1: reached on a, goes to q2 on b
- q2: accept state

Step 1 - Reverse: Accept states become start states, and all transitions flip direction. The result recognizes strings starting with ba.

Step 2 - Determinize: Apply the subset construction, producing a DFA whose states represent sets of original states.

Step 3 - Reverse again: Back to recognizing strings ending with ab.

Step 4 - Determinize again: The final, minimal DFA.

The result has exactly 3 states, corresponding to "no progress toward ab", "just saw a", and "just saw ab".

Key insight: The algorithm automatically discovered the three equivalence classes without explicitly computing the Myhill-Nerode relation!
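The whole algorithm fits in a few lines. The sketch below (illustrative; `determinize` keeps only reachable subset states, which is what makes the double reverse-determinize minimal) runs the trace above end to end:

```python
# Brzozowski's algorithm on the 'ends with ab' NFA from the example.

def determinize(starts, delta, accepts, alphabet):
    """Subset construction over reachable subsets only."""
    start = frozenset(starts)
    states, ddelta, work = {start}, {}, [start]
    while work:
        S = work.pop()
        for ch in alphabet:
            T = frozenset(t for s in S for t in delta.get((s, ch), ()))
            ddelta[(S, ch)] = T
            if T not in states:
                states.add(T)
                work.append(T)
    daccepts = {S for S in states if S & set(accepts)}
    return start, ddelta, daccepts, states

def reverse(starts, delta, accepts):
    """Flip transitions; old accepts become starts, old starts become accepts."""
    rdelta = {}
    for (q, ch), targets in delta.items():
        for t in targets:
            rdelta.setdefault((t, ch), set()).add(q)
    return set(accepts), rdelta, set(starts)

def brzozowski(starts, delta, accepts, alphabet):
    s, d, a = reverse(starts, delta, accepts)
    s2, d2, a2, _ = determinize(s, d, a, alphabet)
    nfa_d2 = {k: {v} for k, v in d2.items()}      # view the DFA as an NFA
    s3, d3, a3 = reverse({s2}, nfa_d2, a2)
    return determinize(s3, d3, a3, alphabet)

nfa = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
start, ddelta, daccepts, states = brzozowski({"q0"}, nfa, {"q2"}, "ab")
print(len(states))   # 3: the minimal DFA for 'ends with ab'
```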
State Complexity Theory
State complexity theory studies the number of states required to recognize regular languages and how this measure behaves under various operations. It provides a "complexity theory within regular languages."
Definition: State Complexity Measures

For a regular language L, define:

- Deterministic state complexity: sc(L) = min{|Q| : ∃ DFA M = (Q, Σ, δ, q₀, F) with L(M) = L}
- Nondeterministic state complexity: nsc(L) = min{|Q| : ∃ NFA M with L(M) = L}

For operations on languages, we study the worst-case complexity:

- sc(∪)(m, n) = max{sc(L₁ ∪ L₂) : sc(L₁) = m, sc(L₂) = n}
- sc(∩)(m, n) = max{sc(L₁ ∩ L₂) : sc(L₁) = m, sc(L₂) = n}
- sc(·)(m, n) = max{sc(L₁ · L₂) : sc(L₁) = m, sc(L₂) = n}
Theorem: State Complexity of Regular Operations

The state complexity of the basic regular language operations (for DFAs with m and n states):

- Union: sc(∪)(m, n) = mn
- Intersection: sc(∩)(m, n) = mn
- Concatenation: sc(·)(m, n) = m·2ⁿ - 2ⁿ⁻¹
- Kleene star: sc(*)(n) = 2ⁿ⁻¹ + 2ⁿ⁻²
- Complement: sc(¬)(n) = n
- Reversal: sc(R)(n) = 2ⁿ

These bounds are tight: there exist witness languages achieving these complexities.
Definition: Witness Languages and Lower Bounds

To prove state complexity lower bounds, we construct witness languages that achieve the worst-case complexity. For concatenation, for example, there are families of languages L₁ and L₂ with sc(L₁) = m and sc(L₂) = n such that sc(L₁ · L₂) = m·2ⁿ - 2ⁿ⁻¹.

Proof technique: Show that any DFA for L₁ · L₂ must distinguish m·2ⁿ - 2ⁿ⁻¹ pairwise inequivalent states, i.e., that many distinct Myhill-Nerode classes.
Example: State Complexity Explosion in Concatenation

Consider concatenating L₁ = {strings ending with a} and L₂ = {strings with at least 2 b's}.

Individual complexities:
- L₁: 2 states (last symbol was a, or not)
- L₂: 3 states (seen 0, 1, or 2+ b's)

Concatenation L₁ · L₂: strings that end with a, followed by strings with at least 2 b's. Examples: aabb, baaabab, ababaabaab.

Why blowup can occur: while reading the input, a DFA cannot know where the L₁ part ends and the L₂ part begins. The subset-style construction therefore tracks the current L₁ state together with the set of L₂ states reachable under every possible split point, giving up to m·2ⁿ combinations. In this small example most of those combinations collapse when the DFA is minimized, but for worst-case witness languages the bound m·2ⁿ - 2ⁿ⁻¹ is actually attained.

General pattern: Concatenation forces the automaton to "remember" multiple possible parsing points simultaneously, which can lead to exponential state explosion.
Summary
- Closure Properties: Regular languages are closed under union, intersection, complement, concatenation, Kleene star, and reversal operations
- Pumping Lemma: A key tool for proving languages are not regular, based on the fact that sufficiently long strings must have repeatable sections
- Decision Problems: For regular languages, we can decide membership, emptiness, finiteness, equivalence, and find minimal automata
- Myhill-Nerode Theorem: Provides a fundamental characterization of regular languages in terms of an equivalence relation with finitely many classes
- Minimal DFA: For every regular language, there exists a unique minimal DFA (up to isomorphism)
- Equivalence Classes: States in the minimal DFA correspond exactly to the Myhill-Nerode equivalence classes
- Regularity Test: A language is regular if and only if it induces a finite number of distinct "future behaviors" for prefixes
Suggested Reading
- Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation – Chapter 4: Properties of Regular Languages
- Sipser, Introduction to the Theory of Computation – Chapter 1.3: Regular Expressions
- Sipser, Introduction to the Theory of Computation – Chapter 1.4: Regular and Nonregular Languages