Properties of Regular Languages

Having explored finite automata and regular expressions, we now turn to the fundamental properties that characterize regular languages. These properties not only deepen our understanding of regular languages but also provide powerful tools for proving whether a language is regular or not, and for constructing new regular languages from existing ones.

Closure Properties of Regular Languages

Closure Properties Overview

A language class is closed under an operation if applying that operation to language(s) in the class always results in another language in the same class. Regular languages are closed under all the following operations:

  • Union: L₁ ∪ L₂ = {w | w ∈ L₁ or w ∈ L₂}
  • Intersection: L₁ ∩ L₂ = {w | w ∈ L₁ and w ∈ L₂}
  • Complement: L̄ = Σ* - L = {w ∈ Σ* | w ∉ L}
  • Concatenation: L₁·L₂ = {w | w = xy, where x ∈ L₁ and y ∈ L₂}
  • Kleene Star: L* = {w | w = w₁w₂...wₖ, where each wᵢ ∈ L and k ≥ 0}
  • Reversal: Lᴿ = {wᴿ | w ∈ L}, where wᴿ is the reverse of w

Regular languages possess important closure properties, meaning that when we apply certain operations to regular languages, the result is always another regular language. These properties are fundamental both for theoretical understanding and for practical applications in language processing.

Union

Theorem: Closure Under Union

If L₁ and L₂ are regular languages, then their union L₁ ∪ L₂ = {w | w ∈ L₁ or w ∈ L₂} is also a regular language.

Proof: Proof of Closure Under Union

Let M₁ = (Q₁, Σ, δ₁, q₀₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₀₂, F₂) be DFAs that recognize L₁ and L₂ respectively.

We construct a new NFA M = (Q, Σ, δ, q₀, F) that recognizes L₁ ∪ L₂:

  • Q = {q₀} ∪ Q₁ ∪ Q₂ (where q₀ is a new state)
  • δ(q₀, ε) = {q₀₁, q₀₂}
  • For all q ∈ Q₁, a ∈ Σ: δ(q, a) = δ₁(q, a)
  • For all q ∈ Q₂, a ∈ Σ: δ(q, a) = δ₂(q, a)
  • F = F₁ ∪ F₂

This NFA accepts any string that is accepted by either M₁ or M₂. Since NFAs recognize exactly the regular languages, L₁ ∪ L₂ is regular.

Example: Union Construction in Action

Let's construct the union of L1 = {strings ending with a} and L2 = {strings with even length}.

Original DFAs:

DFA for L1:

  • q0: start, not ending with a
  • q1: accept, ending with a
  • Transitions: δ1(q0, a) = q1
  • δ1(q0, b) = q0, etc.

DFA for L2:

  • p0: start/accept, even length
  • p1: odd length
  • Transitions: δ2(p0, a) = p1
  • δ2(p1, a) = p0, etc.

Union NFA Construction:

  1. Create new start state qnew
  2. Add ε-transitions: δ(qnew, ε) = {q₀, p₀}
  3. Include all original states and transitions
  4. Accept states: F = {q₁, p₀}

Test Cases:

  • a: ✓ ends with a (in L1, so in the union)
  • bb: ✓ even length (in L2, so in the union)
  • ba: ✓ both conditions (in both languages)
  • b: ✗ neither condition (not in the union)

Key insight: The union automaton accepts if either original automaton would accept, achieved by branching to both start states with ε-transitions.
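The construction can be exercised with a short simulation. A minimal Python sketch (the dictionary encoding and the helper name run_dfa are illustrative choices, not part of the formal construction), using the fact that the union NFA accepts exactly when at least one of the original DFAs accepts:

```python
def run_dfa(delta, start, accept, w):
    """Run a DFA given as a dict {(state, symbol): state}."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accept

# DFA for L1 = strings ending with a
d1 = {('q0','a'):'q1', ('q0','b'):'q0', ('q1','a'):'q1', ('q1','b'):'q0'}
# DFA for L2 = strings of even length
d2 = {('p0','a'):'p1', ('p0','b'):'p1', ('p1','a'):'p0', ('p1','b'):'p0'}

def in_union(w):
    # The union NFA branches into both machines via ε-moves, so it
    # accepts exactly when at least one original DFA accepts.
    return run_dfa(d1, 'q0', {'q1'}, w) or run_dfa(d2, 'p0', {'p0'}, w)

print([w for w in ['a', 'bb', 'ba', 'b'] if in_union(w)])  # ['a', 'bb', 'ba']
```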

Intersection

Theorem: Closure Under Intersection

If L₁ and L₂ are regular languages, then their intersection L₁ ∩ L₂ = {w | w ∈ L₁ and w ∈ L₂} is also a regular language.

Proof: Proof of Closure Under Intersection

Let M₁ = (Q₁, Σ, δ₁, q₀₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₀₂, F₂) be DFAs that recognize L₁ and L₂ respectively.

We construct a new DFA M = (Q, Σ, δ, q₀, F) that recognizes L₁ ∩ L₂:

  • Q = Q₁ × Q₂ (the Cartesian product of states)
  • q₀ = (q₀₁, q₀₂)
  • δ((q₁, q₂), a) = (δ₁(q₁, a), δ₂(q₂, a))
  • F = F₁ × F₂ = {(q₁, q₂) | q₁ ∈ F₁ and q₂ ∈ F₂}

This DFA simulates M₁ and M₂ in parallel, and accepts a string if and only if both original DFAs accept it. Since DFAs recognize exactly the regular languages, L₁ ∩ L₂ is regular.

Example: Product Construction for Intersection

Let's intersect L1 = {strings containing ab} and L2 = {strings with an even number of a}.

Product DFA Construction:

States: Q = Q1 × Q2

  • (q0, p0): start; no ab, even a
  • (q0, p1): no ab, odd a
  • (q1, p0): saw a, even a
  • (q1, p1): saw a, odd a
  • (q2, p0): found ab, even a
  • (q2, p1): found ab, odd a

Sample Transitions:

  • δ((q0, p0), a) = (q1, p1) — saw a, now odd count
  • δ((q1, p1), b) = (q2, p1) — completed ab, still odd
  • δ((q2, p1), a) = (q1, p0) — another a, now even

Accepting States: Only (q2, p0) — must satisfy both conditions

Test Cases:

  • ab: ✗ contains ab, but only 1 a (odd)
  • abaa: ✗ contains ab, but 3 a (odd)
  • aabb: ✓ contains ab, 2 a (even)
  • aabaa: ✓ contains ab, 4 a (even)

Key insight: The product construction tracks both machines simultaneously, accepting only when both original conditions are satisfied.
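The product construction translates almost directly into code. A minimal Python sketch (state names and the run_dfa helper are illustrative), applied to the two example languages:

```python
from itertools import product

def run_dfa(delta, start, accept, w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accept

def product_dfa(d1, s1, f1, d2, s2, f2, alphabet):
    # Cartesian-product construction: one compound state per state pair,
    # stepping both component DFAs in lockstep on every symbol.
    q1s = {q for (q, _) in d1} | set(d1.values())
    q2s = {q for (q, _) in d2} | set(d2.values())
    delta = {((a1, a2), c): (d1[(a1, c)], d2[(a2, c)])
             for a1, a2 in product(q1s, q2s) for c in alphabet}
    return delta, (s1, s2), {(a1, a2) for a1 in f1 for a2 in f2}

# L1 = "contains ab" (3 states), L2 = "even number of a" (2 states)
d1 = {('q0','a'):'q1', ('q0','b'):'q0', ('q1','a'):'q1', ('q1','b'):'q2',
      ('q2','a'):'q2', ('q2','b'):'q2'}
d2 = {('p0','a'):'p1', ('p0','b'):'p0', ('p1','a'):'p0', ('p1','b'):'p1'}

delta, start, accept = product_dfa(d1, 'q0', {'q2'}, d2, 'p0', {'p0'}, 'ab')
print(run_dfa(delta, start, accept, 'aabaa'))  # True: contains ab, 4 a
print(run_dfa(delta, start, accept, 'ab'))     # False: only 1 a (odd)
```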

Complement

Theorem: Closure Under Complement

If L is a regular language over an alphabet Σ, then its complement L̄ = Σ* - L = {w ∈ Σ* | w ∉ L} is also a regular language.

Proof: Proof of Closure Under Complement

Let M = (Q, Σ, δ, q₀, F) be a DFA that recognizes L.

We construct a new DFA M' = (Q, Σ, δ, q₀, Q - F) that recognizes L̄.

The only difference is in the set of accepting states: M' accepts exactly when M rejects, and vice versa. Since DFAs recognize exactly the regular languages, L̄ is regular.

Example: Complement by State Flipping

Let's find the complement of L = {strings containing aa}

Original DFA for L:

q0: Start
Haven't seen any a
q1
Just saw one a
q2: Accept
Found aa

Transitions:

  • δ(q0, a) = q1, δ(q0, b) = q0
  • δ(q1, a) = q2, δ(q1, b) = q0
  • δ(q2, a) = q2, δ(q2, b) = q2

Complement DFA for L̄:

q0: Accept
No aa yet ✓
q1: Accept
Still no aa
q2: Reject
Found aa

Same transitions, but F' = {q₀, q₁} instead of F = {q₂}

Language Comparison:

Original L (contains aa):

  • In L: aa, baab, aabb
  • Not in L: ab, ba, aba

Complement L̄ (no aa):

  • In L̄: ab, ba, aba
  • Not in L̄: aa, baab, aabb

Key insight: Complementation is trivial for DFAs — just flip which states are accepting. Every string goes to exactly one state, so we accept exactly when the original automaton rejects.
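The state-flipping step is a one-liner in code. A minimal Python sketch (encoding and helper names are illustrative), using the "contains aa" DFA from the example:

```python
def run_dfa(delta, start, accept, w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accept

def complement_dfa(delta, start, accept):
    # Flip the accepting set. This is only valid because the DFA is
    # complete: every (state, symbol) pair has exactly one transition.
    states = {q for (q, _) in delta} | set(delta.values())
    return delta, start, states - accept

# DFA for L = "contains aa"
delta = {('q0','a'):'q1', ('q0','b'):'q0',
         ('q1','a'):'q2', ('q1','b'):'q0',
         ('q2','a'):'q2', ('q2','b'):'q2'}

_, _, flipped = complement_dfa(delta, 'q0', {'q2'})
print(sorted(flipped))                         # ['q0', 'q1']
print(run_dfa(delta, 'q0', flipped, 'ab'))     # True: no aa
print(run_dfa(delta, 'q0', flipped, 'aa'))     # False: contains aa
```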

Concatenation

Theorem: Closure Under Concatenation

If L₁ and L₂ are regular languages, then their concatenation L₁·L₂ = {w | w = xy, where x ∈ L₁ and y ∈ L₂} is also a regular language.

Proof: Proof of Closure Under Concatenation

Let M₁ = (Q₁, Σ, δ₁, q₀₁, F₁) and M₂ = (Q₂, Σ, δ₂, q₀₂, F₂) be NFAs recognizing L₁ and L₂ respectively.

We construct a new NFA M = (Q, Σ, δ, q₀, F) for L₁·L₂:

  • Q = Q₁ ∪ Q₂
  • q₀ = q₀₁
  • For all q ∈ Q₁, a ∈ Σ: δ(q, a) = δ₁(q, a)
  • For all q ∈ F₁: δ(q, ε) = {q₀₂} ∪ δ₁(q, ε)
  • For all q ∈ Q₂, a ∈ Σ: δ(q, a) = δ₂(q, a)
  • F = F₂

This NFA simulates M₁ until it reaches an accepting state, at which point it can either continue in M₁ or transition to M₂'s start state. The NFA accepts if it finishes in an accepting state of M₂. Since NFAs recognize exactly the regular languages, L₁·L₂ is regular.

Example: NFA Construction for Concatenation

Let's concatenate L1 = {a, aa} and L2 = {b, bb}

Original NFAs:

NFA for L1:

  • q0: start
  • q1: accept, after reading a
  • q2: accept, after reading aa

Transitions:

  • δ1(q0, a) = q1
  • δ1(q1, a) = q2

NFA for L2:

  • p0: start
  • p1: accept, after reading b
  • p2: accept, after reading bb

Transitions:

  • δ2(p0, b) = p1
  • δ2(p1, b) = p2

Concatenation NFA Construction:

  1. Keep all states: Q = {q₀, q₁, q₂, p₀, p₁, p₂}
  2. Start state: q0 (from L1)
  3. Add ε-transitions from accept states of L1 to start of L2:
    • δ(q1, ε) = {p₀} — after a, can start L2
    • δ(q2, ε) = {p₀} — after aa, can start L2
  4. Accept states: F = {p₁, p₂} (only from L2)
  5. Make original L1 accept states non-accepting: q1, q2 ∉ F

Resulting Language L1 · L2:

  • ab = a + b
  • abb = a + bb
  • aab = aa + b
  • aabb = aa + bb


Trace: How aab is accepted:

  1. Start at state q0
  2. Read a: transition to state q1
  3. Read a: transition to state q2 (completed aa from L1)
  4. Take ε-transition: δ(q2, ε) = p0 (jump to start of L2)
  5. Read b: transition to state p1 (accept state) ✓

Key insight: ε-transitions connect the accepting states of L1 to the start state of L2, allowing the NFA to "jump" between automata precisely when the first part is complete.
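Membership in L1·L2 can also be checked directly: trying every split point of the input mirrors the NFA's nondeterministic ε-jump from M₁ into M₂. A minimal Python sketch (the finite example languages are taken from above; function names are illustrative):

```python
def in_concat(accepts1, accepts2, w):
    # w ∈ L1·L2 iff some split w = x·y has x ∈ L1 and y ∈ L2.
    # Each split point corresponds to one place the ε-jump could fire.
    return any(accepts1(w[:i]) and accepts2(w[i:]) for i in range(len(w) + 1))

L1 = {'a', 'aa'}
L2 = {'b', 'bb'}
words = ['ab', 'abb', 'aab', 'aabb', 'aabba', 'bb']
print([w for w in words if in_concat(L1.__contains__, L2.__contains__, w)])
# ['ab', 'abb', 'aab', 'aabb']
```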

Kleene Star

Theorem: Closure Under Kleene Star

If L is a regular language, then its Kleene star L* = {w | w = w₁w₂...wₖ, where each wᵢ ∈ L and k ≥ 0} is also a regular language.

Proof: Proof of Closure Under Kleene Star

Let M = (Q, Σ, δ, q₀, F) be an NFA recognizing L.

We construct a new NFA M' = (Q', Σ, δ', q₀', F') for L*:

  • Q' = Q ∪ {q₀'}
  • q₀' is a new start state
  • δ'(q₀', ε) = {q₀} (ε-transition to original start state)
  • For all q ∈ Q, a ∈ Σ: δ'(q, a) = δ(q, a)
  • For all qf ∈ F: δ'(qf, ε) = {q₀} (ε-transition from accepting states back to start)
  • F' = F ∪ {q₀'} (new start state is also accepting)

This NFA can match the empty string (by accepting immediately from q₀'), single iterations of L (by following the original NFA), or multiple iterations (by going back to the start state after each accepted string). Since NFAs recognize exactly the regular languages, L* is regular.

Example: Kleene Star Construction

Let's construct the Kleene star of L = {ab, ba}

Original NFA for L:

States: q0 (start), q1 (after a), q2 (after b), q3 (accept ab), q4 (accept ba)

Transitions:

  • δ(q0, a) = q1
  • δ(q0, b) = q2
  • δ(q1, b) = q3 (completes ab)
  • δ(q2, a) = q4 (completes ba)

Accept states: F = {q₃, q₄}

Kleene Star NFA Construction:

  1. Add new start state qnew
  2. Make qnew an accept state (handles ε ∈ L*)
  3. Add ε-transition: δ(qnew, ε) = {q₀} (can start original machine)
  4. Add ε-transitions from accept states back to original start:
    • δ(q3, ε) = {q₀} — after ab, can repeat
    • δ(q4, ε) = {q₀} — after ba, can repeat
  5. Final accept states: F' = {qnew, q3, q4}

Resulting Language L*:

  • ε: 0 repetitions
  • ab: 1 repetition
  • ba: 1 repetition
  • abab: 2 repetitions
  • abba: 2 repetitions
  • baba: 2 repetitions

And infinitely many more combinations...

Trace: How abba is accepted:

  1. Start at new state qnew
  2. Take ε-transition: δ(qnew, ε) = q0
  3. Read ab: reach state q3 (first ab complete)
  4. Take ε-transition: δ(q3, ε) = q0 (loop back for repetition)
  5. Read ba: reach state q4 (second string ba complete) ✓

Key insight: The Kleene star construction creates loops via ε-transitions that allow any number of repetitions, including zero (handled by making the new start state accepting). The ε-transitions provide seamless "glue" between repetitions.
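Membership in L* can be checked by dynamic programming over split points, which mirrors the loop structure of the construction. A minimal Python sketch (finite L = {ab, ba} as in the example; names are illustrative):

```python
def in_star(accepts, w):
    # reachable[i] is True iff the prefix w[:i] can be split into
    # zero or more strings of L (each True index is a place where one
    # ε-loop back to the start state could have fired).
    n = len(w)
    reachable = [False] * (n + 1)
    reachable[0] = True  # ε ∈ L*: zero repetitions
    for i in range(n):
        if reachable[i]:
            for j in range(i + 1, n + 1):
                if accepts(w[i:j]):
                    reachable[j] = True
    return reachable[n]

L = {'ab', 'ba'}
print(in_star(L.__contains__, 'abba'))  # True: ab + ba
print(in_star(L.__contains__, ''))      # True: zero repetitions
print(in_star(L.__contains__, 'abb'))   # False
```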

Reversal

Theorem: Closure Under Reversal

If L is a regular language over alphabet Σ, then its reversal Lᴿ = {wᴿ | w ∈ L} is also a regular language, where wᴿ denotes the reversal of string w.

Proof: Proof of Closure Under Reversal

Let M = (Q, Σ, δ, q₀, F) be an NFA recognizing L.

We construct a new NFA M' = (Q, Σ, δ', F, {q₀}) for Lᴿ as follows:

  • Use the same set of states Q
  • Make the original start state q₀ the only accepting state in M'
  • Make the original accepting states F the set of start states in M' (we can simulate this with a new start state and ε-transitions)
  • Reverse all transitions: for each transition δ(q, a) = p in M, add a transition δ'(p, a) = q in M'

This NFA processes strings in reverse, starting from what were the accepting states and accepting if it reaches what was the start state. Since NFAs recognize exactly the regular languages, Lᴿ is regular.

Example: Language Reversal Construction

Let's reverse L = {strings starting with a and ending with b}

Original NFA for L:

States: q0 (start), q1 (after the initial a), q2 (accept)

Transitions:

  • δ(q0, a) = {q1} — must start with a
  • δ(q1, a) = {q1} — can see more a in the middle
  • δ(q1, b) = {q1, q2} — on b, either stay in the middle or guess that this is the final b

Accept states: F = {q₂}; q2 has no outgoing transitions

Reversal Construction Steps:

  1. Reverse all transitions:
    • δ'(q1, a) = {q0, q1} (reverses δ(q0, a) ∋ q1 and the self-loop δ(q1, a) ∋ q1)
    • δ'(q1, b) = {q1} (self-loop preserved)
    • δ'(q2, b) = {q1} (was δ(q1, b) ∋ q2)
  2. Swap start and accept states:
    • New start states: {q₂} (old accept states)
    • New accept states: {q₀} (old start state)

Reversed NFA for Lᴿ:

Now recognizes strings starting with b and ending with a

Start state: q2, Accept state: q0

The flow is now: q2 (start) → q1 (on the first b) → ... → q0 (accept, on the final a)

Language Transformation:

Original L (pattern a...b):

  • In L: ab, aabb, abab
  • Not in L: ba, bb

Reversed Lᴿ (pattern b...a):

  • In Lᴿ: ba, bbaa, baba
  • Not in Lᴿ: ab, aa

Trace: How baba is accepted in Lᴿ:

  1. Start at state q2 (the new start state)
  2. Read b: move to q1 (via δ'(q2, b) ∋ q1)
  3. Read a: take the self-loop and stay at q1 (δ'(q1, a) ∋ q1)
  4. Read b: stay at q1 (self-loop δ'(q1, b) ∋ q1)
  5. Read a: move to q0 (accept state, via δ'(q1, a) ∋ q0) ✓

Key insight: Reversal flips the automaton's "temporal direction" — the machine now processes strings as if reading the original strings backwards. By reversing transitions and swapping start/accept roles, we achieve exactly this effect.
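Edge-flipping is a small transformation on the transition relation. A minimal Python sketch (the NFA here is a version of the "starts with a, ends with b" example in which q2 has no outgoing transitions, so it exactly recognizes that language; encoding and names are illustrative):

```python
def reverse_nfa(delta, start, accept):
    # delta maps (state, symbol) -> set of successor states.
    # Flip every edge; old accept states become the start set,
    # and the old start state becomes the sole accept state.
    rev = {}
    for (q, a), targets in delta.items():
        for p in targets:
            rev.setdefault((p, a), set()).add(q)
    return rev, set(accept), {start}

def run_nfa(delta, starts, accept, w):
    current = set(starts)
    for a in w:
        current = {t for q in current for t in delta.get((q, a), ())}
    return bool(current & accept)

# NFA for L = strings starting with a and ending with b; on b, q1 either
# stays in the middle or guesses that this was the final symbol.
delta = {('q0','a'): {'q1'}, ('q1','a'): {'q1'}, ('q1','b'): {'q1','q2'}}

rev, starts, accept = reverse_nfa(delta, 'q0', {'q2'})
print(run_nfa(rev, starts, accept, 'baba'))  # True: starts with b, ends with a
print(run_nfa(rev, starts, accept, 'ab'))    # False
```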

The Pumping Lemma for Regular Languages

While the closure properties show us how to build new regular languages, the Pumping Lemma gives us a powerful tool for proving that certain languages are not regular. It exploits a fundamental limitation of finite automata: their inability to "count" or "remember" an arbitrary amount of information.

Lemma: Pumping Lemma for Regular Languages

If L is a regular language, then there exists a constant p ≥ 1 (the "pumping length") such that for every string w ∈ L with |w| ≥ p, w can be decomposed into three partsw = xyz, satisfying:

  1. |y| > 0 (y is not empty)
  2. |xy| ≤ p (x and y are within the first p characters)
  3. For all i ≥ 0, xyᶦz ∈ L (pumping y any number of times keeps the string in the language)

Proof: Proof of the Pumping Lemma

Let L be a regular language, and let M = (Q, Σ, δ, q₀, F) be a DFA that recognizes L. Set p = |Q| (the number of states in M).

Consider any string w ∈ L with |w| ≥ p. When M processes w, it goes through a sequence of |w|+1 states r₀, r₁, ..., r|w|, where r₀ = q₀ and r|w| ∈ F.

Since |w| ≥ p = |Q|, by the Pigeonhole Principle, there must be at least one state that appears more than once in the first p+1 states of this sequence. Let rᵢ = rⱼ for some 0 ≤ i < j ≤ p.

We can decompose w as follows:
x = first i symbols of w (which take M from r₀ to rᵢ)
y = symbols from positions i+1 to j (which take M from rᵢ back to rᵢ = rⱼ)
z = remaining symbols (which take M from rⱼ to an accepting state)

This decomposition satisfies all three conditions:
1. |y| = j-i > 0 (since i < j)
2. |xy| = j ≤ p
3. For any k ≥ 0, xyᵏz ∈ L, because:
  - x takes M from r₀ to rᵢ
  - Each repetition of y takes M from rᵢ back to rᵢ
  - z takes M from rᵢ to an accepting state

Using the Pumping Lemma to Prove Non-Regularity

The Pumping Lemma is typically used to prove that a language is not regular through a proof by contradiction:

  1. Assume that language L is regular
  2. Then the Pumping Lemma applies to L with some pumping length p
  3. Find a string w ∈ L with |w| ≥ p
  4. Show that for any decomposition w = xyz satisfying conditions 1 and 2, there exists some i ≥ 0 such that xyᶦz ∉ L
  5. This contradicts condition 3 of the Pumping Lemma
  6. Therefore, L cannot be regular

Example: Proving {aⁿ bⁿ | n ≥ 0} is Not Regular

Let's prove that the language L = {aⁿ bⁿ | n ≥ 0} is not regular.

Step 1: Assume, for contradiction, that L is regular.

Step 2: By the Pumping Lemma, there exists a pumping length p ≥ 1.

Step 3: Consider the string w = aᵖ bᵖ ∈ L.

Step 4: Any decomposition w = xyz with |y| > 0 and |xy| ≤ p must have y = aᵏ for some k > 0, since the first p characters of w are all a.

Step 5: Consider xy²z. This string has p+k a and p b. Since k > 0, the number of a exceeds the number of b, so xy²z ∉ L.

Step 6: This contradicts condition 3 of the Pumping Lemma.

Conclusion: Therefore, L is not regular.
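The contradiction in Steps 4-5 can be checked exhaustively for a concrete value of p. A small Python sketch (p = 5 is an arbitrary stand-in, since the actual pumping length is unknown):

```python
def in_L(w):
    # membership in {aⁿbⁿ | n ≥ 0}
    k, r = divmod(len(w), 2)
    return r == 0 and w == 'a' * k + 'b' * k

p = 5  # hypothetical pumping length
w = 'a' * p + 'b' * p

# Every decomposition w = xyz with |xy| ≤ p and |y| > 0 puts y inside
# the block of a's, so pumping to xy²z breaks the balance.
all_fail = all(
    not in_L(w[:i] + w[i:j] * 2 + w[j:])   # xy²z
    for i in range(p)                       # x = w[:i]
    for j in range(i + 1, p + 1)            # y = w[i:j], |y| > 0, |xy| ≤ p
)
print(all_fail)  # True: no decomposition survives pumping
```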

Decision Problems for Regular Languages

An important aspect of formal language theory is understanding what questions about languages can be algorithmically decided. For regular languages, many fundamental problems have efficient solutions.

Decision Problems Overview

All of the following problems are decidable for regular languages:

  • Membership: O(|w|) for a DFA
  • Emptiness: O(|Q| + |δ|)
  • Finiteness: O(|Q| + |δ|)
  • Equivalence: O(|Q₁|·|Q₂|)
  • Minimization: O(|Σ|·|Q|·log|Q|)

Membership

Problem: Given a string w and a regular language L (represented by a DFA, NFA, or regex), is w ∈ L?

Algorithm: Run the automaton on w, or use a regex matching algorithm.

Complexity: O(|w|) time for a DFA, O(|w|·|Q|) for an NFA, where |Q| is the number of states.

Emptiness

Problem: Is L = ∅?

Algorithm: Check if there is a path from the start state to any accepting state in the automaton.

Complexity: O(|Q| + |δ|) time using breadth-first search.
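The reachability check is a plain breadth-first search over the transition graph. A minimal Python sketch (automaton encoding and names are illustrative):

```python
from collections import deque

def is_empty(delta, start, accept):
    # L(M) = ∅ iff no accepting state is reachable from the start state.
    seen, frontier = {start}, deque([start])
    while frontier:
        q = frontier.popleft()
        if q in accept:
            return False
        for (p, _), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                frontier.append(r)
    return True

# DFA for "contains aa": clearly non-empty
d = {('q0','a'):'q1', ('q0','b'):'q0', ('q1','a'):'q2', ('q1','b'):'q0',
     ('q2','a'):'q2', ('q2','b'):'q2'}
print(is_empty(d, 'q0', {'q2'}))  # False

# Two disconnected components: the accept state is unreachable from 's'
d2 = {('s','a'):'s', ('s','b'):'s', ('t','a'):'t', ('t','b'):'t'}
print(is_empty(d2, 's', {'t'}))   # True
```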

Finiteness

Problem: Is L a finite language?

Algorithm: Check if there are any cycles in the automaton that can be reached from the start state and can reach an accepting state.

Complexity: O(|Q| + |δ|) time using a modified depth-first search.

Equivalence

Problem: Given two regular languages L₁ and L₂, is L₁ = L₂?

Algorithm: Check if both L₁ - L₂ and L₂ - L₁ are empty. This uses the fact that regular languages are closed under difference and emptiness is decidable.

Complexity: If the languages are represented by DFAs with n and m states, the time complexity is O(n·m).

Minimization

Problem: Given a DFA M, find the minimal DFA M' that recognizes the same language.

Algorithm: Use Hopcroft's algorithm to partition states into equivalence classes, where two states are equivalent if they cannot be distinguished by any input string.

Complexity: O(|Σ|·|Q|·log|Q|) time, where |Σ| is the size of the alphabet and |Q| is the number of states.

Key property: For any regular language, there exists a unique minimal DFA (up to isomorphism) that recognizes it.

Myhill-Nerode Theorem and Minimal DFAs

The Myhill-Nerode theorem provides a beautiful characterization of regular languages in terms of equivalence relations, and it gives us a method for constructing the smallest possible DFA for a given regular language.

Theorem: Myhill-Nerode Theorem

A language L ⊆ Σ* is regular if and only if the relation RL defined by:

x RL y if and only if for all z ∈ Σ*, xz ∈ L ⇔ yz ∈ L

has a finite number of equivalence classes.

Moreover, the number of states in the minimal DFA for L equals the number of equivalence classes of RL.

Proof: Proof of Myhill-Nerode Theorem

(⇒) If L is regular, then RL has finitely many equivalence classes:

Let M = (Q, Σ, δ, q₀, F) be a DFA for L. For each state q ∈ Q, define Sq to be the set of all strings that lead from q₀ to q in M. That is, Sq = {w ∈ Σ* | δ̂(q₀, w) = q}.

We can show that if x, y ∈ Sq for the same q, then x RL y. This is because for any suffix z, both xz and yz will end up in the same state after processing z, starting from q. Thus, either both are accepted or both are rejected.

Since M has finitely many states, there are finitely many sets Sq, and therefore finitely many equivalence classes under RL.

(⇐) If RL has finitely many equivalence classes, then L is regular:

Let the equivalence classes of RL be [x₁], [x₂], ..., [xₙ]. We construct a DFA M = (Q, Σ, δ, q₀, F) as follows:

  • Q = {[x₁], [x₂], ..., [xₙ]} (the set of equivalence classes)
  • q₀ = [ε] (the equivalence class of the empty string)
  • F = {[x] | x ∈ L} (the equivalence classes containing strings in L)
  • δ([x], a) = [xa] for all [x] ∈ Q and a ∈ Σ

The transition function is well-defined: if x RL y, then xa RL ya for any a ∈ Σ.

We can then prove that L(M) = L by showing that for any string w, δ̂(q₀, w) = [w], and w ∈ L if and only if [w] ∈ F.

Thus, L is recognized by a DFA and is therefore regular.

Minimality: The constructed DFA is minimal because if we were to merge any two states [x] and [y] where x and y are not equivalent under RL, there would exist some string z such that xz ∈ L but yz ∉ L (or vice versa), meaning the merged DFA would not correctly recognize L.
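The (⇐) direction suggests a practical minimization procedure: refine a partition of the states until it stabilizes into the Myhill-Nerode classes. A Python sketch using Moore-style partition refinement (a simpler, O(|Σ|·|Q|²) relative of Hopcroft's algorithm; the redundant example DFA and all names are illustrative):

```python
def minimize_classes(delta, accept, alphabet):
    # Start from the partition {accepting, non-accepting} and split
    # blocks until every pair of states in a block agrees on which
    # block each input symbol leads to.
    states = {q for (q, _) in delta} | set(delta.values())
    block = {q: (q in accept) for q in states}
    while True:
        # signature: own block plus the blocks reached on each symbol
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
               for q in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        refined = {q: ids[sig[q]] for q in states}
        if len(set(refined.values())) == len(set(block.values())):
            return refined  # maps each state to its equivalence class
        block = refined

# DFA for "ends with a" with a redundant state: q2 behaves exactly like q0
d = {('q0','a'):'q1', ('q0','b'):'q2', ('q1','a'):'q1', ('q1','b'):'q2',
     ('q2','a'):'q1', ('q2','b'):'q2'}
classes = minimize_classes(d, {'q1'}, 'ab')
print(classes['q0'] == classes['q2'], classes['q0'] != classes['q1'])  # True True
```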

Brzozowski's Algorithm

One of the most elegant results in automata theory shows that DFA minimization can be achieved through a surprising combination of reversal and determinization operations.

Theorem: Brzozowski's Minimization Algorithm

Let M be any NFA. Then:

minimize(M) = determinize(reverse(determinize(reverse(M))))

where reverse(M) creates an NFA recognizing L(M)ᴿ and determinize(M) applies the subset construction.

This algorithm produces the minimal DFA for L(M) in exactly two determinization steps, regardless of the structure of the input automaton.

Proof: Proof of Brzozowski's Algorithm

The proof relies on the relationship between reversal, determinization, and right/left languages.

Key insight: For any regular language L, the states of the minimal DFA correspond exactly to the right derivatives of L.

Step 1: reverse(M) creates an NFA for LR. Determinizing this NFA creates a DFA whose states correspond to left derivatives of L.

Step 2: Reversing again gives an NFA for (LR)R = L. The states from Step 1 become the foundation for identifying right derivatives.

Step 3: The final determinization creates a DFA whose states are exactly the right derivatives of L, which correspond to the Myhill-Nerode equivalence classes.

Since the Myhill-Nerode equivalence classes define the minimal DFA uniquely, the result is guaranteed to be minimal.

Example: Brzozowski's Algorithm in Action

Let's trace through Brzozowski's algorithm on the language L = {strings ending with 'ab'}

Original NFA: States {q₀, q₁, q₂} where:

  • q0: start state, self-loops on a,b
  • q1: reached on a, goes to q2 on b
  • q2: accept state

Step 1 - Reverse: Accept states become start states, transitions flip direction

Now recognizes strings starting with ba

Step 2 - Determinize: Apply subset construction

Creates DFA with states representing sets of original states

Step 3 - Reverse again: Back to recognizing strings ending with ab

Step 4 - Determinize again: Final minimal DFA

Result has exactly 3 states corresponding to: "haven't seen a", "just saw a", "just saw ab"

Key insight: The algorithm automatically discovered the three equivalence classes without explicitly computing Myhill-Nerode relations!
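The whole pipeline fits in a few lines of code. A Python sketch (representing an NFA as a (delta, starts, accepts) triple is an illustrative choice; determinize returns its DFA in the same NFA shape so the two passes compose):

```python
def reverse(nfa):
    # nfa = (delta, starts, accepts) with delta: {(state, symbol): set}.
    delta, starts, accepts = nfa
    rev = {}
    for (q, a), ts in delta.items():
        for t in ts:
            rev.setdefault((t, a), set()).add(q)
    return rev, set(accepts), set(starts)

def determinize(nfa, alphabet):
    # Subset construction over reachable state sets only.
    delta, starts, accepts = nfa
    start = frozenset(starts)
    d, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(t for q in S for t in delta.get((q, a), ()))
            d[(S, a)] = {T}  # singleton target: deterministic
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return d, {start}, {S for S in seen if S & accepts}

def brzozowski(nfa, alphabet):
    return determinize(reverse(determinize(reverse(nfa), alphabet)), alphabet)

def run(nfa, w):
    delta, starts, accepts = nfa
    cur = set(starts)
    for a in w:
        cur = {t for q in cur for t in delta.get((q, a), ())}
    return bool(cur & accepts)

# NFA for L = strings ending with ab, as in the example above
nfa = ({('q0','a'): {'q0','q1'}, ('q0','b'): {'q0'}, ('q1','b'): {'q2'}},
       {'q0'}, {'q2'})
min_dfa = brzozowski(nfa, 'ab')
states = {S for (S, _) in min_dfa[0]}
print(len(states))  # 3: matches the minimal DFA predicted above
```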

State Complexity Theory

State complexity theory studies the number of states required to recognize regular languages and how this measure behaves under various operations. It provides a "complexity theory within regular languages."

Definition: State Complexity Measures

For a regular language L, define:

  • Deterministic state complexity: sc(L) = min{|Q| : ∃ DFA M = (Q,Σ,δ,q₀,F) with L(M) = L}
  • Nondeterministic state complexity: nsc(L) = min{|Q| : ∃ NFA M with L(M) = L}

For operations on languages, we study worst-case complexity:

  • sc(∪)(m,n) = max{sc(L₁ ∪ L₂) : sc(L₁) = m, sc(L₂) = n}
  • sc(∩)(m,n) = max{sc(L₁ ∩ L₂) : sc(L₁) = m, sc(L₂) = n}
  • sc(·)(m,n) = max{sc(L₁ · L₂) : sc(L₁) = m, sc(L₂) = n}

Theorem: State Complexity of Regular Operations

The state complexity of basic regular language operations:

  • Union: sc(∪)(m,n) = mn
  • Intersection: sc(∩)(m,n) = mn
  • Concatenation: sc(·)(m,n) = m·2ⁿ⁻¹
  • Kleene star: sc(*)(n) = 2ⁿ⁻¹ + 1
  • Complement: sc(¬)(n) = n
  • Reverse: sc(ᴿ)(n) = 2ⁿ

These bounds are tight - there exist witness languages achieving these complexities.

Definition: Witness Languages and Lower Bounds

To prove state complexity lower bounds, we construct witness languages that achieve the worst-case complexity. For example:

Concatenation witness: Let L1 = Σ*a and let L2 be any language requiring n states. Then sc(L1 · L2) = m·2ⁿ⁻¹, where m = sc(L1).

Proof technique: Show that any DFA for L1 · L2 must distinguish between m·2ⁿ⁻¹ different right languages.

Example: State Complexity Explosion in Concatenation

Consider concatenating L1 = {strings ending with a} and L2 = {strings with exactly 2 b}

Individual complexities:

  • L1: 2 states (seen a at end or not)
  • L2: 3 states (seen 0, 1, or 2+ b)

Concatenation L1 · L2:

Strings that end with a, followed by strings with exactly 2 b

Examples: aabb, baaabab, ababaabaab

Why exponential blowup occurs:

When reading the input, the DFA doesn't know where L1 ends and L2 begins. It must track:

  • Current state in L1
  • For each possible "split point", what state L2 would be in

This creates 2·2³⁻¹ = 8 states in the minimal DFA.

The 8 states track:

1. Not seen a, no valid splits
2. Seen a, split after: 0 b seen
3. Seen a, split after: 1 b seen
4. Seen a, split after: 2+ b seen
5. Not seen a, tracking 0 b
6. Not seen a, tracking 1 b
7. Not seen a, tracking 2+ b
8. Accept state

General pattern: Concatenation forces the automaton to "remember" multiple possible parsing points simultaneously, leading to exponential state explosion.

Summary

  • Closure Properties: Regular languages are closed under union, intersection, complement, concatenation, Kleene star, and reversal operations
  • Pumping Lemma: A key tool for proving languages are not regular, based on the fact that sufficiently long strings must have repeatable sections
  • Decision Problems: For regular languages, we can decide membership, emptiness, finiteness, equivalence, and find minimal automata
  • Myhill-Nerode Theorem: Provides a fundamental characterization of regular languages in terms of an equivalence relation with finitely many classes
  • Minimal DFA: For every regular language, there exists a unique minimal DFA (up to isomorphism)
  • Equivalence Classes: States in the minimal DFA correspond exactly to the Myhill-Nerode equivalence classes
  • Regularity Test: A language is regular if and only if it induces a finite number of distinct "future behaviors" for prefixes

Suggested Reading

  • Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation – Chapter 4: Properties of Regular Languages
  • Sipser, Introduction to the Theory of Computation – Chapter 1.3: Regular Expressions
  • Sipser, Introduction to the Theory of Computation – Chapter 1.4: Regular and Nonregular Languages