
Alphabets, Strings, and Languages

Before diving into automata theory, we need to establish the fundamental concepts of alphabets, strings, and languages. These form the basis for understanding all models of computation that we'll study.

Alphabets

Definition: Alphabet

An alphabet is a finite, non-empty set of symbols, typically denoted by Σ (Sigma).

Examples of alphabets include:

  • Σ = {a, b} - A binary (two-symbol) alphabet
  • Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} - The set of decimal digits
  • Σ = {a, b, c, ..., z} - The lowercase English alphabet

Strings

Definition: String

Given an alphabet Σ, a string over Σ is a finite sequence of symbols from Σ. The set of all strings over Σ is denoted as Σ*.

The empty string, denoted by ε (epsilon), is the string containing no symbols.

Examples of strings over the alphabet Σ = {a, b}:

  • ε - The empty string
  • a - A string of length 1
  • ab - A string of length 2
  • aaba - A string of length 4
  • bbbbb - A string of length 5

String Properties and Operations

String Length

The length of a string s, denoted |s|, is the number of symbols in the string. For example, |abba| = 4 and |ε| = 0.

Theorem: Fundamental String Properties

For any strings x, y, z ∈ Σ*:

  1. |xy| = |x| + |y|
  2. |xᴿ| = |x|
  3. |ε| = 0
  4. For any k ∈ ℕ, |xᵏ| = k·|x|
  5. (xy)ᴿ = yᴿxᴿ

Proof: Proof of (xy)ᴿ = yᴿxᴿ

Let x = a₁a₂...aₙ and y = b₁b₂...bₘ where aᵢ, bⱼ ∈ Σ.

Then xy = a₁a₂...aₙb₁b₂...bₘ

The reversal of xy is (xy)ᴿ = bₘ...b₂b₁aₙ...a₂a₁

We also have xᴿ = aₙ...a₂a₁ and yᴿ = bₘ...b₂b₁

Therefore yᴿxᴿ = bₘ...b₂b₁aₙ...a₂a₁ = (xy)ᴿ

Theorem: Prefix and Suffix Properties

For any string w ∈ Σ*, define:

  • Pref(w) = {x ∈ Σ* | ∃y ∈ Σ*, w = xy} (the set of all prefixes of w)
  • Suff(w) = {y ∈ Σ* | ∃x ∈ Σ*, w = xy} (the set of all suffixes of w)

Then the following properties hold:

  1. |Pref(w)| = |Suff(w)| = |w| + 1
  2. For any w with |w| > 0, w ∈ Pref(w) ∩ Suff(w)
  3. ε ∈ Pref(w) ∩ Suff(w) for all w ∈ Σ*
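
These counting claims are easy to sanity-check by brute force. The following is a minimal Python sketch (the function names are our own, not from the text):

```python
def prefixes(w: str) -> set[str]:
    """Pref(w): all prefixes of w, including the empty string and w itself."""
    return {w[:i] for i in range(len(w) + 1)}

def suffixes(w: str) -> set[str]:
    """Suff(w): all suffixes of w, including the empty string and w itself."""
    return {w[i:] for i in range(len(w) + 1)}

w = "abba"
assert prefixes(w) == {"", "a", "ab", "abb", "abba"}
assert suffixes(w) == {"", "a", "ba", "bba", "abba"}
# Prefixes of distinct lengths are distinct, hence |Pref(w)| = |w| + 1:
assert len(prefixes(w)) == len(suffixes(w)) == len(w) + 1
```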

String Concatenation

The concatenation of two strings s and t, denoted s·t or simply st, is the string formed by appending t to the end of s.

For example, if s = ab and t = ba, then st = abba.

String Reversal

The reversal of a string s, denoted sR, is the string formed by writing the symbols of s in reverse order.

For example, if s = abc, then sR = cba.

String Combinatorics and Periodicity Theory

Beyond basic operations, strings exhibit deep mathematical structure related to repetition, periodicity, and combinatorial properties. These concepts form the foundation for understanding pattern matching, string algorithms, and formal language complexity.

Definition: Primitive Strings and Periods

A string w ∈ Σ* is called primitive if w ≠ uᵏ for any string u ∈ Σ* and integer k ≥ 2.

A period of string w is a positive integer p such that w[i] = w[i + p] for all valid indices i (all i with i + p ≤ |w|). The minimal period of w is the smallest period of w.

Equivalently, w has period p if and only if there exists a string u of length p such that w is a prefix of the infinite string uuu... (the infinite repetition of u).

Examples of Primitive Strings and Periods

Primitive Strings:

  • abc - Cannot be written as repetition of shorter string
  • abcd - Primitive
  • abcde - Primitive

Non-primitive Strings:

  • abab = (ab)² - Repetition of ab
  • abcabc = (abc)² - Repetition of abc
  • aaaa = a⁴ - Repetition of a

Period Analysis:

  • abababab - Periods: 2, 4, 6, 8. Minimal period: 2
  • abcabcabc - Periods: 3, 6, 9. Minimal period: 3
  • aaaa - Periods: 1, 2, 3, 4. Minimal period: 1
  • abcd - Only period: 4 (minimal period equals length)
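
The period analysis above can be reproduced mechanically; here is a short brute-force Python check (illustrative code, not part of the text):

```python
def periods(w: str) -> list[int]:
    """All periods p of w: w[i] == w[i+p] whenever both indices are valid."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]

assert periods("abababab") == [2, 4, 6, 8]
assert periods("abcabcabc") == [3, 6, 9]
assert periods("aaaa") == [1, 2, 3, 4]
assert periods("abcd") == [4]          # minimal period equals the length
assert min(periods("abababab")) == 2   # the minimal period
```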

Theorem: Primitive String Decomposition

Every non-empty string w ∈ Σ* can be uniquely written as w = uᵏ where u is primitive and k ≥ 1.

The string u is called the primitive root of w, and k is the exponent of the decomposition.

Proof: Proof of Primitive String Decomposition

Existence: Let w be a non-empty string, let p be its minimal period, and define u = w[1..p] (the prefix of w of length p).

Case 1: p divides |w|. Since p is a period of w, we have w[i] = w[i + p] for all valid i, so w is a prefix of the infinite string uuu... Because p divides |w|, this gives w = uᵏ with k = |w|/p.

In this case u is primitive: if u = sʲ for some s and j ≥ 2, then w = (sʲ)ᵏ = sʲᵏ, so w would have period |s| = p/j < p, contradicting the minimality of p.

Case 2: p does not divide |w|. Then w itself is primitive, and w = w¹ is the desired decomposition. Indeed, if w = sʲ with j ≥ 2, then |s| is a period of w and |w| = j·|s| ≥ |s| + |s| ≥ |s| + p, so by the Fine-Wilf theorem (stated below) w has period gcd(|s|, p). Minimality of p forces gcd(|s|, p) = p, so p divides |s|, and hence p divides |w| - a contradiction.

Uniqueness: Suppose w = uᵏ = vᵐ where both u and v are primitive. If k = 1, then w = u is primitive, so m = 1 and v = w = u; the case m = 1 is symmetric. So assume k, m ≥ 2. Then |w| ≥ 2·max(|u|, |v|) ≥ |u| + |v| ≥ |u| + |v| - gcd(|u|, |v|), and both |u| and |v| are periods of w, so by the Fine-Wilf theorem w has period g = gcd(|u|, |v|).

Since w has period g and g divides |u|, the prefix u satisfies u = (w[1..g])^(|u|/g); primitivity of u forces g = |u|. The same argument gives g = |v|. Therefore u = v = w[1..g] and k = m.
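
The decomposition can be computed directly from the minimal period; this Python sketch follows the case split in the proof (if the minimal period divides |w|, repeat its prefix; otherwise w is itself primitive):

```python
def minimal_period(w: str) -> int:
    """Smallest p such that w[i] == w[i+p] for all valid i."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[i] == w[i + p] for i in range(n - p)))

def primitive_root(w: str) -> tuple[str, int]:
    """Return (u, k) with w == u repeated k times and u primitive."""
    n, p = len(w), minimal_period(w)
    if n % p == 0:
        return w[:p], n // p   # minimal period divides |w|
    return w, 1                # otherwise w is itself primitive

assert primitive_root("abab") == ("ab", 2)
assert primitive_root("aaaa") == ("a", 4)
assert primitive_root("abcab") == ("abcab", 1)  # period 3 does not divide 5
assert primitive_root("abcabcabc") == ("abc", 3)
```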

Theorem: Fine-Wilf Theorem

Let w be a string with periods p and q. If |w| ≥ p + q - gcd(p,q), then w also has period gcd(p,q).

In particular, if gcd(p,q) = 1 and |w| ≥ p + q - 1, then w has period 1 (i.e., all symbols in w are identical).

Proof: Proof of Fine-Wilf Theorem

Let d = gcd(p,q). We will show that w has period d by proving that w[i] = w[i + d] for all valid indices i.

Since d = gcd(p,q), there exist integers a, b such that d = ap + bq (by Bézout's identity).

For any index i such that both i and i + d are valid positions in w, we need to show w[i] = w[i + d].

We have i + d = i + ap + bq. Using the fact that w has periods p and q:

  • If a ≥ 0, then w[i] = w[i + ap] (applying period p a total of a times)
  • If a < 0, then w[i + ap] = w[i] (applying period p backwards)

Similarly for the term bq. The key insight is that we can "reach" position i + d from position i by a sequence of steps of size p and q, and each step preserves the character due to the periodicity.

The condition |w| ≥ p + q - d ensures that all intermediate positions in this sequence are valid indices in the string, allowing us to complete the chain of equalities.

Therefore, w[i] = w[i + d] for all valid i, proving that w has period d.

Worked Example: Fine-Wilf Theorem Application

Example 1: String with periods 6 and 10

Consider a string w with periods 6 and 10.

  • gcd(6, 10) = 2
  • Fine-Wilf threshold: 6 + 10 - 2 = 14
  • If |w| ≥ 14, then w also has period 2

Example 2: String with coprime periods

Consider a string w with periods 3 and 5.

  • gcd(3, 5) = 1
  • Fine-Wilf threshold: 3 + 5 - 1 = 7
  • If |w| ≥ 7, then w has period 1
  • This means w = aᵏ for a single symbol a (k = |w|), i.e., all symbols of w are identical

Concrete Example:

String: abcabcabc (length 9)

  • Has period 3: abc|abc|abc
  • Does NOT have period 5 (check: w[1] = a but w[6] = c)
  • So Fine-Wilf has nothing to say here: the theorem only constrains strings that actually have both periods

Another Example:

String: aaaaaaa (length 7)

  • Has period 1, period 2, period 3, ..., period 7
  • Specifically has periods 3 and 5
  • Length 7 ≥ 3 + 5 - 1 = 7, so Fine-Wilf applies
  • Indeed, has period gcd(3,5) = 1
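
Both examples, plus a near-miss just below the threshold, can be verified by brute force. The string `abaaba` below is our own example of a length-6 string with periods 3 and 5 but not period 1:

```python
from math import gcd

def has_period(w: str, p: int) -> bool:
    """Check whether p is a period of w."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))

# Example above: periods 3 and 5, gcd = 1, threshold 3 + 5 - 1 = 7.
w = "aaaaaaa"                      # length 7
assert has_period(w, 3) and has_period(w, 5)
assert len(w) >= 3 + 5 - gcd(3, 5)
assert has_period(w, gcd(3, 5))    # Fine-Wilf: period 1, all symbols equal

# Below the threshold the conclusion can fail: length 6 < 7.
v = "abaaba"
assert has_period(v, 3) and has_period(v, 5) and not has_period(v, 1)
assert len(v) < 3 + 5 - gcd(3, 5)  # so Fine-Wilf does not apply
```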

Definition: Border of a String

A border of a string w is a string u that is both a proper prefix and a proper suffix of w.

The border array B[1..n] of a string w[1..n] is defined as: B[i] = length of the longest border of w[1..i].

A string w is border-free (or unbordered) if it has no non-empty border.

Theorem: Relationship Between Borders and Periods

For a string w of length n:

  1. w has a border of length k if and only if w has period n - k
  2. The shortest period of w is n - (length of longest border)
  3. If w has period p, then w has a border of length n - p

Proof: Proof of Border-Period Relationship

Let w be a string of length n.

(⇒) Suppose w has a border u of length k, and set p = n - k. By the definition of a border, w[1..k] = w[n-k+1..n], which symbol by symbol says w[i] = w[i + p] for every i ∈ [1, k].

The period condition w[i] = w[i + p] only needs to hold when i + p ≤ n, i.e., when i ≤ n - p = k, so this is exactly the statement that w has period p = n - k.

(⇐) Suppose w has period p = n - k. Then w[i] = w[i + p] for all valid i.

Define u = w[1..k]. We need to show that u is also a suffix of w.

For any j ∈ [1, k], we have w[n-k+j] = w[(n-k+j) - p] = w[j] (using period p).

Therefore, w[n-k+1..n] = w[1..k] = u, so u is both a prefix and suffix of w, making it a border of length k.

Worked Example: Border Analysis

String: abcabcab (length 8)


Finding Borders:

  • Prefixes: a, ab, abc, abca, abcab, abcabc, abcabca
  • Suffixes: b, ab, cab, bcab, abcab, cabcab, bcabcab
  • Common: ab, abcab
  • Borders: ab (length 2), abcab (length 5)

Corresponding Periods:

  • Border length 2 ⇒ Period 8 - 2 = 6
  • Border length 5 ⇒ Period 8 - 5 = 3
  • Shortest period: 3 (from longest border)

Verification:

  • Period 3: abc|abc|ab ✓ (pattern repeats every 3)
  • Period 6: abcabc|ab ✓ (pattern repeats every 6)

Border Array: [0, 0, 0, 1, 2, 3, 4, 5]

Each position shows the length of the longest border of the prefix ending at that position.
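
The border array can be computed in linear time by the standard failure-function construction from the Knuth-Morris-Pratt algorithm; this Python sketch reproduces the array above:

```python
def border_array(w: str) -> list[int]:
    """B[i] = length of the longest border of w[:i+1] (KMP failure function)."""
    n = len(w)
    B = [0] * n
    for i in range(1, n):
        k = B[i - 1]
        while k > 0 and w[i] != w[k]:
            k = B[k - 1]          # fall back to the next-shorter border
        if w[i] == w[k]:
            k += 1
        B[i] = k
    return B

w = "abcabcab"
B = border_array(w)
assert B == [0, 0, 0, 1, 2, 3, 4, 5]
# Longest border has length 5, so the shortest period is 8 - 5 = 3.
assert len(w) - B[-1] == 3
```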

Languages

Definition: Language

A language L over an alphabet Σ is a subset of Σ* (i.e., L ⊆ Σ*). In other words, a language is a set of strings over the alphabet.

Examples of languages over the alphabet Σ = {a, b}:

  • L₁ = {ε} - The language containing only the empty string
  • L₂ = {a, aa, aaa, ...} - The language of all non-empty strings consisting only of a's
  • L₃ = {ε, a, b, aa, ab, ba, bb, ...} - The language of all strings (Σ*)
  • L₄ = {aa, ab, ba, bb, aaa, ...} - The language of all strings of length ≥ 2
  • L₅ = {ε, ab, abab, ababab, ...} - The language of strings consisting of zero or more repetitions of ab

Theorem: Cardinality of Languages

For any finite alphabet Σ with |Σ| ≥ 1:

  1. Σ* is countably infinite
  2. The set of all languages over Σ, denoted as P(Σ*), is uncountable

Proof: Proof of the Uncountability of Language Set

We will prove that P(Σ*) is uncountable using diagonalization.

Assume, for contradiction, that P(Σ*) is countable. Then we can enumerate all languages as L₁, L₂, L₃, ...

Since Σ* is countable, we can also enumerate all strings in Σ* as s₁, s₂, s₃, ...

Now, define a new language L' as follows:

sᵢ ∈ L' if and only if sᵢ ∉ Lᵢ

This language L' must differ from every language in our enumeration:

  • L' differs from L₁ on string s₁
  • L' differs from L₂ on string s₂
  • ...
  • L' differs from Lₙ on string sₙ

Therefore, L' cannot be in our enumeration of all languages, which contradicts our assumption that we listed all languages. Hence, P(Σ*) must be uncountable.

Operations on Languages

Since languages are sets, we can perform the usual set operations on them: union, intersection, and complement. Additionally, there are operations specific to languages:

Concatenation

The concatenation of languages L₁ and L₂, denoted L₁·L₂ or L₁L₂, is the set {xy | x ∈ L₁ and y ∈ L₂}.

Kleene Star

The Kleene star of a language L, denoted L*, is the set of all strings that can be formed by concatenating zero or more strings from L. Formally:

L* = {ε} ∪ L ∪ LL ∪ LLL ∪ ...

Kleene Plus

The Kleene plus of a language L, denoted L⁺, is the set of all strings that can be formed by concatenating one or more strings from L. Formally:

L⁺ = L ∪ LL ∪ LLL ∪ ...

The key difference between Kleene star and Kleene plus is that L* always includes the empty string ε, whereas L⁺ only includes ε if ε ∈ L. In other words, L⁺ = LL*.
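
Since L* is usually infinite, code can only materialize it up to a length bound. The sketch below (helper names are our own) computes the length-bounded star and checks L⁺ = LL* on the language L₅ = {ab}* from the examples above:

```python
from itertools import product

def concat(L1: set[str], L2: set[str], max_len: int) -> set[str]:
    """L1·L2, truncated to strings of length <= max_len."""
    return {x + y for x, y in product(L1, L2) if len(x + y) <= max_len}

def star(L: set[str], max_len: int) -> set[str]:
    """All strings of L* with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, L, max_len) - result
        result |= frontier
    return result

L = {"ab"}
assert star(L, 6) == {"", "ab", "abab", "ababab"}   # L5 from the examples
# L+ = L·L*, so it lacks the empty string unless epsilon is in L:
plus = concat(L, star(L, 6), 6)
assert plus == {"ab", "abab", "ababab"}
```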

Theorem: Properties of Language Operations

For languages A, B, C ⊆ Σ*:

  1. Distributivity: A(B ∪ C) = AB ∪ AC and (B ∪ C)A = BA ∪ CA
  2. Reversal: (A ∪ B)ᴿ = Aᴿ ∪ Bᴿ and (AB)ᴿ = BᴿAᴿ
  3. Complementation (De Morgan): (A ∪ B)' = A' ∩ B' and (A ∩ B)' = A' ∪ B'

Theorem: Properties of Kleene Operations

For any languages A, B ⊆ Σ*:

  1. (A*)* = A*
  2. A*A* = A*
  3. A⁺ = AA*
  4. (A∪B)* ≠ A*B* in general
  5. (A*)ᴿ = (Aᴿ)*

Proof: Proof of A*A* = A*

We need to show that A*A* ⊆ A* and A* ⊆ A*A*.

A* ⊆ A*A*: Let w ∈ A*. We can write w = w·ε where w ∈ A* and ε ∈ A* (since ε ∈ A* by definition of Kleene star). Thus w ∈ A*A*.

A*A* ⊆ A*: Let w ∈ A*A*. Then w = uv for some u, v ∈ A*.

By definition of A*, u is a concatenation of zero or more strings from A: u = u₁u₂...uₙ where each uᵢ ∈ A.

Similarly, v is a concatenation of zero or more strings from A: v = v₁v₂...vₘ where each vⱼ ∈ A.

Therefore, w = uv = u₁u₂...uₙv₁v₂...vₘ is also a concatenation of zero or more strings from A, which means w ∈ A*.

Since A* ⊆ A*A* and A*A* ⊆ A*, we conclude that A*A* = A*.

Language Quotient Operations

Beyond the standard set operations and Kleene closure, languages support quotient operations that "divide" one language by another. These operations are fundamental in formal language theory and provide insights into the structure of regular and context-free languages.

Definition: Left and Right Quotients

Given languages A, B ⊆ Σ*:

  • The left quotient of A by B is:
    B\A = {w ∈ Σ* | ∃u ∈ B, uw ∈ A}
  • The right quotient of A by B is:
    A/B = {w ∈ Σ* | ∃u ∈ B, wu ∈ A}

When B = {u} is a singleton, we write u\A and A/u for the quotients by the single string u.

Examples of Quotient Operations

Example 1: Simple Quotients

Let A = {ab, abc, bc} and B = {a, b}

Left Quotient B\A:

  • For u = a: strings in A starting with a are {ab, abc} → remainders {b, bc}
  • For u = b: strings in A starting with b are {bc} → remainder {c}
  • Result: B\A = {b, bc, c}

Right Quotient A/B:

  • For u = a: no strings in A end with a → no remainders
  • For u = b: strings in A ending with b are {ab} → remainder {a}
  • Result: A/B = {a}

Example 2: Quotient by Single String

Let L = {abc, abcd, def, abef}

  • ab\L: Remove ab from left → {c, cd, ef}
  • L/ef: Remove ef from right → {d, ab}
  • xyz\L: No strings in L start with xyz, so xyz\L = ∅
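
For finite languages, both quotients can be computed by direct enumeration; this Python sketch reproduces Examples 1 and 2:

```python
def left_quotient(B: set[str], A: set[str]) -> set[str]:
    r"""B\A = {w | there exists u in B with u + w in A}."""
    return {a[len(u):] for a in A for u in B if a.startswith(u)}

def right_quotient(A: set[str], B: set[str]) -> set[str]:
    """A/B = {w | there exists u in B with w + u in A}."""
    return {a[:len(a) - len(u)] for a in A for u in B if a.endswith(u)}

# Example 1 above
A, B = {"ab", "abc", "bc"}, {"a", "b"}
assert left_quotient(B, A) == {"b", "bc", "c"}
assert right_quotient(A, B) == {"a"}

# Example 2 above
L = {"abc", "abcd", "def", "abef"}
assert left_quotient({"ab"}, L) == {"c", "cd", "ef"}
assert right_quotient(L, {"ef"}) == {"d", "ab"}
assert left_quotient({"xyz"}, L) == set()
```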

Theorem: Basic Properties of Quotient Operations

For languages A, A₁, A₂, B ⊆ Σ* and strings u, v ∈ Σ*:

  1. Empty quotients: ∅\A = A/∅ = ∅
  2. Epsilon quotients: ε\A = A/ε = A
  3. Distributivity over union: B\(A₁ ∪ A₂) = (B\A₁) ∪ (B\A₂)
  4. Iterated quotients: u\(v\A) = (vu)\A and (A/u)/v = A/(vu)
  5. Cancellation: if A ≠ ∅, then B ⊆ A\(AB); if B ≠ ∅, then A ⊆ (AB)/B

Proof: Proof of the Cancellation Property

We prove that B ⊆ A\(AB) whenever A ≠ ∅; the statement A ⊆ (AB)/B for B ≠ ∅ is symmetric.

Let w ∈ B. We need to show that w ∈ A\(AB), i.e., that there exists u ∈ A such that uw ∈ AB.

Since A ≠ ∅, choose any u ∈ A. Then uw is the concatenation of u ∈ A with w ∈ B, so uw ∈ AB by the definition of language concatenation. Hence w ∈ A\(AB).

The inclusion can be proper. For example, with A = {a, aa} and B = {b}: AB = {ab, aab}, and A\(AB) = {w | aw ∈ AB or aaw ∈ AB} = {b, ab} ⊋ B.

This shows that quotients "cancel" concatenation in one direction only: dividing AB by A on the left recovers every string of B, but possibly more. Quotient is thus a generalization of string cancellation, not an exact inverse of concatenation.

Theorem: Quotients and Regular Languages

If A is a regular language, then:

  1. B\A and A/B are regular for every language B, even when B itself is not regular
  2. In particular, the quotient of a regular language by a single string is always regular

This closure property is fundamental to the theory of regular languages and is used extensively in automata constructions and language recognition algorithms.

Worked Example: Quotients in Regular Languages

Language: L = {strings over {a,b} ending with 'ab'}

In regex notation: (a|b)*ab


Right Quotient L/{b}:

Remove b from the right side of strings in L

  • Strings in L: ab, aab, bab, aaab, abab, baab, ...
  • All end with ab, so removing b gives strings ending with a
  • Result: L/{b} = {strings ending with 'a'}
  • In regex: (a|b)*a

Right Quotient L/{ab}:

Remove ab from the right side of strings in L

  • Every string in L ends with ab
  • Removing ab gives all possible prefixes
  • Result: L/{ab} = (a|b)* (all strings)

Left Quotient {a}\L:

Remove a from the left side of strings in L

  • Strings in L starting with a: ab, aab, aaab, abab, ...
  • After removing the leading a: b, ab, aab, bab, ...
  • These are exactly the string b together with all strings ending in ab
  • Result: {a}\L = {b} ∪ L
  • In regex: b|(a|b)*ab

Verification of Regularity:

All quotient results are expressible as regular expressions, confirming that regular languages are closed under quotient operations.
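
The right-quotient computations can be confirmed by brute force over all short strings, using Python's `re` module for the regular expressions (the length bounds here are our own choice):

```python
import re
from itertools import product

def strings_upto(alphabet: str, n: int) -> list[str]:
    """All strings over the alphabet of length at most n."""
    return ["".join(t) for k in range(n + 1)
            for t in product(alphabet, repeat=k)]

def matching(pattern: str, words) -> set[str]:
    """Subset of words fully matching the regular expression."""
    return {w for w in words if re.fullmatch(pattern, w)}

words = strings_upto("ab", 6)
L = matching("(a|b)*ab", words)          # strings ending with ab, length <= 6

# L/{b}: strip one trailing b from each member of L.
quot_b = {w[:-1] for w in L if w.endswith("b")}
assert quot_b == matching("(a|b)*a", strings_upto("ab", 5))

# L/{ab}: strip a trailing ab from each member of L.
quot_ab = {w[:-2] for w in L if w.endswith("ab")}
assert quot_ab == matching("(a|b)*", strings_upto("ab", 4))
```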

Theorem: Quotient Hierarchy and Iteration

For languages A, B ⊆ Σ*, define the sequence of iterated left quotients:

  • A₀ = A
  • Aₙ₊₁ = B\Aₙ for n ≥ 0

Then:

  1. If A is regular, then each Aₙ is regular
  2. If A is regular, the sequence (Aₙ)ₙ≥₀ is eventually periodic, because a regular language has only finitely many distinct quotients
  3. The union ⋃ₙ≥₀ Aₙ represents the "quotient closure" of A under B

Applications of Quotient Operations

1. String Pattern Matching

Quotients are used in pattern matching algorithms to represent "what remains to be matched" after processing part of a pattern.


2. Automata Construction

States in finite automata can be viewed as representing quotient languages - each state corresponds to the set of strings that lead to acceptance from that state.


3. Language Equation Solving

Quotients help solve language equations of the form AX = B (find language X such that concatenating A with X gives B).


4. Formal Language Decision Problems

Many decision problems about regular languages can be reduced to computing and analyzing quotient languages.

String Morphisms and Substitutions

String morphisms are systematic transformations that map symbols to strings according to fixed rules. These transformations preserve the structure of concatenation and provide powerful tools for generating complex sequences and analyzing language properties.

Definition: String Homomorphism

A string homomorphism is a function h: Σ* → Δ* such that:

  1. h(ε) = ε
  2. h(xy) = h(x)h(y) for all x, y ∈ Σ*

A homomorphism is completely determined by its values on the alphabet: if h(a) is defined for each a ∈ Σ, then for any string w = a₁a₂...aₙ:

h(w) = h(a₁)h(a₂)...h(aₙ)

Examples of String Homomorphisms

Example 1: Binary Duplication

Define h: {a,b}* → {a,b}* by:

  • h(a) = aa
  • h(b) = bb

Applications:

  • h(ab) = h(a)h(b) = aa·bb = aabb
  • h(aba) = aa·bb·aa = aabbaa
  • h(ε) = ε

Example 2: Encoding Homomorphism

Define g: {a,b,c}* → {0,1}* by:

  • g(a) = 01
  • g(b) = 10
  • g(c) = 00

Applications:

  • g(abc) = 01·10·00 = 011000
  • g(cab) = 00·01·10 = 000110

Example 3: Thue-Morse Generator

Define t: {0,1}* → {0,1}* by:

  • t(0) = 01
  • t(1) = 10

Iteration generates the Thue-Morse sequence:

  • t⁰(0) = 0
  • t¹(0) = 01
  • t²(0) = 0110
  • t³(0) = 01101001
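
A homomorphism is just a symbol-by-symbol table lookup followed by concatenation, which makes the examples above one-liners to check (the dictionary-based encoding is our own choice):

```python
def apply_hom(h: dict[str, str], w: str) -> str:
    """Apply the homomorphism determined by its values on single symbols."""
    return "".join(h[a] for a in w)

h = {"a": "aa", "b": "bb"}
assert apply_hom(h, "ab") == "aabb"
assert apply_hom(h, "aba") == "aabbaa"
assert apply_hom(h, "") == ""         # h(epsilon) = epsilon

# Iterating the Thue-Morse homomorphism t(0) = 01, t(1) = 10:
w = "0"
for _ in range(3):
    w = apply_hom({"0": "01", "1": "10"}, w)
assert w == "01101001"                # t applied three times to 0

# Length formula: |h(w)| equals the sum of |w|_a * |h(a)| over symbols a
w = "abba"
assert len(apply_hom(h, w)) == sum(w.count(a) * len(h[a]) for a in set(w))
```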

Theorem: Properties of String Homomorphisms

For string homomorphisms h: Σ* → Δ* and g: Δ* → Γ*:

  1. Composition: g ∘ h: Σ* → Γ* is a homomorphism
  2. Length: |h(w)| = Σₐ∈Σ |w|ₐ · |h(a)|, where |w|ₐ is the number of occurrences of a in w
  3. Reversal: h(wᴿ) = (h(w))ᴿ holds for all w ∈ Σ* if and only if h(a)ᴿ = h(a) (each image is a palindrome) for all a ∈ Σ
  4. Iteration: hⁿ(w) = h(hⁿ⁻¹(w)) for n ≥ 1

Definition: String Substitution

A string substitution is a function σ: Σ → P(Δ*) that assigns to each symbol a ∈ Σ a language σ(a) ⊆ Δ*.

The substitution extends to strings by:

  • σ(ε) = {ε}
  • σ(wa) = σ(w) · σ(a) for w ∈ Σ*, a ∈ Σ

And to languages by: σ(L) = ⋃_{w∈L} σ(w)

Proof: Proof of Homomorphism Length Formula

We prove by induction on the length of w that |h(w)| = Σₐ∈Σ |w|ₐ · |h(a)|.

Base case: w = ε.
|h(ε)| = |ε| = 0 and Σₐ∈Σ |ε|ₐ · |h(a)| = Σₐ∈Σ 0 · |h(a)| = 0

Inductive step: Assume the formula holds for the string w, and consider wa for some a ∈ Σ.

|h(wa)| = |h(w)h(a)| = |h(w)| + |h(a)|

By the induction hypothesis:
|h(w)| = Σₓ∈Σ |w|ₓ · |h(x)|

Now, |wa|ₓ = |w|ₓ for x ≠ a, and |wa|ₐ = |w|ₐ + 1.

Therefore:
Σₓ∈Σ |wa|ₓ · |h(x)| = Σₓ≠ₐ |w|ₓ · |h(x)| + (|w|ₐ + 1) · |h(a)|
= Σₓ∈Σ |w|ₓ · |h(x)| + |h(a)| = |h(w)| + |h(a)| = |h(wa)|

Theorem: Fixed Points and Morphic Sequences

A string w is a fixed point of a homomorphism h if h(w) = w.

An infinite sequence s = s₀s₁s₂... is morphic if there exist a homomorphism h and a symbol a such that s is the limit of hⁿ(a) as n → ∞ (for this limit to exist, h(a) must begin with a, so that each hⁿ(a) is a prefix of hⁿ⁺¹(a)).

Many famous sequences are morphic, including the Thue-Morse sequence, Fibonacci word, and dragon curve sequence.

Worked Example: Fibonacci Word Generation

Fibonacci Homomorphism:

Define f: {a,b}* → {a,b}* by:

  • f(a) = ab
  • f(b) = a

Iteration starting from a:

  • f⁰(a) = a
  • f¹(a) = ab
  • f²(a) = f(ab) = f(a)f(b) = ab·a = aba
  • f³(a) = f(aba) = ab·a·ab = abaab
  • f⁴(a) = f(abaab) = ab·a·ab·ab·a = abaababa

Length Analysis:

  • Lengths: 1, 2, 3, 5, 8, 13, 21, ... (Fibonacci numbers!)
  • Each step: |fⁿ⁺¹(a)| = |fⁿ(a)| + |fⁿ⁻¹(a)|
  • This follows from the structure of the homomorphism: each a contributes ab and each b contributes a

Properties of the Fibonacci Word:

  • Infinite limit: abaababaabaababaababaabaab...
  • Aperiodic: the infinite word is not eventually periodic
  • Contains no fourth power (no substring of the form xxxx), although it does contain cubes
  • A standard extremal example in the combinatorics of words for avoiding repetitions
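
Iterating the homomorphism reproduces both the words and the Fibonacci length pattern:

```python
def apply_hom(h: dict[str, str], w: str) -> str:
    """Apply the homomorphism determined by its values on single symbols."""
    return "".join(h[a] for a in w)

f = {"a": "ab", "b": "a"}   # the Fibonacci homomorphism

words = ["a"]
for _ in range(6):
    words.append(apply_hom(f, words[-1]))

assert words[:5] == ["a", "ab", "aba", "abaab", "abaababa"]

# Lengths follow the Fibonacci recurrence:
lengths = [len(w) for w in words]
assert lengths == [1, 2, 3, 5, 8, 13, 21]
for n in range(2, len(lengths)):
    assert lengths[n] == lengths[n - 1] + lengths[n - 2]
```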

Language Growth and Density

Languages can be classified by how many strings they contain at each length and how these counts grow with length. This combinatorial perspective reveals deep structural properties and provides tools for comparing the complexity of different language classes.

Definition: Growth Function and Density

For a language L ⊆ Σ*, define:

  • Growth function: gL(n) = |L ∩ Σⁿ| (the number of strings of length n in L)
  • Cumulative growth: GL(n) = Σᵢ₌₀ⁿ gL(i) (the total number of strings of length ≤ n in L)
  • Density function: dL(n) = gL(n) / |Σ|ⁿ (the fraction of strings of length n that are in L)

Examples of Language Growth

Example 1: Even-length strings over {a,b}

L1 = {w ∈ {a,b}* | |w| is even}

  • gL₁(n) = 2ⁿ if n is even, 0 if n is odd
  • dL₁(n) = 1 if n is even, 0 if n is odd
  • Cumulative: GL₁(n) = 1 + 4 + 16 + ... + 4^⌊n/2⌋ = (4^(⌊n/2⌋+1) - 1)/3

Example 2: Palindromes over {a,b}

L₂ = {w ∈ {a,b}* | w = wR}

  • gL₂(n) = 2^⌈n/2⌉ (choose the first half freely; the second half is determined)
  • dL₂(n) = 2^⌈n/2⌉ / 2ⁿ = 2^(⌈n/2⌉-n)
  • Density approaches 0 as n → ∞ (palindromes become rare)

Example 3: Strings with equal a and b

L₃ = {w ∈ {a,b}* | |w|a = |w|b}

  • gL₃(n) = C(n, n/2) if n is even, 0 if n is odd
  • For even n: dL₃(n) ≈ √(2/(πn)) (by Stirling's approximation)
  • Density decreases like 1/√n
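
The growth functions in Examples 2 and 3 can be confirmed by exhaustive enumeration for small n (feasible, since there are only 2ⁿ strings per length):

```python
from itertools import product
from math import comb

def growth(pred, alphabet: str, n: int) -> int:
    """g_L(n): count strings of length n over the alphabet satisfying pred."""
    return sum(1 for t in product(alphabet, repeat=n) if pred("".join(t)))

# Example 2: palindromes over {a,b} -> g(n) = 2^ceil(n/2)
for n in range(8):
    assert growth(lambda w: w == w[::-1], "ab", n) == 2 ** ((n + 1) // 2)

# Example 3: equal numbers of a's and b's -> g(n) = C(n, n/2) for even n
for n in range(8):
    g = growth(lambda w: w.count("a") == w.count("b"), "ab", n)
    assert g == (comb(n, n // 2) if n % 2 == 0 else 0)
```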

Theorem: Growth Rate Classification

Languages can be classified by their growth rates:

  1. Polynomial growth: GL(n) = O(nᵏ) for some k
    Examples: finite languages, length-bounded languages
  2. Exponential growth: GL(n) = Θ(cⁿ) for some c > 1
    Examples: Σ* itself; regular languages whose automata contain suitable cycles
  3. Intermediate growth: GL(n) grows faster than every polynomial but slower than every exponential
    Examples: some context-free languages, certain number-theoretic languages

Theorem: Density and Measure Theory

For languages over an alphabet Σ with |Σ| ≥ 2, three density regimes are distinguished:

  1. Zero density: dL(n) → 0 as n → ∞ (asymptotically, almost no strings are in L)
  2. Positive density: liminf dL(n) > 0 (a positive fraction of strings of length n is in L, for infinitely many n)
  3. Full density: dL(n) → 1 as n → ∞ (asymptotically, almost all strings are in L)

Most "interesting" languages have zero density - they become increasingly rare among all possible strings.

Proof: Zero Density for Structured Context-Free Languages

A caveat first: it is not true that every infinite context-free language has zero density - Σ* itself is context-free and has density 1. Zero density holds when membership in the language forces a constant fraction of each string to be determined by the remaining symbols.

A representative case is the palindrome language (Example 2 above, generalized to an arbitrary alphabet Σ): L₂ = {w ∈ Σ* | w = wᴿ}. A palindrome of length n is completely determined by its first ⌈n/2⌉ symbols, so gL₂(n) = |Σ|^⌈n/2⌉ and

dL₂(n) = |Σ|^⌈n/2⌉ / |Σ|ⁿ = |Σ|^(⌈n/2⌉ - n) → 0 as n → ∞.

The same argument applies to any language L in which at least cn symbols of each length-n string are determined by the others, for some constant c > 0: then gL(n) ≤ |Σ|^((1-c)n), so dL(n) ≤ |Σ|^(-cn) → 0, since |Σ| ≥ 2.

Worked Example: Growth Analysis of Regular Language

Language: Strings over {a,b} that don't contain aa

Let L = {w ∈ {a,b}* | w does not contain aa}

Recurrence Analysis:

Let f(n) = number of valid strings of length n

  • f(0) = 1 (empty string)
  • f(1) = 2 (strings: a, b)
  • For n ≥ 2: f(n) = f(n-1) + f(n-2)

Explanation of recurrence:

  • If the string ends with b: the preceding n-1 symbols can form any valid string → f(n-1) choices
  • If the string ends with a: the preceding symbol must be b, so the last two symbols are ba, and the first n-2 symbols form any valid string → f(n-2) choices

Solution:

This is the Fibonacci recurrence! f(n) = Fₙ₊₂

  • Values: 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
  • Asymptotic formula: f(n) ≈ φⁿ⁺²/√5 where φ = (1+√5)/2 ≈ 1.618

Density Analysis:

  • dL(n) = f(n) / 2ⁿ ≈ (φⁿ⁺²/√5) / 2ⁿ = (φ²/√5) · (φ/2)ⁿ
  • Since φ/2 ≈ 0.809 < 1, we have dL(n) → 0
  • The language has zero density despite being infinite and regular
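
A brute-force count confirms both the recurrence and the values listed above:

```python
from itertools import product

def count_no_aa(n: int) -> int:
    """Brute-force count of length-n strings over {a,b} avoiding the factor aa."""
    return sum(1 for t in product("ab", repeat=n)
               if "aa" not in "".join(t))

counts = [count_no_aa(n) for n in range(10)]
assert counts == [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Fibonacci recurrence f(n) = f(n-1) + f(n-2):
for n in range(2, 10):
    assert counts[n] == counts[n - 1] + counts[n - 2]
```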

Combinatorial String Theory

Theorem: String Enumeration and Counting

For an alphabet Σ with |Σ| = k:

  1. The number of strings of length n over Σ is kⁿ
  2. The number of strings of length ≤ n over Σ is Σᵢ₌₀ⁿ kⁱ = (kⁿ⁺¹ - 1)/(k - 1) for k > 1
  3. The number of non-empty strings over Σ of length ≤ n is Σᵢ₌₁ⁿ kⁱ = (kⁿ⁺¹ - k)/(k - 1) for k > 1

Proof: Proof of String Counting Formula

To construct a string of length n over an alphabet Σ with |Σ| = k:

For the 1st position, we have k choices from Σ. For the 2nd position, we have k choices from Σ. ... For the n-th position, we have k choices from Σ.

By the multiplication principle of combinatorics, the total number of possible strings is k × k × ... × k (n times) = kⁿ.

For the number of strings of length ≤ n, we sum the number of strings of each length from 0 to n:

Σᵢ₌₀ⁿ kⁱ = 1 + k + k² + ... + kⁿ

This is a geometric series with first term a = 1 and common ratio r = k. Using the formula for the sum of a geometric series:

Σᵢ₌₀ⁿ kⁱ = (kⁿ⁺¹ - 1)/(k - 1) for k > 1

Theorem: Growth Rate of String Sets

Let Σ be an alphabet with |Σ| = k ≥ 2, and let Σⁿ be the set of all strings of length n over Σ. Then:

  1. |Σⁿ| = kⁿ, which grows exponentially with n
  2. |Σⁿ|/|Σⁿ⁻¹| = k for every n ≥ 1
  3. For any language L ⊆ Σ*, the function f(n) counting strings in L of length n satisfies f(n) ≤ kⁿ

Summary

  • Σ: A finite, non-empty set of symbols (alphabet)
  • Σ*: Set of all finite strings over Σ, including ε
  • ε: The empty string (length 0)
  • String: A finite sequence of symbols from the alphabet
  • |s|: Length of string s (number of symbols)
  • Language: Any subset of Σ* (a set of strings)
  • String Operations: Concatenation (s·t), Reversal (sᴿ)
  • Language Operations:
    Union (L₁∪L₂), Intersection (L₁∩L₂), Concatenation (L₁L₂), Kleene Star (L*), Kleene Plus (L⁺), Complement (L̄)
  • Fundamental Theorems: Cardinality of Σ* (countable) and P(Σ*) (uncountable), distributive properties of operations
  • String Counting: With alphabet size k, there are kⁿ strings of length n

Suggested Reading

  • Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation – Chapter 1.5: Strings and Languages
  • Sipser, M. Introduction to the Theory of Computation – Chapter 0: Introduction
  • Kozen, D. Automata and Computability – Chapter 1: Languages and Strings
  • Lewis, H. and Papadimitriou, C. Elements of the Theory of Computation – Chapter 1.1-1.2: Mathematical Preliminaries