Alphabets, Strings, and Languages
Before diving into automata theory, we need to establish the fundamental concepts of alphabets, strings, and languages. These form the basis for understanding all models of computation that we'll study.
Alphabets
Definition: Alphabet
An alphabet is a finite, non-empty set of symbols, typically denoted by Σ (Sigma).
Examples of alphabets include:
- Σ = {a, b} - A binary (two-symbol) alphabet
- Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} - The set of decimal digits
- Σ = {a, b, c, ..., z} - The lowercase English alphabet
Strings
Definition: String
Given an alphabet Σ, a string over Σ is a finite sequence of symbols from Σ. The set of all strings over Σ is denoted Σ*.
The empty string, denoted by ε (epsilon), is the string containing no symbols.
Examples of strings over the alphabet Σ = {a, b}:
- ε - The empty string
- a - A string of length 1
- ab - A string of length 2
- aaba - A string of length 4
- bbbbb - A string of length 5
String Properties and Operations
String Length
The length of a string s, denoted |s|, is the number of symbols in the string. For example, |abba| = 4 and |ε| = 0.
Theorem: Fundamental String Properties
For any strings x, y ∈ Σ*:
- |xy| = |x| + |y|
- |xᴿ| = |x|
- |ε| = 0
- For any k ∈ ℕ, |xᵏ| = k·|x|
- (xy)ᴿ = yᴿxᴿ
Proof: Proof of (xy)ᴿ = yᴿxᴿ
Let x = a₁a₂...aₙ and y = b₁b₂...bₘ, where aᵢ, bⱼ ∈ Σ.
Then xy = a₁a₂...aₙb₁b₂...bₘ.
The reversal of xy is (xy)ᴿ = bₘ...b₂b₁aₙ...a₂a₁.
We also have xᴿ = aₙ...a₂a₁ and yᴿ = bₘ...b₂b₁.
Therefore yᴿxᴿ = bₘ...b₂b₁aₙ...a₂a₁ = (xy)ᴿ. ∎
Theorem: Prefix and Suffix Properties
For any string w ∈ Σ*, define:
- Pref(w) = {x ∈ Σ* | ∃y ∈ Σ*, w = xy} (the set of all prefixes of w)
- Suff(w) = {y ∈ Σ* | ∃x ∈ Σ*, w = xy} (the set of all suffixes of w)
Then the following properties hold:
- |Pref(w)| = |Suff(w)| = |w| + 1
- w ∈ Pref(w) ∩ Suff(w) for all w ∈ Σ* (take y = ε or x = ε in the definitions)
- ε ∈ Pref(w) ∩ Suff(w) for all w ∈ Σ*
String Concatenation
The concatenation of two strings s and t, denoted s·t or simply st, is the string formed by appending t to the end of s.
For example, if s = ab and t = ba, then st = abba.
String Reversal
The reversal of a string s, denoted sᴿ, is the string formed by writing the symbols of s in reverse order.
For example, if s = abc, then sᴿ = cba.
String Combinatorics and Periodicity Theory
Beyond basic operations, strings exhibit deep mathematical structure related to repetition, periodicity, and combinatorial properties. These concepts form the foundation for understanding pattern matching, string algorithms, and formal language complexity.
Definition: Primitive Strings and Periods
A string w ∈ Σ* is called primitive if w ≠ uᵏ for every string u ∈ Σ* and integer k ≥ 2.
A period of a string w is a positive integer p such that w[i] = w[i + p] for all valid indices i. The minimal period of w is the smallest period of w.
Equivalently, w has period p if there exists a string u of length p such that w is a prefix of u^∞ (the infinite repetition of u).
Examples of Primitive Strings and Periods
Primitive Strings:
- abc - Cannot be written as a repetition of a shorter string
- abcd - Primitive
- abcde - Primitive
Non-primitive Strings:
- abab = (ab)² - Repetition of ab
- abcabc = (abc)² - Repetition of abc
- aaaa = a⁴ - Repetition of a
Period Analysis:
- abababab - Periods: 2, 4, 6, 8. Minimal period: 2
- abcabcabc - Periods: 3, 6, 9. Minimal period: 3
- aaaa - Periods: 1, 2, 3, 4. Minimal period: 1
- abcd - Only period: 4 (minimal period equals length)
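Periods and primitivity can both be checked by brute force directly from the definitions. A minimal sketch (the helper names `all_periods`, `minimal_period`, and `is_primitive` are ours, not standard library functions):

```python
def all_periods(w: str) -> list[int]:
    """All periods p (1 <= p <= |w|) with w[i] == w[i+p] for all valid i."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]

def minimal_period(w: str) -> int:
    """Smallest period of a non-empty string."""
    return all_periods(w)[0]

def is_primitive(w: str) -> bool:
    """True if w cannot be written as u^k for a shorter u and k >= 2."""
    n = len(w)
    return not any(n % d == 0 and w[:d] * (n // d) == w
                   for d in range(1, n))

# The period-analysis examples from the text:
assert all_periods("abababab") == [2, 4, 6, 8]
assert all_periods("abcabcabc") == [3, 6, 9]
assert all_periods("aaaa") == [1, 2, 3, 4]
assert all_periods("abcd") == [4]
assert is_primitive("abc") and not is_primitive("abab")
```

Note that p = |w| is always (vacuously) a period, so `all_periods` is never empty for a non-empty string.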
Theorem: Primitive String Decomposition
Every non-empty string w ∈ Σ* can be written uniquely as w = uᵏ, where u is primitive and k ≥ 1.
The string u is called the primitive root of w, and k is the exponent of the decomposition.
Proof: Proof of Primitive String Decomposition
Existence: We argue by strong induction on |w|. If w is primitive, take u = w and k = 1. Otherwise w = vʲ for some shorter string v and integer j ≥ 2. By the induction hypothesis, v = uᵐ where u is primitive and m ≥ 1, so w = uᵐʲ with mj ≥ 1.
Uniqueness: Suppose w = uᵏ = vˡ where both u and v are primitive. If k = l = 1, then u = w = v. Otherwise assume, say, l ≥ 2. Then |w| = k·|u| = l·|v|, so |w| ≥ |u| and |w| ≥ 2|v|, which gives |w| ≥ |u| + |v| ≥ |u| + |v| - gcd(|u|, |v|).
The string w has both |u| and |v| as periods, since it is a power of each. By the Fine-Wilf theorem (stated below), w therefore also has period d = gcd(|u|, |v|). Since d ≤ |u|, the period d restricts to the prefix u of w, and d divides |u|; hence u is a power of its prefix of length d. Primitivity of u forces d = |u|, and the same argument gives d = |v|. Therefore |u| = |v|, and since u and v are both prefixes of w, we conclude u = v and k = l. ∎
Theorem: Fine-Wilf Theorem
Let w be a string with periods p and q. If |w| ≥ p + q - gcd(p, q), then w also has period gcd(p, q).
In particular, if gcd(p, q) = 1 and |w| ≥ p + q - 1, then w has period 1 (i.e., all symbols in w are identical).
Proof: Proof of Fine-Wilf Theorem
Let d = gcd(p, q). We will show that w has period d by proving that w[i] = w[i + d] for all valid indices i.
Since d = gcd(p, q), there exist integers a, b such that d = ap + bq (Bézout's identity).
Fix an index i such that both i and i + d are valid positions in w. Since i + d = i + ap + bq, position i + d can be reached from position i by a sequence of steps of size ±p and ±q. A forward step of size p uses w[j] = w[j + p] (period p); a backward step uses the same equation read in reverse; and similarly for steps of size q. Every step preserves the symbol.
The condition |w| ≥ p + q - d ensures that the steps can be ordered so that every intermediate position stays within the string, allowing the chain of equalities to be completed.
Therefore, w[i] = w[i + d] for all valid i, proving that w has period d. ∎
Worked Example: Fine-Wilf Theorem Application
Example 1: String with periods 6 and 10
Consider a string w with periods 6 and 10.
- gcd(6, 10) = 2
- Fine-Wilf threshold: 6 + 10 - 2 = 14
- If |w| ≥ 14, then w also has period 2
Example 2: String with coprime periods
Consider a string w with periods 3 and 5.
- gcd(3, 5) = 1
- Fine-Wilf threshold: 3 + 5 - 1 = 7
- If |w| ≥ 7, then w has period 1
- This means w = aᵏ for some symbol a
Concrete Example:
String: abcabcabc (length 9)
- Has period 3: abc|abc|abc
- Does NOT have period 5 (check: positions 1 and 6 hold different symbols, a ≠ c)
- So Fine-Wilf doesn't apply - it's not true that every long string has multiple periods
Another Example:
String: aaaaaaa (length 7)
- Has periods 1, 2, 3, ..., 7
- In particular, has periods 3 and 5
- Length 7 ≥ 3 + 5 - 1 = 7, so Fine-Wilf applies
- Indeed, it has period gcd(3, 5) = 1 ✓
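Both worked examples can be verified mechanically with a brute-force period check (the helper `has_period` is ours):

```python
from math import gcd

def has_period(w: str, p: int) -> bool:
    """True if w[i] == w[i+p] for all valid (0-based) indices i."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))

# aaaaaaa has periods 3 and 5; length 7 >= 3 + 5 - gcd(3, 5) = 7,
# so Fine-Wilf forces period gcd(3, 5) = 1.
w = "aaaaaaa"
assert has_period(w, 3) and has_period(w, 5)
assert len(w) >= 3 + 5 - gcd(3, 5)
assert has_period(w, gcd(3, 5))

# abcabcabc has period 3 but NOT period 5, so the theorem's
# hypothesis fails and nothing is forced.
v = "abcabcabc"
assert has_period(v, 3) and not has_period(v, 5)
```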
Definition: Border of a String
A border of a string w is a string u that is both a proper prefix and a proper suffix of w.
The border array B[1..n] of a string w[1..n] is defined by: B[i] = length of the longest border of w[1..i].
A string w is border-free (or unbordered) if it has no non-empty borders.
Theorem: Relationship Between Borders and Periods
For a string w of length n:
- w has a border of length k if and only if w has period n - k
- The shortest period of w is n - (length of the longest border)
- If w has period p ≤ n, then w has a border of length n - p
Proof: Proof of the Border-Period Relationship
Let w be a string of length n, let 0 ≤ k < n, and set p = n - k.
(⇒) Suppose w has a border u of length k, so u = w[1..k] = w[n-k+1..n]. The valid indices for period p are exactly those i with 1 ≤ i ≤ k (so that i + p ≤ n). For any such i:
w[i] = u[i] = w[(n-k) + i] = w[i + p]
Therefore w[i] = w[i + p] for all valid i, so w has period p = n - k.
(⇐) Suppose w has period p = n - k. Define u = w[1..k]. We need to show that u is also a suffix of w.
For any j ∈ [1, k], we have: w[n-k+j] = w[(n-k+j) - p] = w[j] (using period p).
Therefore w[n-k+1..n] = w[1..k] = u, so u is both a prefix and a suffix of w, making it a border of length k. ∎
Worked Example: Border Analysis
String: abcabcab (length 8)
Finding Borders:
- Proper prefixes: a, ab, abc, abca, abcab, abcabc, abcabca
- Proper suffixes: b, ab, cab, bcab, abcab, cabcab, bcabcab
- Common: ab, abcab
- Borders: ab (length 2), abcab (length 5)
Corresponding Periods:
- Border length 2 ⇒ Period 8 - 2 = 6
- Border length 5 ⇒ Period 8 - 5 = 3
- Shortest period: 3 (from the longest border)
Verification:
- Period 3: abc|abc|ab ✓ (pattern repeats every 3)
- Period 6: abcabc|ab ✓ (pattern repeats every 6)
Border Array: [0, 0, 0, 1, 2, 3, 4, 5]
Each position shows the length of the longest border of the prefix ending at that position.
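The border array can be computed in linear time with the standard failure-function scan from the Knuth-Morris-Pratt algorithm. A sketch (0-indexed, so `B[-1]` is the longest border of the whole string):

```python
def border_array(w: str) -> list[int]:
    """B[i] = length of the longest border of the prefix w[:i+1]
    (the KMP failure function)."""
    n = len(w)
    B = [0] * n
    k = 0  # length of the current border candidate
    for i in range(1, n):
        # Fall back through shorter borders until the next symbol matches.
        while k > 0 and w[i] != w[k]:
            k = B[k - 1]
        if w[i] == w[k]:
            k += 1
        B[i] = k
    return B

w = "abcabcab"
B = border_array(w)
assert B == [0, 0, 0, 1, 2, 3, 4, 5]

# Shortest period = n - (length of the longest border)
assert len(w) - B[-1] == 3
```

The fallback loop works because every border of a border of w is again a border of w, so the candidates can be examined in decreasing length.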
Languages
Definition: Language
A language L over an alphabet Σ is a subset of Σ* (i.e., L ⊆ Σ*). In other words, a language is a set of strings over the alphabet.
Examples of languages over the alphabet Σ = {a, b}:
- L₁ = {ε} - The language containing only the empty string
- L₂ = {a, aa, aaa, ...} - The language of all strings of one or more a's
- L₃ = {ε, a, b, aa, ab, ba, bb, ...} - The language of all strings (Σ*)
- L₄ = {aa, ab, ba, bb, aaa, ...} - The language of all strings of length ≥ 2
- L₅ = {ε, ab, abab, ababab, ...} - The language of strings consisting of zero or more repetitions of ab
Theorem: Cardinality of Languages
For any finite alphabet Σ with |Σ| ≥ 1:
- Σ* is countably infinite
- The set of all languages over Σ, denoted P(Σ*), is uncountable
Proof: Proof of the Uncountability of the Set of Languages
We will prove that P(Σ*) is uncountable using diagonalization.
Assume, for contradiction, that P(Σ*) is countable. Then we can enumerate all languages as L₁, L₂, L₃, ...
Since Σ* is countable, we can also enumerate all strings in Σ* as s₁, s₂, s₃, ...
Now, define a new language L' as follows:
sᵢ ∈ L' if and only if sᵢ ∉ Lᵢ
This language L' must differ from every language in our enumeration:
- L' differs from L₁ on string s₁
- L' differs from L₂ on string s₂
- ...
- L' differs from Lₙ on string sₙ
Therefore, L' cannot be in our enumeration of all languages, which contradicts our assumption that we listed all languages. Hence, P(Σ*) must be uncountable. ∎
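The countability of Σ* used in this proof can be made concrete by enumerating strings in shortlex order (by length first, then lexicographically); every string appears at some finite position. A small Python sketch (the generator name `shortlex` is ours):

```python
from itertools import count, product

def shortlex(sigma: str):
    """Enumerate Sigma* in shortlex order: epsilon, then length 1, 2, ..."""
    for n in count(0):
        for tup in product(sigma, repeat=n):
            yield "".join(tup)

gen = shortlex("ab")
first = [next(gen) for _ in range(7)]
assert first == ["", "a", "b", "aa", "ab", "ba", "bb"]
```

Each length class is finite (|Σ|ⁿ strings), so listing the classes in order visits all of Σ* - exactly the enumeration s₁, s₂, s₃, ... that the diagonalization argument relies on.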
Operations on Languages
Since languages are sets, we can perform the usual set operations on them: union, intersection, and complement. Additionally, there are operations specific to languages:
Concatenation
The concatenation of languages L₁ and L₂, denoted L₁·L₂ or L₁L₂, is the set {xy | x ∈ L₁ and y ∈ L₂}.
Kleene Star
The Kleene star of a language L, denoted L*, is the set of all strings that can be formed by concatenating zero or more strings from L. Formally:
L* = {ε} ∪ L ∪ LL ∪ LLL ∪ ...
Kleene Plus
The Kleene plus of a language L, denoted L⁺, is the set of all strings that can be formed by concatenating one or more strings from L. Formally:
L⁺ = L ∪ LL ∪ LLL ∪ ...
The key difference between Kleene star and Kleene plus is that L* always includes the empty string ε, whereas L⁺ includes ε only if ε ∈ L. In other words, L⁺ = LL*.
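For finite languages these operations can be computed directly as set comprehensions. Since L* is usually infinite, the sketch below truncates it at a maximum length (the helper names `concat` and `star_upto` are ours):

```python
def concat(L1: set[str], L2: set[str]) -> set[str]:
    """Concatenation L1·L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in L1 for y in L2}

def star_upto(L: set[str], max_len: int) -> set[str]:
    """All elements of L* of length <= max_len
    (L* itself is infinite whenever L contains a non-empty string)."""
    result = {""}           # epsilon is always in L*
    frontier = {""}
    while frontier:
        frontier = {x + y for x in frontier for y in L
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

L = {"ab"}
assert star_upto(L, 6) == {"", "ab", "abab", "ababab"}

# L+ = L·L*, so epsilon is in L+ only if epsilon is in L.
Lplus = concat(L, star_upto(L, 4))
assert "" not in Lplus and {"ab", "abab"} <= Lplus
```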
Theorem: Properties of Language Operations
For languages A, B, C ⊆ Σ*:
- Distributivity: A(B∪C) = AB ∪ AC and (B∪C)A = BA ∪ CA
- Reversal: (A∪B)ᴿ = Aᴿ ∪ Bᴿ and (AB)ᴿ = BᴿAᴿ
- Complementation: (A∪B)' = A'∩B' and (A∩B)' = A'∪B'
Theorem: Properties of Kleene Operations
For any languages A, B ⊆ Σ*:
- (A*)* = A*
- A*A* = A*
- A⁺ = AA*
- (A∪B)* ≠ A*B* in general
- (A*)ᴿ = (Aᴿ)*
Proof: Proof of A*A* = A*
We need to show that A*A* ⊆ A* and A* ⊆ A*A*.
A* ⊆ A*A*: Let w ∈ A*. We can write w = w·ε, where w ∈ A* and ε ∈ A* (by the definition of Kleene star). Thus w ∈ A*A*.
A*A* ⊆ A*: Let w ∈ A*A*. Then w = uv for some u, v ∈ A*.
By the definition of A*, u is a concatenation of zero or more strings from A: u = u₁u₂...uₙ where each uᵢ ∈ A. Similarly, v = v₁v₂...vₘ where each vⱼ ∈ A.
Therefore, w = uv = u₁u₂...uₙv₁v₂...vₘ is also a concatenation of zero or more strings from A, which means w ∈ A*.
Since A* ⊆ A*A* and A*A* ⊆ A*, we conclude that A*A* = A*. ∎
Language Quotient Operations
Beyond the standard set operations and Kleene closure, languages support quotient operations that "divide" one language by another. These operations are fundamental in formal language theory and provide insights into the structure of regular and context-free languages.
Definition: Left and Right Quotients
Given languages A, B ⊆ Σ*:
- The left quotient of A by B is: B\A = {w ∈ Σ* | ∃u ∈ B, uw ∈ A}
- The right quotient of A by B is: A/B = {w ∈ Σ* | ∃u ∈ B, wu ∈ A}
When B = {u} is a singleton, we write u\A and A/u for the quotients by the single string u.
Examples of Quotient Operations
Example 1: Simple Quotients
Let A = {ab, abc, bc} and B = {a, b}.
Left Quotient B\A:
- For u = a: strings in A starting with a are {ab, abc} → remainders {b, bc}
- For u = b: strings in A starting with b are {bc} → remainder {c}
- Result: B\A = {b, bc, c}
Right Quotient A/B:
- For u = a: no strings in A end with a → no remainders
- For u = b: strings in A ending with b are {ab} → remainder {a}
- Result: A/B = {a}
Example 2: Quotient by a Single String
Let L = {abc, abcd, def, abef}.
- ab\L: Remove ab from the left → {c, cd, ef}
- L/ef: Remove ef from the right → {d, ab}
- xyz\L: No strings start with xyz → ∅
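For finite languages, both quotients are one-line computations straight from the definitions. A sketch reproducing the examples above (the helper names are ours):

```python
def left_quotient(B: set[str], A: set[str]) -> set[str]:
    """B\\A = {w | exists u in B with u + w in A}."""
    return {a[len(u):] for a in A for u in B if a.startswith(u)}

def right_quotient(A: set[str], B: set[str]) -> set[str]:
    """A/B = {w | exists u in B with w + u in A}."""
    return {a[:len(a) - len(u)] for a in A for u in B if a.endswith(u)}

A = {"ab", "abc", "bc"}
B = {"a", "b"}
assert left_quotient(B, A) == {"b", "bc", "c"}
assert right_quotient(A, B) == {"a"}

L = {"abc", "abcd", "def", "abef"}
assert left_quotient({"ab"}, L) == {"c", "cd", "ef"}
assert right_quotient(L, {"ef"}) == {"d", "ab"}
assert left_quotient({"xyz"}, L) == set()
```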
Theorem: Basic Properties of Quotient Operations
For languages A, A₁, A₂, B ⊆ Σ* and strings u, v ∈ Σ*:
- Empty quotients: ∅\A = A/∅ = ∅
- Epsilon quotients: ε\A = A/ε = A
- Distributivity over union: B\(A₁ ∪ A₂) = (B\A₁) ∪ (B\A₂)
- Associativity: u\(v\A) = (vu)\A and (A/u)/v = A/(vu)
- Quotient-concatenation relation: if B ≠ ∅, then A ⊆ B\(BA) and A ⊆ (AB)/B
Proof: Proof of the Quotient-Concatenation Relation
We prove that A ⊆ B\(BA) for any languages A, B with B ≠ ∅.
Let w ∈ A. We need to show that w ∈ B\(BA), i.e., that there exists u ∈ B such that uw ∈ BA.
Since B ≠ ∅, choose any u ∈ B. Then uw is the concatenation of u ∈ B and w ∈ A, so uw ∈ BA. Hence w ∈ B\(BA).
The inclusion can be proper. For example, with A = {ε} and B = {a, aa}, we have BA = {a, aa} and B\(BA) = {ε, a} ⊋ A.
In general, unfolding the definition gives B\C = {w ∈ Σ* | Bw ∩ C ≠ ∅}, where Bw = {uw | u ∈ B}.
This shows that quotient operations interact with concatenation in a way that reflects the "cancellation" property of string division, generalized to sets of strings: dividing BA by B on the left recovers at least A, but possibly more.
Theorem: Quotients and Regular Languages
If A is a regular language and B is any language (not necessarily regular), then:
- B\A and A/B are regular languages
- In particular, the quotient of a regular language by a single string is always regular
This closure property is fundamental to the theory of regular languages and is used extensively in automata constructions and language recognition algorithms.
Worked Example: Quotients in Regular Languages
Language: L = {strings over {a,b} ending with 'ab'}
In regex notation: (a|b)*ab
Right Quotient L/{b}:
Remove b from the right side of strings in L.
- Strings in L: ab, aab, bab, aaab, abab, baab, ...
- All end with ab, so removing b gives strings ending with a
- Result: L/{b} = {strings ending with 'a'}
- In regex: (a|b)*a
Right Quotient L/{ab}:
Remove ab from the right side of strings in L.
- Every string in L ends with ab
- Removing ab leaves an arbitrary prefix
- Result: L/{ab} = (a|b)* (all strings)
Left Quotient {a}\L:
Remove a from the left side of strings in L.
- A string w belongs to {a}\L exactly when aw ends with ab
- If |w| ≥ 2, the last two symbols of aw are the last two symbols of w, so w must end with ab
- If w = b, then aw = ab ∈ L, so b also qualifies
- Result: {a}\L = {b} ∪ {strings ending with 'ab'}
- In regex: b|(a|b)*ab
Verification of Regularity:
All quotient results are expressible as regular expressions, confirming that regular languages are closed under quotient operations.
Theorem: Quotient Hierarchy and Iteration
For languages A, B ⊆ Σ*, define the sequence of iterated quotients:
- A₀ = A
- Aₙ₊₁ = B\Aₙ for n ≥ 0
Then:
- If A is regular, then each Aₙ is regular
- If A is regular, the sequence (Aₙ) is eventually periodic, because a regular language has only finitely many distinct quotients
- ⋃ₙ≥₀ Aₙ represents the "quotient closure" of A under B
Applications of Quotient Operations
1. String Pattern Matching
Quotients are used in pattern matching algorithms to represent "what remains to be matched" after processing part of a pattern.
2. Automata Construction
States in finite automata can be viewed as representing quotient languages - each state corresponds to the set of strings that lead to acceptance from that state.
3. Language Equation Solving
Quotients help solve language equations of the form AX = B
(find language X
such that concatenating A
with X
gives B
).
4. Formal Language Decision Problems
Many decision problems about regular languages can be reduced to computing and analyzing quotient languages.
String Morphisms and Substitutions
String morphisms are systematic transformations that map symbols to strings according to fixed rules. These transformations preserve the structure of concatenation and provide powerful tools for generating complex sequences and analyzing language properties.
Definition: String Homomorphism
A string homomorphism is a function h: Σ* → Δ* such that:
- h(ε) = ε
- h(xy) = h(x)h(y) for all x, y ∈ Σ*
A homomorphism is completely determined by its values on the alphabet: if h(a) is defined for each a ∈ Σ, then for any string w = a₁a₂...aₙ:
h(w) = h(a₁)h(a₂)...h(aₙ)
Examples of String Homomorphisms
Example 1: Binary Duplication
Define h: {a,b}* → {a,b}* by:
- h(a) = aa
- h(b) = bb
Applications:
- h(ab) = h(a)h(b) = aa·bb = aabb
- h(aba) = aa·bb·aa = aabbaa
- h(ε) = ε
Example 2: Encoding Homomorphism
Define g: {a,b,c}* → {0,1}* by:
- g(a) = 01
- g(b) = 10
- g(c) = 00
Applications:
- g(abc) = 01·10·00 = 011000
- g(cab) = 00·01·10 = 000110
Example 3: Thue-Morse Generator
Define t: {0,1}* → {0,1}* by:
- t(0) = 01
- t(1) = 10
Iteration generates the Thue-Morse sequence:
- t⁰(0) = 0
- t¹(0) = 01
- t²(0) = 0110
- t³(0) = 01101001
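Since a homomorphism is determined by its table of values on single symbols, it can be sketched as a dictionary lookup; the Thue-Morse iteration then falls out directly (the factory name `homomorphism` is ours):

```python
def homomorphism(rules: dict[str, str]):
    """Build h: Sigma* -> Delta* from its values on single symbols."""
    def h(w: str) -> str:
        return "".join(rules[c] for c in w)
    return h

# Example 1: binary duplication
h = homomorphism({"a": "aa", "b": "bb"})
assert h("ab") == "aabb"
assert h("aba") == "aabbaa"
assert h("") == ""

# Example 3: iterating the Thue-Morse morphism t(0)=01, t(1)=10 from "0"
t = homomorphism({"0": "01", "1": "10"})
w, seen = "0", ["0"]
for _ in range(3):
    w = t(w)
    seen.append(w)
assert seen == ["0", "01", "0110", "01101001"]

# Length formula: |h(w)| = sum over a of |w|_a * |h(a)|
w2 = "abab"
assert len(h(w2)) == w2.count("a") * 2 + w2.count("b") * 2
```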
Theorem: Properties of String Homomorphisms
For string homomorphisms h: Σ* → Δ* and g: Δ* → Γ*:
- Composition: g ∘ h: Σ* → Γ* is a homomorphism
- Length: |h(w)| = ∑_{a∈Σ} |w|_a · |h(a)|, where |w|_a is the number of occurrences of a in w
- Reversal: h(wᴿ) = (h(w))ᴿ for all w if and only if h(a)ᴿ = h(a) for all a ∈ Σ
- Iteration: hⁿ(w) = h(hⁿ⁻¹(w)) for n ≥ 1
Definition: String Substitution
A string substitution is a function σ: Σ → P(Δ*) that assigns to each symbol a ∈ Σ a language σ(a) ⊆ Δ*.
The substitution extends to strings by:
- σ(ε) = {ε}
- σ(wa) = σ(w) · σ(a) for w ∈ Σ*, a ∈ Σ
And to languages by: σ(L) = ⋃_{w∈L} σ(w)
Proof: Proof of the Homomorphism Length Formula
We prove by induction on the length of w that |h(w)| = ∑_{a∈Σ} |w|_a · |h(a)|.
Base case: w = ε.
|h(ε)| = |ε| = 0, and ∑_{a∈Σ} |ε|_a · |h(a)| = ∑_{a∈Σ} 0 · |h(a)| = 0 ✓
Inductive step: Assume the formula holds for the string w. Consider wa for some a ∈ Σ.
|h(wa)| = |h(w)h(a)| = |h(w)| + |h(a)|
By the induction hypothesis: |h(w)| = ∑_{b∈Σ} |w|_b · |h(b)|
Now, |wa|_b = |w|_b for b ≠ a, and |wa|_a = |w|_a + 1.
Therefore:
∑_{b∈Σ} |wa|_b · |h(b)| = ∑_{b≠a} |w|_b · |h(b)| + (|w|_a + 1) · |h(a)| = ∑_{b∈Σ} |w|_b · |h(b)| + |h(a)| = |h(w)| + |h(a)| = |h(wa)| ✓
Theorem: Fixed Points and Morphic Sequences
A string w is a fixed point of a homomorphism h if h(w) = w.
An infinite sequence s = s₀s₁s₂... is morphic if there exists a homomorphism h and a symbol a such that s = lim_{n→∞} hⁿ(a).
Many famous sequences are morphic, including the Thue-Morse sequence, the Fibonacci word, and the dragon curve sequence.
Worked Example: Fibonacci Word Generation
Fibonacci Homomorphism:
Define f: {a,b}* → {a,b}* by:
- f(a) = ab
- f(b) = a
Iteration starting from a:
- f⁰(a) = a
- f¹(a) = ab
- f²(a) = f(ab) = f(a)f(b) = ab·a = aba
- f³(a) = f(aba) = ab·a·ab = abaab
- f⁴(a) = f(abaab) = ab·a·ab·ab·a = abaababa
Length Analysis:
- Lengths: 1, 2, 3, 5, 8, 13, 21, ... (Fibonacci numbers!)
- Each step: |fⁿ⁺¹(a)| = |fⁿ(a)| + |fⁿ⁻¹(a)|
- This follows from the recurrence structure of the homomorphism
Properties of the Fibonacci Word:
- Infinite limit: abaababaabaababaababaabaab...
- Not eventually periodic
- Contains no fourth power (no substring of the form xxxx), although it does contain cubes
- Optimal for avoiding repetitive patterns
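A short sketch generating the iterates fⁿ(a) and confirming that their lengths are Fibonacci numbers (the function name `fib_word` is ours):

```python
def fib_word(n: int) -> str:
    """n-th iterate f^n(a) of the morphism f(a)=ab, f(b)=a."""
    w = "a"
    for _ in range(n):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w

words = [fib_word(n) for n in range(5)]
assert words == ["a", "ab", "aba", "abaab", "abaababa"]

# Lengths follow the Fibonacci recurrence |f^{n+1}(a)| = |f^n(a)| + |f^{n-1}(a)|
lengths = [len(fib_word(n)) for n in range(8)]
assert lengths == [1, 2, 3, 5, 8, 13, 21, 34]

# Each iterate is a prefix of the next, so the limit word is well defined.
assert all(fib_word(n + 1).startswith(fib_word(n)) for n in range(6))
```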
Language Growth and Density
Languages can be classified by how many strings they contain at each length and how these counts grow with length. This combinatorial perspective reveals deep structural properties and provides tools for comparing the complexity of different language classes.
Definition: Growth Function and Density
For a language L ⊆ Σ*, define:
- Growth function: g_L(n) = |L ∩ Σⁿ| (the number of strings of length n in L)
- Cumulative growth: G_L(n) = ∑ᵢ₌₀ⁿ g_L(i) (the total number of strings of length ≤ n in L)
- Density function: d_L(n) = g_L(n) / |Σ|ⁿ (the fraction of strings of length n that are in L)
Examples of Language Growth
Example 1: Even-length strings over {a,b}
L₁ = {w ∈ {a,b}* | |w| is even}
- g_L₁(n) = 2ⁿ if n is even, 0 if n is odd
- d_L₁(n) = 1 if n is even, 0 if n is odd
- Cumulative: G_L₁(n) = 1 + 4 + 16 + ... = (4^(⌊n/2⌋+1) - 1)/3
Example 2: Palindromes over {a,b}
L₂ = {w ∈ {a,b}* | w = wᴿ}
- g_L₂(n) = 2^⌈n/2⌉ (choose the first half; the second half is determined)
- d_L₂(n) = 2^⌈n/2⌉ / 2ⁿ = 2^(⌈n/2⌉-n)
- Density approaches 0 as n → ∞ (palindromes become rare)
Example 3: Strings with equally many a's and b's
L₃ = {w ∈ {a,b}* | |w|_a = |w|_b}
- g_L₃(n) = C(n, n/2) if n is even, 0 if n is odd
- For even n: d_L₃(n) ≈ √(2/(πn)) (by Stirling's approximation)
- Density decreases like 1/√n
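The growth functions above can be checked by brute-force enumeration for small n; a sketch (the helper `growth` is ours):

```python
from itertools import product
from math import comb

def growth(predicate, n: int, sigma: str = "ab") -> int:
    """g_L(n): count length-n strings over sigma satisfying the predicate."""
    return sum(1 for t in product(sigma, repeat=n)
               if predicate("".join(t)))

# Palindromes: g(n) = 2^ceil(n/2); note (n + 1) // 2 == ceil(n/2)
for n in range(8):
    assert growth(lambda w: w == w[::-1], n) == 2 ** ((n + 1) // 2)

# Equally many a's and b's: g(n) = C(n, n/2) for even n, else 0
for n in range(8):
    expected = comb(n, n // 2) if n % 2 == 0 else 0
    assert growth(lambda w: w.count("a") == w.count("b"), n) == expected
```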
Theorem: Growth Rate Classification
Languages can be classified by their growth rates:
- Polynomial growth: G_L(n) = O(nᵏ) for some k. Examples: finite languages, length-bounded languages, a*
- Exponential growth: G_L(n) = Θ(cⁿ) for some c > 1. Examples: Σ* for |Σ| ≥ 2, and the strings over {a,b} avoiding aa (growth ≈ φⁿ, as computed below)
- Intermediate growth: G_L(n) grows faster than every polynomial but slower than every exponential. Examples: certain number-theoretic and specially constructed languages
Theorem: Density and Measure Theory
For languages over an alphabet Σ with |Σ| ≥ 2, three regimes of limiting density are of particular interest:
- Zero density: lim_{n→∞} d_L(n) = 0 (asymptotically, almost no strings are in L)
- Positive density: liminf_{n→∞} d_L(n) > 0 (from some length on, a positive fraction of strings of each length is in L)
- Full density: lim_{n→∞} d_L(n) = 1 (asymptotically, almost all strings are in L)
Most "interesting" languages have zero density - they become increasingly rare among all possible strings.
Proof: Zero Density for Languages of Subexponential Growth
Not every infinite context-free language has zero density: Σ* itself is context-free, infinite, and has density 1 at every length. The correct general principle is that any language whose growth is exponentially smaller than |Σ|ⁿ has zero density.
Suppose g_L(n) ≤ p(n) · cⁿ for some constant c < |Σ| and some polynomial p. Then:
d_L(n) = g_L(n) / |Σ|ⁿ ≤ p(n) · (c/|Σ|)ⁿ
Since c/|Σ| < 1, the exponentially decaying factor dominates the polynomial one, so lim_{n→∞} d_L(n) = 0.
This covers many familiar context-free examples. The palindromes over {a,b} satisfy g_L(n) = 2^⌈n/2⌉ ≤ 2·(√2)ⁿ with √2 < 2. The balanced language {w | |w|_a = |w|_b} does not satisfy the hypothesis (its growth C(n, n/2) is not exponentially smaller than 2ⁿ), but Stirling's approximation gives d_L(n) ≈ √(2/(πn)) → 0, so it too has zero density.
Worked Example: Growth Analysis of a Regular Language
Language: Strings over {a,b} that do not contain aa
Let L = {w ∈ {a,b}* | w does not contain aa}.
Recurrence Analysis:
Let f(n) = the number of valid strings of length n.
- f(0) = 1 (the empty string)
- f(1) = 2 (the strings a, b)
- For n ≥ 2: f(n) = f(n-1) + f(n-2)
Explanation of the recurrence:
- If the string ends with b: the previous n-1 symbols can be any valid string → f(n-1) choices
- If the string ends with a: the previous symbol must be b, so the last two symbols are ba → f(n-2) choices
Solution:
This is the Fibonacci recurrence! f(n) = Fₙ₊₂
- Values: 1, 2, 3, 5, 8, 13, 21, 34, 55, ...
- Asymptotic formula: f(n) ≈ φⁿ⁺²/√5, where φ = (1+√5)/2 ≈ 1.618
Density Analysis:
d_L(n) = f(n) / 2ⁿ ≈ (φⁿ⁺²/√5) / 2ⁿ = (φ²/√5) · (φ/2)ⁿ
- Since φ/2 ≈ 0.809 < 1, we have d_L(n) → 0
- The language has zero density despite being infinite and regular
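A sketch comparing the recurrence against brute-force enumeration for small lengths (the helper names `count_no_aa` and `brute` are ours):

```python
from itertools import product

def count_no_aa(n: int) -> int:
    """f(n) via the recurrence f(n) = f(n-1) + f(n-2), f(0)=1, f(1)=2."""
    a, b = 1, 2  # f(0), f(1)
    if n == 0:
        return a
    for _ in range(n - 1):
        a, b = b, a + b
    return b

def brute(n: int) -> int:
    """Count length-n strings over {a,b} with no occurrence of 'aa'."""
    return sum(1 for t in product("ab", repeat=n)
               if "aa" not in "".join(t))

for n in range(10):
    assert count_no_aa(n) == brute(n)

assert [count_no_aa(n) for n in range(6)] == [1, 2, 3, 5, 8, 13]
```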
Combinatorial String Theory
Theorem: String Enumeration and Counting
For an alphabet Σ with |Σ| = k:
- The number of strings of length n over Σ is kⁿ
- The number of strings of length ≤ n over Σ is ∑ᵢ₌₀ⁿ kⁱ = (kⁿ⁺¹ - 1)/(k - 1) for k > 1
- The number of non-empty strings over Σ of length ≤ n is ∑ᵢ₌₁ⁿ kⁱ = (kⁿ⁺¹ - k)/(k - 1) for k > 1
Proof: Proof of the String Counting Formula
To construct a string of length n over an alphabet Σ with |Σ| = k, we have k choices from Σ for the 1st position, k choices for the 2nd position, ..., and k choices for the nth position.
By the multiplication principle of combinatorics, the total number of possible strings is k × k × ... × k (n times) = kⁿ.
For the number of strings of length ≤ n, we sum the number of strings of each length from 0 to n:
∑ᵢ₌₀ⁿ kⁱ = 1 + k + k² + ... + kⁿ
This is a geometric series with first term a = 1 and common ratio r = k. Using the formula for the sum of a geometric series:
∑ᵢ₌₀ⁿ kⁱ = (1 - kⁿ⁺¹)/(1 - k) = (kⁿ⁺¹ - 1)/(k - 1) for k > 1
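The geometric-series formula can be sanity-checked against the direct sum (the helper name `count_upto` is ours):

```python
def count_upto(k: int, n: int) -> int:
    """Number of strings of length <= n over a k-symbol alphabet, by direct sum."""
    return sum(k ** i for i in range(n + 1))

# Closed form (k^{n+1} - 1)/(k - 1) for k > 1; the division is exact.
for k in (2, 3, 10):
    for n in range(6):
        assert count_upto(k, n) == (k ** (n + 1) - 1) // (k - 1)
```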
Theorem: Growth Rate of String Sets
Let Σ be an alphabet with |Σ| = k ≥ 2, and let Σⁿ be the set of all strings of length n over Σ. Then:
- |Σⁿ| = kⁿ, which grows exponentially with n
- lim_{n→∞} |Σⁿ|/|Σⁿ⁻¹| = k
- For any language L ⊆ Σ*, the function f(n) counting the strings in L of length n satisfies f(n) ≤ kⁿ
Summary
- Σ: A finite, non-empty set of symbols (alphabet)
- Σ*: The set of all finite strings over Σ, including ε
- ε: The empty string (length 0)
- String: A finite sequence of symbols from the alphabet
- |s|: Length of string s (number of symbols)
- Language: Any subset of Σ* (a set of strings)
- String Operations: Concatenation (s·t), Reversal (sᴿ)
- Language Operations: Union (L₁∪L₂), Intersection (L₁∩L₂), Concatenation (L₁L₂), Kleene Star (L*), Kleene Plus (L⁺), Complement (L̄)
- Fundamental Theorems: Cardinality of Σ* (countable) and P(Σ*) (uncountable), distributive properties of operations
- String Counting: With alphabet size k, there are kⁿ strings of length n
Suggested Reading
- Hopcroft, Motwani, and Ullman, Introduction to Automata Theory, Languages, and Computation – Chapter 1.5: Strings and Languages
- Sipser, M. Introduction to the Theory of Computation – Chapter 0: Introduction
- Kozen, D. Automata and Computability – Chapter 1: Languages and Strings
- Lewis, H. and Papadimitriou, C. Elements of the Theory of Computation – Chapter 1.1-1.2: Mathematical Preliminaries