Let G be a context-free grammar for L in Chomsky Normal Form. Let k be the number of variables in G.
Let p = 2k and let s be any string in L such that |s| ≥ p.
Consider a parse tree for s. Since s has at least 2k symbols and every internal node in the parse tree has at most 2 children (since G is in CNF), the height of the parse tree must be at least k+1. Therefore, there must be some path from the root to a leaf with at least k+1 internal nodes.
By the pigeonhole principle, this path must contain at least one repeated variable. Let A be a variable that appears at least twice in this path, and let's denote the two occurrences as A1 and A2.
The subtree rooted at A1 contains the subtree rooted at A2. Let v be the string derived from A2, and let vwx be the string derived from A1. Thus, s can be written as uvwxy where:
u is the part of s derived from the part of the parse tree before reaching A1v is the part of s derived from the part of the parse tree between A1 and A2w is the part of s derived from A2x is the part of s derived from the part of the parse tree after A2 but still within A1y is the part of s derived from the part of the parse tree after A1
Since A1 and A2 are the same variable, we can replace the subtree rooted at A2 with the entire subtree rooted at A1, which would insert an extra copy of vwx. Similarly, we could remove the entire subtree rooted at A2 and replace it with just w.
This gives us that uviwxiy ∈ L for any i ≥ 0. Also, since the height of the parse tree is bounded by 2|s|, we know that |vwx| ≤ p. Finally, v and x can't both be empty since that would mean A1 and A2 derive the same string, which contradicts the assumption that they are distinct occurrences.