UCLA MathLing

Slide 1

On Formalizing Syntax
James Rogers
Dept. of Computer Science, Earlham College
jrogers@cs.earlham.edu

Slide 2

Formalization of Syntax

[Diagram. Node labels: Natural Language; Actual Linguistic Structures?; (Linguistic) Theory of Syntax; Language as a Set of Mathematical Objects (Strings/Trees/...); Generative Grammar; Mathematical Automata. Link label: FLT.]
Slide 3

Formalization of Syntax

[Diagram. Node labels: Natural Language; Actual Linguistic Structures?; (Linguistic) Theory of Syntax; Formal Theory of Syntax (Logical Formulae); Language as a Set of Mathematical Objects (Strings/Trees/...); Generative Grammar; Mathematical Automata; Logical Axioms. Link labels: FLT; Logical Consequence; Model-Theoretic Satisfaction.]

Slide 4

Formalization of Syntax

[Diagram. Same nodes and links as Slide 3 (Natural Language; Actual Linguistic Structures?; (Linguistic) Theory of Syntax; Formal Theory of Syntax (Logical Formulae); Language as a Set of Mathematical Objects (Strings/Trees/...); Generative Grammar; Mathematical Automata; Logical Axioms; FLT; Logical Consequence; Model-Theoretic Satisfaction), with one added link label: FMT.]
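Slide 5

Word Models

$\langle \mathcal{D}, \triangleright, \triangleright^{+}, P_\sigma \rangle_{\sigma \in \Sigma}$   ($<$)
$\langle \mathcal{D}, \triangleright, P_\sigma \rangle_{\sigma \in \Sigma}$   ($+1$)

$\mathcal{D}$: finite
$\triangleright^{+}$: linear order on $\mathcal{D}$
$\triangleright$: successor wrt $\triangleright^{+}$
$P_\sigma$: partition $\mathcal{D}$

For $w \in \Sigma^*$:  $w \equiv \langle \mathcal{D}_w, (\triangleright)_w, (\triangleright^{+})_w, P^w_\sigma \rangle_{\sigma \in \Sigma}$

$\mathcal{D}_w \stackrel{\text{def}}{=} \{ i \mid 0 \leq i < |w| \}$
$(\triangleright)_w \stackrel{\text{def}}{=} \{ \langle i, i+1 \rangle \mid 0 \leq i < |w| - 1 \}$
$(\triangleright^{+})_w \stackrel{\text{def}}{=} \{ \langle i, j \rangle \mid 0 \leq i < j < |w| \}$
$P^w_\sigma \stackrel{\text{def}}{=} \{ i \mid w = u \cdot \sigma \cdot v,\ |u| = i \}$

$A \cdot B \stackrel{\text{def}}{=} \langle \mathcal{D}_A \uplus \mathcal{D}_B,\ (\triangleright)_{A \cdot B},\ (\triangleright^{+})_A \cup (\triangleright^{+})_B \cup (\mathcal{D}_A \times \mathcal{D}_B),\ P^A_\sigma \uplus P^B_\sigma \rangle$

Slide 6

$k$-grams ($k$-factors)

$F_k(w) \stackrel{\text{def}}{=} \{ w \}$ if $|w| < k$; $\{ y \mid w = x \cdot y \cdot z,\ |y| = k \}$ otherwise.

$F_k(L) \stackrel{\text{def}}{=} \{ F_k(w) \mid w \in L \}$

Strictly $k$-Local Definitions

$\mathcal{G} \subseteq F_k(\{\rtimes\} \cdot \Sigma^* \cdot \{\ltimes\})$
$w \models \mathcal{G} \stackrel{\text{def}}{\iff} F_k(\rtimes \cdot w \cdot \ltimes) \subseteq \mathcal{G}$
$L(\mathcal{G}) \stackrel{\text{def}}{=} \{ w \mid w \models \mathcal{G} \}$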
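The $k$-factor and strictly $k$-local definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `k_factors` and `satisfies`, the representation of strings as tuples of symbols, and the sample grammar `G` are all assumptions made for the example.

```python
LEFT, RIGHT = "⋊", "⋉"  # boundary markers (assumed encoding)

def k_factors(w, k):
    """F_k(w): {w} if |w| < k, otherwise every contiguous length-k substring."""
    w = tuple(w)
    if len(w) < k:
        return frozenset({w})
    return frozenset(w[i:i + k] for i in range(len(w) - k + 1))

def satisfies(w, grammar, k):
    """w |= G  iff  F_k(⋊·w·⋉) is a subset of G."""
    augmented = (LEFT,) + tuple(w) + (RIGHT,)
    return k_factors(augmented, k) <= grammar

# Hypothetical strictly 2-local grammar: start with a, alternate a/b, end with b.
G = frozenset({(LEFT, "a"), ("a", "b"), ("b", "a"), ("b", RIGHT)})

print(satisfies("abab", G, 2))  # True
print(satisfies("abba", G, 2))  # False: the 2-factor (b, b) is not in G
```

Membership depends only on which $k$-factors occur in the boundary-marked string, not on where or how often they occur.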
Slide 7

Scanners

[Figure: a scanner with a window of width $k$ moving over the input string a b a b ...; surviving labels: D, Q, $\mathcal{G}$, $\in$, $\phi$.]

Slide 8

Strictly Local Generation

2-factors of the grammar:

⋊ the, ⋊ Bob, ⋊ Alice
the dog, the biscuit
dog likes, biscuit likes, Bob likes, Alice likes
dog slept, biscuit slept, Bob slept, Alice slept
likes the, likes Bob, likes Alice
dog ⋉, biscuit ⋉, Bob ⋉, Alice ⋉, slept ⋉

Generated strings:

⋊ Alice likes the dog ⋉
⋊ the dog slept ⋉
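The 2-factor grammar of Slide 8 can be written down directly and used both to recognize and to generate strings. The sketch below is illustrative only: the names `G`, `accepts`, and `generate`, the whitespace tokenization, and the lowercase "the" are assumptions, and the word-count bound in `generate` is there only to keep the enumeration finite.

```python
LEFT, RIGHT = "⋊", "⋉"

# The 2-factors listed on Slide 8, as (first, second) pairs.
G = {
    (LEFT, "the"), (LEFT, "Bob"), (LEFT, "Alice"),
    ("the", "dog"), ("the", "biscuit"),
    ("dog", "likes"), ("biscuit", "likes"), ("Bob", "likes"), ("Alice", "likes"),
    ("dog", "slept"), ("biscuit", "slept"), ("Bob", "slept"), ("Alice", "slept"),
    ("likes", "the"), ("likes", "Bob"), ("likes", "Alice"),
    ("dog", RIGHT), ("biscuit", RIGHT), ("Bob", RIGHT), ("Alice", RIGHT),
    ("slept", RIGHT),
}

def accepts(sentence):
    """True iff every adjacent pair of ⋊ sentence ⋉ is a licensed 2-factor."""
    tokens = [LEFT] + sentence.split() + [RIGHT]
    return all(pair in G for pair in zip(tokens, tokens[1:]))

def generate(max_words):
    """Enumerate every licensed string with at most max_words words."""
    results = []

    def extend(prefix):
        last = prefix[-1]
        for first, second in sorted(G):
            if first != last:
                continue
            if second == RIGHT:
                results.append(" ".join(prefix[1:]))
            elif len(prefix) - 1 < max_words:
                extend(prefix + [second])

    extend([LEFT])
    return results

print(accepts("Alice likes the dog"))  # True
print(accepts("the dog slept"))        # True
print(accepts("likes the dog"))        # False: (⋊, likes) is not licensed
```

Because only adjacent pairs are checked, the grammar also licenses strings such as "Alice likes Bob likes the dog", which is the kind of overgeneration a strictly 2-local description cannot rule out.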
Slide 9

Character of Strictly 2-Local Sets

Theorem (Suffix Substitution Closure): A stringset $L$ is strictly 2-local iff whenever there is a word $x$ and strings $w$, $y$, $v$, and $z$, such that

$w \cdot x \cdot y \in L$
$v \cdot x \cdot z \in L$

then it will also be the case that

$w \cdot x \cdot z \in L$

Example:

The dog · likes · the biscuit $\in L$
Alice · likes · Bob $\in L$
The dog · likes · Bob $\in L$

Slide 10

Character of (General) Strictly Local Sets

Theorem (General Suffix Substitution Closure): A stringset $L$ is Strictly Local iff there is some $k$ such that whenever there is a string $x$ of length $k - 1$ and strings $w$, $y$, $v$, and $z$, such that

$w \cdot x \cdot y \in L$
$v \cdot x \cdot z \in L$

then it will also be the case that

$w \cdot x \cdot z \in L$
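Suffix substitution closure can be probed on a finite sample of strings. The following sketch (the function name `ssc_violations` and the tuple encoding are assumptions, not from the slides) returns every string $w \cdot x \cdot z$ that the closure condition demands but the sample lacks. A finite sample can only exhibit such witnesses; it cannot establish that an infinite stringset is strictly local.

```python
def ssc_violations(language, k):
    """Strings w·x·z required by suffix substitution closure but missing.

    `language` is a finite collection of symbol sequences; x ranges over
    shared factors of length k - 1.
    """
    lang = {tuple(s) for s in language}
    n = k - 1
    missing = set()
    for s in lang:                      # s = w · x · y
        for i in range(len(s) - n + 1):
            w, x = s[:i], s[i:i + n]
            for t in lang:              # t = v · x · z
                for j in range(len(t) - n + 1):
                    if t[j:j + n] == x:
                        candidate = w + x + t[j + n:]
                        if candidate not in lang:
                            missing.add(candidate)
    return missing

sample = [
    "the dog likes the biscuit".split(),
    "Alice likes Bob".split(),
]
for string in sorted(ssc_violations(sample, 2)):
    print(" ".join(string))
```

On this two-sentence sample the output includes "the dog likes Bob", matching the example on Slide 9: any strictly 2-local set containing both sentences must contain that string too.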
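Slide 11

$k$-Expressions

$f \in F_k(\rtimes \cdot \Sigma^* \cdot \ltimes)$:  $w \models f \stackrel{\text{def}}{\iff} f \in F_k(\rtimes \cdot w \cdot \ltimes)$
$\varphi \wedge \psi$:  $w \models \varphi \wedge \psi \stackrel{\text{def}}{\iff} w \models \varphi$ and $w \models \psi$
$\neg \varphi$:  $w \models \neg \varphi \stackrel{\text{def}}{\iff} w \not\models \varphi$

Locally $k$-Testable Languages (LT$_k$):

$L(\varphi) \stackrel{\text{def}}{=} \{ w \mid w \models \varphi \}$

$\text{SL}_k \equiv \bigwedge_{f_i \notin \mathcal{G}} [\neg f_i] \subsetneq \text{LT}_k$

Slide 12

LT Automata

[Figure: a scanner over the input string a b a b ... whose detected $k$-factors feed a Boolean network computing $\phi$.]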
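A $k$-expression is a Boolean combination of tests for the presence of individual $k$-factors, so it can be evaluated after a single scan that collects the factor set. The sketch below assumes a nested-tuple encoding of expressions, `("factor", f)`, `("and", p, q)`, `("not", p)`; that encoding and the names `k_factors` and `holds` are illustrative choices, not notation from the slides.

```python
LEFT, RIGHT = "⋊", "⋉"

def k_factors(w, k):
    """F_k(w): {w} if |w| < k, otherwise every contiguous length-k substring."""
    w = tuple(w)
    if len(w) < k:
        return frozenset({w})
    return frozenset(w[i:i + k] for i in range(len(w) - k + 1))

def holds(w, expr, k):
    """Evaluate a k-expression on w against the k-factors of ⋊·w·⋉."""
    factors = k_factors((LEFT,) + tuple(w) + (RIGHT,), k)

    def evaluate(e):
        op = e[0]
        if op == "factor":
            return tuple(e[1]) in factors
        if op == "and":
            return evaluate(e[1]) and evaluate(e[2])
        if op == "not":
            return not evaluate(e[1])
        raise ValueError(f"unknown operator: {op!r}")

    return evaluate(expr)

# Hypothetical 2-expression: some "ab" occurs and no "bb" occurs.
phi = ("and", ("factor", ("a", "b")), ("not", ("factor", ("b", "b"))))

print(holds("aab", phi, 2))  # True
print(holds("abb", phi, 2))  # False: "bb" occurs
print(holds("ba", phi, 2))   # False: "ab" does not occur
```

Requiring that some factor be present (the positive `("factor", ...)` test) is what takes LT$_k$ beyond SL$_k$, which can only forbid factors.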
Slide 13

Character of Locally Testable Sets

Locally Testable Sets: A stringset $L$ over $\Sigma$ is Locally Testable iff (by definition) there is some $k$-expression $\varphi$ over $\Sigma$ (for some $k$) such that $L$ is the set of all strings that satisfy $\varphi$.

$L_\varphi = \{ x \in \Sigma^* \mid x \models \varphi \}$

Theorem ($k$-Test Invariance): A stringset $L$ is Locally Testable iff there is some $k$ such that, for all strings $x$ and $y$, if $\rtimes \cdot x \cdot \ltimes$ and $\rtimes \cdot y \cdot \ltimes$ have exactly the same set of $k$-factors then either both $x$ and $y$ are members of $L$ or neither is.

Slide 14

FO($<$) (Strings)

$\langle \mathcal{D}, \triangleright, \triangleright^{+}, P_\sigma \rangle_{\sigma \in \Sigma}$

First-order quantification over positions in the strings

$x \triangleright y$:  $w, [x \mapsto i, y \mapsto j] \models x \triangleright y \stackrel{\text{def}}{\iff} j = i + 1$
$x \triangleright^{+} y$:  $w, [x \mapsto i, y \mapsto j] \models x \triangleright^{+} y \stackrel{\text{def}}{\iff} i < j$
$P_\sigma(x)$:  $w, [x \mapsto i] \models P_\sigma(x) \stackrel{\text{def}}{\iff} i \in P_\sigma$
$\varphi \wedge \psi$:  $\vdots$
$\neg \varphi$:  $\vdots$
$(\exists x)[\varphi(x)]$:  $w, s \models (\exists x)[\varphi(x)] \stackrel{\text{def}}{\iff} w, s[x \mapsto i] \models \varphi(x)$ for some $i \in \mathcal{D}$
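Slide 15

Locally Testable with Order (LTO$_k$)

LT$_k$ plus

$\varphi \bullet \psi$:  $w \models \varphi \bullet \psi \stackrel{\text{def}}{\iff} w = w_1 \cdot w_2$, $w_1 \models \varphi$ and $w_2 \models \psi$.

Definition 1 (Star-Free Set) The class of Star-Free Sets (SF) is the smallest class of languages satisfying:

• $\emptyset \in$ SF, $\{\varepsilon\} \in$ SF, and $\{\sigma\} \in$ SF for each $\sigma \in \Sigma$.
• If $L_1, L_2 \in$ SF then: $L_1 \cdot L_2 \in$ SF, $L_1 \cup L_2 \in$ SF, $\overline{L_1} \in$ SF.

Theorem 1 (McNaughton and Papert) A set of strings is $k$-Locally Testable with Order (LTO$_k$) iff it is Star-Free.

Slide 16

FO($<$) over Strings and LTO

$w \models ab \iff w \models (\exists x, y)[x \triangleright y \wedge P_a(x) \wedge P_b(y)]$
$w \models \varphi \bullet \psi \iff w \models (\exists x)[\varphi^{<x}(x) \wedge \psi^{\geq x}(x)]$
$w \models \sigma\ltimes \iff w \models P_\sigma(\max)$
$w \models \max \approx \max \iff w \models f \vee \neg f$
$w \models \max \approx \min \iff w \models \bigvee_{\sigma \in \Sigma} [\rtimes \sigma \ltimes]$
$w \models (\exists x)[\varphi(x)] \iff w \models (\exists x)\big[\bigvee_{\langle \varphi_i, \psi_i \rangle \in S_\varphi} [\varphi_i^{<x}(x) \wedge \psi_i^{\geq x}(x)]\big]$

$S_\varphi$ finite, $\text{qr}(\varphi_i), \text{qr}(\psi_i) < \text{qr}((\exists x)[\varphi(x)])$.

Theorem 2 (McNaughton and Papert) A set of strings is First-order definable over $\langle \mathcal{D}, \triangleright, \triangleright^{+}, P_\sigma \rangle_{\sigma \in \Sigma}$ iff it is Star-Free.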
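The first equivalence on Slide 16, between the 2-factor $ab$ and the sentence $(\exists x, y)[x \triangleright y \wedge P_a(x) \wedge P_b(y)]$, can be checked by brute force over small word models. This is a sketch under assumptions: `word_model` follows the Slide 5 definitions, while the function names and the exhaustive quantifier loop are illustrative choices rather than anything prescribed by the slides.

```python
from itertools import product

def word_model(w):
    """Build (D_w, successor, linear order, {sigma: P_sigma}) for the string w."""
    n = len(w)
    domain = range(n)
    succ = {(i, i + 1) for i in range(n - 1)}
    less = {(i, j) for i in range(n) for j in range(i + 1, n)}
    labels = {}
    for i, symbol in enumerate(w):
        labels.setdefault(symbol, set()).add(i)
    return domain, succ, less, labels

def fo_has_ab(w):
    """Evaluate (∃x,y)[x ⊳ y ∧ P_a(x) ∧ P_b(y)] by trying every assignment."""
    domain, succ, _less, labels = word_model(w)
    a_positions = labels.get("a", set())
    b_positions = labels.get("b", set())
    return any(
        (x, y) in succ and x in a_positions and y in b_positions
        for x, y in product(domain, repeat=2)
    )

def factor_has_ab(w):
    """The 2-factor test: does 'ab' occur as a contiguous substring of w?"""
    return "ab" in w

# Compare the two tests on every string over {a, b, c} of length at most 5.
agree = all(
    fo_has_ab("".join(s)) == factor_has_ab("".join(s))
    for n in range(6)
    for s in product("abc", repeat=n)
)
print(agree)  # True
```

The quantifier loop is quadratic in the length of the string, which is fine for a sanity check but is not how such formulas would be evaluated in practice.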
Slide 17

Character of First-Order Definable Sets

Theorem (McNaughton and Papert): A stringset $L$ is Star-Free iff it is recognized by a finite-state automaton that is non-counting (that has an aperiodic syntactic monoid), that is, iff:

there exists some $n > 0$ such that for all strings $u, v, w$ over $\Sigma$, if $u v^n w$ occurs in $L$ then $u v^{n+i} w$, for all $i \geq 1$, occurs in $L$ as well.

E.g. ($n = 2$):

my father’s father’s father resembled my father $\in L$
my father’s father’s (father’s)$^{\geq 1}$ father resembled my father $\in L$

Slide 18

FO($+1$) (Strings)

$\langle \mathcal{D}, \triangleright, P_\sigma \rangle_{\sigma \in \Sigma}$

First-order quantification (over positions in the strings)

Theorem 3 (Thomas) A set of strings is First-order definable over $\langle \mathcal{D}, \triangleright, P_\sigma \rangle_{\sigma \in \Sigma}$ iff it is Locally Threshold Testable.

Definition 2 (Locally Threshold Testable) A set $L$ is Locally Threshold Testable (LTT) iff there is some $k$ and $t$ such that, for all $w, v \in \Sigma^*$:

if for all $f \in F_k(\rtimes \cdot w \cdot \ltimes) \cup F_k(\rtimes \cdot v \cdot \ltimes)$ either $|w|_f = |v|_f$ or both $|w|_f \geq t$ and $|v|_f \geq t$,

then $w \in L \iff v \in L$.
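The condition in Definition 2 is an equivalence test on pairs of strings: count each $k$-factor, but only up to the threshold $t$. The sketch below assumes that $|w|_f$ counts occurrences of $f$ in the boundary-marked string $\rtimes \cdot w \cdot \ltimes$; that reading, and the names `factor_counts` and `ltt_equivalent`, are assumptions made for illustration.

```python
from collections import Counter

LEFT, RIGHT = "⋊", "⋉"

def factor_counts(w, k):
    """Number of occurrences of each k-factor in ⋊·w·⋉."""
    s = (LEFT,) + tuple(w) + (RIGHT,)
    if len(s) < k:
        return Counter({s: 1})
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def ltt_equivalent(w, v, k, t):
    """True iff every k-factor occurs equally often in w and v,
    or at least t times in both."""
    cw, cv = factor_counts(w, k), factor_counts(v, k)
    return all(
        cw[f] == cv[f] or (cw[f] >= t and cv[f] >= t)
        for f in set(cw) | set(cv)
    )

print(ltt_equivalent("abab", "ababab", 2, 2))      # False: "ba" occurs 1 vs 2 times
print(ltt_equivalent("ababab", "abababab", 2, 2))  # True: all counts equal or >= 2
print(ltt_equivalent("abab", "ababab", 2, 1))      # True: t = 1 only tests presence
```

With $t = 1$ the test reduces to comparing sets of $k$-factors, which is the Locally Testable case; larger thresholds let a description count occurrences, but only up to $t$.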
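Slide 19

MSO (Strings)

$\langle \mathcal{D}, \triangleright, \triangleright^{+}, P_\sigma \rangle_{\sigma \in \Sigma}$

First-order quantification (positions)
Monadic second-order quantification (sets of positions)

$\triangleright^{+}$ is MSO-definable from $\triangleright$.

Slide 20

MSO Example

\[
\begin{aligned}
(\exists X_0, X_1)[\ & (\forall x, y)[(X_0(x) \wedge x \triangleright y) \rightarrow X_0(y)] \ \wedge \\
& (\forall x)[C(x) \rightarrow X_0(x)] \ \wedge \\
& (\exists x)[X_0(x) \wedge B(x)] \ \wedge \\
& (\forall x, y)[(X_1(x) \wedge x \triangleright y) \rightarrow X_1(y)] \ \wedge \\
& (\forall x)[B(x) \rightarrow X_0(x)] \ \wedge \\
& \neg(\exists x)[A(x) \wedge X_1(x)] \ ]
\end{aligned}
\]

[Figure: the string a c a c b b b annotated with the set variables; $X_1$ labels four positions and $X_0$ labels six positions.]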
Slide 21

Automata for MSO

Encoding of set-variable labels as states: $\emptyset \mapsto 0$, $\{X_0\} \mapsto 1$, $\{X_1\} \mapsto 2$, $\{X_0, X_1\} \mapsto 3$.

[Figure: the string a c a c b b b with its $X_0$ and $X_1$ annotations, followed by the tiles (state, symbol, state) of an automaton over the symbols a, b, c with states 0 to 3 and boundary markers ⋊ and ⋉, and a sample run. The individual tiles are not recoverable from the extracted text.]

Theorem 4 (Chomsky, Schützenberger) A set of strings is Regular iff it is a homomorphic image of a Strictly 2-Local set.

Slide 22

Definition (Nerode Equivalence): Two strings $w$ and $v$ are Nerode Equivalent with respect to a stringset $L$ over $\Sigma$ (denoted $w \equiv_L v$) iff for all strings $u$ over $\Sigma$, $wu \in L \iff vu \in L$.

Theorem 5 (Myhill-Nerode): A stringset $L$ is recognizable by a FSA (over strings) iff $\equiv_L$ partitions the set of all strings over $\Sigma$ into finitely many equivalence classes.

Theorem 6 (Büchi, Elgot) A set of strings is MSO-definable over $\langle \mathcal{D}, \triangleright, \triangleright^{+}, P_\sigma \rangle_{\sigma \in \Sigma}$ iff it is regular.

Theorem 7 MSO = $\exists$MSO over strings.

SL $\subsetneq$ FO($+1$) = LTT $\subsetneq$ FO($<$) = SF $\subsetneq$ MSO = Reg. (strings)
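Nerode equivalence quantifies over all suffixes $u$, so it cannot be decided by enumeration, but a bounded search for a distinguishing suffix illustrates the definition. In the sketch below, the function name `distinguishing_suffix`, the length bound `max_len`, and the example language (an even number of a's) are all assumptions for illustration. Finding a suffix proves two strings inequivalent; finding none within the bound proves nothing in general.

```python
from itertools import product

def distinguishing_suffix(w, v, member, alphabet, max_len):
    """Shortest u with |u| <= max_len such that exactly one of w·u, v·u is in L.

    `member` is a membership predicate for L. Returns None if no such
    suffix exists within the bound.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            u = "".join(letters)
            if member(w + u) != member(v + u):
                return u
    return None

# Hypothetical example language: strings over {a, b} with an even number of a's.
def even_as(s):
    return s.count("a") % 2 == 0

print(repr(distinguishing_suffix("a", "aa", even_as, "ab", 3)))   # '' (the empty suffix)
print(repr(distinguishing_suffix("a", "aaa", even_as, "ab", 3)))  # None
```

For this example language there are exactly two Nerode classes (even and odd counts of a), consistent with the Myhill-Nerode characterization of stringsets recognizable by an FSA.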