Preliminaries | Algebraic characterizations | Results in this paper

An Algebraic Characterization of the Strictly Piecewise Languages

Jie Fu (1), Jeffrey Heinz (2), and Herbert G. Tanner (1)
(1) Department of Mechanical Engineering
(2) Department of Linguistics and Cognitive Science
University of Delaware

May 24, 2011
TAMC 2011, University of Electro-Communications, Chofu, Japan

This talk

1. The Strictly Piecewise (SP) languages are those formal languages which are closed under subsequence.
2. They are a proper subclass of the regular languages; i.e., they are subregular.
3. This talk provides an algebraic characterization of this class: they are exactly those regular languages which are wholly nonzero and right annihilating.

*This research is supported by grant #1035577 from the National Science Foundation.

Outline

1. Preliminaries
2. Algebraic characterizations
3. Results in this paper

Subregular Hierarchies

[Figure: proper inclusion relationships among subregular language classes (indicated from top to bottom). Nodes, from top to bottom: Regular; Star-Free = NonCounting; TSL, LTT; LT, PT; SL, SP. Two labels are added in successive builds of the slide: "substrings, successor" and "subsequences, precedence".]

TSL  Tier-based Strictly Local       PT  Piecewise Testable
LTT  Locally Threshold Testable      SL  Strictly Local
LT   Locally Testable                SP  Strictly Piecewise

(McNaughton and Papert 1971, Simon 1975, Rogers and Pullum in press, Rogers et al. 2010, Heinz 2010, Heinz et al. 2011)

Why subregular languages?

1. They provide an interesting measure of pattern complexity.
2. For particular domains, subregular language classes better characterize the patterns we are interested in.
   • Phonology!
   • Robotics!

We wish to obtain a better understanding of these classes.

While much work characterizes subregular classes algebraically (Eilenberg, Pin, Straubing, ...), none has addressed the SP class.

Measure of language complexity

Sequences of As and Bs which end in B:
  $(A + B)^* B \in SL$

Sequences of As and Bs with an odd number of Bs:
  $(A^* B A^* B A^*)^* A^* B A^* \notin$ star-free

[Figure: minimal deterministic finite-state automata for the two languages, each with two states, 0 and 1, and transitions labeled A and B.]

Conclusion: The size of the DFA as given by the Nerode equivalence relation doesn't capture these distinctions.

Samala Chumash Phonotactics

Knowledge of word well-formedness

possible Chumash words    impossible Chumash words
StoyonowonowaS            stoyonowonowaS
stoyonowonowas            Stoyonowonowas
pisotonosikiwat           pisotonoSikiwat

1. What formal language describes this pattern?
2. By the way, StoyonowonowaS means 'it stood upright' (Applegate 1972)

Subsequences and Shuffle Ideals

Definition (Subsequence)
$u$ is a subsequence of $w$ iff $u = a_0 a_1 \cdots a_n$ and $w \in \Sigma^* a_0 \Sigma^* a_1 \Sigma^* \cdots \Sigma^* a_n \Sigma^*$. We write $u \sqsubseteq_s w$.

Definition (Strictly Piecewise languages, SP)
The Strictly Piecewise languages are those closed under subsequence. I.e., $L \in SP$ if and only if for all $w \in \Sigma^*$,
$w \in L \Leftrightarrow (\forall u \sqsubseteq_s w)\,[u \in L]$.

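The subsequence relation defined above can be checked with a single left-to-right scan. The following is a minimal sketch, not code from the talk; the function name `is_subsequence` is an illustrative assumption, and the empty string is treated as a subsequence of every word.

```python
def is_subsequence(u: str, w: str) -> bool:
    """Return True if u can be obtained from w by deleting zero or more symbols."""
    remaining = iter(w)
    # Each symbol of u must be found, in order, in the part of w not yet consumed.
    return all(symbol in remaining for symbol in u)


if __name__ == "__main__":
    print(is_subsequence("ac", "abcd"))  # symbols of "ac" occur in order in "abcd"
    print(is_subsequence("ca", "abcd"))  # order matters, so this one fails
```

The scan consumes `w` at most once, so the check runs in time linear in the length of `w`.
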
Shuffle Ideals

Definition (Shuffle Ideal)
The shuffle ideal of $u$ is $SI(u) = \{ w : u \sqsubseteq_s w \}$.

Example
$SI(aa) = \Sigma^* a \Sigma^* a \Sigma^*$.

Note: the complement $\overline{SI(u)}$ is the set of all words not containing the subsequence $u$.

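The example $SI(aa) = \Sigma^* a \Sigma^* a \Sigma^*$ suggests reading a shuffle ideal directly as a regular expression. The sketch below does this with Python's `re` module; the helper names are illustrative assumptions, and $\Sigma$ is taken to be the set of all characters.

```python
import re


def shuffle_ideal_pattern(u: str) -> "re.Pattern[str]":
    """Build the regular expression Sigma* a_0 Sigma* a_1 ... a_n Sigma* for u."""
    # re.escape guards against symbols that are regex metacharacters.
    parts = [re.escape(symbol) for symbol in u]
    # DOTALL lets '.' stand for any symbol, including a newline.
    return re.compile(".*" + ".*".join(parts) + ".*", re.DOTALL)


def in_shuffle_ideal(u: str, w: str) -> bool:
    """Return True if w is in SI(u), i.e. u is a subsequence of w."""
    return shuffle_ideal_pattern(u).fullmatch(w) is not None


if __name__ == "__main__":
    print(shuffle_ideal_pattern("aa").pattern)
    print(in_shuffle_ideal("aa", "abca"))
```

A word is in the complement of $SI(u)$ exactly when `in_shuffle_ideal(u, w)` is false.
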
Theorem (Rogers et al. 2010)
$L \in SP$ iff there exists a finite set $S \subset \Sigma^*$ such that
$L = \bigcap_{w \in S} \overline{SI(w)}$.

In other words, every Strictly Piecewise language has a finite basis $S$, the set of forbidden subsequences.

(see also Haines 1969, Higman 1952)

Samala Chumash pattern is SP

$L = \bigcap_{w \in S} \overline{SI(w)}$, with $S = \{ sS, Ss \}$

possible Chumash words    impossible Chumash words
StoyonowonowaS            stoyonowonowaS
stoyonowonowas            Stoyonowonowas
pisotonosikiwat           pisotonoSikiwat

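Membership in a Strictly Piecewise language given by a finite set of forbidden subsequences can be tested word by word. The sketch below applies this to the Chumash data above; the function names are illustrative assumptions, and the symbols `s` and `S` are treated as two distinct letters, as they appear in this transcript.

```python
def is_subsequence(u: str, w: str) -> bool:
    """Return True if u can be obtained from w by deleting zero or more symbols."""
    remaining = iter(w)
    return all(symbol in remaining for symbol in u)


def in_sp_language(w: str, forbidden: set) -> bool:
    """Return True if w contains none of the forbidden subsequences."""
    return not any(is_subsequence(u, w) for u in forbidden)


# Forbidden subsequences from the slide: S = {sS, Ss}.
FORBIDDEN = {"sS", "Ss"}

POSSIBLE = ["StoyonowonowaS", "stoyonowonowas", "pisotonosikiwat"]
IMPOSSIBLE = ["stoyonowonowaS", "Stoyonowonowas", "pisotonoSikiwat"]

if __name__ == "__main__":
    for word in POSSIBLE + IMPOSSIBLE:
        print(word, in_sp_language(word, FORBIDDEN))
```

Every word in the first list is accepted and every word in the second is rejected, regardless of how far apart the two sibilants are.
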
Strictly Local

Definition (Factor)
$u$ is a factor of $w$ ($u \sqsubseteq_f w$) iff $\exists x, y \in \Sigma^*$ such that $w = xuy$.

Example
$bc \sqsubseteq_f abcd$.

Definition (Strictly Local, SL)
A language is Strictly Local (*) iff there is a finite set of forbidden factors $S \subset \Sigma^*$ such that
$L = \bigcap_{w \in S} \overline{\Sigma^* w \Sigma^*}$.

Example
$L = \overline{\Sigma^* aa \Sigma^*}$ belongs to SL.

(*) Technically, special symbols are used to demarcate the beginnings and ends of words. They are ignored here for exposition.

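To contrast forbidden factors with forbidden subsequences, the sketch below tests the same words against both kinds of constraint. The function names are illustrative assumptions, and, as on the slide, word-boundary symbols are ignored.

```python
def is_subsequence(u: str, w: str) -> bool:
    """Return True if u can be obtained from w by deleting zero or more symbols."""
    remaining = iter(w)
    return all(symbol in remaining for symbol in u)


def avoids_factors(w: str, forbidden: set) -> bool:
    """Strictly Local style test: no forbidden string occurs contiguously in w."""
    # For Python strings, `u in w` is exactly the factor (substring) relation.
    return not any(u in w for u in forbidden)


def avoids_subsequences(w: str, forbidden: set) -> bool:
    """Strictly Piecewise style test: no forbidden string occurs as a subsequence."""
    return not any(is_subsequence(u, w) for u in forbidden)


if __name__ == "__main__":
    for word in ["abbb", "abab", "baab"]:
        print(word, avoids_factors(word, {"aa"}), avoids_subsequences(word, {"aa"}))
```

The word `abab` separates the two notions: it has no factor `aa`, yet `aa` is one of its subsequences.
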
Piecewise and Locally Testable

Subsequences
$P_{\leq k}(w) = \{ u : u \sqsubseteq_s w \text{ and } |u| \leq k \}$

Example
$P_{\leq 2}(abcd) = \{ \lambda, a, b, c, d, ab, ac, ad, bc, bd, cd \}$.

Definition: A language $L$ is Piecewise Testable iff there exists some $k \in \mathbb{N}$ such that for all $u, v \in \Sigma^*$:
$[P_{\leq k}(u) = P_{\leq k}(v)] \Rightarrow [u \in L \Leftrightarrow v \in L]$

Factors
$F_k(w) = \{ u : u \sqsubseteq_f w \text{ and } |u| = k \}$

Example
$F_2(abcd) = \{ ab, bc, cd \}$.

Definition: A language $L$ is Locally Testable iff there exists some $k \in \mathbb{N}$ such that for all $u, v \in \Sigma^*$:
$[F_k(u) = F_k(v)] \Rightarrow [u \in L \Leftrightarrow v \in L]$

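The two sets $P_{\leq k}(w)$ and $F_k(w)$ can be computed directly from their definitions. The following is a brute-force sketch for small inputs; the function names are illustrative assumptions, and the empty string `""` plays the role of $\lambda$.

```python
from itertools import combinations


def subsequences_up_to(w: str, k: int) -> set:
    """P_{<=k}(w): all subsequences of w of length at most k."""
    # combinations keeps the original order of symbols, so each tuple is a subsequence.
    return {"".join(c) for n in range(k + 1) for c in combinations(w, n)}


def factors_of_length(w: str, k: int) -> set:
    """F_k(w): all factors of w of length exactly k (empty if w is shorter than k)."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


if __name__ == "__main__":
    print(sorted(subsequences_up_to("abcd", 2)))
    print(sorted(factors_of_length("abcd", 2)))
```

Two words such as `ab` and `ba` have the same $P_{\leq 1}$ but different $P_{\leq 2}$, which illustrates why the choice of $k$ matters in the definitions above.
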