ACL2010: Heinz and Rogers

Slide 1
Estimating Strictly Piecewise Distributions

Jeffrey Heinz
Dept. of Linguistics and Cognitive Science
University of Delaware
heinz@udel.edu

James Rogers
Dept. of Computer Science
Earlham College
jrogers@cs.earlham.edu

http://cs.earlham.edu/~jrogers/slides/acl2010talk.ho.pdf

Slide 2
Regular Models of Long-Distance Dependencies

“. . . we wish to escape the linear tyranny of these n-gram models and HMM tagging models, and to start to explore more complex notions of grammar.” (Manning and Schütze, 1999)

Samala (Chumash): [+anterior] (e.g., [s], [ts]) do not occur after [-anterior] (e.g., [ʃ], [tʃ])

[ʃtojonowonowaʃ] ‘it stood upright’
*[ʃtojonowonowas]

$$\Sigma^* \cdot (\text{[ʃ]} + \text{[tʃ]}) \cdot \Sigma^* \cdot (\text{[s]} + \text{[ts]}) \cdot \Sigma^*$$
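Slide 3
n-gram Models of Language

[Figure: probabilistic bigram automaton over {a, b, c} with boundary symbol ♯ and numeric edge probabilities]

$$\Pr_L(\sigma_1 \cdots \sigma_n) = \Pr_L(\sigma_1 \mid \sharp) \cdot \prod_{1 < i \le n} [\Pr_L(\sigma_i \mid \sigma_{i-1})] \cdot \Pr_L(\sharp \mid \sigma_n)$$

$$F_k(w) \overset{\text{def}}{=} \{ v \in \Sigma^k \mid w \in \Sigma^* \cdot v \cdot \Sigma^* \}$$

$$F^M_k(w) \overset{\text{def}}{=} \{\!\{ v \in \Sigma^k \mid w \in \Sigma^* \cdot v \cdot \Sigma^* \}\!\}$$

$$\Pr_L(w) = \prod_{v \cdot \sigma \in F^M_k(\sharp \cdot w \cdot \sharp)} [\Pr_L(\sigma \mid v)]$$

Slide 4
Strictly k-Local Languages (SL_k)

[Figure: automaton over {a, b, c} with boundary symbol ♯]

$$T_M \overset{\text{def}}{=} \{ v\sigma \in F_k(\sharp \cdot \Sigma^* \cdot \sharp) \mid \delta(v, \sigma)\downarrow \}$$

$$L(M) = \{ w \in \Sigma^* \mid F_k(w) \subseteq T_M \}$$

$$L \in \mathrm{SL}_k \overset{\text{def}}{\iff} L \text{ is } L(M) \text{ for some } k\text{-scanner } M$$

$$L \in \mathrm{SL} \overset{\text{def}}{\iff} (\exists k)[L \in \mathrm{SL}_k]$$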
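The factor definitions on Slide 3 can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names, the ASCII boundary symbol "#", and the dictionary layout of the conditional probabilities are hypothetical and do not come from the slides.

```python
from collections import Counter

BOUNDARY = "#"  # assumption: ASCII stand-in for the word-boundary symbol

def k_factors(w, k):
    """F_k(w): the set of length-k contiguous substrings of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def k_factor_multiset(w, k):
    """F^M_k(w): the same factors, counted with multiplicity."""
    return Counter(w[i:i + k] for i in range(len(w) - k + 1))

def bigram_probability(w, cond_prob):
    """Product of Pr(sigma | v) over the bigrams v.sigma of #w#.

    cond_prob is a hypothetical dict mapping (v, sigma) to a probability;
    missing entries are treated as probability 0.
    """
    p = 1.0
    factors = k_factor_multiset(BOUNDARY + w + BOUNDARY, 2)
    for factor, count in factors.items():
        p *= cond_prob.get((factor[0], factor[1]), 0.0) ** count
    return p
```

With a toy table such as `{("#", "a"): 0.5, ("a", "b"): 0.5, ("b", "#"): 0.5}`, calling `bigram_probability("ab", table)` multiplies the three bigram probabilities of `#ab#`.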
Slide 5
Subsequences

v is a subsequence of w:

$$v \sqsubseteq w \overset{\text{def}}{\iff} v = \sigma_1 \cdots \sigma_k \text{ and } w \in \Sigma^* \cdot \sigma_1 \cdot \Sigma^* \cdots \Sigma^* \cdot \sigma_k \cdot \Sigma^*$$

$$P_k(w) \overset{\text{def}}{=} \{ v \in \Sigma^k \mid v \sqsubseteq w \}$$

$$P_{\le k}(w) \overset{\text{def}}{=} \bigcup_{0 < i \le k} [P_i(w)]$$

$$P^M_k(w) \overset{\text{def}}{=} \{\!\{ v \sqsubseteq w \}\!\}$$

Would like:

$$\Pr_L(w) = \prod_{v \cdot \sigma \in P^M_{\le k}(w)} [\Pr_L(\sigma \mid v)]$$

Slide 6
Initial Model

[Figure: probabilistic automaton over {a, b, c} whose states are the sets {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}, with numeric edge probabilities]

$$Q = \mathcal{P}(P_{\le k}(\Sigma^*))$$

Let $w = v \cdot \sigma \cdot u$, $q = \hat{\delta}(\{\varepsilon\}, v)$:

$$T(q, \sigma) = \Pr_L(\sigma \mid P_{\le k}(v) = q)$$
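The subsequence sets of Slide 5 are easy to compute for short strings. The Python sketch below follows the slide's index range 0 < i ≤ k, so the empty string is left out unless requested; the function names and the `include_empty` flag are hypothetical additions for illustration.

```python
from itertools import combinations

def is_subsequence(v, w):
    """True if the symbols of v occur in w in the same order (v ⊑ w)."""
    remaining = iter(w)
    # `symbol in remaining` consumes the iterator up to the first match,
    # so later symbols of v can only match later positions of w.
    return all(symbol in remaining for symbol in v)

def P_k(w, k):
    """P_k(w): all subsequences of w of length exactly k."""
    return {"".join(choice) for choice in combinations(w, k)}

def P_le_k(w, k, include_empty=False):
    """P_{<=k}(w): union of P_i(w) for 0 < i <= k.

    include_empty=True also adds the empty string, which appears in the
    state labels of the automata drawn on the slides.
    """
    result = {""} if include_empty else set()
    for i in range(1, k + 1):
        result |= P_k(w, i)
    return result
```

Enumerating all index combinations is exponential in general, so this is meant only to make the definitions tangible on small inputs.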
Slide 7
PT-Automata

[Figure: automaton over {a, b, c} whose states are the sets {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}]

Slide 8
Piecewise-Testable Languages (PT)

$$\mathrm{SI}(w) \overset{\text{def}}{=} \{ v \in \Sigma^* \mid w \sqsubseteq v \}$$

L is Piecewise Testable $\overset{\text{def}}{\iff}$ L is a finite Boolean combination of principal shuffle ideals.

P_k-expressions

Atoms: $v \in P_{\le k}(\Sigma^*)$

$$w \models v \overset{\text{def}}{\iff} w \in \mathrm{SI}(v) \quad \text{(i.e., } v \sqsubseteq w\text{)}$$

Operators: Truth functional connectives

$$L \in \mathrm{PT}_k \iff L = \{ w \in \Sigma^* \mid w \models \varphi \} \text{ for some } P_k\text{-expression } \varphi$$
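A P_k-expression is a Boolean combination of subsequence atoms, so the satisfaction relation of Slide 8 can be evaluated directly on a string. The Python sketch below is one possible encoding; the combinator names (`atom`, `neg`, `conj`, `disj`) are hypothetical and not taken from the slides.

```python
def is_subsequence(v, w):
    """True if v ⊑ w, i.e. w lies in the principal shuffle ideal SI(v)."""
    remaining = iter(w)
    return all(symbol in remaining for symbol in v)

def atom(v):
    """Atom v: satisfied by w exactly when v ⊑ w."""
    return lambda w: is_subsequence(v, w)

def neg(phi):
    """Negation of an expression."""
    return lambda w: not phi(w)

def conj(*phis):
    """Conjunction of any number of expressions."""
    return lambda w: all(phi(w) for phi in phis)

def disj(*phis):
    """Disjunction of any number of expressions."""
    return lambda w: any(phi(w) for phi in phis)

# Example expression: contains "ab" as a subsequence but not "ba".
ab_without_ba = conj(atom("ab"), neg(atom("ba")))
```

Here `ab_without_ba("aabb")` holds, while `ab_without_ba("aba")` fails because "ba" is a subsequence of "aba".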
Slide 9
PT-Automata and P_k-expressions

[Figure: automaton over {a, b, c} whose states are the sets {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}]

$$F_\varphi = \Big\{ q \in \mathcal{P}(P_{\le k}(\Sigma^*)) \;\Big|\; \Big( \bigwedge_{s \in q} [s] \wedge \bigwedge_{s \notin q} [\neg s] \Big) \rightarrow \varphi \Big\}$$

$$L(M_\varphi) = \{ w \in \Sigma^* \mid w \models \varphi \}$$

Slide 10
Subregular Hierarchies

[Figure: diagram of the subregular hierarchies; labels: Reg, MSO, SF, FO, LTT, LT, PT, Prop, SL, SP, Fin, +1, <]
Slide 11
Strictly Piecewise Testable Languages (SP)

The following are equivalent:
1. $L \in \mathrm{SP}$
2. L is the set of strings satisfying a finite conjunction of negative P_k-literals.
3. $L = \bigcap_{w \in S} [\overline{\mathrm{SI}(w)}]$, S finite,
4. $(\exists k)[P_{\le k}(w) \subseteq P_{\le k}(L) \Rightarrow w \in L]$,
5. $w \in L$ and $v \sqsubseteq w \Rightarrow v \in L$ (L is subsequence closed),
6. $L = \overline{\mathrm{SI}(X)}$, $X \subseteq \Sigma^*$ (L is the complement of a shuffle ideal).

Slide 12
DFA representation of SP_k languages

Let M be a trimmed minimal DFA recognizing an SP_k language. Then:
1. All states of M are accepting states.
2. If $\delta(q, \sigma)\uparrow$ then there is some $s \in P_{\le k}(\{ w \mid \hat{\delta}(q_0, w) = q \})$ such that for all $q' \in Q$

$$s \in P_{\le k}(\{ w \mid \hat{\delta}(q_0, w) = q' \}) \Rightarrow \delta(q', \sigma)\uparrow$$

Consequently, for all $q_1, q_2 \in Q$ and $\sigma \in \Sigma$, if $\delta(q_1, \sigma)\uparrow$ and $\hat{\delta}(q_1, w) = q_2$ for some $w \in \Sigma^*$ then $\delta(q_2, \sigma)\uparrow$.

(Missing edges propagate down.)
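Characterizations 2 and 5 on Slide 11 suggest a direct membership test: a string belongs to an SP language when it contains none of a finite set of forbidden subsequences, and every subsequence of a member is again a member. The Python sketch below illustrates this with a toy grammar; the forbidden set `{"Ss"}` (with ASCII "S" standing in for [ʃ]) and all function names are assumptions made for the example.

```python
from itertools import combinations

def is_subsequence(v, w):
    """True if v ⊑ w."""
    remaining = iter(w)
    return all(symbol in remaining for symbol in v)

def in_sp_language(w, forbidden):
    """True if w contains none of the forbidden subsequences."""
    return not any(is_subsequence(v, w) for v in forbidden)

def subsequences(w):
    """All subsequences of w (exponential; for small checks only)."""
    return {
        "".join(choice)
        for length in range(len(w) + 1)
        for choice in combinations(w, length)
    }

# Toy grammar (assumption): no "s" anywhere after an "S".
FORBIDDEN = {"Ss"}

def is_subsequence_closed_on(w, forbidden):
    """Check property 5 for one member w: all its subsequences are members."""
    return all(in_sp_language(v, forbidden) for v in subsequences(w))
```

With this toy grammar, `in_sp_language("StojonowonowaS", FORBIDDEN)` holds and `in_sp_language("Stojonowonowas", FORBIDDEN)` does not, mirroring the Samala pair on Slide 2.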
Slide 13
SP_k-automata

[Figure: automaton over {a, b, c} whose states are the sets {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}]

$$Q = \mathcal{P}(P_{\le k-1}(\Sigma^*))$$

Size of automaton: $\Theta(2^{\mathrm{card}(\Sigma)^k})$

Slide 14
Factored SP_k-automata

[Figure: two factor automata over {a, b, c}, labeled SI(aa) and SI(bc)]
Slide 15
SP-PDFA

[Figure: factor PDFAs over {a, b}, one for each of ε, a, b, aa, ab, ba, bb]

Slide 16
Product PDFAs

Co-emission Probability

$$\mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle) = \prod_{i=1}^{n} T_i(q_i, \sigma)$$

$$\mathrm{CF}(\langle q_1 \ldots q_n \rangle) = \prod_{i=1}^{n} F_i(q_i)$$

$$Z(\langle q_1 \ldots q_n \rangle) = \mathrm{CF}(\langle q_1 \ldots q_n \rangle) + \sum_{\sigma \in \Sigma} \mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle)$$

$$F(\langle q_1 \ldots q_n \rangle) = \frac{\mathrm{CF}(\langle q_1 \ldots q_n \rangle)}{Z(\langle q_1 \ldots q_n \rangle)}$$

$$T(\langle q_1 \ldots q_n \rangle, \sigma) = \frac{\mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle)}{Z(\langle q_1 \ldots q_n \rangle)}$$
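The co-emission formulas on Slide 16 amount to multiplying the factor probabilities and renormalizing by Z. The Python sketch below shows that computation for a single product state; the data layout (each factor as a pair of dictionaries) and the function name are assumptions chosen for the illustration.

```python
from math import prod

def co_emission(factors, states, alphabet):
    """Normalized final and transition probabilities of a product state.

    factors:  list of (T_i, F_i) pairs, where T_i maps (state, symbol) to a
              transition probability and F_i maps state to a final probability
              (hypothetical layout; missing transitions count as 0).
    states:   the component states <q_1 ... q_n>, one per factor.
    alphabet: iterable of symbols.
    """
    # CF: product of the factors' final probabilities.
    cf = prod(F[q] for (_, F), q in zip(factors, states))
    # CT: for each symbol, product of the factors' transition probabilities.
    ct = {
        symbol: prod(T.get((q, symbol), 0.0) for (T, _), q in zip(factors, states))
        for symbol in alphabet
    }
    # Z: normalization constant, CF plus the sum of CT over the alphabet.
    z = cf + sum(ct.values())
    if z == 0:
        raise ValueError("product state has no outgoing probability mass")
    final = cf / z
    transitions = {symbol: value / z for symbol, value in ct.items()}
    return final, transitions
```

After normalization the final probability and the transition probabilities of the product state sum to 1, which is what makes the product a PDFA again.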
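Slide 17
Product PDFAs: k-sets

Positive Co-emission Probability

$$\mathrm{PCT}(\langle \sigma, q_\varepsilon \ldots q_u \rangle) = \prod_{\substack{q_w \in \langle q_\varepsilon \ldots q_u \rangle \\ q_w = w}} T_w(q_w, \sigma)$$

$$\mathrm{PCF}(\langle q_\varepsilon \ldots q_u \rangle) = \prod_{\substack{q_w \in \langle q_\varepsilon \ldots q_u \rangle \\ q_w = w}} F_w(q_w)$$

$$Z(\langle q_1 \ldots q_n \rangle) = \mathrm{PCF}(\langle q_1 \ldots q_n \rangle) + \sum_{\sigma \in \Sigma} \mathrm{PCT}(\langle \sigma, q_1 \ldots q_n \rangle)$$

Let $q = \langle \varepsilon, \varepsilon, b, aa, a, ba, b \rangle$:

$$\mathrm{CT}(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_a(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ab}(a, a) \cdot T_{ba}(ba, a) \cdot T_{bb}(b, a)$$

$$\mathrm{PCT}(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ba}(ba, a)$$

Slide 18
Complexity

Number of automata:

$$\sum_{0 \le i < k} [\mathrm{card}(\Sigma)^i] = \Theta(\mathrm{card}(\Sigma)^{k-1})$$

Number of states:

$$\sum_{0 \le i < k} [(i + 1)\,\mathrm{card}(\Sigma)^i] = \Theta(k\,\mathrm{card}(\Sigma)^{k-1})$$

ML estimation: $n = \sum_{w \in S} [|w|]$ (size of corpus); $\Theta(n\,\mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)

$\Pr_L(w)$: $\Theta(n\,\mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)

Parameters: Only final states matter

$$\mathrm{card}(\Sigma) \cdot \Theta(\mathrm{card}(\Sigma)^{k-1}) = \Theta(\mathrm{card}(\Sigma)^k) \quad \text{(Same)}$$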
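The counts on Slide 18 and the positive factors on Slide 17 can be checked on the worked example. The Python sketch below assumes that the example uses Σ = {a, b} and k = 3 (inferred from the seven factors T_ε, T_a, ..., T_bb); the function names and the dictionary encoding of the product state are hypothetical.

```python
def num_factor_automata(sigma_size, k):
    """Sum of card(Sigma)^i for 0 <= i < k: one factor per string shorter than k."""
    return sum(sigma_size ** i for i in range(k))

def num_factor_states(sigma_size, k):
    """Sum of (i + 1) * card(Sigma)^i for 0 <= i < k."""
    return sum((i + 1) * sigma_size ** i for i in range(k))

def positive_factors(state):
    """Factors w whose component state q_w equals w itself.

    state maps each factor's string w to its current component state q_w;
    these are the factors that contribute to PCT and PCF.
    """
    return [w for w, q in state.items() if q == w]

# The product state q = <ε, ε, b, aa, a, ba, b> from the slide, keyed by factor
# (the empty Python string stands for ε).
EXAMPLE_STATE = {
    "": "", "a": "", "b": "b",
    "aa": "aa", "ab": "a", "ba": "ba", "bb": "b",
}
```

For this state, `positive_factors(EXAMPLE_STATE)` selects the factors for ε, b, aa and ba, which are exactly the four terms appearing in PCT(a, q), and `num_factor_automata(2, 3)` gives the seven factors of CT(a, q).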
Slide 19
Remaining issues

• Estimation undercounts
  – counts number of k-sequences that start with first prefix: $\Theta(n)$
  – actual number $\binom{n}{k} \in \Theta(2^n)$
• Want probability to depend on multiset of subsequences
  – infinitely many states
  – but probability of n occurrences is (probability of occurrence)$^n$
  – same number of parameters/still linear time
• Not Regular distribution
  – Not clear that there is a corresponding class of distributions over strings

Slide 20
Summary

SP-Distributions
• Regular distribution
  Model (some) long distance dependencies
• Asymptotic complexity same as SL-distributions (n-gram models)
• SL-distributions can’t model long distance dependencies
  SP-distributions can’t model local ones
• Both are classes of Regular distributions
  Combination is straightforward
Slide 21
Results of SP_2 estimation on the Samala corpus

$\Pr(x \mid P_{\le 1}(y))$, rows y, columns x:

y \ x    s        ts       ʃ        tʃ
s        0.0325   0.0051   0.0013   0.0002
ts       0.0212   0.0114   0.0008   0.
ʃ        0.0011   0.       0.067    0.0359
tʃ       0.0006   0.       0.0458   0.0314