  1. Automata Learning. Borja Balle, Amazon Research Cambridge¹. Foundations of Programming Summer School (Oxford), July 2018. ¹ Based on work completed before joining Amazon.

  2. Brief History of Automata Learning
  § 1967 Gold: Regular languages are learnable in the limit
  § 1987 Angluin: Regular languages are learnable from queries
  § 1993 Pitt & Warmuth: PAC-learning DFA is NP-hard
  § 1994 Kearns & Valiant: Cryptographic hardness
  § ... Clark, Denis, de la Higuera, Oncina, others: combinatorial methods meet statistics and linear algebra
  § 2009 Hsu-Kakade-Zhang & Bailly-Denis-Ralaivola: Spectral learning

  3. Goals of This Tutorial
  Goals
  § Motivate spectral learning techniques for weighted automata and related models on sequential and tree-structured data
  § Provide the key intuitions and fundamental results needed to effectively navigate the literature
  § Survey some formal learning results and give an overview of some applications
  § Discuss the role of linear algebra, concentration bounds, and learning theory in this area
  Non-Goals
  § Dive deep into applications: pointers will be provided instead
  § Provide an exhaustive treatment of automata learning: beyond the scope of an introductory lecture
  § Give complete proofs of the presented results: illuminating proofs will be discussed, technical proofs omitted

  4. Outline
  1. Sequential Data and Weighted Automata
  2. WFA Reconstruction and Approximation
  3. PAC Learning for Stochastic WFA
  4. Statistical Learning for WFA
  5. Beyond Sequences: Transductions and Trees
  6. Conclusion

  5. Outline
  1. Sequential Data and Weighted Automata
  2. WFA Reconstruction and Approximation
  3. PAC Learning for Stochastic WFA
  4. Statistical Learning for WFA
  5. Beyond Sequences: Transductions and Trees
  6. Conclusion

  6. Learning Sequential Data
  § Sequential data arises in numerous applications of Machine Learning:
    § Natural language processing
    § Computational biology
    § Time series analysis
    § Sequential decision-making
    § Robotics
  § Learning from sequential data requires specialized algorithms:
    § The most common ML algorithms assume the data can be represented as vectors of a fixed dimension
    § Sequences can have arbitrary length, and are compositional in nature
    § Similar things occur with trees, graphs, and other forms of structured data
  § Sequential data can be diverse in nature:
    § Continuous vs. discrete time vs. only order information
    § Continuous vs. discrete observations

  7. Functions on Strings
  § In this lecture we focus on sequences represented by strings over a finite alphabet: Σ*
  § The goal will be to learn a function f : Σ* → ℝ from data
  § The function being learned can represent many things, for example:
    § A language model: f(sentence) = likelihood of observing the sentence in a specific natural language
    § A protein scoring model: f(amino acid sequence) = predicted activity of a protein in a biological reaction
    § A reward model: f(action sequence) = expected reward an agent will obtain after executing a sequence of actions
    § A network model: f(packet sequence) = probability that a sequence of packets will successfully transmit a message through a network
  § These functions can be identified with a weighted language f ∈ ℝ^{Σ*}, an infinite-dimensional object
  § In order to learn such functions we need a finite representation: weighted automata

  8. Weighted Finite Automata
  [Figure: graphical representation of an example 2-state WFA over {a, b} (states q1, q2 with weighted transitions) alongside its algebraic representation α, β, A_a, A_b]
  Weighted Finite Automaton: A WFA A with n = |A| states is a tuple A = ⟨α, β, {A_σ}_{σ∈Σ}⟩ where α, β ∈ ℝⁿ and A_σ ∈ ℝ^{n×n}
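
To make the algebraic representation concrete, here is a minimal NumPy sketch of a WFA as a tuple ⟨α, β, {A_σ}⟩. The 2-state automaton and all of its weights are illustrative placeholders, not the values from the slide's figure.

```python
# Minimal sketch of a WFA <alpha, beta, {A_sigma}> in NumPy.
# The 2-state automaton and its weights are illustrative placeholders.
import numpy as np

alpha = np.array([1.0, 0.0])                 # initial weights, alpha in R^n
beta = np.array([0.5, 1.0])                  # final weights, beta in R^n
A = {                                        # one n x n transition matrix per symbol
    "a": np.array([[0.2, 0.8],
                   [0.0, 0.5]]),
    "b": np.array([[0.3, 0.0],
                   [0.6, 0.1]]),
}
```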

  9. Language of a WFA
  With every WFA A = ⟨α, β, {A_σ}⟩ with n states we associate a weighted language f_A : Σ* → ℝ given by
    f_A(x_1 ⋯ x_T) = Σ_{q_0, q_1, ..., q_T ∈ [n]} α(q_0) (∏_{t=1}^{T} A_{x_t}(q_{t-1}, q_t)) β(q_T) = α⊤ A_{x_1} ⋯ A_{x_T} β = α⊤ A_x β
  Recognizable/Rational Languages: A weighted language f : Σ* → ℝ is recognizable/rational if there exists a WFA A such that f = f_A. The smallest number of states of such a WFA is rank(f). A WFA A is minimal if |A| = rank(f_A).
  Observation: The minimal A is not unique. Take any invertible matrix Q ∈ ℝ^{n×n}; then
    α⊤ A_{x_1} ⋯ A_{x_T} β = (α⊤ Q)(Q⁻¹ A_{x_1} Q) ⋯ (Q⁻¹ A_{x_T} Q)(Q⁻¹ β)
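
The sketch below shows how f_A is evaluated as a product of matrices, and checks the change-of-basis observation numerically. The toy WFA weights and the matrix Q are arbitrary choices for illustration.

```python
import numpy as np

def wfa_value(alpha, A, beta, x):
    """Evaluate f_A(x) = alpha^T A_{x_1} ... A_{x_T} beta for a string x."""
    v = alpha
    for sigma in x:
        v = v @ A[sigma]
    return float(v @ beta)

# Toy 2-state WFA over {a, b} with illustrative weights.
alpha = np.array([1.0, 0.0])
beta = np.array([0.5, 1.0])
A = {"a": np.array([[0.2, 0.8], [0.0, 0.5]]),
     "b": np.array([[0.3, 0.0], [0.6, 0.1]])}

# Conjugating by any invertible Q yields another WFA computing the same language.
Q = np.array([[1.0, 1.0], [0.0, 2.0]])
Qinv = np.linalg.inv(Q)
alpha_Q, beta_Q = Q.T @ alpha, Qinv @ beta             # (alpha^T Q)^T and Q^{-1} beta
A_Q = {s: Qinv @ M @ Q for s, M in A.items()}          # Q^{-1} A_sigma Q
assert abs(wfa_value(alpha, A, beta, "abba")
           - wfa_value(alpha_Q, A_Q, beta_Q, "abba")) < 1e-9
```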

  10. Examples: DFA, HMM
  Deterministic Finite Automaton
  § Weights in {0, 1}
  § Initial: α is the indicator of the initial state
  § Final: β indicates accept/reject states
  § Transition: A_σ(i, j) = I[i →σ j]
  § f_A : Σ* → {0, 1} defines a regular language
  Hidden Markov Model
  § Weights in [0, 1]
  § Initial: α is a distribution over the initial state
  § Final: β is the all-ones vector
  § Transition: A_σ(i, j) = P[i →σ j] = P[i → σ] P[i → j] (emission probability times transition probability)
  § f_A : Σ* → [0, 1] defines a dynamical system
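
A sketch of both encodings in NumPy. The specific DFA (accepting strings with an even number of a's) and the HMM parameters are made up for illustration; the HMM encoding assumes the usual factorization of A_σ(i, j) into an emission probability times a transition probability.

```python
import numpy as np

# DFA as a WFA with 0/1 weights: 2-state DFA accepting strings with an even number of a's.
alpha_dfa = np.array([1.0, 0.0])                      # indicator of the initial state
beta_dfa = np.array([1.0, 0.0])                       # indicator of accepting states
A_dfa = {"a": np.array([[0.0, 1.0], [1.0, 0.0]]),     # A_sigma(i, j) = I[i --sigma--> j]
         "b": np.eye(2)}

# HMM as a WFA: alpha = initial distribution, beta = all ones,
# A_sigma(i, j) = P[emit sigma in state i] * P[i -> j].
alpha_hmm = np.array([0.6, 0.4])
beta_hmm = np.ones(2)
T = np.array([[0.7, 0.3], [0.2, 0.8]])                # transition probabilities
O = np.array([[0.9, 0.1], [0.4, 0.6]])                # emission probabilities for (a, b)
A_hmm = {"a": np.diag(O[:, 0]) @ T, "b": np.diag(O[:, 1]) @ T}

def wfa_value(alpha, A, beta, x):
    v = alpha
    for sigma in x:
        v = v @ A[sigma]
    return float(v @ beta)

assert wfa_value(alpha_dfa, A_dfa, beta_dfa, "ab") == 0.0    # odd number of a's: rejected
assert wfa_value(alpha_dfa, A_dfa, beta_dfa, "aab") == 1.0   # even number of a's: accepted
# HMM values over all strings of a fixed length sum to 1.
assert abs(sum(wfa_value(alpha_hmm, A_hmm, beta_hmm, x)
               for x in ["aa", "ab", "ba", "bb"]) - 1.0) < 1e-9
```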

  11. Hankel Matrices
  Given a weighted language f : Σ* → ℝ, define its Hankel matrix H_f ∈ ℝ^{Σ*×Σ*} as the infinite matrix with entries H_f(p, s) = f(p · s), rows indexed by prefixes p and columns indexed by suffixes s (e.g. the row for ε contains f(ε), f(a), f(b), ..., and the row for a contains f(a), f(aa), f(ab), ...)
  Fliess–Kronecker Theorem [Fli74]: The rank of H_f is finite if and only if f is rational, in which case rank(H_f) = rank(f)
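
As a quick numeric check of the theorem, the sketch below fills in a finite block of H_f for the rational series computed by a small WFA and verifies that its rank is bounded by the number of states. The WFA weights and the choice of strings up to length 2 are illustrative.

```python
import numpy as np
from itertools import product

# Toy 2-state WFA (illustrative weights) and its weighted language f.
alpha = np.array([1.0, 0.0])
beta = np.array([0.5, 1.0])
A = {"a": np.array([[0.2, 0.8], [0.0, 0.5]]),
     "b": np.array([[0.3, 0.0], [0.6, 0.1]])}

def f(x):
    v = alpha
    for sigma in x:
        v = v @ A[sigma]
    return float(v @ beta)

# Finite block of the Hankel matrix H_f(p, s) = f(p s) over strings of length <= 2.
strings = [""] + ["".join(w) for k in (1, 2) for w in product("ab", repeat=k)]
H = np.array([[f(p + s) for s in strings] for p in strings])
assert np.linalg.matrix_rank(H) <= 2      # rank(H_f) <= number of states (here it is 2)
```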

  12. Intuition for the Fliess–Kronecker Theorem
  The Hankel matrix H_{f_A} ∈ ℝ^{Σ*×Σ*} factorizes as P_A S_A with P_A ∈ ℝ^{Σ*×n} and S_A ∈ ℝ^{n×Σ*}: the row of P_A indexed by a prefix p = p_1 ⋯ p_T is α_A(p) = α⊤ A_{p_1} ⋯ A_{p_T}, and the column of S_A indexed by a suffix s = s_1 ⋯ s_{T'} is β_A(s) = A_{s_1} ⋯ A_{s_{T'}} β, so that
    f_A(p_1 ⋯ p_T · s_1 ⋯ s_{T'}) = (α⊤ A_{p_1} ⋯ A_{p_T}) (A_{s_1} ⋯ A_{s_{T'}} β) = α_A(p) β_A(s)
  Note: We call H_f = P_A S_A the forward-backward factorization induced by A
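
A sketch of the forward-backward factorization on a finite block, reusing the same illustrative WFA as above: rows of P_A are the forward vectors α⊤ A_p and columns of S_A are the backward vectors A_s β.

```python
import numpy as np
from itertools import product

alpha = np.array([1.0, 0.0])
beta = np.array([0.5, 1.0])
A = {"a": np.array([[0.2, 0.8], [0.0, 0.5]]),
     "b": np.array([[0.3, 0.0], [0.6, 0.1]])}

def matprod(x):
    """Product A_{x_1} ... A_{x_T} (identity for the empty string)."""
    M = np.eye(len(alpha))
    for sigma in x:
        M = M @ A[sigma]
    return M

strings = [""] + ["".join(w) for k in (1, 2) for w in product("ab", repeat=k)]
P_A = np.stack([alpha @ matprod(p) for p in strings])          # forward vectors, one row per prefix
S_A = np.stack([matprod(s) @ beta for s in strings], axis=1)   # backward vectors, one column per suffix
H = np.array([[float(alpha @ matprod(p + s) @ beta) for s in strings] for p in strings])
assert np.allclose(H, P_A @ S_A)           # the finite block of H_f equals P_A S_A
```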

  13. Outline
  1. Sequential Data and Weighted Automata
  2. WFA Reconstruction and Approximation
  3. PAC Learning for Stochastic WFA
  4. Statistical Learning for WFA
  5. Beyond Sequences: Transductions and Trees
  6. Conclusion

  14. From Hankel to WFA
  Entries of the Hankel block H: H(p, s) = f(p s) = f(p_1 ⋯ p_T s_1 ⋯ s_{T'}) = α⊤ A_{p_1} ⋯ A_{p_T} A_{s_1} ⋯ A_{s_{T'}} β
  Entries of the shifted block H_σ: H_σ(p, s) = f(p σ s) = f(p_1 ⋯ p_T σ s_1 ⋯ s_{T'}) = α⊤ A_{p_1} ⋯ A_{p_T} A_σ A_{s_1} ⋯ A_{s_{T'}} β
  Algebraically: factorizing H lets us solve for A_σ:
    H = P S  ⟹  H_σ = P A_σ S  ⟹  A_σ = P⁺ H_σ S⁺
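
A minimal end-to-end sketch of this reconstruction under idealized assumptions: exact Hankel values from the same illustrative 2-state WFA, prefixes and suffixes of length up to 2, and a rank factorization H = P S obtained from a truncated SVD. Reading α and β from the rows and columns indexed by the empty string is one standard choice and is not spelled out on this slide. The recovered automaton agrees with f up to a change of basis.

```python
import numpy as np
from itertools import product

# Ground-truth WFA (illustrative weights), used here only to fill in exact Hankel values.
alpha = np.array([1.0, 0.0])
beta = np.array([0.5, 1.0])
A = {"a": np.array([[0.2, 0.8], [0.0, 0.5]]),
     "b": np.array([[0.3, 0.0], [0.6, 0.1]])}

def f(x):
    v = alpha
    for sigma in x:
        v = v @ A[sigma]
    return float(v @ beta)

prefixes = suffixes = [""] + ["".join(w) for k in (1, 2) for w in product("ab", repeat=k)]
H = np.array([[f(p + s) for s in suffixes] for p in prefixes])

# Rank factorization H = P S from a truncated SVD.
U, d, Vt = np.linalg.svd(H)
r = np.linalg.matrix_rank(H)
P, S = U[:, :r] * d[:r], Vt[:r, :]

# Recover the WFA: A_sigma = P^+ H_sigma S^+; alpha/beta from the row/column indexed by epsilon.
alpha_rec = P[prefixes.index("")]
beta_rec = S[:, suffixes.index("")]
A_rec = {sigma: np.linalg.pinv(P)
                @ np.array([[f(p + sigma + s) for s in suffixes] for p in prefixes])
                @ np.linalg.pinv(S)
         for sigma in "ab"}

def f_rec(x):
    v = alpha_rec
    for sigma in x:
        v = v @ A_rec[sigma]
    return float(v @ beta_rec)

assert abs(f("abba") - f_rec("abba")) < 1e-8   # same language, conjugate parameters
```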

  15. Aside: Moore–Penrose Pseudo-inverse
  For any M ∈ ℝ^{n×m} there exists a unique pseudo-inverse M⁺ ∈ ℝ^{m×n} satisfying:
  § M M⁺ M = M, M⁺ M M⁺ = M⁺, and M⁺ M and M M⁺ are symmetric
  § If rank(M) = n then M M⁺ = I, and if rank(M) = m then M⁺ M = I
  § If M is square and invertible then M⁺ = M⁻¹
  Given a system of linear equations M u = v, the following is satisfied:
    M⁺ v = argmin_{u ∈ argmin_{u'} ‖M u' - v‖₂} ‖u‖₂
  In particular:
  § If the system is completely determined, M⁺ v solves the system
  § If the system is underdetermined, M⁺ v is the solution with smallest norm
  § If the system is overdetermined, M⁺ v is the minimum-norm solution to the least-squares problem min_u ‖M u - v‖₂
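
A short NumPy check of these properties using numpy.linalg.pinv; the matrix M below is an arbitrary full-row-rank example chosen for illustration.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])              # 2 x 3, rank 2 (full row rank)
Mp = np.linalg.pinv(M)

# Penrose conditions.
assert np.allclose(M @ Mp @ M, M)
assert np.allclose(Mp @ M @ Mp, Mp)
assert np.allclose((M @ Mp).T, M @ Mp) and np.allclose((Mp @ M).T, Mp @ M)

# rank(M) = n (number of rows), hence M M^+ = I.
assert np.allclose(M @ Mp, np.eye(2))

# Underdetermined system M u = v: M^+ v is the solution of smallest norm.
v = np.array([1.0, 1.0])
u = Mp @ v
assert np.allclose(M @ u, v)
```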

  16. Finite Hankel Sub-Blocks
  Given finite sets of prefixes and suffixes P, S ⊂ Σ* and the infinite Hankel matrix H_f ∈ ℝ^{Σ*×Σ*}, we define the sub-block H ∈ ℝ^{P×S} (the restriction of H_f to rows P and columns S) and, for each σ ∈ Σ, the sub-block H_σ ∈ ℝ^{Pσ×S} (the restriction of H_f to rows Pσ and columns S, i.e. H_σ(p, s) = f(p σ s))
  [Figure: the infinite Hankel matrix H_f with rows and columns indexed by ε, a, b, aa, ab, ba, bb, ..., from which the finite sub-blocks are extracted]
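
A small helper sketch for assembling these sub-blocks from a black-box function f and chosen prefix/suffix sets; the function f and the sets below are arbitrary stand-ins (in practice the entries come from queries or from empirical estimates).

```python
import numpy as np

def f(x):
    """Stand-in for a weighted language; in practice values come from queries or data."""
    return 0.5 ** len(x) * (x.count("a") + 1)

prefixes = ["", "a", "b"]
suffixes = ["", "a", "b", "aa"]

def hankel_block(f, prefixes, suffixes, middle=""):
    """H(p, s) = f(p + middle + s): middle='' gives H, middle=sigma gives H_sigma."""
    return np.array([[f(p + middle + s) for s in suffixes] for p in prefixes])

H = hankel_block(f, prefixes, suffixes)                       # shape (|P|, |S|)
H_sigma = {sigma: hankel_block(f, prefixes, suffixes, sigma) for sigma in "ab"}
```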
