Learning Moore Machines from Input-Output Traces
Georgios Giantamidis (1) and Stavros Tripakis (1, 2)
(1) Aalto University, Finland
(2) UC Berkeley, USA

Motivation: learning models from black boxes
[Figure: Inputs → black box (?) → Outputs; a Learner produces a Formal Model]
Many applications:
- Verify that a black-box component is safe to use
- Dynamic malware analysis
- ...
Georgios Giantamidis (Aalto University) Learning Moore Machines from Input-Output Traces December 8, 2016 2 / 32

Learning FSMs from input-output traces
IO-traces (input to the learner):
  aa ↦ 020
  baa ↦ 0122
  bba ↦ 0122
  abaa ↦ 02220
  abba ↦ 02220
Learned FSM (output of the learner):
[Figure: state diagram with states q0 (output 0), q1 (output 1), q2 (output 2), q3 (output 2) and transitions labeled a and b]

Outline
1. Background
2. Formal problem definition
3. Related work
4. Identification in the limit
5. Our learning algorithms
6. Results
7. Summary & future work

Moore machines
A Moore machine is a tuple (I, O, Q, q0, δ, λ):
- input alphabet, I = {a, b}
- output alphabet, O = {0, 1, 2}
- set of states, Q = {q0, q1, q2, q3}
- initial state, q0
- transition function, δ : Q × I → Q
- output function, λ : Q → O
[Figure: state diagram with states q0 (output 0), q1 (output 1), q2 (output 2), q3 (output 2) and transitions labeled a and b]
By definition, our machines are deterministic and complete.
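The tuple definition maps directly onto a small data structure. The following is a minimal sketch, not the authors' implementation. The class name `MooreMachine`, its field names, and the `EXAMPLE` transition table are assumptions: the state diagram did not survive extraction, so the transitions below are a reconstruction chosen to reproduce the IO-traces listed on the slides.

```python
from dataclasses import dataclass


@dataclass
class MooreMachine:
    inputs: frozenset   # input alphabet I
    outputs: frozenset  # output alphabet O
    states: frozenset   # set of states Q
    initial: str        # initial state q0
    delta: dict         # transition function: (state, input symbol) -> state
    lam: dict           # output function: state -> output symbol

    def is_complete(self):
        """True if delta is defined for every (state, input symbol) pair."""
        return all((q, i) in self.delta for q in self.states for i in self.inputs)

    def run(self, word):
        """Return the output word: one symbol per visited state, initial state included."""
        q = self.initial
        out = [self.lam[q]]
        for sym in word:
            q = self.delta[(q, sym)]
            out.append(self.lam[q])
        return "".join(out)


# Assumed reconstruction of the four-state example (not copied from the figure).
EXAMPLE = MooreMachine(
    inputs=frozenset("ab"),
    outputs=frozenset("012"),
    states=frozenset({"q0", "q1", "q2", "q3"}),
    initial="q0",
    delta={
        ("q0", "a"): "q2", ("q0", "b"): "q1",
        ("q1", "a"): "q3", ("q1", "b"): "q3",
        ("q2", "a"): "q0", ("q2", "b"): "q3",
        ("q3", "a"): "q2", ("q3", "b"): "q2",
    },
    lam={"q0": "0", "q1": "1", "q2": "2", "q3": "2"},
)

print(EXAMPLE.run("abaa"))
```

Determinism comes for free here because `delta` is a dictionary keyed by (state, symbol); completeness is a separate check. Note that an input word of length n yields an output word of length n + 1, since the initial state also emits an output.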

Input-output traces
Moore machine:
[Figure: state diagram of the machine with states q0 (output 0), q1 (output 1), q2 (output 2), q3 (output 2)]
Some I/O traces generated by the machine:
  aa ↦ 020
  baa ↦ 0122
  bba ↦ 0122
  abaa ↦ 02220
  abba ↦ 02220

Consistency
A machine is consistent with a set of IO-traces if, for every trace, running the machine on the input word produces exactly the recorded output word.
[Figure: state diagram of the machine with states q0 (output 0), q1 (output 1), q2 (output 2), q3 (output 2)]
  aa ↦ 020
  baa ↦ 0122
  bba ↦ 0122
  abaa ↦ 02220
  abba ↦ 02220
This machine is consistent with this set of traces.

Consistency
[Figure: state diagram of a different machine with states r0 (output 0), r1 (output 1), r2 (output 2), r3 (output 2)]
  aa ↦ 020
  baa ↦ 0122
  bba ↦ 0122
  abaa ↦ 02220
  abba ↦ 02220
This machine is inconsistent with this set of traces.
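Checking consistency amounts to replaying every input word and comparing outputs. The sketch below is illustrative only. The dictionary layout, the helper names `run` and `is_consistent`, and both transition tables are assumptions: `M_OK` is a reconstruction that reproduces the listed traces, and `M_BAD` is a hypothetical variant (state q3 loops on itself), not the r-machine drawn on the slide.

```python
TRACES = {
    "aa": "020",
    "baa": "0122",
    "bba": "0122",
    "abaa": "02220",
    "abba": "02220",
}


def run(machine, word):
    """Return the output word, or None if a needed transition is missing."""
    q = machine["initial"]
    out = machine["lam"][q]
    for sym in word:
        q = machine["delta"].get((q, sym))
        if q is None:
            return None
        out += machine["lam"][q]
    return out


def is_consistent(machine, traces):
    """True if the machine reproduces the recorded output of every trace."""
    return all(run(machine, word) == out for word, out in traces.items())


LAM = {"q0": "0", "q1": "1", "q2": "2", "q3": "2"}

# Assumed reconstruction of the consistent machine.
M_OK = {
    "initial": "q0",
    "lam": LAM,
    "delta": {
        ("q0", "a"): "q2", ("q0", "b"): "q1",
        ("q1", "a"): "q3", ("q1", "b"): "q3",
        ("q2", "a"): "q0", ("q2", "b"): "q3",
        ("q3", "a"): "q2", ("q3", "b"): "q2",
    },
}

# Hypothetical variant: q3 now self-loops on both inputs.
M_BAD = {
    "initial": "q0",
    "lam": LAM,
    "delta": {**M_OK["delta"], ("q3", "a"): "q3", ("q3", "b"): "q3"},
}

print(is_consistent(M_OK, TRACES), is_consistent(M_BAD, TRACES))
```

In the variant, the trace abaa is the first to fail: the machine stays in q3 and emits 02222 instead of 02220.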

A first attempt at problem definition
Given ...
- Input alphabet, I
- Output alphabet, O
- Set of IO-traces, S (the training set)
... find a Moore machine M such that:
- M is deterministic
- M is complete
- M is consistent with S

A trivial solution
Training set:
  b ↦ 01
  aa ↦ 020
  ab ↦ 022
[Figure: prefix tree with states q_ε (output 0), q_a (output 2), q_b (output 1), q_aa (output 0), q_ab (output 2)]
This is called the prefix-tree machine.
Not quite a solution: machine incomplete ...

A trivial solution
Training set:
  b ↦ 01
  aa ↦ 020
  ab ↦ 022
[Figure: the prefix-tree machine with states q_ε, q_a, q_b, q_aa, q_ab, where every missing transition is added as a self-loop]
... but easily completed with self-loops.
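The construction can be sketched in a few lines: one state per prefix of a training input word, with every undefined transition turned into a self-loop. This is a sketch under assumptions, not the authors' code. The function names, the dictionary layout, and the choice to name each state by its prefix string (the empty string standing for q_ε) are illustrative.

```python
def prefix_tree_machine(traces, inputs, complete=True):
    """Build the prefix-tree Moore machine for a list of (input word, output word) pairs."""
    lam = {}    # state (a prefix) -> output symbol
    delta = {}  # (state, input symbol) -> state
    for word, out in traces:
        if len(out) != len(word) + 1:
            raise ValueError("output word must be one symbol longer than input word")
        for k in range(len(word) + 1):
            prefix = word[:k]
            # Two traces sharing a prefix must agree on its output.
            if lam.setdefault(prefix, out[k]) != out[k]:
                raise ValueError("training set is not deterministic")
            if k > 0:
                delta[(word[:k - 1], word[k - 1])] = prefix
    if complete:
        # Trivial completion: every missing transition becomes a self-loop.
        for q in lam:
            for i in inputs:
                delta.setdefault((q, i), q)
    return {"initial": "", "lam": lam, "delta": delta}


def run(machine, word):
    """Return the output word produced by a complete machine on an input word."""
    q = machine["initial"]
    out = machine["lam"][q]
    for sym in word:
        q = machine["delta"][(q, sym)]
        out += machine["lam"][q]
    return out


TRAINING = [("b", "01"), ("aa", "020"), ("ab", "022")]
PTM = prefix_tree_machine(TRAINING, inputs="ab")
print(len(PTM["lam"]), run(PTM, "ba"))
```

The result has five states for three short traces, and on the unseen word ba it simply stays in q_b, which illustrates both the state blow-up and the arbitrary behavior outside the training set.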

Problems with the trivial solution
(1) Poor generalization, due to trivial completion with self-loops
- The machine may be consistent with the training set ...
- ... but how accurate is it on a test set?
(2) Large number of states in the learned machine
- The prefix-tree machine does not merge states at all.

Revised problem definition
The LMoMIO problem (Learning Moore Machines from Input-Output Traces):
Given ...
- Input alphabet, I
- Output alphabet, O
- Set of IO-traces, S (the training set)
... find a Moore machine M such that:
- M is deterministic
- M is complete
- M is consistent with S
... and also:
- M generalizes well (good accuracy on a-priori unknown test sets)
- M is small (few states)
- M is found quickly (good learning algorithm complexity)

How to measure "accuracy"?
We define three metrics: Strong, Medium, Weak

test trace     machine output   strong acc.   medium acc.   weak acc.
abc ↦ 1234     1234             1             1             1
abc ↦ 1234     4321             0             0             0
abc ↦ 1234     1212             0             1/2           1/2
abc ↦ 1234     3434             0             0             1/2
abc ↦ 1234     1324             0             1/4           1/2
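The slide gives the metrics only by example. The sketch below uses definitions inferred from the table rows, so treat them as assumptions: strong accuracy is 1 only when the whole output word matches, medium accuracy is the length of the longest common prefix divided by the output length, and weak accuracy is the fraction of positions that match. The function names are illustrative, and both words are assumed to have the same length.

```python
from fractions import Fraction


def strong(expected, actual):
    """1 if the whole output word matches, else 0."""
    return Fraction(int(expected == actual))


def medium(expected, actual):
    """Length of the longest common prefix, divided by the output length."""
    matched = 0
    for e, a in zip(expected, actual):
        if e != a:
            break
        matched += 1
    return Fraction(matched, len(expected))


def weak(expected, actual):
    """Fraction of positions where the two output words agree."""
    hits = sum(e == a for e, a in zip(expected, actual))
    return Fraction(hits, len(expected))


EXPECTED = "1234"  # recorded output of the test trace abc ↦ 1234
for actual in ["1234", "4321", "1212", "3434", "1324"]:
    print(actual, strong(EXPECTED, actual), medium(EXPECTED, actual), weak(EXPECTED, actual))
```

Under these definitions strong ≤ medium ≤ weak for any single trace, which matches every row of the table.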

Related work
- Active learning:
  - L* [Angluin, 1987]
- Passive learning, exact:
  - NP-hard [Gold, 1978]
- Passive learning, heuristic:
  - K-tails [Biermann & Feldman, 1972]
  - Gold's algorithm [Gold, 1978]
  - RPNI [Oncina & Garcia, 1992]
  - Genetic algorithms
  - Ant colony optimization
  - Our work

Identification in the limit
- Concept introduced in [Gold, 1967], in the context of formal language learning
- Learning is seen as an infinite process
- Training set keeps growing: S_0 ⊆ S_1 ⊆ S_2 ⊆ ···
- Every input word is guaranteed to eventually appear in the training set
- For each S_i, the learner outputs machine M_i
- Identification in the limit := learner outputs the right machine after some i
A good passive learning algorithm must identify in the limit.

Characteristic samples
To prove identification in the limit, we use the notion of the Characteristic Sample (CS) [C. de la Higuera, 2010]:
- Concept existing for DFAs (deterministic finite automata); we adapt it to Moore machines
- Intuition: set of IO-traces that "covers" the machine (covers all states, all transitions)
- For a minimal Moore machine M = (I, O, Q, q0, δ, λ), there exists a CS of total length O(|Q|^4 |I|)
