MLIC: A MaxSAT-Based Framework for Learning Interpretable Classification Rules


  1. MLIC: A MaxSAT-Based Framework for Learning Interpretable Classification Rules
     Dmitry Malioutov (IBM Research, USA) and Kuldeep S. Meel (School of Computing, National University of Singapore)
     CP 2018

  2. The Rise of Artificial Intelligence
     • “In Phoenix, cars are self-navigating the streets. In many homes, people are barking commands at tiny machines, with the machines responding. On our smartphones, apps can now recognize faces in photos and translate from one language to another.” (New York Times, 2018)
     • “AI is the new electricity” (Andrew Ng, 2017)

  3. The Need for Interpretable Models
     • Core public agencies, such as those responsible for criminal justice, healthcare, welfare, and education (i.e., “high stakes” domains), should no longer use “black box” AI and algorithmic systems (AI Now Institute, 2018)
     • Practitioners adopt techniques that they can interpret and validate
     • The medical and education domains already rely on techniques such as classification rules, decision rules, and decision lists

  4. Prior Work
     • Long history of learning interpretable classification models from data: decision trees, decision lists, checklists, etc., with tools such as C4.5, CN2, RIPPER, and SLIPPER
     • The problem of learning optimal interpretable models is computationally intractable
     • Prior work, mostly rooted in the late 1980s and 1990s, therefore focused on greedy approaches

  5. Our Approach
     Objective: learn rules that are accurate and interpretable. The learning procedure is offline, so learning does not need to happen in real time.
     • The problem of rule learning is inherently an optimization problem
     • The past few years have seen a SAT revolution and the development of tools that employ SAT as their core engine
     • Can we take advantage of the SAT revolution, in particular progress on MaxSAT solvers?

  6. Key Contributions
     • A MaxSAT-based framework, MLIC, that provably trades off accuracy against interpretability of rules
     • A prototype implementation capable of finding optimal (or high-quality near-optimal) classification rules from large datasets

  7. Part I: From Rule Learning to MaxSAT

  8. Binary Classification
     • Features: x = {x_1, x_2, …, x_m}
     • Input: set of training samples {(X_i, y_i)}, where each vector X_i ∈ X contains the valuation of the features for sample i, and y_i ∈ {0, 1} is the binary label for sample i
     • Output: classifier R, i.e., y = R(x)
     • Our focus: classifiers that can be represented as CNF formulas, R := C_1 ∧ C_2 ∧ ⋯ ∧ C_k
     • Size of a classifier: |R| = Σ_i |C_i|
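To make the CNF representation concrete, here is a small Python illustration (ours, not from the slides) of evaluating such a classifier, with each clause stored as the list of feature indices it contains:

    def predict(clauses, x):
        """R(x) = 1 iff every clause contains at least one feature set to 1 in x."""
        return int(all(any(x[j] for j in clause) for clause in clauses))

    # Example: R = (x1 ∨ x3) ∧ (x2) over x = [x1, x2, x3]; here |R| = 3.
    clauses = [[0, 2], [1]]
    print(predict(clauses, [1, 1, 0]))  # 1: both clauses satisfied
    print(predict(clauses, [1, 0, 0]))  # 0: the second clause is falsified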

  9. Constraint Learning vs. Machine Learning
     Input: set of training samples {(X_i, y_i)}. Output: classifier R.
     • Constraint learning: min_R |R| such that R(X_i) = y_i for all i
     • Machine learning: min_R |R| + λ|E_R| such that R(X_i) = y_i for all i ∉ E_R, where E_R is the set of samples the rule is allowed to misclassify and λ prices each error in units of rule size
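To see how λ arbitrates between the two objectives (the numbers below are invented for illustration): with λ = 2, a rule R_a of size |R_a| = 6 that misclassifies 2 samples scores 6 + 2·2 = 10, while a perfectly fitting rule R_b of size 14 scores 14 + 2·0 = 14, so the machine-learning objective prefers the smaller, slightly noisy R_a. As λ → ∞, misclassification becomes prohibitively expensive and the objective collapses to the constraint-learning formulation.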

  10. MLIC
     Step 1: Discretization of features
     Step 2: Transformation to a MaxSAT query
     Step 3: Invoke a MaxSAT solver and extract R from the MaxSAT solution
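As an illustration of Step 1, here is a minimal sketch of one common discretization scheme: binarizing each continuous feature against a few quantile thresholds. The exact scheme and parameters used by MLIC may differ, and the function name binarize is ours:

    import numpy as np

    def binarize(column, num_thresholds=3):
        """Turn one continuous column into binary threshold features.

        Emits features of the form (x > t) together with their complements
        (x <= t), so that a CNF clause can test either direction.
        """
        thresholds = np.quantile(column, np.linspace(0.25, 0.75, num_thresholds))
        above = np.stack([column > t for t in thresholds], axis=1)
        return np.hstack([above, ~above]).astype(int)

    # Example: binarize a 'petal length' column of an Iris-like dataset.
    petal_length = np.array([1.4, 4.2, 5.1, 1.3, 4.7])
    print(binarize(petal_length))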

  11. Encoding to MaxSAT
     Input: features x = {x_1, …, x_m}; training data {(X_i, y_i)} over m features. Output: R of k clauses.
     Key ideas:
     • k × m binary coefficients {b_1^1, b_1^2, …, b_1^m, …, b_k^m} such that R_i = (b_i^1 x_1 ∨ b_i^2 x_2 ∨ ⋯ ∨ b_i^m x_m), i.e., x_j appears in clause i iff b_i^j = 1
     • For every sample i, a noise variable η_i encodes whether sample i should be treated as noise
     Constraints (W denotes clause weight):
     1. R(x ↦ X_i) = ∧_{l=1}^k R_l(x ↦ X_i): the output of substituting the feature valuation of the i-th sample
     2. D_i := (¬η_i → (y_i ↔ R(x ↦ X_i))); W(D_i) = ⊤ (hard): if η_i is false, y_i must equal the rule's prediction
     3. V_i^j := (¬b_i^j); W(V_i^j) = 1 (soft): we want as few b_i^j as possible to be true
     4. N_i := (¬η_i); W(N_i) = λ (soft): we want as few η_i as possible to be true

  12. Encoding to MaxSAT: Construction
     Let Q_k = ∧_i D_i ∧ ∧_i N_i ∧ ∧_{i,j} V_i^j and σ* = MaxSAT(Q_k, W); then x_j ∈ R_i iff σ*(b_i^j) = 1.
     Remember, R_i = (b_i^1 x_1 ∨ b_i^2 x_2 ∨ ⋯ ∨ b_i^m x_m).
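A minimal end-to-end sketch of this encoding in Python, assuming the open-source PySAT library and its RC2 MaxSAT solver. The variable layout, the Tseitin-style auxiliary variables z, and the function name mlic_encode_and_solve are our illustrative choices, not the authors' prototype:

    from pysat.formula import WCNF
    from pysat.examples.rc2 import RC2

    def mlic_encode_and_solve(X, y, k, lam):
        """X: list of 0/1 feature vectors; y: list of 0/1 labels;
        k: number of clauses in R; lam: integer noise weight (lambda)."""
        n, m = len(X), len(X[0])
        # Variable ids: b[i][j] ("x_j appears in clause i"), eta[q]
        # ("sample q is noise"), z[q][i] ("clause i holds on sample q").
        b = [[i * m + j + 1 for j in range(m)] for i in range(k)]
        eta = [k * m + q + 1 for q in range(n)]
        z = [[k * m + n + q * k + i + 1 for i in range(k)] for q in range(n)]

        wcnf = WCNF()
        for q in range(n):
            on = [j for j in range(m) if X[q][j] == 1]
            for i in range(k):
                # Tseitin definition: z[q][i] <-> OR_{j : X_q[j]=1} b[i][j]
                wcnf.append([-z[q][i]] + [b[i][j] for j in on])
                for j in on:
                    wcnf.append([-b[i][j], z[q][i]])
            if y[q] == 1:
                for i in range(k):           # hard D_q: not noise -> every clause true
                    wcnf.append([eta[q], z[q][i]])
            else:                            # hard D_q: not noise -> some clause false
                wcnf.append([eta[q]] + [-z[q][i] for i in range(k)])
            wcnf.append([-eta[q]], weight=lam)      # soft N_q with weight lambda
        for i in range(k):
            for j in range(m):
                wcnf.append([-b[i][j]], weight=1)   # soft V_i^j with weight 1

        with RC2(wcnf) as solver:
            model = set(solver.compute())
        # Read off the rule: x_j is in clause R_i iff b[i][j] is assigned true.
        return [[j for j in range(m) if b[i][j] in model] for i in range(k)]

    # Tiny usage example: learn a 1-clause rule for "y = x_0 OR x_1".
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 1]
    print(mlic_encode_and_solve(X, y, k=1, lam=10))  # expected: [[0, 1]]

The auxiliary variable z[q][i] stands for "clause i is satisfied on sample q", which keeps the hard constraints in plain CNF; RC2 then minimizes (number of true b) + λ·(number of true η), exactly the objective |R| + λ|E_R|.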

  13. Provable Guarantees
     Theorem (provable trade-off of accuracy vs. interpretability of rules). Let R_1 ← MLIC(X, y, k, λ_1) and R_2 ← MLIC(X, y, k, λ_2). If λ_2 > λ_1, then |R_1| ≤ |R_2| and |E_{R_1}| ≥ |E_{R_2}|.
     In other words, a larger noise penalty λ buys accuracy (fewer misclassified samples) at the price of a larger, less interpretable rule.

  14. Learning DNF Rules
     • (y = S(x)) ↔ (¬y = ¬S(x))
     • If S is a DNF formula, then ¬S is a CNF formula (De Morgan)
     • To learn a DNF rule S, simply call MLIC with ¬y as the labels and negate the learned CNF rule
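A short sketch of this reduction (our illustration; learn_cnf stands for any CNF rule learner with the interface of the mlic_encode_and_solve sketch above):

    def learn_dnf(X, y, k, lam, learn_cnf):
        """Learn a DNF rule by learning a CNF rule for the flipped labels."""
        flipped = [1 - label for label in y]      # labels for ¬y
        cnf = learn_cnf(X, flipped, k, lam)       # CNF rule S' with ¬y = S'(x)
        # De Morgan: ¬((a ∨ b) ∧ (c ∨ d)) = (¬a ∧ ¬b) ∨ (¬c ∧ ¬d), so each
        # CNF clause becomes one DNF term over the negated features.
        return [[("not", j) for j in clause] for clause in cnf]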

  15. Part II: Experimental Results

  16. Illustrative Example
     • Iris classification
     • Features: sepal length, sepal width, petal length, and petal width
     • MLIC learned:
       R := (sepal length > 6.3 ∨ sepal width > 3.0 ∨ petal width ≤ 1.5)
          ∧ (sepal width ≤ 2.7 ∨ petal length > 4.0 ∨ petal width > 1.2)
          ∧ (petal length ≤ 5.0)
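The learned rule can be applied directly; a quick check on one sample (the feature values below are invented for illustration):

    def iris_rule(sepal_length, sepal_width, petal_length, petal_width):
        """The three-clause CNF rule learned by MLIC on Iris."""
        return ((sepal_length > 6.3 or sepal_width > 3.0 or petal_width <= 1.5)
                and (sepal_width <= 2.7 or petal_length > 4.0 or petal_width > 1.2)
                and petal_length <= 5.0)

    print(iris_rule(5.9, 3.0, 4.2, 1.5))  # True: all three clauses are satisfied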
