On Learning Sparse Boolean Formulae For Explaining AI Decisions

  1. On Learning Sparse Boolean Formulae For Explaining AI Decisions. SUSMIT JHA (SRI INTERNATIONAL); VASUMATHI RAMAN (ZOOX INC.); ALESSANDRO PINTO, TUHIN SAHAI, AND MICHAEL FRANCIS (UNITED TECHNOLOGIES RESEARCH CENTER, BERKELEY). 5/20/2017

  2. Ubiquitous AI and the need for explanation • This route is faster. • There is traffic on the Bay Bridge. • There is an accident just after the Bay Bridge backing up traffic. Why did we take the San Mateo Bridge instead of the Bay Bridge?

  3. Decision/Recommendation/Classification: Incorrect ML Recommendations • Medical Diagnosis • Autonomous Systems Certification

  4. Decision/Recommendation/Classification: Incorrect ML Recommendations • Medical Diagnosis • Autonomous Systems Certification. Scalable but less interpretable: Neural Networks, Support Vector Machines. Interpretable but less scalable: Decision Trees, Propositional Rules.

  5. Even 'algorithmic' decision making: A* Path Planning. A* is an algorithm that uses a heuristic to guide search while ensuring that it computes a path with minimum cost. A* computes the function f(n) = g(n) + h(n), where g(n) is the actual cost from the starting node to reach n, and h(n) is an estimate of the cost of the cheapest path from n to the goal node.
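
To make the f(n) = g(n) + h(n) computation concrete, here is a minimal A* sketch on a 4-connected grid; the grid encoding, Manhattan heuristic, and example map are illustrative choices for this write-up, not taken from the talk.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks an obstacle.
    Returns a minimum-cost path as a list of cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])  # heuristic h(n): Manhattan estimate
    g = {start: 0}                      # g(n): actual cost from the start node to n
    parent = {}
    frontier = [(h(start), start)]      # priority queue ordered by f(n) = g(n) + h(n)
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:                # first expansion of the goal is optimal
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        r, c = node
        for nbr in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nbr[0] < rows and 0 <= nbr[1] < cols and not grid[nbr[0]][nbr[1]]:
                if g[node] + 1 < g.get(nbr, float("inf")):
                    g[nbr] = g[node] + 1
                    parent[nbr] = node
                    heapq.heappush(frontier, (g[nbr] + h(nbr), nbr))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))      # detours around the blocked middle row
```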

  6. Example: A* Path Planner

  7. Example: A* Path Planner. Why didn't we go through X / Z?

  8. Example: A* Path Planner 1. The internal details of the algorithm and its implementation are unknown to the human observer/user, so the explanation must be in terms of a common vocabulary. 2. Some explanations need further deduction and inference that were not performed by the AI algorithm while making the original decision. 3. Decision making is often accomplished through a complex composition of a number of AI algorithms; hence, an explanation process is practical only if it does not require detailed modeling of each AI algorithm.

  9. Local Explanations of Complex Models

  10. Local Explanations of Complex Models: Sufficient Cause

  11. Local Explanations of Complex Models: Simplified Sufficient Cause

  12. Local Explanations in AI: Simplified Sufficient Cause. Trade off a measure of how well g approximates f against a measure of the complexity of g. Formulation in AI: • Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?: Explaining the Predictions of Any Classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016. • Hayes, Bradley, and Julie A. Shah. "Improving Robot Controller Transparency Through Autonomous Policy Explanation." Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2017.
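
A hedged sketch of the formulation these papers use, written in the notation of Ribeiro et al.'s LIME paper rather than of the slide:

    explanation(x) = argmin over g in G of  L(f, g, pi_x) + Omega(g)

Here f is the complex model, G is a class of interpretable models, L(f, g, pi_x) measures how poorly a candidate g approximates f in the locality pi_x around the instance x, and Omega(g) measures the complexity of g (for example, the number of features it uses).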

  13. Model-Agnostic Explanation through Boolean Learning. Why does the path not go through green? Compare maps in which the optimum path goes via green with maps in which it does not. Let each point in k dimensions (for some k) correspond to a map, and find a Boolean formula φ such that φ ⇔ PathContainsGreen (an exact explanation) or φ ⇒ PathContainsGreen (a sufficient explanation).
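
A minimal sketch of this reduction, assuming an obstacle-presence vocabulary on a small grid and a breadth-first planner standing in for A*; the grid size, cell names, and functions below are made up for illustration.

```python
from collections import deque

ROWS, COLS = 4, 4
START, GOAL, GREEN = (0, 0), (3, 3), (1, 2)   # GREEN is the queried cell (illustrative)

def optimal_path(obstacles):
    """Breadth-first search: a stand-in for A* on a unit-cost grid."""
    frontier, parent = deque([START]), {START: None}
    while frontier:
        node = frontier.popleft()
        if node == GOAL:
            path, n = [], node
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
        r, c = node
        for nbr in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nbr[0] < ROWS and 0 <= nbr[1] < COLS
                    and nbr not in obstacles and nbr not in parent):
                parent[nbr] = node
                frontier.append(nbr)
    return None

def encode(obstacles):
    """Each map becomes a point in {0,1}^k: one obstacle bit per cell."""
    return tuple(int((r, c) in obstacles) for r in range(ROWS) for c in range(COLS))

def path_contains_green(obstacles):
    """The query label: does the optimum path go via the green cell?"""
    path = optimal_path(obstacles)
    return path is not None and GREEN in path

for obstacles in [set(), {(1, 1), (2, 2)}, {(0, 1), (1, 1), (2, 1)}]:
    print(encode(obstacles), path_contains_green(obstacles))
```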

  14. Example: Explanations in A*. φ_query: some property of the A* output (e.g., some cells not selected). φ_explain: stated using the explanation vocabulary (e.g., obstacle presence). A sufficient explanation satisfies φ_explain ⇒ φ_query; an exact explanation satisfies φ_explain ⇔ φ_query.

  15. Example: Explanations in A*. φ_query: some property of the A* output (e.g., some cells not selected). φ_explain: stated using the explanation vocabulary (e.g., obstacle presence). Learn decision trees for φ_explain using labels from φ_query.
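
A minimal sketch of this step, assuming scikit-learn is available; the labelling function below is a made-up stand-in for φ_query (here it secretly depends on just two vocabulary variables), so the data is purely illustrative.

```python
import random
from sklearn.tree import DecisionTreeClassifier, export_text

random.seed(0)
NUM_VARS = 7  # size of the explanation vocabulary V

def phi_query(m):
    # Stand-in for the A* property; assumed to depend only on v2 and v4 for this demo.
    return m[2] == 1 and m[4] == 0

# Sample assignments to V and label them with phi_query.
X = [[random.randint(0, 1) for _ in range(NUM_VARS)] for _ in range(200)]
y = [int(phi_query(m)) for m in X]

# Learn an interpretable candidate for phi_explain: a shallow decision tree.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=[f"v{i}" for i in range(NUM_VARS)]))
```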

  16. Example: Explanations in A*. Learn decision trees for φ_explain using labels from φ_query. Scalability: the feature space (vocabulary) is usually large; for a map it is of the order of the number of cells, and for an image it is of the order of the image's resolution. A 50×50 grid has 2^(2^(50×50)) possible explanations even if the vocabulary only records the presence or absence of obstacles. Guarantee: is the sampled space of maps enough to generate the explanation with a quantifiable probabilistic guarantee?

  17. Example: Explanations in A*. Theoretical result: learning a Boolean formula even approximately is hard; 3-term DNF is not learnable in the Probably Approximately Correct (PAC) framework unless RP = NP.

  18. Two Key Ideas. Challenges: 1. The vocabulary is large. 2. How many samples (and from what distribution) should be used for learning an explanation? 3. Learning Boolean formulae with PAC guarantees is hard. Key ideas: actively learn the Boolean formula φ_explain rather than learning from a fixed sample, and exploit that explanations are often short and involve only a few variables!

  19. Two Key Ideas. Actively learn the Boolean formula φ_explain rather than learning from a fixed sample. Explanations are often short and involve only a few variables!

  20. Two Key Ideas. The explanation involves only two variables. If we knew which two, there would be only 2^(2^2) = 16 possible explanations, as enumerated in the sketch below. How do we find these relevant variables? Actively learn the Boolean formula φ_explain rather than learning from a fixed sample. Explanations are often short and involve only a few variables!
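
A quick sketch of the count: a Boolean function of two variables is fixed by its truth table over the 2^2 = 4 possible assignments, giving 2^4 = 16 candidates, which a handful of labelled observations can then prune (the observations below are made up):

```python
from itertools import product

inputs = list(product([0, 1], repeat=2))        # the 2^2 = 4 assignments to two variables
truth_tables = list(product([0, 1], repeat=4))  # one output bit per assignment
print(len(truth_tables))                        # 16 possible explanations

# Keep only candidates consistent with two illustrative observations.
observations = {(1, 0): 1, (0, 0): 0}
consistent = [t for t in truth_tables
              if all(t[inputs.index(a)] == label for a, label in observations.items())]
print(len(consistent))                          # 4 candidates remain
```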

  21. Actively Learning the Boolean Formula. φ_query: some property of the A* output (e.g., some cells not selected). φ_explain(V): stated using the explanation vocabulary (e.g., obstacle presence); it evaluates assignments to V and returns True or False. Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0).

  22. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m1 = (0,0,0,1,1,0,1). m1: True.

  23. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Randomly sample until the oracle differs. Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0). m1: True, m2: False.
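
A minimal sketch of this sampling step, with a made-up oracle standing in for φ_explain (in this demo it happens to depend only on the fifth variable, matching the labels shown for m1 and m2):

```python
import random

random.seed(1)
NUM_VARS = 7

def oracle(m):
    # Assumed stand-in for phi_explain: depends only on v4 (the fifth variable).
    return m[4] == 1

# Keep sampling random assignments until the oracle has answered both True and False.
m_true = m_false = None
while m_true is None or m_false is None:
    m = tuple(random.randint(0, 1) for _ in range(NUM_VARS))
    if oracle(m):
        m_true = m
    else:
        m_false = m
print("differing pair:", m_true, m_false)
```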

  24. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0). m1: True, m2: False.

  25. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0). m1: True, m2: False, m3: True.

  26. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0). Hamming distance(m1, m2) = 4; m3 lies at Hamming distance 2 from each of them. m1: True, m2: False, m3: True.

  27. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0), m4 = (0,0,1,1,1,1,0). Hamming distance(m2, m3) = 2; m4 lies at Hamming distance 1 from each of them. m2: False, m3: True, m4: True.

  28. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0), m4 = (0,0,1,1,1,1,0). Hamming distance(m2, m3) = 2; m4 lies at Hamming distance 1 from each of them. m2: False, m3: True, m4: True.

  29. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Assignments to V: m2 = (0,0,1,1,0,1,0), m4 = (0,0,1,1,1,1,0). Hamming distance(m2, m4) = 1. m2: False, m4: True, so the fifth variable v5 is relevant!
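
A sketch of the bisection the last few slides walk through: starting from assignments with differing labels, move halfway (in Hamming distance) from one towards the other, keep whichever pair still disagrees, and stop when a single differing position remains. The oracle below is a made-up stand-in that depends only on the fifth variable, consistent with the labels shown for m1 through m4; the intermediate midpoints it visits need not be exactly the slides' m3 and m4.

```python
def find_relevant_variable(oracle, m_true, m_false):
    """Binary search over the Hamming distance between a True-labelled and a
    False-labelled assignment; returns the index of one relevant variable."""
    m_true, m_false = list(m_true), list(m_false)
    while True:
        diff = [i for i in range(len(m_true)) if m_true[i] != m_false[i]]
        if len(diff) == 1:
            return diff[0]                 # a single differing position must be relevant
        mid = list(m_false)
        for i in diff[:len(diff) // 2]:    # move halfway from m_false towards m_true
            mid[i] = m_true[i]
        if oracle(mid):                    # keep the pair whose labels still differ
            m_true = mid
        else:
            m_false = mid

oracle = lambda m: m[4] == 1               # assumed: only the fifth variable matters
m1 = (0, 0, 0, 1, 1, 0, 1)                 # labelled True on the slides
m2 = (0, 0, 1, 1, 0, 1, 0)                 # labelled False on the slides
print("relevant variable (0-based index):", find_relevant_variable(oracle, m1, m2))  # -> 4
```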

  30. Actively Learning Relevant Variables. Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Per-step costs: for each assignment to the relevant variables, 2^|U| cases; random sampling until the oracle differs, ln(1/(1 − λ)) samples; binary search over the Hamming distance, ln(|V|) queries. The relevant variables of φ_explain are found with confidence λ in roughly |V| · 2^|U| · ln(1/(1 − λ)) oracle queries.
