Why is Chicago deceptive?: Towards Building Model-Driven Tutorials - PowerPoint PPT Presentation

  1. “Why is ‘Chicago’ deceptive?”: Towards Building Model-Driven Tutorials for Humans. Vivian Lai, Han Liu, and Chenhao Tan (@vivwylai | @HanLiuAI | @ChenhaoTan). University of Colorado Boulder. machineintheloop.com

  2. AI used in societally critical tasks: recidivism prediction; medical diagnosis; Amazon secret AI hiring tool; autonomous driving. (Geiger et al. 2012; European Parliament 2016; Kleinberg et al. 2017; Dastin 2018)

  3. (No text on this slide.)

  4. Explanations!

  5. Explaining AI is tricky

  6. Why is explaining AI tricky? Two distinct learning modes: discovering and emulating.

  7. Why is explaining AI tricky? Two distinct learning modes: emulating.

  8. Why is explaining AI tricky? Two distinct learning modes: discovering.

  9. Why is explaining AI tricky? Two distinct learning modes: discovering. AI can discover inconspicuous and counterintuitive patterns.

  10. So, how can explaining AI be less tricky? Model-driven tutorials: • Elucidate counterintuitive patterns • Enhance humans' ability to understand patterns

  11. Model-driven tutorials: Guidelines. State-of-the-art science communication.

  12. Model-driven tutorials: Examples. How do we choose examples? • SP-LIME (Ribeiro et al. 2016) • Spaced repetition

  13. Model-driven tutorials: Examples. How do we choose examples? • SP-LIME (Ribeiro et al. 2016) • Spaced repetition (highlighted)
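
The slides name SP-LIME (Ribeiro et al. 2016) as one way to choose tutorial examples but do not spell out the procedure. The sketch below illustrates only the general idea behind it, greedily picking examples whose explanations cover the most globally important features. It is a simplified illustration, not the authors' implementation: the function name `greedy_pick`, its parameters, and the small weight matrix are all hypothetical.

```python
import numpy as np

def greedy_pick(weights, budget):
    """Greedily pick up to `budget` rows (examples) that together cover
    the most important columns (features).

    `weights[i][j]` is a hypothetical explanation weight of feature j
    for example i; the values are made up for illustration.
    """
    W = np.abs(np.asarray(weights, dtype=float))
    # Global importance of each feature: sqrt of its total absolute weight.
    importance = np.sqrt(W.sum(axis=0))
    chosen = []
    covered = np.zeros(W.shape[1], dtype=bool)
    for _ in range(min(budget, W.shape[0])):
        best, best_gain = None, -1.0
        for i in range(W.shape[0]):
            if i in chosen:
                continue
            # Gain = importance of features this example newly covers.
            gain = importance[(W[i] > 0) & ~covered].sum()
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        covered |= W[best] > 0
    return chosen

# Three made-up examples over four features.
example_weights = [
    [1, 0, 0, 0],
    [0, 2, 2, 0],
    [0, 0, 1, 3],
]
print(greedy_pick(example_weights, budget=2))  # prints [2, 1]
```

With a budget of two, the third example is picked first because it covers the two most important features, and the second example follows because it adds the most uncovered importance.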

  14. Experimental Design & Research Questions. RQ1: Effect of different tutorials. Phases: Training, Prediction.

  15. Experimental Design & Research Questions. RQ1: Effect of different tutorials. Training: different tutorials. Prediction: no assistance.

  16. Experimental Design & Research Questions. Training: RQ1, effect of different tutorials.

  17. Experimental Design & Research Questions. RQ2: Effect of real-time assistance. Training: same tutorial (spaced repetition). Prediction: different real-time assistance.

  18. Experimental Design & Research Questions. Training: RQ1, effect of different tutorials. Prediction: RQ2, effect of real-time assistance.

  19. Experimental Design & Research Questions. RQ1 & RQ2: linear model. RQ3: deep model.

  20. Experimental Design & Research Questions. Training: RQ1, effect of different tutorials. Prediction: RQ2, effect of real-time assistance. RQ3: Effect of model complexity.

  21. Experimental Design & Research Questions. Training: RQ1, effect of different tutorials. Prediction: RQ2, effect of real-time assistance. RQ3: Effect of model complexity. Performed qualitative study to improve interface design.

  22. Research question 1: Can model-driven tutorials improve human performance without any real-time assistance in the prediction phase? Diagram labels: model-driven tutorials (Training); human accuracy? (Prediction).

  23. Tutorials are useful to some extent. [Bar chart, accuracy (%), axis 50 to 80] Control: 54.6%; Guidelines: 60.4%; Spaced repetition: 57.9%; Spaced repetition + guidelines: 59.2%. Annotated p-values: p=0.018*, p=0.1. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05).

  24. Tutorials are useful to some extent. [Bar chart, accuracy (%), axis 50 to 80] Control: 54.6%; Guidelines: 60.4%; Spaced repetition: 57.9%; Spaced repetition + guidelines: 59.2%. Annotated p-values: p=0.018*, p=0.1. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05). Quote: “The tutorial is helpful but it’s just hard not being able to reference it.”
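
The star legend on these charts maps each p-value to a number of stars. A minimal sketch of that mapping follows, using the thresholds stated on the slide; the function name `significance_stars` is hypothetical and does not come from the presentation.

```python
def significance_stars(p):
    """Return the star annotation for a p-value, following the slide legend:
    *** for p < 0.001, ** for p < 0.01, * for p < 0.05, otherwise no stars."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# The two p-values annotated on the chart.
for p in (0.018, 0.1):
    print(f"p={p}{significance_stars(p)}")
```

Under this mapping p=0.018 receives one star and p=0.1 receives none, which matches the annotations on the chart.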

  25. Research question 2: If not, how do varying levels of real-time assistance in the prediction phase affect human performance after training? Spectrum: full human agency to full automation.

  26. Prediction: various levels of real-time assistance. From full human agency to full automation: unsigned explanations; signed explanations; signed explanations + predicted label; signed explanations + predicted label + guidelines; signed explanations + guidelines + predicted label + accuracy statement. Information from AI increases from left to right.

  27. Prediction: various levels of real-time assistance. From full human agency to full automation: unsigned explanations; signed explanations; signed explanations + predicted label; signed explanations + predicted label + guidelines; signed explanations + guidelines + predicted label + accuracy statement.

  28. Unsigned explanations; signed explanations.
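
The slide contrasts unsigned and signed explanations without defining them in text. The sketch below assumes that a signed highlight also encodes which label a word pushes toward, while an unsigned highlight only marks a word as important. The function `highlight`, the word weights, and the convention that a positive weight means "deceptive" are all hypothetical illustrations, not values from the study.

```python
def highlight(weights, signed=True, top_k=3):
    """Return the top_k most influential words with a highlight tag.

    `weights` maps a word to a hypothetical model weight; by the assumed
    convention here, positive pushes toward "deceptive" and negative
    toward "genuine".
    """
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    ranked = ranked[:top_k]
    if signed:
        # Signed: the tag reveals the direction of each word's influence.
        return [(w, "deceptive" if v > 0 else "genuine") for w, v in ranked]
    # Unsigned: the tag only says the word matters, not in which direction.
    return [(w, "important") for w, _ in ranked]

# Made-up weights for illustration only.
word_weights = {"chicago": 0.9, "location": -0.6, "my": 0.4, "floor": -0.2}
print(highlight(word_weights, signed=True, top_k=2))
print(highlight(word_weights, signed=False, top_k=2))
```

Both calls select the same two words; only the signed version tells the reader which way each word points.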

  29. Real-time assistance improves performance. [Bar chart, accuracy (%), axis 50 to 90] No assistance: 60.4%; Unsigned: 57.8%; Signed: 70.7%; Signed + predicted label + guidelines + accuracy: 74%; Machine: 86. Annotated p-values: p=0.001***, p=0.001***. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05).

  30. Signed highlights are sufficient. [Bar chart, accuracy (%), axis 50 to 90] No assistance: 60.4%; Unsigned: 57.8%; Signed: 70.7%; Signed + predicted label + guidelines + accuracy: 74%; Machine: 86. Annotated p-value: p>0.05. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05).

  31. Gap between human+AI & AI. [Bar chart, accuracy (%), axis 50 to 90] No assistance: 60.4%; Unsigned: 57.8%; Signed: 70.7%; Signed + predicted label + guidelines + accuracy: 74%; Machine: 86. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05). (Poursabzi-Sangdeh et al. 2018; Green & Chen 2019; Lage et al. 2019; Lai & Tan 2019; Carton et al. 2020; Lai et al. 2020)

  32. Research question 3: Can our results generalize to other models? How do model complexity and explanation methods affect human performance with/without training? Diagram labels: simple model vs. deep model.

  33. SVM explanations; BERT attention explanations.

  34. SVM explanations; BERT LIME explanations.

  35. Simple model = better human performance. [Bar chart, accuracy (%), axis 50 to 80; training / no training] SVM: 72.8% / 64.1%; BERT-ATT: 58.2% / 54.1%; BERT-LIME: 64.9% / 59.2%. Annotated p-values: p=0.001***, p=0.001***. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05). (Lai et al. 2019)

  36. Simple model = better human performance. [Bar chart, accuracy (%), axis 50 to 80; training / no training] SVM: 72.8% / 64.1%; BERT-ATT: 58.2% / 54.1%; BERT-LIME: 64.9% / 59.2%. Annotated p-values: p=0.001***, p=0.001***.

  37. Training leads to better performance. [Bar chart, accuracy (%), axis 50 to 80; training / no training] SVM: 72.8% / 64.1%; BERT-ATT: 58.2% / 54.1%; BERT-LIME: 64.9% / 59.2%. Annotated p-values: p=0.001***, p=0.001***, p=0.001***. Number of stars indicates p-values (***: p < 0.001; **: p < 0.01; *: p < 0.05).

  38. Takeaway: • Tutorials somewhat improve human performance • Explanations from simple models are preferred • Future directions for human-centered tutorials and explanations. Vivian Lai, Han Liu, Chenhao Tan. @vivwylai | vivwylai@gmail.com | @HanLiuAI | @ChenhaoTan. University of Colorado Boulder. Website: machineintheloop.com. Paper: https://tinyurl.com/model-driven-tutorials. Workshop: https://tinyurl.com/harness-explanations
