
An Overview of the AI Safety Landscape

Workshop on Reliable Artificial Intelligence 2017, ETH Zurich. Max Daniel, Research Project Manager, Effective Altruism Foundation (http://ea-foundation.org)


  1. An Overview of the AI Safety Landscape. Workshop on Reliable Artificial Intelligence 2017, ETH Zurich. Max Daniel, Research Project Manager, Effective Altruism Foundation (http://ea-foundation.org)

  2. https://blog.openai.com/faulty-reward-functions/

  3. https://blog.openai.com/faulty-reward-functions/

  4. “[C]oncrete safety problems that are ready for experimentation today and relevant to the cutting edge of AI systems”: 1. Avoid negative side effects 2. Avoid reward hacking 3. Scalable oversight 4. Safe exploration 5. Robustness to distributional shift (Amodei, Olah et al. 2016)

  5. Ng and Russell (ICML 2000), Hadfield-Menell et al. (NIPS 2016)

  6. Christiano et al. 2017

  7. Security: Huang et al. 2017

  8. Source: http://rll.berkeley.edu/adversarial/videos/pong_a3c_trpo_l-inf.mp4

  9. Corrigibility: Soares et al. (AAAI 2015), Orseau and Armstrong (UAI 2016)

  10. Privacy: Papernot et al. (ICLR 2017)

  11. “This technical agenda primarily covers topics that the authors believe are tractable, uncrowded, focused, and unable to be outsourced to forerunners of the target AI system.” Topics: 1. Realistic World-Models 2. Decision Theory 3. Logical Uncertainty 4. Vingean Reflection (Soares and Fallenstein, 2017 [2014])

  12. 1) Research Goal 2) Research Funding 3) Science-Policy Link 4) Research Culture 5) Race Avoidance 6) Safety 7) Failure Transparency 8) Judicial Transparency 9) Responsibility 10) Value Alignment 11) Human Values 12) Personal Privacy 13) Liberty and Privacy 14) Shared Benefit 15) Shared Prosperity 16) Human Control 17) Non-subversion 18) AI Arms Race 19) Capability Caution 20) Importance 21) Risks 22) Recursive Self-Improvement 23) Common Good. Source: Asilomar AI Principles

  13. Conclusion
      ● Ensuring that AI agents do what we want is a nontrivial problem.
      ● Technical AI safety is a thriving field in AI/ML research.
      ● Several research agendas and concrete problems have been pursued.
      ● Complements contributions from law, economics, policy, philosophy, social science, …

  14. Thank you. max.daniel@ea-foundation.org
