Safe Machine Learning Silvia Chiappa & Jan Leike · ICML 2019
ML Research Reality
[cartoon: a cute horned creature with labels on its horns, nose, and tail]
● offline datasets, annotated a long time ago
● simulated environments / abstract domains
● restart experiments at will
Image credit: Keenan Crane & Nepluno, CC BY-SA
Deploying ML in the real world has real-world consequences
Why safety?
● Short-term faults: biased datasets, safe exploration, adversarial robustness, adversarial testing, fairness, interpretability, …
● Short-term misuse: fake news, deep fakes, spamming, privacy, …
● Long-term faults: alignment, shutdown problems, reward hacking, …
● Long-term misuse: automated hacking, terrorism, totalitarianism, …
The space of safety problems (Ortega et al., 2018)
● Specification: behave according to intentions
● Robustness: withstand perturbations
● Assurance: analyze & monitor activity
Safety in a nutshell
● Where does this come from? (Specification)
● How good is our approximation? (Assurance)
● What about rare cases/adversaries? (Robustness)
Outline
● Intro
● Specification for RL
● Assurance
– break –
● Specification: Fairness
Specification
Does the system behave as intended?
Degenerate solutions and misspecifications
● The surprising creativity of digital evolution (Lehman et al., 2017): https://youtu.be/TaXUZfwACVE
● Faulty reward functions in the wild (Amodei & Clark, 2016): https://openai.com/blog/faulty-reward-functions/
More examples: tinyurl.com/specification-gaming (H/T Victoria Krakovna)
What if we train agents with a human in the loop?
Algorithms for training agents from human data
● demos, myopic: behavioral cloning
● demos, nonmyopic: IRL, GAIL
● feedback, myopic: TAMER, COACH
● feedback, nonmyopic: RL from modeled rewards
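The simplest cell of this table, behavioral cloning (demos, myopic), reduces to supervised learning on demonstration data. A minimal sketch of that idea, assuming PyTorch and using placeholder tensors in place of real demonstrations:

```python
# Behavioral cloning sketch (the "demos, myopic" cell above): supervised learning
# on (state, action) pairs from a demonstrator. Toy dimensions and random data
# stand in for a real environment and real demos.
import torch
import torch.nn as nn

state_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_states = torch.randn(1024, state_dim)           # placeholder for real demo states
demo_actions = torch.randint(0, n_actions, (1024,))  # placeholder for demonstrator actions

for epoch in range(10):
    logits = policy(demo_states)
    # Maximize the likelihood of the demonstrator's actions.
    loss = nn.functional.cross_entropy(logits, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```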
Potential performance
[chart: potential performance of imitation, TAMER/COACH, and RL from modeled rewards, relative to human-level performance]
Specifying behavior
[images: AlphaGo's move 37 against Lee Sedol; the circling boat from the faulty reward functions example]
Reward modeling
Learning rewards from preferences: the Bradley-Terry model
Akrour et al. (MLKDD 2011), Christiano et al. (NeurIPS 2018)
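Under the Bradley-Terry model, the probability that the human prefers one trajectory segment over another is a softmax of the summed predicted rewards, and the reward model is trained by cross-entropy against the human's choices. A hedged sketch with a toy reward network and synthetic segments (not the authors' code):

```python
# Bradley-Terry preference model for reward learning (illustrative sketch).
# P(segment1 preferred) = exp(R1) / (exp(R1) + exp(R2)),
# where Ri is the sum of predicted rewards over segment i.
import torch
import torch.nn as nn

obs_dim = 16
reward_model = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

def segment_return(segment):          # segment: (T, obs_dim) tensor of observations
    return reward_model(segment).sum()

def preference_loss(seg1, seg2, human_prefers_first: bool):
    r1, r2 = segment_return(seg1), segment_return(seg2)
    p1 = torch.sigmoid(r1 - r2)       # equals exp(r1) / (exp(r1) + exp(r2))
    target = torch.tensor(1.0 if human_prefers_first else 0.0)
    return nn.functional.binary_cross_entropy(p1, target)

# One update on a synthetic preference (stand-ins for real clips and a real label).
seg1, seg2 = torch.randn(25, obs_dim), torch.randn(25, obs_dim)
loss = preference_loss(seg1, seg2, human_prefers_first=True)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```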
Reward modeling on Atari
[plots: reaching superhuman performance and outperforming "vanilla" RL; dashed line marks the best human score]
Christiano et al. (NeurIPS 2018)
Imitation learning + reward modeling
[diagram: demos train an imitation policy; preferences train a reward model; the policy is then improved with RL against the reward model]
Ibarz et al. (NeurIPS 2018)
Scaling up: what about domains too complex for human feedback?
● Safety via debate (Irving et al., 2018)
● Iterated amplification (Christiano et al., 2018)
● Recursive reward modeling (Leike et al., 2018)
Reward model exploitation (Ibarz et al., NeurIPS 2018)
1. Freeze successfully trained reward model
2. Train new agent on it
3. Agent finds loophole
Solution: train the reward model online, together with the agent
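One way to read the solution above is as an alternating training loop: keep collecting the agent's current behavior, keep asking for preferences on it, and keep updating the reward model so the agent cannot settle into a stale loophole. A schematic sketch where every function is an illustrative stand-in, not an actual API:

```python
# Online reward modeling sketch: the reward model is updated together with the
# agent instead of being frozen, so newly discovered loopholes get corrected.
# All components are illustrative stand-ins.

def collect_trajectories(policy, n):        # stand-in: roll out the current policy
    return [f"trajectory-{i}" for i in range(n)]

def ask_human_preferences(trajectories):    # stand-in: query a human for comparisons
    return [(trajectories[0], trajectories[1], 0)]   # (clip_a, clip_b, preferred index)

def update_reward_model(reward_model, preferences):      # stand-in: Bradley-Terry update
    return reward_model

def update_policy(policy, reward_model, trajectories):   # stand-in: any RL algorithm
    return policy

policy, reward_model = "initial policy", "initial reward model"
for iteration in range(100):
    trajectories = collect_trajectories(policy, n=8)
    preferences = ask_human_preferences(trajectories)
    # Key point: the reward model keeps learning from the agent's *current* behavior,
    # so exploits of a stale, frozen reward model are penalized as soon as they appear.
    reward_model = update_reward_model(reward_model, preferences)
    policy = update_policy(policy, reward_model, trajectories)
```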
A selection of other specification work
Avoiding unsafe states by blocking actions
● 4.5h of human oversight
● 0 unsafe actions in Space Invaders
Saunders et al. (AAMAS 2018)
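This setup can be pictured as a learned "blocker" sitting between the agent and the environment: a classifier trained on the human overseer's interventions vetoes actions it predicts to be unsafe and substitutes a safe fallback. A rough sketch of that wrapper, with hypothetical names rather than the paper's code:

```python
# Sketch of action blocking via a learned blocker (hypothetical names).
# A classifier trained on the human's past vetoes screens each proposed action;
# blocked actions are replaced by a safe fallback (here: a no-op).
NOOP = 0

def blocker_predicts_unsafe(state, action) -> bool:
    """Stand-in for a classifier trained on state-action pairs the human blocked."""
    return False

def safe_step(env, state, proposed_action):
    if blocker_predicts_unsafe(state, proposed_action):
        proposed_action = NOOP   # substitute a fallback instead of the unsafe action
    return env.step(proposed_action)
```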
Shutdown problems
Return > 0 ⇒ agent wants to prolong the episode (disable the off-switch)
Return < 0 ⇒ agent wants to shorten the episode (press the off-switch)
● Safe interruptibility (Orseau and Armstrong, UAI 2016): Q-learning is safely interruptible, but not SARSA. Solution: treat interruptions as off-policy data.
● The off-switch game (Hadfield-Menell et al., IJCAI 2017): Solution: retain uncertainty over the reward function ⇒ the agent doesn't know the sign of the return.
Understanding agent incentives
● Causal influence diagrams (Everitt et al., 2019)
● Impact measures (Krakovna et al., 2018): estimate the difference between states, e.g.
  ○ # of steps between states
  ○ # of reachable states
  ○ difference in value
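An impact measure of this kind can be folded into the reward as a penalty on deviation from a baseline state, for example the drop in how many states remain reachable. A toy sketch under the assumption of a small enumerable MDP exposing a hypothetical `successors` method:

```python
# Toy impact-penalty sketch: penalize reducing the set of reachable states
# relative to a baseline (e.g. the initial state). Illustrative only; `mdp.successors`
# is an assumed helper on a small, enumerable MDP.
def reachable_states(mdp, state, horizon=20):
    """Breadth-first enumeration of states reachable within `horizon` steps."""
    frontier, seen = {state}, {state}
    for _ in range(horizon):
        frontier = {s2 for s in frontier for s2 in mdp.successors(s)} - seen
        seen |= frontier
    return seen

def shaped_reward(mdp, task_reward, baseline_state, current_state, beta=0.1):
    # Impact = drop in the number of reachable states compared to the baseline.
    impact = len(reachable_states(mdp, baseline_state)) - len(reachable_states(mdp, current_state))
    return task_reward - beta * max(impact, 0)
```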
Assurance
Analyzing, monitoring, and controlling systems during operation.
White-box analysis
● Saliency maps
● Finding the channel that most supports a decision
● Maximizing activation of neurons/layers
Olah et al. (Distill, 2017, 2018)
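The simplest tool on this slide, a gradient saliency map, asks how much each input pixel affects the score of the predicted class. A minimal sketch, assuming PyTorch and using a toy classifier in place of a real model:

```python
# Minimal gradient saliency map: |d(score of predicted class) / d(input pixel)|.
# The model is a toy stand-in; any differentiable classifier works the same way.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy classifier
image = torch.rand(1, 3, 32, 32, requires_grad=True)

logits = model(image)
score = logits[0, logits.argmax()]   # score of the predicted class
score.backward()

# Per-pixel importance: max absolute gradient over color channels, shape (1, 32, 32).
saliency = image.grad.abs().max(dim=1).values
```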
Black-box analysis: finding rare failures
● Approximate "AVF" f: initial MDP state ⟼ P[failure]
● Train on a family of related agents of varying robustness
● ⇒ Bootstrapping by learning the structure of difficult inputs on weaker agents
Result: failures found ~1,000x faster
Uesato et al. (2018)
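The AVF idea can be sketched as an ordinary regression problem: fit a model from initial state to estimated failure probability on rollouts from weaker agents, then evaluate the strong agent preferentially on the states that model ranks as most dangerous. A rough sketch in which every component is an illustrative stand-in:

```python
# Sketch of an "adversarial value function" (AVF): learn state -> P[failure] on
# cheap rollouts of weaker related agents, then use it to prioritize evaluation
# of the final agent on the most failure-prone initial states. Stand-ins throughout.
import random

def sample_initial_state():
    return tuple(random.random() for _ in range(4))

def rollout_fails(agent, state) -> bool:
    """Stand-in: run an episode from `state` and report whether it ends in failure."""
    return random.random() < 0.01

def fit_avf(dataset):
    """Stand-in: fit a regressor state -> P[failure]; here just a constant model."""
    mean_failure = sum(label for _, label in dataset) / len(dataset)
    return lambda state: mean_failure

# 1. Collect failure labels cheaply on weaker agents.
weak_agents = ["agent_v1", "agent_v2"]
dataset = []
for _ in range(1000):
    s = sample_initial_state()
    dataset.append((s, float(any(rollout_fails(a, s) for a in weak_agents))))

avf = fit_avf(dataset)

# 2. Evaluate the strong agent only on the states the AVF ranks as most dangerous.
candidates = [sample_initial_state() for _ in range(10000)]
worst_first = sorted(candidates, key=avf, reverse=True)[:100]
failures = [s for s in worst_first if rollout_fails("strong_agent", s)]
```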
Verification of neural networks
ε-local robustness at point x₀: the prediction does not change for any x with ‖x − x₀‖ ≤ ε
● Reluplex (Katz et al., CAV 2017; Ehlers, ATVA 2017): rewrite this as a SAT formula with linear terms; use an SMT solver to solve the formula; Reluplex is a special algorithm for branching with ReLUs; verified a 6-layer MLP with ~13k parameters
● Interval bound propagation (Gowal et al., 2018): verified adversarial robustness on ImageNet downscaled to 64x64
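Interval bound propagation is the easier of the two to sketch: push the axis-aligned box [x₀ − ε, x₀ + ε] through the network layer by layer and check that every point in the final output box still yields the correct class. A minimal NumPy sketch for linear + ReLU layers, with toy weights rather than a trained network:

```python
# Interval bound propagation through linear + ReLU layers (toy sketch).
# Start from the input box [x0 - eps, x0 + eps] and propagate lower/upper bounds.
import numpy as np

def linear_bounds(lo, hi, W, b):
    # Split W into positive and negative parts to get sound interval bounds.
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def relu_bounds(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # toy 2-layer network
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x0, eps = rng.normal(size=4), 0.1
lo, hi = x0 - eps, x0 + eps
lo, hi = relu_bounds(*linear_bounds(lo, hi, W1, b1))
lo, hi = linear_bounds(lo, hi, W2, b2)

true_class = 0
# Verified robust at x0 if the worst-case score of the true class still beats
# the best-case score of every other class.
verified = all(lo[true_class] > hi[j] for j in range(3) if j != true_class)
```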
Questions?
— 10 min break —
Part II Specification: Fairness Silvia Chiappa · ICML 2019
ML systems are used in areas that severely affect people's lives
○ Financial lending
○ Hiring
○ Online advertising
○ Criminal risk assessment
○ Child welfare
○ Health care
○ Surveillance
Two examples of problematic systems
1. Criminal risk assessment tools: Defendants are assigned scores that predict the risk of re-committing crimes. These scores inform decisions about bail, sentencing, and parole. Current systems have been accused of being biased against black people.
2. Face recognition systems: Considered for surveillance and self-driving cars. Current systems have been reported to perform poorly, especially on minorities.
From public optimism to concern
The Economist: "Attitudes to police technology are changing—not only among American civilians but among the cops themselves. Until recently Americans seemed willing to let police deploy new technologies in the name of public safety. But technological scepticism is growing. On May 14th San Francisco became the first American city to ban its agencies from using facial recognition systems."
One fairness definition or one framework?
"Nobody has found a definition which is widely agreed as a good definition of fairness in the same way we have for, say, the security of a random number generator." (21 Fairness Definitions and Their Politics. Arvind Narayanan. ACM Conference on Fairness, Accountability, and Transparency Tutorial, 2018)
"There are a number of definitions and research groups are not on the same page when it comes to the definition of fairness."
"The search for one true definition is not a fruitful direction, as technical considerations cannot adjudicate moral debates."
S. Mitchell, E. Potash, and S. Barocas (2018); P. Gajane and M. Pechenizkiy (2018); S. Verma and J. Rubin (2018)
Differences/connections between fairness definitions are difficult to grasp. We lack a common language/framework.
Common group-fairness definitions (binary classification setting)
Dataset:
● sensitive attribute A
● class label Y
● prediction of the class Ŷ
● features X
Demographic Parity: The percentage of individuals assigned to class 1 should be the same for groups A=0 and A=1.
[figure: classification outcomes for males vs. females]
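In code, checking demographic parity amounts to comparing the positive-prediction rates across the two groups. A small sketch in which synthetic arrays stand in for real predictions and a real sensitive attribute:

```python
# Demographic parity check: P(Y_hat = 1 | A = 0) vs P(Y_hat = 1 | A = 1).
# Synthetic arrays stand in for real model predictions and sensitive attribute.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=1000)       # sensitive attribute (0 or 1)
Y_hat = rng.integers(0, 2, size=1000)   # model's predicted class

rate_0 = Y_hat[A == 0].mean()        # fraction assigned to class 1 in group A=0
rate_1 = Y_hat[A == 1].mean()        # fraction assigned to class 1 in group A=1
parity_gap = abs(rate_0 - rate_1)    # 0 under exact demographic parity
```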
Common group-fairness definitions
● Equal False Positive/Negative Rates (EFPRs/EFNRs)
● Predictive Parity
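These two definitions condition the other way around from demographic parity: equal false positive/negative rates compare error rates within each true class, while predictive parity compares precision within each predicted class. A sketch of both checks on synthetic data (hypothetical arrays, as above):

```python
# Per-group error rates (EFPRs/EFNRs) and predictive parity (equal precision).
# Synthetic arrays stand in for real labels, predictions, and sensitive attribute.
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 2, size=1000)       # sensitive attribute
Y = rng.integers(0, 2, size=1000)       # true class label
Y_hat = rng.integers(0, 2, size=1000)   # predicted class

def group_rates(group):
    y, y_hat = Y[A == group], Y_hat[A == group]
    fpr = ((y_hat == 1) & (y == 0)).sum() / max((y == 0).sum(), 1)      # false positive rate
    fnr = ((y_hat == 0) & (y == 1)).sum() / max((y == 1).sum(), 1)      # false negative rate
    ppv = ((y_hat == 1) & (y == 1)).sum() / max((y_hat == 1).sum(), 1)  # precision
    return fpr, fnr, ppv

# EFPRs/EFNRs hold if the first two numbers match across groups;
# predictive parity holds if the third (PPV / precision) matches across groups.
rates_group0, rates_group1 = group_rates(0), group_rates(1)
```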
The Law
● Regulated domains: lending, education, hiring, housing (extends to targeted advertising).
● Protected (sensitive) groups: reflect the fact that in the past there have been unjust practices.