Data Valuation using Reinforcement Learning


  1. Data Valuation using Reinforcement Learning
     Jinsung Yoon, Sercan O. Arik, Tomas Pfister (Google Cloud AI)
     International Conference on Machine Learning (ICML 2020)

  2. Problem Definition
     ● What is data valuation?
       ○ How much does each training sample contribute to the trained model?
     Reference: Amirata Ghorbani, James Y. Zou, "Data Shapley: Equitable Valuation of Data for Machine Learning", ICML 2019.

  3. Objective & Use-cases - Learn in a reliable way
     ● Data valuation
       ○ Fair valuation for labelers and data providers
       ○ Insights about the dataset
     Reference: Ruoxi Jia et al., "Towards Efficient Data Valuation Based on the Shapley Value", AISTATS 2019.

  4. Objective & Use-cases - Learn in a reliable way
     ● Corrupted sample discovery
     [Figure: high-value samples vs. low-value samples]

  5. Objective & Use-cases - Learn in a reliable way
     ● Robust learning with noisy (or cheaply-acquired) datasets
       ○ Augmented learning
     [Figure: cheaply-acquired samples vs. high-valued samples]
     Reference: Amirata Ghorbani, James Y. Zou, "Data Shapley: Equitable Valuation of Data for Machine Learning", ICML 2019.

  6. Objective & Use-cases - Learn in a reliable way
     ● Domain adaptation
       ○ Assigns higher values to samples from the target distribution.
     [Figure: training set with types A, B, C, D; target set of type D; the type-D samples receive high values]

  7. Related works - Leave-one-out
     ● Not reasonable when there are two similar training samples: removing either one barely changes performance, so both receive near-zero value.
     Reference: Amirata Ghorbani, James Y. Zou, "Data Shapley: Equitable Valuation of Data for Machine Learning", ICML 2019.
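
To make the leave-one-out idea concrete, here is a minimal sketch, assuming a scikit-learn-style logistic-regression predictor and validation accuracy as the performance metric (both illustrative choices, not the deck's exact setup):

```python
# Minimal leave-one-out (LOO) valuation sketch (illustrative setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

def loo_values(x_train, y_train, x_val, y_val):
    """Value of sample i = validation accuracy of the full model
    minus validation accuracy after dropping sample i."""
    full = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    full_acc = full.score(x_val, y_val)
    values = np.zeros(len(x_train))
    for i in range(len(x_train)):
        mask = np.arange(len(x_train)) != i
        model = LogisticRegression(max_iter=1000).fit(x_train[mask], y_train[mask])
        values[i] = full_acc - model.score(x_val, y_val)
    return values
```

If two training samples are near-duplicates, dropping either one leaves the other in place, so both get LOO values near zero even when the pair is highly informative.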

  8. Related works - Data Shapley
     ● Computational complexity is exponential in the number of samples, so it is approximated in practice (e.g., by Monte Carlo sampling over permutations).
     Reference: Amirata Ghorbani, James Y. Zou, "Data Shapley: Equitable Valuation of Data for Machine Learning", ICML 2019.
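
A rough sketch of such a Monte Carlo approximation (in the spirit of the cited paper's TMC-Shapley), using the same illustrative predictor as above; the permutation count and warm-up size are arbitrary assumptions:

```python
# Monte Carlo Data Shapley sketch: average each sample's marginal
# contribution over random permutations (illustrative, not the
# paper's implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def mc_shapley(x_train, y_train, x_val, y_val, num_perms=50, warmup=5):
    n = len(x_train)
    values = np.zeros(n)
    for _ in range(num_perms):
        perm = np.random.permutation(n)
        prev_acc = 0.5  # chance accuracy for a balanced binary task
        # Skip the first few positions so both classes are likely present;
        # a full implementation also scores these warm-up prefixes.
        for k in range(warmup, n):
            idx = perm[:k + 1]
            model = LogisticRegression(max_iter=1000).fit(x_train[idx], y_train[idx])
            acc = model.score(x_val, y_val)
            values[perm[k]] += acc - prev_acc  # marginal contribution
            prev_acc = acc
    return values / num_perms
```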

  9. Challenges & Motivation
     ● The search space is extremely large.
       ○ It is impossible to explore the entire space.
     ● Training processes can be non-differentiable.
       ○ The selection operation (i.e., the sampler block) is non-differentiable.
       ○ Performance metrics can be non-differentiable (e.g., accuracy, AUC).
       ○ End-to-end back-propagation may not be possible.
     ● Reinforcement learning is an efficient way to explore a large search space and to handle non-differentiable processes (see the gradient sketch below).
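
Concretely, the non-differentiable selection can be handled with a score-function (REINFORCE) gradient. The following is my reading of the standard setup, with h_phi the data value estimator, s the sampled selection vector, r the validation performance, and delta a moving-average baseline; the exact form in the paper may differ:

```latex
% Per-sample selection is sampled from the estimated data values:
%   s_i ~ Bernoulli(h_phi(x_i, y_i))
\nabla_\phi \, \mathbb{E}_{s \sim \pi_\phi}\!\left[ r(s) \right]
  = \mathbb{E}_{s \sim \pi_\phi}\!\left[ \big( r(s) - \delta \big)
    \nabla_\phi \sum_{i=1}^{N} \Big( s_i \log h_\phi(x_i, y_i)
    + (1 - s_i) \log\big(1 - h_\phi(x_i, y_i)\big) \Big) \right]
```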

  10. High-level figure for DVRL
     ● Jointly train the selector (data value estimator) and the predictor in an end-to-end way.

  11. Problem formulation
     ● Goal: minimize the validation loss through weighted optimization of the predictor (rendered in LaTeX below).
     ● Components:
       ○ Training set: D = {(x_i, y_i)}, i = 1, ..., N
       ○ Validation set: D^v = {(x_j^v, y_j^v)}, j = 1, ..., M
       ○ Predictor model: f_theta
       ○ Data valuation model: h_phi
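
Putting these components together, the weighted optimization is a bilevel problem. This rendering is my reconstruction from the components above; the loss symbol and set sizes follow standard conventions rather than the slide's exact notation:

```latex
% Outer problem: choose the valuation model h_phi so that the
% resulting predictor minimizes the validation loss.
\min_{\phi} \; \frac{1}{M} \sum_{j=1}^{M}
    \mathcal{L}\!\left( f_{\theta^{*}(\phi)}(x_j^{v}), \, y_j^{v} \right)
\quad \text{s.t.} \quad
% Inner problem: train the predictor with per-sample weights given
% by the estimated data values.
\theta^{*}(\phi) = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N}
    h_\phi(x_i, y_i) \, \mathcal{L}\!\left( f_\theta(x_i), y_i \right)
```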

  12. Block diagram
     [Figure: DVRL block diagram; a toy code sketch of the loop follows.]
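
The diagram's training loop can be sketched end-to-end in a few dozen lines. Everything below is an illustrative toy (a linear-logistic value model, logistic-regression predictor, accuracy as the reward), not the authors' implementation; it only mirrors the sampler-plus-REINFORCE structure described above:

```python
# Toy DVRL-style loop: a linear-logistic value model scores each
# (x, y) pair, a sampler selects training data, the predictor is
# refit, and the validation reward updates the value model via
# REINFORCE with a moving-average baseline. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dvrl_toy(x_train, y_train, x_val, y_val,
                   iters=200, lr=0.1, ema=0.9):
    feats = np.hstack([x_train, y_train[:, None]])  # value-model input: [x, y]
    w = np.zeros(feats.shape[1])                    # value-model weights
    baseline = 0.0
    for _ in range(iters):
        probs = np.clip(sigmoid(feats @ w), 1e-4, 1 - 1e-4)
        select = np.random.rand(len(probs)) < probs  # s_i ~ Bernoulli(probs)
        if select.sum() < 2 or len(np.unique(y_train[select])) < 2:
            continue  # skip degenerate selections
        predictor = LogisticRegression(max_iter=1000)
        predictor.fit(x_train[select], y_train[select])
        acc = predictor.score(x_val, y_val)
        # REINFORCE: gradient of the log-prob of the sampled selection.
        grad = feats.T @ (select.astype(float) - probs)
        w += lr * (acc - baseline) * grad
        baseline = ema * baseline + (1 - ema) * acc
    return sigmoid(feats @ w)  # estimated per-sample data values
```

Samples whose presence hurts validation accuracy are pushed toward low selection probability, which is exactly the behavior the corrupted-sample experiments below probe.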

  13. Experiments - How to quantitatively evaluate data valuation?
     ● Removing high / low valued samples
     ● Corrupted sample discovery
     ● Robust learning with noisy data
     ● Domain adaptation

  14. Results - Removing high / low valued samples
     ● Standard supervised learning setting (train, validation, and test sets come from the same distribution).
     ● Removing high-valued samples: fastest performance degradation.
     ● Removing low-valued samples: slowest performance degradation.
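
The evaluation protocol behind these curves can be sketched as follows, assuming `values` comes from any valuation method (e.g., the toy DVRL loop above); the fractions and predictor are illustrative:

```python
# Removal experiment sketch: drop a growing fraction of samples,
# ranked by estimated value, and retrain (illustrative setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

def removal_curve(values, x_train, y_train, x_test, y_test,
                  fractions=(0.1, 0.2, 0.3, 0.4, 0.5), highest_first=True):
    order = np.argsort(values)
    if highest_first:
        order = order[::-1]  # remove the most valuable samples first
    accs = []
    for frac in fractions:
        keep = order[int(frac * len(order)):]  # drop the leading fraction
        model = LogisticRegression(max_iter=1000).fit(x_train[keep], y_train[keep])
        accs.append(model.score(x_test, y_test))
    return accs  # degrades fastest when high-valued samples go first
```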

  15. Results - Corrupted sample discovery
     ● Corrupted sample setting (20% label noise).
     ● Highest true positive rate (TPR) for corrupted sample discovery.
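
The discovery metric itself is simple to state in code; `is_corrupted` is a hypothetical ground-truth noise mask available only for evaluation:

```python
# TPR of corrupted-sample discovery: inspect the lowest-valued
# fraction of samples and count how many known-corrupted labels
# appear there (sketch; the mask is for evaluation only).
import numpy as np

def discovery_tpr(values, is_corrupted, inspect_fraction=0.2):
    n_inspect = int(inspect_fraction * len(values))
    inspected = np.argsort(values)[:n_inspect]  # lowest values first
    found = is_corrupted[inspected].sum()
    return found / max(is_corrupted.sum(), 1)   # true positive rate
```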

  16. Results - Robust learning with noisy labels (40%)
     ● Demonstrates the scalability of DVRL to complex models (WideResNet-28-10 and ResNet-32) and large datasets (CIFAR).
     ● State-of-the-art robust learning performance.
     Reference: Mengye Ren et al., "Learning to Reweight Examples for Robust Deep Learning", ICML 2018.

  17. Results - Domain adaptation on the Retail dataset
     [Figure: three training configurations evaluated on a testing set of type D.
      Train-on-Specific: training set contains only type D.
      Train-on-All: training set contains types A, B, C, and D.
      Train-on-Rest: training set contains types A, B, and C.]

  18. Results - Domain adaptation on the Retail dataset
     ● Significant gain in the Train-on-Rest setting (largest domain mismatch).
     ● Reasonable gain in the Train-on-All setting (most common setting).
     ● Marginal gain in the Train-on-Specific setting (no domain mismatch).

  19. Results - Domain adaptation in other domains
     ● Main source of gain: DVRL jointly optimizes the data valuator and the corresponding predictor model.
     Reference: Amirata Ghorbani, James Y. Zou, "Data Shapley: Equitable Valuation of Data for Machine Learning", ICML 2019.

  20. Discussion: How many validation samples are needed?
     ● A small number of validation samples is enough for DVRL training.
     ● Reasonable performance even with 10 validation samples on the Adult dataset.

  21. Codebase of DVRL
     ● GitHub: https://github.com/google-research/google-research/tree/master/dvrl
     ● AI Hub: https://aihub.cloud.google.com/u/0/p/products%2Fcb6b588c-1582-4868-a944-dc70ebe61a36
