Robust Learning from Untrusted Sources
Nikola Konstantinov, Christoph H. Lampert (IST Austria). ICML, June 2019. Poster 156.


  1. Robust Learning from Untrusted Sources. Nikola Konstantinov, Christoph H. Lampert. ICML, June 2019. Konstantinov, Lampert; IST Austria. Poster 156. (1 / 13)

  2. Motivation. Collecting data for machine learning applications. (2 / 13)

  5. Motivation. Using multiple data sources: crowdsourcing. (3 / 13)

  7. Motivation. Using multiple data sources: web crawling. (3 / 13)

  8. Motivation. Using multiple data sources: data from personal devices. (3 / 13)

  9. Motivation. Using multiple data sources: data from different labs. (3 / 13)

  10. Motivation. Using multiple data sources: data from different labs. How can we learn robustly from such data? (3 / 13)

  11. Motivation. Learning from untrusted sources. Untrusted sources can provide valuable data for training, but some of these data batches might be corrupted or irrelevant. Two naive approaches: simply train on all data, or train only on the trusted subset. Can we do better? (4 / 13)

  12. Theory: Setup. Learning task: unknown target distribution $D_T$ on $\mathcal{X} \times \mathcal{Y}$; loss function $L : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$; we want to learn a predictor $h : \mathcal{X} \to \mathcal{Y}$ from a hypothesis class $\mathcal{H}$. Given: a small reference dataset $S_T = \{(x^T_1, y^T_1), \ldots, (x^T_{m_T}, y^T_{m_T})\} \sim D_T$, and also $m_i$ data points from each source $i = 1, \ldots, N$: $S_i = \{(x^i_1, y^i_1), \ldots, (x^i_{m_i}, y^i_{m_i})\} \sim D_i$. (5 / 13)

  13. Theory: Approach. Assign weights $\alpha = (\alpha_1, \ldots, \alpha_N)$ to the sources, with $\sum_{i=1}^N \alpha_i = 1$, and minimize the $\alpha$-weighted empirical loss: $\hat{h}_\alpha = \operatorname{argmin}_{h \in \mathcal{H}} \hat{\epsilon}_\alpha(h) = \operatorname{argmin}_{h \in \mathcal{H}} \sum_{i=1}^N \alpha_i \frac{1}{m_i} \sum_{j=1}^{m_i} L\big(h(x^i_j), y^i_j\big)$. We want a small expected loss on the target distribution: $\epsilon_T(\hat{h}_\alpha) = \mathbb{E}_{D_T}\big[L(\hat{h}_\alpha(x), y)\big]$. How to decide which sources are trustworthy? (6 / 13)
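
The α-weighted empirical risk minimization above can be sketched as follows. Everything here is an illustrative stand-in, not the paper's setup: synthetic linear-regression sources with squared loss, uniform weights, and plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: N = 3 sources, linear predictors h(x) = w.x,
# squared loss. Sizes and the data-generating process are made up.
N, d = 3, 5
w_true = rng.normal(size=d)
sources = []
for i in range(N):
    m_i = 50
    X = rng.normal(size=(m_i, d))
    y = X @ w_true + 0.1 * rng.normal(size=m_i)
    sources.append((X, y))

alpha = np.full(N, 1.0 / N)  # source weights, summing to 1 (uniform here)

def weighted_empirical_loss(w):
    """alpha-weighted empirical loss: sum_i alpha_i * (1/m_i) sum_j L(h(x_j^i), y_j^i)."""
    return sum(a * np.mean((X @ w - y) ** 2)
               for a, (X, y) in zip(alpha, sources))

# Minimize the weighted objective by gradient descent.
w = np.zeros(d)
for _ in range(500):
    grad = sum(a * 2.0 * X.T @ (X @ w - y) / len(y)
               for a, (X, y) in zip(alpha, sources))
    w -= 0.05 * grad

print(weighted_empirical_loss(w))  # small: the fit reaches roughly the noise level
```

With non-uniform α, sources deemed untrustworthy simply contribute less to the gradient; the next slides show how α is chosen.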

  14. Theory: Approach. Discrepancies between the sources (Kifer et al., VLDB 2004; Mohri et al., ALT 2012): $\mathrm{disc}_{\mathcal{H}}(D_i, D_T) = \sup_{h \in \mathcal{H}} |\epsilon_i(h) - \epsilon_T(h)|$. Small if $\mathcal{H}$ does not distinguish between the two learning tasks. Popular in the domain adaptation literature. (7 / 13)
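
The supremum over all of $\mathcal{H}$ is intractable in general. As a crude, hypothetical proxy (not the paper's computation), one can maximize over a finite random pool of linear hypotheses under 0/1 loss; a corrupted source then tends to show a larger empirical discrepancy than a clean one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative binary-classification batches; all names/sizes are made up.
d, m = 5, 200
w_true = rng.normal(size=d)

def make_batch(m, label_flip=0.0):
    X = rng.normal(size=(m, d))
    y = np.sign(X @ w_true)
    flip = rng.random(m) < label_flip
    y[flip] = -y[flip]          # simulate label corruption
    return X, y

S_T = make_batch(m)                         # clean reference batch ~ D_T
S_clean = make_batch(m)                     # source matching the target
S_corrupt = make_batch(m, label_flip=0.4)   # heavily corrupted source

pool = rng.normal(size=(100, d))  # finite random stand-in for the class H

def emp_disc(S_i, S_ref):
    """max_h |eps_i(h) - eps_ref(h)| over the finite pool, 0/1 loss."""
    (Xi, yi), (Xr, yr) = S_i, S_ref
    err_i = (np.sign(Xi @ pool.T) != yi[:, None]).mean(axis=0)
    err_r = (np.sign(Xr @ pool.T) != yr[:, None]).mean(axis=0)
    return np.abs(err_i - err_r).max()

print(emp_disc(S_clean, S_T), emp_disc(S_corrupt, S_T))
```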

  15. Theory: Bound on the expected loss. Given a hypothesis set $\mathcal{H}$, let $\hat{h}_\alpha = \operatorname{argmin}_{h \in \mathcal{H}} \hat{\epsilon}_\alpha(h)$ and $h^*_T = \operatorname{argmin}_{h \in \mathcal{H}} \epsilon_T(h)$. For any $\delta > 0$, with probability at least $1 - \delta$: $|\epsilon_T(\hat{h}_\alpha) - \epsilon_T(h^*_T)| \le 2 \sum_{i=1}^N \alpha_i \, \mathrm{disc}_{\mathcal{H}}(D_i, D_T) + C(\delta) \sqrt{\sum_{i=1}^N \frac{\alpha_i^2}{m_i}} + 4 \sum_{i=1}^N \alpha_i \, \mathfrak{R}_i(\mathcal{H}, L)$. Similar bounds in Ben-David et al., ML 2010; Zhang et al., NIPS 2013. (8 / 13)
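
Not stated on the slide, but plugging the two naive weightings from the motivation into this bound makes the trade-off explicit (assuming, for illustration, a single trusted source $i^*$ with zero discrepancy, and writing $m = \sum_i m_i$):

```latex
% Trusted source only (\alpha = e_{i^*}, disc_H(D_{i^*}, D_T) = 0):
% no bias term, but the estimation term scales with the small m_{i^*}.
|\epsilon_T(\hat h_\alpha) - \epsilon_T(h^*_T)|
  \le \frac{C(\delta)}{\sqrt{m_{i^*}}} + 4\,\mathfrak{R}_{i^*}(\mathcal H, L)

% All data (\alpha_i = m_i / m): the estimation term shrinks to
% C(\delta)/\sqrt{m}, but a weighted discrepancy (bias) term appears.
|\epsilon_T(\hat h_\alpha) - \epsilon_T(h^*_T)|
  \le 2\sum_{i=1}^N \frac{m_i}{m}\,\mathrm{disc}_{\mathcal H}(D_i, D_T)
    + \frac{C(\delta)}{\sqrt{m}}
    + 4\sum_{i=1}^N \frac{m_i}{m}\,\mathfrak{R}_i(\mathcal H, L)
```

Choosing $\alpha$ to minimize the bound interpolates between these extremes.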

  21. Theory: Algorithm. The theory suggests: select $\alpha$ by minimizing $\sum_{i=1}^N \alpha_i \, \mathrm{disc}_{\mathcal{H}}(D_i, D_T) + \lambda \sqrt{\sum_{i=1}^N \frac{\alpha_i^2}{m_i}}$, then find $\hat{h}_\alpha$ by minimizing the $\alpha$-weighted empirical risk. Choose $\lambda$ by cross-validation on the reference dataset; it trades off exploiting trusted sources against using all data. In practice, work with the empirical discrepancies: $\mathrm{disc}_{\mathcal{H}}(S_i, S_T) = \sup_{h \in \mathcal{H}} \Big| \frac{1}{m_i} \sum_{j=1}^{m_i} L\big(h(x^i_j), y^i_j\big) - \frac{1}{m_T} \sum_{j=1}^{m_T} L\big(h(x^T_j), y^T_j\big) \Big|$. (9 / 13)
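
The $\alpha$-selection step can be sketched as projected gradient descent over the probability simplex. The discrepancy values, batch sizes, and $\lambda$ below are invented for illustration, and the simplex projection is a standard textbook routine, not taken from the paper:

```python
import numpy as np

# Hypothetical inputs: empirical discrepancies d_hat_i and batch sizes m_i
# for N = 5 sources; sources 3 and 4 look corrupted (large discrepancy).
d_hat = np.array([0.02, 0.03, 0.30, 0.35, 0.05])
m = np.array([100, 100, 100, 100, 100])
lam = 0.5  # in practice chosen by cross-validation on the reference set

def project_simplex(v):
    """Euclidean projection onto {alpha : alpha >= 0, sum(alpha) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)

# Minimize sum_i alpha_i d_i + lam * sqrt(sum_i alpha_i^2 / m_i)
# by projected gradient descent, starting from uniform weights.
alpha = np.full(len(d_hat), 1.0 / len(d_hat))
for _ in range(2000):
    reg = np.sqrt(np.sum(alpha**2 / m)) + 1e-12
    grad = d_hat + lam * (alpha / m) / reg
    alpha = project_simplex(alpha - 0.01 * grad)

print(np.round(alpha, 3))  # weight concentrates on the low-discrepancy sources
```

The $\lambda$ term spreads weight across sources (favoring more data); the discrepancy term pulls weight away from suspicious ones, driving the weights of the corrupted sources toward zero.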

  22. Experiments. Evaluated empirically on the Multitask Dataset of Product Reviews (Pentina et al., ICML 2017; McAuley et al., 2015) and Animals with Attributes 2 (Xian et al., TPAMI 2018). Some clean reference data for a target task is available; other subsets exist, some of which are corrupted. We experimented with various manipulations of and problems with the data. (10 / 13)

  23. Results. Figure: Animals with Attributes 2, RGB channels swapped. Average classification error (roughly 0.20 to 0.40) versus the number of corrupted sources (0 to 60), comparing Ours against the baselines Reference only, All data, Pregibon et al., Median of probs, Feng et al., Yin et al., and Batch norm. (11 / 13)

  24. Summary. Data from different sources is naturally heterogeneous. Our method suppresses the effect of corrupted or irrelevant data. The approach is theoretically justified and shows good empirical performance, and the algorithm can be applied even when the data is private and/or distributed. (12 / 13)

  25. Thank you for your attention! Poster 156. (12 / 13)
