Efficient and Consistent Adversarial Bipartite Matching
Rizal Fathony*#, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart
*) equal contribution  #) presenter
Bipartite Matching Tasks

Example: a matching between node sets B and A with n = 4, written as a permutation π = [4, 3, 1, 2].
- Maximum weighted bipartite matching: given pairwise weights, find the permutation π with the largest total weight.
- Machine learning task: learn the appropriate weights from data.
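As a minimal sketch (the weight matrix below is hypothetical), the maximum weighted bipartite matching for n = 4 can be computed with the Hungarian algorithm, e.g. via SciPy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical 4x4 weight matrix: W[i, j] = weight of matching
# node i+1 in set B to node j+1 in set A.
W = np.array([
    [1.0, 2.0, 0.5, 9.0],
    [0.2, 1.0, 8.0, 1.5],
    [7.0, 0.3, 2.0, 0.1],
    [0.4, 6.0, 1.0, 2.0],
])

# linear_sum_assignment minimizes cost, so negate to maximize weight.
rows, cols = linear_sum_assignment(-W)
pi = cols + 1                      # 1-indexed permutation, as on the slide
total = W[rows, cols].sum()
print(pi.tolist(), total)          # [4, 3, 1, 2] 30.0
```

With these weights the optimum is exactly the slide's example permutation π = [4, 3, 1, 2].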
Learning Bipartite Matching | Applications

1. Word alignment (Taskar et al., 2005; Padó & Lapata, 2006; MacCartney et al., 2008), e.g. aligning "natürlich ist das haus klein" with "of course the house is small".
2. Correspondence between images (Belongie et al., 2002; Dellaert et al., 2003).
3. Learning to rank documents (Dwork et al., 2001; Le & Smola, 2007).
Desiderata for a Predictor

- Learning objective: seek the pairwise potentials that are most compatible with the training data.
- Challenge: loss functions (e.g., the Hamming loss) are non-continuous and non-convex.
Desiderata for the predictor:
1. Efficiency: (low-degree) polynomial runtime.
2. Consistency: must also minimize the Hamming loss under ideal conditions (given the true distribution and fully expressive model parameters).
Exponential Family Random Field Approach (Petterson et al., 2009; Volkovs & Zemel, 2012)

- Probabilistic model over matchings.
- Consistent? Yes: produces the Bayes optimal prediction under ideal conditions.
- Efficient? No: the normalization term requires computing a matrix permanent, which is #P-hard (Valiant, 1979); impractical even for a modestly sized n = 20.
Maximum Margin Approach (Tsochantaridis et al., 2005)

- Max-margin (structured SVM) model.
- Efficient? Yes: polynomial-time algorithm for computing the maximally violated constraint (Hungarian algorithm).
- Consistent? No: based on the Crammer & Singer multiclass SVM formulation, which is not consistent for distributions with no majority label (Liu, 2007).
Adversarial Bipartite Matching (our approach)

Seek a predictor that robustly minimizes the Hamming loss against the worst-case mixture of permutations.
- Predictor: makes a probabilistic prediction P̂(π̂ | x); aims to minimize the loss; is pitted against an adversary instead of the empirical distribution.
- Adversary: makes a probabilistic prediction P̌(π̌ | x); aims to maximize the loss; constrained to select a distribution that matches the statistics of the empirical distribution P̃ via moment matching on the features φ(x, π) = Σ_{i=1}^{n} φ_i(x, π_i).
Adversarial Bipartite Matching | Dual

Lagrangian dual formulation of the adversarial bipartite matching objective (method of Lagrange multipliers, Von Neumann & Sion minimax duality), where θ is the dual variable for the moment-matching constraints.
- The inner game is played on the Hamming loss augmented with the Lagrangian potential terms.
- The augmented Hamming loss matrix (shown on the slide for n = 3) has size n! × n!: intractable for modestly sized n.
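The n! × n! blow-up is easy to see in a toy sketch: for n = 3 there are already 3! = 6 permutations, so even the unaugmented Hamming-loss game matrix is 6 × 6 (the dual adds potential terms on top of these entries):

```python
from itertools import permutations

import numpy as np

n = 3
perms = list(permutations(range(n)))   # all n! permutations

def hamming(p, q):
    """Number of positions where two permutations disagree."""
    return sum(pi != qi for pi, qi in zip(p, q))

# n! x n! Hamming loss matrix between every pair of permutations.
L = np.array([[hamming(p, q) for q in perms] for p in perms])
print(L.shape)     # (6, 6)
print(L[0])        # losses of (0, 1, 2) against all six permutations
```

Note that two distinct permutations can never disagree in exactly one position, so the off-diagonal entries are all 2 or 3 here.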
Efficient Algorithms

1. Double oracle method (constraint generation).
2. Marginal distribution formulation.
Double Oracle Method

- Based on the observation that the equilibrium is usually supported by a small number of permutations.
- Iterative procedure: maintain a small set of permutations for each player; repeatedly solve the game restricted to that set, then add each player's best response to the opponent's current mixed strategy (the slide illustrates this on the augmented Hamming loss matrix over π ∈ {123, 213, 312}, with the adversary's best responses 213, 312 and the predictor's best responses 213, 312 joining the sets).
- No formal polynomial bound is known: the runtime cannot be characterized as polynomial.
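A hedged sketch of one best-response step, for the plain Hamming game (the Lagrangian potential term would simply be added to the cost matrix): if the predictor's current mixture has marginal matrix Q with Q[i, j] = P̂(π̂_i = j), the adversary's expected Hamming loss for a permutation π̌ is Σ_i (1 − Q[i, π̌_i]), so maximizing it reduces to a minimum-weight matching under Q:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def adversary_best_response(Q):
    """Permutation maximizing expected Hamming loss against a predictor
    whose marginals are Q (Q[i, j] = prob. the predictor maps i to j).
    Expected loss = sum_i (1 - Q[i, perm[i]]), so maximizing it means
    minimizing sum_i Q[i, perm[i]] -- a min-weight assignment on Q."""
    _, cols = linear_sum_assignment(Q)
    return cols

# Predictor mixing permutations (0, 1, 2) and (1, 0, 2) equally:
Q = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])
br = adversary_best_response(Q)
expected_loss = 3 - Q[np.arange(3), br].sum()
print(br, expected_loss)
```

The predictor's best response is symmetric: against the adversary's marginals P it picks the *maximum*-weight matching under P, since that maximizes the expected number of agreements.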
Marginal Distribution Formulation

- Marginal distribution matrices: predictor Q with q_{i,j} = P̂(π̂_i = j), adversary P with p_{i,j} = P̌(π̌_i = j).
- Birkhoff–Von Neumann theorem: the doubly stochastic matrices form a convex polytope whose vertices are the permutation matrices (for n = 3: 123, 132, 213, 231, 312, 321).
- Reduces the space of optimization from O(n!) to O(n²).
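A minimal sketch of the reduction (the mixture weights are hypothetical): any mixture over permutations collapses to an n × n marginal matrix, and by Birkhoff–Von Neumann that matrix is doubly stochastic (rows and columns each sum to 1), with the converse also holding:

```python
import numpy as np

n = 3
# Hypothetical adversary mixture over permutations (0-indexed).
mixture = {(0, 1, 2): 0.5, (1, 2, 0): 0.3, (2, 1, 0): 0.2}

# Marginal matrix: P[i, j] = probability that item i is matched to j.
P = np.zeros((n, n))
for perm, prob in mixture.items():
    for i, j in enumerate(perm):
        P[i, j] += prob

# Rows and columns each sum to 1: P lies in the Birkhoff polytope,
# an O(n^2)-dimensional object replacing the O(n!) mixture weights.
print(P)
print(P.sum(axis=0), P.sum(axis=1))
```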
Marginal Formulation | Optimization

Optimization: add regularization and a smoothing penalty.
Techniques:
- Outer (Q): projected Quasi-Newton (Schmidt et al., 2009), with projection to the doubly stochastic matrices.
- Inner (θ): closed-form solution.
- Inner (P): projection to the doubly stochastic matrices.
- Projection to the doubly stochastic matrices: ADMM.
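As a stand-in for the ADMM step (the slide does not spell it out), here is a Dykstra-style alternating-projection sketch of Euclidean projection onto the Birkhoff polytope, cycling through the three constraint sets (row sums 1, column sums 1, nonnegativity); the input matrix is hypothetical:

```python
import numpy as np

def project_rows(X):
    # Euclidean projection onto {X : each row sums to 1}.
    return X - (X.sum(axis=1, keepdims=True) - 1.0) / X.shape[1]

def project_cols(X):
    # Euclidean projection onto {X : each column sums to 1}.
    return X - (X.sum(axis=0, keepdims=True) - 1.0) / X.shape[0]

def project_birkhoff(X, iters=500):
    """Dykstra's alternating projections onto the doubly stochastic
    matrices: row sums 1, column sums 1, nonnegative entries."""
    projections = [project_rows, project_cols,
                   lambda Z: np.maximum(Z, 0.0)]
    increments = [np.zeros_like(X) for _ in projections]
    Y = X.astype(float)
    for _ in range(iters):
        for k, proj in enumerate(projections):
            Z = proj(Y + increments[k])
            increments[k] = Y + increments[k] - Z
            Y = Z
    return Y

B = project_birkhoff(np.array([[2.0, -0.5], [0.1, 0.8]]))
print(B)
```

This is only a sketch of the projection sub-step, not the paper's ADMM solver; Dykstra's increments are what make the iterates converge to the true Euclidean projection rather than just a feasible point.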
Consistency

- Empirical risk perspective of adversarial bipartite matching.
- Consistency: our method also minimizes the Hamming loss in the ideal case; the arg-max of Q lies in the set of Bayes optimal responses.
Experiment Setup

- Application: video tracking.
- Empirical runtime (until convergence): the adversarial marginal formulation grows (roughly) quadratically in n, while the CRF (Petterson et al., 2009) is impractical even for n = 20.
- (Figure: table of relative runtimes, normalized to 1.0.)
Experiment Results

- Significantly outperforms SSVM on 6 dataset pairs; competitive with SSVM on the other 2.
- Adversarial double oracle: the equilibrium is supported by a small number of permutations.
Conclusions

- Exponential Family Random Field (Petterson et al., 2009; Volkovs & Zemel, 2012): consistent, but not efficient.
- Maximum Margin (Tsochantaridis et al., 2005): efficient, but not consistent.
- Adversarial Bipartite Matching (our approach): efficient, consistent, and performs well empirically.
THANK YOU