

  1. Efficient and Consistent Adversarial Bipartite Matching. Rizal Fathony*,#, Sima Behpour*, Xinhua Zhang, Brian D. Ziebart. (*: equal contribution; #: presenter)

  2. Bipartite Matching Tasks. [figure: bipartite graph between node sets A and B with matching ρ = [4, 3, 1, 2]] Maximum weighted bipartite matching: ρ̂ = argmax_ρ Σ_j ω_j(ρ_j). Machine learning task: learn the appropriate weights ω_j(·).
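As a concrete illustration of the prediction step, the sketch below computes a maximum weighted bipartite matching with SciPy's Hungarian-algorithm implementation. The potential values in omega are made up for this example, chosen so that the optimal matching is the slide's ρ = [4, 3, 1, 2].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Pairwise potentials omega[j, k]: score of matching node j in A
# to node k in B (illustrative values; in the learning task these
# come from a trained model).
omega = np.array([
    [1.0, 2.0, 0.5, 4.0],
    [0.5, 1.0, 3.0, 2.5],
    [3.0, 0.5, 1.0, 0.0],
    [0.0, 3.5, 2.0, 1.0],
])

# Maximum weighted bipartite matching: argmax over permutations rho of
# sum_j omega[j, rho_j], solved in polynomial time (Hungarian algorithm).
rows, cols = linear_sum_assignment(omega, maximize=True)
rho = cols + 1  # 1-indexed, matching the slide's rho = [4, 3, 1, 2]
print(rho.tolist())  # -> [4, 3, 1, 2]
```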

  3. Learning Bipartite Matching | Applications. (1) Word alignment (Taskar et al., 2005; Padó & Lapata, 2006; MacCartney et al., 2008), e.g. aligning "natürlich ist das haus klein" with "of course the house is small". (2) Correspondence between images (Belongie et al., 2002; Dellaert et al., 2003). (3) Learning to rank documents (Dwork et al., 2001; Le & Smola, 2007).

  4. Desiderata for a Predictor. Learning objective: seek the pairwise potentials that are most compatible with the training data. Challenge: loss functions (e.g. the Hamming loss) are non-continuous and non-convex. Desiderata for the predictor: (1) Efficiency: (low-degree) polynomial runtime. (2) Consistency: it must also minimize the Hamming loss under ideal conditions (given the true distribution and fully expressive model parameters).
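A minimal sketch of the Hamming loss referred to above: it counts the positions where a predicted matching disagrees with the ground truth, which makes it piecewise constant (hence non-continuous and non-convex) in any underlying potentials.

```python
def hamming_loss(pred, truth):
    """Number of positions where the predicted matching
    disagrees with the ground-truth matching."""
    return sum(p != t for p, t in zip(pred, truth))

print(hamming_loss([4, 3, 1, 2], [4, 3, 1, 2]))  # -> 0 (perfect matching)
print(hamming_loss([3, 4, 1, 2], [4, 3, 1, 2]))  # -> 2 (two nodes swapped)
```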

  5. Exponential Family Random Field Approach (Petterson et al., 2009; Volkovs & Zemel, 2012). Probabilistic model over matchings. Consistent? Yes: it produces the Bayes optimal prediction under ideal conditions. Efficient? No: the normalization term Z(ω) involves a matrix permanent computation, which is #P-hard (Valiant, 1979) and impractical even for modestly sized n = 20.

  6. Maximum Margin Approach (Tsochantaridis et al., 2005). Max-margin (structured SVM) model. Efficient? Yes: the maximally violated constraint can be computed in polynomial time by the Hungarian algorithm. Consistent? No: it is based on the Crammer & Singer multiclass SVM formulation, which is not consistent for distributions with no majority label (Liu, 2007).

  7. Adversarial Bipartite Matching (our approach). Seek a predictor that robustly minimizes the Hamming loss against the worst-case mixture of permutations. Predictor: makes a probabilistic prediction Q̂(ρ̂|x); aims to minimize the loss; is pitted against an adversary instead of the empirical distribution. Adversary: makes a probabilistic prediction Q̌(ρ̌|x); aims to maximize the loss; is constrained to select probabilities that match the statistics of the empirical distribution (Q̃) via moment matching on the features φ(x, ρ) = Σ_{j=1}^{n} φ_j(x, ρ_j).

  8. Adversarial Bipartite Matching | Dual. Lagrangian dual formulation of the adversarial bipartite matching objective (method of Lagrange multipliers; von Neumann and Sion minimax duality), where θ is the dual variable for the moment-matching constraints. The augmented Hamming loss matrix over all permutation pairs (shown on the slide for n = 3) has size n! × n!, intractable for modestly sized n.
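To make the blow-up concrete, a small sketch that enumerates the plain Hamming loss matrix over all permutation pairs; the ε potential terms of the augmented matrix are omitted here for simplicity.

```python
from itertools import permutations
import math

def hamming_game_matrix(n):
    """Hamming loss between every pair of permutations of {0..n-1}.
    The slide's augmented matrix additionally adds a potential term
    (epsilon) to each column; plain Hamming loss is shown here."""
    perms = list(permutations(range(n)))
    return [[sum(a != b for a, b in zip(p, q)) for q in perms]
            for p in perms]

G = hamming_game_matrix(3)
print(len(G))              # 3! = 6 rows for n = 3
print(math.factorial(20))  # number of rows for n = 20: ~2.4e18, intractable
```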

  9. Efficient Algorithms. (1) Double oracle method (constraint generation). (2) Marginal distribution formulation.

  10. Double Oracle Method. Based on the observation that the equilibrium is usually supported by a small number of permutations. Iterative procedure: starting from a small set of permutations, alternately add the adversary's best response ρ̌ and the predictor's best response ρ̂ to the game and re-solve, until neither player has an improving response. [table: augmented game matrix over ρ̂, ρ̌ ∈ {123, 213, 312}, with entries such as 0 + ε_123, 2 + ε_213, 3 + ε_312] Caveat: no formal polynomial bound on the number of iterations is known, so the runtime cannot be characterized as polynomial.
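One ingredient of the double oracle loop can be sketched as follows: given the predictor's current mixed strategy, the adversary's best response maximizes the expected Hamming loss plus its potential term, and because the expectation decomposes over positions, a single Hungarian call finds it. The zero potential matrix and the simplified objective here are illustrative stand-ins for the paper's exact oracle.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def adversary_best_response(support, weights, potentials):
    """Adversary's best response to the predictor's mixed strategy
    (support, weights): the permutation maximizing expected Hamming
    loss plus its potential (simplified illustrative oracle)."""
    n = len(support[0])
    # marginal[j, k] = predictor probability that position j maps to k
    marginal = np.zeros((n, n))
    for w, perm in zip(weights, support):
        for j, k in enumerate(perm):
            marginal[j, k] += w
    # Choosing k at position j contributes 1 - marginal[j, k] expected
    # Hamming loss, plus the adversary's potential for that pairing.
    gain = (1.0 - marginal) + potentials
    rows, cols = linear_sum_assignment(gain, maximize=True)
    return tuple(cols), float(gain[rows, cols].sum())

# Predictor mixes uniformly over two permutations; zero potentials.
support = [(0, 1, 2), (1, 0, 2)]
br, value = adversary_best_response(support, [0.5, 0.5], np.zeros((3, 3)))
print(value)  # -> 2.5; the response avoids position 2,
              #    which the predictor always gets right
```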

  11. Marginal Distribution Formulation. Marginal distribution matrices: predictor Q with q_{j,k} = Q̂(ρ̂_j = k); adversary R with r_{j,k} = Q̌(ρ̌_j = k). Birkhoff–von Neumann theorem: the doubly stochastic matrices form a convex polytope whose vertices are the permutation matrices (for n = 3: 123, 132, 213, 231, 312, 321). This reduces the space of optimization from O(n!) to O(n²).
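The reduction can be checked numerically: build the marginal matrix of any distribution over permutations and verify that it is doubly stochastic, as the Birkhoff–von Neumann theorem guarantees. The Dirichlet draw below is just a convenient way to get an arbitrary distribution for the check.

```python
import numpy as np
from itertools import permutations

n = 3
perms = list(permutations(range(n)))
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(len(perms)))  # random distribution over all 3! perms

# Marginal matrix Q: Q[j, k] = P(rho_j = k). Any such matrix is doubly
# stochastic, and conversely every doubly stochastic matrix is a convex
# combination of permutation matrices, so optimizing over n x n marginals
# replaces optimizing over n! permutation probabilities.
Q = np.zeros((n, n))
for w, perm in zip(p, perms):
    for j, k in enumerate(perm):
        Q[j, k] += w

print(Q.sum(axis=0))  # each column sums to 1
print(Q.sum(axis=1))  # each row sums to 1
```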

  12. Marginal Formulation | Optimization. Optimization: add regularization and a smoothing penalty. Techniques: Outer (Q): projected quasi-Newton (Schmidt et al., 2009) with projection to the set of doubly stochastic matrices. Inner (θ): closed-form solution. Inner (P): projection to the set of doubly stochastic matrices. The doubly stochastic projection itself is computed with ADMM.
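The slides compute the doubly stochastic projection with ADMM; as a rough stand-in, the sketch below cycles through projections onto the row-sum, column-sum, and nonnegativity constraints (plain alternating projections, so it finds a feasible point in the Birkhoff polytope rather than the exact Euclidean projection).

```python
import numpy as np

def project_doubly_stochastic(M, iters=200):
    """Find a doubly stochastic matrix near M by alternating
    projections onto the row-sum, column-sum, and nonnegativity
    constraints (a simple stand-in for the ADMM projection used
    in the paper's optimization)."""
    X = M.astype(float).copy()
    n = X.shape[0]
    for _ in range(iters):
        X -= (X.sum(axis=1, keepdims=True) - 1.0) / n  # rows sum to 1
        X -= (X.sum(axis=0, keepdims=True) - 1.0) / n  # columns sum to 1
        X = np.clip(X, 0.0, None)                      # nonnegativity
    return X

X = project_doubly_stochastic(np.array([[2.0, 0.0], [0.3, 0.9]]))
print(np.round(X, 3))  # converges to the identity for this input
```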

  13. Consistency. Empirical risk perspective of adversarial bipartite matching. Consistency: our method also minimizes the Hamming loss in the ideal case, since the argmax of the learned potential f lies among the Bayes optimal responses.

  14. Experiment Setup. Application: video tracking. Empirical runtime (until convergence): the adversarial marginal formulation grows (roughly) quadratically in n, while the CRF approach (Petterson et al., 2009) is impractical even for n = 20. [table of relative runtimes omitted]

  15. Experiment Results. On 6 dataset pairs our method significantly outperforms SSVM; on 2 pairs it is competitive with SSVM. Adversarial double oracle: the equilibrium is indeed supported by a small number of permutations.

  16. Conclusions. Exponential Family Random Field (Petterson et al., 2009; Volkovs & Zemel, 2012): efficient? no; consistent? yes. Maximum Margin (Tsochantaridis et al., 2005): efficient? yes; consistent? no. Adversarial Bipartite Matching (our approach): performs well, efficient, and consistent.

  17. THANK YOU
