Towards Principled Methodologies and Efficient Algorithms for Minimax Machine Learning
Tuo Zhao, Georgia Tech, Jun. 26, 2019
Joint work with Haoming Jiang, Minshuo Chen (Georgia Tech), Bo Dai (Google Brain), Zhaoran Wang (Northwestern U), and others.
Background
Minimax Machine Learning

Conventional Empirical Risk Minimization: Given training data $z_1, \ldots, z_n$, we minimize an empirical risk function,
$$\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} f(z_i; \theta).$$

Minimax Formulation: We solve a minimax problem,
$$\min_{\theta} \max_{\phi} \frac{1}{n} \sum_{i=1}^{n} f(z_i; \theta, \phi).$$

More flexible.
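To make the contrast concrete, here is a minimal numpy sketch of simultaneous gradient descent-ascent on a toy empirical minimax risk. The quadratic form of $f$, the data, and the step size are illustrative assumptions, not from the talk.

```python
import numpy as np

# Toy minimax risk: f(z; theta, phi) = (theta - z)^2 + 2*phi*(theta - z) - phi^2,
# which is convex in theta and concave in phi, so a saddle point exists.
rng = np.random.default_rng(0)
z = rng.normal(size=100)                    # training data z_1, ..., z_n

def grads(theta, phi):
    r = theta - z
    g_theta = np.mean(2 * r + 2 * phi)      # gradient of the empirical risk w.r.t. theta
    g_phi = np.mean(2 * r) - 2 * phi        # gradient w.r.t. phi
    return g_theta, g_phi

theta, phi, eta = 0.0, 0.0, 0.05
for _ in range(500):                        # simultaneous descent in theta, ascent in phi
    g_theta, g_phi = grads(theta, phi)
    theta, phi = theta - eta * g_theta, phi + eta * g_phi

print(f"theta ~ {theta:.3f}, phi ~ {phi:.3f}")  # saddle point: theta = mean(z), phi = 0
```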
Motivating Application: Robust Deep Learning

Neural networks are vulnerable to adversarial examples (Goodfellow et al. 2014, Madry et al. 2017).

[Figure: clean sample + imperceptible perturbation = adversarial example]

Adversarial Perturbation: $\max_{\delta_i \in \mathcal{B}} \ell(f(x_i + \delta_i; \theta), y_i)$,
Adversarial Training: $\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \max_{\delta_i \in \mathcal{B}} \ell(f(x_i + \delta_i; \theta), y_i)$,
where $\delta_i \in \mathcal{B}$ denotes the imperceptible perturbation.
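In practice the inner maximization is approximated with a few steps of projected gradient ascent (PGD-style, in the spirit of Madry et al. 2017). Below is a minimal numpy sketch for a logistic-regression classifier with an $\ell_\infty$ ball $\mathcal{B}$; the synthetic data, the linear model, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                      # clean inputs x_i
y = (X @ rng.normal(size=10) > 0).astype(float)     # binary labels y_i
theta = np.zeros(10)                                # linear classifier weights
eps, alpha, eta = 0.1, 0.02, 0.5                    # B: l_inf ball of radius eps

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

for _ in range(100):                                # adversarial training loop
    # Inner stage: approximate max over delta in B via projected gradient ascent.
    delta = np.zeros_like(X)
    for _ in range(5):
        p = sigmoid((X + delta) @ theta)
        grad_x = (p - y)[:, None] * theta[None, :]  # per-sample grad of logistic loss w.r.t. input
        delta = np.clip(delta + alpha * np.sign(grad_x), -eps, eps)
    # Outer stage: gradient descent on theta over the adversarial examples.
    X_adv = X + delta
    p = sigmoid(X_adv @ theta)
    theta -= eta * X_adv.T @ (p - y) / len(X)
```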
Motivating Application: Image Generation

[Figure: generated image samples from Brock et al. (2019). All are fake!]
Motivating Application: Unsupervised Learning

Generative Adversarial Networks: Goodfellow et al. (2014), Arjovsky et al. (2017), Miyato et al. (2018), Brock et al. (2019)
$$\min_{\theta} \max_{W} \frac{1}{n} \sum_{i=1}^{n} \phi(A(D_W(x_i))) + \mathbb{E}_{x \sim \mathcal{D}_{G_\theta}}[\phi(1 - A(D_W(x)))].$$
$D_W$: discriminator; $G_\theta$: generator; $\phi$: $\log(\cdot)$; $A$: softmax.
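As a fully worked toy instance of this objective (a sketch under simplifying assumptions that are not from the slides: 1-D data, a location-shift generator $G_\theta(z) = z + \theta$, a logistic discriminator, and a sigmoid playing the role of $A$ in the binary case):

```python
import numpy as np

# Minimal 1-D GAN: data x ~ N(2, 1); generator G_theta(z) = z + theta with z ~ N(0, 1);
# discriminator D_w(x) = sigmoid(a*x + b); phi = log, as in the slide's objective.
rng = np.random.default_rng(2)
theta, a, b, eta, n = 0.0, 0.0, 0.0, 0.05, 256

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

for _ in range(2000):
    x_real = rng.normal(2.0, 1.0, n)              # samples from the data distribution
    z = rng.normal(0.0, 1.0, n)
    x_fake = z + theta                            # samples from D_{G_theta}
    u_real, u_fake = a * x_real + b, a * x_fake + b
    # Ascent on the discriminator: E[log D(x_real)] + E[log(1 - D(x_fake))].
    a += eta * np.mean((1 - sigmoid(u_real)) * x_real - sigmoid(u_fake) * x_fake)
    b += eta * np.mean((1 - sigmoid(u_real)) - sigmoid(u_fake))
    # Descent on the generator: E[log(1 - D(G_theta(z)))].
    theta -= eta * np.mean(-sigmoid(a * (z + theta) + b) * a)

print(f"generator mean shift theta ~ {theta:.2f}")  # ideally drifts toward 2.0
```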
Motivating Application: Reinforcement Learning

Minimax Formulation: Given an MDP $M = (\mathcal{S}, \mathcal{A}, P, R, \gamma)$, we solve
$$\min_{\pi, V} \max_{\nu} \; L(\pi, V; \nu) = 2\,\mathbb{E}_{s,a,s'}\!\left[\nu(s,a)\left(R(s,a) + \gamma V(s') - V(s) - \lambda \log \pi(a|s)\right)\right] - \mathbb{E}_{s,a}\,\nu^2(s,a),$$
where $s$ denotes the state, $a$ denotes the action, and
Policy: $\pi: \mathcal{S} \to \mathcal{P}(\mathcal{A})$,
Value: $V: \mathcal{S} \to \mathbb{R}$,
Reward: $R: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$,
Auxiliary Dual: $\nu: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$.
The policy $\pi$ is parameterized as a neural network, whereas $\nu$ is parameterized as a reproducing kernel function (Dai et al. 2018).
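For intuition, here is a schematic numpy evaluation of the saddle objective on a batch of transitions. The tabular $\pi$, $V$, $\nu$ and the random batch are purely illustrative simplifications (Dai et al. 2018 use a neural network for $\pi$ and a kernel function for $\nu$):

```python
import numpy as np

def sbeed_objective(s, a, s_next, r, pi, V, nu, gamma=0.99, lam=0.01):
    """Monte Carlo estimate of L(pi, V; nu) on a batch of transitions (s, a, s', r)."""
    residual = r + gamma * V[s_next] - V[s] - lam * np.log(pi[s, a])
    return 2.0 * np.mean(nu[s, a] * residual) - np.mean(nu[s, a] ** 2)

# Since nu enters concavely (quadratically), the inner maximum over nu is attained
# pointwise at nu(s, a) = E[residual | s, a], recovering a squared Bellman error.
rng = np.random.default_rng(3)
nS, nA, batch = 5, 3, 64
s, a = rng.integers(nS, size=batch), rng.integers(nA, size=batch)
s_next, r = rng.integers(nS, size=batch), rng.normal(size=batch)
pi = np.full((nS, nA), 1.0 / nA)          # uniform tabular policy
V, nu = np.zeros(nS), rng.normal(size=(nS, nA))
print(sbeed_objective(s, a, s_next, r, pi, V, nu))
```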
Successes of Minimax Machine Learning

Adversarial robust learning
Unsupervised learning
Learning with constraints
Reinforcement learning
Domain adaptation
Generative adversarial imitation learning
...

⇒ Identify the fundamental hardness of minimax machine learning and make optimization easier.
Challenges
Minimax Optimization

General Formula: $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$, where $\mathcal{X} \subset \mathbb{R}^d$, $\mathcal{Y} \subset \mathbb{R}^p$, and $f$ is some continuous function.

Two-Stage Optimization:
Stage 1: $g(x) = \max_{y \in \mathcal{Y}} f(x, y)$;
Stage 2: $\min_{x \in \mathcal{X}} g(x)$, solved using gradient descent.

Limitation: A global maximum of $\max_{y \in \mathcal{Y}} f(x, y)$ needs to be obtained to evaluate $\nabla g(x)$ (Envelope Theorem, Afriat et al. (1971)).
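The two-stage scheme in code, on a toy $f(x, y) = x^2 + 2xy - y^2$ chosen to be concave in $y$ so the inner problem is genuinely solvable by gradient ascent; the function and step sizes are illustrative assumptions:

```python
import numpy as np

# f(x, y) = x^2 + 2xy - y^2: the inner maximizer is y* = x, so g(x) = 2x^2, and the
# envelope theorem gives grad g(x) = grad_x f(x, y*) = 4x, minimized at x = 0.
fx = lambda x, y: 2 * x + 2 * y        # partial derivative of f w.r.t. x
fy = lambda x, y: 2 * x - 2 * y        # partial derivative of f w.r.t. y

x, eta_in, eta_out = 1.0, 0.1, 0.1
for _ in range(200):
    y = 0.0
    for _ in range(50):                # Stage 1: inner ascent toward arg max_y f(x, y)
        y += eta_in * fy(x, y)
    x -= eta_out * fx(x, y)            # Stage 2: descent step on g(x) = max_y f(x, y)

print(f"x ~ {x:.4f}")                  # converges to 0, the minimizer of g
```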
Existing Literature

Bilinear Saddle Point Problem:
$$\min_{x \in \mathcal{X}} \left\{ p(x) + \max_{y \in \mathcal{Y}} \langle Ax, y \rangle - q(y) \right\}.$$
$\mathcal{X} \subset \mathbb{R}^d$ and $\mathcal{Y} \subset \mathbb{R}^p$: closed convex domains; $A \in \mathbb{R}^{p \times d}$; $p(\cdot)$ and $q(\cdot)$: convex functions satisfying certain assumptions.

Nice Structure: convex in $x$ and concave in $y$; bilinear interaction (can be slightly relaxed).

Algorithms with Theoretical Guarantees: Primal-Dual Algorithm, Mirror-Prox Algorithm, etc. (Nemirovski 2005, Chen et al. 2014, Dang et al. 2015).
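A minimal sketch of one such method, the extragradient update (Euclidean mirror-prox), on the unconstrained bilinear instance with $p = q = 0$ and $A = I$; the instance and step size are illustrative. Plain simultaneous gradient descent-ascent provably spirals outward on this problem, while the look-ahead step contracts:

```python
import numpy as np

# Extragradient on min_x max_y <x, y> (A = I, p = q = 0). Plain GDA multiplies
# ||(x, y)||^2 by exactly (1 + eta^2) each step and diverges; the look-ahead step
# below contracts the iterates toward the saddle point (0, 0).
rng = np.random.default_rng(4)
x, y, eta = rng.normal(size=5), rng.normal(size=5), 0.1

for _ in range(1000):
    x_half = x - eta * y          # extrapolation (look-ahead) step
    y_half = y + eta * x
    x = x - eta * y_half          # update using the look-ahead gradients
    y = y + eta * x_half

print(np.linalg.norm(x), np.linalg.norm(y))   # both shrink toward 0
```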
Challenges: Nonconcavity of Inner Maximization

Recall Stage 2: $\min_{x \in \mathcal{X}} g(x)$, where $g(x) := \max_{y \in \mathcal{Y}} f(x, y)$.

Why does it fail to converge? An approximate solution $\hat{y} \neq \arg\max_y f(x, y)$ may even lead to
$$\left\langle \frac{\partial g(x)}{\partial x}, \frac{\partial f(x, \hat{y})}{\partial x} \right\rangle \ll 0.$$

[Figure: noisy gradients for plain minimization vs. limit cycles for minimax]
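A concrete toy instance of this failure (illustrative, not from the talk): with $f(x, y) = xy - (y^2 - 1)^2/4$, inner gradient ascent can land on either of two local maximizers in $y$, and the spurious one gives a partial gradient in $x$ with the wrong sign:

```python
import numpy as np

f = lambda x, y: x * y - (y**2 - 1) ** 2 / 4
fy = lambda x, y: x + y - y**3            # partial derivative of f w.r.t. y

def inner_ascent(x, y0, steps=200, lr=0.05):
    """Plain gradient ascent on y -> f(x, y) from the initialization y0."""
    y = y0
    for _ in range(steps):
        y += lr * fy(x, y)
    return y

x = 0.1
y_good = inner_ascent(x, +1.0)            # reaches the global maximizer (~ +1.05)
y_bad = inner_ascent(x, -1.0)             # stuck at a spurious local maximizer (~ -0.95)
print(f"f at good y: {f(x, y_good):+.3f}, f at bad y: {f(x, y_bad):+.3f}")
# partial_x f(x, y) = y, so the spurious maximizer flips the sign of the gradient:
print(f"grad_x at good y: {y_good:+.3f}, grad_x at bad y: {y_bad:+.3f}")
```

Here the true gradient of $g$ is about $+1.05$ while the gradient computed at the spurious maximizer is about $-0.95$, so their inner product is negative: exactly the pathology above.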