On Minimax Optimality of GANs for Robust Mean Estimation
Kaiwen Wu (1,2)
With Gavin Weiguang Ding (3), Ruitong Huang (3) and Yaoliang Yu (1,2)
University of Waterloo (1), Vector Institute (2), Borealis AI (3)
Wu, Ding, Huang & Yu GANs for Robust Mean Estimation 1 / 13
The success of generative adversarial networks (GANs) (Arjovsky et al. 2017; Goodfellow et al. 2014; Li et al. 2017; Miyato et al. 2018)
But... what if the training data is noisy?
[Figure: the contamination distribution $H$ and the Gaussian $N(\theta, I_p)$]
$X_1, X_2, \dots, X_n \sim (1 - \epsilon) N(\theta, I_p) + \epsilon H$ (Huber 1964)
Compute an estimator $\hat\theta \approx \theta$
Goal: small RMSE $\sup_H E\|\hat\theta - \theta\|$ in the worst case
Sample average: infinite error in the worst case
Coordinate-wise median: $\sqrt{p}\,\epsilon$ error
Tukey's median (Tukey 1975): optimal error $\epsilon$, but NP-hard
Statistically optimal & computationally feasible estimators (Diakonikolas et al. 2016; Lai et al. 2016)
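The contrast between the sample average and the coordinate-wise median can be checked numerically. The sketch below is illustrative only: the contamination $H = N(\theta + 50 \cdot \mathbf{1}, I_p)$, the sample size, and the helper name `sample_huber` are assumptions made for this example, not choices taken from the talk.

```python
import numpy as np

def sample_huber(n, p, eps, theta, outlier_shift, rng):
    """Draw n points from (1 - eps) N(theta, I_p) + eps H.

    H is assumed to be N(theta + outlier_shift, I_p), a choice made
    only for this illustration.
    """
    x = rng.standard_normal((n, p)) + theta
    is_outlier = rng.random(n) < eps      # each point is contaminated w.p. eps
    x[is_outlier] += outlier_shift
    return x

rng = np.random.default_rng(0)
n, p, eps = 5000, 100, 0.1
theta = np.zeros(p)
x = sample_huber(n, p, eps, theta, outlier_shift=50.0, rng=rng)

# The sample average is dragged toward the outliers by about eps * shift per coordinate.
err_mean = np.linalg.norm(x.mean(axis=0) - theta)
# The coordinate-wise median moves by an amount of order eps per coordinate.
err_median = np.linalg.norm(np.median(x, axis=0) - theta)

print(f"sample average error:         {err_mean:.2f}")
print(f"coordinate-wise median error: {err_median:.2f}")
```

Moving the outliers further away makes the sample-average error grow without bound, while the median error stays on the order of $\sqrt{p}\,\epsilon$.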
Robust Mean Estimation via GANs
$\hat\theta := \operatorname{argmin}_{\eta} \sup_{T \in \mathcal{T}} \left\{ E_{\text{data}}[T(X)] - E_{N(\eta, I_p)}[s(T(Y))] \right\}$
$N(\eta, I_p)$ is the generator
$\mathcal{T}$ is the discriminator function class
Which discriminator class $\mathcal{T}$ guarantees small estimation error?
f-GAN (Nowozin et al. 2016)
Discriminator is a one-hidden-layer network:
$\mathcal{T} = \left\{ g\left( \sum_{i=1}^{l} w_i \sigma(u_i^\top x + b_i) \right) : \|w\|_1 \le \kappa \right\}$
Theorem (f-GAN): Under mild assumptions on the activations, we have $\|\hat\theta_n - \theta\| \lesssim \sqrt{\frac{p}{n}} \vee \epsilon$ with high probability.
Generalizing results of Gao et al. (2019) on TV-GAN and JS-GAN
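To make the discriminator class concrete, here is a minimal sketch of evaluating one member of $\mathcal{T}$. It assumes that both $g$ and $\sigma$ are the logistic sigmoid and enforces $\|w\|_1 \le \kappa$ by simple rescaling; the slides do not fix these choices, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def rescale_l1(w, kappa):
    """Shrink w so that ||w||_1 <= kappa (plain rescaling, not a Euclidean projection)."""
    norm = np.abs(w).sum()
    return w if norm <= kappa else w * (kappa / norm)

def discriminator(x, U, b, w, kappa):
    """Evaluate g(sum_i w_i * sigma(u_i^T x + b_i)) for each row of x.

    x: (n, p) data, U: (l, p) hidden weights, b: (l,) biases, w: (l,) output weights.
    Both g and sigma are taken to be the logistic sigmoid (an assumption).
    """
    w = rescale_l1(w, kappa)
    hidden = sigmoid(x @ U.T + b)   # shape (n, l)
    return sigmoid(hidden @ w)      # shape (n,)

rng = np.random.default_rng(0)
n, p, l, kappa = 8, 5, 3, 1.0
x = rng.standard_normal((n, p))
U = rng.standard_normal((l, p))
b = rng.standard_normal(l)
w = 10.0 * rng.standard_normal(l)   # deliberately violates the constraint before rescaling
scores = discriminator(x, U, b, w, kappa)
```

Because the hidden activations lie in $(0, 1)$ and $\|w\|_1 \le \kappa$, the pre-activation of $g$ is bounded by $\kappa$ in absolute value, so the constraint directly limits how sharply the discriminator can separate two samples.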
MMD-GAN (Dziugaite et al. 2015; Li et al. 2017)
Discriminator is a unit ball in an RKHS: $\mathcal{T} = \{ f \in \mathcal{H}_k : \|f\|_{\mathcal{H}_k} \le 1 \}$
We focus on the Gaussian kernel: $k(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right)$
Theorem: With appropriate tuning of the bandwidth ($\sigma = \sqrt{p}$), $\|\hat\theta_n - \theta\| \lesssim \sqrt{\frac{p}{n}} \vee \sqrt{p}\,\epsilon$
Theorem: For any bandwidth $\sigma$, there exists a contamination $H$ such that $\|\hat\theta - \theta\| \gtrsim \sqrt{p}\,\epsilon$
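For the Gaussian kernel and the generator $N(\eta, I_p)$, the MMD objective has a closed form: $E_{Y \sim N(\eta, I_p)}\, k(x, Y) = \left( \frac{\sigma^2}{\sigma^2 + 1} \right)^{p/2} \exp\left( -\frac{\|x - \eta\|^2}{2(\sigma^2 + 1)} \right)$, and the remaining terms of the squared MMD do not depend on $\eta$. Minimizing the MMD is therefore equivalent to maximizing $\sum_i \exp\left( -\frac{\|x_i - \eta\|^2}{2(\sigma^2 + 1)} \right)$. The sketch below solves this with mean-shift fixed-point updates. That solver, the median initialization, and the contamination used in the demo are assumptions for illustration, not necessarily what was used in the talk's simulations.

```python
import numpy as np

def mmd_location_estimate(x, sigma, n_iter=200, tol=1e-10):
    """Minimize MMD(empirical, N(eta, I_p)) over eta for the Gaussian kernel.

    Equivalent to maximizing sum_i exp(-||x_i - eta||^2 / (2 * (sigma^2 + 1))),
    done here with mean-shift fixed-point updates.
    """
    h2 = sigma ** 2 + 1.0
    eta = np.median(x, axis=0)            # starting point (a heuristic choice)
    for _ in range(n_iter):
        sq_dist = ((x - eta) ** 2).sum(axis=1)
        # Subtracting the minimum avoids underflow; the update is scale-invariant.
        weights = np.exp(-(sq_dist - sq_dist.min()) / (2.0 * h2))
        new_eta = weights @ x / weights.sum()
        if np.linalg.norm(new_eta - eta) < tol:
            return new_eta
        eta = new_eta
    return eta

rng = np.random.default_rng(0)
n, p, eps = 2000, 100, 0.1
theta = np.zeros(p)
x = rng.standard_normal((n, p)) + theta
x[rng.random(n) < eps] += 5.0             # assumed contamination: shift by 5 per coordinate

theta_hat = mmd_location_estimate(x, sigma=np.sqrt(p))
err_mmd = np.linalg.norm(theta_hat - theta)
err_mean = np.linalg.norm(x.mean(axis=0) - theta)
```

This contamination is far from the inliers and easy to down-weight; the lower bound on the slide concerns a worst-case $H$, which this demo does not attempt to construct.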
Simulation
[Figure: estimation error $\|\hat\theta - \theta\|$. (a) different $\sigma$ and $\delta_{\tilde\theta}$ in 100 dimensions, for $\sigma \in \{5, 7.5, 10, 15, 20\}$, plotted against $\|\tilde\theta - \theta\| / \sqrt{p}$; (b) different dimension $p$ with $\sigma = \sqrt{p}$, plotted against $\sqrt{p}$]
Wasserstein GAN (Arjovsky et al. 2017)
Discriminator is the class of 1-Lipschitz functions: $\mathcal{T} = \{ f : |f(x) - f(y)| \le \|x - y\|, \ \forall x, y \in \mathcal{X} \}$.
Theorem: In one dimension, the estimation error is bounded: $|\hat\theta - \theta| \asymp \epsilon$
In high dimensions, $\|\hat\theta - \theta\| \asymp \sqrt{p}\,\epsilon$ empirically...
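In one dimension the Wasserstein-1 distance equals the $L_1$ distance between quantile functions, so the estimator can be approximated without training a discriminator. The sketch below discretizes the quantile integral at the midpoints $u_i = (i - 1/2)/n$; the minimizer over $\eta$ of $\sum_i |x_{(i)} - \Phi^{-1}(u_i) - \eta|$ is then a median. The discretization, the function name, and the contamination in the demo are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist

def w1_location_estimate(x):
    """Approximate argmin over eta of W_1(empirical(x), N(eta, 1)) in one dimension.

    Uses W_1 = integral over u of |F_n^{-1}(u) - (eta + Phi^{-1}(u))|,
    evaluated on a midpoint grid of n quantile levels.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    u = (np.arange(n) + 0.5) / n                        # quantile levels in (0, 1)
    std_normal = NormalDist()
    z = np.array([std_normal.inv_cdf(ui) for ui in u])  # standard normal quantiles
    # A sum of absolute deviations is minimized by the median.
    return float(np.median(x - z))

rng = np.random.default_rng(0)
n, eps, theta = 2000, 0.1, 1.0
x = rng.standard_normal(n) + theta
x[rng.random(n) < eps] = theta + 20.0                   # assumed contamination: point mass

theta_hat = w1_location_estimate(x)
err_w1 = abs(theta_hat - theta)
err_mean = abs(x.mean() - theta)
```

In this demo the error of the Wasserstein estimate stays on the order of $\epsilon$, in line with the one-dimensional theorem, while the sample average is shifted by roughly $20\epsilon$.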
Minimizing Wasserstein distance directly by Sinkhorn divergence
[Figure: estimation error $\|\hat\theta - \theta\|$. (a) WGAN in 1 dimension; (b) WGAN in $p$ dimensions. Legend: $\lambda = 0.1$, $\lambda = 0.05$, $\lambda = 0.01$; horizontal axis: $\sqrt{p}$]
Extension of f-GAN
Unknown covariance
Sparse mean estimation
◮ $\theta$ has at most $s$ nonzero entries
◮ Sparse constraints on both discriminator and generator
Discriminator: $\mathcal{T} = \left\{ g\left( \sum_{i=1}^{l} w_i \sigma(u_i^\top x + b_i) \right) : \|w\|_1 \le \kappa, \ \|u\|_0 \le 2s \right\}$
Generator: $\{ N(\eta, I_p) : \|\eta\|_0 \le s \}$
Theorem: $\|\hat\theta_n - \theta\| \asymp \sqrt{\frac{s \log \frac{ep}{s}}{n}} \vee \epsilon$
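One common way to enforce a constraint such as $\|\eta\|_0 \le s$ is hard thresholding, which keeps the $s$ largest-magnitude entries and zeroes the rest. The sketch below shows only that projection step; the slides do not say how the sparsity constraints were imposed in practice, so this is an assumption, and `keep_top_s` is a hypothetical helper name.

```python
import numpy as np

def keep_top_s(eta, s):
    """Zero out all but the s largest-magnitude entries of eta (requires 0 <= s <= len(eta))."""
    eta = np.asarray(eta, dtype=float)
    out = np.zeros_like(eta)
    if s > 0:
        # argpartition places the indices of the s largest |eta_j| in the last s slots.
        idx = np.argpartition(np.abs(eta), -s)[-s:]
        out[idx] = eta[idx]
    return out

eta = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
eta_sparse = keep_top_s(eta, s=2)
```

The same operation could be applied to each hidden weight vector with budget $2s$ to respect the discriminator constraint.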
Simulation
[Figure: (a) varying sparsity $s$; (b) sparse vs. nonsparse]
Summary
Characterize minimax optimality of several GAN formulations
– Complete characterization of the discriminator function class
– Computational complexity of GANs