Robust hypothesis test using Wasserstein uncertainty sets
Yao Xie, Georgia Institute of Technology
Joint work with Rui Gao, Liyan Xie, Huan Xu
Classification with unbalanced data
• Anomaly detection: self-driving cars, network intrusion detection, credit fraud detection, online detection with fewer samples
• Health care: many negative samples, not many positive samples
• Imbalanced classification: fewer data for several classes
Non-parametric hypothesis test with unbalanced and limited data
- Empirical distributions may not have common support
- Not possible to use the likelihood ratio, which is optimal by the well-known Neyman-Pearson lemma
[Figure: normal vs. abnormal samples]
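The common-support problem can be illustrated with a minimal sketch. The sample values and the helper names `empirical_pmf` and `likelihood_ratio` are hypothetical and chosen only for illustration. With few samples per class, the empirical likelihood ratio is either infinite, zero, or undefined at every point, so it carries no graded evidence.

```python
from collections import Counter


def empirical_pmf(samples):
    """Empirical distribution: each observed value gets mass count / n."""
    counts = Counter(samples)
    n = len(samples)
    return {x: c / n for x, c in counts.items()}


def likelihood_ratio(x, pmf1, pmf2):
    """Return pmf1(x) / pmf2(x).

    Returns inf when only the denominator is zero and None when both
    masses are zero (the ratio 0/0 is undefined).
    """
    a = pmf1.get(x, 0.0)
    b = pmf2.get(x, 0.0)
    if b == 0.0:
        return float("inf") if a > 0.0 else None
    return a / b


# Hypothetical data: many "normal" samples, very few "abnormal" ones.
normal = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5]
abnormal = [2.1, 2.4]

pmf_normal = empirical_pmf(normal)
pmf_abnormal = empirical_pmf(abnormal)

# Evaluate the ratio at every observed point and at one unseen point (1.0).
ratios = {
    x: likelihood_ratio(x, pmf_normal, pmf_abnormal)
    for x in normal + abnormal + [1.0]
}
for x, r in sorted(ratios.items()):
    print(x, r)
```

In this toy setting a new observation that matches no training point gets an undefined ratio, which is the situation the robust formulation is meant to handle.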
Hypothesis test using Wasserstein uncertainty sets
• Given a sample $\omega$, we would like to decide between two hypotheses:
$$H_1: \omega \sim P_1,\ P_1 \in \mathcal{P}_1 \qquad\qquad H_2: \omega \sim P_2,\ P_2 \in \mathcal{P}_2$$
• Wasserstein uncertainty sets for distributional robustness: Wasserstein metrics can deal with distributions with different support, better than K-L divergence
• Goal: find the optimal detector, which minimizes the worst-case sum of type-I and type-II errors
[Figure: Wasserstein uncertainty sets $\mathcal{P}_1$ and $\mathcal{P}_2$ around the normal and abnormal data]
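To illustrate why a Wasserstein metric remains meaningful for distributions with different support, here is a minimal sketch restricted to one-dimensional data. That restriction is an assumption made for brevity and is not stated on the slides. In 1-D the order-1 Wasserstein distance equals the area between the two empirical CDFs; the function name `wasserstein_1d` and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Order-1 Wasserstein distance between two 1-D empirical distributions.

    Computed as the integral of |F_x(t) - F_y(t)| over t, where F_x and F_y
    are the empirical CDFs. Sample sizes may differ.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    i = j = 0  # number of samples <= current left endpoint
    for left, right in zip(points, points[1:]):
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        # Both CDFs are constant on [left, right).
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


# Hypothetical samples with disjoint support: K-L divergence between the
# empirical distributions would be infinite, but the Wasserstein distance
# is finite and reflects how far apart the samples are.
normal = [0.0, 1.0]
abnormal = [2.0, 3.0]
print(wasserstein_1d(normal, abnormal))
print(wasserstein_1d(normal, normal))
```

Shifting the abnormal samples further away increases the distance smoothly, which is what makes a ball of radius $\theta_k$ around an empirical distribution a usable uncertainty set.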
Main results
Distributionally robust nearly-optimal detector
• Theorem: the general distributionally robust detector is nearly optimal, with risk bounded by a small constant
Computationally efficient
• Tractable convex reformulation
• Complexity independent of dimensionality, scalable to large datasets
$$
\begin{aligned}
\max_{\substack{p_1,\, p_2 \in \mathbb{R}_+^{n_1+n_2} \\ \gamma_1,\, \gamma_2 \in \mathbb{R}_+^{(n_1+n_2)\times(n_1+n_2)}}} \quad & \sum_{l=1}^{n_1+n_2} \left(p_1^l + p_2^l\right) \psi\!\left(\frac{p_1^l}{p_1^l + p_2^l}\right) \\
\text{subject to} \quad & \sum_{l=1}^{n_1+n_2} \sum_{m=1}^{n_1+n_2} \gamma_k^{lm} \left\|\omega^l - \omega^m\right\| \le \theta_k, \quad k = 1, 2, \\
& \sum_{m=1}^{n_1+n_2} \gamma_k^{lm} = Q_k^{n_k}(\omega^l), \quad 1 \le l \le n_1+n_2,\ k = 1, 2, \\
& \sum_{l=1}^{n_1+n_2} \gamma_k^{lm} = p_k^m, \quad 1 \le m \le n_1+n_2,\ k = 1, 2
\end{aligned}
$$
[Figure: objective value plot]
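The objective of the convex program can be evaluated for any candidate pair of weight vectors on the common support. The sketch below does only that; it does not solve the program or check the transport constraints. The specific choice $\psi(r) = 2\sqrt{r(1-r)}$ is an assumption: it is the form that pairs with the log-ratio detector $\tfrac{1}{2}\ln(p_1/p_2)$ under an exponential loss, and the slides do not spell it out. All names and numbers are hypothetical.

```python
import math


def psi_exp(r):
    """Assumed psi for an exponential loss: min over t of r*e^(-t) + (1-r)*e^t."""
    return 2.0 * math.sqrt(r * (1.0 - r))


def objective(p1, p2, psi=psi_exp):
    """Sum over support points of (p1 + p2) * psi(p1 / (p1 + p2))."""
    total = 0.0
    for a, b in zip(p1, p2):
        s = a + b
        if s > 0.0:  # points with no mass contribute nothing
            total += s * psi(a / s)
    return total


def detector_log(a, b):
    """Log-ratio detector value at a point with masses a (under P1), b (under P2)."""
    return 0.5 * math.log(a / b)


def detector_sign(a, b):
    """Sign detector value: +1 favors H1, -1 favors H2, 0 is a tie."""
    return (a > b) - (a < b)


# Hypothetical candidate distributions on four common support points.
p1 = [0.4, 0.4, 0.1, 0.1]
p2 = [0.1, 0.1, 0.4, 0.4]

print(objective(p1, p2))            # smaller value: distributions easier to separate
print(objective([0.25] * 4, [0.25] * 4))  # identical distributions give the maximum
print([detector_sign(a, b) for a, b in zip(p1, p2)])
print(detector_log(p1[0], p2[0]))
```

Under this assumed $\psi$, the objective ranges from 0 (disjoint mass) to 2 (identical distributions), so maximizing it over the two Wasserstein balls picks out the pair of distributions that is hardest to tell apart.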
Statistical interpretations
• Minimizes the divergence between two distributions that lie within two Wasserstein balls centered at the empirical distributions and that have common support on the $n_1 + n_2$ data points
[Figure: two Wasserstein balls around the empirical distributions]
Human activity detection (Credit: CSIRO Research)
Figure: Jogging vs. Walking; the average is taken over 100 sequences of data. (a) Average detection delay versus type-I error. (b) Optimal detector, Hotelling control chart, and raw data versus sample index, pre-change and post-change. Legend: Hotelling control chart; detector $\phi^* = \mathrm{sgn}(p_1^* - p_2^*)$; detector $\phi^* = \tfrac{1}{2}\ln(p_1^*/p_2^*)$.