
Robust hypothesis test using Wasserstein uncertainty sets (Yao Xie)



  1. Robust hypothesis test using Wasserstein uncertainty sets. Yao Xie, Georgia Institute of Technology. Joint work with Rui Gao, Liyan Xie, and Huan Xu.

  2. Classification with unbalanced data • Anomaly detection: self-driving car, network intrusion detection, credit fraud detection, online detection with fewer samples • Health care: many negative samples, not many positive samples; fewer data for several classes [Figures: Self-driving car; Imbalanced classification]

  3. Non-parametric hypothesis test with unbalanced and limited data • The empirical distributions may not have common support • It is therefore not possible to use the likelihood ratio test, which is optimal by the well-known Neyman-Pearson lemma [Figure labels: normal, abnormal]

  4. Hypothesis test using Wasserstein uncertainty sets • Test two hypotheses: given $\omega$, we would like to decide between $H_1: \omega \sim P_1,\ P_1 \in \mathcal{P}_1$ and $H_2: \omega \sim P_2,\ P_2 \in \mathcal{P}_2$ • Wasserstein uncertainty sets for distributional robustness: Wasserstein metrics can deal with distributions with different support, better than the K-L divergence • Goal: find the optimal detector, which minimizes the worst-case sum of type-I and type-II errors [Figure labels: $Q_1^{n_1}$, $Q_2^{n_2}$, $\mathcal{P}_1$, $\mathcal{P}_2$, normal, abnormal]
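The claim on slide 4 that Wasserstein metrics cope with distributions on different supports, while the K-L divergence does not, can be checked on a toy example. The following is a minimal sketch, assuming 1-D samples of equal size (in which case the order-1 Wasserstein distance reduces to matching sorted samples); the helper names and the data are hypothetical and not from the slides.

```python
import math
from collections import Counter


def empirical_pmf(samples):
    """Empirical distribution: each distinct value gets mass count / n."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}


def kl_divergence(p, q):
    """KL(p || q) for discrete pmfs given as dicts; infinite if q misses a point of p."""
    total = 0.0
    for x, px in p.items():
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def wasserstein1_equal_size(xs, ys):
    """Order-1 Wasserstein distance between two 1-D empirical distributions
    with the same number of equally weighted samples (sorted matching)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical toy data: the two samples share no support point.
normal = [0.0, 0.1, 0.2, 0.3]
abnormal = [1.0, 1.1, 1.2, 1.3]

kl = kl_divergence(empirical_pmf(normal), empirical_pmf(abnormal))
w1 = wasserstein1_equal_size(normal, abnormal)
print("KL divergence:", kl)         # infinite: the supports are disjoint
print("Wasserstein-1 distance:", w1)  # finite: every point moves by about 1
```

With disjoint supports the K-L divergence carries no information about how far apart the samples are, whereas the Wasserstein distance still reflects the shift between them.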

  5. Main results
     Distributionally robust nearly-optimal detector • Theorem: the nearly-optimal distributionally robust detector has risk bounded by a small constant
     Computationally efficient • Tractable convex reformulation • Complexity independent of dimensionality, scalable to large datasets
     $$
     \begin{aligned}
     \max_{\substack{p_1, p_2 \in \mathbb{R}_+^{n_1+n_2} \\ \gamma_1, \gamma_2 \in \mathbb{R}_+^{n_1+n_2} \times \mathbb{R}_+^{n_1+n_2}}} \quad & \sum_{l=1}^{n_1+n_2} (p_1^l + p_2^l)\, \psi\!\left( \frac{p_1^l}{p_1^l + p_2^l} \right) \\
     \text{subject to} \quad & \sum_{l=1}^{n_1+n_2} \sum_{m=1}^{n_1+n_2} \gamma_k^{lm} \, \|\omega^l - \omega^m\| \le \theta_k, \quad k = 1, 2, \\
     & \sum_{m=1}^{n_1+n_2} \gamma_k^{lm} = Q_k^{n_k}(\omega^l), \quad 1 \le l \le n_1+n_2,\ k = 1, 2, \\
     & \sum_{l=1}^{n_1+n_2} \gamma_k^{lm} = p_k^m, \quad 1 \le m \le n_1+n_2,\ k = 1, 2
     \end{aligned}
     $$
     [Figure: plot with axis label "objective val"; the accompanying bound in terms of $\psi(\epsilon)$ is not legible]
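The convex program on slide 5 can be sketched numerically for one special case. The code below assumes $\psi(p) = 2\min(p, 1-p)$, so the objective becomes $\sum_l 2\min(p_1^l, p_2^l)$ and the problem is a linear program. This choice of $\psi$, the restriction to 1-D samples, the function name `least_favorable_hinge`, and the use of `scipy.optimize.linprog` are assumptions for illustration, not taken from the slides. The returned sign vector follows the detector form sgn(p1* − p2*) from the figure legend, assumed here to pair with this $\psi$.

```python
import numpy as np
from scipy.optimize import linprog


def least_favorable_hinge(x1, x2, theta1, theta2):
    """Solve the slide-5 program with the assumed psi(p) = 2*min(p, 1-p).

    Variables: gamma_1, gamma_2 (each N x N, flattened row-major) and t (N),
    where t_m <= p_k^m linearizes min(p_1^m, p_2^m).
    """
    pts = np.concatenate([x1, x2]).astype(float)
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    # Empirical distributions Q_k^{n_k} on the pooled n1 + n2 points.
    q1 = np.concatenate([np.full(n1, 1.0 / n1), np.zeros(n2)])
    q2 = np.concatenate([np.zeros(n1), np.full(n2, 1.0 / n2)])
    dist = np.abs(pts[:, None] - pts[None, :])  # |omega^l - omega^m| in 1-D

    nvar = 2 * N * N + N
    c = np.zeros(nvar)
    c[2 * N * N:] = -2.0  # linprog minimizes, so negate sum_m 2 * t_m

    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for k, (q, theta) in enumerate([(q1, theta1), (q2, theta2)]):
        off = k * N * N
        # Transport budget: sum_{l,m} gamma_k^{lm} * dist[l, m] <= theta_k
        row = np.zeros(nvar)
        row[off:off + N * N] = dist.ravel()
        A_ub.append(row)
        b_ub.append(theta)
        for l in range(N):
            # Row marginal: sum_m gamma_k^{lm} = Q_k(omega^l)
            row = np.zeros(nvar)
            row[off + l * N: off + (l + 1) * N] = 1.0
            A_eq.append(row)
            b_eq.append(q[l])
        for m in range(N):
            # t_m <= p_k^m, where p_k^m = sum_l gamma_k^{lm}
            row = np.zeros(nvar)
            row[off + m: off + N * N: N] = -1.0
            row[2 * N * N + m] = 1.0
            A_ub.append(row)
            b_ub.append(0.0)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    gamma = res.x[:2 * N * N].reshape(2, N, N)
    p1, p2 = gamma[0].sum(axis=0), gamma[1].sum(axis=0)
    return -res.fun, p1, p2, np.sign(p1 - p2)


# Hypothetical toy data with disjoint supports.
x1, x2 = [0.0, 1.0], [4.0, 5.0]
for theta in (0.0, 1.0, 10.0):
    value, p1, p2, detector = least_favorable_hinge(x1, x2, theta, theta)
    print(f"theta={theta}: objective value = {value:.4f}")
```

In this sketch the objective is 0 when both radii are 0 (the distributions stay at the disjoint empirical ones) and grows toward its maximum of 2 as the radii become large enough for the two balls to share a distribution. The number of variables depends only on $n_1 + n_2$, not on the dimension of the data, since the data enter only through pairwise distances.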

  6. Statistical interpretations • Minimizes the divergence between two distributions that lie within two Wasserstein balls centered around the empirical distributions and have common support on the $n_1 + n_2$ data points [Figure labels: $Q_1^{n_1}$, $Q_2^{n_2}$, $\mathcal{P}_1$, $\mathcal{P}_2$]

  7. Human activity detection. Credit: CSIRO Research.
     [Figure (a): average detection delay vs. type-I error for the Hotelling control chart, the detector $\phi^* = \operatorname{sgn}(p_1^* - p_2^*)$, and the detector $\phi^* = \frac{1}{2}\ln(p_1^*/p_2^*)$]
     [Figure (b): raw data, Hotelling control chart, and optimal detector vs. sample index, each with pre-change and post-change segments]
     Figure: Jogging vs. Walking; the average is taken over 100 sequences of data.
