  1. Automatic Outlier Detection: A Bayesian Approach Jo-Anne Ting, University of Southern California Aaron D’Souza, Google, Inc. Stefan Schaal, University of Southern California ICRA 2007 April 12, 2007

  2. Outline
     • Motivation
     • Past & related work
     • Bayesian regression for automatic outlier detection
       – Batch version
       – Incremental version
     • Results
       – Synthetic data
       – Robotic data
     • Conclusions

  3. Motivation
     • Real-world sensor data is susceptible to outliers
       – E.g., motion capture (MOCAP) data of a robotic dog

  4. Outline
     • Motivation
     • Past & related work
     • Bayesian regression for automatic outlier detection
       – Batch version
       – Incremental version
     • Results
       – Synthetic data
       – Robotic data
     • Conclusions

  5. Past & Related Work
     • Current methods for outlier detection may:
       – Require parameter tuning (i.e., choosing an optimal threshold)
       – Require sampling (e.g., active sampling, Abe et al., 2006) or the setting of certain parameters, e.g., k in k-means clustering (MacQueen, 1967)
       – Assume an underlying data structure (e.g., mixture models, Fox et al., 1999)
       – Adopt a weighted linear regression model, but model the weights with some heuristic function (e.g., robust least squares, Hoaglin, 1983)

  6. Outline
     • Motivation
     • Past & related work
     • Bayesian regression for automatic outlier detection
       – Batch version
       – Incremental version
     • Results
       – Synthetic data
       – Robotic data
     • Conclusions

  7. Bayesian Regression for Automatic Outlier Detection
     • Consider linear regression: $y_i = \mathbf{b}^T \mathbf{x}_i + \epsilon_i$
     • We can modify the above to get a weighted linear regression model (Gelman et al., 1995):
       $y_i \sim \mathrm{Normal}\!\left(\mathbf{b}^T \mathbf{x}_i,\; \sigma^2 / w_i\right)$
       $\mathbf{b} \sim \mathrm{Normal}\!\left(\mathbf{b}_0,\; \boldsymbol{\Sigma}_{b_0}\right)$
       except now each sample carries its own precision weight, with a Gamma prior:
       $w_i \sim \mathrm{Gamma}\!\left(a_{w_i},\; b_{w_i}\right)$
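As a concrete illustration, here is a minimal sketch (Python/NumPy, not the authors' code) of sampling from this generative model; the dimensions, noise variance, and the hyperparameters `a_w0`, `b_w0` are illustrative assumptions:

```python
# Minimal sketch of the weighted linear regression generative model.
# Each sample i gets its own precision weight w_i, so an outlier can be
# explained by a small w_i (large noise) rather than by distorting b.
import numpy as np

rng = np.random.default_rng(0)

d, N = 5, 1000                    # input dimension, number of samples (assumed)
b_true = rng.normal(size=d)       # regression vector b
sigma2 = 0.1                      # noise variance sigma^2 (assumed)
a_w0, b_w0 = 1.0, 1.0             # Gamma hyperparameters on w_i (assumed)

X = rng.normal(size=(N, d))
w = rng.gamma(a_w0, 1.0 / b_w0, size=N)   # w_i ~ Gamma(a_w, b_w), rate b_w
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2 / w))  # y_i ~ N(b^T x_i, sigma^2 / w_i)
```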

  8. Bayesian Regression for Automatic Outlier Detection
     • This Bayesian treatment of weighted linear regression:
       – Is suitable for real-time outlier detection
       – Requires no model assumptions
       – Requires no parameter tuning

  9. Bayesian Regression for Automatic Outlier Detection
     • Our goal is to infer the posterior distributions of $\mathbf{b}$ and $\mathbf{w}$
     • We can treat this as an EM problem (Dempster et al., 1977) and maximize the incomplete log likelihood $\log p(\mathbf{y} \mid \mathbf{X})$ by maximizing the expected complete log likelihood $E\!\left[\log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X})\right]$

  10. Bayesian Regression for Automatic Outlier Detection
     • In the E-step, we need to calculate $E_{Q(\mathbf{b}, \mathbf{w})}\!\left[\log p(\mathbf{y}, \mathbf{b}, \mathbf{w} \mid \mathbf{X})\right]$, but since the true posterior over all hidden variables is analytically intractable, we make a factorial variational approximation (Hinton & van Camp, 1993; Ghahramani & Beal, 2000):
       $Q(\mathbf{b}, \mathbf{w}) = Q(\mathbf{b})\, Q(\mathbf{w})$

  11. Bayesian Regression for Automatic Outlier Detection
     • EM update equations (batch version). Reminder: $y_i \sim \mathrm{Normal}\!\left(\mathbf{b}^T \mathbf{x}_i,\; \sigma^2 / w_i\right)$
     E-step:
       $\boldsymbol{\Sigma}_b = \left( \boldsymbol{\Sigma}_{b_0}^{-1} + \frac{1}{\sigma^2} \sum_{i=1}^{N} \langle w_i \rangle\, \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}$
       $\langle \mathbf{b} \rangle = \boldsymbol{\Sigma}_b \left( \boldsymbol{\Sigma}_{b_0}^{-1} \mathbf{b}_0 + \frac{1}{\sigma^2} \sum_{i=1}^{N} \langle w_i \rangle\, y_i \mathbf{x}_i \right)$
       $\langle w_i \rangle = \dfrac{a_{w_i,0} + 0.5}{b_{w_i,0} + \frac{1}{2\sigma^2} \left[ \left( y_i - \langle \mathbf{b} \rangle^T \mathbf{x}_i \right)^2 + \mathbf{x}_i^T \boldsymbol{\Sigma}_b\, \mathbf{x}_i \right]}$
     M-step:
       $\sigma^2 = \dfrac{1}{N} \sum_{i=1}^{N} \langle w_i \rangle \left[ \left( y_i - \langle \mathbf{b} \rangle^T \mathbf{x}_i \right)^2 + \mathbf{x}_i^T \boldsymbol{\Sigma}_b\, \mathbf{x}_i \right]$
     • If the prediction error for a point is very large, $E[w_i]$ goes to 0 and the point is downweighted
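A minimal sketch of these batch updates, following my transcription of the slide's equations (not the authors' reference implementation); the prior choices `b0 = 0`, `Sigma_b0 = I` and the hyperparameters `a_w0`, `b_w0` are assumptions:

```python
# Sketch of the batch EM updates for Bayesian weighted regression.
import numpy as np

def batch_em(X, y, n_iter=50, a_w0=1.0, b_w0=1.0):
    """Returns posterior mean of b, expected weights E[w_i], and sigma^2."""
    N, d = X.shape
    b0 = np.zeros(d)                  # prior mean on b (assumed)
    Sigma_b0_inv = np.eye(d)          # prior precision on b (assumed)
    E_w = np.ones(N)                  # E[w_i], initialized to 1
    sigma2 = np.var(y)                # initial noise variance

    for _ in range(n_iter):
        # E-step: Gaussian posterior over b given current E[w] and sigma^2
        Sigma_b = np.linalg.inv(Sigma_b0_inv + (X.T * E_w) @ X / sigma2)
        E_b = Sigma_b @ (Sigma_b0_inv @ b0 + X.T @ (E_w * y) / sigma2)

        # Expected squared residual: (y_i - <b>^T x_i)^2 + x_i^T Sigma_b x_i
        resid2 = (y - X @ E_b) ** 2 + np.einsum('ij,jk,ik->i', X, Sigma_b, X)

        # E-step: expected weights; a large residual drives E[w_i] toward 0
        E_w = (a_w0 + 0.5) / (b_w0 + resid2 / (2.0 * sigma2))

        # M-step: noise variance
        sigma2 = np.mean(E_w * resid2)

    return E_b, E_w, sigma2
```

Points whose final `E_w` is close to zero are the ones the model has downweighted, i.e., the detected outliers.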

  12. Bayesian Regression for Automatic Outlier Detection
     • EM update equations (incremental version). Sufficient statistics are exponentially discounted by $\lambda$, $0 \le \lambda \le 1$ (e.g., Ljung & Söderström, 1983):
       $\langle wxx^T \rangle_k = \lambda \langle wxx^T \rangle_{k-1} + \langle w_k \rangle\, \mathbf{x}_k \mathbf{x}_k^T$
       $\langle wyx \rangle_k = \lambda \langle wyx \rangle_{k-1} + \langle w_k \rangle\, y_k \mathbf{x}_k$
       $\langle wy^2 \rangle_k = \lambda \langle wy^2 \rangle_{k-1} + \langle w_k \rangle\, y_k^2$
       $N_k = \lambda N_{k-1} + 1$
     E-step:
       $\boldsymbol{\Sigma}_b = \left( \boldsymbol{\Sigma}_{b_0}^{-1} + \frac{1}{\sigma^2} \langle wxx^T \rangle_k \right)^{-1}$
       $\langle \mathbf{b} \rangle = \boldsymbol{\Sigma}_b \left( \boldsymbol{\Sigma}_{b_0}^{-1} \mathbf{b}_0 + \frac{1}{\sigma^2} \langle wyx \rangle_k \right)$
       $\langle w_i \rangle = \dfrac{a_{w_i,0} + 0.5}{b_{w_i,0} + \frac{1}{2\sigma^2} \left[ \left( y_i - \langle \mathbf{b} \rangle^T \mathbf{x}_i \right)^2 + \mathbf{x}_i^T \boldsymbol{\Sigma}_b\, \mathbf{x}_i \right]}$
     M-step:
       $\sigma^2 = \dfrac{1}{N_k} \left[ \langle wy^2 \rangle_k - 2 \langle \mathbf{b} \rangle^T \langle wyx \rangle_k + \langle \mathbf{b} \rangle^T \langle wxx^T \rangle_k \langle \mathbf{b} \rangle + \mathbf{1}^T \operatorname{diag}\!\left( \boldsymbol{\Sigma}_b \langle wxx^T \rangle_k \right) \right]$
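A sketch of the incremental version as a small class (again my reading of the slide, with `b0 = 0` assumed so the prior-mean term drops out); each call folds one sample into the discounted sufficient statistics and returns its expected weight:

```python
# Sketch of the incremental EM updates with forgetting factor lambda.
import numpy as np

class IncrementalOutlierRegression:
    def __init__(self, d, lam=0.999, a_w0=1.0, b_w0=1.0):
        self.lam, self.a_w0, self.b_w0 = lam, a_w0, b_w0
        self.Wxx = np.zeros((d, d))   # <w x x^T>_k
        self.Wyx = np.zeros(d)        # <w y x>_k
        self.Wy2 = 0.0                # <w y^2>_k
        self.Nk = 0.0                 # discounted sample count N_k
        self.Sigma_b0_inv = np.eye(d) # prior precision on b (assumed)
        self.E_b = np.zeros(d)
        self.sigma2 = 1.0

    def update(self, x, y):
        # Expected weight of the new point under the current model
        Sigma_b = np.linalg.inv(self.Sigma_b0_inv + self.Wxx / self.sigma2)
        resid2 = (y - self.E_b @ x) ** 2 + x @ Sigma_b @ x
        w = (self.a_w0 + 0.5) / (self.b_w0 + resid2 / (2.0 * self.sigma2))

        # Discount old sufficient statistics, then fold in the new sample
        self.Wxx = self.lam * self.Wxx + w * np.outer(x, x)
        self.Wyx = self.lam * self.Wyx + w * y * x
        self.Wy2 = self.lam * self.Wy2 + w * y ** 2
        self.Nk = self.lam * self.Nk + 1.0

        # Re-estimate b and sigma^2 from the discounted statistics;
        # trace(Sigma_b Wxx) equals the 1^T diag(...) term on the slide
        Sigma_b = np.linalg.inv(self.Sigma_b0_inv + self.Wxx / self.sigma2)
        self.E_b = Sigma_b @ (self.Wyx / self.sigma2)
        self.sigma2 = (self.Wy2 - 2.0 * self.E_b @ self.Wyx
                       + self.E_b @ self.Wxx @ self.E_b
                       + np.trace(Sigma_b @ self.Wxx)) / self.Nk
        return w  # a small w flags (x, y) as a likely outlier
```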

  13. Outline
     • Motivation
     • Past & related work
     • Bayesian regression for automatic outlier detection
       – Batch version
       – Incremental version
     • Results
       – Synthetic data
       – Robotic data
     • Conclusions

  14. Results: Synthetic Data
     • Given noisy data (+ outliers) from a linear regression problem:
       – 5 input dimensions
       – 1000 samples
       – SNR = 10
       – 20% outliers
       – Outliers are 3σ from the output mean
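This setup can be reproduced roughly as follows; only the numbers come from the slide, and the exact generation procedure is my assumption:

```python
# Synthetic data: 5 inputs, 1000 samples, SNR = 10, 20% outliers at 3 sigma.
import numpy as np

rng = np.random.default_rng(1)
d, N, snr, frac_out = 5, 1000, 10.0, 0.2

X = rng.normal(size=(N, d))
b_true = rng.normal(size=d)
signal = X @ b_true                              # true conditional output mean
noise_sd = signal.std() / np.sqrt(snr)           # SNR = var(signal) / var(noise)
y = signal + rng.normal(scale=noise_sd, size=N)

# Corrupt 20% of the outputs: place them 3 sigma from the output mean,
# where sigma is the standard deviation of the true conditional mean
idx = rng.choice(N, size=int(frac_out * N), replace=False)
y[idx] = y.mean() + rng.choice([-3.0, 3.0], size=idx.size) * signal.std()
```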

  15. Results: Synthetic Data Available in Batch Form
     Average normalized mean squared prediction error as a function of how far outliers are from inliers. Data: globally linear with 5 input dimensions, evaluated in batch form, averaged over 10 trials (SNR = 10; σ is the standard deviation of the true conditional output mean). Columns give the minimum distance of outliers from the mean:

     Algorithm                                  |  +3σ   |  +2σ   |  +σ
     -------------------------------------------|--------|--------|-------
     Thresholding (optimally tuned)             | 0.0903 | 0.0503 | 0.0232
     Mixture model                              | 0.1327 | 0.0688 | 0.0286
     Robust least squares                       | 0.1890 | 0.1518 | 0.0880
     Robust regression (Faul & Tipping, 2001)   | 0.1320 | 0.0683 | 0.0282
     Bayesian weighted regression               | 0.0273 | 0.0270 | 0.0210

     Bayesian weighted regression achieves the lowest prediction error in all three conditions.

  16. Results: Synthetic Data Available Incrementally
     [Figure: prediction error over time with outliers at least 2σ away (λ = 0.999); the Bayesian method shows the lowest prediction error]

  17. Results: Synthetic Data Available Incrementally
     [Figure: prediction error over time with outliers at least 3σ away (λ = 0.999); the Bayesian method shows the lowest prediction error]

  18. Results: Robotic Orientation Data
     • Offset between MOCAP data & IMU data for LittleDog:
     [Figure: offset between MOCAP and IMU orientation data for the LittleDog robot]

  19. Results: Predicted Output on LittleDog MOCAP Data
     [Figure: predicted output on LittleDog MOCAP data]

  20. Outline
     • Motivation
     • Past & related work
     • Bayesian regression for automatic outlier detection
       – Batch version
       – Incremental version
     • Results
       – Synthetic data
       – Robotic data
     • Conclusions

  21. Conclusions
     • We have an algorithm that:
       – Automatically detects outliers in real time
       – Requires no user intervention, parameter tuning, or sampling
       – Performs on par with, and in some cases exceeds, standard outlier detection methods
     • Extensions to the Kalman filter and other filters
