Outlier Detection Methods, Paul van Leeuwen, 5 December 2019

Outline: Introduction; How Does LOF Work?; An Alternative to LOF

Introduction

Traditional Methods
• (Hawkins outlier, 1980) ‘An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.’
• Traditional outlier detection methods can be categorised into the following approaches:
  • distribution-based: easy to visualise, but a multivariate probability distribution needs to be assigned to all variables, and that distribution is unknown in our case;
  • depth-based: outliers are assumed to lie at the boundaries of the data, and the methods are computationally demanding in four or more dimensions, which applies to our case;
  • clustering: the methods are optimised to cluster the data, not to detect outliers;
  • distance-based: problematic when the data contain both sparse and dense regions, which could easily be the case for high levels of the LOB.

A Novel Approach
• M. Breunig et al. introduced a new approach: the Local Outlier Factor (LOF).
• This is a density-based approach driven by the data.
• Data points that are distant relative to each other, that is, points lying in a region that is sparser than the regions around their neighbours, are considered to be more outlying.
• The issues listed above are largely resolved, although the parameters still need to be chosen carefully.
• In addition, the variables need to be continuous, and outliers in low-density regions are still hard to detect.
• This inspired variants that are worth investigating:
  • Connectivity-based Outlier Factor (COF) by Tang et al. 2002;
  • Influenced Outlierness (INFLO) by Jin et al. 2006;
  • Local Outlier Correlation Integral (LOCI) by Papadimitriou et al. 2003;
  • . . .
• A great overview of these methods is given in https://archive.siam.org/meetings/sdm10/tutorial3.pdf.

How Does LOF Work?

How Does LOF Work?
[Figure: scatter plot of the data points; horizontal axis x, vertical axis y]

How Does LOF Work?
• Without any knowledge of the probability distribution we could have assigned to the data, the point (0.5, 3) is considered to be an outlier.
[Figure: scatter plot of the data points; horizontal axis x, vertical axis y]

How Does LOF Work?
• However, suppose that we know a priori that the data points $(x_i, y_i)$ for $i = 1, \dots, 10$ follow the pattern
  $$y_i = -4.04 + 23.5\,x_i - 20\,x_i^2 + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 0.933).$$
• A second-order polynomial is fitted to the data points, leaving out those that meet the conditions $0.3 < x_i < 0.8$ and $y_i < 1.5$.
• Then the point considered to be an outlier before is no longer an outlier, but the points that were left out are!
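The fitting step on this slide can be sketched as follows. The ten data points below are hypothetical (the original data are not given in the slides); they are chosen by hand so that six points lie close to the stated curve and four points sit low inside the window $0.3 < x_i < 0.8$. `numpy.polyfit` stands in for whatever fitting routine was actually used.

```python
import numpy as np

# Hypothetical data, NOT the data from the slides: six points close to
# y = -4.04 + 23.5 x - 20 x^2 (including the point (0.5, 3)) and four
# low-lying points inside the window 0.3 < x < 0.8.
x = np.array([0.20, 0.25, 0.30, 0.40, 0.50, 0.55, 0.60, 0.70, 0.80, 0.85])
y = np.array([0.00, 0.50, 1.30, 1.00, 3.00, 1.10, 0.90, 1.20, 1.90, 1.40])

# Points left out of the fit: 0.3 < x_i < 0.8 and y_i < 1.5.
left_out = (x > 0.3) & (x < 0.8) & (y < 1.5)

# Fit a second-order polynomial on the remaining points only.
# polyfit returns the coefficients from the highest power down.
coeffs = np.polyfit(x[~left_out], y[~left_out], deg=2)

# Residuals of ALL points with respect to the fitted curve.
residuals = y - np.polyval(coeffs, x)

for xi, yi, ri, out in zip(x, y, residuals, left_out):
    label = "left out" if out else "used in fit"
    print(f"x={xi:.2f}  y={yi:.2f}  residual={ri:+.2f}  ({label})")
```

Under these assumptions the point (0.5, 3) ends up with a small residual, while the left-out points all lie far below the fitted curve, which is the reversal of roles the slide describes.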

How Does LOF Work?
[Figure: scatter plot of the data points; horizontal axis x, vertical axis y]

How Does LOF Work?
• However, in our case we do not have that level of knowledge of the data-generating process of $y_i$.
• Alternatively, we can make use of relative densities.
• The figure below is taken from M. Breunig et al.

How Does LOF Work?
• The traditional methods have a hard time dealing with different densities.
• For example, the algorithms from the distance-based approach cannot identify $o_1$ as an outlier while at the same time keeping the points in the cluster $C_2$ as inliers.
• Make use of the Euclidean distance.
  • Is standardisation necessary?
• For each data point, investigate how dense its neighbourhood is compared with the neighbourhoods of its k nearest neighbours.
• First, calculate the reachability distance between each data point and its k nearest neighbours.
• Second, calculate the local reachability density of each data point.
  • This is the inverse of the average reachability distance to its k nearest neighbours.
• Finally, the LOF of a data point is the average local reachability density of its k nearest neighbours relative to the local reachability density of that data point.
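The difficulty that distance-based methods have with different densities can be illustrated with a small sketch. The data below are hypothetical, and the score used (distance to the k-th nearest neighbour) is only one common distance-based criterion, not necessarily the one the slide has in mind. The helper name `kth_nn_distance` is an assumption made for this example.

```python
import math

def kth_nn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (a global score)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]

# Hypothetical data: a dense cluster (spacing 0.1), a sparse cluster
# (spacing 2.0) and a single point o lying 1.0 away from the dense cluster.
dense = [(0.1 * i, 0.0) for i in range(10)]
sparse = [(10.0 + 2.0 * i, 0.0) for i in range(10)]
o = (1.9, 0.0)
points = dense + sparse + [o]

k = 2
scores = [kth_nn_distance(points, i, k) for i in range(len(points))]
dense_scores, sparse_scores, o_score = scores[:10], scores[10:20], scores[20]

print("largest score in dense cluster  :", round(max(dense_scores), 3))
print("score of o                      :", round(o_score, 3))
print("smallest score in sparse cluster:", round(min(sparse_scores), 3))
```

Every point of the sparse cluster scores higher than o, so any single global threshold low enough to flag o also flags the whole sparse cluster. A density-relative score avoids this by comparing each point only with its own neighbourhood.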

The LOF Algorithm
• $\operatorname{reach-dist}_k(p, o) = \max\{\, k\text{-distance}(o),\ \operatorname{dist}(o, p) \,\}$
• $kNN(p)$ is in practice the set of the k nearest neighbours of $p$ (with ties at the k-distance it can contain more than k points).
• $\operatorname{lrd}_k(p) = \left( \dfrac{\sum_{o \in kNN(p)} \operatorname{reach-dist}_k(p, o)}{|kNN(p)|} \right)^{-1}$
• $\operatorname{LOF}_k(p) = \dfrac{\sum_{o \in kNN(p)} \dfrac{\operatorname{lrd}_k(o)}{\operatorname{lrd}_k(p)}}{|kNN(p)|}$
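The three formulas above translate almost line by line into code. The following is a minimal, brute-force sketch in plain Python, not a reference implementation: the function names `k_nearest` and `lof_scores` and the toy data are assumptions made for this example, and the code assumes that no point is duplicated often enough to make an average reachability distance zero.

```python
import math

def k_nearest(points, i, k):
    """Return kNN(p_i) as a list of indices (ties included) and k-distance(p_i)."""
    dists = sorted((math.dist(points[i], q), j)
                   for j, q in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    neighbours = [j for d, j in dists if d <= k_dist]
    return neighbours, k_dist

def lof_scores(points, k):
    """LOF_k for every point, following the reach-dist, lrd and LOF formulas."""
    n = len(points)
    knn, k_dist = zip(*(k_nearest(points, i, k) for i in range(n)))

    def reach_dist(p, o):
        # reach-dist_k(p, o) = max{k-distance(o), dist(o, p)}
        return max(k_dist[o], math.dist(points[o], points[p]))

    # lrd_k(p): inverse of the mean reachability distance to kNN(p).
    lrd = [len(knn[p]) / sum(reach_dist(p, o) for o in knn[p]) for p in range(n)]

    # LOF_k(p): mean of lrd_k(o) / lrd_k(p) over o in kNN(p).
    return [sum(lrd[o] for o in knn[p]) / (len(knn[p]) * lrd[p]) for p in range(n)]

# Hypothetical toy data: a 3 x 3 unit grid plus one distant point.
grid = [(float(a), float(b)) for a in range(3) for b in range(3)]
points = grid + [(6.0, 6.0)]
scores = lof_scores(points, k=3)

for p, s in zip(points, scores):
    print(p, round(s, 2))
```

On this toy data the grid points obtain values close to one and the distant point a value several times larger. The brute-force neighbour search is quadratic in the number of points, which is fine for illustration but not for large data sets.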

How Does LOF Work?
• A data point with a LOF value around one is considered to be an inlier; a value well above one marks it as an outlier.
• In the figure taken from M. Breunig et al., all data points of the clusters $C_1$ and $C_2$ are inliers, while the data points $o_1$ and $o_2$ have a value clearly above one.
• However, the choice of the number of nearest neighbours k remains ambiguous.
• M. Breunig et al. provide some heuristics on the minimum and maximum values of k, but these remain vague, and additional information on the data-generating process is required.
• Another issue is that, even if k is chosen appropriately, some clusters are not properly identified. And what about outlying clusters?
• Finally, how do we deal with categorical values?

An Alternative to LOF

LOCI
• To deal with the arbitrary choice of the number of nearest neighbours k, the Local Outlier Correlation Integral (LOCI) method was introduced.
• This approach resembles the LOF method.
• The difference is that the neighbourhood is defined on a continuous scale rather than by a discrete and rather arbitrary number of neighbours.
• Although some parameters need to be chosen beforehand, the choice of k is dealt with automatically.

LOCI
• Questions to be answered for LOCI:
  • Chebyshev's inequality, $P[\,|X - \mu| \ge k\sigma\,] \le \dfrac{1}{k^2}$ for $k > 1$, is used for a random variable $X$ with expected value $\mu$ and standard deviation $\sigma$. But the method uses the sample standard deviation, while Chebyshev's inequality uses the population standard deviation. And there are more efficient alternatives available, such as the upper probability bound provided by Saw et al. (1984).
  • What is the influence of the parameters $\alpha$ and $k$ (here $k$ is the multiplier in the inequality above, not the number of nearest neighbours)? And why are they set at $\alpha = 0.5$ and $k = 3$?
  • Is 20, as chosen in the paper, the appropriate minimum number of neighbours to start with? Is it much affected by the choice of the population probability function?
  • The example outliers in the paper are hard to reproduce.
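To get a feeling for how conservative the Chebyshev bound is at $k = 3$, the sketch below compares $1/k^2$ with the observed tail frequency in a simulated sample, using the sample mean and sample standard deviation as the slide describes. The exponential distribution, the seed, and the helper names `chebyshev_bound` and `empirical_tail` are arbitrary assumptions for illustration; nothing here reproduces the LOCI method itself.

```python
import random
import statistics

def chebyshev_bound(k):
    """Upper bound 1 / k^2 on P[|X - mu| >= k * sigma], informative for k > 1."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1.0 / k**2

def empirical_tail(sample, k):
    """Fraction of observations at least k sample standard deviations from the sample mean."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return sum(abs(x - mean) >= k * sd for x in sample) / len(sample)

random.seed(0)
# Arbitrary skewed example distribution: exponential with rate 1.
sample = [random.expovariate(1.0) for _ in range(10_000)]

k = 3
bound = chebyshev_bound(k)
tail = empirical_tail(sample, k)
print(f"Chebyshev bound for k={k}: {bound:.4f}")
print(f"observed tail frequency  : {tail:.4f}")
```

The observed frequency stays well below the bound, which illustrates why a sharper bound, such as the one by Saw et al. (1984) mentioned above, could be of interest.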
