
Detecting Outliers under Interval Uncertainty: A New Algorithm Based on Constraint Satisfaction

Evgeny Dantsin and Alexander Wolpert
Department of Computer Science, Roosevelt University
Chicago, IL 60605, USA, {edantsin,awolpert}@roosevelt.edu

Martine Ceberio, Gang Xiang, and Vladik Kreinovich
Department of Computer Science, University of Texas at El Paso
El Paso, TX 79968, USA, {mceberio,vladik}@cs.utep.edu

1. Outlier Detection Is Important

• In many application areas, it is important to detect outliers, i.e., unusual, abnormal values.
• In medicine: an outlier may mean a disease.
• In geophysics: an outlier may mean a mineral deposit.
• In structural integrity testing: an outlier may mean a structural fault.
• Traditional engineering approach to outlier detection:
  – collect measurement results $x_1, \ldots, x_n$ corresponding to normal situations;
  – compute $E = \frac{1}{n} \cdot \sum_{i=1}^{n} x_i$ and $\sigma = \sqrt{V}$, where $V = M - E^2$ and $M = \frac{1}{n} \cdot \sum_{i=1}^{n} x_i^2$;
  – a value $x$ is classified as an outlier if it is outside the interval $[L, U]$, where $L = E - k_0 \cdot \sigma$, $U = E + k_0 \cdot \sigma$, and $k_0 > 1$ is pre-selected (most frequently, $k_0 = 2$, 3, or 6).
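The traditional $k_0$-sigma test can be sketched in a few lines of Python. This is a minimal illustration assuming exact (point) measurements; the function names `k_sigma_bounds` and `is_outlier` are hypothetical and do not come from the slides.

```python
import math


def k_sigma_bounds(xs, k0=3.0):
    """Return (L, U) = (E - k0*sigma, E + k0*sigma) for the sample xs.

    E is the mean, M the mean of squares, V = M - E**2 the population
    variance, and sigma = sqrt(V), following the definitions on the slide.
    """
    n = len(xs)
    E = sum(xs) / n
    M = sum(x * x for x in xs) / n
    V = max(M - E * E, 0.0)  # clamp tiny negative values caused by round-off
    sigma = math.sqrt(V)
    return E - k0 * sigma, E + k0 * sigma


def is_outlier(x, xs, k0=3.0):
    """A value x is an outlier if it lies outside [L, U]."""
    L, U = k_sigma_bounds(xs, k0)
    return not (L <= x <= U)
```

For example, with the "normal" readings `[9.8, 10.1, 10.0, 9.9, 10.2]` and $k_0 = 3$, we get $E = 10$, $V = 0.02$, so $[L, U] \approx [9.58, 10.42]$: the value 12.0 is flagged, while 10.3 is not.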

2. Outlier Detection Under Interval Uncertainty

• In practice: often, we only have intervals $\mathbf{x}_i = [\underline{x}_i, \overline{x}_i]$ of possible values of $x_i$.
• Example: the value $\tilde{x}_i$ measured by an instrument with a known upper bound $\Delta_i$ on the measurement error means that $x_i \in [\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$.
• Problem: for different values $x_i \in \mathbf{x}_i$, we get different $L$ and $U$.
• Objective: given the intervals $\mathbf{x}_i$ and $k_0$, compute the ranges
  $\mathbf{L} = [\underline{L}, \overline{L}] = \{L(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n\}$;
  $\mathbf{U} = [\underline{U}, \overline{U}] = \{U(x_1, \ldots, x_n) : x_1 \in \mathbf{x}_1, \ldots, x_n \in \mathbf{x}_n\}$.
• A value $x$ is a possible outlier if it is outside at least one of the possible $k_0$-sigma intervals $[L, U]$, i.e., if $x \notin [\overline{L}, \underline{U}]$.
• A value $x$ is a guaranteed outlier if it is outside all possible $k_0$-sigma intervals $[L, U]$, i.e., if $x \notin [\underline{L}, \overline{U}]$.
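Once the ranges $[\underline{L}, \overline{L}]$ and $[\underline{U}, \overline{U}]$ are known, the two definitions reduce to simple interval tests. The sketch below assumes these four endpoints have already been computed by some other means; `measurement_interval` and `classify` are hypothetical helper names used only for illustration.

```python
def measurement_interval(x_tilde, delta):
    """Possible true values [x~ - Delta, x~ + Delta] for a reading x~ with error bound Delta."""
    return (x_tilde - delta, x_tilde + delta)


def classify(x, L_range, U_range):
    """Classify x given the range (L_lo, L_hi) of L and (U_lo, U_hi) of U.

    possible outlier:   outside at least one [L, U], i.e. x not in [L_hi, U_lo]
    guaranteed outlier: outside every [L, U],        i.e. x not in [L_lo, U_hi]
    """
    L_lo, L_hi = L_range
    U_lo, U_hi = U_range
    if x < L_lo or x > U_hi:
        return "guaranteed outlier"
    # If L_hi > U_lo, the interval [L_hi, U_lo] is empty and every x is a
    # possible outlier; the comparison below covers that case automatically.
    if x < L_hi or x > U_lo:
        return "possible outlier"
    return "not an outlier"
```

For instance, with $\mathbf{L} = [4, 5]$ and $\mathbf{U} = [15, 16]$, the value 4.5 is only a possible outlier, 17 is a guaranteed outlier, and 10 is not an outlier.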

3. Which Approach Is More Reasonable?

• Situation: our main objective is not to miss an outlier.
  – Example: structural integrity tests.
  – Clarification: we do not want to risk launching a spaceship with a faulty part.
  – Reasonable approach: look for possible outliers.
• Situation: we need to make sure that the value $x$ is an outlier.
  – Example: planning a surgery.
  – Clarification: we want to make sure that there is a micro-calcification before we start cutting the patient.
  – Reasonable approach: look for guaranteed outliers.

4. Detecting Outliers Under Interval Uncertainty: What Is Known

• Case of possible outliers: there exist efficient algorithms for computing $\overline{L}$ and $\underline{U}$.
• Case of guaranteed outliers: the computation of $\underline{L}$ and $\overline{U}$ is, in general, NP-hard.
• Technical result: if $1 + (1/k_0)^2 < n$ (e.g., if $k_0 > 1$ and $n \ge 2$), then the maximum $\overline{U}$ of $U$ (and the minimum $\underline{L}$ of $L$) is always attained at a combination of endpoints of the intervals $\mathbf{x}_i$.
• Resulting algorithm: compute $\overline{U}$ and $\underline{L}$ by trying all $2^n$ combinations of $\underline{x}_i$ and $\overline{x}_i$.
• Specific case: when all measured values $\tilde{x}_i = (\underline{x}_i + \overline{x}_i)/2$ are definitely different from each other, in the sense that the "narrowed" intervals
  $\left[\tilde{x}_i - \frac{1+\alpha^2}{n} \cdot \Delta_i,\ \tilde{x}_i + \frac{1+\alpha^2}{n} \cdot \Delta_i\right]$
  do not intersect, where $\alpha = 1/k_0$ and $\Delta_i = (\overline{x}_i - \underline{x}_i)/2$ is the interval's half-width.
• Good news: in this case, we can compute $\overline{U}$ and $\underline{L}$ in feasible time.
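The exponential-time algorithm from this slide can be sketched as follows. It assumes $1 + (1/k_0)^2 < n$, so that the technical result applies, and is meant only as a reference implementation for small $n$: it evaluates $L$ and $U$ at all $2^n$ endpoint vectors. The function names are hypothetical.

```python
import itertools
import math


def bounds_at(x, k0):
    """Return (L, U) = (E - k0*sigma, E + k0*sigma) for a point vector x."""
    n = len(x)
    E = sum(x) / n
    M = sum(v * v for v in x) / n
    sigma = math.sqrt(max(M - E * E, 0.0))
    return E - k0 * sigma, E + k0 * sigma


def brute_force_L_min_U_max(intervals, k0):
    """Min of L and max of U over all 2**n endpoint vectors.

    intervals: list of (lo, hi) pairs. Relies on the technical result that,
    when 1 + (1/k0)**2 < n, both extrema are attained at endpoints.
    Exponential time: usable only for small n.
    """
    L_min, U_max = math.inf, -math.inf
    # itertools.product(*intervals) yields every choice of one endpoint per interval.
    for combo in itertools.product(*intervals):
        L, U = bounds_at(combo, k0)
        L_min = min(L_min, L)
        U_max = max(U_max, U)
    return L_min, U_max
```

For two intervals $[0, 1]$ and $[0, 1]$ with $k_0 = 2$, the extremes come from the vector $(0, 1)$: $E = 0.5$, $\sigma = 0.5$, so $\underline{L} = -0.5$ and $\overline{U} = 1.5$.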

5. What We Plan To Do

• More general case: no two narrowed intervals are proper subsets of one another.
• In precise terms: neither of them is a subset of the interior of the other.
• Objective: extend known efficient algorithms to this case.
• Since $L(x) = -U(-x)$ for every vector $x$, we have $\underline{L}(\mathbf{x}_i) = -\overline{U}(-\mathbf{x}_i)$, so it suffices to be able to compute $\overline{U}$.
• Main idea: reduce the interval computation problem to the constraint satisfaction problem with the following constraints:
  – for every $i$, if in the maximizing assignment we have $x_i = \underline{x}_i$, then replacing this value with $x_i = \overline{x}_i$ will either decrease $U$ or leave $U$ unchanged;
  – for every $i$, if in the maximizing assignment we have $x_i = \overline{x}_i$, then replacing this value with $x_i = \underline{x}_i$ will either decrease $U$ or leave $U$ unchanged;
  – for every $i$ and $j$, replacing both $x_i$ and $x_j$ with the opposite ends of the corresponding intervals $\mathbf{x}_i$ and $\mathbf{x}_j$ will either decrease $U$ or leave $U$ unchanged.
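The "no proper nesting" condition can be checked directly from the narrowed intervals $\left[\tilde{x}_i - \frac{1+\alpha^2}{n} \cdot \Delta_i,\ \tilde{x}_i + \frac{1+\alpha^2}{n} \cdot \Delta_i\right]$ with $\alpha = 1/k_0$. The sketch below is a straightforward $O(n^2)$ pairwise check written for clarity; the helper names are hypothetical, and a sort-based check would be faster for large $n$.

```python
def narrowed_intervals(intervals, k0):
    """Narrowed intervals [x~ - f*D, x~ + f*D] with f = (1 + alpha**2) / n, alpha = 1/k0.

    x~ is the midpoint and D the half-width of each (lo, hi) interval.
    """
    n = len(intervals)
    alpha = 1.0 / k0
    factor = (1.0 + alpha * alpha) / n
    result = []
    for lo, hi in intervals:
        mid = (lo + hi) / 2.0
        half = (hi - lo) / 2.0
        result.append((mid - factor * half, mid + factor * half))
    return result


def in_interior(a, b):
    """True if the closed interval a lies inside the open interior of b."""
    return b[0] < a[0] and a[1] < b[1]


def no_proper_nesting(intervals, k0):
    """Condition from the slide: no narrowed interval is inside the interior of another."""
    nar = narrowed_intervals(intervals, k0)
    for i, a in enumerate(nar):
        for j, b in enumerate(nar):
            if i != j and in_interior(a, b):
                return False
    return True
```

With $k_0 = 2$, the intervals $[0, 2]$ and $[1, 3]$ give overlapping but non-nested narrowed intervals, so the condition holds; for $[0, 1]$ and $[0.2, 0.8]$ (same midpoint, different widths) it fails.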

6. Algorithm

• General idea:
  – First, we sort the values $\tilde{x}_i$ into an increasing sequence.
  – Without losing generality, we can assume that $\tilde{x}_1 \le \tilde{x}_2 \le \ldots \le \tilde{x}_n$.
  – Then, for every $k$ from 0 to $n$, we compute the value $V^{(k)} = M^{(k)} - (E^{(k)})^2$ of the population variance $V$ for the vector $x^{(k)} = (\underline{x}_1, \ldots, \underline{x}_k, \overline{x}_{k+1}, \ldots, \overline{x}_n)$, and we compute $U^{(k)} = E^{(k)} + k_0 \cdot \sqrt{V^{(k)}}$.
  – Finally, we compute $\overline{U}$ as the largest of the $n+1$ values $U^{(0)}, \ldots, U^{(n)}$.
• Details: how to compute the values $V^{(k)}$:
  – First, we explicitly compute $M^{(0)}$, $E^{(0)}$, and $V^{(0)} = M^{(0)} - (E^{(0)})^2$.
  – Once we know the values $M^{(k)}$ and $E^{(k)}$, we can compute
    $M^{(k+1)} = M^{(k)} + \frac{1}{n} \cdot (\underline{x}_{k+1})^2 - \frac{1}{n} \cdot (\overline{x}_{k+1})^2$
    and
    $E^{(k+1)} = E^{(k)} + \frac{1}{n} \cdot \underline{x}_{k+1} - \frac{1}{n} \cdot \overline{x}_{k+1}$.
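The sorted sweep can be sketched in Python as follows. This is a minimal sketch, assuming the no-nesting condition on the narrowed intervals holds (otherwise the returned value may underestimate $\overline{U}$). The names `max_U_sorted_sweep` and `min_L` are hypothetical; `min_L` applies the identity $\underline{L}(\mathbf{x}_i) = -\overline{U}(-\mathbf{x}_i)$ by negating and swapping the endpoints.

```python
import math


def max_U_sorted_sweep(intervals, k0):
    """Upper endpoint of U = E + k0*sqrt(V) via the sorted sweep.

    intervals: list of (lo, hi) pairs. Assumes the narrowed intervals are
    not nested, so the maximum is attained at one of the vectors x^(k).
    """
    n = len(intervals)
    # Sort by midpoint (lo + hi) / 2: O(n log n).
    xs = sorted(intervals, key=lambda iv: (iv[0] + iv[1]) / 2.0)
    # x^(0): every component at its upper endpoint, computed directly in O(n).
    E = sum(hi for _, hi in xs) / n
    M = sum(hi * hi for _, hi in xs) / n
    best = E + k0 * math.sqrt(max(M - E * E, 0.0))
    # Step k -> k+1 moves component k+1 from its upper to its lower endpoint,
    # using the O(1) updates for M^(k+1) and E^(k+1).
    for lo, hi in xs:
        M += (lo * lo - hi * hi) / n
        E += (lo - hi) / n
        best = max(best, E + k0 * math.sqrt(max(M - E * E, 0.0)))
    return best


def min_L(intervals, k0):
    """Lower endpoint of L via L_min(x_1, ..., x_n) = -U_max(-x_1, ..., -x_n)."""
    return -max_U_sorted_sweep([(-hi, -lo) for lo, hi in intervals], k0)
```

For the intervals $[0, 1]$ and $[2, 3]$ with $k_0 = 2$, the sweep evaluates $U^{(0)} = 4$, $U^{(1)} = 4.5$, $U^{(2)} = 3$ and returns $\overline{U} = 4.5$ (attained at $x = (0, 3)$); `min_L` returns $-1.5$. Computing $M - E^2$ incrementally mirrors the slide's constant-time updates, but for data with a large mean it can lose precision through cancellation, so shifting the data toward zero first is a reasonable precaution.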

7. Number of Computation Steps

• Sorting requires $O(n \cdot \log(n))$ steps.
• Computing the initial values $M^{(0)}$, $E^{(0)}$, and $V^{(0)}$ requires linear time $O(n)$.
• For each $k$ from 0 to $n-1$, we need a constant number of steps to compute the next values $M^{(k+1)}$, $E^{(k+1)}$, and $V^{(k+1)}$ as
  $M^{(k+1)} = M^{(k)} + \frac{1}{n} \cdot (\underline{x}_{k+1})^2 - \frac{1}{n} \cdot (\overline{x}_{k+1})^2$,
  $E^{(k+1)} = E^{(k)} + \frac{1}{n} \cdot \underline{x}_{k+1} - \frac{1}{n} \cdot \overline{x}_{k+1}$,
  and $V^{(k+1)} = M^{(k+1)} - (E^{(k+1)})^2$.
• Computing $U^{(k)} = E^{(k)} + k_0 \cdot \sqrt{V^{(k)}}$ also requires a constant number of steps.
• Finally, finding the largest of the $n+1$ values $U^{(k)}$ requires $O(n)$ steps.
• Overall: we need $O(n \cdot \log(n)) + O(n) + O(n) + O(n) = O(n \cdot \log(n))$ steps.
• Comment: if the measurement results $\tilde{x}_i$ are already sorted, then we only need linear time to compute $\overline{U}$.

8. Justification of the Algorithm

• Known: $\overline{U} = \max U$ is attained at a vector $x = (x_1, \ldots, x_n)$ in which each value $x_i$ is equal either to $\underline{x}_i$ or to $\overline{x}_i$.
• New result: this maximum is attained at one of the vectors $x^{(k)}$, in which all the lower bounds $\underline{x}_i$ precede all the upper bounds $\overline{x}_i$.
• How we prove it: by contradiction.
• Assume: the maximum is attained at a vector $x$ in which one of the lower bounds follows one of the upper bounds.
• Notation: let $i$ be the largest index such that $x_i$ is an upper bound followed (somewhere later) by a lower bound.
• Conclusion: in this vector $x$, we have $x_i = \overline{x}_i$ and $x_{i+1} = \underline{x}_{i+1}$ (if $x_{i+1}$ were an upper bound, then $i+1$ would be a larger index with the same property).
• Following proof: since the maximum is attained at $x$, each of the following replacements leads to $\Delta U \le 0$:
  – replacing $x_i$ with $\underline{x}_i$;
  – replacing $x_{i+1}$ with $\overline{x}_{i+1}$; and
  – replacing both.
  We trace these changes $\Delta U$.
• We then conclude that one of the narrowed intervals is a proper subset of another, which contradicts our assumption.
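The new result can be spot-checked numerically: on instances that satisfy the no-nesting condition, the maximum of $U$ over all $2^n$ endpoint vectors should coincide with the maximum over the $n+1$ vectors $x^{(k)}$. The sketch below performs such a check; it illustrates the claim and is not a proof. The instance generator (midpoints uniform in $[0, 10]$, half-widths uniform in $[0, 2]$), the parameters $n = 5$, $k_0 = 2$, and all function names are arbitrary, hypothetical choices.

```python
import itertools
import math
import random


def U_of(x, k0):
    """U = E + k0*sqrt(V) for a point vector x."""
    n = len(x)
    E = sum(x) / n
    M = sum(v * v for v in x) / n
    return E + k0 * math.sqrt(max(M - E * E, 0.0))


def condition_holds(intervals, k0):
    """No narrowed interval lies in the interior of another one."""
    f = (1.0 + 1.0 / (k0 * k0)) / len(intervals)
    nar = [((lo + hi) / 2 - f * (hi - lo) / 2, (lo + hi) / 2 + f * (hi - lo) / 2)
           for lo, hi in intervals]
    return not any(i != j and b[0] < a[0] and a[1] < b[1]
                   for i, a in enumerate(nar) for j, b in enumerate(nar))


def max_over_all_endpoints(intervals, k0):
    """Max of U over all 2**n endpoint vectors."""
    return max(U_of(c, k0) for c in itertools.product(*intervals))


def max_over_xk(intervals, k0):
    """Max of U over the n+1 vectors x^(k): lower bounds first, then upper bounds."""
    s = sorted(intervals, key=lambda iv: iv[0] + iv[1])  # order by midpoint
    return max(U_of([lo for lo, _ in s[:k]] + [hi for _, hi in s[k:]], k0)
               for k in range(len(s) + 1))


def check_claim(trials=200, n=5, k0=2.0, seed=0):
    """Return how many random instances satisfying the condition were checked."""
    rng = random.Random(seed)
    tested = 0
    for _ in range(trials):
        intervals = []
        for _ in range(n):
            mid, half = rng.uniform(0, 10), rng.uniform(0, 2)
            intervals.append((mid - half, mid + half))
        if not condition_holds(intervals, k0):
            continue  # the result is only claimed under the no-nesting condition
        brute = max_over_all_endpoints(intervals, k0)
        fast = max_over_xk(intervals, k0)
        if not math.isclose(brute, fast, rel_tol=1e-9, abs_tol=1e-9):
            raise AssertionError(f"mismatch on {intervals}: {brute} vs {fast}")
        tested += 1
    return tested
```

Instances that violate the no-nesting condition are skipped rather than tested, because the slide's result makes no claim about them.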
