Identification of local multivariate outliers Anne Ruiz-Gazen and Christine Thomas-Agnan Gremaq, TSE and IMT Toulouse, France (in collab. with Peter Filzmoser) SSIAB - Avignon - 11/05/12 A. Ruiz-Gazen & C. Thomas-Agnan (TSE) Local multivariate outliers SSIAB - Avignon - 11/05/12 1 / 24
Introduction

In robust statistics, an observation is considered outlying if it differs from the main bulk of the data set.

$$F_\varepsilon = (1 - \varepsilon) F + \varepsilon G$$

In the case of continuous attributes, the main bulk of the data set is assumed to follow an elliptical distribution $F$ (e.g. Gaussian), and the outlying observations are assumed to follow a distribution $G$ (e.g. a point mass).

Objective: identify gross errors and atypical observations, taking into account both the multivariate and the spatial nature of the data.
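As an illustration of the contamination model above, here is a minimal Python sketch that draws a sample from $F_\varepsilon$. The specific choices are assumptions made for the example only and do not come from the talk: $F$ is a univariate standard Gaussian, $G$ is a point mass at 10, $\varepsilon = 0.05$, and the function name `sample_contaminated` is hypothetical.

```python
import random


def sample_contaminated(n, eps, point_mass=10.0, seed=0):
    """Draw n values from (1 - eps) * F + eps * G.

    Assumptions for this sketch: F is N(0, 1) and G is a point mass
    located at `point_mass`. Returns the values and a list of booleans
    marking which draws came from G.
    """
    rng = random.Random(seed)
    values, from_g = [], []
    for _ in range(n):
        if rng.random() < eps:
            # With probability eps, the observation comes from G.
            values.append(point_mass)
            from_g.append(True)
        else:
            # Otherwise it comes from the main bulk F.
            values.append(rng.gauss(0.0, 1.0))
            from_g.append(False)
    return values, from_g


values, is_outlier = sample_contaminated(n=1000, eps=0.05)
n_out = sum(is_outlier)
print(f"{n_out} of {len(values)} observations were drawn from G")
```

On average a fraction $\varepsilon$ of the sample comes from $G$; the exact count varies with the seed.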
Introduction

[Figure: map of the Kola project area covering parts of Norway, Finland and Russia, bordered by the Barents Sea and the White Sea and crossed by the Arctic Circle; scale bar 0 to 200 km. Legend: mine, in production; mine, closed down; important mineral occurrence, not developed; smelter, production of mineral concentrate; city, town, settlement; project boundary.]

The Kola project: concentration measurements for more than 50 chemical elements in four layers and 617 observations.

Data available in the R package mvoutlier by M. Gschwandtner and P. Filzmoser.
1 Detection of outliers in a non spatial context
    Detection of univariate outliers
    Detection of multivariate outliers
2 Spatial outliers
    Global and local outliers
    Identification of univariate spatial outliers
3 Identification of multivariate spatial outliers
    Variocloud of pairwise Mahalanobis distances
    Toy example
    Quantile geographical-variate plot
Detection of outliers in a non spatial context: Detection of univariate outliers

Let us consider an $n \times p$ data set $x$, with $n$ observations $x_i$ and $p$ variables.

In one dimension ($p = 1$), the detection of outliers is often based on

$$\frac{|x_i - \bar{x}|}{\sigma_x}$$

(Grubbs, 1969).

Masking effect: outliers may distort the empirical mean and standard deviation to such an extent that the outliers themselves are not detected.

Robust version: $\bar{x}$ and $\sigma_x$ are replaced by robust estimators such as the median and the median absolute deviation (MAD).
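The masking effect and its robust remedy can be seen on a small made-up data set. The sketch below is an illustration only: the data, the cutoff of 3 and the function names are assumptions, not taken from the talk. The MAD is multiplied by 1.4826, the usual consistency factor that makes it comparable to the standard deviation under a Gaussian model.

```python
import statistics


def classical_scores(x):
    """|x_i - mean| / sd, using the sample mean and standard deviation."""
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)
    return [abs(v - mean) / sd for v in x]


def robust_scores(x):
    """|x_i - median| / MAD, with the MAD scaled by 1.4826."""
    med = statistics.median(x)
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    return [abs(v - med) / mad for v in x]


# Hypothetical data: eight values near 10 and two gross errors.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0, 52.0]
cutoff = 3.0  # assumed threshold for this illustration

classical_flags = [i for i, s in enumerate(classical_scores(data)) if s > cutoff]
robust_flags = [i for i, s in enumerate(robust_scores(data)) if s > cutoff]

print("classical:", classical_flags)  # the two gross errors are masked
print("robust:   ", robust_flags)     # both gross errors are flagged
```

Here the two gross errors pull the mean up to 18.2 and inflate the standard deviation to about 17.3, so their classical scores stay below 2. The median (10.05) and the MAD are hardly affected, and the robust scores of the same two points exceed 100.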