Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions
Michael Geilke, Andreas Karwath, and Stefan Kramer
Johannes Gutenberg University Mainz, Germany
September 22, 2016
[Figure: smart home]
A smart home with 1000 sensors taking 5 measurements per second over 5 years: more than 2 billion measurements, about 2 GB of data.
An energy supplier with 1 million households: about 2 PB of data, with patterns that must be updated constantly.
EDDO
The smart-home data stream is fed into EDDO, which estimates a density $f$. Inference on $f$ yields $F$, which serves as knowledge for answering queries (Query1, Query2, Query3). Example query: "Return the probability distribution for sensors in the living room during weekdays."
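Answering such a query amounts to conditioning a joint distribution on the evidence: $P(\text{reading} \mid \text{room}, \text{day}) = P(\text{room}, \text{day}, \text{reading}) / P(\text{room}, \text{day})$. The following is a minimal sketch, assuming the estimated density is available as a small discrete joint table; the variable names, the function `query`, and all probabilities are hypothetical and only illustrate the computation.

```python
# Hypothetical toy joint distribution P(room, day, reading); the numbers are
# made up for illustration and do not come from the slides.
JOINT = {
    ("living room", "weekday", "low"): 0.20,
    ("living room", "weekday", "high"): 0.10,
    ("living room", "weekend", "low"): 0.05,
    ("living room", "weekend", "high"): 0.15,
    ("kitchen", "weekday", "low"): 0.30,
    ("kitchen", "weekday", "high"): 0.20,
}


def query(joint, room, day):
    """Return P(reading | room, day) by conditioning the joint table."""
    selected = {reading: p for (r, d, reading), p in joint.items()
                if r == room and d == day}
    total = sum(selected.values())  # P(room, day), the normalizing constant
    if total == 0:
        raise ValueError("the evidence has probability zero")
    return {reading: p / total for reading, p in selected.items()}


print(query(JOINT, "living room", "weekday"))
```

With a continuous density the sum over matching entries becomes an integral, but the structure of the query (restrict, then normalize) stays the same.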
Weaknesses of EDDO
EDDO estimates the joint density $f(X_1, \dots, X_n)$ by factorizing it with the chain rule,
$$f(X_1, \dots, X_n) = f(X_1) \cdot \prod_{i=2}^{n} f(X_i \mid X_1, \dots, X_{i-1}),$$
and the estimated density $f$ is then used for inference ($F$). The weakness: this works only for discrete random variables.
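The slides do not show how EDDO estimates the individual conditionals. As a minimal sketch, assuming each factor of the chain rule is a relative frequency maintained online (the class `PrefixCounts` is hypothetical), the factorization can be evaluated from counts of prefixes $(x_1, \dots, x_i)$. It also hints at why such counting is tied to discrete variables: a continuous value almost never repeats, so its counts carry no information.

```python
from collections import Counter


class PrefixCounts:
    """Hypothetical online counter for discrete instances (x_1, ..., x_n)."""

    def __init__(self):
        self.counts = Counter()  # count of every prefix (x_1, ..., x_i) seen so far
        self.n = 0               # number of instances seen so far

    def update(self, x):
        """Process one instance from the stream."""
        self.n += 1
        for i in range(1, len(x) + 1):
            self.counts[tuple(x[:i])] += 1

    def chain_rule(self, x):
        """f(x_1) * prod_{i=2}^{n} f(x_i | x_1, ..., x_{i-1}) from relative frequencies."""
        prob, prev = 1.0, self.n
        for i in range(1, len(x) + 1):
            cur = self.counts[tuple(x[:i])]
            if cur == 0:
                return 0.0  # unseen prefix: the estimate is 0
            prob *= cur / prev  # count(prefix_i) / count(prefix_{i-1})
            prev = cur
        return prob


stream = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
est = PrefixCounts()
for inst in stream:
    est.update(inst)
print(est.chain_rule(("a", "x")))
```

Because the ratios telescope, the product equals count(x) / n, i.e., the joint relative frequency, so the sketch is a consistent instance of the chain rule.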
Goals
A density estimator that estimates joint densities from data streams, is able to deal with heterogeneous data, and works for higher-dimensional data. For density estimation, 100 variables already count as high-dimensional.
Main Idea
An instance $x \in X_1 \times X_2 \times \dots \times X_n$ (or $x \in \mathbb{R}^n$) is mapped by $g_L$ to a vector $v \in \mathbb{R}^m$, and the density $f$ is estimated on $\mathbb{R}^m$.
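The slides do not specify which distance $g_L$ uses when attributes are heterogeneous. The sketch below assumes a hypothetical mixed distance (squared differences on numeric attributes, a 0/1 mismatch on categorical ones) purely to show the shape of the mapping: each landmark contributes one coordinate of $v$, so $m$ equals the number of landmarks. The functions `mixed_distance` and `g_L` and the categorical "window" attribute are illustrative assumptions.

```python
import math


def mixed_distance(x, y, numeric):
    """Hypothetical distance for heterogeneous instances (not from the slides):
    squared differences on numeric attributes, 0/1 mismatch on categorical ones."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric:
            total += (a - b) ** 2
        else:
            total += 0.0 if a == b else 1.0
    return math.sqrt(total)


def g_L(x, landmarks, numeric):
    """Map x from X_1 x ... x X_n to v in R^m, one distance per landmark."""
    return tuple(mixed_distance(x, l, numeric) for l in landmarks)


# Attributes: temperature, humidity (numeric), window state (hypothetical, categorical).
landmarks = [(20.0, 50.0, "open"), (10.0, 80.0, "closed")]
x = (20.0, 50.0, "closed")
v = g_L(x, landmarks, numeric={0, 1})
print(v)
```

In practice the numeric attributes would likely need scaling so that no single attribute dominates the distance; the sketch omits this for brevity.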
Online Density Estimation using Representatives (RED)
The figure shows landmarks and instances. An instance is a point in $\mathbb{R}^n$, e.g., temperature = 20, humidity = 50, …
Each instance $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$ is mapped to a vector $v = (v_1, v_2, v_3, v_4) \in \mathbb{R}^m$ (in the example, $m = 4$).
The set $I = \{x \in \mathbb{R}^n \mid g_L(x) = v\}$ contains all instances that $g_L$ maps to $v = (v_1, v_2, v_3, v_4)$, and $f(v)$ aggregates over all $x \in I$.
A landmark is a representative instance in $\mathbb{R}^n$, e.g., temperature = 10, humidity = 80, …
The mapping $g_L$ takes an instance $x \in \mathbb{R}^n$ to $v = g_L(x) = (v_1, v_2, v_3, v_4)$, with one component per landmark.
Distances to the landmarks are measured with the Mahalanobis distance
$$\sqrt{(x - v)^{T} \Sigma^{-1} (x - v)},$$
where $\Sigma$ is a covariance matrix.
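As a sketch of how the Mahalanobis distance could produce the components of $v$, the code below computes $\sqrt{(x - v)^T \Sigma^{-1} (x - v)}$ for each landmark. It assumes $\Sigma^{-1}$ is already given (for instance, estimated from the stream, which the slides do not detail); the helper names are hypothetical. With $\Sigma$ equal to the identity, the distance reduces to the Euclidean norm, the other setting used in the evaluation.

```python
import math


def mahalanobis(x, v, precision):
    """sqrt((x - v)^T Sigma^{-1} (x - v)); `precision` is Sigma^{-1} as nested lists."""
    diff = [a - b for a, b in zip(x, v)]
    n = len(diff)
    quad = sum(diff[i] * precision[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


def g_L(x, landmarks, precision):
    """Distances from x to every landmark, i.e. the vector (v_1, ..., v_m)."""
    return [mahalanobis(x, l, precision) for l in landmarks]


# Assumed covariance Sigma = diag(4, 1), so Sigma^{-1} = diag(0.25, 1).
precision = [[0.25, 0.0], [0.0, 1.0]]
print(g_L((2.0, 0.0), [(0.0, 0.0), (2.0, 3.0)], precision))  # prints [1.0, 3.0]
```

The difference of 2 along the first axis counts as only 1 unit because that axis has variance 4, which is the effect the Mahalanobis distance is meant to capture.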
The density of $v = (v_1, v_2, v_3, v_4)$ is then estimated with the chain rule:
$$f(V_1, \dots, V_m) = f(V_1) \cdot \prod_{i=2}^{m} f(V_i \mid V_1, \dots, V_{i-1}).$$
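The slides do not say which model RED uses for the conditionals $f(V_i \mid V_1, \dots, V_{i-1})$. As a hedged numerical check of the factorization itself for continuous variables, this sketch assumes $m = 2$ and a bivariate Gaussian $(V_1, V_2)$, an assumption made only for illustration, and compares the joint density with $f(V_1) \cdot f(V_2 \mid V_1)$.

```python
import math


def normal_pdf(z, mu, sigma):
    """Univariate normal density N(z; mu, sigma^2)."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def joint_pdf(v1, v2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density f(v1, v2) with correlation rho."""
    z1, z2 = (v1 - mu1) / s1, (v2 - mu2) / s2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))


def chain_rule_pdf(v1, v2, mu1, mu2, s1, s2, rho):
    """f(v1) * f(v2 | v1); for a Gaussian, V2 given V1 = v1 is again Gaussian."""
    cond_mu = mu2 + rho * s2 / s1 * (v1 - mu1)
    cond_sigma = s2 * math.sqrt(1.0 - rho * rho)
    return normal_pdf(v1, mu1, s1) * normal_pdf(v2, cond_mu, cond_sigma)


params = dict(mu1=1.0, mu2=-0.5, s1=2.0, s2=0.7, rho=0.6)  # arbitrary illustrative values
for v1, v2 in [(0.0, 0.0), (1.5, -1.0), (-2.0, 0.3)]:
    print(joint_pdf(v1, v2, **params), chain_rule_pdf(v1, v2, **params))
```

Both columns agree up to rounding, since the chain rule is an identity for any joint density; what differs between estimators is only how each conditional is modeled.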
The mapping $g_L$ is not injective: $g_L(x) = g_L(y)$ is possible although $x \neq y$. An instance $x \in X_1 \times X_2 \times \dots \times X_n$ (or $x \in \mathbb{R}^n$) is mapped to $v \in \mathbb{R}^m$, so the density $f(v)$ must account for all instances in $I = \{x \mid g_L(x) = v\}$.
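A small example of non-injectivity, assuming Euclidean distances (the norm used in the evaluation) and landmarks chosen only for illustration: with two landmarks in $\mathbb{R}^2$, a point and its mirror image across the line through both landmarks receive the same $v$. Adding a third landmark off that line, which matches the $d + 1$ condition on the landmark slide, separates them. The helper `g_L` is hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def g_L(x, landmarks):
    """Euclidean distances from x to each landmark, i.e. v = g_L(x)."""
    return tuple(math.dist(x, l) for l in landmarks)


two = [(0.0, 0.0), (4.0, 0.0)]   # 2 landmarks in R^2: fewer than d + 1 = 3
three = two + [(0.0, 4.0)]       # a third landmark off the line through the first two
x, y = (1.0, 2.0), (1.0, -2.0)   # y mirrors x across that line

print(g_L(x, two) == g_L(y, two))      # True: x != y, yet the same v
print(g_L(x, three) == g_L(y, three))  # False: the third landmark separates them
```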
Choice of Landmarks
Main idea and theoretical foundation: the landmarks are orthogonal to each other. If the number of landmarks is $L = d + 1$ ($d$: dimensionality of the data), the estimator is consistent, and instances can be translated back by solving a system of linear equations.
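The slides do not spell out the linear system. Assuming Euclidean distances and $d + 1$ affinely independent landmarks $l_0, \dots, l_d$, subtracting $\lVert x - l_0 \rVert^2 = r_0^2$ from $\lVert x - l_i \rVert^2 = r_i^2$ removes the quadratic term and leaves $2 (l_i - l_0)^T x = r_0^2 - r_i^2 + \lVert l_i \rVert^2 - \lVert l_0 \rVert^2$ for $i = 1, \dots, d$. A minimal sketch of this back translation (function names are hypothetical):

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("landmarks are affinely dependent; x is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def back_translate(landmarks, dists):
    """Recover x in R^d from its Euclidean distances to d + 1 landmarks."""
    l0, r0 = landmarks[0], dists[0]
    d = len(l0)
    A, b = [], []
    for li, ri in zip(landmarks[1:], dists[1:]):
        # ||x - l_i||^2 - ||x - l_0||^2 = r_i^2 - r_0^2 is linear in x:
        # 2 (l_i - l_0) . x = r_0^2 - r_i^2 + ||l_i||^2 - ||l_0||^2
        A.append([2.0 * (li[k] - l0[k]) for k in range(d)])
        b.append(r0 ** 2 - ri ** 2 + sum(c * c for c in li) - sum(c * c for c in l0))
    return solve(A, b)


landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # d + 1 = 3 landmarks in R^2
dists = [0.5 ** 0.5, 0.5 ** 0.5, 0.5 ** 0.5]     # distances of the point (0.5, 0.5)
print(back_translate(landmarks, dists))           # prints [0.5, 0.5]
```

Here the difference vectors $l_1 - l_0$ and $l_2 - l_0$ are orthogonal, which keeps the system well conditioned; nearly collinear landmarks would make the recovery numerically unstable.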
Evaluation: Parameter Setting
Datasets: synthetic (Gaussian mixtures); real-world: Covertype, Electricity, Letter, Shuttle.
Parameters: $\theta_{C \to R} = 100$; Euclidean norm; $L \in \{2, 3, 5, 10, 20\}$; $M \in \{0.1, 0.5, 1.0, 2.0, 5.0, 10.0\}$.
[Figure: Evaluation: $L$]
[Figure: Evaluation: Mahalanobis (1 Gaussian)]
[Figure: Evaluation: Mahalanobis (10 Gaussians)]
Evaluation: Parameter Setting
$L$ depends on the dimensionality of the data. A small $M$ partitions the space better, but at some point too few instances fall into each region.
Evaluation: Performance
Datasets: synthetic (Gaussian mixtures); real-world: Covertype, Electricity, Letter, Shuttle.
Baseline: oKDE, an online kernel density estimator for multivariate densities of continuous variables, by Kristan et al. (2011).
[Figure: results on electricity (9 attributes), shuttle (11 attributes), letter (17 attributes), covertype (54 attributes)]
Conclusions
Online density estimation in higher dimensions for heterogeneous data streams, with a theoretical foundation and performance comparable to the state of the art.
Future work: new strategies for landmark selection, outlier detection, and detection of emerging trends.
Thank you for your attention.
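Online Density Estimation using Representatives (RED)
$g_L(x) = g_L(y)$ is possible although $x \neq y$ when mapping $x \in X_1 \times X_2 \times \dots \times X_n$ (or $\mathbb{R}^n$) to $v \in \mathbb{R}^m$. The density of $x$ then involves integrating a product of correction terms, $\prod_{j=i}^{p} \mathrm{corr}_j(x_j \mid v_1, \dots, v_p)$, over $x_{i+1}, x_{i+2}, \dots, x_n$ with lower integration limits $-\infty$.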