Gaussian Model Trees for Traffic Imputation
Sebastian Buschjäger, Thomas Liebig and Katharina Morik
TU Dortmund University, Artificial Intelligence Group
February 20, 2019
Motivation: Smart Cities

Idea Distribute small devices across the entire city to monitor specific locations

Design requirements
1. Sensing devices should be as small and as energy-efficient as possible to minimize costs
2. Sensing devices should be low-priced to minimize the initial investment costs
3. Data should not be processed globally, to minimize communication and maximize privacy
4. Prediction models should be small, but accurate enough to run on the sensing devices
5. The system should report possible sensor locations with respect to their accuracy
Traffic Imputation

Our focus here Count the number of vehicles at a given coordinate (latitude / longitude)

Formally An imputation problem, where we impute missing sensor values

Popular method Gaussian Processes

$p(y \mid \mathcal{D}, \vec{x}) \sim \mathcal{N}(f(\vec{x}), \cdot)$ with $f(\vec{x}) = \vec{K}(\vec{x}, \mathcal{D})\, K(\mathcal{D})^{-1}\, \vec{y}$

where
◮ $\vec{K}(\vec{x}, \mathcal{D}) = [k(\vec{x}, \vec{x}_1), \ldots, k(\vec{x}, \vec{x}_N)]^T$ is the kernel vector
◮ $\vec{y} = [y_1, \ldots, y_N]^T$ is the target vector
◮ $K(\mathcal{D}) = [k(\vec{x}_i, \vec{x}_j)]_{i,j} + \sigma_n I$ is the kernel matrix including noise

Challenges
◮ GPs do not scale well, due to the matrix inversion (runtime $O(N^3)$)
◮ GPs do not have a traffic-flow model, e.g. one built from map data
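The posterior mean above can be sketched in a few lines of numpy. The `rbf` kernel, the noise level, and the toy sensor coordinates are illustrative assumptions, not the settings used in the talk:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Stationary RBF kernel: k(a, b) = exp(-||a - b||^2 / (2 l^2))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_mean(X, y, X_star, sigma_n=0.1):
    """Posterior mean f(x*) = K(x*, D) K(D)^-1 y, with noise sigma_n on the diagonal."""
    K = rbf(X, X) + sigma_n * np.eye(len(X))  # [k(x_i, x_j)]_{i,j} + sigma_n I
    K_star = rbf(X_star, X)                   # [k(x*, x_1), ..., k(x*, x_N)]
    return K_star @ np.linalg.solve(K, y)     # solve() avoids forming K^-1 explicitly

# Toy setup: vehicle counts at four sensor coordinates, impute a fifth location
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([10.0, 12.0, 11.0, 13.0])
pred = gp_mean(X, y, np.array([[0.5, 0.5]]))
```

Note the `O(N^3)` cost hides in the linear solve against the full kernel matrix, which is exactly the scaling problem discussed next.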
State of the art GPs

Scalable GPs A well-studied problem, with solutions utilizing subsets of data points, sparse kernels, sparse approximations, implicit and explicit block structures, ...

Important for us Each local sensing device should execute one small expert model

Deisenroth 2015 Distributed Gaussian Processes (DGP)

Idea Factorize the global likelihood into a product of $m$ individual likelihoods

$p(y \mid \mathcal{D}) \approx \prod_{k=1}^{m} \beta_k\, p_k(y \mid \mathcal{D}_k)$

where $\beta_k$ is the expert weight and $p_k$ is a small GP trained on samples $\mathcal{D}_k \subset \mathcal{D}$

Nice
+ the $p_k(y \mid \mathcal{D}_k)$ are independent from each other
+ $\mathcal{D}_k$ can potentially be small

Problematic
− all experts need to be evaluated to compute $p(y \mid \mathcal{D})$
− $\mathcal{D}_k$ is randomly sampled
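How the experts' Gaussian predictions get fused can be sketched with a precision-weighted combination in the spirit of (generalized) product-of-experts models; this is a simplified stand-in for the exact DGP combination rule, and the weights and toy numbers are illustrative:

```python
import numpy as np

def combine_experts(means, variances, betas):
    """Fuse m Gaussian expert predictions N(mu_k, s_k^2) into one Gaussian.
    Precision-weighted combination: the fused precision is the beta-weighted
    sum of the expert precisions. Note that every expert must be evaluated,
    which is exactly the 'Problematic' point above."""
    means, variances, betas = map(np.asarray, (means, variances, betas))
    prec = np.sum(betas / variances)                 # fused precision
    mean = np.sum(betas * means / variances) / prec  # precision-weighted mean
    return mean, 1.0 / prec

# Two equally trusted experts with equal uncertainty
mu, var = combine_experts([10.0, 12.0], [1.0, 1.0], [0.5, 0.5])
```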
Gaussian Model Trees: Key questions

So far DGPs offer small expert models, which only require communication of local predictions

But 1 Is there a better way to sample $\mathcal{D}_k$?

But 2 Can we get away without any communication at all?
GP induction as a loss minimization problem

$\arg\min_{f \in \mathcal{H}} \frac{1}{n} \sum_{(\vec{x}, y) \in \mathcal{D}} \big(y - f(\vec{x})\big)^2 + \frac{1}{2\sigma_n^2} \|f\|_{\mathcal{H}}^2$

◮ the first term is the MSE of the GP model
◮ the second term is the regularization: the norm of $f$ in the RKHS $\mathcal{H}$, with $\sigma_n$ from the noise assumption of the GP

Goal Decompose the optimization problem into two independent problems.
◮ Let $A \subseteq \mathcal{D}$ denote a set of $c$ inducing points. Let $B = \mathcal{D} \setminus A$
◮ Assume $k(\vec{x}_i, \vec{x}_j) \approx 0$ for $\vec{x}_i \in A$ and $\vec{x}_j \in B$

Then we can split the optimization problem into two problems

$\arg\min_{f_A \in \mathcal{H},\, f_B \in \mathcal{H}} \left[ \frac{1}{n} \sum_{(\vec{x}, y) \in A} \big(y - f_A(\vec{x})\big)^2 + \frac{1}{2\sigma_n^2} \|f_A\|_{\mathcal{H}}^2 \right] + \left[ \frac{1}{n} \sum_{(\vec{x}, y) \in B} \big(y - f_B(\vec{x})\big)^2 + \frac{1}{2\sigma_n^2} \|f_B\|_{\mathcal{H}}^2 \right]$

with the familiar GP solutions $f_A(\vec{x}) = \vec{K}(\vec{x}, A)\, K(A)^{-1}\, \vec{y}_A$ and $f_B(\vec{x}) = \vec{K}(\vec{x}, B)\, K(B)^{-1}\, \vec{y}_B$
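The near-zero cross-kernel assumption can be checked numerically: when the points in $A$ and $B$ are far apart under a stationary kernel, the kernel matrix is (numerically) block diagonal, so the GP trained on $A$ alone reproduces the full GP's prediction in $A$'s region. The clusters and hyperparameters below are illustrative:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Stationary RBF kernel on 1-d inputs stacked as column vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ls ** 2))

def gp_mean(X, y, Xs, sigma_n=0.1):
    """Posterior mean K(x*, D) K(D)^-1 y as in the earlier slide."""
    K = rbf(X, X) + sigma_n * np.eye(len(X))
    return rbf(Xs, X) @ np.linalg.solve(K, y)

# Two clusters far apart: k(x_i, x_j) ~ 0 across clusters, so the joint
# problem decomposes into two independent local problems.
A_X, A_y = np.array([[0.0], [0.5]]), np.array([1.0, 2.0])
B_X, B_y = np.array([[100.0], [100.5]]), np.array([5.0, 6.0])
D_X, D_y = np.vstack([A_X, B_X]), np.concatenate([A_y, B_y])

xs = np.array([[0.25]])          # query inside A's region
full = gp_mean(D_X, D_y, xs)     # full GP on all of D
local = gp_mean(A_X, A_y, xs)    # local GP f_A trained on A only
```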
Subset selection (1)

Question How to find the sets $A$ and $B$?

[Figure: three points $\vec{x}_i$, $\vec{x}_j$, $\vec{x}_k$ illustrating pairwise kernel similarities]

Observation If the kernel is stationary, then $k(\vec{x}_i, \vec{x}_j) \approx 0 \Rightarrow k(\vec{x}_i, \vec{x}_k) \approx 0$ for $k(\vec{x}_j, \vec{x}_k) \approx 1$.

Thus Points $\vec{x}_j$ and $\vec{x}_k$ that are similar to each other will have a similar dissimilarity with $\vec{x}_i$
Subset selection (2)

Thus It is enough to store a reference point for each set $A$ and $B$.

Conclusion We need to find reference points which are maximally dissimilar to each other

Idea Formulate another maximization problem

$\frac{1}{2} \log \det \begin{pmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{pmatrix} = \frac{1}{2} \log \big(k_{11} \cdot k_{22} - k_{12} \cdot k_{21}\big) \to \max \quad \text{if } k_{12} = k_{21} \approx 0$

More formally

$\arg\max_{A \subset \mathcal{D},\, |A| = c} \frac{1}{2} \log \det \big(I + a K(A)\big)$

Still This is a very difficult problem, since we would need to check all possible subsets $A \subset \mathcal{D}$

Lawrence 2003 $\frac{1}{2} \log \det \big(I + a K(A)\big)$ is submodular
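Submodularity is what makes the problem tractable: a greedy scheme that repeatedly adds the point with the largest marginal gain carries the usual $(1 - 1/e)$ approximation guarantee for monotone submodular objectives under a cardinality constraint. A minimal sketch, with an illustrative RBF kernel and toy points:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Stationary RBF kernel."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ls ** 2))

def greedy_subset(X, c, a=1.0):
    """Greedily maximize 1/2 log det(I + a K(A)) over subsets |A| = c.
    Each round adds the candidate giving the largest objective value;
    submodularity makes this greedy scheme a (1 - 1/e)-approximation."""
    chosen = []
    for _ in range(c):
        best_i, best_val = None, -np.inf
        for i in range(len(X)):
            if i in chosen:
                continue
            idx = chosen + [i]
            K = rbf(X[idx], X[idx])
            _, logdet = np.linalg.slogdet(np.eye(len(idx)) + a * K)
            if 0.5 * logdet > best_val:
                best_i, best_val = i, 0.5 * logdet
        chosen.append(best_i)
    return chosen

# Three nearby points and one far-away point: the far point is maximally
# dissimilar, so greedy should pick it as a reference point.
X = np.array([[0.0], [0.1], [0.2], [10.0]])
sel = greedy_subset(X, c=2)
```

The quadratic inner loop is fine for small $c$; incremental Cholesky updates would make the marginal gains cheap for larger candidate sets.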