OED for KRR, and the MVCE
Gaussian Processes for Active Sensor Management
Alexander N. Dolia, University of Southampton

This poster is based on:
• A.N. Dolia, C.J. Harris, J. Shawe-Taylor, D.M. Titterington, "Kernel Ellipsoidal Trimming", submitted to the Special Issue of the journal Computational Statistics and Data Analysis on Machine Learning and Robust Data Mining (under review).
• A.N. Dolia, T. De Bie, C.J. Harris, J. Shawe-Taylor, D.M. Titterington, "Optimal experimental design for kernel ridge regression, and the minimum volume covering ellipsoid", Workshop on Optimal Experimental Design, Southampton, 22-26 September 2006.

Joint work with:
Dr. Tijl De Bie, Katholieke Universiteit Leuven
Prof. John Shawe-Taylor, University of Southampton
Prof. Chris Harris, University of Southampton
Prof. Mike Titterington, University of Glasgow
Problem Statement

Aim: given a set of possible sensor locations, the cost of measurements, and an upper bound on the number of repetitions at each sensor location, estimate the sensor locations and numbers of repetitions that give a good prediction f(x).

• Sensor network: N sensors measure signals at positions x_i
• Sensors measure the function y_i = f(x_i) = x_i'w + n_i
• The weight vector w gives information about the 'system'
• Position the sensors optimally at X_D
• Estimate w based on X_D
Optimal experiment design?

Optimal experiment design (OED) idea:
• Given a set of n data points X = {x_i}
• Choose a multiset X_D = {x_{D,i}} ⊆ X with N data points, containing x_i N_i times
• Measure at x_{D,i} → y_D = {y_{D,i}} with y_{D,i} = x_{D,i}'w + n_i
• Estimate w based on {X_D, y_D} → ŵ  (a toy simulation of this loop is sketched below)
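The following is a minimal toy sketch, not part of the poster, of the loop just described: given repetition counts N_i for the candidate locations, it simulates the measurements y_{D,i} = x_{D,i}'w + n_i and returns a (ridge-regularized) estimate ŵ. The candidate points, the true w, the noise level sigma and the regularizer gamma are made-up illustrative values.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy problem: candidate sensor positions X, true weights w, noise level sigma.
    X = rng.normal(size=(50, 3))            # n candidate positions x_i (rows)
    w_true = np.array([1.0, -2.0, 0.5])
    sigma = 0.1

    def run_design(counts, gamma=1e-3):
        """Given repetition counts N_i per candidate location, simulate the
        measurements y_{D,i} = x_{D,i}' w + n_i and return an estimate of w."""
        X_D = np.repeat(X, counts, axis=0)                       # the design multiset
        y_D = X_D @ w_true + sigma * rng.normal(size=len(X_D))   # noisy measurements
        d = X_D.shape[1]
        return np.linalg.solve(X_D.T @ X_D + gamma * np.eye(d), X_D.T @ y_D)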
Optimal experiment design for RR

• The result is thus a non-convex optimization problem:
      min_α   −logdet( Σ_i α_i x_i x_i' + γI + (1/4) γ² (Σ_i α_i x_i x_i')^{-1} )
      s.t.    α'e = 1,  α ≥ 0
• Minimize a tight upper bound instead:
      α*_γ = argmin_α   −logdet( Σ_i α_i x_i x_i' + γI )
      s.t.    α'e = 1,  α ≥ 0
• This is again a convex optimization problem (a simple iterative sketch is given below)
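As one illustration of how the convex surrogate might be solved, here is a small exponentiated-gradient (mirror-descent) sketch on the probability simplex. The update rule, step size eta and iteration count are my own illustrative choices, not the optimizer used in the papers above.

    import numpy as np

    def regularized_d_oed_weights(X, gamma, eta=0.1, n_iter=500):
        """Sketch of  min_alpha  -logdet( sum_i alpha_i x_i x_i' + gamma I )
        over the simplex (alpha'e = 1, alpha >= 0), by exponentiated gradient."""
        n, d = X.shape
        alpha = np.full(n, 1.0 / n)                      # start from the uniform design
        for _ in range(n_iter):
            M = X.T @ (alpha[:, None] * X) + gamma * np.eye(d)
            Minv = np.linalg.inv(M)
            g = np.einsum('ij,jk,ik->i', X, Minv, X)     # x_i' M^{-1} x_i  (minus the gradient)
            alpha = alpha * np.exp(eta * (g - g.max()))  # multiplicative (mirror-descent) step
            alpha = alpha / alpha.sum()                  # renormalize onto the simplex
        return alpha

In practice the optimal weights concentrate on a small subset of the candidate points, which is the sparsity mentioned in the conclusions.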
Regularized MVCE

• What about the dual of the regularized D-OED?
      min_{M,μ}   logdet(M) + μ + γ trace(M^{-1})
      s.t.        x_i' M^{-1} x_i ≤ μ
• The optimum is given by
      M*_γ = Σ_i α*_{γ,i} x_i x_i' + γI
  where α*_γ is the solution of the regularized D-OED problem
• Interpretation: trace(M^{-1}) = Σ_i 1/λ_i → fit an ellipsoid, but make sure none of the eigenvalues of M*_γ is too small...
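To make the interpretation concrete, two small helper functions (illustrative only, with assumed variable names) that form M*_γ from the design weights and evaluate trace(M^{-1}) through the eigenvalues:

    import numpy as np

    def mvce_shape_matrix(X, alpha_star, gamma):
        """M*_gamma = sum_i alpha*_{gamma,i} x_i x_i' + gamma I."""
        d = X.shape[1]
        return X.T @ (alpha_star[:, None] * X) + gamma * np.eye(d)

    def trace_inverse(M):
        """trace(M^{-1}) = sum_i 1 / lambda_i -- large whenever some eigenvalue is small."""
        return float((1.0 / np.linalg.eigvalsh(M)).sum())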
Kernel ridge regression (KRR)

• Kernel ridge regression (KRR):
      K_D = X_D X_D'
      β = (K_D + γI)^{-1} y
      ŵ_RR = X_D' β = Σ_i β_i x_{D,i}
      f(x) = x' ŵ_RR = Σ_i β_i x' x_{D,i} = Σ_i β_i k(x, x_{D,i})
  (Figure labels: least squares, ridge regression, kernel RR.)
• Everything is expressed in terms of K_D (i.e. in terms of inner products/kernels): the 'kernel trick'
• If we want to do OED for KRR, we need to write it entirely in terms of kernel evaluations/inner products: can we?
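A minimal KRR sketch of these formulas (my own illustrative code; the kernel function and variable names are assumptions, not the poster's notation):

    import numpy as np

    def fit_krr(X_D, y_D, gamma, kernel):
        """beta = (K_D + gamma I)^{-1} y_D, with K_D[i, j] = k(x_{D,i}, x_{D,j})."""
        K_D = kernel(X_D, X_D)
        return np.linalg.solve(K_D + gamma * np.eye(len(y_D)), y_D)

    def predict_krr(X_D, beta, x, kernel):
        """f(x) = sum_i beta_i k(x, x_{D,i})."""
        return float(kernel(x[None, :], X_D) @ beta)

    # With the plain linear kernel k(x, z) = x'z this reproduces ridge regression.
    linear_kernel = lambda A, B: A @ B.T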
Kernel MVCE

• Can the Mahalanobis distances x'(Σ_i α*_{γ,i} x_i x_i' + γI)^{-1} x be written in terms of inner products/kernel evaluations?
• Let AKA = VΛV' (eigenvalue decomposition); then (derivation not shown...):
      x'(Σ_i α*_{γ,i} x_i x_i' + γI)^{-1} x = (1/γ) ( x'x − x' X' A V Λ (Λ + γI)^{-1} V' A X x )
• Express in terms of k(x,x) = x'x and k = Xx; then
      x'(Σ_i α*_{γ,i} x_i x_i' + γI)^{-1} x = (1/γ) ( k(x,x) − k' A V Λ (Λ + γI)^{-1} V' A k )
  is completely expressed in terms of kernels
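A sketch of the kernelized Mahalanobis distance above. I assume, consistent with the kernel formulation on the summary slide, that A = diag(a) with α*_i = a_i², i.e. A = diag(√α*); treat this as an illustration, not the poster's exact construction.

    import numpy as np

    def kernel_mahalanobis(K, k_x, k_xx, alpha_star, gamma):
        """(1/gamma) * ( k(x,x) - k' A V Lambda (Lambda + gamma I)^{-1} V' A k ),
        where A K A = V Lambda V' (eigendecomposition).
        K    : n x n training kernel matrix
        k_x  : length-n vector of k(x_i, x) for the test point x
        k_xx : scalar k(x, x)."""
        A = np.diag(np.sqrt(alpha_star))
        lam, V = np.linalg.eigh(A @ K @ A)              # A K A = V Lambda V'
        Ak = A @ k_x
        middle = (V * (lam / (lam + gamma))) @ V.T      # V Lambda (Lambda + gamma I)^{-1} V'
        return (k_xx - Ak @ middle @ Ak) / gamma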
OED: summary

standard:
    D-OED:  min_α  −logdet( Σ_i α_i x_i x_i' )              s.t. α'1 = 1,  α ≥ 0
    MVCE:   min_{M,μ}  logdet(M) + μ                         s.t. x_i' M^{-1} x_i ≤ μ

regularized:
    D-OED:  min_α  −logdet( Σ_i α_i x_i x_i' + γI )          s.t. α'1 = 1,  α ≥ 0
    MVCE:   min_{M,μ}  logdet(M) + μ + γ trace(M^{-1})       s.t. x_i' M^{-1} x_i ≤ μ

kernel:
    min_a  −logdet( AKA + γI )                               s.t. a'a ≤ 1,  a ≥ 0
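For completeness, a projected-gradient sketch of the kernel problem in the last row. This is illustrative only: the gradient is my own derivation for S = AKA + γI with A = diag(a), and the step size and iteration count are arbitrary.

    import numpy as np

    def kernel_mvce_weights(K, gamma, eta=0.01, n_iter=500):
        """Projected-gradient sketch for  min_a  -logdet( A K A + gamma I ),
        A = diag(a), subject to a >= 0 and a'a <= 1."""
        n = K.shape[0]
        a = np.full(n, 1.0 / np.sqrt(n))                       # feasible starting point
        for _ in range(n_iter):
            A = np.diag(a)
            S = A @ K @ A + gamma * np.eye(n)
            grad = -2.0 * np.diag(np.linalg.solve(S, A @ K))   # d/da_i of -logdet(S)
            a = np.maximum(a - eta * grad, 0.0)                # project onto a >= 0 ...
            norm = np.linalg.norm(a)
            if norm > 1.0:
                a /= norm                                      # ... and onto a'a <= 1
        return a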
Experiment
Generalised D-optimal Experimental Design
Conclusions

• Two seemingly very different algorithms within one optimization framework
• A way to perform optimal experimental design in high-dimensional spaces, such as kernel-induced feature spaces
• A way to perform minimum volume covering ellipsoid estimation in high-dimensional spaces, to perform novelty detection
• Nice features: convex optimisation and a sparse solution