Statistical and Computational Trade-Offs in Kernel K-Means
Daniele Calandriello, Lorenzo Rosasco
LCSL - IIT/MIT and Università di Genova
NeurIPS, December 2018

K-Means

Given n points, partition them into k clusters.

$$\widehat{C} = \operatorname*{arg\,min}_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|x_i - c_j\|^2$$

Problem: only linear separation
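
To make the objective concrete, here is a minimal sketch that evaluates the empirical k-means cost for a fixed set of centroids. The function name `kmeans_cost` and the toy points are illustrative assumptions, not part of the slides.

```python
def kmeans_cost(points, centroids):
    """Mean over all points of the squared distance to the nearest centroid."""
    total = 0.0
    for x in points:
        # Squared Euclidean distance from x to each centroid; keep the smallest.
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids)
    return total / len(points)

# Toy data (assumed): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centroids = [(0.0, 1.0), (10.0, 1.0)]   # one centroid per pair
bad_centroids = [(5.0, 0.0), (5.0, 2.0)]     # centroids placed between the pairs

good_cost = kmeans_cost(points, good_centroids)
bad_cost = kmeans_cost(points, bad_centroids)
```

The minimization in the slide searches over all centroid sets; this sketch only scores two candidate sets, and the one aligned with the pairs gets the lower cost.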

Kernel K-Means

Given n points, partition them into k clusters.

$$\widehat{C} = \operatorname*{arg\,min}_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\varphi(x_i) - c_j\|^2$$

Feature map $\varphi(\cdot) : \mathbb{R}^d \to \mathbb{R}^D$ (e.g., $\varphi([x, y]) = [x, y, x^2 + y^2]$)
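
The example feature map can be tried on a case that is not linearly separable in the plane. The sketch below assumes two concentric rings (radii 1 and 3, chosen only for illustration) and checks that the added coordinate $x^2 + y^2$ separates them with a single threshold.

```python
import math

def phi(p):
    """Feature map from the slide: [x, y] -> [x, y, x^2 + y^2]."""
    x, y = p
    return (x, y, x * x + y * y)

def ring(radius, count):
    """Evenly spaced points on a circle of the given radius (toy data)."""
    return [
        (radius * math.cos(2 * math.pi * t / count),
         radius * math.sin(2 * math.pi * t / count))
        for t in range(count)
    ]

inner = ring(1.0, 8)
outer = ring(3.0, 8)

# Third coordinate of the embedded points: the squared radius.
inner_z = [phi(p)[2] for p in inner]
outer_z = [phi(p)[2] for p in outer]

# A horizontal plane at height 5 (any value between 1 and 9 works) splits the rings.
separable = max(inner_z) < 5.0 < min(outer_z)
```

In the original two coordinates both rings are centred at the origin, so no straight line splits them. After the map they sit at different heights.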

Computing Kernel K-Means

$$\widehat{C} = \operatorname*{arg\,min}_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\varphi(x_i) - c_j\|^2$$

$$\|\varphi(x_i) - \varphi(x_j)\|^2 = \|\varphi(x_i)\|^2 + \|\varphi(x_j)\|^2 - 2\,\underbrace{\varphi(x_i)^\top \varphi(x_j)}_{K(x_i, x_j)\ \text{(kernel)}}$$

[Figure: the kernel matrix K, with the entry $K(x_3, x_1)$ labeled.]

Space: $n^2$. Construct K: $n^2$. Iter. time: $n^2$.
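
The same expansion applies to the distance between a point and a cluster mean, which is what each iteration needs. The sketch below is one way to write that step, assuming each centroid is the feature-space mean of its cluster $S_j$, so that $\|\varphi(x_i) - c_j\|^2 = K_{ii} - \frac{2}{|S_j|}\sum_{l \in S_j} K_{il} + \frac{1}{|S_j|^2}\sum_{l, l' \in S_j} K_{ll'}$. The function name and toy data are assumptions for illustration.

```python
import numpy as np

def kernel_centroid_distances(K, labels, k):
    """Squared feature-space distances from every point to every cluster mean.

    Uses only the n x n kernel matrix K; assumes every cluster is non-empty.
    """
    n = K.shape[0]
    Z = np.zeros((n, k))
    Z[np.arange(n), labels] = 1.0                     # one-hot cluster indicators
    sizes = Z.sum(axis=0)                             # |S_j|
    KZ = K @ Z                                        # the O(n^2) step
    cross = KZ / sizes                                # mean of K(x_i, x_l) over l in S_j
    within = np.sum(Z * KZ, axis=0) / sizes**2        # mean of K(x_l, x_l') over S_j x S_j
    return np.diag(K)[:, None] - 2.0 * cross + within[None, :]

# Sanity check with the linear kernel K = X X^T, where the feature map is the identity.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
D = kernel_centroid_distances(X @ X.T, labels, k=2)
new_labels = D.argmin(axis=1)                         # one Lloyd-style reassignment
```

The product `K @ Z` touches every entry of the kernel matrix, which is where the $n^2$ space and per-iteration time on the slide come from.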

K-Means with Uniform Nyström Embedding

$$\widehat{C} = \operatorname*{arg\,min}_{[c_1, \dots, c_k]} \frac{1}{n} \sum_{i=1}^{n} \min_{j=1,\dots,k} \|\varphi_m(x_i) - c_j\|^2$$

$$\|\varphi_m(x_i) - \varphi_m(x_j)\|^2 = \|\varphi_m(x_i)\|^2 + \|\varphi_m(x_j)\|^2 - 2\,\underbrace{\varphi_m(x_i)^\top \varphi_m(x_j)}_{K_m(x_i, x_j)\ \text{(Nyström approximation)}}$$

Space: $n^2 \to nm$. Construct $K_m$: $n^2 \to nm^2$. Iter. time: $n^2 \to nmk$.

How to choose m for optimal statistical vs computational trade-off?
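
The slides do not spell out how $\varphi_m$ is built. A common Nyström construction, sketched here as an assumption rather than as the authors' exact procedure, samples $m$ landmark points uniformly and sets the embedding matrix to $K_{nm} K_{mm}^{-1/2}$, so that inner products of its rows reproduce $K_m = K_{nm} K_{mm}^{-1} K_{mn}$. The Gaussian kernel, its `gamma`, and the `jitter` floor are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_embedding(X, m, kernel, rng, jitter=1e-10):
    """Return an (n, m) matrix whose rows play the role of phi_m(x_i)."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform landmark sampling
    landmarks = X[idx]
    K_mm = kernel(landmarks, landmarks)          # m x m
    K_nm = kernel(X, landmarks)                  # n x m, never the full n x n matrix
    # Inverse square root of K_mm via its eigendecomposition;
    # tiny eigenvalues are floored at `jitter` for numerical stability.
    evals, evecs = np.linalg.eigh(K_mm)
    evals = np.maximum(evals, jitter)
    inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T
    return K_nm @ inv_sqrt                       # costs O(n m^2)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                     # toy data
Phi = nystrom_embedding(X, 5, gaussian_kernel, rng)
Phi_full = nystrom_embedding(X, 30, gaussian_kernel, rng)   # m = n recovers K
```

Ordinary k-means can then be run on the rows of `Phi`, so each iteration works with $n$ vectors of length $m$, in line with the $nmk$ iteration cost above.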

Main result

Let $x_i \sim \mu$ and the test error

$$\mathcal{E}(\widehat{C}) = \mathbb{E}_{x \sim \mu}\Big[\min_{j=1,\dots,k} \|\varphi(x) - \widehat{c}_j\|^2\Big]$$

Theorem

$$\mathcal{E}(\widehat{C}) \le \underbrace{O(k/\sqrt{n})}_{\text{statistical error}} + \underbrace{O(k/m)}_{\text{computational error}}$$

$m = \sqrt{n}$ is sufficient for the $k/\sqrt{n}$ rate: with this choice the computational error $O(k/m)$ is of the same order as the statistical error. Previous results require $m = n$.

| | Space | Construct $K$ / $K_m$ | Iter. time |
|---|---|---|---|
| Kernel k-means | $n^2$ | $n^2$ | $n^2$ |
| Nyström k-means ($m = \sqrt{n}$) | $n\sqrt{n}$ | $n^2$ | $n\sqrt{n}\,k$ |
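
To get a feel for the table, the sketch below plugs numbers into the leading-order counts ($n^2$ versus $nm$, $nm^2$, $nmk$), ignoring constants and logarithmic factors. The values $n = 60000$ (the size suggested by the MNIST-60k experiment) and $k = 10$ are assumptions for illustration, and `leading_costs` is a hypothetical helper.

```python
import math

def leading_costs(n, k, m=None):
    """Leading-order counts from the table; constants and log factors are ignored."""
    if m is None:
        # Exact kernel k-means: everything scales with the n x n kernel matrix.
        return {"space": n * n, "construct": n * n, "iter_time": n * n}
    # Nystrom k-means with an m-dimensional embedding.
    return {"space": n * m, "construct": n * m * m, "iter_time": n * m * k}

n, k = 60_000, 10
m = math.isqrt(n)                      # integer part of sqrt(n)

exact = leading_costs(n, k)
nystrom = leading_costs(n, k, m)

space_saving = exact["space"] / nystrom["space"]            # equals n / m
iter_saving = exact["iter_time"] / nystrom["iter_time"]     # equals n / (m * k)
```

Under these assumptions the storage drops by a factor of about $n/m \approx \sqrt{n}$, while constructing the embedding stays at the $n^2$ order, as in the last row of the table.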

MNIST-60k: test cost vs embedding size m

[Figure: test cost $\mathcal{E}(\widehat{C})$ plotted against the embedding size $m$; the plot also carries a $\sqrt{n}$ label.]

Recap

- Improved statistical vs computational trade-off for k-means
- First computation saving with no loss of statistical accuracy
- Similar results for k-means++ (efficient)
- Open question: fast $O(k/n)$ rate?

Taking suggestions at poster #129

Image credit: "designed by freepick from Flaticon"