A better k-means++ Algorithm via Local Search
Silvio Lattanzi (Google Research), Christian Sohler (Google Research)
ICML 2019
k-means
Find a set $C$ of $k$ centers that minimizes the cost
$\varphi(X, C) = \sum_{x \in X} \min_{c \in C} d^2(x, c)$
Constant-factor approximation algorithms are known. The goal is to design a constant-factor approximation algorithm that is efficient, easy to implement, and performs well in experiments.
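The cost on this slide can be evaluated directly. The following is a minimal sketch, assuming points are tuples of numbers and d is the Euclidean distance; the function names `sq_dist` and `kmeans_cost` are illustrative and do not come from the talk.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """phi(X, C): sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


# Tiny example: two points near the first center, one point on the second.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
cost = kmeans_cost(points, centers)
```

In this example each of the first two points is at squared distance 1 from its nearest center and the third point sits on a center, so the cost is 2.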
k-means++ seeding
Elegant and simple algorithm.
Experimentally gives good results when combined with Lloyd's algorithm.
The solution is an $O(\log k)$ approximation in expectation.
David Arthur, Sergei Vassilvitskii: k-means++: the advantages of careful seeding. SODA 2007: 1027-1035
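For readers unfamiliar with the seeding rule, here is a minimal sketch of the D² sampling used by k-means++ (Arthur and Vassilvitskii): the first center is chosen uniformly at random, and each later center is sampled with probability proportional to its squared distance to the nearest center chosen so far. The names `kmeanspp_seeding` and `rng` are illustrative, and the uniform fallback for the all-zero case is an assumption added so the sketch never fails on duplicate points.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng):
    """Pick k centers from points using D^2 sampling."""
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] is the squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(x, centers[0]) for x in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform sampling.
            c = rng.choice(points)
        else:
            c = rng.choices(points, weights=d2, k=1)[0]
        centers.append(c)
        # Update nearest-center distances with the new center.
        d2 = [min(d, sq_dist(x, c)) for d, x in zip(d2, points)]
    return centers


rng = random.Random(0)
points = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
seeds = kmeanspp_seeding(points, 2, rng)
```

With these three points, whichever location is picked first has zero weight afterwards, so the second center always lands on the other location.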
Local search
Elegant and simple algorithm.
It returns a constant-factor approximation and gives good experimental results.
The algorithm is a bit slow.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2-3): 89-112 (2004)
Combining the two algorithms
Elegant and simple algorithm.
It returns a constant-factor approximation, is slightly slower than k-means++, and has better experimental results.
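The slide does not spell out the combined procedure. One plausible reading, sketched below under that assumption, is a local search step whose swap candidate is drawn by D² sampling: sample a point with probability proportional to its squared distance to the current centers, try swapping it with each existing center, and keep the best swap only if it lowers the cost. The function names are illustrative, and this is not presented as the authors' exact implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def local_search_step(points, centers, rng):
    """One swap step: D^2-sample a candidate, keep the best improving swap."""
    d2 = [min(sq_dist(x, c) for c in centers) for x in points]
    if sum(d2) == 0:
        return centers  # cost is already zero, nothing to improve
    candidate = rng.choices(points, weights=d2, k=1)[0]
    best, best_cost = centers, kmeans_cost(points, centers)
    for i in range(len(centers)):
        # Replace the i-th center by the sampled candidate.
        swapped = centers[:i] + [candidate] + centers[i + 1:]
        swapped_cost = kmeans_cost(points, swapped)
        if swapped_cost < best_cost:
            best, best_cost = swapped, swapped_cost
    return best


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
start = [(0.0, 0.0), (1.0, 0.0)]  # both centers in the left cluster
after = local_search_step(points, start, random.Random(0))
```

Because a swap is accepted only when it strictly lowers the cost, a step can never make the solution worse. In this example the sampled candidate always comes from the right-hand cluster, and the best swap brings the cost from 181 down to 2.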
Main theoretical result
The main idea is to adapt the local search analysis to show that, in every step, with constant probability we reduce the cost of the solution by a multiplicative factor $\left(1 - \frac{1}{100k}\right)$.
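To see what this per-step factor buys, the following worked bound may help. It is a sketch under stated assumptions: $\varphi_0$ denotes the initial cost and $\varphi_t$ the cost after $t$ steps in which the reduction actually occurs (both symbols are introduced here for illustration), and the per-step guarantee is assumed to apply at each of those steps. It uses only the standard inequality $1 - x \le e^{-x}$.

```latex
\varphi_t \;\le\; \left(1 - \frac{1}{100k}\right)^{t} \varphi_0
          \;\le\; e^{-t/(100k)}\, \varphi_0 .
```

Under these assumptions, roughly $100k \ln 2 \approx 69.3\,k$ successful steps would suffice to halve the cost.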
Experimental results
Datasets:
- RNA: 8 features from 488565 RNA input sequence pairs (Uzilov et al., 2006)
- KDD-BIO: 145751 samples with 74 features measuring the match between a protein and a native sequence (KDD)
- KDD-PHY: 100000 samples with 78 features representing a quantum physics task (KDD)
Experimental results
[Figure: result plots for the KDD-BIO, RNA, and KDD-PHY datasets]
Thanks