Faster Algorithms for the Constrained k-means Problem



  1. Faster Algorithms for the Constrained k-means Problem

     Ragesh Jaiswal, CSE, IIT Delhi. June 16, 2015.
     [Joint work with Anup Bhattacharya (IITD) and Amit Kumar (IITD)]

  2. k-means Clustering Problem

     Problem (k-means): Given n points $X \subset \mathbb{R}^d$ and an integer $k$, find $k$ points $C \subset \mathbb{R}^d$ (called centers) such that the sum of squared Euclidean distances from each point in $X$ to its closest center in $C$ is minimized. That is, the following cost function is minimized:

     $$\Phi_C(X) = \sum_{x \in X} \min_{c \in C} \|x - c\|^2$$

     Example: $k = 4$, $d = 2$ (figure not reproduced).
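The cost function on this slide can be evaluated directly. The following is a minimal sketch in plain Python; the names `sq_dist` and `kmeans_cost` and the sample data are illustrative and do not come from the slides.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(X: Sequence[Point], C: Sequence[Point]) -> float:
    """Phi_C(X): every point pays the squared distance to its closest center."""
    return sum(min(sq_dist(x, c) for c in C) for x in X)


# Hypothetical data: four points in the plane and two candidate centers.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C = [(0.0, 1.0), (10.0, 1.0)]
print(kmeans_cost(X, C))  # each point is at squared distance 1 from its nearest center
```

Evaluating the cost for fixed centers is the easy part; the hardness results on the following slides concern the search for the minimizing set $C$.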

  4. k-means Lower/Upper Bounds

     Lower bounds: The problem is NP-hard when $k \ge 2$, $d \ge 2$ [Das08, MNV12, Vat09].

     Theorem [ACKS15]: There is a constant $\epsilon > 0$ such that it is NP-hard to approximate the k-means problem to a factor better than $(1 + \epsilon)$.

     Upper bounds: There are various approximation algorithms for the k-means problem.

     | Citation              | Approx. factor   | Running time                                 |
     |-----------------------|------------------|----------------------------------------------|
     | [AV07]                | $O(\log k)$      | polynomial time                              |
     | [KMN+02]              | $9 + \epsilon$   | polynomial time                              |
     | [KSS10, JKY15, FMS07] | $(1 + \epsilon)$ | $O\big(nd \cdot 2^{\tilde{O}(k/\epsilon)}\big)$ |

  7. k-means Locality property

     Clustering using the k-means formulation implicitly assumes that the target clustering follows the locality property: data points within the same cluster are close to each other in some geometric sense.

     There are clustering problems arising in Machine Learning where locality is not the only requirement:

     - r-gather clustering: Each cluster should contain at least $r$ points.
     - Capacitated clustering: The size of each cluster is upper bounded.
     - l-diversity clustering: Each input point has an associated color, and no cluster should have more than a $1/l$ fraction of its points sharing the same color.
     - Chromatic clustering: Each input point has an associated color, and points with the same color should be in different clusters.

     A unified framework that considers all the above problems would be nice.
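The four constraint types listed on this slide can each be read as a feasibility test on a candidate clustering. The sketch below is one possible encoding, assuming each cluster is given as the list of its points' colors (only sizes and colors matter for these tests). The function names and the `cap` parameter are hypothetical, not taken from the slides or from [DX15].

```python
from collections import Counter
from typing import Hashable, Sequence

# A cluster is represented by the colors of its points.
Cluster = Sequence[Hashable]


def is_r_gather(clusters: Sequence[Cluster], r: int) -> bool:
    """Every cluster contains at least r points."""
    return all(len(c) >= r for c in clusters)


def is_capacitated(clusters: Sequence[Cluster], cap: int) -> bool:
    """Every cluster contains at most cap points (cap is an assumed parameter)."""
    return all(len(c) <= cap for c in clusters)


def is_l_diverse(clusters: Sequence[Cluster], l: int) -> bool:
    """No color accounts for more than a 1/l fraction of any non-empty cluster."""
    return all(max(Counter(c).values()) * l <= len(c) for c in clusters if c)


def is_chromatic(clusters: Sequence[Cluster]) -> bool:
    """No two points in the same cluster share a color."""
    return all(len(set(c)) == len(c) for c in clusters)


# Hypothetical clustering of five colored points into two clusters.
clusters = [["red", "blue"], ["red", "green", "blue"]]
print(is_r_gather(clusters, 2), is_capacitated(clusters, 2))
print(is_l_diverse(clusters, 2), is_chromatic(clusters))
```

These checks only decide feasibility; they say nothing about the clustering cost.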

  8. k-means Locality property

     There are clustering problems arising in Machine Learning where locality is not the only requirement:

     - r-gather clustering: Each cluster should contain at least $r$ points.
     - Capacitated clustering: The size of each cluster is upper bounded.
     - l-diversity clustering: Each input point has an associated color, and no cluster should have more than a $1/l$ fraction of its points sharing the same color.
     - Chromatic clustering: Each input point has an associated color, and points with the same color should be in different clusters.

     A unified framework that considers all the above problems would be nice.

     Problem (Constrained k-means [DX15]): Given n points $X \subset \mathbb{R}^d$, an integer $k$, and a set of constraints $\mathbb{D}$, find $k$ clusters $X_1, \ldots, X_k$ such that (i) the clusters satisfy $\mathbb{D}$ and (ii) the following cost function is minimized:

     $$\Psi(X) = \sum_{i=1}^{k} \sum_{x \in X_i} \|x - \Gamma(X_i)\|^2, \quad \text{where } \Gamma(X_i) = \frac{\sum_{x \in X_i} x}{|X_i|}.$$
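For a clustering that is already known to satisfy the constraints, the cost $\Psi$ depends only on the clusters and their centroids $\Gamma(X_i)$. A minimal sketch of that evaluation follows; the names `centroid` and `clustering_cost` and the sample data are illustrative assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def centroid(cluster: Sequence[Point]) -> Point:
    """Gamma(X_i): coordinate-wise mean of a non-empty cluster."""
    n = len(cluster)
    d = len(cluster[0])
    return tuple(sum(x[j] for x in cluster) / n for j in range(d))


def clustering_cost(clusters: Sequence[Sequence[Point]]) -> float:
    """Psi: sum over clusters of squared distances to the cluster centroid."""
    total = 0.0
    for cluster in clusters:
        if not cluster:  # an empty cluster contributes nothing
            continue
        g = centroid(cluster)
        total += sum(sum((xj - gj) ** 2 for xj, gj in zip(x, g)) for x in cluster)
    return total


# Hypothetical clustering: the first cluster has centroid (1, 0).
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
print(clustering_cost(clusters))
```

Note that the constraints never enter this computation: they restrict which clusterings are allowed, not how an allowed clustering is scored.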

  9. Constrained k-means

     Problem (k-means): Given n points $X \subset \mathbb{R}^d$ and an integer $k$, find $k$ centers $C \subset \mathbb{R}^d$ such that the following cost function is minimized:

     $$\Phi_C(X) = \sum_{x \in X} \min_{c \in C} \|x - c\|^2$$

     Problem (Constrained k-means [DX15]): Given n points $X \subset \mathbb{R}^d$, an integer $k$, and a set of constraints $\mathbb{D}$, find $k$ clusters $X_1, \ldots, X_k$ such that (i) the clusters satisfy $\mathbb{D}$ and (ii) the following cost function is minimized:

     $$\Psi(X) = \sum_{i=1}^{k} \sum_{x \in X_i} \|x - \Gamma(X_i)\|^2, \quad \text{where } \Gamma(X_i) = \frac{\sum_{x \in X_i} x}{|X_i|}.$$

  12. Constrained k-means

      Problem (k-means): Given n points $X \subset \mathbb{R}^d$ and an integer $k$, find $k$ clusters $X_1, \ldots, X_k$ such that the following cost function is minimized:

      $$\Phi(X) = \sum_{i=1}^{k} \sum_{x \in X_i} \|x - \Gamma(X_i)\|^2, \quad \text{where } \Gamma(X_i) = \frac{\sum_{x \in X_i} x}{|X_i|}.$$

      Problem (Constrained k-means [DX15]): Given n points $X \subset \mathbb{R}^d$, an integer $k$, and a set of constraints $\mathbb{D}$, find $k$ clusters $X_1, \ldots, X_k$ such that (i) the clusters satisfy $\mathbb{D}$ and (ii) the following cost function is minimized:

      $$\Psi(X) = \sum_{i=1}^{k} \sum_{x \in X_i} \|x - \Gamma(X_i)\|^2, \quad \text{where } \Gamma(X_i) = \frac{\sum_{x \in X_i} x}{|X_i|}.$$

      Fact: For any $X \subset \mathbb{R}^d$ and any point $p \in \mathbb{R}^d$,

      $$\sum_{x \in X} \|x - p\|^2 = \sum_{x \in X} \|x - \Gamma(X)\|^2 + |X| \cdot \|\Gamma(X) - p\|^2.$$
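The Fact on this slide is stated without proof. One standard way to verify it, written out here as a short derivation rather than taken from the slides, is to split $x - p$ at the centroid and observe that the cross term vanishes:

```latex
\begin{align*}
\sum_{x \in X} \|x - p\|^2
  &= \sum_{x \in X} \|(x - \Gamma(X)) + (\Gamma(X) - p)\|^2 \\
  &= \sum_{x \in X} \|x - \Gamma(X)\|^2
     + 2 \Big\langle \Gamma(X) - p,\ \sum_{x \in X} (x - \Gamma(X)) \Big\rangle
     + |X| \cdot \|\Gamma(X) - p\|^2 \\
  &= \sum_{x \in X} \|x - \Gamma(X)\|^2 + |X| \cdot \|\Gamma(X) - p\|^2,
\end{align*}
% The inner product is zero because
% \sum_{x \in X} (x - \Gamma(X)) = \sum_{x \in X} x - |X| \cdot \Gamma(X) = 0.
```

Since the second term is non-negative and zero only at $p = \Gamma(X)$, the centroid is the best single center for any fixed cluster. This is why the unconstrained problem can be stated either in terms of centers or in terms of clusters.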

  14. Constrained k-means

      Problem (k-means): Given n points $X \subset \mathbb{R}^d$ and an integer $k$, find $k$ centers $C \subset \mathbb{R}^d$ such that the following cost function is minimized:

      $$\Phi_C(X) = \sum_{x \in X} \min_{c \in C} \|x - c\|^2$$

      Problem (Attempted formulation in terms of centers): Given n points $X \subset \mathbb{R}^d$, an integer $k$, and a set of constraints $\mathbb{D}$, find $k$ centers $C \subset \mathbb{R}^d$ such that...

  15. Constrained k-means

      Problem (k-means): Given n points $X \subset \mathbb{R}^d$ and an integer $k$, find $k$ centers $C \subset \mathbb{R}^d$ such that the following cost function is minimized:

      $$\Phi_C(X) = \sum_{x \in X} \min_{c \in C} \|x - c\|^2$$

      Problem (Constrained k-means [DX15]): Given n points $X \subset \mathbb{R}^d$, an integer $k$, a set of constraints $\mathbb{D}$, and a partition algorithm $A_{\mathbb{D}}$, find $k$ centers $C \subset \mathbb{R}^d$ such that the following cost function is minimized:

      $$\Psi(X) = \sum_{i=1}^{k} \sum_{x \in X_i} \|x - \Gamma(X_i)\|^2, \quad \text{where } (X_1, \ldots, X_k) \leftarrow A_{\mathbb{D}}(C, X).$$

      Partition Algorithm [DX15]: Given a dataset $X$, constraints $\mathbb{D}$, and centers $C = (c_1, \ldots, c_k)$, the partition algorithm $A_{\mathbb{D}}(C, X)$ outputs a clustering $(X_1, \ldots, X_k)$ of $X$ such that (i) all clusters $X_i$ satisfy $\mathbb{D}$ and (ii) the following cost function is minimized:

      $$\mathrm{cost}(A_{\mathbb{D}}(C, X)) = \sum_{i=1}^{k} \sum_{x \in X_i} \|x - c_i\|^2.$$
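To make the role of the partition algorithm concrete, here is a brute-force sketch of one for the chromatic constraint (same-colored points must land in different clusters). It is an illustration only and not the partition algorithm of [DX15]: because the constraint couples only points of the same color, each color group can be assigned independently by trying every injective mapping of its points to centers. The function name `chromatic_partition` and the sample data are hypothetical, and the enumeration is practical only for small $k$.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def chromatic_partition(
    X: Sequence[Point], colors: Sequence[Hashable], C: Sequence[Point]
) -> Tuple[List[List[int]], float]:
    """Return (clusters, cost); clusters[i] holds indices of points assigned to C[i]."""
    k = len(C)
    groups: Dict[Hashable, List[int]] = {}
    for idx, col in enumerate(colors):
        groups.setdefault(col, []).append(idx)

    clusters: List[List[int]] = [[] for _ in range(k)]
    total = 0.0
    for members in groups.values():
        if len(members) > k:
            raise ValueError("more than k points share a color: infeasible")
        # Same-colored points need distinct centers: try every injective assignment.
        best_cost, best_perm = min(
            (sum(sq_dist(X[p], C[ci]) for p, ci in zip(members, perm)), perm)
            for perm in permutations(range(k), len(members))
        )
        for p, ci in zip(members, best_perm):
            clusters[ci].append(p)
        total += best_cost
    return clusters, total


# Hypothetical instance: points 0 and 1 share color "a" and must be separated.
X = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
colors = ["a", "a", "b"]
C = [(0.0, 0.0), (10.0, 0.0)]
print(chromatic_partition(X, colors, C))
```

In this instance the nearest-center rule would put points 0 and 1 in the same cluster, which violates the constraint. The partition step instead sends point 1 to the far center, which shows why the constrained problem cannot simply reuse the $\min_{c \in C}$ assignment of ordinary k-means.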
