Differentially Private k-Means with Constant Multiplicative Error
Uri Stemmer, Ben-Gurion University
Joint work with Haim Kaplan
What is $k$-Means Clustering?
Given: data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and a parameter $k$.
Identify $k$ centers $C = \{c_1, \dots, c_k\}$ minimizing $\mathrm{cost}_S(C) = \sum_{x \in S} \min_{c \in C} \lVert x - c \rVert^2$.
• Probably the most well-studied clustering problem
• Tons of applications
• Super popular
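A minimal sketch of the objective defined above, assuming points and centers are given as equal-length tuples of floats. The names `squared_distance` and `kmeans_cost` are illustrative, not from the talk. The sketch only evaluates $\mathrm{cost}_S(C)$ for a fixed set of centers; it does not search for good centers and provides no privacy.

```python
from typing import Sequence

# A point in R^d is represented as a sequence of d floats (illustrative choice).
Point = Sequence[float]


def squared_distance(x: Point, c: Point) -> float:
    """Return ||x - c||^2, the squared Euclidean distance."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same dimension d")
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Return cost_S(C): every point pays the squared distance to its nearest center."""
    if len(centers) == 0:
        raise ValueError("need at least one center")
    return sum(min(squared_distance(x, c) for c in centers) for x in points)


# Usage: two well-separated groups in the plane, one center per group.
S = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 2.0)]
C = [(0.0, 0.5), (4.0, 1.0)]
print(kmeans_cost(S, C))  # 0.25 + 0.25 + 1.0 + 1.0 = 2.5
```

Each point is charged only for its nearest center, which is exactly the inner $\min_{c \in C}$ in the formula.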
What is Differentially Private $k$-Means? [Dwork, McSherry, Nissim, Smith 06] (informal)
• Every data point $x_i$ represents the (private) information of one individual.
• Goal: the output (the set of centers) does not reveal information that is specific to any single individual.
• Requirement: the output distribution is insensitive to an arbitrary change of any single input point (an algorithm satisfying this requirement is differentially private).
Why is that a good privacy definition? Even if an observer knows all the data points except mine and then sees the outcome of the computation, she still cannot learn "anything" about my data point.
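The informal requirement is commonly formalized as $(\varepsilon, \delta)$-differential privacy. In this standard definition, $A$ denotes the randomized algorithm, $S$ and $S'$ are two inputs that differ in a single point, and $T$ is any set of possible outputs (these symbols are introduced here for illustration); $\delta = 0$ gives pure $\varepsilon$-differential privacy. Small $\varepsilon$ and $\delta$ mean that changing one individual's point barely changes the distribution of the returned centers.

```latex
\Pr[A(S) \in T] \;\le\; e^{\varepsilon} \cdot \Pr[A(S') \in T] + \delta
\qquad \text{for all } S, S' \in (\mathbb{R}^d)^n \text{ differing in one point and all sets of outputs } T
```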
Differentially Private $k$-Means Clustering
Given data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and a parameter $k$, identify $k$ centers $C$ minimizing $\mathrm{cost}_S(C)$.
Requirement: the output distribution is insensitive to an arbitrary change of any single input point.
Observe: with privacy we must have additive error.
• Assume $n = k = 1$: OPT's cost is 0.
• Move the point by a distance $\Lambda$: OPT's cost is still 0.
• Every possible output must remain approximately equally likely on both inputs.
• Hence, on at least one of these inputs our cost is $\gtrsim \Lambda^2$.
⟹ We assume that the input points come from the unit ball.
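The lower-bound argument can be made concrete as follows. This is a sketch under stated assumptions: $n = k = 1$, $A$ is $(\varepsilon, \delta)$-differentially private with $\delta < 1/2$, and the two inputs are $S = \{x\}$ and $S' = \{y\}$ with $\lVert x - y \rVert = \Lambda$ (the symbols $A$, $x$, $y$, $T$ are introduced here for illustration). Let $T = \{c : \lVert c - x \rVert \ge \Lambda/2\}$; by the triangle inequality, every center $c \notin T$ satisfies $\lVert c - y \rVert > \Lambda/2$.

```latex
\begin{aligned}
&\text{Case 1: } \Pr[A(S) \in T] \ge \tfrac{1}{2}
  \;\Rightarrow\; \mathbb{E}\bigl[\mathrm{cost}_S(A(S))\bigr] \ge \tfrac{1}{2} \cdot \tfrac{\Lambda^2}{4} = \tfrac{\Lambda^2}{8}. \\
&\text{Case 2: } \Pr[A(S) \notin T] > \tfrac{1}{2}
  \;\Rightarrow\; \Pr[A(S') \notin T] \ge e^{-\varepsilon}\bigl(\Pr[A(S) \notin T] - \delta\bigr) > e^{-\varepsilon}\bigl(\tfrac{1}{2} - \delta\bigr) \\
&\qquad\;\Rightarrow\; \mathbb{E}\bigl[\mathrm{cost}_{S'}(A(S'))\bigr] > e^{-\varepsilon}\bigl(\tfrac{1}{2} - \delta\bigr) \cdot \tfrac{\Lambda^2}{4}. \\
&\text{Either way, on one of the two inputs: } \mathbb{E}[\mathrm{cost}] \ge e^{-\varepsilon}\bigl(\tfrac{1}{2} - \delta\bigr) \cdot \tfrac{\Lambda^2}{4} = \Omega(\Lambda^2), \text{ while } \mathrm{OPT} = 0.
\end{aligned}
```

Since OPT is 0 on both inputs, no multiplicative factor can absorb this cost, so some additive error is unavoidable and it grows with the diameter of the domain; this is why the input is restricted to the unit ball.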
Previous and New Bounds
Ref | Model | Runtime | Bounds
GLMRT'10 | $\varepsilon$-differential privacy | … | $O(1) \cdot \mathrm{OPT} + \dots$
NCBN'16 | $\varepsilon$-differential privacy | poly | $O(\log k) \cdot \mathrm{OPT} + \dots$
FXZR'17 | $\varepsilon$-differential privacy | poly | …
BDLMZ'17 | differential privacy | poly | $O(\log^3 n) \cdot \mathrm{OPT} + \tilde{O}(k^2 + d)$
NS'18 | $(\varepsilon, \delta)$-differential privacy | poly | $O(k) \cdot \mathrm{OPT} + \tilde{O}(k^{1.01} \cdot d^{0.51})$
New | $(\varepsilon, \delta)$-differential privacy | poly | $O(1) \cdot \mathrm{OPT} + \tilde{O}(k^{1.01} \cdot d^{0.51} + k^{3/2})$