Differentially Private k-Means with Constant Multiplicative Error (Uri Stemmer, Ben-Gurion University)


  1. Differentially Private k-Means with Constant Multiplicative Error. Uri Stemmer, Ben-Gurion University. Joint work with Haim Kaplan.

  3. What is $k$-Means Clustering?
     Given: data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and a parameter $k$.
     Identify $k$ centers $C = (c_1, \dots, c_k)$ minimizing $\mathrm{cost}(C) = \sum_{i} \min_{\ell} \|x_i - c_\ell\|^2$.
     ✓ Probably the most well-studied clustering problem
     ✓ Tons of applications
     ✓ Super popular
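
The cost on this slide can be evaluated directly: each point pays its squared distance to the nearest center. The following is a minimal sketch in plain Python; the function names `squared_distance` and `kmeans_cost` and the sample coordinates are illustrative assumptions, not part of the slides.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(x: Point, c: Point) -> float:
    """Squared Euclidean distance between a point and a center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in points)


# Hypothetical data: two tight pairs of points, one center per pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
centers = [(0.0, 0.5), (4.0, 0.5)]

# Each point lies at distance 0.5 from its nearest center,
# so the total cost is 4 * 0.25 = 1.0.
print(kmeans_cost(points, centers))
```

Finding the centers that minimize this quantity is the hard part; the sketch only evaluates a given set of centers.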

  6. What is $k$-Means Clustering?
     Given: data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and a parameter $k$.
     Identify $k$ centers $C = (c_1, \dots, c_k)$ minimizing $\mathrm{cost}(C) = \sum_{i} \min_{\ell} \|x_i - c_\ell\|^2$.
     What is Differentially Private $k$-Means? [Dwork, McSherry, Nissim, Smith 06] (informal)
     ✓ Every data point $x_i$ represents the (private) information of one individual.
     ✓ Goal: the output (the set of centers) does not reveal information that is specific to any single individual.
     ✓ Requirement: the output distribution is insensitive to any arbitrary change of a single input point (an algorithm satisfying this requirement is differentially private).
     Why is that a good privacy definition? Even if an observer knows every data point except mine and then sees the outcome of the computation, she still cannot learn "anything" about my data point.
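
The informal requirement above is usually formalized as follows. This is the standard pure $\varepsilon$-differential privacy condition; the symbols $\mathcal{A}$ (the randomized algorithm), $\varepsilon$ (the privacy parameter), $S'$ (a neighboring input) and $T$ (a set of possible outputs) are notation assumed here and do not appear on the slide.

```latex
% A randomized algorithm \mathcal{A} is \varepsilon-differentially private if,
% for every pair of inputs S, S' that differ in a single point
% and every set T of possible outputs,
\Pr[\mathcal{A}(S) \in T] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(S') \in T].
```

For small $\varepsilon$ the factor $e^{\varepsilon}$ is close to 1, which is the sense in which the output distribution is "insensitive" to changing one point.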

  11. Differentially Private $k$-Means Clustering
      Given: data points $S = (x_1, \dots, x_n) \in (\mathbb{R}^d)^n$ and a parameter $k$.
      Identify $k$ centers $C = (c_1, \dots, c_k)$ minimizing $\mathrm{cost}(C) = \sum_{i} \min_{\ell} \|x_i - c_\ell\|^2$.
      Requirement: the output distribution is insensitive to any arbitrary change of a single input point.
      Observe: with privacy we must have additive error.
      • Assume $k = n = 3$
      • OPT's cost = 0
      • Move one point (by a distance $\Lambda$)
      • OPT's cost = 0
      • Each solution must remain approximately equally likely
      • On at least one of these inputs our cost is $\approx \Lambda^2$
      ⟹ We assume that input points come from the unit ball.
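
The additive-error observation can be checked numerically. The sketch below assumes a concrete configuration that is not taken from the slide's figure: three points on corners of a square of side `LAMBDA`, with one point moved to the fourth corner. Each input has optimal cost 0 (with $k = n$, put a center on every point), yet the centers that are optimal for one input cost `LAMBDA ** 2` on the other. All names and coordinates are illustrative.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in points)


LAMBDA = 10.0  # hypothetical distance by which one point is moved

# Two neighboring inputs: they differ only in the third point.
s = [(0.0, 0.0), (LAMBDA, 0.0), (0.0, LAMBDA)]
s_moved = [(0.0, 0.0), (LAMBDA, 0.0), (LAMBDA, LAMBDA)]

# With k = n = 3, using the points themselves as centers gives cost 0.
print(kmeans_cost(s, s))              # optimal cost on the first input
print(kmeans_cost(s_moved, s_moved))  # optimal cost on the second input

# Centers tuned to one input are far from optimal on its neighbor.
print(kmeans_cost(s_moved, s))        # equals LAMBDA ** 2
print(kmeans_cost(s, s_moved))        # equals LAMBDA ** 2
```

A private algorithm must output roughly the same distribution over centers on both inputs, so it cannot be near-optimal on both. Restricting the points to the unit ball bounds $\Lambda$ and therefore the unavoidable additive error.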

  12. Previous and New Bounds

      | Ref | Model | Runtime | Bounds |
      |---|---|---|---|
      | GLMRT'10 | differential privacy | $n^{d}$ | $O(1) \cdot \mathrm{OPT} + \tilde{O}(k^{2} \cdot d)$ |
      | NCBN'16 | differential privacy | poly | $O(\log k) \cdot \mathrm{OPT} + \tilde{O}(n)$ |
      | FXZR'17 | differential privacy | poly | $O(k \log n) \cdot \mathrm{OPT} + \tilde{O}(k^{3/2} \cdot \sqrt{d})$ |
      | BDLMZ'17 | differential privacy | poly | $O(\log^{3} n) \cdot \mathrm{OPT} + \tilde{O}(k^{2} + d)$ |
      | NS'18 | differential privacy | poly | $O(k) \cdot \mathrm{OPT} + \tilde{O}(k^{1.51} \cdot d^{0.51})$ |
      | New | differential privacy | poly | $O(1) \cdot \mathrm{OPT} + \tilde{O}(k^{1.01} \cdot d^{0.51} + k^{3/2})$ |
