ICML 2020 | Thirty-seventh International Conference on Machine Learning
Fast and Private Submodular and k-Submodular Function Maximization with Matroid Constraints
Akbar Rafiey, Yuichi Yoshida
Core message
• What is the problem?
• What do we want to achieve?
• What do we achieve in this paper?
What is the problem?
How to answer queries while preserving the privacy of the data?
• Sensitive data; an analyst wants to do statistical analysis of the data.
• Examples: medical data, web search data, social networks, salary data, etc.
What do we want to achieve?
We need an algorithm such that:
• it returns an almost correct answer to a query,
• it is efficient and fast,
• it preserves privacy when the data is sensitive.
What do we achieve in this paper? (part 1)
• We consider a class of set function queries, namely submodular set functions.
• We present an algorithm for submodular maximization and prove that it:
  • is computationally efficient,
  • outputs solutions close to an optimal solution,
  • preserves the privacy of the dataset.
What do we achieve in this paper? (part 2)
• Further, we consider a generalization of submodular functions, namely k-submodular functions, which captures more problems.
• We present an algorithm for k-submodular maximization and prove that it:
  • is computationally efficient,
  • outputs solutions close to an optimal solution,
  • preserves the privacy of the dataset.
Differential privacy: a rigorous notion of privacy
• Query from an analyst (e.g., a health insurance company): "How many people have diabetes?"
• Analysis/computation on the dataset: answer 100.
• Analysis/computation on the dataset without individual X's data: answer 99.
Differential privacy: a rigorous notion of privacy
• Same setup, but the computation adds NOISE to its answer:
• on the dataset: 100 ± noise,
• on the dataset without X's data: 100 ± noise.
Differential privacy: a rigorous notion of privacy
• With noise added, the outputs of the computation on the dataset and on the dataset without X's data "differ" by at most ε.
• Intuitively, any one individual's data should NOT significantly change the outcome.
Differential Privacy (definition)
• For ε, δ ∈ ℝ₊, we say that a randomized computation M is (ε, δ)-differentially private if
  1. for any neighboring datasets D ∼ D′, and
  2. for any set of outcomes S ⊆ range(M),
     Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.
• Neighboring datasets: two datasets that differ in at most one record.
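The definition above can be made concrete with the standard Laplace mechanism for a counting query. This is a minimal sketch, not the paper's algorithm; the records and the predicate are hypothetical, and δ = 0 here.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale): the difference of two i.i.d.
    exponentials with rate 1/scale is Laplace-distributed."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    """(epsilon, 0)-DP answer to "how many records satisfy predicate?".
    A counting query has sensitivity 1 (changing one record moves the
    true answer by at most 1), so Laplace noise of scale 1/epsilon
    suffices for epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical records: (id, has_diabetes).
data = [(i, i % 3 == 0) for i in range(100)]
answer = private_count(data, lambda r: r[1], epsilon=0.5)  # noisy answer near 34
```

Whether the true count is 100 or 99 (one record added or removed), the noisy output distributions are close in the sense of the definition, which is exactly the intuition of the previous slides.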
Set function queries
• A set function f_D : 2^V → ℝ over m features: given a dataset D, f_D(S) measures the "value" of a feature set S in the dataset D.
• Examples: f_D({gender, diabetes}) = 5, f_D({asthma}) = 7.

Dataset D:
Id | gender | diabetes | … | asthma | Class
 1 |   F    |    0     | … |   1    |  C1
 2 |   M    |    1     | … |   1    |  C1
 3 |   F    |    0     | … |   1    |  C1
 4 |   M    |    1     | … |   0    |  C1
 5 |   F    |    0     | … |   0    |  C1
 6 |   NA   |    1     | … |   0    |  C1
 7 |   F    |    0     | … |   1    |  C2
 8 |   M    |    1     | … |   1    |  C2
 9 |   NA   |    0     | … |   1    |  C2
10 |   M    |    1     | … |   1    |  C2

• Query: what are the k most informative features?
• Can we answer while preserving individuals' privacy?
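The slide does not fix a specific "value" measure, so here is one hypothetical choice just to make a set function query concrete: f_D(S) counts the records that are positive on every binary feature in S. The records below are made up.

```python
# Hypothetical mini-dataset: each record maps a binary feature to 0/1.
records = [
    {"diabetes": 0, "asthma": 1},
    {"diabetes": 1, "asthma": 1},
    {"diabetes": 0, "asthma": 1},
    {"diabetes": 1, "asthma": 0},
]

def f_D(S):
    """A toy set function: number of records with value 1 on every
    feature in S (one of many possible "value" measures)."""
    return sum(1 for r in records if all(r[feat] == 1 for feat in S))

f_D({"asthma"})              # 3 records have asthma = 1
f_D({"diabetes", "asthma"})  # 1 record has both = 1
```

Any such f_D depends on individual records, which is why answering "k most informative features" naively can leak private data.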
Submodular Function
• In words: the marginal contribution of any element e to the value of the function f(S) diminishes as the input set S grows.
• Mathematically, a function f : 2^V → ℝ is submodular if
  • for all A ⊆ B ⊆ V,
  • and all elements e ∈ V ∖ B, we have
    f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B)   (diminishing-gain property).
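Coverage functions are a classic family of submodular functions; the diminishing-gain inequality can be verified exhaustively on a small instance. The cover sets below are made up for illustration.

```python
from itertools import chain, combinations

# Hypothetical instance: element -> set of items it covers.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
V = list(cover)

def f(S):
    """Coverage function f(S) = |union of cover[e] for e in S|."""
    covered = set()
    for e in S:
        covered |= cover[e]
    return len(covered)

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Exhaustively check f(A + e) - f(A) >= f(B + e) - f(B)
# for every A ⊆ B ⊆ V and every e ∉ B.
for B in map(set, subsets(V)):
    for A in map(set, subsets(sorted(B))):
        for e in set(V) - B:
            assert f(A | {e}) - f(A) >= f(B | {e}) - f(B)
```

Intuitively: an element's newly covered items can only shrink once more of the ground set is already selected.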
Problem
Design a framework for differentially private submodular maximization under a matroid constraint.
• A pair M = (E, I) of a ground set E and I ⊆ 2^E is called a matroid if
  • ∅ ∈ I,
  • A ∈ I for any A ⊆ B ∈ I,
  • for any A, B ∈ I with |A| < |B|, there exists e ∈ B ∖ A such that A ∪ {e} ∈ I.
• Our objective: argmax_{S ∈ I} f(S).
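For intuition, here is the plain non-private greedy baseline on the simplest matroid, the uniform matroid (all sets of size at most r are independent). For monotone submodular f this is the classic (1 − 1/e)-approximation; the paper's private algorithm perturbs such choices and works with the multilinear extension, which is NOT shown here. The instance is the hypothetical coverage function again.

```python
# Hypothetical coverage instance: element -> set of items it covers.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
E = list(cover)

def f(S):
    covered = set()
    for e in S:
        covered |= cover[e]
    return len(covered)

def greedy(E, f, r):
    """Non-private greedy for max f(S) s.t. |S| <= r (uniform matroid):
    repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(r):
        candidates = [e for e in E if e not in S]
        if not candidates:
            break
        e_best = max(candidates, key=lambda e: f(S | {e}) - f(S))
        if f(S | {e_best}) - f(S) <= 0:
            break
        S.add(e_best)
    return S

S = greedy(E, f, r=2)  # picks "a" then "c", covering all six items
```

A general matroid constraint only changes the candidate set: at each step, restrict to elements e with S ∪ {e} still independent.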
Examples of submodularity
• Feature selection
• Influence maximization
• Facility location
• Maximum coverage
• Data summarization (image summarization, document summarization)
• …
A toy example
• Ground set E: m resources r₁, r₂, …, r_m.
• n agents; each agent i has a private submodular function F_i : 2^E → ℝ.
• Objective: find S ⊆ E in the matroid that maximizes Σ_{i=1}^{n} F_i(S).
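A tiny sketch of this decomposable objective, with hypothetical agent utilities (each agent values resource sets by how many of her interests they cover). The key point for privacy is that neighboring datasets differ in one agent's data, so the objective changes by at most the largest single-agent value, which bounds the sensitivity that DP noise must be calibrated to.

```python
# Hypothetical per-agent data: one interest set per agent.
interests = [{"r1", "r2"}, {"r2", "r3"}, {"r3"}]

def F_i(i, S):
    """Agent i's utility: interests covered by S (modular, hence
    submodular); any monotone submodular F_i would work here."""
    return len(interests[i] & S)

def F(S):
    """Objective: sum of agents' utilities -- also monotone submodular."""
    return sum(F_i(i, S) for i in range(len(interests)))

# Removing/replacing one agent changes F(S) by at most
# Delta = max_i max_S F_i(S); DP noise is scaled to this Delta.
Delta = max(len(t) for t in interests)
```

In the paper's notation, this Delta plays the role of the sensitivity parameter Δ appearing in the additive error bounds.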
Our contributions

          non-private      previous result (Mitrovic et al.)        our result
utility   (1 − 1/e)·OPT    (1/2)·OPT − O(Δ · r(M) · ln|E| / ε)      (1 − 1/e)·OPT − O(Δ · r(M) · ln|E| / ε)
privacy   --               ε · r(M)                                 ε · r(M)²

• (1 − 1/e)·OPT is the best possible approximation ratio unless P = NP.
• Our algorithm uses an almost cubic number of function evaluations, O(r(M) · |E|² · ln(…)).
• Our privacy factor is worse than the previous work since we deal with the multilinear extension.
• Please see our paper for details and proofs.
Generalization of submodularity: k-submodular functions
A function f : (k+1)^V → ℝ₊ defined on k-tuples of pairwise disjoint subsets of V is called k-submodular if for all k-tuples S = (S₁, …, S_k) and T = (T₁, …, T_k) of pairwise disjoint subsets of V,
  f(S) + f(T) ≥ f(S ⊓ T) + f(S ⊔ T),
where we define
  S ⊓ T = (S₁ ∩ T₁, …, S_k ∩ T_k),
  S ⊔ T = ((S₁ ∪ T₁) ∖ ⋃_{j≠1}(S_j ∪ T_j), …, (S_k ∪ T_k) ∖ ⋃_{j≠k}(S_j ∪ T_j)).
A simpler definition: a monotone function is k-submodular if each orthant (fix the domain of each element to {0, i} for some i ∈ {1, 2, …, k}) is submodular.
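The inequality can be checked by brute force on a tiny instance. Below, a hypothetical "topic coverage" function (assigning item e to topic j covers the users in cover[e][j−1]) is monotone and submodular in each orthant, hence k-submodular; we verify f(S) + f(T) ≥ f(S ⊓ T) + f(S ⊔ T) exhaustively for k = 2 using the standard vector encoding: position e holds the index of the subset containing e, with 0 meaning "unassigned".

```python
from itertools import product

V, k = ["a", "b", "c"], 2

# Hypothetical data: cover[e][j-1] = users covered by item e under topic j.
cover = {
    "a": [{1, 2}, {2, 3}],
    "b": [{2},    {4}],
    "c": [{3, 4}, {1}],
}

def f(x):
    """Total users covered by the assignment x (dict: element -> label)."""
    covered = set()
    for e, j in x.items():
        if j > 0:
            covered |= cover[e][j - 1]
    return len(covered)

def meet(x, y):
    """(S ⊓ T)_i = S_i ∩ T_i: keep a label only where x and y agree."""
    return {e: x[e] if x[e] == y[e] else 0 for e in x}

def join(x, y):
    """(S ⊔ T)_i: union per coordinate, dropping elements claimed by
    two different topics."""
    out = {}
    for e in x:
        labels = {x[e], y[e]} - {0}
        out[e] = labels.pop() if len(labels) == 1 else 0
    return out

# Exhaustive check of the k-submodular inequality over all 3^3 pairs.
for xs in product(range(k + 1), repeat=len(V)):
    for ys in product(range(k + 1), repeat=len(V)):
        x, y = dict(zip(V, xs)), dict(zip(V, ys))
        assert f(x) + f(y) >= f(meet(x, y)) + f(join(x, y))
```

For k = 1 the encoding collapses to characteristic vectors of sets and the inequality becomes ordinary submodularity.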
Examples of k-submodularity
• Coupled feature selection
• Sensor placement with k kinds of measures
• Influence maximization with k topics
• Variant of facility location
• …
(Sensor-placement figure from: Near-optimal Sensor Placements: Maximizing Information while Minimizing Communication Cost, A. Krause, A. Gupta, C. Guestrin, J. Kleinberg. Second figure from: On Bisubmodular Maximization, A. P. Singh, A. Guillory, J. Bilmes.)
A toy example
• G₁, …, G_k: influence graphs of ad agencies 1, …, k, each over the same ad slots v₁, v₂, … and users w₁, w₂, ….
• The edges incident to a user in G₁, …, G_k are sensitive data about that user.
• Objective: allocate at most B ≤ n ad slots to the ad agencies so as to maximize the number of influenced users.
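A non-private greedy for this toy allocation: repeatedly assign the (slot, agency) pair with the largest marginal gain in influenced users until B slots are used. The bipartite influence graphs below are made up, and this sketch has no privacy protection; the paper's private algorithm adds noise to such selections.

```python
# Hypothetical influence graphs: G[j][v] = users influenced when slot v
# is allocated to ad agency j (these edges are the sensitive data).
G = [
    {"v1": {"w1", "w2"}, "v2": {"w2"},       "v3": {"w3"}},  # agency 1
    {"v1": {"w3"},       "v2": {"w1", "w4"}, "v3": {"w4"}},  # agency 2
]
slots, B = ["v1", "v2", "v3"], 2

def influenced(alloc):
    """alloc maps slot -> agency index; returns #influenced users."""
    users = set()
    for v, j in alloc.items():
        users |= G[j][v]
    return len(users)

alloc = {}
for _ in range(B):
    best, best_gain = None, 0
    for v in slots:
        if v in alloc:
            continue
        for j in range(len(G)):
            gain = influenced({**alloc, v: j}) - influenced(alloc)
            if gain > best_gain:
                best, best_gain = (v, j), gain
    if best is None:
        break
    alloc[best[0]] = best[1]
```

The objective here is exactly the k-submodular "influence maximization with k topics" pattern: each slot is either unassigned or assigned to one of k agencies.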
Our contributions

          non-private    previous result    our result
utility   (1/2)·OPT      --                 (1/2)·OPT − O(Δ · r(M) · ln|E| / ε)
privacy   --             --                 ε · r(M)

• Our algorithm is the first differentially private k-submodular maximization algorithm.
• (1/2)·OPT is asymptotically tight assuming P ≠ NP.
• Our algorithm uses an almost linear number of function evaluations, i.e., O(k · |E| · ln(r(M))).
Thanks!
Definition of submodular function
A function f : 2^V → ℝ is submodular if
• for all A ⊆ B ⊆ V,
• and all elements e ∈ V ∖ B, we have
  f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B).

Applications
• Viral marketing
• Information gathering
• Feature selection for classification
• Influence maximization in social networks
• Document summarization, …

What is our objective?
We need an optimization method such that it
• returns an almost optimal solution,
• is efficient and fast,
• preserves individuals' privacy when the data is sensitive: medical data, web search data, social networks.

Differential privacy
A rigorous notion of privacy that allows statistical analysis of sensitive data while providing strong privacy guarantees.

Result 1
We present a differentially private algorithm for submodular maximization and:
• prove that our algorithm returns a solution with quality at least (1 − 1/e)·OPT minus a small additive error,
• prove that our algorithm preserves privacy,
• improve the number of function evaluations via a sampling technique while still preserving privacy.

Result 2 (generalization of submodularity)
We present the first differentially private algorithm for k-submodular maximization and:
• prove that our algorithm returns a solution with quality at least (1/2)·OPT minus a small additive error,
• prove that our algorithm preserves privacy,
• reduce the number of function evaluations to almost linear via a sampling technique while preserving privacy.