The Theory of Bringing Privacy into Practice, 2015

The Theory of Web Transparency: Algorithms and Trade-offs

Augustin Chaintreau (1), Guillaume Ducoffe (2, 3), Roxana Geambasu (1), Mathias Lécuyer (1)

(1) Columbia University
(2) Univ. Nice Sophia Antipolis, CNRS, I3S, UMR 7271, 06900 Sophia Antipolis, France
(3) Inria, France
What it is all about

• The opaque use of Big Data: myriads of personal data are collected and related to each other (tweets, emails, website visits, click history, ...).
• Potential abuses [Hannak et al., 2014; Acquisti and Fong, 2012]: online discrimination in advertising, hiring, pricing, ...
Objectives

• In this talk: targeting = online discrimination.

Inputs:
- a set of “sensitive” data;
- an advertisement (or prices, or a recommended product, ...).

Targeting Detection Problem: is the ad received because of some of the data?
Targeting Identification Problem: output the data that are targeted by the ad.
Our tool: Xray

Open source: https://github.com/matlecu/xray/

[Diagram: one or more data inputs (emails, searches, viewed products) → Web services → targeted outputs (ads, recommended products and videos); xRay (monitor, correlate) → associations (email→ad, viewed→recommend)]

→ How to find the associations?
Algorithmic settings

• Seeding accounts at random → random subsets of sensitive data.
• Collection of the ads seen by the “shadow accounts” → noisy oracle with (unknown) probabilities p_in, p_out of answering “yes”.

Intuition: targeting occurs ⇔ observations are not random.

Learning Problem: targeting ∼ monotone DNF formula over the data.
Input: a noisy oracle O to make membership queries.
Output: the (unknown) targeting function f.
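The setting above can be simulated in a few lines. This is a minimal sketch, not code from the xRay repository: the function names, the inclusion probability of 0.5, and the default noise values are all hypothetical, and it assumes p_in is the probability of a “yes” when the targeting function holds on the account and p_out the probability otherwise.

```python
import random


def make_targeting(conjunctions):
    """Build a monotone DNF: true iff the account contains every input
    of at least one conjunction (no negated inputs anywhere)."""
    terms = [frozenset(t) for t in conjunctions]

    def f(account):
        return any(term <= account for term in terms)

    return f


def seed_account(n_inputs, rng, p_assign=0.5):
    """Shadow account: each input is included independently with
    probability p_assign (hypothetical choice)."""
    return frozenset(i for i in range(n_inputs) if rng.random() < p_assign)


def noisy_oracle(f, account, rng, p_in=0.9, p_out=0.05):
    """Membership query: answers "yes" (the ad is shown) with probability
    p_in if f(account) holds and p_out otherwise (assumed semantics)."""
    p_yes = p_in if f(account) else p_out
    return rng.random() < p_yes


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ground truth: (input 0 AND input 1) OR input 3.
    f = make_targeting([{0, 1}, {3}])
    for _ in range(5):
        account = seed_account(6, rng)
        print(sorted(account), f(account), noisy_oracle(f, account, rng))
```

Monotonicity shows up in `make_targeting`: adding inputs to an account can only turn the formula from false to true, never the reverse.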
Main results

We seek algorithms:
- with a low query complexity (create as few accounts as possible);
- performing “efficient” computations (polynomial time);
- whose output is correct w.h.p. (≠ PAC).

Theorem (Lécuyer et al., 2014). Let N be the number of sensitive data. Under a monotonicity assumption on the targeting function f and a bounded-noise assumption (i.e., the ratio p_out / p_in is bounded), a.a.s. we can learn a targeting function of constant size with O(log N) query complexity in O(N · log N) time.
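To give a feel for these bounds, here is a hedged sketch of a simple frequency-comparison heuristic for the special case where the ad targets single inputs. It is not the algorithm behind the theorem and carries no correctness guarantee; the function names, the constant c, and the threshold are hypothetical. It only illustrates the shape of the costs: about c · log N accounts, and one pass over all accounts per input, hence O(N · log N) work.

```python
import math
import random


def n_accounts_for(n_inputs, c=40):
    """Number of shadow accounts, logarithmic in N (c is a hypothetical constant)."""
    return max(1, math.ceil(c * math.log2(n_inputs)))


def seed_accounts(n_inputs, n_accounts, rng):
    """Each account holds a uniformly random subset of the inputs."""
    return [
        frozenset(i for i in range(n_inputs) if rng.random() < 0.5)
        for _ in range(n_accounts)
    ]


def identify_single_input_targets(observations, n_inputs, threshold=0.5):
    """observations: list of (account, saw_ad) pairs.

    Flags input i when the ad is seen much more often in accounts that
    contain i than in accounts that do not (difference above threshold).
    """
    targeted = []
    for i in range(n_inputs):
        with_i = [saw for account, saw in observations if i in account]
        without_i = [saw for account, saw in observations if i not in account]
        if not with_i or not without_i:
            continue  # nothing to compare against for this input
        rate_in = sum(with_i) / len(with_i)
        rate_out = sum(without_i) / len(without_i)
        if rate_in - rate_out > threshold:
            targeted.append(i)
    return targeted


if __name__ == "__main__":
    rng = random.Random(1)
    n_inputs, target = 16, 3          # hypothetical ground truth: input 3
    p_in, p_out = 0.9, 0.05           # hypothetical noise levels
    accounts = seed_accounts(n_inputs, n_accounts_for(n_inputs), rng)
    observations = [
        (a, rng.random() < (p_in if target in a else p_out)) for a in accounts
    ]
    print(identify_single_input_targets(observations, n_inputs))
```

The threshold only makes sense when p_in is well above p_out, which is one informal reading of why a bounded-noise assumption is needed.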
Future work

• Ongoing extension of the model to take negative targeting into account.
• Price of Opacity → How hard is it for an advertiser to conceal her targeting?
  (Preliminary results) Increasing the noise (ratio p_out / p_in) by q makes the revenue of the advertiser decrease like 1/q.