Collecting User's Data in a Socially-Responsible Manner

  1. “Collecting User's Data in a Socially-Responsible Manner.” Photograph: Daniel Beltra/Greenpeace. Josep M. Pujol, Konark Modi. @konarkmodi, @solso

  2. About Cliqz
     • Team size: 80+
     • Daily active users (DAU): 500,000
     • Downloads (Germany only): 3 million+
     • Indexed pages: 1 billion+ (We do not believe in indexing the web.)
     • Indexed in memory: 5 TB (based on open-source and in-house-built NoSQL stores)
     • Anti-phishing protection: 10x more coverage than other players such as Google Safe Browsing
     • Upcoming products such as anti-tracking

  3. About Cliqz

  4. We Love Data …

  5. Let's step back a bit in time, to get the context.

  6. “Data is the new oil” - Clive Humby (2006). Source: http://thehumanfaceofbigdata.com

  7. Is privacy the new green? Data is still being collected without enough controls and measures.

  8. Is privacy the new green? The biggest by-product of this collection is SESSIONS.

  9. How? [Diagram. Labels: Alice, Bob, Client-Side, Uncharted water, Server-Side, MAP/REDUCE :D (shown once).]

  10. Instead … [Diagram. Labels: Alice, Bob, Client-Side, Uncharted water, Server-Side, MAP/REDUCE :D (shown three times).]

  11. Who is responsible? Is there a conspiracy theory or an evil plan?

  12. Well, we have a simpler explanation: it is the consequence of common development practices, which result in users' data being traded, knowingly or unknowingly!

  13. Demo

  14. Does this look like a toy example?

  15. Let’s take a more complex case: which queries are so bad that they force people to redo the same query elsewhere?

  16. [Diagram, Client-Side: Alice issues the query "apache big data conf" on search engine 1 and again on search engine 2.]

  17. [Diagram: the two query events from slide 16 appear on both the Client-Side and the Server-Side, separated by "Uncharted water"; Map-Reduce is labelled once.]

  18. [Same diagram as slide 17.]

  19. [Diagram: as in slide 17, plus a further copy of the two query events with a second Map-Reduce label.]

  20. [Same diagram as slide 19.]

  21. As mentioned before, we believe in data and are not against its collection.
      • Stopping data collection altogether would be foolish and dangerous. It would also mean stopping the wheels of innovation.
      • Who would benefit the most from supporting a ban on advertisements for tobacco products?

  22. “Socially responsible manner” is an analogy: the events being collected must be free of pollutants such as explicit IDs and implicit IDs, and must reach home securely.

  23. Why does Cliqz care?

  24. German data privacy laws; security breaches; when the government knocks on your door.

  25. So what do we bring to the table?

  26. HUMAN WEB
      • We have developed HumanWeb to balance the right to privacy with the need to build products that improve the web and allow for more openness.
      • It ensures that data from which sessions or linkages to navigation patterns can be inferred is not collected.
      • It does not create so much data that individuals could be identified.
      • We do not want to know who "YOU" are, what "YOU" searched and when "YOU" searched.
      • It is designed so that a "malicious/untrustworthy" actor, or for that matter anyone at Cliqz, who gains access to the raw data flow cannot infer or identify individuals.

  27. Sample events:
      {
        "action"  : action of the message,
        "ver"     : version name,
        "type"    : "humanweb",
        "payload" : { },  // the actual data
        "ts"      : UTC time capped to the day, e.g. 20150909
      }
      • Sample event for Page
      • Sample event for Query
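The event layout above can be sketched in Python as follows. This is only an illustration of the listed fields and of capping the timestamp to the day; the function name `build_event` and the sample values `"page"` and `"1.0"` are assumptions, not taken from the talk.

```python
from datetime import datetime, timezone
import json


def build_event(action, payload, version, now=None):
    """Build a HumanWeb-style event whose timestamp is capped to the day."""
    now = now or datetime.now(timezone.utc)
    return {
        "action": action,
        "ver": version,
        "type": "humanweb",
        "payload": payload,
        # Only the UTC date is kept, so the time of day is never sent.
        "ts": now.astimezone(timezone.utc).strftime("%Y%m%d"),
    }


if __name__ == "__main__":
    # Hypothetical action and version values, for illustration only.
    sample_time = datetime(2015, 9, 9, 17, 42, tzinfo=timezone.utc)
    print(json.dumps(build_event("page", {}, "1.0", sample_time)))
```

Dropping the hour and minute at the source means that two events from the same day cannot be ordered or timed against each other on the server.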

  28. [Diagram, Client-side: HumanWeb Map-Reduce. Labels: Aggregations, Heuristics, Local storage (structural data about webpages); Filtering, Hashing; Sanitisation / Masking; Final checks, Filtering; Event Queue [{event1}, {event2}, {event3}] with a schedule to ensure events are not sent in batch; Secure Channel.]
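One way to read "schedule to ensure not sent in batch" is that every queued event gets its own send time. The sketch below assumes a random delay per event; the function names, the delay bound and the use of a heap are all hypothetical choices, not details from the talk.

```python
import heapq
import random


def schedule_events(events, now, max_delay_seconds, rng=random):
    """Give every queued event its own random send time instead of one batch."""
    schedule = []
    for index, event in enumerate(events):
        send_at = now + rng.uniform(0, max_delay_seconds)
        # index breaks ties so the heap never has to compare the events themselves
        heapq.heappush(schedule, (send_at, index, event))
    return schedule


def pop_due(schedule, now):
    """Return the events whose send time has passed, earliest first."""
    due = []
    while schedule and schedule[0][0] <= now:
        due.append(heapq.heappop(schedule)[2])
    return due
```

A caller would invoke `pop_due` periodically and send whatever it returns, one message at a time.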

  29. Privacy breaches on the way home: to achieve total privacy, we must rely on a network of proxies that remove any network-related data such as cookies, IP addresses and headers, so that fingerprinting is impossible.

  30. SecureChannel: Protection from network fingerprinting
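As a rough illustration of what such a proxy might do to a request before forwarding it, the sketch below drops a set of identifying headers. The header list and the function name are assumptions; the talk does not say which headers the proxies remove. The client IP address is hidden simply because the proxy forwards from its own address and adds no forwarding header.

```python
# Hypothetical list of request headers that could help fingerprint a client.
IDENTIFYING_HEADERS = {
    "cookie",
    "user-agent",
    "referer",
    "x-forwarded-for",
    "accept-language",
}


def strip_identifying_headers(headers):
    """Return a copy of the headers without the ones listed above."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
```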

  31. SecureChannel: What do we encrypt?
      • The queries from the user (initiated by them upon activity on the address bar of Cliqz’s instrumented Firefox).
      • All telemetry signals (initiated by Cliqz’s instrumented Firefox).
      • All messages regarding the HumanWeb data collection effort.
      Also, before reaching our infrastructure, the encrypted messages are routed through a mesh of proxies.

  32. SecureChannel: How do we encrypt?
      • Client side: 128-bit symmetric AES encryption, OpenSSL RSA 1024-bit encryption.
      • EventLogger: 128-bit symmetric AES encryption, OpenSSL RSA 4096-bit encryption.
      Life-cycle of hashes / keys:
      • AES: the hash-keys used with AES are used only once, even if the user types the same query.
      • Public / private key pair (client):
        • The keys on the client side are all short-lived; we continuously generate keys on the client side.
        • The public/private key pair of the client (the extension) is meant to be used only once and then thrown away. The key pairs are regenerated to fill a pool while the browser is idle.
      • Public / private key pair (server):
        • Only the public part of this key is shared with the extension.
        • The client uses it while encrypting the request. This is a long-lived key; currently it changes only if it is compromised.
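The single-use client key pool described above could look roughly like this. The class name, the pool size and the `fake_keypair` stand-in are hypothetical; in particular, `fake_keypair` returns random bytes and is not real RSA key generation.

```python
import secrets
from collections import deque


def fake_keypair():
    """Stand-in for RSA key generation; returns random bytes, NOT real keys."""
    return (secrets.token_bytes(16), secrets.token_bytes(16))


class SingleUseKeyPool:
    def __init__(self, size, generate=fake_keypair):
        self._size = size
        self._generate = generate
        self._pool = deque()

    def refill(self):
        """Call while the browser is idle: top the pool back up to its size."""
        while len(self._pool) < self._size:
            self._pool.append(self._generate())

    def take(self):
        """Hand out a key pair exactly once; it is removed from the pool."""
        if not self._pool:
            # Assumed fallback: generate on demand if the pool ran dry.
            self._pool.append(self._generate())
        return self._pool.popleft()

    def __len__(self):
        return len(self._pool)
```

Because `take` removes the pair it returns, no two requests can share a client key, which matches the "used only once and then thrown away" rule.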

  33. SecureChannel: How do we encrypt? (Extension)
      encryptedRequest(iv:encryptedMsg:encryptedKey)
        iv           : Initialization Vector
        msg          = (originalRequest + ExtensionPublicKey)
        key          = md5(msg)
        encryptedMsg = AES.encrypt(msg, key, {mode: CBC, padding: PKCS7, iv: iv})
        encryptedKey = sign(EventLoggerPublicKey, key)
      Each request to be encrypted has the following components:
      • Message / request to encrypt: query or data.
      • ExtensionPublicKey: chosen from a pool of public keys for that user on the machine; the key is used only once and then discarded.
      • Initialisation Vector: derived from wordarray of 16-bits.
      • EventLoggerPublicKey: our public key, shared with the extension.
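The sketch below shows only how the three fields of `iv:encryptedMsg:encryptedKey` could be derived and packed; it performs no encryption itself. The two callables `aes_encrypt` and `rsa_encrypt` are hypothetical hooks where a real AES-128-CBC and RSA implementation would go. The base64 encoding of each field, the random 16-byte IV and the plain byte concatenation of request and public key are also assumptions.

```python
import base64
import hashlib
import os


def build_envelope(original_request, extension_public_key, aes_encrypt, rsa_encrypt):
    """Assemble iv:encryptedMsg:encryptedKey from a request and a client key.

    aes_encrypt(msg, key, iv) and rsa_encrypt(key) are supplied by the caller;
    they are hypothetical hooks, not part of any specific library.
    """
    iv = os.urandom(16)                      # assumed: random IV of one AES block
    msg = original_request + extension_public_key
    key = hashlib.md5(msg).digest()          # 16 bytes, i.e. a 128-bit AES key
    encrypted_msg = aes_encrypt(msg, key, iv)
    encrypted_key = rsa_encrypt(key)
    fields = (iv, encrypted_msg, encrypted_key)
    # base64 never contains ":", so the separator stays unambiguous
    return ":".join(base64.b64encode(f).decode("ascii") for f in fields)


def parse_envelope(envelope):
    """Split the envelope back into its three raw byte fields."""
    iv, encrypted_msg, encrypted_key = (
        base64.b64decode(part) for part in envelope.split(":")
    )
    return iv, encrypted_msg, encrypted_key
```

Since the MD5 digest is exactly 128 bits long, it fits the 128-bit AES key size named on the previous slide, and because the client public key changes per request, the derived key changes even when the query text is the same.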

  34. SecureChannel: Routing? (Extension)
      • The extension maintains a list of proxies that are healthy / good at that point in time.
      • When sending a request / message, the extension picks the end-point in a round-robin fashion (round-robin for now).
      • To avoid the risk of proxies being malicious with the message, we scramble the message and split it into a random ‘n’ parts just before sending it from the extension.
      • The value of ‘n’ is determined by the extension; we expect ‘n’ to be 1, 2, 4 or 8 for the time being. The value of ‘n’ is not known to the proxies, so a proxy cannot tell whether it holds all the parts.
      • The only way to tamper with a message is to hold all the parts needed to decrypt it; because messages are scrambled, split and sent through different proxies, they are safe from the proxies.
      • The Event Logger waits for all the parts of a message; only by combining them can our Event Logger (secure) decrypt the message.
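A minimal sketch of the split-and-route step, under several assumptions: "scrambling" is modelled as shuffling the order of the parts, the parts are contiguous slices of near-equal size, and round-robin is applied per part. The function names and the proxy identifiers are hypothetical.

```python
import itertools
import random


def split_message(message, n):
    """Cut message into n contiguous parts of near-equal size."""
    size, extra = divmod(len(message), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(message[start:end])
        start = end
    return parts


def dispatch(message, proxies, rng=random):
    """Split into n parts, shuffle them, and pair each part with a proxy."""
    n = rng.choice([1, 2, 4, 8])            # the values of n named in the talk
    parts = split_message(message, n)
    rng.shuffle(parts)                      # assumed meaning of "scrambling"
    proxy_cycle = itertools.cycle(proxies)  # round-robin over healthy proxies
    return [(next(proxy_cycle), part) for part in parts]
```

For example, `dispatch(b"...", ["proxy-a", "proxy-b", "proxy-c"])` returns a list of `(proxy, part)` pairs; no part carries an index, so a single proxy cannot tell how many parts exist or where its part belongs.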

  35. SecureChannel: How do we decrypt? (Server)
      EncryptedRequest = iv:encryptedMsg:encryptedKey
      key = unlock(EventLoggerPrivateKey, encryptedKey)
      msg = AES.decrypt(encryptedMsg, key, {mode: CBC, padding: PKCS7, iv: iv})
      request = msg.data
      ExtensionPublicKey = msg.pk (we need it to sign the response)
      Important:
      • Because the server receives messages in parts, we rely on combinations to obtain the key and the message.
      • The message itself is scrambled, so even once it is decrypted we need to stitch it together by trying different combinations.
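"Trying different combinations" could be sketched as a search over orderings of the received parts. The sketch assumes the server has some digest to check a candidate against; using an MD5 digest of the whole message for that purpose is an assumption made here for illustration, as is the function name.

```python
import hashlib
from itertools import permutations


def reassemble(parts, expected_md5):
    """Try orderings of the parts until the joined bytes match the digest."""
    for ordering in permutations(parts):
        candidate = b"".join(ordering)
        if hashlib.md5(candidate).digest() == expected_md5:
            return candidate
    return None  # no ordering matched: a part is missing or was altered
```

With at most 8 parts there are at most 8! = 40,320 orderings to try, so an exhaustive search stays cheap.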

  36. All talk and no play makes Jack a dull boy! Demo

  37. Thank You. http://www.cliqz.com/en (photo: projectsecretidentity.org) We believe it’s possible; we are actually doing it.
