Machine Learning for Compressive Privacy
Professor S.Y. Kung, Princeton University
Email: kung@princeton.edu
Reference: Kernel Methods and Machine Learning, S.Y. Kung (Cambridge University Press)


  1. Visualization for a Supervised Learning Dataset
  • PCA: unsupervised learning, recoverability. Maximize the retained power of the scatter (covariance) matrix: trace(U^T S U) = ∑_{i=1}^{M} p_i = ∑_{i=1}^{M} λ_i.
  • DCA: supervised learning, classification.
  • Recoverability (visualization): minimize the RE (mean-square error) to show the original image.
  • Anti-recoverability (privacy): hide the original image; the noise subspace in the EVS.
  • DD ("Discriminant Distance") in the CVS: the signal subspace in the CVS enables classification.
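
A minimal numpy sketch of the recoverability/power criterion above (toy data and all variable names are mine, not from the slides): the power retained by the p principal eigenvectors equals the sum of the p largest eigenvalues of the scatter matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # toy data: 200 samples x 10 features
S = np.cov(X, rowvar=False)               # scatter (covariance) matrix
lam, V = np.linalg.eigh(S)                # eigenvalues in ascending order
U = V[:, ::-1][:, :3]                     # p = 3 principal eigenvectors
retained_power = np.trace(U.T @ S @ U)    # power retained by the projection
assert np.isclose(retained_power, lam[::-1][:3].sum())
```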

  2. For a supervised (i.e. labeled) dataset with L classes, the scatter matrix splits into signal and noise parts:
  S = S_B + S_W = Signal Matrix + Noise Matrix
  • Signal Matrix = S_B = Between-Class Matrix (beneficiary)
  • Noise Matrix = S_W = Within-Class Matrix (detrimental)
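
A hedged sketch of how the between-class (signal) and within-class (noise) scatter matrices could be computed from a labeled dataset; the function name and conventions are illustrative, not from the slides.

```python
import numpy as np

def scatter_matrices(X, y):
    """Split the total scatter of X (N samples x M features) into a
    between-class (signal) part S_B and a within-class (noise) part S_W."""
    mu = X.mean(axis=0)                      # global mean
    M = X.shape[1]
    S_B = np.zeros((M, M))                   # between-class (signal) matrix
    S_W = np.zeros((M, M))                   # within-class (noise) matrix
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        S_B += len(Xc) * d @ d.T             # weighted class-mean deviation
        S_W += (Xc - mu_c).T @ (Xc - mu_c)   # scatter around the class mean
    return S_B, S_W                          # total scatter S = S_B + S_W
```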

  3. Fisher's Discriminant Analysis (LDA)
  • Linear Discriminant Analysis (LDA) maximizes the signal-to-noise ratio
  SNR = s/n = (w^T S_B w) / (w^T S_W w),
  i.e. it enhances the (projected) S_B and suppresses the (projected) S_W.
  • DCA optimizes the sum of SNRs = DD²:
  SumSNR = ∑_{i=1}^{m} s_i/n_i = ∑_{i=1}^{m} (w_i^T S_B w_i) / (w_i^T S_W w_i) ≡ DD²
  • Discriminant Distance (DD) ⇔ Discriminant Power (DP):
  DP ≡ P'(W) ≡ ∑_{i=1}^{m} p_i/n_i = ∑_{i=1}^{m} (s_i + n_i)/n_i = DD² + m
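
A small sketch of the LDA direction(s) maximizing the projected SNR, solved as the generalized symmetric eigenproblem S_B w = λ S_W w (assumes S_W is positive definite and reuses the scatter_matrices helper sketched above; not code from the slides).

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(S_B, S_W, m=1):
    """Return the m directions with the largest SNR = (w^T S_B w)/(w^T S_W w)."""
    lam, W = eigh(S_B, S_W)            # generalized eigenproblem, ascending order
    order = np.argsort(lam)[::-1]      # largest SNR first
    return lam[order[:m]], W[:, order[:m]]
```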

  4. DCA Optimization Criterion: Discriminant Power (DP)
  Component analysis ⇒ orthonormal bases ⇒ rotational invariance (visualization):
  • invariance of orthonormality
  • invariance of DP (redundancy)
  Euclidean orthonormality in the original EVS, W^T W = I: not rotationally invariant.
  Mahalanobis orthonormality = canonical orthonormality, W^T S_W W = I: rotationally invariant in the Canonical Vector Space (CVS), giving rotational invariance of DD/DP.

  5. DD in the CVS: DD² = Sum of SNRs
  SumSNR = ∑_{i=1}^{m} (w̃_i^T S̃_B w̃_i) / (w̃_i^T S̃_W w̃_i) = ∑_{i=1}^{m} w̃_i^T S̃_B w̃_i = DD²,
  since S̃_W = I in the CVS.
  Class-ceiling property: at most L-1 useful discriminant components (here about a 10:1 compression ratio).

  6. Algebraically, DCA and PCA are equivalent in the CVS!
  DP ≡ ∑_{i=1}^{m} p_i/n_i = ∑_{i=1}^{m} (w_i^T S w_i) / (w_i^T S_W w_i)
  Preservation of DP under the coordinate transformation (DCA):
  P'(W) = ∑_{i=1}^{m} (w_i^T S w_i) / (w_i^T S_W w_i) = ∑_{i=1}^{m} (w̃_i^T S̃ w̃_i) / (w̃_i^T I w̃_i)
  Under the normalization w̃_i^T w̃_i = 1,
  P'(W) = ∑_{i=1}^{m} w̃_i^T S̃ w̃_i = trace(W̃^T S̃ W̃) = P(W̃) ⇒ PCA.
  DP in the CVS = power in the CVS.

  7. Pictorially: Data Laundry ⇒ Canonical Vector Space (CVS)
  The canonical vector space is defined as the whitened space: x̃ = (S_W)^{-1/2} x.
  (Illustration: L = 3 classes, hence L-1 = 2 discriminant dimensions.)
  Pipeline: original space (EVS) → whitening → canonical space (CVS) → PCA in the whitened space → re-mapping back to the EVS, w = [S_W]^{-1/2 T} w̃.
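
A minimal sketch of the "data laundry" (whitening) step that maps the EVS into the CVS, assuming the within-class scatter S_W has already been estimated; the small eps regularizer is my own safeguard for near-singular S_W, not part of the slides.

```python
import numpy as np

def whiten(X, S_W, eps=1e-8):
    """Map data into the canonical vector space: x_tilde = S_W^{-1/2} x."""
    lam, V = np.linalg.eigh(S_W)
    S_W_inv_sqrt = V @ np.diag(1.0 / np.sqrt(lam + eps)) @ V.T
    X_tilde = X @ S_W_inv_sqrt            # whitened (laundered) data
    return X_tilde, S_W_inv_sqrt          # keep the mapping for re-mapping back
```

A direction w̃ found by PCA in the CVS is re-mapped to the original EVS as w = S_W_inv_sqrt @ w̃, matching the re-mapping formula above.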

  8. DCA = PCA in the CVS
  • Forward mapping: whiten the data, then apply PCA in the CVS.
  • Backward mapping: re-map the principal vectors to the EVS.
  • Direct EVS method: a trace-norm optimizer on the discriminant matrix; find the first m principal eigenvectors under the "canonical normality" constraint, for i = 1, 2, …, m.

  9. Roles of DCA Eigenvalues
  DCA maximizes the RDP and optimizes (minimizes or maximizes) the RE for recoverability or anti-recoverability, respectively.
  The principal eigenvalues correspond to the signal subspace.

  10. Simulation Results
  With the data laundry process, DCA (i.e. supervised PCA) far outperforms PCA.
  • Visualization: DCA vs. PCA
  • Dimension reduction (cost, power, storage, communication)
  • Prediction performance (classification accuracy in the prediction phase)

  11. HAR Dataset: L = 6 classes, M = 561 features; 200 runs with m = 5 components. DCA delivers well over a 10% accuracy gain over PCA. (Figure: RE and classification accuracy of DCA vs. PCA on the HAR dataset.)

  12. Visualization: PCA vs. DCA. (Figure panels: PCA components 1-2 and 3-4 before data laundry; DCA components 2-3 and 2-5 after data laundry, showing long inter-group distances.)

  13. Machine Learning for Compressive Privacy. iii. Compressive Privacy in the Brandeis Program
  In the internet era, we benefit greatly from the combination of packet switching, bandwidth, and the processing and storage capacities of the cloud. However, "big data" often carries a connotation of "big brother": because the data collected on consumers like us is growing exponentially, attacks on our privacy are becoming a real threat. New technologies are needed to better assure privacy protection when we upload personal data to the cloud. An important development is Discriminant Component Analysis (DCA), which offers a compression scheme to enhance privacy protection in contextual and collaborative learning environments. DCA can be viewed as a supervised PCA that can simultaneously rank-order (1) the sensitive components and (2) the desensitized components.

  14. Cloud Computing: Data Center
  Driven by wireless, internet, and parallel processing technologies, cloud computing offers remotely hosted application logic units, data stores, and a diversity of application resources.
  • It offers data processing services ubiquitously, i.e. at any time, anywhere, and for anyone.
  • It manages the server farm, supports extensive databases and vast storage space, and is ready to lease out, on demand from clients, a variable number of machines.
  • It holds the promise of elastic hosting, offering application domains for lease to clients.
  The main problems of cloud computing lie in communication cost and privacy protection. Big data nowadays carries a connotation of Big Brother.

  15. Google's Data Centers
  Colorful water pipes cool one of Google's data centers. The company sells user data to advertisers for targeted marketing campaigns. (Photo: Connie Zhou/Google)

  16. Machine Learning Approach to Compressive Privacy
  With rapidly growing internet commerce, much of our daily activity is moving online, and an abundance of personal information (such as sales transactions) is being collected, stored, and circulated around the internet and cloud servers, often without the owner's knowledge. This raises an imminent concern about the protection and safety of sensitive and private data, i.e. "online privacy", also known as "internet privacy" or "cloud privacy". This course presents machine learning methods useful for internet privacy-preserving data mining, which has recently received a great deal of attention in the IT community.

  17. Why the DARPA Brandeis Program?
  When the control of data protection is left entirely to the cloud server, data privacy unfortunately becomes vulnerable to hacker attacks or unauthorized leakage. It is therefore much safer to keep the control of data protection solely in the hands of the data owner and not take chances with the cloud.

  18. Public vs. Private Spaces
  The analysis of PP technology requires a clear separation of two spaces: the public sphere (space) and the private sphere (space).

  19. Encryption for Privacy Protection
  • Public space (cloud): cloud server and trusted authority; encrypted data (decrypted data accessible only to the trusted authority).
  • Private space (client): data owners.

  20. Why Not Encryption?
  To ensure protection via encryption, during processing in the "public sphere" the input data of each owner and any intermediate results are revealed/decrypted only to the trusted authority. Nevertheless, there remains a substantial chance of hacker attack or unauthorized leakage when data protection is left entirely in the hands, and at the mercy, of the cloud server.
  Theme of the Brandeis Program: control of data protection should be returned to the data owner rather than left at the mercy of the cloud server.

  21. The Data Owner Should Have Control over the Data
  From the data privacy perspective, the accessibility of data is divided into two separate spheres: (1) the private sphere, where data owners generate and process decrypted data; and (2) the public sphere, where cloud servers can generally access only encrypted data, with the exception that the trusted authority may access decrypted data confidentially. When the control of data protection is left entirely to the cloud server, however, the data become vulnerable to hacker attacks or unauthorized leakage. It is safer to let the data owner control data privacy and not take chances with the cloud servers. To this end, we must design privacy-preserving information systems so that the shared/pushed data are useful only for the intended utility and not easily diverted for malicious privacy intrusion.

  22. DARPA Brandeis Program on Internet Privacy (IP): US$60M over 4.5 years
  Build information systems so that the shared data can be
  • effective and relevant for the intended utility (e.g. classification)
  • but not easily diverted to other purposes (e.g. privacy intrusion).

  23. Brandeis Program Structure (organization chart)
  Technical areas: TA1: Privacy Technologies; TA3: Experimental Prototype; TA3: Human Data Interaction; TA4: Measuring Privacy; Brandeis Mobile CRT.
  Performers include Raytheon BBN, Invincea Systems, CMU, Tel Aviv U, Cybernetica, U of Tartu, UC Berkeley, Stealth Software Technologies, MIT, Cornell, U MD, Iowa State U, and Princeton.

  24. In an emergency such as a bomb threat, many mobile images from various sources may be voluntarily pushed to the command center for wide-scale forensic analysis. In this case, CP may be used to compute a dimension-reduced feature subspace that can (1) effectively identify the suspect(s) and (2) adequately obfuscate the face images of the innocent.

  25. Mobile Information Flow
  The control center makes a request for images near an incident. The user's phone responds with some incident-relevant images, per the user's privacy policy. (Diagram: User, Application, Privacy Policy, PE, Android.)
  Example: location = utility; face = privacy.

  26. More on Mobile PP Applications
  "The Android platform provides several sensors that let you monitor the motion of a device. Two of these sensors are always hardware-based (the accelerometer and gyroscope)" (most devices have both a gyroscope and an accelerometer) "and three of these sensors can be either hardware-based or software-based (the gravity, linear acceleration, and rotation vector sensors)."
  Adapted from C.C. Liu, et al.

  27. More on Mobile PP Applications
  D (data) = Activity (B/L), Location (W/L), Tabs (B/L)
  app (M) = speech, motion, ID, password
  Adapted from C.C. Liu, et al.

  28. Machine Learning Approach to Privacy Protection of Internet/Cloud Data
  Objective: explore information systems that simultaneously perform
  • Utility space maximization: deliver the intended data mining, classification, and learning tasks.
  • Privacy space minimization: safeguard personal/private information.
  We need to develop new methods to jointly optimize the two design considerations P & U: Privacy Protection and Utility Maximization.

  29. CP involves joint optimization over three design spaces: (i) the feature space, (ii) the utility subspace, and (iii) the cost subspace (i.e. the privacy subspace).

  30. Collaborative Learning for PP
  USC's Pickle: a collaborative learning model built for MFCC-based speaker recognition (widely used in acoustic mobile applications) to enable speaker recognition without revealing the speech content itself.
  Pipeline (diagram): private data → compression (PCA) → random noise → random transform → public data.
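
A rough sketch of the kind of owner-side perturbation pipeline suggested by the diagram (PCA compression, additive random noise, then a secret random transform). This is illustrative only and is not the published Pickle protocol; all names and parameters are assumptions.

```python
import numpy as np

def perturb_features(X, m=20, noise_std=0.1, rng=None):
    """Compress private feature vectors with PCA, then add random noise and a
    secret random transform before releasing them as 'public' data."""
    if rng is None:
        rng = np.random.default_rng()
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:m].T                            # PCA compression to m dimensions
    Z = Z + rng.normal(0.0, noise_std, Z.shape)  # additive random noise
    R = rng.normal(size=(m, m))                  # secret random transform (kept private)
    return Z @ R
```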

  31. Collaborative Learning for PP
  • Single-user vs. multi-user environments
  • Centralized vs. distributed processing
  • Supervised vs. unsupervised learning

  32. Example of CUEP
  • CU: classification for utility. It enables classification of face/speech data (utility): B vs. B.
  • EP: estimation for privacy. It protects the private information (e.g. the face image or speech content) from malicious cloud users.

  33. CUEP Example: a classification formulation for utility but an estimation formulation for privacy. (Figure: original data, classification, personality/privacy, masked data.)

  34.
  • Low alert level:
  • High alert level:
  For example, in the US (per the EFF), systems that stream data from surveillance cameras (and, in the future, mobile cameras) can very well reveal where and when your car has been.

  35. Discriminant Component Analysis (DCA)
  Discriminant Component Analysis (DCA) offers a compression scheme to enhance privacy protection in contextual and collaborative learning environments. DCA has (a) its classification goal characterized by the discriminant distance and (b) its privacy components controlled by a ridge parameter. Therefore, DCA is a promising algorithmic tool for CP.

  36. ρ' = 0.00001 ≈ 0
  This is not good enough, as we want to rank-order S_U and N_U, and to rank-order S_P and N_P.

  37. ρ' = 0.00001. (Figure: noise-subspace eigenfaces.)

  38. Doubly-Ridged DCA
  Regulated Discriminant Power (RDP), optimized via a trace-norm optimizer:
  • ρ: ridge parameter for the noise matrix
  • ρ': ridge parameter for the signal matrix

  39. Algorithm: DCA Learning Model
  • Compute the discriminant matrix: maximizing the RE for anti-reconstruction and maximizing the component power for utility maximization.
  • Perform the eigenvalue decomposition under the canonical normality constraint.
  • The optimal DCA projection matrix is formed from the resulting principal eigenvectors.
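
A hedged sketch of the doubly-ridged DCA learning step described on the last two slides: build a ridge-regularized discriminant matrix and take its principal generalized eigenvectors under the canonical-normality constraint. The exact placement of the ridge terms is an assumption on my part.

```python
import numpy as np
from scipy.linalg import eigh

def dca_projection(S_bar, S_W, m, rho=1e-3, rho_prime=0.0):
    """Doubly-ridged DCA sketch: solve (S_bar + rho' I) w = lam (S_W + rho I) w
    and keep the m principal eigenvectors. scipy's eigh normalizes the
    eigenvectors so that W^T (S_W + rho I) W = I (canonical normality)."""
    d = S_bar.shape[0]
    A = S_bar + rho_prime * np.eye(d)      # rho': ridge on the signal side
    B = S_W + rho * np.eye(d)              # rho: ridge on the noise side
    lam, W = eigh(A, B)                    # generalized eigendecomposition (ascending)
    order = np.argsort(lam)[::-1]
    return W[:, order[:m]], lam[order[:m]]
```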

  40. Dual Roles of P-DCA Eigenvalues
  P-DCA simultaneously rank-orders two subspaces:
  • principal eigenvalues for the private subspace;
  • minor eigenvalues for the privatized subspace.

  41. Example: Yale Face Dataset
  • The privacy space is characterized by a 15-category classification formulation, while
  • the utility aims at utility-oriented classification (e.g. wearing eyeglasses or not).

  42. ρ' = -0.05 vs. ρ' = 0 (comparison figure).

  43. ρ' = -0.05. (Figures: private eigenfaces and PU (privatized/utilizable) eigenfaces.)

  44. Without rank ordering (ρ' = +0.00001) < with rank ordering (ρ' = -0.05).

  45. Machine Learning for Compressive Privacy. iv. Differential Utility/Cost Advantage (DUCA)
  We shall explore joint optimization over three design spaces: (a) the feature space, (b) the classification space, and (c) the privacy space. This prompts a new paradigm called DUCA, which explores information systems that simultaneously perform utility space maximization (deliver the intended data mining, classification, and learning tasks) and privacy space minimization (safeguard personal/private information).

  46. DUCA vs. DCA vs. PCA (diagram): DUCA ≠ DCA.

  47. Our (CP) approach involves joint optimization over three design spaces: (i) the feature space, (ii) the utility subspace, and (iii) the cost subspace (i.e. the privacy subspace).

  48. Entropy and Venn Diagram

  49. Differential Privacy

  50. Compressive Privacy (CP)
  CP enables the data owner to "encrypt" messages using a privacy-information-lossy transformation, e.g.
  • dimension reduction (subspace projection)
  • feature selection
  hence preserving the data owner's privacy while retaining the capability to facilitate the intended classification purpose.
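
A tiny sketch of why a dimension-reducing projection is "privacy-information-lossy": with m < M the owner's vector cannot be recovered exactly from the shared projection. The projection here is random purely for illustration; in CP it would be a learned (e.g. DCA) matrix.

```python
import numpy as np

M, m = 50, 5
rng = np.random.default_rng(1)
W = rng.normal(size=(M, m))               # rank-m projection (stand-in for a DCA matrix)
x = rng.normal(size=M)                    # private feature vector (owner side)
z = W.T @ x                               # compressed view shared with the cloud
x_hat = np.linalg.pinv(W.T) @ z           # best least-squares guess from z alone
print(np.linalg.norm(x - x_hat))          # nonzero in general: x is not recoverable
```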

  51. CS Theoretical Foundations
  • Statistical Analysis
  • Subspace Optimization
  • Information Theory
  • Estimation Theory
  • Machine Learning

  52. Entropy and Covariance Matrix

  53. Double Income Problem (DIP): utility vs. privacy.

  54. Compressive Privacy

  55. Estimation Theory and Compressive Privacy
  Gauss-Markov (statistical estimation) theorem; the (ridge-)regularized least-squares criterion:
  argmin_w ( ||X^T w - y||² + ρ ||w||² )
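
The regularized least-squares criterion above has the familiar closed-form solution; here is a minimal sketch (the data layout, with one feature vector per column of X, and the variable names are mine).

```python
import numpy as np

def ridge_solution(X, y, rho=1e-2):
    """Closed form of argmin_w ||X^T w - y||^2 + rho * ||w||^2,
    where X is M x N with one feature vector per column."""
    M = X.shape[0]
    return np.linalg.solve(X @ X.T + rho * np.eye(M), X @ y)
```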

  56. Machine Learning and Compressive Privacy

  57. For CP problems, there are two types of teacher values in the dataset: one for the utility labels and one for the privacy labels. This implies that there are two types of between-class scatter matrices, denoted here by S_U for the utility labels and S_P for the privacy labels.

  58. DUCA for Supervised Machine Learning

  59. Generalized Eigenvalue Problem
  Again, the optimal queries can be directly derived from the principal eigenvectors of the corresponding generalized eigenvalue problem. Note that there are only L + C - 2 meaningful eigenvectors, where L and C denote the numbers of utility and privacy labels, respectively.
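
A heavily hedged sketch of DUCA-style query extraction via a generalized eigenproblem built from the utility and privacy scatter matrices. The exact discriminant matrix did not survive extraction, so the particular combination used here (utility signal over privacy signal plus noise, with small ridges) is an assumption, not the slide's formula.

```python
import numpy as np
from scipy.linalg import eigh

def duca_queries(S_U, S_P, S_W, m, rho=1e-3):
    """Illustrative DUCA sketch: favor directions with high utility
    discriminability (S_U) and low privacy discriminability (S_P)."""
    d = S_W.shape[0]
    A = S_U + rho * np.eye(d)              # utility between-class scatter (ridged)
    B = S_P + S_W + rho * np.eye(d)        # privacy between-class scatter plus noise
    lam, G = eigh(A, B)                    # generalized eigendecomposition (ascending)
    order = np.argsort(lam)[::-1]
    return G[:, order[:m]]                 # the m optimal "queries"
```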

  60. DUCA is a generalization of its predecessor, DCA, which was designed for utility-only machine learning applications.
  argmax t/(t+o)    argmax t/o    argmax (t+o)/o
  Naturally, just like DCA, there are ways for DUCA to extract additional meaningful queries.

  61. With the utility/privacy class labels, "H/M/L" denotes the three (High/Middle/Low) utility classes (i.e. family income) and "+/-" denotes the two privacy classes (i.e. who earns more between the couple).

  62. The UAM and CAM can be learned from the given dataset and their respective class labels.
  Parallel extraction of multiple queries: the two principal eigenvectors of the generalized eigenvalue problem can then be computed as
  g_1 = [-0.14, -0.87, 0.68, -0.44]^T,  g_2 = [-0.002, -0.11, -0.17, 0.72]^T

  63. The two-dimensional visualization of the DUCA-CP subspace, in which the family income classes (H/M/L) are highly separable but the income disparity categories (+/-) are not. For example, the query shown here obviously belongs to the middle class (utility), but its disparity (privacy) remains unknown, i.e. protected.
