Machine Learning for Signal Processing: Supervised Representations


1. Machine Learning for Signal Processing: Supervised Representations. Class 19, 8 Nov 2016. Bhiksha Raj. Slides by Najim Dehak.

2. Definitions: Variance and Covariance
• Variance: S_XX = E(XX^T), estimated as S_XX = (1/N) XX^T
  – How "spread out" the data is in the direction of X
  – Scalar version: σ_x^2 = E(x^2)
• Covariance: S_XY = E(XY^T), estimated as S_XY = (1/N) XY^T
  – How much X predicts Y
  – Scalar version: σ_xy = E(xy)
• [Figure: scatter plot illustrating σ_xy > 0, i.e. y tends to increase with x]
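
A minimal numpy sketch of these estimators (assuming zero-mean data with samples as columns, matching the definitions above; the toy data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
X = rng.standard_normal((3, N))                        # 3-dim variable, N samples as columns
Y = 2.0 * X[:2] + 0.1 * rng.standard_normal((2, N))    # a 2-dim variable correlated with X

# Sample estimates S_XX = (1/N) X X^T and S_XY = (1/N) X Y^T (data assumed centered)
S_XX = X @ X.T / N
S_XY = X @ Y.T / N
print(S_XX.shape, S_XY.shape)                          # (3, 3) (3, 2)
```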

3. Definition: Whitening Matrix
• Z = Σ_XX^{-0.5} (X − X̄); if X is already centered, Z = Σ_XX^{-0.5} X
• The transform maps the distribution P(X) to a whitened P(Z)
• Whitening matrix: Σ_XX^{-0.5}
• Transforms the variable to unit variance
• Scalar version: σ_x^{-1}
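
A small sketch of the whitening transform; the matrix square root here is computed with an eigendecomposition, which is one common choice (the slide does not prescribe a particular one):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]]) @ rng.standard_normal((3, 2000))   # correlated data
X = X - X.mean(axis=1, keepdims=True)                               # center first

S_XX = X @ X.T / X.shape[1]
vals, vecs = np.linalg.eigh(S_XX)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T    # Sigma_XX^{-0.5}, the whitening matrix

Z = W @ X
print(np.round(Z @ Z.T / Z.shape[1], 2))     # ~ identity: unit variance, uncorrelated
```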

4. Definition: Correlation Coefficient
• [Figure: scatter of y/σ_y against x/σ_x; σ_xy > 0 ⇒ ρ > 0]
• Matrix version: Σ_XX^{-0.5} Σ_XY Σ_YY^{-0.5}
• Scalar version: ρ_xy = σ_xy / (σ_x σ_y)
  – Explains how Y varies with X, after normalizing out the innate variation of X and Y
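
A brief sketch of both versions (the helper names inv_sqrt and corr_matrix are illustrative, not from the slides):

```python
import numpy as np

def inv_sqrt(S):
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def corr_matrix(X, Y):
    """Sigma_XX^{-0.5} Sigma_XY Sigma_YY^{-0.5} for centered X, Y (samples in columns)."""
    N = X.shape[1]
    return inv_sqrt(X @ X.T / N) @ (X @ Y.T / N) @ inv_sqrt(Y @ Y.T / N)

# Scalar version: rho_xy = sigma_xy / (sigma_x * sigma_y)
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = 0.8 * x + 0.6 * rng.standard_normal(5000)
print(round(np.mean(x * y) / (x.std() * y.std()), 2))   # close to 0.8
```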

5. MLSP
• Application of machine learning techniques to the analysis of signals
• [Pipeline: Channel → Sensor → Signal Capture → Feature Extraction → Modeling/Regression, guided by External Knowledge]
• Feature Extraction:
  – Supervised (Guided) representation

6. Data-specific bases?
• Issue: the bases we have considered so far are data-agnostic
  – Fourier / wavelet type bases may not be optimal for all data
• Improvement I: the bases we saw next were data-specific
  – PCA, NMF, ICA, ...
  – The bases changed depending on the data
• Improvement II: what if bases are both data-specific and task-specific?
  – The basis depends on both the data and a task

7. Recall: Unsupervised Basis Learning
• What is a good basis?
  – Energy compaction → Karhunen-Loève
  – Uncorrelated → PCA
  – Sparsity → sparse representation, compressed sensing, ...
  – Statistically independent → ICA
• We create a narrative about how the data are created

8. Supervised Basis Learning?
• What is a good basis?
  – A basis that gives the best classification performance
  – A basis that maximizes shared information with another 'view'
• We have some external information guiding our notion of an optimal basis
  – Can we learn a basis for a set of variables that will best predict some value(s)?

9. Regression
• Simplest case
  – Given a collection of scalar data points, predict some value
  – Example: years are the independent variable, temperature the dependent variable

10. Regression • Formulation of the problem • Let's solve!

11. Regression • Expand out the Frobenius norm • Take the derivative • Set it to 0 and solve
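
The equations on slides 10-12 did not survive extraction; the following is a standard least-squares derivation consistent with the linear model Y_i = A X_i used later in the deck (stacking the samples as columns of X and Y is assumed):

  E = ||Y − AX||_F^2 = trace((Y − AX)(Y − AX)^T)
  dE/dA = −2 (Y − AX) X^T
  Setting the derivative to 0:  Y X^T = A X X^T  ⇒  A = Y X^T (X X^T)^{-1}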

12. Regression
• This is basically just least squares again
• Note that the solution A = Y X^T (X X^T)^{-1} = S_YX S_XX^{-1} looks a lot like the correlation coefficient
  – In the 1-d case where x predicts y this is just a = σ_xy / σ_x^2
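
A minimal numpy sketch of this closed-form solution on synthetic data, cross-checked against np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = rng.standard_normal((2, 4))
X = rng.standard_normal((4, 500))
Y = A_true @ X + 0.05 * rng.standard_normal((2, 500))

A_hat = Y @ X.T @ np.linalg.inv(X @ X.T)                 # A = Y X^T (X X^T)^{-1}
A_lstsq = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T     # same fit via lstsq

print(np.allclose(A_hat, A_lstsq))                       # True
print(np.round(np.abs(A_hat - A_true).max(), 3))         # small estimation error
```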

13. Multiple Regression
• Robot Archer Example
  – Our robot fires defective arrows at a target
  – We don't know how wind might affect their movement, but we'd like to correct for it if possible
  – Predict the distance from the center of the target for a fired arrow
• Measure wind speed in 3 directions: X_i = [1, w_x, w_y, w_z]^T

14. Multiple Regression
• Wind speed: X_i = [1, w_x, w_y, w_z]^T
• Offset from center in 2 directions: Y_i = [o_x, o_y]^T
• Model: Y_i = A X_i

15. Multiple Regression
• Answer: the same least-squares solution, A = Y X^T (X X^T)^{-1}
  – Here Y contains the measurements of the arrow's distance from the center
  – We are fitting a plane
  – The correlation is basically just the gradient of that plane
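
A small sketch of the archer example with simulated data; the true wind-to-offset matrix and the noise level below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
wind = rng.standard_normal((3, N))                       # w_x, w_y, w_z for each shot
X = np.vstack([np.ones((1, N)), wind])                   # X_i = [1, w_x, w_y, w_z]^T

A_true = np.array([[ 0.5, 1.0, -0.3,  0.2],              # hypothetical wind -> offset map
                   [-0.2, 0.4,  0.8, -0.5]])
Y = A_true @ X + 0.1 * rng.standard_normal((2, N))       # offsets (o_x, o_y)

A_hat = Y @ X.T @ np.linalg.inv(X @ X.T)                 # fit the plane
print(np.round(A_hat - A_true, 2))                       # residual errors are small
```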

16. Canonical Correlation Analysis
• Further generalization (CCA)
  – Do all wind factors affect the position, or just some low-dimensional combination X̃ = AX?
  – Do they affect both coordinates individually, or just some combination Ỹ = BY?

17. Canonical Correlation Analysis
• Let's call the arrow location vector Y and the wind vector X
  – Let's find the projections of Y and X that are most correlated
• [Figure: wind samples plotted on axes w_x, w_y, w_z; the best X projection plane predicts the best Y projection]

18. Canonical Correlation Analysis
• What do these vectors represent?
  – The direction of maximum correlation ignores the parts of the wind and location data that do not affect each other
• Only information about the defective arrow remains!

19. CCA Motivation and History
• Proposed by Hotelling (1936)
• Many real-world problems involve 2 'views' of data
• Economics
  – Consumption of wheat is related to the price of potatoes, rice, and barley ... and wheat
  – Random vector of prices X
  – Random vector of consumption Y

20. CCA Motivation and History
• Magnus Borga and David Hardoon popularized CCA as a technique in signal processing and machine learning
• Better for dimensionality reduction in many cases

21. CCA Dimensionality Reduction
• We keep only the correlated subspace
• Is this always good?
  – If we have measured the things we care about, then what we discard is useless information

22. CCA Dimensionality Reduction
• In this case:
  – CCA found a basis component that preserved class distinctions while reducing dimensionality
  – It was able to preserve the class structure in both views

23. Comparison to PCA • PCA fails to preserve the class distinctions as well

24. Failure of PCA
• PCA is unsupervised
  – It captures the direction of greatest variance (energy)
  – It has no notion of the task, and hence no notion of what is good or bad information
  – The direction of greatest variance can sometimes be noise
  – OK for reconstructing the signal
  – Catastrophic for preserving class information in some cases

25. Benefits of CCA
• Why did CCA work?
  – Soft supervision: external knowledge
  – The 2 views track each other in a direction that does not correspond to noise
  – Noise suppression (sometimes)
• Preview
  – If one of the sets of signals is the true labels, CCA is equivalent to Linear Discriminant Analysis
  – Hard supervision

26. Multiview Assumption
• When does CCA work?
  – The correlated subspace must actually contain interesting signal
• If the two views have correlated noise, then we will learn a bad representation
• Sometimes the correlated subspace can be noise
  – Correlated noise in both sets of views

27. Multiview Assumption
• Why not just concatenate both views?
  – It does not exploit the extra structure of the signal (more on this in 2 slides)
• PCA on the joint data will decorrelate all variables
  – Not good for prediction
• We want to decorrelate within X and within Y, but maximize the cross-correlation between X and Y
  – High dimensionality → over-fitting

28. Multiview Assumption
• We can loosely think of a model for how our data might be generated: View 1 ← Source → View 2
• We want View 1 to be independent of View 2 conditioned on knowledge of the source
  – All correlation is due to the source

29. Multiview Examples
• Look at many stocks from different sectors of the economy
  – Conditioned on the fact that they are part of the same economy, they might be independent of one another
• Multiple speakers saying the same sentence
  – The sentence generates signals from many speakers; each speaker might be independent of the others conditioned on the sentence

30. Multiview Examples
• [Figure: overview of multiview learning, http://mlg.postech.ac.kr/static/research/multiview_overview.png]

31. Matrix Representation
• Expressing the total error as a matrix operation:
  E = Σ_i ||X_i − Y_i||^2,  with X = [X_1, X_2, ..., X_N], Y = [Y_1, Y_2, ..., Y_N]
  ||X_i||^2 = X_i^T X_i  ⇒  Σ_i ||X_i||^2 = trace(XX^T) = ||X||_F^2
  E = ||X − Y||_F^2 = trace((X − Y)(X − Y)^T)
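
A quick numerical check of the trace identity (toy data only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 50))
Y = rng.standard_normal((4, 50))
D = X - Y

err_sum   = sum(np.linalg.norm(X[:, i] - Y[:, i]) ** 2 for i in range(X.shape[1]))
err_frob  = np.linalg.norm(D, 'fro') ** 2
err_trace = np.trace(D @ D.T)
print(np.allclose([err_sum, err_frob], err_trace))       # True
```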

32. Recall: Objective Functions
• Least squares
• What is a good basis?
  – Energy compaction → Karhunen-Loève
  – Positive and sparse → NMF
  – Regression

33. A Quick Review • Cross covariance

34. A Quick Review
• The effect of a transform Z = UX on the covariance:
  C_XX = E[XX^T]
  C_ZZ = E[ZZ^T] = U C_XX U^T
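
A brief check of this identity on sample covariances (toy data; any linear transform U works):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2000))
U = rng.standard_normal((2, 3))              # an arbitrary linear transform
Z = U @ X

C_XX = X @ X.T / X.shape[1]
C_ZZ = Z @ Z.T / Z.shape[1]
print(np.allclose(C_ZZ, U @ C_XX @ U.T))     # True: the covariance transforms as U C_XX U^T
```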

35. Recall: Objective Functions
• So far our objectives have needed no external data
  – No knowledge of the task
  argmin_{Z ∈ ℝ^{k×N}} ||X − UZ||_F^2   s.t. U ∈ ℝ^{d×k}, rank(U) = k
• CCA requires an extra view
  – We force both views to look like each other
  min_{U ∈ ℝ^{dx×k}, V ∈ ℝ^{dy×k}} ||U^T X − V^T Y||_F^2   s.t. U^T C_XX U = I_k, V^T C_YY V = I_k

36. Interpreting the CCA Objective
• Minimize the reconstruction error between the projections of the two views of the data
• Find the subspaces U, V onto which we project views X and Y such that their correlation is maximized
• Find combinations of both views that best predict each other
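
A minimal sketch of linear CCA consistent with this objective: whiten each view, then take an SVD of the cross-covariance of the whitened views. This is one standard way to satisfy the U^T C_XX U = I and V^T C_YY V = I constraints; the small ridge term eps is an added assumption for numerical stability, not something from the slides:

```python
import numpy as np

def cca(X, Y, k, eps=1e-6):
    """Return U (dx x k), V (dy x k) and the top-k canonical correlations.
    X: dx x N, Y: dy x N, columns are paired samples."""
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    N = X.shape[1]
    Cxx = X @ X.T / N + eps * np.eye(X.shape[0])
    Cyy = Y @ Y.T / N + eps * np.eye(Y.shape[0])
    Cxy = X @ Y.T / N

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    A, corrs, Bt = np.linalg.svd(Wx @ Cxy @ Wy)    # correlations between whitened views
    U = Wx @ A[:, :k]                              # satisfies U^T Cxx U ~ I_k
    V = Wy @ Bt.T[:, :k]                           # satisfies V^T Cyy V ~ I_k
    return U, V, corrs[:k]

# Toy usage: two noisy views generated from a shared 2-d source
rng = np.random.default_rng(0)
S = rng.standard_normal((2, 1000))
X = rng.standard_normal((5, 2)) @ S + 0.5 * rng.standard_normal((5, 1000))
Y = rng.standard_normal((4, 2)) @ S + 0.5 * rng.standard_normal((4, 1000))
U, V, corrs = cca(X, Y, k=2)
print(np.round(corrs, 2))                          # high canonical correlations from the shared source
```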
