Machine Learning for Signal Processing
Supervised Representations: Class 19
8 Nov 2016
Bhiksha Raj
Slides by Najim Dehak
Definitions: Variance and Covariance
(Figure: scatter plots of x vs. y illustrating positive covariance, σ_xy > 0)
• Variance: S_XX = E[XX^T], estimated as S_XX = (1/N) XX^T
  – How "spread out" the data are in the direction of X
  – Scalar version: σ_x² = E[x²]
• Covariance: S_XY = E[XY^T], estimated as S_XY = (1/N) XY^T
  – How much does X predict Y
  – Scalar version: σ_xy = E[xy]
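As a concrete illustration of the two estimators above, here is a minimal NumPy sketch (not from the slides; the synthetic data and variable names are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, centered data: columns of X (d_x x N) and Y (d_y x N) are paired samples
N = 1000
X = rng.standard_normal((3, N))
Y = 0.5 * X[:2, :] + 0.1 * rng.standard_normal((2, N))
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

# Sample estimates matching S_XX = (1/N) X X^T and S_XY = (1/N) X Y^T
S_XX = X @ X.T / N    # d_x x d_x: spread of the data along each direction of X
S_XY = X @ Y.T / N    # d_x x d_y: how much X co-varies with (predicts) Y
print(S_XX.shape, S_XY.shape)
```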
Definition: Whitening Matrix
• Ẑ = Σ_XX^(-1/2) (X − μ)
• If X is already centered: Ẑ = Σ_XX^(-1/2) X
• Whitening matrix: Σ_XX^(-1/2)
  – Transforms the variable to unit variance
  – Scalar version: σ_x^(-1)
Definition: Correlation Coefficient
(Figure: scatter plot of x vs. y illustrating positive correlation, ρ_xy > 0)
• Whitened variables: x̂ = σ_x^(-1) x, ŷ = σ_y^(-1) y
• Matrix version, built from the whitening matrices: ρ_XY = Σ_XX^(-1/2) Σ_XY Σ_YY^(-1/2)
• Scalar version: ρ_xy = σ_xy / (σ_x σ_y)
  – Explains how Y varies with X, after normalizing out the innate variation of X and Y
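A small self-contained sketch of the whitening matrix and the matrix correlation coefficient defined on the last two slides; computing Σ_XX^(-1/2) through a symmetric eigendecomposition is one common choice, assumed here rather than taken from the slides:

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root C^(-1/2) via symmetric eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

rng = np.random.default_rng(0)
N = 1000
X = rng.standard_normal((3, N))
Y = 0.5 * X[:2, :] + 0.1 * rng.standard_normal((2, N))
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

S_XX, S_YY, S_XY = X @ X.T / N, Y @ Y.T / N, X @ Y.T / N

W_X = inv_sqrt(S_XX)                  # whitening matrix for X
Z = W_X @ X                           # whitened data: covariance is (close to) identity
print(np.round(Z @ Z.T / N, 2))

rho_XY = W_X @ S_XY @ inv_sqrt(S_YY)  # matrix correlation coefficient
print(np.round(rho_XY, 2))
```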
MLSP
• Application of Machine Learning techniques to the analysis of signals
(Block diagram: sensor and Channel → Signal Capture → Feature Extraction → Modeling/Regression, guided by External Knowledge)
• Feature Extraction:
  – Supervised (Guided) representation
Data specific bases?
• Issue: The bases we have considered so far are data agnostic
  – Fourier / Wavelet type bases for all data may not be optimal
• Improvement I: The bases we saw next were data specific
  – PCA, NMF, ICA, ...
  – The bases changed depending on the data
• Improvement II: What if bases are both data specific and task specific?
  – The basis depends on both the data and a task
Recall: Unsupervised Basis Learning
• What is a good basis?
  – Energy Compaction → Karhunen-Loève
  – Uncorrelated → PCA
  – Sparsity → Sparse Representation, Compressed Sensing, ...
  – Statistically Independent → ICA
• We create a narrative about how the data are created
Supervised Basis Learning?
• What is a good basis?
  – A basis that gives the best classification performance
  – A basis that maximizes shared information with another "view"
• We have some external information guiding our notion of an optimal basis
  – Can we learn a basis for a set of variables that will best predict some value(s)?
Regression
• Simplest case
  – Given a bunch of scalar data points, predict some value
  – Years are the independent variable
  – Temperature is the dependent variable
Regression
• Formulation of the problem
• Let's solve!
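The formulation on this slide is an equation image that did not survive extraction; the standard linear least-squares setup, which the steps on the next slides match, would be (an assumption, stated in LaTeX):

```latex
% Data X = [x_1, ..., x_N], targets Y = [y_1, ..., y_N], linear model \hat{Y} = BX
\[
  \hat{B} \;=\; \arg\min_{B} \; \lVert Y - BX \rVert_F^2
\]
```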
Regression
• Expand out the Frobenius norm
• Take the derivative
• Solve for 0
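A worked version of these three steps under the assumed formulation above (not copied from the slide's own derivation):

```latex
\[
\begin{aligned}
  \lVert Y - BX \rVert_F^2
    &= \operatorname{tr}\!\big[(Y - BX)(Y - BX)^T\big]
     = \operatorname{tr}(YY^T) - 2\operatorname{tr}(BXY^T) + \operatorname{tr}(BXX^TB^T) \\[4pt]
  \frac{\partial}{\partial B}\,\lVert Y - BX \rVert_F^2
    &= -2\,YX^T + 2\,BXX^T \;=\; 0
    \quad\Longrightarrow\quad \hat{B} = YX^T\,(XX^T)^{-1}
\end{aligned}
\]
```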
Regression
• This is basically just least squares again
• Note that this looks a lot like the following
  – In the 1-d case where x predicts y this is just ...
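The equation the slide points to is missing from the extracted text; presumably it is the solution just derived, whose 1-d specialization recalls the covariance and variance definitions from the start of the lecture (an assumption, not a copy of the slide):

```latex
\[
  \hat{B} = YX^T (XX^T)^{-1}
  \qquad\text{1-d case (centered } x, y\text{):}\qquad
  \hat{b} = \frac{\tfrac{1}{N}\sum_i x_i y_i}{\tfrac{1}{N}\sum_i x_i^2}
          = \frac{\sigma_{xy}}{\sigma_x^2}
\]
```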
Multiple Regression
• Robot Archer Example
  – Our robot fires defective arrows at a target
    • We don't know how wind might affect their movement, but we'd like to correct for it if possible
  – Predict the distance from the center of the target of a fired arrow
    • Measure wind speed in 3 directions
  – x_t = [1, w_x, w_y, w_z]^T
Multiple Regression
• Wind speed: x_t = [1, w_x, w_y, w_z]^T
• Offset from center in 2 directions: y_t = [d_x, d_y]^T
• Model: y_t = B x_t
Multiple Regression
• Answer
  – Here Y contains measurements of the distance of the arrow from the center
  – We are fitting a plane
  – Correlation is basically just the gradient
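A minimal sketch of the archer regression; the synthetic wind measurements and the "true" B below are made up for illustration and are not the slides' numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500

# Wind measurements with a bias term: x_t = [1, w_x, w_y, w_z]^T
W = rng.standard_normal((3, N))
X = np.vstack([np.ones((1, N)), W])                    # 4 x N

# Made-up "true" effect of wind on the 2-d offset from the target center
B_true = np.array([[ 0.2, 1.0, -0.5,  0.1],
                   [-0.1, 0.3,  0.8, -0.6]])
Y = B_true @ X + 0.05 * rng.standard_normal((2, N))    # 2 x N offsets

# Fit the plane by least squares: B = Y X^T (X X^T)^{-1}
B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
print(np.round(B_hat - B_true, 2))                     # should be close to zero
```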
Canonical Correlation Analysis
• Further generalization (CCA)
  – Do all wind factors affect the position, or just some low-dimensional combinations û = A x?
  – Do they affect both coordinates individually, or just some combination v̂ = B y?
Canonical Correlation Analysis
• Let's call the arrow location vector Y and the wind vectors X
  – Let's find the projections of Y and X respectively that are most correlated
(Figure: the best X projection plane predicts the best Y projection)
Canonical Correlation Analysis
• What do these vectors represent?
  – The direction of maximum correlation ignores the parts of the wind and location data that do not affect each other
    • Only information about the defective arrow remains!
(Figure: the best X projection plane predicts the best Y projection)
CCA Motivation and History
• Proposed by Hotelling (1936)
• Many real world problems involve 2 "views" of data
• Economics
  – Consumption of wheat is related to the price of potatoes, rice and barley ... and wheat
  – Random vector of prices X
  – Random vector of consumption Y
CCA Motivation and History
• Magnus Borga and David Hardoon popularized CCA as a technique in signal processing and machine learning
• Better for dimensionality reduction in many cases
CCA Dimensionality Reduction
• We keep only the correlated subspace
• Is this always good?
  – If we have measured things we care about, then we have removed only useless information
CCA Dimensionality Reduction
• In this case:
  – CCA found a basis component that preserved class distinctions while reducing dimensionality
  – Able to preserve class in both views
Comparison to PCA
• PCA fails to preserve class distinctions as well
Failure of PCA
• PCA is unsupervised
  – It captures the direction of greatest variance (energy)
  – No notion of the task, and hence no notion of what is good or bad information
  – The direction of greatest variance can sometimes be noise
  – OK for reconstruction of the signal
  – Catastrophic for preserving class information in some cases
Benefits of CCA
• Why did CCA work?
  – Soft supervision
    • External knowledge
  – The 2 views track each other in a direction that does not correspond to noise
  – Noise suppression (sometimes)
• Preview
  – If one of the sets of signals is the true labels, CCA is equivalent to Linear Discriminant Analysis
    • Hard supervision
Multiview Assumption
• When does CCA work?
  – The correlated subspace must actually have interesting signal
    • If two views have correlated noise then we will learn a bad representation
  – Sometimes the correlated subspace can be noise
    • Correlated noise in both sets of views
Multiview Assumption
• Why not just concatenate both views?
  – It does not exploit the extra structure of the signal (more on this in 2 slides)
    • PCA on the joint data will decorrelate all variables
    • Not good for prediction
  – We want to decorrelate within X and within Y, but maximize the cross-correlation between X and Y
  – High dimensionality → over-fitting
Multiview Assumption
• We can sort of think of a model for how our data might be generated
(Diagram: Source → View 1, Source → View 2)
• We want View 1 independent of View 2 conditioned on knowledge of the source
  – All correlation is due to the source
Multiview Examples
• Look at many stocks from different sectors of the economy
  – Conditioned on the fact that they are part of the same economy, they might be independent of one another
• Multiple speakers saying the same sentence
  – The sentence generates signals from many speakers. Each speaker might be independent of the others conditioned on the sentence
(Diagram: Source → View 1, Source → View 2)
Multiview Examples
(Figure: multiview learning overview, http://mlg.postech.ac.kr/static/research/multiview_overview.png)
Matrix Representation
• Expressing the total error as a matrix operation:
  – E = Σ_i ||y_i − ŷ_i||²
  – Y = [y_1, y_2, ..., y_N],  Ŷ = [ŷ_1, ŷ_2, ..., ŷ_N]
  – ||A||_F² = trace(AA^T)
  – E = trace((Y − Ŷ)(Y − Ŷ)^T) = ||Y − Ŷ||_F²
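A tiny numerical check of the identity above (illustrative only; the data are random):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 5))
Y_hat = rng.standard_normal((2, 5))

per_sample = sum(np.sum((Y[:, i] - Y_hat[:, i]) ** 2) for i in range(Y.shape[1]))
frobenius  = np.linalg.norm(Y - Y_hat, 'fro') ** 2
trace_form = np.trace((Y - Y_hat) @ (Y - Y_hat).T)

print(np.allclose(per_sample, frobenius), np.allclose(frobenius, trace_form))  # True True
```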
Recall: Objective Functions
• Least Squares → Regression
• What is a good basis?
  – Energy Compaction → Karhunen-Loève
  – Positive Sparse → NMF
A Quick Review
• Cross Covariance
A Quick Review
• The effect of a transform: Z = AX
  – C_XX = E[XX^T]
  – C_ZZ = E[ZZ^T] = A C_XX A^T
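A quick numerical illustration of this rule with synthetic data (not from the slides); the identity holds exactly for the sample estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
X = rng.standard_normal((3, N))       # data, columns are samples
A = rng.standard_normal((2, 3))       # an arbitrary linear transform

C_XX = X @ X.T / N
Z = A @ X
C_ZZ = Z @ Z.T / N

print(np.allclose(C_ZZ, A @ C_XX @ A.T))   # True
```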
Recall: Objective Functions
• So far our objectives needed no external data
  – No knowledge of the task
  – e.g.  argmin_{W,H} ||X − WH||_F²   s.t. W ∈ ℝ^{d×k}, H ∈ ℝ^{k×N}, rank(W) = k
• CCA requires an extra view
  – We force both views to look like each other:
    min_{U ∈ ℝ^{d_x×k}, V ∈ ℝ^{d_y×k}} ||U^T X − V^T Y||_F²
    s.t. U^T C_XX U = I_k,  V^T C_YY V = I_k
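The slide states the constrained objective but not a solution method. One standard route (assumed here, not taken from the slides) is to whiten each view and take an SVD of the whitened cross-covariance; a minimal NumPy sketch:

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Symmetric inverse square root C^(-1/2)."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def cca(X, Y, k):
    """Return U (d_x x k), V (d_y x k) satisfying U^T C_XX U = I, V^T C_YY V = I
    and maximizing the correlation between U^T X and V^T Y. X, Y are centered (dim x N)."""
    N = X.shape[1]
    C_XX, C_YY, C_XY = X @ X.T / N, Y @ Y.T / N, X @ Y.T / N
    Wx, Wy = inv_sqrt(C_XX), inv_sqrt(C_YY)
    # SVD of the whitened cross-covariance (the "matrix correlation coefficient")
    A, s, Bt = np.linalg.svd(Wx @ C_XY @ Wy)
    U = Wx @ A[:, :k]           # canonical directions for X
    V = Wy @ Bt.T[:, :k]        # canonical directions for Y
    return U, V, s[:k]          # s[:k] are the canonical correlations

# Toy usage: two views driven by a shared 1-d source, plus per-view noise dimensions
rng = np.random.default_rng(0)
N = 2000
source = rng.standard_normal((1, N))
X = np.vstack([source, rng.standard_normal((2, N))])        # view 1
Y = np.vstack([0.8 * source, rng.standard_normal((1, N))])  # view 2
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

U, V, corr = cca(X, Y, k=1)
print(np.round(corr, 2))   # top canonical correlation should be close to 1
```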
Interpreting the CCA Objective
• Minimize the reconstruction error between the projections of both views of the data
• Find the subspaces U, V onto which we project views X and Y such that their correlation is maximized
• Find combinations of both views that best predict each other