
Machine Learning for Signal Processing: Independent Component Analysis, Class 8, 23 Sep 2013 - PowerPoint PPT Presentation



  1. Machine Learning for Signal Processing: Independent Component Analysis, Class 8, 23 Sep 2013. Instructor: Bhiksha Raj

  2. Correlation vs. Causation
     • The consumption of burgers has gone up steadily in the past decade
     • In the same period, the penguin population of Antarctica has gone down

  3. The concept of correlation
     • Two variables are correlated if knowing the value of one gives you information about the expected value of the other
     [Figure: penguin population and burger consumption plotted against time]

  4. The statistical concept of correlatedness
     • Two variables X and Y are correlated if knowing X gives you an expected value of Y
     • X and Y are uncorrelated if knowing X tells you nothing about the expected value of Y
       – Although it could give you other information
       – How?

  5. A brief review of basic probability
     • Uncorrelated: two random variables X and Y are uncorrelated iff:
       – The average value of the product of the variables equals the product of their individual averages
     • Setup: each draw produces one instance of X and one instance of Y
       – I.e. one instance of (X, Y)
     • E[XY] = E[X]E[Y]
     • The average value of X is the same regardless of the value of Y
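
The iff condition above is easy to check empirically. A minimal numpy sketch (my own illustration, not from the slides; sample size and distributions are arbitrary assumptions) comparing E[XY] against E[X]E[Y] for an uncorrelated and a correlated pair:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Uncorrelated pair: two independent draws
x = rng.standard_normal(n)
y = rng.standard_normal(n)
print(np.mean(x * y), np.mean(x) * np.mean(y))    # both approximately 0

# Correlated pair: y2 depends linearly on x
y2 = 0.8 * x + 0.2 * rng.standard_normal(n)
print(np.mean(x * y2), np.mean(x) * np.mean(y2))  # E[XY] ~ 0.8 vs product of means ~ 0
```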

  6. Uncorrelatedness
     • Which of the above represent uncorrelated RVs?

  7. The statistical concept of independence
     • Two variables X and Y are dependent if knowing X gives you any information about Y
     • X and Y are independent if knowing X tells you nothing at all about Y

  8. A brief review of basic probability
     • Independence: two random variables X and Y are independent iff their joint probability equals the product of their individual probabilities
       – P(X,Y) = P(X)P(Y)
     • Independence implies uncorrelatedness
       – The average value of X is the same regardless of the value of Y: E[X|Y] = E[X]
       – But not the other way around

  9. A brief review of basic probability
     • Independence: two random variables X and Y are independent iff:
       – The average value of any function of X is the same regardless of the value of Y, or of any function of Y
     • E[f(X)g(Y)] = E[f(X)] E[g(Y)] for all f(), g()
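
This stronger condition separates independence from mere uncorrelatedness. A small sketch (my own example, not from the slides): with X standard normal and Y = X^2, the pair is uncorrelated, yet the f, g test fails for f(x) = x^2, g(y) = y:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
y = x ** 2                       # deterministic function of x, so clearly dependent

# Uncorrelated: E[XY] = E[X^3] = 0 = E[X]E[Y]
print(np.mean(x * y), np.mean(x) * np.mean(y))         # both ~0

# Not independent: with f(x) = x^2, g(y) = y the factorization fails
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))   # ~3 vs ~1
```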

  10. Independence
      • Which of the above represent independent RVs?
      • Which represent uncorrelated RVs?

  11. A brief review of basic probability
      • The expected value of an odd function of an RV is 0 if
        – The RV is zero mean
        – The PDF of the RV is symmetric around 0
      • E[f(X)] = 0 if f(X) is odd symmetric
      [Figure: a symmetric PDF p(x) and an odd function y = f(x)]

  12. A brief review of basic info. theory
      • Entropy: the minimum average number of bits to transmit to convey a symbol X, e.g. X in {T(all), M(ed), S(hort), ...}
        H(X) = Σ_X P(X) [-log P(X)]
      • Joint entropy: the minimum average number of bits to convey sets (pairs here) of symbols, e.g. X in {T, M, S, ...} and Y in {M, F, ...}
        H(X,Y) = Σ_{X,Y} P(X,Y) [-log P(X,Y)]

  13. A brief review of basic info. theory
      • Conditional entropy: the minimum average number of bits to transmit to convey a symbol X, after symbol Y has already been conveyed
        – Averaged over all values of X and Y
        H(X|Y) = Σ_Y P(Y) Σ_X P(X|Y) [-log P(X|Y)] = Σ_{X,Y} P(X,Y) [-log P(X|Y)]

  14. A brief review of basic info. theory
      • Conditional entropy of X equals H(X) if X is independent of Y:
        H(X|Y) = Σ_Y P(Y) Σ_X P(X|Y) [-log P(X|Y)] = Σ_Y P(Y) Σ_X P(X) [-log P(X)] = H(X)
      • Joint entropy of X and Y is the sum of the entropies of X and Y if they are independent:
        H(X,Y) = Σ_{X,Y} P(X,Y) [-log P(X,Y)] = Σ_{X,Y} P(X,Y) [-log P(X)P(Y)]
               = -Σ_{X,Y} P(X,Y) log P(X) - Σ_{X,Y} P(X,Y) log P(Y) = H(X) + H(Y)
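
These identities are easy to verify numerically. A short sketch (my own, with an arbitrary example distribution) that builds an independent joint P(X,Y) = P(X)P(Y) and checks that H(X,Y) = H(X) + H(Y) and H(X|Y) = H(X):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability table of any shape; zero cells are skipped."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

px = np.array([0.5, 0.3, 0.2])    # P(X)
py = np.array([0.6, 0.4])         # P(Y)
pxy = np.outer(px, py)            # P(X,Y) = P(X)P(Y): independent by construction

H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(pxy)
H_X_given_Y = H_XY - H_Y          # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(H_XY, H_X + H_Y)            # equal: joint entropy is additive under independence
print(H_X_given_Y, H_X)           # equal: knowing Y tells us nothing about X
```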

  15. Onward..

  16. Projection: multiple notes
      • [Figure: spectrogram M and the set of note spectra W]
      • P = W (W^T W)^-1 W^T
      • Projected spectrogram = P M
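
A minimal numpy sketch of that projection (the shapes and random data are placeholder assumptions; in the lecture M is a magnitude spectrogram and the columns of W are note spectra):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.random((1025, 400))   # spectrogram: frequency bins x frames (placeholder)
W = rng.random((1025, 4))     # one column per note spectrum (placeholder)

# Projection matrix onto the column space of W, then the projected spectrogram
P = W @ np.linalg.inv(W.T @ W) @ W.T
projected = P @ M
```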

  17. We’re actually computing a score
      • [Figure: spectrogram M, note spectra W, and the unknown score H]
      • M ~ WH
      • H = pinv(W) M

  18. So what are we doing here?
      • M ~ WH is an approximation
      • Given W, estimate H to minimize the error:
        H^ = argmin_H ||M - WH||_F^2 = argmin_H Σ_ij (M_ij - (WH)_ij)^2
      • Must ideally find the transcription of the given notes
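
In code this least-squares estimate is a one-liner. A sketch under the same placeholder assumptions as above, using np.linalg.lstsq, which solves exactly the argmin written on the slide (pinv(W) @ M gives the same answer):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.random((1025, 400))   # spectrogram (placeholder)
W = rng.random((1025, 4))     # known note spectra (placeholder)

# Least-squares estimate of the score H given W
H, *_ = np.linalg.lstsq(W, M, rcond=None)
print(H.shape)                # (4, 400): one row of activations per note
```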

  19. How about the other way?
      • [Figure: spectrogram M, known score H, unknown notes W, and reconstruction U]
      • M ~ WH
      • W = M pinv(H)
      • U = WH

  20. Going the other way..
      • M ~ WH is an approximation
      • Given H, estimate W to minimize the error:
        W^ = argmin_W ||M - WH||_F^2 = argmin_W Σ_ij (M_ij - (WH)_ij)^2
      • Must ideally find the notes corresponding to the transcription
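
The mirrored least-squares estimate, again as a sketch with placeholder shapes and data:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.random((1025, 400))   # spectrogram (placeholder)
H = rng.random((4, 400))      # known score / transcription (placeholder)

# Least-squares estimate of the note spectra W given H
W = M @ np.linalg.pinv(H)
print(W.shape)                # (1025, 4): one column per note
```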

  21. When both parameters are unknown
      • W = ?, H = ?, approx(M) = ?
      • Must estimate both H and W to best approximate M
      • Ideally, must learn both the notes and their transcription!

  22. A least squares solution
      • W^, H^ = argmin_{W,H} ||M - WH||_F^2
      • Unconstrained
        – For any W, H that minimizes the error, W' = WA, H' = A^-1 H also minimizes the error for any invertible A
      • Too many solutions
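
The ambiguity is easy to demonstrate numerically. A sketch (sizes and the random invertible A are assumptions) showing that W' = WA, H' = A^-1 H reconstruct M just as well:

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.random((1025, 4))
H = rng.random((4, 400))
M = W @ H                                 # an exact factorization

A = rng.random((4, 4))                    # any invertible remixing of the factors
W2, H2 = W @ A, np.linalg.inv(A) @ H

# Both factorizations reconstruct M equally well
print(np.linalg.norm(M - W @ H), np.linalg.norm(M - W2 @ H2))  # both ~0
```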

  23. A constrained least squares solution
      • W^, H^ = argmin_{W,H} ||M - WH||_F^2
      • For our problem, let's consider the "truth"..
      • When one note occurs, the other does not
        – h_i^T h_j = 0 for all i != j
      • The rows of H are uncorrelated

  24. A least squares solution
      • Assume HH^T = I
        – Normalizing all rows of H to length 1
      • pinv(H) = H^T
      • Projecting M onto H:
        – W = M pinv(H) = M H^T
        – WH = M H^T H
      • W^, H^ = argmin_{W,H} ||M - WH||_F^2 becomes H^ = argmin_H ||M - M H^T H||_F^2, with the constraint Rank(H) = 4

  25. Finding the notes
      • Add the constraint HH^T = I:
        H^ = argmin_H [ ||M - M H^T H||_F^2 + Λ(HH^T - I) ]
      • The solution is obtained through eigendecomposition:
        H ← eigenvectors of Correlation(M^T)
      • Note: we are considering the correlation of M^T
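
A sketch of that eigendecomposition in numpy. The shapes, the random stand-in for M, and the reading of "correlation of M^T" as the frame-by-frame matrix M^T M are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.random((1025, 400))            # spectrogram: frequency bins x frames (placeholder)
K = 12                                 # number of notes to estimate

# Correlation of M^T: a frames-x-frames matrix
R = M.T @ M
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order

# Rows of H: the K leading eigenvectors (orthonormal, so HH^T = I); W follows by projection
H = eigvecs[:, ::-1][:, :K].T          # shape (K, frames)
W = M @ H.T                            # shape (frequency bins, K)
print(np.linalg.norm(M - W @ H))       # reconstruction error with K components
```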

  26. So how does that work?
      • There are 12 notes in the segment, hence we try to estimate 12 notes..

  27. So how does that work?
      • The scores of the first three “notes” and their contributions

  28. Finding the notes
      • Can find W instead of H:
        W^ = argmin_W ||M - W W^T M||_F^2
      • Assume the columns of W are orthogonal
      • This results in the more conventional eigendecomposition:
        W ← eigenvectors of Correlation(M)
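
The corresponding sketch for the W-side solution, under the same placeholder assumptions; here the correlation of M is the frequency-by-frequency matrix M M^T:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.random((1025, 400))            # spectrogram (placeholder)
K = 12

# Correlation of M; the columns of W are its leading eigenvectors
R = M @ M.T
eigvals, eigvecs = np.linalg.eigh(R)
W = eigvecs[:, ::-1][:, :K]            # shape (frequency bins, K), orthonormal columns
H = W.T @ M                            # the corresponding scores
print(np.linalg.norm(M - W @ H))
```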

  29. So how does that work?
      • There are 12 notes in the segment, hence we try to estimate 12 notes..
      • Results are not good again

  30. Our notes are not orthogonal
      • Overlapping frequencies
      • Notes occur concurrently
        – The harmonica continues to resonate from the previous note
      • More generally, simple orthogonality will not give us the desired solution

  31. Eigendecomposition and SVD
      • M ~ WH; the matrix M can be decomposed as M = U S V^T
      • When we assume the scores are orthogonal, we get H = V^T, W = US
      • When we assume the notes are orthogonal, we get W = U, H = S V^T
      • In either case the results are the same
        – The notes are orthogonal and so are the scores
        – Not good in our problem
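
Both solutions drop out of a single SVD. A sketch with placeholder data; K = 12 components is an assumption carried over from the example above:

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.random((1025, 400))                # spectrogram (placeholder)
K = 12

# Truncated SVD of M
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U, s, Vt = U[:, :K], s[:K], Vt[:K, :]

W_notes, H_notes = U, np.diag(s) @ Vt      # orthogonal notes:  W = U,  H = S V^T
W_scores, H_scores = U @ np.diag(s), Vt    # orthogonal scores: W = US, H = V^T

# Same reconstruction either way
print(np.allclose(W_notes @ H_notes, W_scores @ H_scores))  # True
```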

  32. Orthogonality
      • In any least-squared-error decomposition M = WH, if the columns of W are orthogonal, the rows of H will also be orthogonal
      • Sometimes mere orthogonality is not enough

  33. What else can we look for?
      • Assume: the “transcription” of one note does not depend on what else is playing
        – Or, in a multi-instrument piece, the instruments are playing independently of one another
      • Not strictly true, but still..

  34. Formulating it with independence
      • W^, H^ = argmin_{W,H} ||M - WH||_F^2  s.t. the rows of H are independent
      • Impose statistical independence constraints on the decomposition

  35. Changing problems for a bit
      • Two people speak simultaneously, producing signals h_1(t) and h_2(t)
      • Recorded by two microphones
      • Each recorded signal is a mixture of both signals:
        m_1(t) = w_11 h_1(t) + w_12 h_2(t)
        m_2(t) = w_21 h_1(t) + w_22 h_2(t)
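
To make this cocktail-party setup concrete, here is a sketch that simulates two mixtures from two independent sources and then unmixes them with ICA. The source waveforms, the mixing weights, and the use of scikit-learn's FastICA are my own illustrative choices, not the course's code:

```python
import numpy as np
from sklearn.decomposition import FastICA   # one standard ICA implementation (assumption)

rng = np.random.default_rng(9)
t = np.linspace(0, 1, 8000)

# Two "speakers": independent, non-Gaussian source signals (placeholders)
h1 = np.sign(np.sin(2 * np.pi * 5 * t))     # square wave
h2 = np.sin(2 * np.pi * 13 * t)             # sine wave
H = np.c_[h1, h2]                           # samples x sources

# Two microphones: each records a different mixture of both sources
Wmix = np.array([[0.7, 0.3],
                 [0.4, 0.6]])               # the w_ij mixing weights (placeholders)
M = H @ Wmix.T                              # samples x microphones

# Recover the sources (up to permutation and scaling) from the mixtures alone
ica = FastICA(n_components=2, random_state=0)
H_est = ica.fit_transform(M)
print(H_est.shape)                          # (8000, 2): estimated h_1(t), h_2(t)
```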
