Independent Component Analysis

Independent Component Analysis, Class 20 - PowerPoint PPT Presentation



  1. 11-755 Machine Learning for Signal Processing. Independent Component Analysis, Class 20. 8 Nov 2012. Instructor: Bhiksha Raj.

  2. A brief review of basic probability. Uncorrelated: two random variables X and Y are uncorrelated iff the average value of their product equals the product of their individual averages. Setup: each draw produces one instance of X and one instance of Y, i.e. one instance of (X, Y). $E[XY] = E[X]E[Y]$. The average value of X is the same regardless of the value of Y.
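
As a quick numerical sanity check of this definition, here is a minimal numpy sketch that compares $E[XY]$ with $E[X]E[Y]$ via sample averages; the distributions and sample size are arbitrary choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Paired draws: each draw yields one instance of (X, Y).
# Here X and Y are generated independently, so they should also
# come out (approximately) uncorrelated.
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

print(np.mean(x * y))           # E[XY], estimated by a sample average
print(np.mean(x) * np.mean(y))  # E[X]E[Y]
```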

  3. Uncorrelatedness. [Figure: scatter plots of several joint distributions of (X, Y).] Which of the above represent uncorrelated RVs?

  4. A brief review of basic probability. Independence: two random variables X and Y are independent iff their joint probability equals the product of their individual probabilities: $P(X, Y) = P(X)P(Y)$. The average value of X is the same regardless of the value of Y: $E[X|Y] = E[X]$.

  5. A brief review of basic probability. Independence: two random variables X and Y are independent iff the average value of any function of X is the same regardless of the value of Y: $E[f(X)g(Y)] = E[f(X)]\,E[g(Y)]$ for all $f(\cdot)$, $g(\cdot)$.
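
The gap between the two definitions can be made concrete with the classic example Y = X² for a symmetric zero-mean X: the pair is uncorrelated but not independent. A small sketch (the example values are my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Y = X^2 with X symmetric around 0: uncorrelated but NOT independent.
x = rng.normal(size=100_000)
y = x ** 2

# Uncorrelated: E[XY] = E[X^3] = 0 = E[X]E[Y]
print(np.mean(x * y), np.mean(x) * np.mean(y))

# Independence fails: with f(x) = x^2 and g(y) = y,
# E[f(X)g(Y)] = E[X^4] = 3 while E[f(X)]E[g(Y)] = E[X^2]^2 = 1.
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))
```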

  6. Independence. [Figure: scatter plots of several joint distributions of (X, Y).] Which of the above represent independent RVs? Which represent uncorrelated RVs?

  7. A brief review of basic probability. [Figure: a symmetric PDF $p(x)$ and an odd function $f(x)$.] The expected value of an odd function of an RV is 0 if the RV is zero mean and its PDF is symmetric around 0: $E[f(X)] = 0$ if $f(\cdot)$ is odd symmetric.
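
A one-line check of this fact, using a standard normal (zero mean, symmetric PDF) and a couple of odd functions; the specific choices are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # zero mean, PDF symmetric around 0

print(np.mean(x ** 3))        # odd f: expectation ~ 0
print(np.mean(np.sin(x)))     # odd f: expectation ~ 0
```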

  8. A brief review of basic info. theory. Entropy: the minimum average number of bits to transmit to convey a symbol X, e.g. X taking values T(all), M(ed), S(hort), …:
$$H(X) = \sum_X P(X)[-\log P(X)]$$
Joint entropy: the minimum average number of bits to convey sets (pairs, here) of symbols, e.g. (X, Y) with Y taking values M, F:
$$H(X, Y) = \sum_{X,Y} P(X, Y)[-\log P(X, Y)]$$

  9. A brief review of basic info. theory. Conditional entropy: the minimum average number of bits to transmit to convey a symbol X after symbol Y has already been conveyed, averaged over all values of X and Y:
$$H(X|Y) = \sum_Y P(Y) \sum_X P(X|Y)[-\log P(X|Y)] = \sum_{X,Y} P(X, Y)[-\log P(X|Y)]$$
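
These definitions can be exercised on a small joint distribution. The sketch below uses a made-up 3x2 table for P(X, Y) and computes H(X), H(X, Y), and H(X|Y); the conditional entropy is obtained through the equivalent chain-rule form H(X|Y) = H(X, Y) - H(Y):

```python
import numpy as np

# A small, made-up joint distribution P(X, Y):
# rows index X (e.g. T, M, S), columns index Y (e.g. M, F).
Pxy = np.array([[0.2, 0.1],
                [0.1, 0.2],
                [0.1, 0.3]])

Px = Pxy.sum(axis=1)  # marginal P(X)
Py = Pxy.sum(axis=0)  # marginal P(Y)

H = lambda p: -np.sum(p * np.log2(p))  # entropy in bits (assumes p > 0)

Hx = H(Px)                # H(X)
Hxy = H(Pxy)              # joint entropy H(X, Y)
Hx_given_y = Hxy - H(Py)  # chain rule: H(X|Y) = H(X, Y) - H(Y)

print(Hx, Hxy, Hx_given_y)
```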

  10. A brief review of basic info. theory. If X is independent of Y, the conditional entropy of X equals its entropy:
$$H(X|Y) = \sum_Y P(Y) \sum_X P(X|Y)[-\log P(X|Y)] = \sum_Y P(Y) \sum_X P(X)[-\log P(X)] = H(X)$$
Likewise, the joint entropy of X and Y is the sum of their entropies if they are independent:
$$H(X, Y) = \sum_{X,Y} P(X, Y)[-\log P(X, Y)] = \sum_{X,Y} P(X, Y)[-\log P(X)P(Y)] = -\sum_{X,Y} P(X, Y)\log P(X) - \sum_{X,Y} P(X, Y)\log P(Y) = H(X) + H(Y)$$

  11. Onward..

  12. Projection: multiple notes. M = [spectrogram]; W = [note spectra]. $P = W(W^TW)^{-1}W^T$. Projected spectrogram $= P \cdot M$.
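
In code, the projection is a direct transcription of the formula. A sketch with hypothetical spectrogram and note-dictionary shapes (the sizes are placeholders, not taken from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))  # spectrogram: frequencies x frames (hypothetical)
W = rng.random((1024, 4))    # one note spectrum per column (hypothetical)

# Projection onto the column space of W, then the projected spectrogram.
P = W @ np.linalg.inv(W.T @ W) @ W.T
projected = P @ M
```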

  13. We’re actually computing a score. M = [spectrogram]; W = [note spectra]; H = ? With $M \approx WH$: $H = \mathrm{pinv}(W)\,M$.
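
The score itself comes straight from the pseudoinverse; continuing the same hypothetical shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))  # spectrogram (hypothetical sizes)
W = rng.random((1024, 4))    # note spectra as columns

# The "score" (per-note activation over time): H = pinv(W) M.
H = np.linalg.pinv(W) @ M    # shape (4, 500)
```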

  14. How about the other way? M = [spectrogram]; H = [a hypothesized transcription]; W = ?, U = ? With $M \approx WH$: $W = M\,\mathrm{pinv}(H)$, and the reconstruction is $U = WH$.

  15. So what are we doing here? $M \approx WH$ is an approximation. Given W, estimate H to minimize the error:
$$\hat{H} = \arg\min_H \|M - WH\|_F^2 = \arg\min_H \sum_{i,j}(M - WH)_{ij}^2$$
Must ideally find the transcription of the given notes.
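
The unconstrained problem has a closed-form least squares solution; np.linalg.lstsq solves it one column of M at a time. A sketch with placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))  # spectrogram (hypothetical sizes)
W = rng.random((1024, 4))    # known notes

# Minimizes ||M - W H||_F^2 over H, column by column.
H, residuals, rank, sv = np.linalg.lstsq(W, M, rcond=None)
print(H.shape)  # (4, 500): one activation row per note
```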

  16. Going the other way.. $M \approx WH$ is an approximation. Given H, estimate W to minimize the error:
$$\hat{W} = \arg\min_W \|M - WH\|_F^2 = \arg\min_W \sum_{i,j}(M - WH)_{ij}^2$$
Must ideally find the notes corresponding to the transcription.

  17. When both parameters are unknown: H = ?, W = ?, approx(M) = ? Must estimate both H and W to best approximate M. Ideally, must learn both the notes and their transcription!

  18. A least squares solution: $\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2$. This is unconstrained: for any W, H that minimizes the error, $W' = WA$, $H' = A^{-1}H$ also minimizes the error for any invertible A. For our problem, let's consider the “truth”: when one note occurs, the other does not, so $h_i h_j^T = 0$ for all $i \neq j$; the rows of H are uncorrelated.

  19. A least squares solution. Assume $HH^T = I$, i.e. normalize all rows of H to length 1; then $\mathrm{pinv}(H) = H^T$. Projecting M onto H: $W = M\,\mathrm{pinv}(H) = MH^T$, so $WH = MH^TH$. The problem $\arg\min_{W,H} \|M - WH\|_F^2$ becomes
$$\hat{H} = \arg\min_H \|M - MH^TH\|_F^2, \quad \text{constraint: } \mathrm{Rank}(H) = 4$$

  20. Finding the notes: $\hat{H} = \arg\min_H \|M - MH^TH\|_F^2$. Note that $H^TH \neq I$; only $HH^T = I$. Writing $\mathrm{Correlation}(M) = M^TM$, the objective can be rewritten as
$$\hat{H} = \arg\min_H \mathrm{trace}\big(M(I - H^TH)M^T\big) = \arg\min_H \mathrm{trace}\big(\mathrm{Correlation}(M)(I - H^TH)\big) = \arg\max_H \mathrm{trace}\big(\mathrm{Correlation}(M)\,H^TH\big)$$

  21. Finding the notes. Constraint: every row of H has length 1. With a matrix $\Lambda$ of Lagrange multipliers:
$$\hat{H} = \arg\max_H \big[\mathrm{trace}\big(\mathrm{Correlation}(M)\,H^TH\big) - \mathrm{trace}\big(\Lambda(HH^T - I)\big)\big]$$
Differentiating and equating to 0 gives $\mathrm{Correlation}(M)\,H^T = H^T\Lambda$. Simply requiring the rows of H to be orthonormal gives us that the rows of H are eigenvectors of the correlation matrix $M^TM$ of the data in M.
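
This is exactly an eigendecomposition. A sketch under the stated constraint (orthonormal rows of H), with placeholder data and k = 4 notes:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))  # spectrogram (hypothetical sizes)
k = 4                        # number of notes to estimate

# Orthonormal rows of H minimizing ||M - M H^T H||_F^2 are the top
# eigenvectors of the frame-by-frame correlation matrix M^T M.
evals, evecs = np.linalg.eigh(M.T @ M)  # eigenvalues in ascending order
H = evecs[:, -k:].T                     # top-k eigenvectors as rows: (k, 500)
W = M @ H.T                             # corresponding note spectra
```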

  22. Equivalences: the objective above is identical to
$$\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2 \quad \text{s.t.} \quad \|h_i\|^2 = 1,\; h_i h_j^T = 0 \text{ for } i \neq j$$
i.e., minimize the least squares error with the constraint that the rows of H have length 1 and are orthogonal to one another.

  23. So how does that work? There are 12 notes in the segment, hence we try to estimate 12 notes..

  24. So how does that work? The first three “notes” and their contributions. The spectrograms of the notes are statistically uncorrelated.

  25. Finding the notes. Can find W instead of H:
$$\hat{W} = \arg\min_W \|M - WW^TM\|_F^2 = \arg\max_W \mathrm{trace}\big(W^T\,\mathrm{Correlation}(M)\,W\big)$$
where here $\mathrm{Correlation}(M) = MM^T$. Solving the above with the constraint that the columns of W are orthonormal gives you the eigenvectors of the data in M.
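
The mirror-image computation, sketched the same way: the columns of W come out as the top eigenvectors of the frequency-by-frequency correlation $MM^T$ (this is the usual PCA view):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))  # spectrogram (hypothetical sizes)
k = 4

# Orthonormal columns of W minimizing ||M - W W^T M||_F^2 are the top
# eigenvectors of M M^T.
evals, evecs = np.linalg.eigh(M @ M.T)
W = evecs[:, -k:]  # (frequencies, k)
H = W.T @ M        # transcription: (k, frames)
```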

  26. So how does that work? There are 12 notes in the segment, hence we try to estimate 12 notes..

  27. Our notes are not orthogonal: overlapping frequencies; notes occur concurrently; the harmonica continues to resonate to the previous note. More generally, simple orthogonality will not give us the desired solution.

  28. What else can we look for? Assume: the “transcription” of one note does not depend on what else is playing. Or, in a multi-instrument piece, instruments are playing independently of one another. Not strictly true, but still..

  29. Formulating it with independence:
$$\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2 \quad (\text{s.t. rows of } H \text{ are independent})$$
Impose statistical independence constraints on the decomposition.
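
The slides only state the objective; one standard way to impose the independence constraint in practice is an off-the-shelf ICA algorithm such as FastICA, shown here via scikit-learn. Treating each time frame as one observation is my reading, not something the slides specify:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
M = rng.random((1024, 500))  # spectrogram (hypothetical sizes)
k = 4

# Each time frame is one observation of a k-source mixture; ask for
# source signals (rows of H) that are statistically independent.
ica = FastICA(n_components=k, random_state=0)
H = ica.fit_transform(M.T).T  # (k, frames): independent "transcriptions"
W = ica.mixing_               # (frequencies, k): the estimated "notes"
```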
