Machine Learning for Signal Processing: Regression and Prediction. Class 14, 17 Oct 2013. Instructor: Bhiksha Raj. 11-755/18-797
Matrix Identities

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_D \end{bmatrix} \qquad
\frac{df(\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \dfrac{df}{dx_1} \\ \dfrac{df}{dx_2} \\ \vdots \\ \dfrac{df}{dx_D} \end{bmatrix}$$

• The derivative of a scalar function w.r.t. a vector is a vector
Matrix Identities

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1D} \\ x_{21} & x_{22} & \cdots & x_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ x_{D1} & x_{D2} & \cdots & x_{DD} \end{bmatrix} \qquad
\frac{df(\mathbf{X})}{d\mathbf{X}} = \begin{bmatrix} \dfrac{df}{dx_{11}} & \dfrac{df}{dx_{12}} & \cdots & \dfrac{df}{dx_{1D}} \\ \dfrac{df}{dx_{21}} & \dfrac{df}{dx_{22}} & \cdots & \dfrac{df}{dx_{2D}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{df}{dx_{D1}} & \dfrac{df}{dx_{D2}} & \cdots & \dfrac{df}{dx_{DD}} \end{bmatrix}$$

• The derivative of a scalar function w.r.t. a vector is a vector
• The derivative w.r.t. a matrix is a matrix
Matrix Identities

$$F(\mathbf{x}) = \begin{bmatrix} F_1(\mathbf{x}) \\ F_2(\mathbf{x}) \\ \vdots \\ F_N(\mathbf{x}) \end{bmatrix} \qquad
\frac{dF(\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \dfrac{dF_1}{dx_1} & \dfrac{dF_2}{dx_1} & \cdots & \dfrac{dF_N}{dx_1} \\ \dfrac{dF_1}{dx_2} & \dfrac{dF_2}{dx_2} & \cdots & \dfrac{dF_N}{dx_2} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{dF_1}{dx_D} & \dfrac{dF_2}{dx_D} & \cdots & \dfrac{dF_N}{dx_D} \end{bmatrix}$$

• The derivative of a vector function w.r.t. a vector is a matrix
  – Note transposition of order
Derivatives

(Figure: example, an N×1 function differentiated by a U×V argument.)

• In general: differentiating an M×N function by a U×V argument results in an M×N×U×V tensor derivative
Matrix derivative identities

X is a matrix, a is a vector.

$$d(\mathbf{X}\mathbf{a}) = \mathbf{X}\,d\mathbf{a}; \qquad d(\mathbf{a}^T\mathbf{X}) = \mathbf{X}^T\,d\mathbf{a} \qquad \text{(depending on the layout convention, the solution may also be written with the transpose)}$$

$$d(\mathbf{A}\mathbf{X}) = (d\mathbf{A})\,\mathbf{X}; \qquad d(\mathbf{X}\mathbf{A}) = \mathbf{X}\,(d\mathbf{A}) \qquad \text{(A is a matrix)}$$

$$d(\mathbf{a}^T\mathbf{X}\mathbf{a}) = \mathbf{a}^T(\mathbf{X} + \mathbf{X}^T)\,d\mathbf{a}$$

$$d\,\mathrm{trace}(\mathbf{A}^T\mathbf{X}\mathbf{A}) = d\,\mathrm{trace}(\mathbf{X}\mathbf{A}\mathbf{A}^T) = d\,\mathrm{trace}(\mathbf{A}\mathbf{A}^T\mathbf{X}) = \mathrm{trace}\big(\mathbf{A}^T(\mathbf{X} + \mathbf{X}^T)\,d\mathbf{A}\big)$$

• Some basic linear and quadratic identities
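Not part of the slides, but the two quadratic identities above are easy to sanity-check numerically. A minimal sketch using finite differences follows; the sizes, random seed, and tolerances are arbitrary choices for illustration:

```python
import numpy as np

# Numerically verify d(a^T X a)/da = (X + X^T) a  and
# d trace(A^T X A)/dA = (X + X^T) A  with forward finite differences.
rng = np.random.default_rng(0)
D = 4
X = rng.standard_normal((D, D))
a = rng.standard_normal(D)
eps = 1e-6

# Gradient of a^T X a with respect to a
grad_analytic = (X + X.T) @ a
grad_numeric = np.array([
    ((a + eps * e) @ X @ (a + eps * e) - a @ X @ a) / eps
    for e in np.eye(D)                      # perturb one component at a time
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))   # True

# Gradient of trace(A^T X A) with respect to A
A = rng.standard_normal((D, 3))
grad_analytic = (X + X.T) @ A
grad_numeric = np.zeros_like(A)
for i in range(D):
    for j in range(3):
        Ap = A.copy(); Ap[i, j] += eps
        grad_numeric[i, j] = (np.trace(Ap.T @ X @ Ap) - np.trace(A.T @ X @ A)) / eps
print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))   # True
```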
A Common Problem

• Can you spot the glitches?
How to fix this problem?

• "Glitches" in audio
  – Must be detected
  – How?
• Then what?
• Glitches must be "fixed"
  – Delete the glitch
    • Results in a "hole"
  – Fill in the hole
  – How?
Interpolation..

• "Extend" the curve on the left to "predict" the values in the "blank" region
  – Forward prediction
• Extend the blue curve on the right leftwards to predict the blank region
  – Backward prediction
• How?
  – Regression analysis..
Detecting the Glitch

(Figure: two example regions, marked "OK" and "NOT OK".)

• Regression-based reconstruction can be done anywhere
• Reconstructed value will not match actual value
• Large error of reconstruction identifies glitches
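The slides leave the "how" to the following material; purely as an illustration, a sketch of this idea in code — fit a linear predictor on the previous p samples and flag samples whose reconstruction error is unusually large — might look like the following. The function name, window length p, and threshold are assumptions made for this sketch, not the course's reference implementation:

```python
import numpy as np

def detect_glitches(signal, p=16, threshold=6.0):
    """Flag samples whose linear-prediction error is unusually large.

    A predictor is fit by least squares to predict each sample from the
    previous p samples; samples whose residual exceeds `threshold` times
    the median absolute residual are reported as possible glitches.
    (Illustrative sketch: p and threshold are arbitrary choices.)
    """
    N = len(signal)
    X = np.array([signal[i - p:i] for i in range(p, N)])   # (N-p, p) windows of past samples
    y = signal[p:]                                         # (N-p,) targets
    a, *_ = np.linalg.lstsq(X, y, rcond=None)              # least-squares predictor weights
    residual = np.abs(y - X @ a)
    scale = np.median(residual) + 1e-12                    # robust scale of "normal" errors
    return np.where(residual > threshold * scale)[0] + p   # indices of suspect samples

# Tiny synthetic usage example: a noisy sine with an injected glitch
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
x = np.sin(2 * np.pi * 50 * t) + 0.01 * rng.standard_normal(t.size)
x[900:905] += 3.0                                          # inject a "glitch"
print(detect_glitches(x))                                  # indices near 900
```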
What is a regression?

• Analyzing relationship between variables
• Expressed in many forms
• Wikipedia:
  – Linear regression, Simple regression, Ordinary least squares, Polynomial regression, General linear model, Generalized linear model, Discrete choice, Logistic regression, Multinomial logit, Mixed logit, Probit, Multinomial probit, ....
• Generally a tool to predict variables
Regressions for prediction

• y = f(x; Θ) + e
• Different possibilities
  – y is a scalar
    • y is real
    • y is categorical (classification)
  – y is a vector
  – x is a vector
    • x is a set of real valued variables
    • x is a set of categorical variables
    • x is a combination of the two
  – f(.) is a linear or affine function
  – f(.) is a non-linear function
  – f(.) is a time-series model
A linear regression

(Figure: data plotted with axes labelled X and Y.)

• Assumption: relationship between variables is linear
  – A linear trend may be found relating x and y
  – y = dependent variable
  – x = explanatory variable
  – Given x, y can be predicted as an affine function of x
An imaginary regression..

• http://pages.cs.wisc.edu/~kovar/hall.html
• Check this shit out (Fig. 1). That's bonafide, 100%-real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered.
Linear Regressions

• $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b} + \mathbf{e}$
  – e = prediction error
• Given a "training" set of {x, y} values: estimate A and b
  – $\mathbf{y}_1 = \mathbf{A}\mathbf{x}_1 + \mathbf{b} + \mathbf{e}_1$
  – $\mathbf{y}_2 = \mathbf{A}\mathbf{x}_2 + \mathbf{b} + \mathbf{e}_2$
  – $\mathbf{y}_3 = \mathbf{A}\mathbf{x}_3 + \mathbf{b} + \mathbf{e}_3$
  – ...
• If A and b are well estimated, prediction error will be small
Linear Regression to a scalar

$$y_1 = \mathbf{a}^T\mathbf{x}_1 + b + e_1, \qquad y_2 = \mathbf{a}^T\mathbf{x}_2 + b + e_2, \qquad y_3 = \mathbf{a}^T\mathbf{x}_3 + b + e_3, \ \ldots$$

• Define:

$$\mathbf{y} = [y_1\ y_2\ y_3\ \ldots], \qquad \mathbf{A} = \begin{bmatrix}\mathbf{a}\\ b\end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix}\mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \ldots\\ 1 & 1 & 1 & \ldots\end{bmatrix}, \qquad \mathbf{e} = [e_1\ e_2\ e_3\ \ldots]$$

• Rewrite:

$$\mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e}$$
Learning the parameters

$$\mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e} \qquad\qquad \hat{\mathbf{y}} = \mathbf{A}^T\mathbf{X} \quad \text{(assuming no error)}$$

• Given training data: several x, y
• Can define a "divergence" $D(\mathbf{y}, \hat{\mathbf{y}})$
  – Measures how much $\hat{\mathbf{y}}$ differs from $\mathbf{y}$
  – Ideally, if the model is accurate, this should be small
• Estimate A, b to minimize $D(\mathbf{y}, \hat{\mathbf{y}})$
The prediction error as divergence

$$y_1 = \mathbf{a}^T\mathbf{x}_1 + b + e_1, \qquad y_2 = \mathbf{a}^T\mathbf{x}_2 + b + e_2, \qquad y_3 = \mathbf{a}^T\mathbf{x}_3 + b + e_3, \ \ldots \qquad\qquad \mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e}$$

$$D(\mathbf{y}, \hat{\mathbf{y}}) = E = e_1^2 + e_2^2 + e_3^2 + \ldots = (y_1 - \mathbf{a}^T\mathbf{x}_1 - b)^2 + (y_2 - \mathbf{a}^T\mathbf{x}_2 - b)^2 + (y_3 - \mathbf{a}^T\mathbf{x}_3 - b)^2 + \ldots$$

$$E = \|\mathbf{y} - \mathbf{A}^T\mathbf{X}\|^2 = (\mathbf{y} - \mathbf{A}^T\mathbf{X})(\mathbf{y} - \mathbf{A}^T\mathbf{X})^T$$

• Define divergence as sum of the squared error in predicting y
Prediction error as divergence

• $y = \mathbf{a}^T\mathbf{x} + e$
  – e = prediction error
  – Find the "slope" a such that the total squared length of the error lines is minimized
Solving a linear regression

$$\mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e}$$

• Minimize squared error

$$E = \|\mathbf{y} - \mathbf{A}^T\mathbf{X}\|^2 = (\mathbf{y} - \mathbf{A}^T\mathbf{X})(\mathbf{y} - \mathbf{A}^T\mathbf{X})^T = \mathbf{y}\mathbf{y}^T + \mathbf{A}^T\mathbf{X}\mathbf{X}^T\mathbf{A} - 2\,\mathbf{y}\mathbf{X}^T\mathbf{A}$$

• Differentiating w.r.t. A and equating to 0:

$$dE = \left(2\,\mathbf{A}^T\mathbf{X}\mathbf{X}^T - 2\,\mathbf{y}\mathbf{X}^T\right)d\mathbf{A} = 0$$

$$\mathbf{A}^T = \mathbf{y}\mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1} = \mathbf{y}\,\mathrm{pinv}(\mathbf{X}) \qquad\Longleftrightarrow\qquad \mathbf{A} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{y}^T$$
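As a quick numerical check of this closed-form solution (not from the slides), the sketch below fits a scalar regression on synthetic data using both the normal-equation form and the pseudo-inverse form; all data sizes and parameter values are arbitrary choices:

```python
import numpy as np

# Fit y = a^T x + b by appending a row of ones to X and solving
# A = (X X^T)^{-1} X y^T, equivalently A^T = y pinv(X).
rng = np.random.default_rng(0)
D, N = 3, 500
a_true = np.array([1.0, -2.0, 0.5])
b_true = 0.7

x = rng.standard_normal((D, N))                     # explanatory variables, one column per sample
y = a_true @ x + b_true + 0.05 * rng.standard_normal(N)

X = np.vstack([x, np.ones((1, N))])                 # augment with a row of ones (absorbs b)
A = np.linalg.inv(X @ X.T) @ X @ y[:, None]         # (D+1) x 1 closed-form solution
A_pinv = (y[None, :] @ np.linalg.pinv(X)).T         # same solution via the pseudo-inverse

print(A.ravel())        # approximately [ 1.  -2.   0.5  0.7]
print(A_pinv.ravel())   # matches A
```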
Regression in multiple dimensions

$$\mathbf{y}_1 = \mathbf{A}^T\mathbf{x}_1 + \mathbf{b} + \mathbf{e}_1, \qquad \mathbf{y}_2 = \mathbf{A}^T\mathbf{x}_2 + \mathbf{b} + \mathbf{e}_2, \qquad \mathbf{y}_3 = \mathbf{A}^T\mathbf{x}_3 + \mathbf{b} + \mathbf{e}_3 \qquad \text{(each } \mathbf{y}_i \text{ is a vector)}$$

• Notation: $y_{ij}$ = jth component of vector $\mathbf{y}_i$; $\mathbf{a}_i$ = ith column of A; $b_j$ = jth component of b
• Also called multiple regression
• Equivalent of saying:

$$y_{i1} = \mathbf{a}_1^T\mathbf{x}_i + b_1 + e_{i1}, \quad y_{i2} = \mathbf{a}_2^T\mathbf{x}_i + b_2 + e_{i2}, \quad y_{i3} = \mathbf{a}_3^T\mathbf{x}_i + b_3 + e_{i3}, \ \ldots \qquad\Longleftrightarrow\qquad \mathbf{y}_i = \mathbf{A}^T\mathbf{x}_i + \mathbf{b} + \mathbf{e}_i$$

• Fundamentally no different from N separate single regressions
  – But we can use the relationship between the y's to our benefit
Multiple Regression

• Define (a row of ones is appended to X, so that b is estimated along with A):

$$\mathbf{Y} = [\mathbf{y}_1\ \mathbf{y}_2\ \mathbf{y}_3\ \ldots], \qquad \hat{\mathbf{A}} = \begin{bmatrix}\mathbf{A}\\ \mathbf{b}^T\end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix}\mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \ldots\\ 1 & 1 & 1 & \ldots\end{bmatrix}, \qquad \mathbf{E} = [\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3\ \ldots]$$

$$\mathbf{Y} = \hat{\mathbf{A}}^T\mathbf{X} + \mathbf{E}$$

$$DIV = \sum_i \|\mathbf{y}_i - \hat{\mathbf{A}}^T\bar{\mathbf{x}}_i\|^2 = \mathrm{trace}\left((\mathbf{Y} - \hat{\mathbf{A}}^T\mathbf{X})(\mathbf{Y} - \hat{\mathbf{A}}^T\mathbf{X})^T\right)$$

• Differentiating and equating to 0:

$$d\,DIV = 2\left(\hat{\mathbf{A}}^T\mathbf{X}\mathbf{X}^T - \mathbf{Y}\mathbf{X}^T\right)d\hat{\mathbf{A}} = 0 \quad\Rightarrow\quad \mathbf{Y}\mathbf{X}^T = \hat{\mathbf{A}}^T\mathbf{X}\mathbf{X}^T$$

$$\hat{\mathbf{A}}^T = \mathbf{Y}\mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1} = \mathbf{Y}\,\mathrm{pinv}(\mathbf{X}) \qquad\Longleftrightarrow\qquad \hat{\mathbf{A}} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}^T$$
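A similar numerical sketch for the multi-dimensional case (again not from the slides; the sizes and true parameters are arbitrary choices):

```python
import numpy as np

# Multiple regression: Y and X hold one sample per column; a row of ones is
# appended to X so that b is estimated along with A via A_hat^T = Y pinv(X).
rng = np.random.default_rng(1)
D, M, N = 4, 2, 1000                                  # input dim, output dim, number of samples
A_true = rng.standard_normal((D, M))
b_true = rng.standard_normal(M)

x = rng.standard_normal((D, N))
Y = A_true.T @ x + b_true[:, None] + 0.1 * rng.standard_normal((M, N))

X = np.vstack([x, np.ones((1, N))])                   # (D+1) x N, row of ones appended
A_hat = Y @ np.linalg.pinv(X)                         # M x (D+1): estimate of [A^T  b]

print(np.allclose(A_hat[:, :D], A_true.T, atol=0.05)) # first D columns recover A^T, approximately
print(A_hat[:, D], b_true)                            # last column approximates b
```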
A Different Perspective

• y is a noisy reading of $\mathbf{A}^T\mathbf{x}$:

$$\mathbf{y} = \mathbf{A}^T\mathbf{x} + \mathbf{e}$$

• Error e is Gaussian:

$$\mathbf{e} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

• Estimate A from

$$\mathbf{Y} = [\mathbf{y}_1\ \mathbf{y}_2\ \ldots\ \mathbf{y}_N], \qquad \mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \ldots\ \mathbf{x}_N]$$
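This slide only sets up the probabilistic view; as a one-line sketch of where it leads (a standard result, not shown on this slide): with Gaussian errors the log-likelihood of the training data is

$$\log P(\mathbf{Y}\mid\mathbf{X},\mathbf{A}) = C - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\|\mathbf{y}_i - \mathbf{A}^T\mathbf{x}_i\|^2,$$

so maximizing the likelihood with respect to A is the same as minimizing the total squared error, and the maximum-likelihood estimate coincides with the least-squares / pseudo-inverse solution derived above.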