
Stationarity: DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science



  1. Stationarity DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science https://cims.nyu.edu/~cfgranda/pages/MTDS_spring20/index.html Carlos Fernandez-Granda

  2. Stationarity (outline): Translation; Linear translation-invariant models; Stationary signals and PCA; Wiener filtering

  3. Motivation. Goal: estimate a signal $y \in \mathbb{R}^N$ from noisy data $x \in \mathbb{R}^N$. This is a regression problem. What is the optimal estimator? What is the best linear estimator?

  4. Stationarity (outline): Translation; Linear translation-invariant models; Stationary signals and PCA; Wiener filtering

  5. Circular translation. We focus on circular translations, which wrap around. We denote by $x^{\downarrow s}$ the $s$-th circular translation of a vector $x \in \mathbb{C}^N$. For all $0 \le j \le N-1$,
$$x^{\downarrow s}[j] = x[(j - s) \bmod N]$$
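As a quick numerical check of this definition (a minimal NumPy sketch; the slides themselves contain no code), note that `np.roll` implements exactly this wrap-around shift:

```python
import numpy as np

N = 8
x = np.arange(N, dtype=float)  # x[j] = j

s = 3
# Definition: the s-th circular translation satisfies x_shifted[j] = x[(j - s) mod N]
x_shifted = np.array([x[(j - s) % N] for j in range(N)])

# np.roll applies the same wrap-around shift
assert np.allclose(x_shifted, np.roll(x, s))
print(x_shifted)  # [5. 6. 7. 0. 1. 2. 3. 4.]
```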

  6. Effect of shift on sinusoids. Shifting a sinusoid modifies its phase:
$$\psi_k^{\downarrow s}[l] = \exp\left(\frac{i 2\pi k (l - s)}{N}\right) = \exp\left(\frac{-i 2\pi k s}{N}\right) \psi_k[l]$$

  7. Effect of translation in the Fourier domain. Let $x \in \mathbb{C}^N$ with DFT $\hat{x}$, and let $y := x^{\downarrow s}$. Then
$$\begin{aligned}
\hat{y}[k] &:= \langle x^{\downarrow s}, \psi_k \rangle \\
&= \langle x, \psi_k^{\downarrow -s} \rangle \\
&= \left\langle x, \exp\left(\frac{i 2\pi k s}{N}\right) \psi_k \right\rangle \\
&= \exp\left(\frac{-i 2\pi k s}{N}\right) \langle x, \psi_k \rangle \\
&= \exp\left(\frac{-i 2\pi k s}{N}\right) \hat{x}[k]
\end{aligned}$$
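This shift property is easy to verify numerically. A hedged sketch: with the convention $\hat{x}[k] = \langle x, \psi_k \rangle$, NumPy's forward FFT uses the same sign convention, so the identity carries over directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 16, 5
x = rng.standard_normal(N)

lhs = np.fft.fft(np.roll(x, s))                        # DFT of the shifted signal
k = np.arange(N)
rhs = np.exp(-2j * np.pi * k * s / N) * np.fft.fft(x)  # phase-modulated DFT of x

assert np.allclose(lhs, rhs)  # shift in time = phase shift in frequency
```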

  8. Stationarity Translation Linear translation-invariant models Stationary signals and PCA Wiener filtering

  9. Linear translation-invariant (LTI) function. A function $F$ from $\mathbb{C}^N$ to $\mathbb{C}^N$ is linear if for any $x, y \in \mathbb{C}^N$ and any $\alpha \in \mathbb{C}$
$$F(x + y) = F(x) + F(y), \qquad F(\alpha x) = \alpha F(x),$$
and translation invariant if for any shift $0 \le s \le N-1$
$$F(x^{\downarrow s}) = F(x)^{\downarrow s}$$
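To make the definition concrete, here is a minimal sketch (my own example, not from the slides) that tests both properties for a circular moving-average filter:

```python
import numpy as np

def moving_average(x):
    """Circular 3-point moving average: an LTI function on C^N."""
    return (np.roll(x, -1) + x + np.roll(x, 1)) / 3

rng = np.random.default_rng(1)
N = 12
x, y = rng.standard_normal(N), rng.standard_normal(N)
alpha, s = 2.5, 4

# Linearity
assert np.allclose(moving_average(x + y), moving_average(x) + moving_average(y))
assert np.allclose(moving_average(alpha * x), alpha * moving_average(x))
# Translation invariance: F(x shifted) equals F(x) shifted
assert np.allclose(moving_average(np.roll(x, s)), np.roll(moving_average(x), s))
```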

  10. Parametrizing a linear function. Let $e_j$ be the $j$-th standard basis vector ($e_j[j] = 1$ and $e_j[k] = 0$ for $k \ne j$), and let $F_L : \mathbb{C}^N \to \mathbb{C}^N$ be a linear function. Then
$$F_L(x) = F_L\left(\sum_{j=0}^{N-1} x[j]\, e_j\right) = \sum_{j=0}^{N-1} x[j]\, F_L(e_j) = \begin{bmatrix} F_L(e_0) & F_L(e_1) & \cdots & F_L(e_{N-1}) \end{bmatrix} x = Mx$$

  11. Parametrizing an LTI function. Let $F : \mathbb{C}^N \to \mathbb{C}^N$ be linear and translation invariant. Then
$$F(x) = F\left(\sum_{j=0}^{N-1} x[j]\, e_j\right) = \sum_{j=0}^{N-1} x[j]\, F(e_j) = \sum_{j=0}^{N-1} x[j]\, F\left(e_0^{\downarrow j}\right) = \sum_{j=0}^{N-1} x[j]\, F(e_0)^{\downarrow j}$$

  12. Impulse response. Standard basis vectors can be interpreted as impulses. LTI functions are characterized by their impulse response $h_F := F(e_0)$.

  13. Circular convolution. The circular convolution between two vectors $x, y \in \mathbb{C}^N$ is defined as
$$x * y[j] := \sum_{s=0}^{N-1} x[s]\, y^{\downarrow s}[j], \qquad 0 \le j \le N-1$$
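A direct NumPy implementation of this definition (a sketch for illustration; the quadratic-time loop is fine for small $N$):

```python
import numpy as np

def circ_conv(x, y):
    """Circular convolution: (x * y)[j] = sum_s x[s] * y[(j - s) mod N]."""
    N = len(x)
    return np.array([sum(x[s] * y[(j - s) % N] for s in range(N))
                     for j in range(N)])

rng = np.random.default_rng(2)
x, y = rng.standard_normal(8), rng.standard_normal(8)
assert np.allclose(circ_conv(x, y), circ_conv(y, x))  # circular convolution commutes
```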

  14. Convolution example: $x$ (plot)

  15. Convolution example: $y$ (plot)

  16. Convolution example: $x * y$ (plot)

  17. Circular convolution. The 2D circular convolution between $X \in \mathbb{C}^{N \times N}$ and $Y \in \mathbb{C}^{N \times N}$ is
$$X * Y[j_1, j_2] := \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{N-1} X[s_1, s_2]\, Y^{\downarrow (s_1, s_2)}[j_1, j_2], \qquad 0 \le j_1, j_2 \le N-1$$
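The same idea in 2D, sketched with `np.roll` along both axes (illustrative only; the nested loop costs $O(N^4)$):

```python
import numpy as np

def circ_conv2d(X, Y):
    """2D circular convolution as a sum of shifted copies of Y."""
    N = X.shape[0]
    out = np.zeros_like(X, dtype=float)
    for s1 in range(N):
        for s2 in range(N):
            # np.roll with a tuple applies the wrap-around shift on both axes
            out += X[s1, s2] * np.roll(Y, (s1, s2), axis=(0, 1))
    return out

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert np.allclose(circ_conv2d(X, Y), circ_conv2d(Y, X))  # commutes in 2D as well
```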

  18. Convolution example: $x$ (image)

  19. Convolution example: $y$ (image)

  20. Convolution example: $x * y$ (image)

  21. LTI functions as convolution with the impulse response. For any LTI function $F : \mathbb{C}^N \to \mathbb{C}^N$ and any $x \in \mathbb{C}^N$,
$$F(x) = \sum_{j=0}^{N-1} x[j]\, F(e_0)^{\downarrow j} = x * h_F$$
For any 2D LTI function $F : \mathbb{C}^{N \times N} \to \mathbb{C}^{N \times N}$ and any $X \in \mathbb{C}^{N \times N}$, $F(X) = X * H_F$.
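A hedged numerical check of this identity, using a made-up LTI filter $F$ (any choice of LTI map works here):

```python
import numpy as np

def F(x):
    """An example LTI function: a circular weighted-difference filter."""
    return x - 0.5 * np.roll(x, 1)

def circ_conv(x, h):
    """x * h as a sum of shifted copies of h, matching the definition above."""
    return sum(x[s] * np.roll(h, s) for s in range(len(x)))

N = 16
e0 = np.zeros(N); e0[0] = 1.0
h = F(e0)  # impulse response h_F := F(e_0)

rng = np.random.default_rng(4)
x = rng.standard_normal(N)
# Any LTI function equals convolution with its impulse response
assert np.allclose(F(x), circ_conv(x, h))
```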

  22. Convolution in time is multiplication in frequency. Let $y := x_1 * x_2$ for $x_1, x_2 \in \mathbb{C}^N$. Then
$$\hat{y}[k] = \hat{x}_1[k]\, \hat{x}_2[k], \qquad 0 \le k \le N-1$$

  23. Convolution in time is multiplication in frequency (2D). Let $Y := X_1 * X_2$ for $X_1, X_2 \in \mathbb{C}^{N \times N}$. Then
$$\hat{Y}[k_1, k_2] = \hat{X}_1[k_1, k_2]\, \hat{X}_2[k_1, k_2]$$

  24. Proof.
$$\begin{aligned}
\hat{y}[k] &:= \langle x_1 * x_2, \psi_k \rangle \\
&= \left\langle \sum_{s=0}^{N-1} x_1[s]\, x_2^{\downarrow s}, \psi_k \right\rangle \\
&= \sum_{s=0}^{N-1} x_1[s] \left\langle \frac{1}{N} \sum_{j=0}^{N-1} \exp\left(\frac{-i 2\pi j s}{N}\right) \hat{x}_2[j]\, \psi_j, \psi_k \right\rangle \\
&= \sum_{j=0}^{N-1} \hat{x}_2[j]\, \frac{1}{N} \langle \psi_j, \psi_k \rangle \sum_{s=0}^{N-1} x_1[s] \exp\left(\frac{-i 2\pi j s}{N}\right) \\
&= \sum_{j=0}^{N-1} \hat{x}_1[j]\, \hat{x}_2[j]\, \frac{1}{N} \langle \psi_j, \psi_k \rangle \\
&= \hat{x}_1[k]\, \hat{x}_2[k]
\end{aligned}$$
where the last step uses $\langle \psi_j, \psi_k \rangle = N$ if $j = k$ and $0$ otherwise.
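The theorem is easy to confirm numerically (a sketch; `np.fft` uses the same DFT convention as these slides):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 32
x1, x2 = rng.standard_normal(N), rng.standard_normal(N)

# Direct circular convolution from the definition
y = np.array([sum(x1[s] * x2[(j - s) % N] for s in range(N)) for j in range(N)])

# Convolution in time is multiplication in frequency
assert np.allclose(np.fft.fft(y), np.fft.fft(x1) * np.fft.fft(x2))
# Equivalently, the convolution can be computed with three FFTs
assert np.allclose(y, np.fft.ifft(np.fft.fft(x1) * np.fft.fft(x2)).real)
```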

  25. $x$ (plot)

  26. $\hat{x}$ (plot)

  27. $y$ (plot)

  28. $\hat{y}$ (plot)

  29. $\hat{x} \circ \hat{y}$ (plot)

  30. $x * y$ (plot)

  31. $X$ (image)

  32. $\hat{X}$ (image, log scale)

  33. $Y$ (image)

  34. $\hat{Y}$ (image, log scale)

  35. $\hat{X} \circ \hat{Y}$ (image, log scale)

  36. $X * Y$ (image)

  37. Convolution in time is multiplication in frequency. LTI functions just scale Fourier coefficients! The DFT of the impulse response is called the transfer function of the LTI function. For any LTI function $F$ and any $x \in \mathbb{C}^N$,
$$F(x) = \frac{1}{N} \sum_{k=0}^{N-1} \hat{h}_F[k]\, \hat{x}[k]\, \psi_k$$
For any 2D LTI function $F$ and any $X \in \mathbb{C}^{N \times N}$,
$$F(X) = \frac{1}{N^2} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \hat{H}_F[k_1, k_2]\, \hat{X}[k_1, k_2]\, \Phi_{k_1, k_2}$$
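A sketch of filtering through the transfer function: three FFTs replace an explicit convolution. The 5-point moving-average filter below is my own example, not from the slides:

```python
import numpy as np

def apply_lti(x, h):
    """Apply an LTI function by scaling Fourier coefficients with the transfer function."""
    H = np.fft.fft(h)  # transfer function: DFT of the impulse response
    return np.fft.ifft(H * np.fft.fft(x)).real

N = 64
h = np.zeros(N); h[:5] = 1 / 5  # impulse response of a 5-point moving average
x = (np.sin(2 * np.pi * 3 * np.arange(N) / N)
     + 0.1 * np.random.default_rng(6).standard_normal(N))
x_filtered = apply_lti(x, h)    # each Fourier coefficient of x scaled by H[k]

# Sanity check: an impulse passes through the filter as the impulse response itself
e0 = np.zeros(N); e0[0] = 1.0
assert np.allclose(apply_lti(e0, h), h)
```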

  38. Stationarity (outline): Translation; Linear translation-invariant models; Stationary signals and PCA; Wiener filtering

  39. Signal with translation-invariant statistics

  40. Sample covariance matrix (heatmap)

  41. Eigenvalues (log-scale plot, indices 0 to 100)

  42. Principal directions 1 to 10 (plots)

  43. Principal directions 15, 20, 25, 30, 40, 50, 60, 70, 80, 90 (plots)

  44. Stationary signals. $\tilde{x}$ is wide-sense or weak-sense stationary if:
1. it has a constant mean, $\mathbb{E}(\tilde{x}[j]) = \mu$ for $0 \le j \le N-1$;
2. there is a function $ac_{\tilde{x}}$ such that
$$\mathbb{E}(\tilde{x}[j_1]\, \tilde{x}[j_2]) = ac_{\tilde{x}}((j_2 - j_1) \bmod N), \qquad 0 \le j_1, j_2 \le N-1,$$
i.e. it has translation-invariant covariance.

  45. Autocovariance. $ac_{\tilde{x}}$ is the autocovariance of $\tilde{x}$. For any $j$, $ac_{\tilde{x}}(j) = ac_{\tilde{x}}(-j) = ac_{\tilde{x}}(N - j)$. The covariance matrix is
$$\Sigma_{\tilde{x}} = \begin{bmatrix}
ac_{\tilde{x}}(0) & ac_{\tilde{x}}(N-1) & \cdots & ac_{\tilde{x}}(1) \\
ac_{\tilde{x}}(1) & ac_{\tilde{x}}(0) & \cdots & ac_{\tilde{x}}(2) \\
\vdots & \vdots & \ddots & \vdots \\
ac_{\tilde{x}}(N-1) & ac_{\tilde{x}}(N-2) & \cdots & ac_{\tilde{x}}(0)
\end{bmatrix} = \begin{bmatrix} a_{\tilde{x}} & a_{\tilde{x}}^{\downarrow 1} & a_{\tilde{x}}^{\downarrow 2} & \cdots & a_{\tilde{x}}^{\downarrow N-1} \end{bmatrix}$$
where
$$a_{\tilde{x}} := \begin{bmatrix} ac_{\tilde{x}}(0) & ac_{\tilde{x}}(1) & ac_{\tilde{x}}(2) & \cdots \end{bmatrix}^T$$
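A small sketch constructing this covariance matrix from a hypothetical autocovariance vector (the geometric decay below is an assumption chosen for illustration; it is symmetrized so that $ac(j) = ac(N-j)$):

```python
import numpy as np

N = 6
j = np.arange(N)
# Hypothetical autocovariance satisfying ac(j) = ac(N - j) (circular symmetry)
a = 0.5 ** np.minimum(j, N - j)

# Covariance matrix: columns are circular shifts of the autocovariance vector
Sigma = np.column_stack([np.roll(a, s) for s in range(N)])

assert np.allclose(Sigma, Sigma.T)              # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) >= 0)   # and positive semidefinite here
```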

  46. Circulant matrix. Each column is a unit circular shift of the previous column:
$$\begin{bmatrix}
a & d & c & b \\
b & a & d & c \\
c & b & a & d \\
d & c & b & a
\end{bmatrix}$$

  47. Sample covariance matrix (heatmap)

  48. Eigendecomposition of a circulant matrix. Any circulant matrix $C \in \mathbb{C}^{N \times N}$ can be written as
$$C = \frac{1}{N} F_{[N]}^* \Lambda F_{[N]}$$
where $F_{[N]}$ is the DFT matrix and $\Lambda$ is a diagonal matrix.

  49. Proof. Let $c$ be the first column of $C$. For any vector $x \in \mathbb{C}^N$,
$$Cx = c * x = \frac{1}{N} F_{[N]}^* \operatorname{diag}(\hat{c})\, F_{[N]}\, x$$
since $F_{[N]} x = \hat{x}$, multiplying by $\operatorname{diag}(\hat{c})$ applies the convolution theorem, and $\frac{1}{N} F_{[N]}^*$ inverts the DFT.

  50. Eigendecomposition of a circulant covariance matrix. A valid eigendecomposition is given by
$$C = \frac{1}{\sqrt{N}} F_{[N]}^*\, \operatorname{diag}(\hat{c})\, \frac{1}{\sqrt{N}} F_{[N]}$$
If the entries of $\hat{c}$ are all distinct, the eigenvectors are sinusoids!
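A numerical confirmation of this decomposition (a sketch; the circulant matrix and random seed are arbitrary):

```python
import numpy as np

N = 8
rng = np.random.default_rng(7)
c = rng.standard_normal(N)  # first column of the circulant matrix
C = np.column_stack([np.roll(c, s) for s in range(N)])  # circulant: shifted columns

# DFT matrix: F[k, l] = exp(-i 2 pi k l / N)
F = np.fft.fft(np.eye(N))

# Claimed decomposition: C = (1/N) F* diag(c_hat) F,
# so the eigenvalues of C are the DFT of its first column
C_rebuilt = F.conj().T @ np.diag(np.fft.fft(c)) @ F / N
assert np.allclose(C, C_rebuilt)
```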

  51. PCA on a stationary vector. Let $\tilde{x}$ be wide-sense stationary with autocovariance vector $a_{\tilde{x}}$. The eigendecomposition of the covariance matrix of $\tilde{x}$ equals
$$\Sigma_{\tilde{x}} = \frac{1}{N} F^* \operatorname{diag}(\hat{a}_{\tilde{x}})\, F$$

  52. CIFAR-10 images

  53. Rows of the covariance matrix: rows 1, 4, 8, 12 (plots)

  54. Rows of the covariance matrix: rows 15, 16, 17, 18 (plots)

  55. Rows of the covariance matrix: rows 20, 24, 28, 32 (plots)

  56. Principal directions 1 to 10 (images)

  57. Principal directions 15, 20, 25, 30, 40, 50, 60, 70, 80, 90 (images)

  58. Principal directions 100, 150, 200, 250, 300, 400, 500, 600, 800, 1000 (images)

  59. PCA of natural images. Principal directions tend to be sinusoidal. This suggests using 2D sinusoids for dimensionality reduction. JPEG compresses images using the discrete cosine transform (DCT):
1. The image is divided into 8 × 8 patches.
2. Each DCT band is quantized differently (more bits for lower frequencies).
A sketch of the transform step appears below.
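A hedged sketch of the transform step using SciPy's DCT. The smooth patch and the coefficient mask are illustrative assumptions; JPEG's actual quantization tables are not reproduced here:

```python
import numpy as np
from scipy.fft import dctn, idctn  # type-II DCT, the variant used by JPEG

# A smooth hypothetical 8x8 patch (stand-in for an image block)
patch = np.add.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))

coeffs = dctn(patch, norm="ortho")  # 2D DCT of the patch

# Keep only low-frequency coefficients (a crude stand-in for JPEG quantization)
mask = np.add.outer(np.arange(8), np.arange(8)) < 4
approx = idctn(coeffs * mask, norm="ortho")

print(np.linalg.norm(patch - approx) / np.linalg.norm(patch))  # small relative error
```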

  60. DCT basis vectors

  61. Projection of each 8×8 block onto the first 1, 5, 15, 30, and 50 DCT coefficients (images)

  62. Stationarity (outline): Translation; Linear translation-invariant models; Stationary signals and PCA; Wiener filtering

  63. Signal estimation. Goal: estimate an N-dimensional signal from N-dimensional data. The minimum-MSE estimator is the conditional mean, which is usually intractable. What is the linear minimum-MSE estimator?

  64. Linear MMSE. Let $\tilde{y}$ and $\tilde{x}$ be N-dimensional zero-mean random vectors. If $\Sigma_{\tilde{x}}$ is full rank, then
$$\arg\min_B \mathbb{E}\left(\left\| \tilde{y} - B^T \tilde{x} \right\|_2^2\right) = \Sigma_{\tilde{x}}^{-1} \Sigma_{\tilde{x}\tilde{y}}, \qquad \Sigma_{\tilde{x}\tilde{y}} := \mathbb{E}\left(\tilde{x}\, \tilde{y}^T\right)$$

  65. Proof. The cost function can be decomposed into
$$\mathbb{E}\left(\left\| \tilde{y} - B^T \tilde{x} \right\|_2^2\right) = \sum_{j=1}^{N} \mathbb{E}\left(\left(\tilde{y}[j] - B_j^T \tilde{x}\right)^2\right)$$
Each term is a linear regression problem with optimal estimator
$$B_j = \arg\min_{B_j} \mathbb{E}\left(\left(\tilde{y}[j] - \tilde{x}^T B_j\right)^2\right) = \Sigma_{\tilde{x}}^{-1} \left(\Sigma_{\tilde{x}\tilde{y}}\right)_j$$
where $\left(\Sigma_{\tilde{x}\tilde{y}}\right)_j$ is the $j$-th column of $\Sigma_{\tilde{x}\tilde{y}}$.
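A numerical sanity check of the linear MMSE formula on a made-up model (the observation model $\tilde{x} = \tilde{y} + \text{noise}$ and all constants below are assumptions for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(9)
N, n = 5, 200_000

# Hypothetical zero-mean model: observe x = y + noise
Y = rng.standard_normal((n, N))
X = Y + 0.5 * rng.standard_normal((n, N))

Sigma_x = X.T @ X / n    # estimate of E(x x^T)
Sigma_xy = X.T @ Y / n   # estimate of E(x y^T)

B = np.linalg.solve(Sigma_x, Sigma_xy)  # B = Sigma_x^{-1} Sigma_xy
y_hat = X @ B                           # each row is the estimate B^T x

# For this model the optimal B shrinks x by 1/(1 + 0.5^2) = 0.8
assert np.allclose(B, 0.8 * np.eye(N), atol=0.02)
```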
