Stationarity DS-GA 1013 / MATH-GA 2824 Mathematical Tools for Data Science https://cims.nyu.edu/~cfgranda/pages/MTDS_spring20/index.html Carlos Fernandez-Granda
Stationarity

Outline:
Translation
Linear translation-invariant models
Stationary signals and PCA
Wiener filtering
Motivation. Goal: estimate a signal $y \in \mathbb{R}^N$ from noisy data $x \in \mathbb{R}^N$. This is a regression problem. What is the optimal estimator? What is the optimal linear estimator?
Translation
Circular translation. We focus on circular translations, which wrap around. We denote by $x^{\downarrow s}$ the $s$th circular translation of a vector $x \in \mathbb{C}^N$: for all $0 \le j \le N-1$,
$$x^{\downarrow s}[j] = x[(j - s) \bmod N]$$
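As a quick sketch (not in the original slides), the definition can be checked in numpy; `np.roll` implements exactly this circular translation:

```python
import numpy as np

def circular_shift(x, s):
    """Return the s-th circular translation of x: y[j] = x[(j - s) % N]."""
    N = len(x)
    return np.array([x[(j - s) % N] for j in range(N)])

x = np.array([1.0, 2.0, 3.0, 4.0])
# Shifting by 1 moves each entry one position to the right, wrapping around
print(circular_shift(x, 1))   # [4. 1. 2. 3.]
# np.roll(x, s) computes the same translation
print(np.roll(x, 1))          # [4. 1. 2. 3.]
```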
Effect of translation on sinusoids. Shifting a sinusoid modifies its phase:
$$\psi_k^{\downarrow s}[l] = \exp\left(\frac{i 2\pi k (l - s)}{N}\right) = \exp\left(\frac{-i 2\pi k s}{N}\right) \psi_k[l]$$
Effect of translation in the Fourier domain. Let $x \in \mathbb{C}^N$ with DFT $\hat{x}$, and let $y := x^{\downarrow s}$. Then
$$\hat{y}[k] := \langle x^{\downarrow s}, \psi_k \rangle = \langle x, \psi_k^{\downarrow -s} \rangle = \left\langle x, \exp\left(\frac{i 2\pi k s}{N}\right) \psi_k \right\rangle = \exp\left(\frac{-i 2\pi k s}{N}\right) \langle x, \psi_k \rangle = \exp\left(\frac{-i 2\pi k s}{N}\right) \hat{x}[k]$$
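This phase-shift property can be verified numerically with `np.fft` (a sketch, assuming numpy's unnormalized DFT convention):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
s = 3

# Translate in time...
y = np.roll(x, s)

# ...and compare its DFT with the phase-shifted DFT of x
k = np.arange(N)
lhs = np.fft.fft(y)
rhs = np.exp(-2j * np.pi * k * s / N) * np.fft.fft(x)
print(np.max(np.abs(lhs - rhs)))   # close to 0
```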
Linear translation-invariant models
Linear translation-invariant (LTI) function. A function $F$ from $\mathbb{C}^N$ to $\mathbb{C}^N$ is linear if for any $x, y \in \mathbb{C}^N$ and any $\alpha \in \mathbb{C}$
$$F(x + y) = F(x) + F(y), \qquad F(\alpha x) = \alpha F(x),$$
and translation invariant if for any shift $0 \le s \le N-1$
$$F(x^{\downarrow s}) = F(x)^{\downarrow s}$$
Parametrizing a linear function. Let $e_j$ be the $j$th standard basis vector ($e_j[j] = 1$ and $e_j[k] = 0$ for $k \ne j$). Let $F_L: \mathbb{C}^N \to \mathbb{C}^N$ be a linear function. Then
$$F_L(x) = F_L\left(\sum_{j=0}^{N-1} x[j]\, e_j\right) = \sum_{j=0}^{N-1} x[j]\, F_L(e_j) = \begin{bmatrix} F_L(e_0) & F_L(e_1) & \cdots & F_L(e_{N-1}) \end{bmatrix} x = Mx$$
Parametrizing an LTI function. Let $F: \mathbb{C}^N \to \mathbb{C}^N$ be linear and translation invariant. Since $e_j = e_0^{\downarrow j}$,
$$F(x) = F\left(\sum_{j=0}^{N-1} x[j]\, e_j\right) = \sum_{j=0}^{N-1} x[j]\, F(e_j) = \sum_{j=0}^{N-1} x[j]\, F\left(e_0^{\downarrow j}\right) = \sum_{j=0}^{N-1} x[j]\, F(e_0)^{\downarrow j}$$
Impulse response. Standard basis vectors can be interpreted as impulses. LTI functions are characterized by their impulse response $h_F := F(e_0)$
Circular convolution. The circular convolution between two vectors $x, y \in \mathbb{C}^N$ is defined as
$$x * y[j] := \sum_{s=0}^{N-1} x[s]\, y^{\downarrow s}[j], \qquad 0 \le j \le N-1$$
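A direct numpy translation of this definition (an $O(N^2)$ sketch for illustration):

```python
import numpy as np

def circ_conv(x, y):
    """Circular convolution: (x * y)[j] = sum_s x[s] y[(j - s) % N]."""
    N = len(x)
    out = np.zeros(N, dtype=np.result_type(x, y))
    for j in range(N):
        out[j] = sum(x[s] * y[(j - s) % N] for s in range(N))
    return out

x = np.array([1.0, 0.0, 0.0, 0.0])   # impulse at position 0
y = np.array([1.0, 2.0, 3.0, 4.0])
# Convolving with an impulse at 0 returns y itself
print(circ_conv(x, y))   # [1. 2. 3. 4.]
```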
Convolution example (1D): signals $x$, $y$, and their circular convolution $x * y$. [figures]
Circular convolution in 2D. The 2D circular convolution between $X \in \mathbb{C}^{N \times N}$ and $Y \in \mathbb{C}^{N \times N}$ is
$$X * Y[j_1, j_2] := \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{N-1} X[s_1, s_2]\, Y^{\downarrow (s_1, s_2)}[j_1, j_2], \qquad 0 \le j_1, j_2 \le N-1$$
Convolution example (2D): images $X$, $Y$, and their circular convolution $X * Y$. [figures]
LTI functions as convolution with the impulse response. For any LTI function $F: \mathbb{C}^N \to \mathbb{C}^N$ and any $x \in \mathbb{C}^N$
$$F(x) = \sum_{j=0}^{N-1} x[j]\, F(e_0)^{\downarrow j} = x * h_F$$
For any 2D LTI function $F: \mathbb{C}^{N \times N} \to \mathbb{C}^{N \times N}$ and any $X \in \mathbb{C}^{N \times N}$
$$F(X) = X * H_F$$
Convolution in time is multiplication in frequency. Let $y := x_1 * x_2$ with $x_1, x_2 \in \mathbb{C}^N$. Then
$$\hat{y}[k] = \hat{x}_1[k]\, \hat{x}_2[k], \qquad 0 \le k \le N-1$$
Convolution in time is multiplication in frequency (2D). Let $Y := X_1 * X_2$ for $X_1, X_2 \in \mathbb{C}^{N \times N}$. Then
$$\hat{Y}[k_1, k_2] = \hat{X}_1[k_1, k_2]\, \hat{X}_2[k_1, k_2]$$
Proof.
$$\hat{y}[k] := \langle x_1 * x_2, \psi_k \rangle = \left\langle \sum_{s=0}^{N-1} x_1[s]\, x_2^{\downarrow s}, \psi_k \right\rangle = \left\langle \sum_{s=0}^{N-1} x_1[s] \frac{1}{N} \sum_{j=0}^{N-1} \exp\left(\frac{-i 2\pi j s}{N}\right) \hat{x}_2[j]\, \psi_j, \psi_k \right\rangle$$
$$= \sum_{j=0}^{N-1} \hat{x}_2[j] \frac{1}{N} \langle \psi_j, \psi_k \rangle \sum_{s=0}^{N-1} x_1[s] \exp\left(\frac{-i 2\pi j s}{N}\right) = \sum_{j=0}^{N-1} \hat{x}_1[j]\, \hat{x}_2[j] \frac{1}{N} \langle \psi_j, \psi_k \rangle = \hat{x}_1[k]\, \hat{x}_2[k]$$
where the last step uses $\langle \psi_j, \psi_k \rangle = N$ if $j = k$ and $0$ otherwise.
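The identity can be checked numerically (a sketch using numpy's FFT, which follows the same unnormalized DFT convention):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)

# Circular convolution computed directly from the definition
direct = np.array([sum(x1[s] * x2[(j - s) % N] for s in range(N))
                   for j in range(N)])

# Pointwise multiplication in the Fourier domain, then inverse DFT
via_fft = np.fft.ifft(np.fft.fft(x1) * np.fft.fft(x2)).real

print(np.max(np.abs(direct - via_fft)))   # close to 0
```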
Example (1D): $x$, $\hat{x}$, $y$, $\hat{y}$, $\hat{x} \circ \hat{y}$, and $x * y$. [figures]
Example (2D): $X$, $\hat{X}$, $Y$, $\hat{Y}$, $\hat{X} \circ \hat{Y}$, and $X * Y$ (Fourier magnitudes on a log scale). [figures]
Convolution in time is multiplication in frequency. LTI functions just scale Fourier coefficients! The DFT of the impulse response is called the transfer function. For any LTI function $F$ and any $x \in \mathbb{C}^N$
$$F(x) = \frac{1}{N} \sum_{k=0}^{N-1} \hat{h}_F[k]\, \hat{x}[k]\, \psi_k$$
For any 2D LTI function $F$ and any $X \in \mathbb{C}^{N \times N}$
$$F(X) = \frac{1}{N^2} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \hat{H}_F[k_1, k_2]\, \hat{X}[k_1, k_2]\, \Phi_{k_1, k_2}$$
Stationary signals and PCA
Signal with translation-invariant statistics
Sample covariance matrix. [figure]
Eigenvalues of the sample covariance matrix (log scale). [figure]
Principal directions 1-10. [figures]
Principal directions 15, 20, 25, 30, 40, 50, 60, 70, 80, 90. [figures]
Stationary signals. A random vector $\tilde{x}$ is wide-sense (or weak-sense) stationary if
1. it has a constant mean: $\mathbb{E}(\tilde{x}[j]) = \mu$ for $0 \le j \le N-1$
2. there is a function $ac_{\tilde{x}}$ such that
$$\mathbb{E}(\tilde{x}[j_1]\, \tilde{x}[j_2]) = ac_{\tilde{x}}((j_2 - j_1) \bmod N), \qquad 0 \le j_1, j_2 \le N-1$$
i.e. its covariance is translation invariant
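A small simulation sketch of this definition (the signal model is a hypothetical one chosen for illustration: white noise circularly filtered by a fixed filter $h$, which produces a wide-sense stationary vector):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_samples = 32, 20000

# Circularly filtering white noise with a fixed filter yields a
# wide-sense stationary signal (zero mean, translation-invariant covariance)
h = np.exp(-np.arange(N) / 4.0)
w = rng.standard_normal((n_samples, N))
X = np.fft.ifft(np.fft.fft(w, axis=1) * np.fft.fft(h), axis=1).real

# Sample covariance: for a stationary signal, entries depend only
# on (j2 - j1) mod N, so each diagonal is (approximately) constant
Sigma = X.T @ X / n_samples
print(Sigma[0, 1], Sigma[5, 6])   # approximately equal
```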
Autocovariance. $ac_{\tilde{x}}$ is the autocovariance of $\tilde{x}$. For any $j$, $ac_{\tilde{x}}(j) = ac_{\tilde{x}}(-j) = ac_{\tilde{x}}(N - j)$. The covariance matrix is therefore
$$\Sigma_{\tilde{x}} = \begin{bmatrix} ac_{\tilde{x}}(0) & ac_{\tilde{x}}(N-1) & \cdots & ac_{\tilde{x}}(1) \\ ac_{\tilde{x}}(1) & ac_{\tilde{x}}(0) & \cdots & ac_{\tilde{x}}(2) \\ \vdots & \vdots & & \vdots \\ ac_{\tilde{x}}(N-1) & ac_{\tilde{x}}(N-2) & \cdots & ac_{\tilde{x}}(0) \end{bmatrix} = \begin{bmatrix} a_{\tilde{x}} & a_{\tilde{x}}^{\downarrow 1} & a_{\tilde{x}}^{\downarrow 2} & \cdots & a_{\tilde{x}}^{\downarrow N-1} \end{bmatrix}$$
where $a_{\tilde{x}} := \begin{bmatrix} ac_{\tilde{x}}(0) & ac_{\tilde{x}}(1) & ac_{\tilde{x}}(2) & \cdots \end{bmatrix}^T$
Circulant matrix. Each column is a unit circular shift of the previous column:
$$\begin{bmatrix} a & d & c & b \\ b & a & d & c \\ c & b & a & d \\ d & c & b & a \end{bmatrix}$$
Sample covariance matrix. [figure]
Eigendecomposition of a circulant matrix. Any circulant matrix $C \in \mathbb{C}^{N \times N}$ can be written as
$$C = \frac{1}{N} F_{[N]}^* \Lambda\, F_{[N]}$$
where $F_{[N]}$ is the DFT matrix and $\Lambda$ is a diagonal matrix
Proof. Let $c$ be the first column of $C$. For any vector $x \in \mathbb{C}^N$
$$Cx = c * x = \frac{1}{N} F_{[N]}^* \operatorname{diag}(\hat{c})\, F_{[N]}\, x$$
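This factorization can be checked numerically; the sketch below builds a circulant matrix from a random first column $c$ and compares $Cx$ with the FFT-domain computation:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8
c = rng.standard_normal(N)   # first column of the circulant matrix

# Build C explicitly: each column is a unit circular shift of the previous one
C = np.column_stack([np.roll(c, j) for j in range(N)])

x = rng.standard_normal(N)
# C x equals the inverse DFT of diag(c_hat) applied to the DFT of x
direct = C @ x
via_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
print(np.max(np.abs(direct - via_fft)))   # close to 0
```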
Eigendecomposition of a circulant covariance matrix. A valid eigendecomposition is given by
$$C = \left(\frac{1}{\sqrt{N}} F_{[N]}^*\right) \operatorname{diag}(\hat{c}) \left(\frac{1}{\sqrt{N}} F_{[N]}\right)$$
If the entries of $\hat{c}$ have different values, the singular vectors are sinusoids!
PCA on a stationary vector. Let $\tilde{x}$ be wide-sense stationary with autocovariance vector $a_{\tilde{x}}$. The eigendecomposition of the covariance matrix of $\tilde{x}$ equals
$$\Sigma_{\tilde{x}} = \frac{1}{N} F_{[N]}^* \operatorname{diag}(\hat{a}_{\tilde{x}})\, F_{[N]}$$
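A numerical sketch of this result (the autocovariance used here is a hypothetical symmetric exponential, chosen only for illustration): the eigenvalues of the circulant covariance matrix coincide with the DFT coefficients of the autocovariance vector.

```python
import numpy as np

N = 16
# A valid symmetric autocovariance: ac(j) = ac(N - j)
j = np.arange(N)
ac = np.exp(-np.minimum(j, N - j) / 3.0)

# Circulant covariance matrix: Sigma[j1, j2] = ac((j1 - j2) mod N)
Sigma = np.column_stack([np.roll(ac, s) for s in range(N)])

# Its eigenvalues are the DFT coefficients of the autocovariance vector
eig = np.sort(np.linalg.eigvalsh(Sigma))
dft = np.sort(np.fft.fft(ac).real)
print(np.max(np.abs(eig - dft)))   # close to 0
```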
CIFAR-10 images
Rows of the covariance matrix: 1, 4, 8, 12; 15, 16, 17, 18; 20, 24, 28, 32. [figures]
Principal directions: 1-10; 15, 20, 25, 30, 40, 50, 60, 70, 80, 90; 100, 150, 200, 250, 300, 400, 500, 600, 800, 1000. [figures]
PCA of natural images. Principal directions tend to be sinusoidal. This suggests using 2D sinusoids for dimensionality reduction. JPEG compresses images using the discrete cosine transform (DCT):
1. The image is divided into 8 × 8 patches
2. The DCT of each patch is computed
3. Each DCT band is quantized differently (more bits for lower frequencies)
DCT basis vectors
Projection of each 8 × 8 block onto the first DCT coefficients: 1, 5, 15, 30, 50. [figures]
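A rough sketch of the block-DCT idea using scipy's `dctn`/`idctn` (this toy version keeps a $k \times k$ square of low-frequency coefficients per 8 × 8 block; actual JPEG quantizes the coefficients rather than truncating them):

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, k):
    """Keep only the k x k lowest-frequency DCT coefficients of an 8x8 block."""
    coeffs = dctn(block, norm='ortho')
    mask = np.zeros_like(coeffs)
    mask[:k, :k] = 1.0
    return idctn(coeffs * mask, norm='ortho')

# A smooth 8x8 block: low frequencies dominate, so few coefficients suffice
t = np.linspace(0, 1, 8)
block = np.outer(np.sin(np.pi * t), np.cos(np.pi * t))

for k in (1, 2, 4, 8):
    err = np.max(np.abs(block - compress_block(block, k)))
    print(k, err)   # error shrinks as more coefficients are kept; k=8 is exact
```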
Wiener filtering
Signal estimation. Goal: estimate an $N$-dimensional signal from $N$-dimensional data. The minimum-MSE estimator is the conditional mean, which is usually intractable. What is the linear minimum-MSE estimator?
Linear MMSE. Let $\tilde{y}$ and $\tilde{x}$ be $N$-dimensional zero-mean random vectors. If $\Sigma_{\tilde{x}}$ is full rank, then
$$\arg\min_B \mathbb{E}\left(\left\| \tilde{y} - B^T \tilde{x} \right\|_2^2\right) = \Sigma_{\tilde{x}}^{-1} \Sigma_{\tilde{x}\tilde{y}}, \qquad \Sigma_{\tilde{x}\tilde{y}} := \mathbb{E}\left(\tilde{x}\, \tilde{y}^T\right)$$
Proof. The cost function can be decomposed into
$$\mathbb{E}\left(\left\| \tilde{y} - B^T \tilde{x} \right\|_2^2\right) = \sum_{j=1}^{N} \mathbb{E}\left(\left(\tilde{y}[j] - B_j^T \tilde{x}\right)^2\right)$$
Each term is a linear regression problem with optimal estimator
$$\arg\min_{B_j} \mathbb{E}\left(\left(\tilde{y}[j] - B_j^T \tilde{x}\right)^2\right) = \Sigma_{\tilde{x}}^{-1} (\Sigma_{\tilde{x}\tilde{y}})_j$$
where $(\Sigma_{\tilde{x}\tilde{y}})_j$ is the $j$th column of $\Sigma_{\tilde{x}\tilde{y}}$
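An empirical sketch of the linear MMSE estimator (the data model here, $\tilde{x} = \tilde{y} + \text{noise}$ with a randomly correlated $\tilde{y}$, is a hypothetical example): the plug-in estimator $B = \Sigma_{\tilde{x}}^{-1} \Sigma_{\tilde{x}\tilde{y}}$ built from sample covariances minimizes the empirical MSE, so perturbing it can only increase the error.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n_samples = 8, 100000

# Zero-mean correlated signal y and noisy data x = y + noise (rows = samples)
Y = rng.standard_normal((n_samples, N)) @ rng.standard_normal((N, N)) * 0.5
X = Y + rng.standard_normal((n_samples, N))

# Linear MMSE estimator: B = Sigma_x^{-1} Sigma_xy (sample covariances)
Sigma_x = X.T @ X / n_samples
Sigma_xy = X.T @ Y / n_samples
B = np.linalg.solve(Sigma_x, Sigma_xy)

mse = np.mean(np.sum((Y - X @ B) ** 2, axis=1))

# Perturbing B in a random direction should not reduce the empirical MSE
B_pert = B + 0.01 * rng.standard_normal(B.shape)
mse_pert = np.mean(np.sum((Y - X @ B_pert) ** 2, axis=1))
print(mse, mse_pert)   # mse <= mse_pert
```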