Approximate Factor Analysis Models

Lorenzo Finesso, Peter Spreij

Brixen – July 19, 2007
[Introductory example slide: matrices $P_1$ and $P_2$ and their product $P_2 P_1$; the matrix entries did not survive extraction.]
Factor Analysis models

\[
Y = HX + \varepsilon
\]

where $X \in \mathbb{R}^k$ and $\varepsilon \in \mathbb{R}^n$ are independent zero-mean normals ($k < n$), $\mathrm{Cov}(X) = I$, and $\mathrm{Cov}(\varepsilon) = D > 0$ diagonal. Therefore

\[
\mathrm{Cov}(Y) := \Sigma_0 = HH^\top + D, \qquad \mathrm{Cov}(Y \mid X) = D \ \text{diagonal}
\]
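As a quick sanity check, a minimal numpy sketch (with hypothetical sizes $n = 5$, $k = 2$ and sample count $T$; none of these names are prescribed by the slides) simulates the model and confirms empirically that $\mathrm{Cov}(Y) = HH^\top + D$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 5, 2, 200_000                      # hypothetical sizes and sample count

H = rng.normal(size=(n, k))                  # factor loadings, n x k
d = rng.uniform(0.5, 1.5, size=n)            # positive diagonal of D
D = np.diag(d)

X = rng.normal(size=(k, T))                  # Cov(X) = I
eps = np.sqrt(d)[:, None] * rng.normal(size=(n, T))  # Cov(eps) = D
Y = H @ X + eps                              # the FA model Y = HX + eps

Sigma0_hat = (Y @ Y.T) / T                   # sample covariance of Y
print(np.allclose(Sigma0_hat, H @ H.T + D, atol=0.05))  # True up to sampling noise
```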
Exact (weak) realization of FA models

Problem. Given the positive covariance matrix $\Sigma_0 \in \mathbb{R}^{n \times n}$ and the integer $k < n$, find $(H, D)$ such that

\[
H \in \mathbb{R}^{n \times k}, \qquad D > 0 \ \text{diagonal of size } n \times n, \qquad \Sigma_0 = HH^\top + D
\]
Informational divergence between normal measures

Given probability measures $P_1 \ll P_2$ on the same space,

\[
D(P_1 \| P_2) = \mathbb{E}_{P_1} \log \frac{dP_1}{dP_2}
\]

In the normal case on $\mathbb{R}^n$, with $P_1 = N(0, \Sigma_1)$ and $P_2 = N(0, \Sigma_2)$,

\[
D(P_1 \| P_2) := D(\Sigma_1 \| \Sigma_2) = \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2} \operatorname{tr}(\Sigma_2^{-1} \Sigma_1) - \frac{n}{2}
\]
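The normal-case formula translates directly into code; here is a minimal sketch (the function name kl_normal is ours, not the authors'):

```python
import numpy as np

def kl_normal(S1, S2):
    """D(N(0, S1) || N(0, S2)) for positive definite S1, S2."""
    n = S1.shape[0]
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    # 1/2 log(|S2|/|S1|) + 1/2 tr(S2^{-1} S1) - n/2
    return 0.5 * (logdet2 - logdet1) + 0.5 * np.trace(np.linalg.solve(S2, S1)) - n / 2
```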
Approximate FA models

Problem. Given $\Sigma_0 \in \mathbb{R}^{n \times n}$ positive and the integer $k < n$, minimize

\[
D(\Sigma_0 \| HH^\top + D) = \frac{1}{2} \log \frac{|HH^\top + D|}{|\Sigma_0|} + \frac{1}{2} \operatorname{tr}\big((HH^\top + D)^{-1} \Sigma_0\big) - \frac{n}{2}
\]

over $(H, D)$, where $H \in \mathbb{R}^{n \times k}$ and $D > 0$ is diagonal of size $n$.

Proposition. The approximate FA problem admits a (nonunique) solution.
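Reusing the kl_normal sketch above, the objective for any candidate pair (H, D) is then just:

```python
# objective of the approximate FA problem for a candidate (H, D);
# kl_normal is the sketch given after the divergence slide
objective = kl_normal(Sigma0, H @ H.T + D)
```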
Lifted version of the problem

Definitions.

\[
\boldsymbol{\Sigma} = \left\{ \Sigma \in \mathbb{R}^{(n+k) \times (n+k)} : \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} > 0 \right\}
\]

Two subsets of $\boldsymbol{\Sigma}$ will play a special role:

\[
\boldsymbol{\Sigma}_0 = \{ \Sigma \in \boldsymbol{\Sigma} : \Sigma_{11} = \Sigma_0 \}, \qquad
\boldsymbol{\Sigma}_1 = \left\{ \Sigma \in \boldsymbol{\Sigma} : \Sigma = \begin{pmatrix} HH^\top + D & HQ \\ (HQ)^\top & Q^\top Q \end{pmatrix} \right\}
\]

Elements of $\boldsymbol{\Sigma}_1$ will often be denoted by $\Sigma(H, D, Q)$.

Remark. $Y \sim N(0, \Sigma_0)$ admits an exact FA model of size $k$ iff $\boldsymbol{\Sigma}_0 \cap \boldsymbol{\Sigma}_1 \neq \emptyset$.
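For experiments it is handy to assemble elements of $\boldsymbol{\Sigma}_1$ explicitly; a minimal sketch (sigma_lifted is our name):

```python
import numpy as np

def sigma_lifted(H, D, Q):
    """Assemble Sigma(H, D, Q), an element of the set Sigma_1, in block form."""
    top = np.hstack([H @ H.T + D, H @ Q])
    bottom = np.hstack([(H @ Q).T, Q.T @ Q])
    return np.vstack([top, bottom])
```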
Lifted problem

Problem.

\[
\min_{\Sigma' \in \boldsymbol{\Sigma}_0, \, \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma' \| \Sigma_1)
\]

Proposition. Let $\Sigma_0$ be given. It holds that

\[
\min_{H, D} D(\Sigma_0 \| HH^\top + D) = \min_{\Sigma' \in \boldsymbol{\Sigma}_0, \, \Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma' \| \Sigma_1)
\]
First partial minimization

Problem.

\[
\min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma' \| \Sigma)
\]

This problem has a unique solution.
First partial minimization – general solution

Proposition. Let $(Y, X) \sim Q = Q_{Y,X}$ and let

\[
\mathcal{P} = \{ P = P_{Y,X} : P_Y = P_0 \}
\]

for a given $P_0 \ll Q_Y$. Then

\[
\min_{P \in \mathcal{P}} D(P \| Q) = D(P^* \| Q)
\]

where $P^*$ is given by $P^*_Y = P_0$ and $P^*_{X|Y} = Q_{X|Y}$. Moreover, for any $P \in \mathcal{P}$, one has the Pythagorean law

\[
D(P \| Q) = D(P \| P^*) + D(P^* \| Q)
\]
First partial minimization – normal case

Proposition. Let $Q \sim N(0, \Sigma)$ and $P_0 \sim N(0, \Sigma_0)$, where $\Sigma \in \boldsymbol{\Sigma}$ and $\Sigma_0 \in \mathbb{R}^{n \times n}$. Then

\[
\min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma' \| \Sigma)
\]

is attained by $P^* \sim N(0, \Sigma^*)$ with

\[
\Sigma^* = \begin{pmatrix} \Sigma_0 & \Sigma_0 \Sigma_{11}^{-1} \Sigma_{12} \\ \Sigma_{21} \Sigma_{11}^{-1} \Sigma_0 & \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} (\Sigma_{11} - \Sigma_0) \Sigma_{11}^{-1} \Sigma_{12} \end{pmatrix}
\]
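A sketch of this update (first_min is our name; it swaps in the new $Y$-marginal $\Sigma_0$ while keeping the conditional law of $X$ given $Y$):

```python
import numpy as np

def first_min(Sigma, Sigma0, n):
    """Optimal Sigma* of the first partial minimization over Sigma_0."""
    S11, S12 = Sigma[:n, :n], Sigma[:n, n:]
    S22 = Sigma[n:, n:]
    A = np.linalg.solve(S11, S12)            # S11^{-1} S12
    top = np.hstack([Sigma0, Sigma0 @ A])
    br = S22 - A.T @ (S11 - Sigma0) @ A      # new covariance of X
    bottom = np.hstack([A.T @ Sigma0, br])
    return np.vstack([top, bottom])
```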
Second partial minimization

Problem.

\[
\min_{\Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma \| \Sigma_1)
\]

This problem has a unique solution $\Sigma_1^* = \Sigma(H^*, D^*, Q^*)$.
Second partial minimization – normal case

Notation. For $M$ square, let $\Delta(M)$ be the diagonal matrix with $\Delta(M)_{ii} = M_{ii}$.

Proposition. An optimal point is $(H^*, D^*, Q^*)$ with

\[
H^* = \Sigma_{12} \Sigma_{22}^{-1/2}, \qquad
D^* = \Delta(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}), \qquad
Q^* = \Sigma_{22}^{1/2}
\]

thus

\[
\Sigma_1^* = \begin{pmatrix} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} + \Delta(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}) & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\]

Moreover,

\[
D(\Sigma \| \Sigma(H, D, Q)) = D(\Sigma \| \Sigma_1^*) + D(\Sigma_1^* \| \Sigma(H, D, Q))
\]

for any $\Sigma(H, D, Q) \in \boldsymbol{\Sigma}_1$.
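A corresponding sketch (the names second_min and psd_sqrt are ours; psd_sqrt computes the symmetric square root via an eigendecomposition):

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def second_min(Sigma, n):
    """Optimal (H*, D*, Q*) of the second partial minimization over Sigma_1."""
    S11, S12 = Sigma[:n, :n], Sigma[:n, n:]
    S21, S22 = Sigma[n:, :n], Sigma[n:, n:]
    Q = psd_sqrt(S22)                              # Sigma_22^{1/2}
    H = S12 @ np.linalg.inv(Q)                     # Sigma_12 Sigma_22^{-1/2}
    D = np.diag(np.diag(S11 - S12 @ np.linalg.solve(S22, S21)))  # Delta(...)
    return H, D, Q
```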
Alternating minimization algorithm

Given $\Sigma_0 > 0$, pick $(H_0, D_0, Q_0)$ and let $\Sigma_1^{(0)} = \Sigma(H_0, D_0, Q_0)$. Construct the sequence

\[
\Sigma_1^{(0)} \longrightarrow \Sigma'^{(1)} \longrightarrow \Sigma_1^{(1)} \longrightarrow \Sigma'^{(2)} \longrightarrow \Sigma_1^{(2)} \longrightarrow \cdots
\]

where

\[
D(\Sigma'^{(t+1)} \| \Sigma_1^{(t)}) = \min_{\Sigma' \in \boldsymbol{\Sigma}_0} D(\Sigma' \| \Sigma_1^{(t)})
\]

and

\[
D(\Sigma'^{(t+1)} \| \Sigma_1^{(t+1)}) = \min_{\Sigma_1 \in \boldsymbol{\Sigma}_1} D(\Sigma'^{(t+1)} \| \Sigma_1)
\]
Algorithm

At the $t$-th iteration the matrices $H_t$, $D_t$ and $Q_t$ are available. The updates are

\[
Q_{t+1} = \Big( Q_t^\top Q_t - Q_t^\top H_t^\top (H_t H_t^\top + D_t)^{-1} H_t Q_t + Q_t^\top H_t^\top (H_t H_t^\top + D_t)^{-1} \Sigma_0 (H_t H_t^\top + D_t)^{-1} H_t Q_t \Big)^{1/2}
\]
\[
H_{t+1} = \Sigma_0 (H_t H_t^\top + D_t)^{-1} H_t Q_t Q_{t+1}^{-1}
\]
\[
D_{t+1} = \Delta(\Sigma_0 - H_{t+1} H_{t+1}^\top)
\]
Algorithm

Notice that the update rules can be written in terms of $(H_t, D_t)$ only:

\[
R_t = I - H_t^\top (H_t H_t^\top + D_t)^{-1} (H_t H_t^\top + D_t - \Sigma_0) (H_t H_t^\top + D_t)^{-1} H_t
\]
\[
H_{t+1} = \Sigma_0 (H_t H_t^\top + D_t)^{-1} H_t R_t^{-1/2}
\]
\[
D_{t+1} = \Delta(\Sigma_0 - H_{t+1} H_{t+1}^\top)
\]
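Putting this two-line recursion into code gives the following minimal sketch (the names fa_alternating and psd_sqrt_inv, and the initialization $D_0 = I$, are our choices, not prescribed by the slides):

```python
import numpy as np

def psd_sqrt_inv(M):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

def fa_alternating(Sigma0, k, steps=500, seed=0):
    """Alternating-minimization updates for approximate factor analysis."""
    n = Sigma0.shape[0]
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(n, k))      # full column rank with probability one
    D = np.eye(n)                    # any positive diagonal start
    for _ in range(steps):
        S = H @ H.T + D              # current model covariance
        SinvH = np.linalg.solve(S, H)
        R = np.eye(k) - SinvH.T @ (S - Sigma0) @ SinvH   # R_t (positive definite)
        H = Sigma0 @ SinvH @ psd_sqrt_inv(R)             # H_{t+1}
        D = np.diag(np.diag(Sigma0 - H @ H.T))           # D_{t+1} = Delta(...)
    return H, D
```

Along these iterates the objective $D(\Sigma_0 \| HH^\top + D)$ decreases monotonically, in line with property (f) on the next slide.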
Some properties of the algorithm

Proposition.
(a) $D_t > 0$
(b) $R_t$ is invertible
(c) If $H_0$ is of full column rank, so is $H_t$
(e) If $\Sigma_0 = H_t H_t^\top + D_t$, the algorithm stops
(f) The objective function decreases at each iteration
(g) The limit points $(H, D)$ of the algorithm satisfy the relations

\[
H = (\Sigma_0 - HH^\top) D^{-1} H, \qquad D = \Delta(\Sigma_0 - HH^\top)
\]
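The fixed-point relations in (g) can be checked numerically; a small experiment reusing fa_alternating from the previous sketch (the sizes and the true $(H, D)$ used to build $\Sigma_0$ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2                                   # hypothetical sizes
H_true = rng.normal(size=(n, k))
Sigma0 = H_true @ H_true.T + np.diag(rng.uniform(0.5, 1.5, size=n))

H, D = fa_alternating(Sigma0, k, steps=5000)  # sketch from the previous slide
# at (near) convergence: H = (Sigma0 - H H^T) D^{-1} H and D = Delta(Sigma0 - H H^T)
print(np.allclose(H, (Sigma0 - H @ H.T) @ np.linalg.solve(D, H), atol=1e-4))
print(np.allclose(np.diag(D), np.diag(Sigma0 - H @ H.T)))
```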