Dictionary Learning Applications in Control Theory
Paul Irofti, Florin Stoican
Politehnica University of Bucharest, Faculty of Automatic Control and Computers, Department of Automatic Control and Systems Engineering
Email: paul@irofti.net, florin.stoican@acse.pub.ro
Recent Advances in Artificial Intelligence, June 20th, 2017
Acknowledgment: This work was supported by the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-RU-TE-2014-4-2713.
Sparse Representation (SR): y = D · x
Orthogonal Matching Pursuit (OMP)

Algorithm 1: OMP (Pati, Rezaiifar, and Krishnaprasad 1993)
1. Arguments: D, y, s
2. Initialize: r = y, I = ∅
3. for k = 1 : s do
4.    Compute correlations with the residual: z = Dᵀ r
5.    Select new column: i = argmax_j |z_j|
6.    Increase support: I ← I ∪ {i}
7.    Compute new solution: x = LS(D, y, I)
8.    Update residual: r = y − D_I x_I
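For concreteness, here is a minimal NumPy sketch of Algorithm 1. The function name `omp` and the dense least-squares solve are our own choices for illustration, not part of the original presentation.

```python
import numpy as np

def omp(D, y, s):
    """Greedy sparse coding of y over dictionary D with at most s atoms (Algorithm 1 sketch)."""
    r = y.copy()                       # residual, initialized to the signal
    I = []                             # support (indices of selected atoms)
    x = np.zeros(D.shape[1])
    for _ in range(s):
        z = D.T @ r                    # correlations with the residual
        i = int(np.argmax(np.abs(z)))  # most correlated atom
        if i not in I:
            I.append(i)
        # least-squares fit restricted to the current support
        x_I, *_ = np.linalg.lstsq(D[:, I], y, rcond=None)
        x[:] = 0
        x[I] = x_I
        r = y - D[:, I] @ x_I          # update residual
    return x, I
```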
Dictionary Learning (DL): Y ≈ D · X
The Dictionary Learning (DL) Problem

Given a data set Y ∈ R^{p×m} and a sparsity level s, minimize the bivariate function

    minimize_{D,X}  ‖Y − DX‖_F²                                   (1)
    subject to      ‖d_j‖₂ = 1,  1 ≤ j ≤ n
                    ‖x_i‖₀ ≤ s,  1 ≤ i ≤ m,

where D ∈ R^{p×n} is the dictionary (whose columns are called atoms) and X ∈ R^{n×m} is the sparse representations matrix.
Approach

Algorithm 2: Dictionary learning – general structure
1. Arguments: signal matrix Y, target sparsity s
2. Initialize: dictionary D (with normalized atoms)
3. for k = 1, 2, ... do
4.    With fixed D, compute sparse representations X
5.    With fixed X, update atoms d_j, j = 1 : n
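A sketch of this alternating structure is shown below, reusing the `omp` function above. Note that the atom update here is a simple per-atom least-squares refit followed by normalization, a deliberate simplification and not one of the specific update rules discussed on the following slides.

```python
import numpy as np

def dictionary_learning(Y, n_atoms, s, n_iters=20, seed=0):
    """Alternating DL sketch (Algorithm 2): sparse coding with OMP, then a naive atom refit."""
    rng = np.random.default_rng(seed)
    p, m = Y.shape
    D = rng.standard_normal((p, n_atoms))
    D /= np.linalg.norm(D, axis=0)            # normalized atoms
    X = np.zeros((n_atoms, m))
    for _ in range(n_iters):
        # sparse coding step: column-by-column OMP with fixed D
        X = np.column_stack([omp(D, Y[:, i], s)[0] for i in range(m)])
        # dictionary update step: least-squares refit of each used atom, then renormalize
        for j in range(n_atoms):
            used = np.nonzero(X[j, :])[0]
            if used.size == 0:
                continue
            E = Y[:, used] - D @ X[:, used] + np.outer(D[:, j], X[j, used])
            d = E @ X[j, used]
            if np.linalg.norm(d) > 0:
                D[:, j] = d / np.linalg.norm(d)
    return D, X
```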
DL Algorithms

K-SVD (Aharon, Elad, and Bruckstein 2006) solves the optimization problem in sequence

    min_{d_j, X_{j,I_j}}  ‖ Y_{I_j} − Σ_{ℓ≠j} d_ℓ X_{ℓ,I_j} − d_j X_{j,I_j} ‖_F²        (2)

where all atoms except d_j are fixed. This is seen as a rank-1 approximation problem, whose solution is given by the singular vectors corresponding to the largest singular value:

    d_j = u₁,   X_{j,I_j} = σ₁ v₁ᵀ.                                                     (3)
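The atom update in (2)–(3) can be sketched as below; the restriction to the signals that currently use atom j (the support I_j) is what keeps X sparse after the update. Function name and in-place updates are our own choices.

```python
import numpy as np

def ksvd_atom_update(Y, D, X, j):
    """K-SVD update of atom j and its nonzero representations (sketch of eqs. (2)-(3))."""
    I_j = np.nonzero(X[j, :])[0]           # signals that currently use atom j
    if I_j.size == 0:
        return D, X
    # error matrix with atom j removed, restricted to the signals using it
    E = Y[:, I_j] - D @ X[:, I_j] + np.outer(D[:, j], X[j, I_j])
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                      # d_j = u_1
    X[j, I_j] = S[0] * Vt[0, :]            # X_{j,I_j} = sigma_1 * v_1^T
    return D, X
```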
LC-KSVD

    minimize_{D,X,A,W}  ‖Y − DX‖_F² + α‖Q − AX‖_F² + β‖H − WX‖_F²          (4)
    subject to          ‖d_j‖₂ = 1,  1 ≤ j ≤ n
                        ‖x_i‖₀ ≤ s,  1 ≤ i ≤ m,

where:
- the dictionary atoms are evenly split among the classes;
- q_i has non-zero entries where y_i and d_j share the same label;
- the linear transformation A encourages discrimination in X;
- h_i = e_j, where j is the class label of y_i;
- W represents the learned classifier parameters.

A sketch of how Q and H can be built is given below.
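The following is a hedged sketch of constructing the auxiliary matrices Q and H from the training labels under the even atom-to-class split described above; the exact construction used in the LC-KSVD literature may differ in details.

```python
import numpy as np

def lcksvd_targets(labels, n_classes, n_atoms):
    """Build the label-consistency target Q and the label matrix H (sketch).
    Atoms are split evenly among classes; q_i is 1 on atoms sharing y_i's label, h_i = e_label."""
    m = len(labels)
    atoms_per_class = n_atoms // n_classes
    Q = np.zeros((n_atoms, m))
    H = np.zeros((n_classes, m))
    for i, c in enumerate(labels):
        Q[c * atoms_per_class:(c + 1) * atoms_per_class, i] = 1.0
        H[c, i] = 1.0
    return Q, H
```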
Fault Detection and Isolation in Water Networks
FDI via DL

Water networks pose some interesting issues:
- large-scale, distributed network with few sensors
- user demand unknown or imprecise
- nonlinear pressure dynamics (analytic solutions impractical)

The DL approach to FDI:
- a residual signal compares expected and measured pressures

      r_i(t) = p_i(f_i(t), f_j(t), t) − p̄_i,   ∀ i, j                        (5)

- each fault is assigned a class and DL provides the atoms which discriminate between them
- each residual is sparsely described by atoms and thus FDI is achieved iff the classification is unambiguous
Hanoi

[Figure: Hanoi water network benchmark. Legend: junction node, tank node, node with sensor, junction partition, pipe connection, fault event; pipe lengths are marked on the edges.]
Sensor Placement

Let R ∈ R^{n×mn} be the measured pressure residuals in all n network nodes. For each node we simulate m different faults. Given s < n available sensors, apply OMP on each column r:

    minimize_x   ‖r − I_n x‖₂²                                                (6)
    subject to   ‖x‖₀ ≤ s,

resulting in a matrix X with s-sparse columns approximating R.
Placement Strategies

(a) select the s most commonly used atoms;
(b) from each m-block select the s most frequent atoms; of the resulting n·s atoms, select again the first s.

A sketch of both strategies follows below.

[Figure: number of sensors (1–10) versus the selected nodes (2–31) for cases (a) and (b).]
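Since the dictionary in (6) is the identity, the OMP support of each residual column reduces to its s largest-magnitude entries, and placement becomes a frequency vote over those supports. The sketch below implements strategies (a) and (b) under that reading; function and variable names are ours.

```python
import numpy as np
from collections import Counter

def place_sensors(R, s, m, strategy="a"):
    """Sensor placement sketch: supports of (6) with the identity dictionary, then a frequency vote."""
    n, mn = R.shape
    supports = [np.argsort(np.abs(R[:, k]))[-s:] for k in range(mn)]
    if strategy == "a":
        # (a): the s most frequently used atoms over all residual columns
        counts = Counter(int(i) for supp in supports for i in supp)
        return [node for node, _ in counts.most_common(s)]
    # (b): per m-block keep the s most frequent atoms, then vote again among the n*s candidates
    candidates = []
    for b in range(0, mn, m):
        counts = Counter(int(i) for supp in supports[b:b + m] for i in supp)
        candidates.extend(node for node, _ in counts.most_common(s))
    return [node for node, _ in Counter(candidates).most_common(s)]
```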
Learning

Algorithm 3: Placement and FDI learning (Irofti and Stoican 2017)
1. Inputs: training residuals R ∈ R^{n×nm}, parameters s, α, β
2. Result: dictionary D, classifier W, sensor nodes I_s
3. Select s sensor nodes I_s based on matrix R using strategy (a) or (b)
4. Let R_{I_s} be the restriction of R to the rows in I_s
5. Use R_{I_s}, α and β to learn D and W from (4)
Fault Detection

Algorithm 4: Fault detection and isolation
1. Inputs: testing residuals R ∈ R^{s×mn}, dictionary D, classifier W
2. Result: prediction P ∈ N^{mn}
3. for k = 1 to mn do
4.    Use OMP to obtain x_k from r_k and D
5.    Label: L_k = W x_k
6.    Classify: p_k = argmax_c L_k

The position c of the largest entry of L_k is the predicted class.
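A minimal sketch of the classification step of Algorithm 4, reusing the `omp` function from earlier; it assumes the dictionary D and classifier W were learned from (4).

```python
import numpy as np

def classify_faults(R_test, D, W, s):
    """FDI classification sketch: sparse-code each residual, then pick argmax of W @ x."""
    predictions = []
    for k in range(R_test.shape[1]):
        x, _ = omp(D, R_test[:, k], s)   # sparse representation of residual r_k
        L = W @ x                        # label scores
        predictions.append(int(np.argmax(L)))
    return np.array(predictions)
```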
Today

Improved sensor placement. Iteratively choose s rows from R, solving at each step

    i = argmin_k  ‖proj_{R_I} r_k‖₂² + λ / ‖δ_{k,I}‖₁,   r_k ∈ R_{I^c},       (7)

where I is the set of currently selected rows and the elements of the vector δ_{k,I} are the distances from node k to the nodes in I.

Graph-aware DL. Adding graph regularization (Yankelevsky and Elad 2016):

    ‖Y − DX‖_F² + α‖Q − AX‖_F² + β‖H − WX‖_F²
      + γ Tr(Dᵀ L D) + λ Tr(X L_c Xᵀ) + µ ‖L‖_F²,                             (8)

where L is the graph Laplacian.
Zonotopic Area Coverage
Zonotopic sets

Area packing, mRPI (over)approximation and other related notions may be described via unions of zonotopic sets:

    min_{Z_k}   vol(S) − vol( ∪_k Z_k ),                                       (9)
    subject to  Z_k ⊆ S.

Zonotopes, given in generator representation (Fukuda 2004),

    Z_k = Z(c_k, G_k) = { c_k + G_k ξ : ‖ξ‖_∞ ≤ 1 },                           (10)

are easy to handle for:
- Minkowski sum: Z(c₁, G₁) ⊕ Z(c₂, G₂) = Z(c₁ + c₂, [G₁ G₂])
- linear mappings: R Z(c₁, G₁) = Z(R c₁, R G₁)

A small sketch of these two operations follows below.
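The generator representation makes both operations one-liners; the class and method names below are our own, illustrative choices.

```python
import numpy as np

class Zonotope:
    """Zonotope in generator representation Z(c, G) = {c + G xi : ||xi||_inf <= 1} (sketch)."""
    def __init__(self, c, G):
        self.c = np.asarray(c, dtype=float)
        self.G = np.asarray(G, dtype=float)

    def __add__(self, other):
        # Minkowski sum: centers add, generator matrices concatenate
        return Zonotope(self.c + other.c, np.hstack((self.G, other.G)))

    def map(self, R):
        # linear mapping: R Z(c, G) = Z(R c, R G)
        return Zonotope(R @ self.c, R @ self.G)
```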
Formulation

Each zonotope is parameterized by its center and a scaling vector (c_k, λ_k). These variables allow us to formulate:

- the inclusion constraint Z(c_k, G · diag(λ_k)) ⊆ U:

      s_iᵀ c_k + Σ_j |s_iᵀ G_j| λ_{jk} ≤ r_i,   ∀ i,                           (11)

  where U = { u : s_iᵀ u ≤ r_i };

- an explicit description of the volume (Gover and Krikorian 2010), with Λ_k = diag(λ_k):

      vol(Z(c_k, G Λ_k)) = Σ_{1 ≤ j₁ < ... < j_n ≤ N}  |det(G_{j₁...j_n})| · Π_{j ∈ {j₁,...,j_n}} λ_{jk}.   (12)

The formulation becomes simpler if the scaling is homogeneous (λ*_k = λ_{jk}, ∀ j). A numerical sketch of (12) follows below.
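A sketch of the combinatorial volume computation. Note that the code includes a 2ⁿ factor arising from the ‖ξ‖_∞ ≤ 1 parameterization; the slide's formula may absorb this constant elsewhere, so treat that scaling as an assumption.

```python
import numpy as np
from itertools import combinations

def zonotope_volume(G, lam):
    """Volume of Z(c, G diag(lam)) via the combinatorial formula (12) (sketch):
    sum over all n-subsets of generators of 2^n |det(G_{j1..jn})| * prod(lam_j)."""
    n, N = G.shape
    vol = 0.0
    for cols in combinations(range(N), n):
        vol += (2.0 ** n) * abs(np.linalg.det(G[:, cols])) * np.prod([lam[j] for j in cols])
    return vol
```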
Implementation

We follow the OMP formalism, without its theoretical convergence guarantees:

Algorithm 5: Area coverage with zonotopic sets
1. Inputs: area to be covered U, sparsity constraint s
2. Result: pairs of centers and scaling factors (c_k, λ_k)
3. for k = 1 to s do
4.    Enlarge the zonotopes until they saturate the constraints
5.    Select Z_k, where k = argmin_k vol(S_k \ Z_k)
6.    Update the uncovered area: S_{k+1} = S_k \ Z_k
Result

[Figure: resulting zonotopic coverage of the target area; axes roughly span [−0.4, 0.5] × [−0.4, 0.4].]
Thank You! Questions?