Dictionary Learning Applications in Control Theory
Paul Irofti, Florin Stoican
Politehnica University of Bucharest, Faculty of Automatic Control and Computers, Department of Automatic Control and Systems Engineering
Email: paul@irofti.net, florin.stoican@acse.pub.ro
Recent Advances in Artificial Intelligence, June 20th, 2017
Acknowledgment: This work was supported by the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-RU-TE-2014-4-2713.
Sparse Representation (SR): y = D · x
Orthogonal Matching Pursuit (OMP)

Algorithm 1: OMP (Pati, Rezaiifar, and Krishnaprasad 1993)
1. Arguments: D, y, s
2. Initialize: r = y, I = ∅
3. for k = 1 : s do
4.    Compute correlations with the residual: z = Dᵀ r
5.    Select new column: i = argmax_j |z_j|
6.    Increase support: I ← I ∪ {i}
7.    Compute new solution: x = LS(D, y, I)
8.    Update residual: r = y − D_I x_I
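For concreteness, here is a minimal NumPy sketch of Algorithm 1. The function name `omp` and the dense least-squares solve are our own choices for illustration, not part of the original presentation.

```python
import numpy as np

def omp(D, y, s):
    """Greedy sparse coding of y over dictionary D with at most s atoms (Algorithm 1 sketch)."""
    r = y.copy()                       # residual, initialized to the signal
    I = []                             # support (indices of selected atoms)
    x = np.zeros(D.shape[1])
    for _ in range(s):
        z = D.T @ r                    # correlations with the residual
        i = int(np.argmax(np.abs(z)))  # most correlated atom
        if i not in I:
            I.append(i)
        # least-squares fit restricted to the current support
        x_I, *_ = np.linalg.lstsq(D[:, I], y, rcond=None)
        x[:] = 0
        x[I] = x_I
        r = y - D[:, I] @ x_I          # update residual
    return x, I
```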
Dictionary Learning (DL): Y ≈ D · X
The Dictionary Learning (DL) Problem

Given a data set Y ∈ R^{p×m} and a sparsity level s, minimize the bivariate function

    minimize_{D,X}  ‖Y − DX‖_F²                                   (1)
    subject to      ‖d_j‖₂ = 1,  1 ≤ j ≤ n
                    ‖x_i‖₀ ≤ s,  1 ≤ i ≤ m,

where D ∈ R^{p×n} is the dictionary (whose columns are called atoms) and X ∈ R^{n×m} is the sparse representations matrix.
Approach

Algorithm 2: Dictionary learning – general structure
1. Arguments: signal matrix Y, target sparsity s
2. Initialize: dictionary D (with normalized atoms)
3. for k = 1, 2, ... do
4.    With fixed D, compute sparse representations X
5.    With fixed X, update atoms d_j, j = 1 : n
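A sketch of this alternating structure is shown below, reusing the `omp` function above. Note that the atom update here is a simple per-atom least-squares refit followed by normalization, a deliberate simplification and not one of the specific update rules discussed on the following slides.

```python
import numpy as np

def dictionary_learning(Y, n_atoms, s, n_iters=20, seed=0):
    """Alternating DL sketch (Algorithm 2): sparse coding with OMP, then a naive atom refit."""
    rng = np.random.default_rng(seed)
    p, m = Y.shape
    D = rng.standard_normal((p, n_atoms))
    D /= np.linalg.norm(D, axis=0)            # normalized atoms
    X = np.zeros((n_atoms, m))
    for _ in range(n_iters):
        # sparse coding step: column-by-column OMP with fixed D
        X = np.column_stack([omp(D, Y[:, i], s)[0] for i in range(m)])
        # dictionary update step: least-squares refit of each used atom, then renormalize
        for j in range(n_atoms):
            used = np.nonzero(X[j, :])[0]
            if used.size == 0:
                continue
            E = Y[:, used] - D @ X[:, used] + np.outer(D[:, j], X[j, used])
            d = E @ X[j, used]
            if np.linalg.norm(d) > 0:
                D[:, j] = d / np.linalg.norm(d)
    return D, X
```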
DL Algorithms

K-SVD (Aharon, Elad, and Bruckstein 2006) solves the optimization problem in sequence

    min_{d_j, X_{j,I_j}}  ‖ Y_{I_j} − Σ_{ℓ≠j} d_ℓ X_{ℓ,I_j} − d_j X_{j,I_j} ‖_F²        (2)

where all atoms except d_j are fixed. This is seen as a rank-1 approximation problem, whose solution is given by the singular vectors corresponding to the largest singular value:

    d_j = u₁,   X_{j,I_j} = σ₁ v₁ᵀ.                                                     (3)
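The atom update in (2)–(3) can be sketched as below; the restriction to the signals that currently use atom j (the support I_j) is what keeps X sparse after the update. Function name and in-place updates are our own choices.

```python
import numpy as np

def ksvd_atom_update(Y, D, X, j):
    """K-SVD update of atom j and its nonzero representations (sketch of eqs. (2)-(3))."""
    I_j = np.nonzero(X[j, :])[0]           # signals that currently use atom j
    if I_j.size == 0:
        return D, X
    # error matrix with atom j removed, restricted to the signals using it
    E = Y[:, I_j] - D @ X[:, I_j] + np.outer(D[:, j], X[j, I_j])
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, j] = U[:, 0]                      # d_j = u_1
    X[j, I_j] = S[0] * Vt[0, :]            # X_{j,I_j} = sigma_1 * v_1^T
    return D, X
```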
LC-KSVD

    minimize_{D,X,A,W}  ‖Y − DX‖_F² + α‖Q − AX‖_F² + β‖H − WX‖_F²          (4)
    subject to          ‖d_j‖₂ = 1,  1 ≤ j ≤ n
                        ‖x_i‖₀ ≤ s,  1 ≤ i ≤ m,

where:
- the dictionary atoms are evenly split among the classes;
- q_i has non-zero entries where y_i and d_j share the same label;
- the linear transformation A encourages discrimination in X;
- h_i = e_j, where j is the class label of y_i;
- W represents the learned classifier parameters.

A sketch of how Q and H can be built is given below.
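The following is a hedged sketch of constructing the auxiliary matrices Q and H from the training labels under the even atom-to-class split described above; the exact construction used in the LC-KSVD literature may differ in details.

```python
import numpy as np

def lcksvd_targets(labels, n_classes, n_atoms):
    """Build the label-consistency target Q and the label matrix H (sketch).
    Atoms are split evenly among classes; q_i is 1 on atoms sharing y_i's label, h_i = e_label."""
    m = len(labels)
    atoms_per_class = n_atoms // n_classes
    Q = np.zeros((n_atoms, m))
    H = np.zeros((n_classes, m))
    for i, c in enumerate(labels):
        Q[c * atoms_per_class:(c + 1) * atoms_per_class, i] = 1.0
        H[c, i] = 1.0
    return Q, H
```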
Fault Detection and Isolation in Water Networks
FDI via DL

Water networks pose some interesting issues:
- large-scale, distributed network with few sensors
- user demand unknown or imprecise
- nonlinear pressure dynamics (analytic solutions impractical)

The DL approach to FDI:
- a residual signal compares expected and measured pressures

      r_i(t) = p_i(f_i(t), f_j(t), t) − p̄_i,   ∀ i, j                        (5)

- each fault is assigned a class and DL provides the atoms which discriminate between them
- each residual is sparsely described by atoms and thus FDI is achieved iff the classification is unambiguous
Hanoi

[Figure: Hanoi water network benchmark. Legend: junction node, tank node, node with sensor, junction partition, pipe connection, fault event; pipe lengths are marked on the edges.]
Sensor Placement

Let R ∈ R^{n×mn} be the measured pressure residuals in all n network nodes. For each node we simulate m different faults. Given s < n available sensors, apply OMP on each column r:

    minimize_x   ‖r − I_n x‖₂²                                                (6)
    subject to   ‖x‖₀ ≤ s,

resulting in a matrix X with s-sparse columns approximating R.
Placement Strategies

(a) select the s most commonly used atoms;
(b) from each m-block select the s most frequent atoms; of the resulting n·s atoms, select again the first s.

A sketch of both strategies follows below.

[Figure: number of sensors (1–10) versus the selected nodes (2–31) for cases (a) and (b).]
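Since the dictionary in (6) is the identity, the OMP support of each residual column reduces to its s largest-magnitude entries, and placement becomes a frequency vote over those supports. The sketch below implements strategies (a) and (b) under that reading; function and variable names are ours.

```python
import numpy as np
from collections import Counter

def place_sensors(R, s, m, strategy="a"):
    """Sensor placement sketch: supports of (6) with the identity dictionary, then a frequency vote."""
    n, mn = R.shape
    supports = [np.argsort(np.abs(R[:, k]))[-s:] for k in range(mn)]
    if strategy == "a":
        # (a): the s most frequently used atoms over all residual columns
        counts = Counter(int(i) for supp in supports for i in supp)
        return [node for node, _ in counts.most_common(s)]
    # (b): per m-block keep the s most frequent atoms, then vote again among the n*s candidates
    candidates = []
    for b in range(0, mn, m):
        counts = Counter(int(i) for supp in supports[b:b + m] for i in supp)
        candidates.extend(node for node, _ in counts.most_common(s))
    return [node for node, _ in Counter(candidates).most_common(s)]
```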
Learning

Algorithm 3: Placement and FDI learning (Irofti and Stoican 2017)
1. Inputs: training residuals R ∈ R^{n×nm}, parameters s, α, β
2. Result: dictionary D, classifier W, sensor nodes I_s
3. Select s sensor nodes I_s based on matrix R using strategy (a) or (b)
4. Let R_{I_s} be the restriction of R to the rows in I_s
5. Use R_{I_s}, α and β to learn D and W from (4)
Fault Detection

Algorithm 4: Fault detection and isolation
1. Inputs: testing residuals R ∈ R^{s×mn}, dictionary D, classifier W
2. Result: prediction P ∈ N^{mn}
3. for k = 1 to mn do
4.    Use OMP to obtain x_k from r_k and D
5.    Label: L_k = W x_k
6.    Classify: p_k = argmax_c L_k

The position c of the largest entry of L_k is the predicted class.
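A minimal sketch of the classification step of Algorithm 4, reusing the `omp` function from earlier; it assumes the dictionary D and classifier W were learned from (4).

```python
import numpy as np

def classify_faults(R_test, D, W, s):
    """FDI classification sketch: sparse-code each residual, then pick argmax of W @ x."""
    predictions = []
    for k in range(R_test.shape[1]):
        x, _ = omp(D, R_test[:, k], s)   # sparse representation of residual r_k
        L = W @ x                        # label scores
        predictions.append(int(np.argmax(L)))
    return np.array(predictions)
```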
Today

Improved sensor placement. Iteratively choose s rows from R, solving at each step

    i = argmin_k  ‖proj_{R_I} r_k‖₂² + λ / ‖δ_{k,I}‖₁,   r_k ∈ R_{I^c},       (7)

where I is the set of currently selected rows and the elements of the vector δ_{k,I} are the distances from node k to the nodes in I.

Graph-aware DL. Adding graph regularization (Yankelevsky and Elad 2016):

    ‖Y − DX‖_F² + α‖Q − AX‖_F² + β‖H − WX‖_F²
      + γ Tr(Dᵀ L D) + λ Tr(X L_c Xᵀ) + µ ‖L‖_F²,                             (8)

where L is the graph Laplacian.
Zonotopic Area Coverage
Zonotopic sets

Area packing, mRPI (over)approximation and other related notions may be described via unions of zonotopic sets:

    min_{Z_k}   vol(S) − vol( ∪_k Z_k ),                                       (9)
    subject to  Z_k ⊆ S.

Zonotopes, given in generator representation (Fukuda 2004),

    Z_k = Z(c_k, G_k) = { c_k + G_k ξ : ‖ξ‖_∞ ≤ 1 },                           (10)

are easy to handle for:
- Minkowski sum: Z(c₁, G₁) ⊕ Z(c₂, G₂) = Z(c₁ + c₂, [G₁ G₂])
- linear mappings: R Z(c₁, G₁) = Z(R c₁, R G₁)

A small sketch of these two operations follows below.
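The generator representation makes both operations one-liners; the class and method names below are our own, illustrative choices.

```python
import numpy as np

class Zonotope:
    """Zonotope in generator representation Z(c, G) = {c + G xi : ||xi||_inf <= 1} (sketch)."""
    def __init__(self, c, G):
        self.c = np.asarray(c, dtype=float)
        self.G = np.asarray(G, dtype=float)

    def __add__(self, other):
        # Minkowski sum: centers add, generator matrices concatenate
        return Zonotope(self.c + other.c, np.hstack((self.G, other.G)))

    def map(self, R):
        # linear mapping: R Z(c, G) = Z(R c, R G)
        return Zonotope(R @ self.c, R @ self.G)
```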
Formulation

Each zonotope is parameterized by its center and a scaling vector (c_k, λ_k). These variables allow us to formulate:

- the inclusion constraint Z(c_k, G · diag(λ_k)) ⊆ U:

      s_iᵀ c_k + Σ_j |s_iᵀ G_j| λ_{jk} ≤ r_i,   ∀ i,                           (11)

  where U = { u : s_iᵀ u ≤ r_i };

- an explicit description of the volume (Gover and Krikorian 2010), with Λ_k = diag(λ_k):

      vol(Z(c_k, G Λ_k)) = Σ_{1 ≤ j₁ < ... < j_n ≤ N}  |det(G_{j₁...j_n})| · Π_{j ∈ {j₁,...,j_n}} λ_{jk}.   (12)

The formulation becomes simpler if the scaling is homogeneous (λ*_k = λ_{jk}, ∀ j). A numerical sketch of (12) follows below.
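A sketch of the combinatorial volume computation. Note that the code includes a 2ⁿ factor arising from the ‖ξ‖_∞ ≤ 1 parameterization; the slide's formula may absorb this constant elsewhere, so treat that scaling as an assumption.

```python
import numpy as np
from itertools import combinations

def zonotope_volume(G, lam):
    """Volume of Z(c, G diag(lam)) via the combinatorial formula (12) (sketch):
    sum over all n-subsets of generators of 2^n |det(G_{j1..jn})| * prod(lam_j)."""
    n, N = G.shape
    vol = 0.0
    for cols in combinations(range(N), n):
        vol += (2.0 ** n) * abs(np.linalg.det(G[:, cols])) * np.prod([lam[j] for j in cols])
    return vol
```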
Implementation

We follow the OMP formalism, without its theoretical convergence guarantees:

Algorithm 5: Area coverage with zonotopic sets
1. Inputs: area to be covered U, sparsity constraint s
2. Result: pairs of centers and scaling factors (c_k, λ_k)
3. for k = 1 to s do
4.    Enlarge the zonotopes until they saturate the constraints
5.    Select Z_k, where k = argmin_k vol(S_k \ Z_k)
6.    Update the uncovered area: S_{k+1} = S_k \ Z_k
Result

[Figure: resulting zonotopic coverage of the target area; axes roughly span [−0.4, 0.5] × [−0.4, 0.4].]
Thank You! Questions?