  1. Finite-Sample System Identification: An Overview and a New Correlation Method. Algo Carè¹, Balázs Csáji², Marco Campi³, Erik Weyer⁴. ¹ Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands; ² Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Hungary; ³ Department of Information Engineering (DII), University of Brescia, Italy; ⁴ Department of Electrical and Electronic Engineering, University of Melbourne, Australia. 56th IEEE CDC, Melbourne, Australia, December 12-15, 2017

  2. Regularity Assumption [figure-only slide] Carè, Csáji, Campi, Weyer, Finite-Sample System Identification | 2

  3. Perturbed Residuals [figure-only slide]

  4. Perturbed Datasets [figure-only slide]

  5. Alternative Regression Models [figure-only slide]

  6. Data Generation. Let us consider the following data generating system. System Structure: Y_n ≜ F(U_n, W_n, I), where I are the initial conditions, U_n ≜ (U_1, ..., U_n)^T are the inputs, W_n ≜ (W_1, ..., W_n)^T are the noises, Y_n ≜ (Y_1, ..., Y_n)^T are the outputs, and F is the true data generating function.

  7. Point Estimation. Consider the parametric estimation problem of the system Y_n ≜ F_θ*(U_n, W_n, I), parametrized with θ* ∈ Θ ⊆ R^d (the true parameter). Given a finite sample of data, Z ≜ (U_n, Y_n, I), we typically search for a model that best fits the data, that is, the (parametric) point estimate θ̂_Z ≜ arg min_{θ ∈ Θ} V(θ | Z), where V is a criterion function.
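As a minimal illustration (not from the talk), a hypothetical linear-in-parameters model fitted by least squares, where the criterion V(θ | Z) is the sum of squared prediction errors:

```python
import numpy as np

# Hypothetical linear-in-parameters system: Y_t = phi_t^T theta* + W_t.
# The least-squares point estimate minimizes V(theta | Z) = ||Y - Phi theta||^2.
rng = np.random.default_rng(0)
n, d = 200, 2
theta_star = np.array([0.7, 1.0])                    # "true" parameter
Phi = rng.standard_normal((n, d))                    # regressor matrix
Y = Phi @ theta_star + 0.1 * rng.standard_normal(n)  # noisy outputs

# arg min over theta of the quadratic criterion (closed-form solution)
theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
```

With this much data and little noise, θ̂_Z lands close to θ*, but a point estimate alone carries no quality tag, which motivates the confidence regions on the next slide.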

  8. Confidence Regions. In practice, some quality tag is often needed to judge the estimate. Safety, stability, or quality requirements ⇒ confidence regions. Confidence Region (Level μ): P(θ* ∈ Θ̂_{Z,μ}) ≥ μ for some μ ∈ (0, 1), where θ* is the "true" parameter and Θ̂_{Z,μ} ⊆ Θ. Typically, the level sets of the (scaled) limiting distribution are used. Issues: this is only approximately correct for finite samples and requires the existence of a (known) limiting distribution.

  9. Main Assumptions. Assumption 1: for any value of θ* ∈ Θ, the relation Y_n ≜ F_θ*(U_n, W_n, I) is noise invertible, in the sense that, given the values of Y_n, U_n and I, we can recover the noise W_n. Assumption 2: the noise W_n is jointly symmetric about zero, i.e., (W_1, ..., W_n) has the same joint probability distribution as (σ_1 W_1, ..., σ_n W_n) for all possible sign sequences σ_i ∈ {+1, −1}, i = 1, ..., n.

  10. Residuals and Sign-Perturbations. Given a θ ∈ Θ and a dataset Z, the estimated noise is Ŵ_n(θ). Note that Ŵ_n(θ*) = W_n (Assumption 1). Given a vector v_n = (v_1, ..., v_n) and signs s_n = (σ_1, ..., σ_n) ∈ {+1, −1}^n, we denote the sign-perturbed vector by s_n[v_n] ≜ (σ_1 v_1, ..., σ_n v_n). Note that W_n =_d s_n[W_n] for all s_n ∈ {+1, −1}^n (Assumption 2), where "=_d" denotes equality in distribution.
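The sign-perturbation operation s_n[v_n] can be sketched in a few lines of NumPy (a hypothetical helper, not code from the talk):

```python
import numpy as np

def sign_perturb(v, signs):
    """s_n[v_n] = (sigma_1 v_1, ..., sigma_n v_n): element-wise sign flip."""
    return signs * v

rng = np.random.default_rng(1)
n = 8
w = rng.standard_normal(n)               # a (symmetric) noise realization
signs = rng.choice([-1.0, 1.0], size=n)  # i.i.d. symmetric random signs
w_pert = sign_perturb(w, signs)
# Under Assumption 2, w_pert has the same joint distribution as w:
# the magnitudes are unchanged, only the signs differ.
```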

  11. Evaluation Functions. A core concept is the evaluation function (test statistic), Z : R^n × R^n × Θ → R, used to evaluate the parameter based on the ideas discussed before. (Note that Z can also depend on the initial conditions.) Using Z, we define a reference function and m − 1 sign-perturbed functions: Z_0(θ) ≜ Z(U_n, Ŵ_n(θ), θ) and Z_i(θ) ≜ Z(U_n, s_n^(i)[Ŵ_n(θ)], θ), for i = 1, ..., m − 1, where s_n^(1), ..., s_n^(m−1) are m − 1 user-generated vectors containing i.i.d. symmetric random signs.

  12. Evaluating Parameters. It can be shown that Z_0(θ*), ..., Z_{m−1}(θ*) are conditionally i.i.d. Consider the ordering Z_(0)(θ*) < ··· < Z_(m−1)(θ*), where we apply random tie-breaking, if needed. Then all orderings are equally probable! We want to design Z such that, as θ gets "far away" from θ*, either Z_0(θ) < Z_i(θ) with "high probability" for all i = 1, ..., m − 1, or Z_i(θ) < Z_0(θ) with "high probability" for all i = 1, ..., m − 1.

  13. Non-Asymptotic Confidence Regions. The rank of Z_0(θ) in the ascending ordering of {Z_i(θ)}_{i=0}^{m−1} is R(θ) = 1 + Σ_{i=1}^{m−1} I(Z_i(θ) < Z_0(θ)), where I(·) is an indicator function. Exact Confidence: the confidence region defined as Θ̂_n ≜ {θ ∈ R^d : h ≤ R(θ) ≤ k} is such that P(θ* ∈ Θ̂_n) = (k − h + 1)/m, where h, k and m are user-chosen integers (design parameters).
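An illustrative Monte Carlo sanity check (my own sketch, not from the slides): when the Z_i are i.i.d., as at θ = θ*, the rank R is uniform on {1, ..., m}, which yields the stated (k − h + 1)/m coverage exactly.

```python
import numpy as np

def rank_of_reference(z, rng):
    """1-based rank of z[0] (the reference Z_0) among all z values,
    with random tie-breaking via a random secondary sort key."""
    m = len(z)
    tie = rng.permutation(m)
    order = sorted(range(m), key=lambda i: (z[i], tie[i]))
    return order.index(0) + 1

rng = np.random.default_rng(2)
m, h, k, trials = 10, 1, 9, 5000
# Simulate i.i.d. Z_0, ..., Z_{m-1} (the theta = theta* situation) and
# count how often h <= R <= k, i.e. theta* falls in the region.
hits = sum(h <= rank_of_reference(rng.standard_normal(m), rng) <= k
           for _ in range(trials))
coverage = hits / trials   # should be close to (k - h + 1)/m = 0.9
```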

  14. Construction Ideas. Typical constructions of the evaluation function Z are based on
  • Correlations: we use the fact that, for the true parameter, the residuals (noises) are uncorrelated, also with the inputs. E.g.: LSCR (Leave-out Sign-dominant Correlation Regions).
  • Gradients: based on the gradient (w.r.t. the parameter) of the criterion function of a given point estimate; we perturb the residuals in the gradient and scalarize it with a norm. E.g.: SPS (Sign-Perturbed Sums).
  • Models: new models are estimated based on the alternative (perturbed) datasets and are then compared to the original (unperturbed) estimate (a bootstrap-style approach). E.g.: DP (Data Perturbation).

  15. A New Correlation Approach: Combining LSCR and SPS. What are the advantages and disadvantages of LSCR and SPS? LSCR uses correlations (and subsampling). It is a flexible and easy-to-implement algorithm. It is computationally light and does not require perturbed datasets. However, it is conservative for high-dimensional parameters. SPS uses gradients (and sign-perturbations). It evaluates the errors in all parameters simultaneously (via a norm). It always constructs confidence regions having exact confidence. However, it needs perturbed datasets and is computationally heavy. Let us try to combine the advantages of these two approaches!

  16. A New Correlation Approach: SPCR. New method: SPCR (Sign-Perturbed Correlation Regions). For concreteness, let us consider an ARX(n_a, n_b) model: Y_t = a_1 Y_{t−1} + ··· + a_{n_a} Y_{t−n_a} + b_1 U_{t−1} + ··· + b_{n_b} U_{t−n_b} + W_t. Stacked Correlations: for generic U'_n and W'_n, we introduce the correlation vectors C_t(U'_n, W'_n) ≜ (W'_t W'_{t−1}, ..., W'_t W'_{t−k}, W'_t U'_t, ..., W'_t U'_{t−l+1})^T, for t = 1, ..., n, where k and l are user-chosen parameters. (Typically k + l ≥ n_a + n_b, and we may need terms from I.)

  17. A New Correlation Approach: SPCR. Evaluation Function for SPCR: Z(U'_n, W'_n, θ) ≜ ‖ Q^{−1/2}(U'_n, W'_n) (1/n) Σ_{t=1}^{n} C_t(U'_n, W'_n) ‖², where Q is a "scaling" matrix defined as Q(U'_n, W'_n) ≜ (1/n) Σ_{t=1}^{n} C_t(U'_n, W'_n) C_t^T(U'_n, W'_n), which is assumed to be invertible, for convenience.
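A sketch of this evaluation function in NumPy (my own rendering, not the authors' code; it assumes the sum starts at the first t where all k noise lags and l input lags are available, rather than using terms from I):

```python
import numpy as np

def spcr_Z(u, w, k, l):
    """Z(U', W') = || Q^{-1/2} (1/n) sum_t C_t ||^2, where
    C_t = (w_t w_{t-1}, ..., w_t w_{t-k}, w_t u_t, ..., w_t u_{t-l+1})."""
    n = len(w)
    start = max(k, l - 1)  # first index where all lagged terms exist
    C = np.array([[w[t] * w[t - j] for j in range(1, k + 1)]
                  + [w[t] * u[t - j] for j in range(l)]
                  for t in range(start, n)])
    c_bar = C.mean(axis=0)       # (1/n) sum_t C_t
    Q = (C.T @ C) / len(C)       # (1/n) sum_t C_t C_t^T
    # ||Q^{-1/2} c||^2 equals the quadratic form c^T Q^{-1} c
    return float(c_bar @ np.linalg.solve(Q, c_bar))

rng = np.random.default_rng(3)
u, w = rng.standard_normal(100), rng.standard_normal(100)
z0 = spcr_Z(u, w, k=2, l=2)  # small when w is uncorrelated with itself and u
```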

  18. A New Correlation Approach: SPCR. Confidence Regions for SPCR: Θ̂_n ≜ {θ ∈ R^{n_a + n_b} : R(θ) ≤ k}. And we have exact confidence for parameter vectors as well: P(θ* ∈ Θ̂_n) = (k + 1)/m. Note that SPCR is a class of methods, where different constructions correspond to different choices of (k, l).

  19. Simulation Example for SPCR. Consider a bilinear system generated by Y_t ≜ a* Y_{t−1} + b* U_t + (1/2) U_t N_t + N_t, for t = 1, ..., n, with a* = 0.7 and b* = 1, with zero initial conditions. The input sequence {U_t} is generated by U_t ≜ 0.5 U_{t−1} + V_t, with zero initial conditions, where {V_t} is i.i.d. standard normal. The noise sequence {N_t} is i.i.d. Laplacian with zero mean and unit variance, independent of {U_t}. Our model class is ARX(1, 1), that is, Ŷ_t(θ) ≜ a Y_{t−1} + b U_t.
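The data generation described above can be sketched as follows (my interpretation of the setup; note that a zero-mean Laplacian with scale 1/√2 has unit variance, since its variance is twice the squared scale):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
a_star, b_star = 0.7, 1.0

# Input: U_t = 0.5 U_{t-1} + V_t, V_t i.i.d. standard normal, zero init.
V = rng.standard_normal(n)
U = np.zeros(n)
for t in range(1, n):
    U[t] = 0.5 * U[t - 1] + V[t]

# Noise: i.i.d. Laplacian, zero mean, unit variance (variance = 2 * scale^2).
N = rng.laplace(scale=1.0 / np.sqrt(2.0), size=n)

# Bilinear system: Y_t = a* Y_{t-1} + b* U_t + 0.5 U_t N_t + N_t, zero init.
Y = np.zeros(n)
for t in range(1, n):
    Y[t] = a_star * Y[t - 1] + b_star * U[t] + 0.5 * U[t] * N[t] + N[t]
```

The bilinear cross term 0.5 U_t N_t is what the fitted ARX(1, 1) model class cannot capture, which is exactly what makes this a test of the method outside the model class.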

  20. Simulation Example for SPCR. Figure: 95% confidence regions built by SPCR with k = 2 and l = 2.
