Undermodelling Detection with Sign-Perturbed Sums


  1. Undermodelling Detection with Sign-Perturbed Sums. Algo Carè^{1,2}, Marco Campi^3, Balázs Csáji^2, Erik Weyer^4. 1: Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands; 2: Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Hungary; 3: Department of Information Engineering (DII), University of Brescia, Italy; 4: Department of Electrical and Electronic Engineering, University of Melbourne, Australia. IFAC World Congress, Toulouse, France, July 10, 2017.

  2. Table of contents: I. Introduction; II. Standard SPS for Linear Regression; III. SPS with Undermodelling Detection; IV. Numerical Experiments; V. Summary and Conclusions. Carè, Campi, Csáji, Weyer — Undermodelling Detection with SPS

  3. Motivations • SPS (Sign-Perturbed Sums) builds confidence regions around the LS (least squares) estimate of linear regression problems. • Only mild statistical assumptions are needed, e.g., symmetry. • Not needed: stationarity, moments, particular distributions. • SPS has many nice properties (as we will see later); most importantly, its confidence regions are exact. • Regarding the models, the assumption of SPS is that the true system generating the observations is in the model class. • However, if the model class is wrong, SPS cannot detect it. • Here, we suggest an extension of SPS, UD-SPS, that still builds exact confidence sets if the model is correct, but can also detect, in the long run, if the system is undermodelled.

  4. Linear Regression. Consider a standard linear regression problem: y_t ≜ ϕ_t^T θ* + w_t, where y_t — output (for time t = 1, ..., n); ϕ_t — regressor (exogenous, d-dimensional); w_t — noise (independent, symmetric); θ* — true parameter (deterministic, d-dimensional); Φ_n ≜ [ϕ_1, ..., ϕ_n]^T — skinny and full rank.
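The data-generating model above can be simulated in a few lines. This is not from the presentation — a minimal NumPy sketch in which the dimensions, the Laplace noise (any symmetric distribution works), and the Gaussian regressors are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
theta_star = np.array([1.0, -0.5])   # true parameter (unknown in practice)
Phi = rng.normal(size=(n, d))        # exogenous regressors; full rank a.s.
w = rng.laplace(size=n)              # symmetric noise; no Gaussianity needed
y = Phi @ theta_star + w             # y_t = phi_t^T theta* + w_t
```

Note that the noise need not have moments or a known distribution — symmetry about zero and independence are enough for SPS.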

  5. Least Squares. Given: a sample, Z, of size n of outputs {y_t} and regressors {ϕ_t}. A classical approach is to minimize the least squares criterion V(θ | Z) ≜ (1/2) Σ_{t=1}^n (y_t − ϕ_t^T θ)^2. The least squares estimate (LSE), θ̂_n, can be found by solving the normal equation ∇_θ V(θ̂_n | Z) = Σ_{t=1}^n ϕ_t (y_t − ϕ_t^T θ̂_n) = 0.
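The normal equation above is Φ_n^T Φ_n θ = Φ_n^T y in matrix form. A sketch (not from the slides; data setup is illustrative) of solving it directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 2
theta_star = np.array([1.0, -0.5])
Phi = rng.normal(size=(n, d))
y = Phi @ theta_star + rng.laplace(size=n)

# Solve the normal equation Phi^T Phi theta = Phi^T y for the LSE
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# The gradient of V at theta_hat vanishes (up to floating point)
grad = Phi.T @ (y - Phi @ theta_hat)
```

In practice `np.linalg.lstsq` is numerically preferable to forming Φ^T Φ explicitly; the explicit form is used here only to mirror the normal equation.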

  6. Confidence Ellipsoids. The LSE is asymptotically normal (under some technical conditions): √n (θ̂_n − θ*) →_d N(0, σ^2 R^{−1}) as n → ∞, where R is the limit of R_n ≜ (1/n) Σ_{t=1}^n ϕ_t ϕ_t^T as n → ∞ (if it exists). Confidence ellipsoid: Θ̃_{n,μ} ≜ { θ ∈ R^d : (θ − θ̂_n)^T R_n (θ − θ̂_n) ≤ μ σ̂_n^2 / n }, where P(θ* ∈ Θ̃_{n,μ}) ≈ F_{χ^2(d)}(μ), with F_{χ^2(d)} the CDF of the χ^2(d) distribution, and σ̂_n^2 ≜ (1/(n − d)) Σ_{t=1}^n (y_t − ϕ_t^T θ̂_n)^2 is an estimate of σ^2.
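A membership test for this classical asymptotic ellipsoid can be sketched as follows (not from the slides; the 95% level and the data setup are illustrative assumptions):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, d = 1000, 2
theta_star = np.array([1.0, -0.5])
Phi = rng.normal(size=(n, d))
y = Phi @ theta_star + rng.normal(size=n)

theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
R_n = (Phi.T @ Phi) / n
sigma2_hat = np.sum((y - Phi @ theta_hat) ** 2) / (n - d)  # estimate of sigma^2
mu = chi2.ppf(0.95, df=d)          # F_{chi^2(d)}(mu) = 0.95

def in_ellipsoid(theta):
    """Check (theta - theta_hat)^T R_n (theta - theta_hat) <= mu sigma2_hat / n."""
    diff = theta - theta_hat
    return diff @ R_n @ diff <= mu * sigma2_hat / n
```

Unlike SPS, this region's coverage is only approximate for finite n, and it relies on a noise variance estimate.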

  7. Reference and Sign-Perturbed Sums. Let us introduce a reference sum and m − 1 sign-perturbed sums. Reference sum: S_0(θ) ≜ R_n^{−1/2} (1/n) Σ_{t=1}^n ϕ_t (y_t − ϕ_t^T θ). Sign-perturbed sums: S_i(θ) ≜ R_n^{−1/2} (1/n) Σ_{t=1}^n α_{i,t} ϕ_t (y_t − ϕ_t^T θ), for i = 1, ..., m − 1, where the α_{i,t} (t = 1, ..., n) are i.i.d. random signs, that is, α_{i,t} = ±1 with probability 1/2 each (Rademacher).
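The reference and sign-perturbed sums can be computed as below. This is an illustrative sketch (not from the slides): `S(theta, 0)` is the reference sum and `S(theta, i)` for i ≥ 1 the sign-perturbed ones, with the data setup assumed for demonstration.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
n, d, m = 400, 2, 100
theta_star = np.array([1.0, -0.5])
Phi = rng.normal(size=(n, d))
y = Phi @ theta_star + rng.laplace(size=n)

R_n = (Phi.T @ Phi) / n
R_inv_half = np.linalg.inv(sqrtm(R_n)).real          # R_n^{-1/2}
alpha = rng.choice([-1.0, 1.0], size=(m - 1, n))     # i.i.d. Rademacher signs

def S(theta, i):
    """Reference sum for i = 0, sign-perturbed sum S_i for i >= 1."""
    res = y - Phi @ theta
    signs = np.ones(n) if i == 0 else alpha[i - 1]
    return R_inv_half @ (Phi.T @ (signs * res)) / n
```

Note that S_0(θ̂_n) = 0 by the normal equation, since the residuals at the LSE are orthogonal to the regressors.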

  8. Intuitive Idea: Distributional Invariance. Recall: {w_t} are independent and each w_t is symmetric about zero. Observe that, if θ = θ*, we have (for i = 1, ..., m − 1): S_0(θ*) = R_n^{−1/2} (1/n) Σ_{t=1}^n ϕ_t w_t and S_i(θ*) = R_n^{−1/2} (1/n) Σ_{t=1}^n ϕ_t α_{i,t} w_t. Consider the ordering ‖S_{(0)}(θ*)‖^2 ≺ · · · ≺ ‖S_{(m−1)}(θ*)‖^2. Note: the relation "≺" is the canonical "<" with random tie-breaking. All orderings are equally probable! (They are conditionally i.i.d.)

  9. Intuitive Idea: Reference Dominance. What if θ ≠ θ*? In fact, the reference paraboloid ‖S_0(θ)‖^2 increases faster than the {‖S_i(θ)‖^2}, thus it will eventually dominate the ordering. Intuitively, for "large enough" ‖θ̃‖, where θ̃ ≜ θ* − θ: ‖ Σ_{t=1}^n ϕ_t ϕ_t^T θ̃ + Σ_{t=1}^n ϕ_t w_t ‖^2_{R_n^{−1}} > ‖ Σ_{t=1}^n ±ϕ_t ϕ_t^T θ̃ + Σ_{t=1}^n ±ϕ_t w_t ‖^2_{R_n^{−1}} with "high probability" (for simplicity, ± is used instead of {α_{i,t}}).

  10. Non-Asymptotic Confidence Regions. The rank of ‖S_0(θ)‖^2 in the ordering of the {‖S_i(θ)‖^2} w.r.t. ≺ is R(θ) ≜ 1 + Σ_{i=1}^{m−1} I(‖S_i(θ)‖^2 ≺ ‖S_0(θ)‖^2), where I(·) is an indicator function. Sign-Perturbed Sums (SPS) confidence regions: Θ̂_n ≜ { θ ∈ R^d : R(θ) ≤ m − q }, where m > q > 0 are user-chosen integers (design parameters).
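Putting the pieces together, membership of a candidate θ in the SPS region reduces to ranking the reference norm among the sign-perturbed norms. A self-contained sketch (not from the slides; m = 100, q = 5 gives the 95% level, and the data setup is illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(3)
n, d, m, q = 400, 2, 100, 5          # target confidence 1 - q/m = 95%
theta_star = np.array([1.0, -0.5])
Phi = rng.normal(size=(n, d))
y = Phi @ theta_star + rng.laplace(size=n)

R_inv_half = np.linalg.inv(sqrtm((Phi.T @ Phi) / n)).real
alpha = np.vstack([np.ones(n),       # row 0: reference (all +1 signs)
                   rng.choice([-1.0, 1.0], size=(m - 1, n))])
tie = rng.permutation(m)             # random tie-breaking order

def in_sps_region(theta):
    res = y - Phi @ theta
    norms = [np.sum((R_inv_half @ (Phi.T @ (alpha[i] * res)) / n) ** 2)
             for i in range(m)]
    # rank of the reference norm under "<" with random tie-breaking
    rank = 1 + sum((norms[i], tie[i]) < (norms[0], tie[0]) for i in range(1, m))
    return rank <= m - q
```

The LSE is always accepted (its reference norm is zero), while parameters far from θ* are rejected because the reference paraboloid dominates.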

  11. Exact Confidence. (A1) {w_t} is a sequence of independent random variables; each w_t has a probability distribution symmetric about zero. (A2) The outer product of the regressors is invertible, det(R_n) ≠ 0. Exact confidence of SPS: P(θ* ∈ Θ̂_n) = 1 − q/m for finite samples. The parameters m and q are under our control. Note that ‖S_0(θ̂_n)‖^2 = 0, thus θ̂_n ∈ Θ̂_n, assuming it is non-empty.
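The exactness claim is easy to probe by Monte Carlo: over repeated experiments, θ* should fall inside the region in a fraction of trials close to 1 − q/m, even for small n. A sketch (not from the slides; trial count, m = 20, q = 1, and the data setup are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(4)
n, d, m, q = 25, 2, 20, 1            # exact coverage 1 - q/m = 0.95
theta_star = np.array([1.0, -0.5])

def covers_true_param():
    Phi = rng.normal(size=(n, d))
    y = Phi @ theta_star + rng.laplace(size=n)
    R_inv_half = np.linalg.inv(sqrtm((Phi.T @ Phi) / n)).real
    res = y - Phi @ theta_star       # evaluate the sums at the true parameter
    alpha = np.vstack([np.ones(n), rng.choice([-1.0, 1.0], size=(m - 1, n))])
    norms = [np.sum((R_inv_half @ (Phi.T @ (alpha[i] * res)) / n) ** 2)
             for i in range(m)]
    tie = rng.permutation(m)
    rank = 1 + sum((norms[i], tie[i]) < (norms[0], tie[0]) for i in range(1, m))
    return rank <= m - q

coverage = np.mean([covers_true_param() for _ in range(400)])
```

With 400 trials the empirical coverage fluctuates around 0.95 by roughly ±0.02; the guarantee itself is exact, not asymptotic.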

  12. Star Convexity. A set X ⊆ R^d is star convex if there is a star center c ∈ R^d with: ∀x ∈ X, ∀β ∈ [0, 1]: βx + (1 − β)c ∈ X. Star convexity of SPS: Θ̂_n is star convex with the LSE, θ̂_n, as a star center. Hint: Θ̂_n is the union and intersection of ellipsoids containing the LSE.

  13. Strong Consistency. (A1) independence, symmetry: {w_t} are independent and symmetric. (A2) invertibility: R_n ≜ (1/n) Σ_{t=1}^n ϕ_t ϕ_t^T is invertible. (A3) regressor growth rate: Σ_{t=1}^∞ ‖ϕ_t‖^4 / t^2 < ∞. (A4) noise moment growth rate: Σ_{t=1}^∞ (E[w_t^2])^2 / t^2 < ∞. (A5) Cesàro summability: lim_{n→∞} R_n = R, which is positive definite. Strong consistency of SPS: P( ∪_{k=1}^∞ ∩_{n=k}^∞ { Θ̂_n ⊆ B_ε(θ*) } ) = 1, where B_ε(θ*) ≜ { θ ∈ R^d : ‖θ − θ*‖ ≤ ε } is a norm ball.

  14. Ellipsoidal Outer Approximation. The reference paraboloid can be rewritten as ‖S_0(θ)‖^2 = (θ − θ̂_n)^T R_n (θ − θ̂_n). From this, an alternative description of the confidence region is Θ̂_n ⊆ { θ ∈ R^d : (θ − θ̂_n)^T R_n (θ − θ̂_n) ≤ r(θ) }, where r(θ) is the q-th largest value of {‖S_i(θ)‖^2}_{i≠0}. Ellipsoidal outer approximation: Θ̂_n ⊆ { θ ∈ R^d : (θ − θ̂_n)^T R_n (θ − θ̂_n) ≤ r* }, where r* can be efficiently computed by a semi-definite program.

  15. Undermodelling. Assume we are given a (finite) sample of input and output data, {u_t}, {y_t}, which we model with an FIR system: y_t(θ) ≜ ϕ_t^T θ + w_t, where ϕ_t ≜ [u_{t−1}, ..., u_{t−d}]^T. The true data generating system is y_t = ϕ_t^T θ* + e_t + n_t, where e_t is an extra component that can depend on all past inputs u_{t−d−1}, u_{t−d−2}, ... and on all past noises n_{t−1}, n_{t−2}, .... If the {e_t} are nonzero, then the SPS confidence regions will still (almost surely) shrink, but around a wrong parameter value.
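The "wrong parameter value" effect can be made concrete: with a correlated input and an unmodelled extra FIR tap, the LSE of the short model is biased, and more data only sharpens convergence to the biased value. A sketch, not from the slides — the AR(1) input, the third-tap coefficient 0.9, and all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 5000, 2
theta_star = np.array([1.0, -0.5])
c_extra = 0.9                        # unmodelled third FIR tap: e_t = 0.9 u_{t-3}

# AR(1) input, so consecutive inputs are correlated (bias would vanish
# for white inputs, since then phi_t and u_{t-3} are uncorrelated)
u = np.zeros(n + 3)
eps = rng.normal(size=n + 3)
for t in range(1, n + 3):
    u[t] = 0.8 * u[t - 1] + eps[t]

Phi = np.column_stack([u[2:n + 2], u[1:n + 1]])   # [u_{t-1}, u_{t-2}]
e = c_extra * u[0:n]                              # unmodelled component
y = Phi @ theta_star + e + rng.normal(size=n)

theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
bias = theta_hat - theta_star        # stays bounded away from 0 as n grows
```

Here a short calculation with the AR(1) covariances suggests the limiting bias is roughly [0, 0.72]: the second tap absorbs most of the unmodelled dynamics.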

  16. SPS with Undermodelling Detection. UD-SPS is obtained from SPS by replacing the {S_i(θ)} with Q_0(θ) ≜ [ R_n, B_n ; B_n^T, D_n ]^{−1/2} (1/n) Σ_{t=1}^n [ ϕ_t ; ψ_t ] (y_t − ϕ_t^T θ) and Q_i(θ) ≜ [ R_n, B_n ; B_n^T, D_n ]^{−1/2} (1/n) Σ_{t=1}^n α_{i,t} [ ϕ_t ; ψ_t ] (y_t − ϕ_t^T θ), where ψ_t is a vector that includes s extra input values preceding those included in ϕ_t, ψ_t ≜ [u_{t−d−1}, ..., u_{t−d−s}]^T, and B_n ≜ (1/n) Σ_{t=1}^n ϕ_t ψ_t^T, D_n ≜ (1/n) Σ_{t=1}^n ψ_t ψ_t^T.
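Concretely, the Q sums stack each regressor ϕ_t with its extra-past-input vector ψ_t and normalize by the inverse square root of the block matrix [R_n, B_n; B_n^T, D_n]. A sketch of this construction (not from the slides; the data setup, dimensions d = s = 2, and function names are illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(5)
n, d, s = 400, 2, 2
u = rng.normal(size=n + d + s)       # input sequence, incl. extra past values

# FIR regressors phi_t = [u_{t-1},...,u_{t-d}] and the s preceding inputs
Phi = np.column_stack([u[d + s - k: n + d + s - k] for k in range(1, d + 1)])
Psi = np.column_stack([u[d + s - k: n + d + s - k]
                       for k in range(d + 1, d + s + 1)])
theta_star = np.array([1.0, -0.5])
y = Phi @ theta_star + rng.laplace(size=n)

Z = np.hstack([Phi, Psi])            # stacked [phi_t; psi_t] rows
M = (Z.T @ Z) / n                    # block matrix [[R_n, B_n], [B_n^T, D_n]]
M_inv_half = np.linalg.inv(sqrtm(M)).real

def Q(theta, signs):
    """Q_0 for signs = ones, sign-perturbed Q_i for Rademacher signs."""
    res = y - Phi @ theta            # residuals of the FIR(d) model
    return M_inv_half @ (Z.T @ (signs * res)) / n
```

The last s components of Q probe whether the residuals correlate with the extra past inputs; that is what lets UD-SPS flag undermodelling while the first d components behave like standard SPS.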

  17. The Connection of UD-SPS and SPS. The connection of UD-SPS and SPS can be stated as follows. Reducing UD-SPS to SPS: the UD-SPS region, Θ̂_n^o, for estimating θ* ∈ R^d, can be interpreted as the restriction to a d-dimensional space of a standard SPS region, Θ̂'_n, that lives in the domain { θ' ∈ R^{d+s} }. R^{d+s} is the d-dimensional identification space augmented with s extra components: Θ̂_n^o can be identified with the first d components of the set Θ̂'_n ∩ (R^d × {0}^s).
