PHYSICAL MODELS OF NESTED FUNCTIONS

T(x, u, θ_T) = N1 × P_T(ρ, M) = X_T · θ_T
D(x, u, θ_D) = q × P_D(α, M) = X_D · θ_D
L(x, u, θ_L) = q × P_L(α, M) = X_L · θ_L
I_sp(x, u, θ_Isp) = SAT × P_Isp(h, M) = X_Isp · θ_Isp

with monomial feature vectors

X_T = N1 · (1, ρ, M, ρ², ρM, M², …)ᵀ,
X_D = X_L = q · (1, α, M, α², αM, M², …)ᵀ,
X_Isp = SAT · (1, h, M, h², hM, M², …)ᵀ.

13/49
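As a concrete illustration, the monomial regressors above can be generated programmatically. The sketch below is illustrative only: the helper name `monomial_features` and the numerical values are made up, but the ordering (1, ρ, M, ρ², ρM, M², …) matches the feature vectors on this slide.

```python
import numpy as np

def monomial_features(a, b, degree):
    """All monomials a^i * b^j with i + j <= degree, ordered (1, a, b, a^2, ab, b^2, ...)."""
    return np.array([a ** i * b ** j
                     for d in range(degree + 1)
                     for i, j in ((d - k, k) for k in range(d + 1))])

# Thrust regressor X_T = N1 * (1, rho, M, rho^2, rho*M, M^2, ...); values are placeholders.
N1, rho, M = 0.95, 1.225, 0.78
X_T = N1 * monomial_features(rho, M, degree=2)
```

For degree 4 the same helper yields 15 features, matching the 15-term thrust polynomial that appears later in the feature-selection slide.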
STATE-OF-THE-ART [Jategaonkar, 2006]

Output-Error Method — less scalable to many trajectories
Filter-Error Method
Equation-Error Method:

ẋ(t) = g(u(t), x(t), θ) + ε(t),   t ∈ [0, t_f]

Sampled at N instants:

ẋ_i = g(u_i, x_i, θ) + ε_i,   i = 1, …, N

min_θ Σ_{i=1}^N ℓ(ẋ_i, g(u_i, x_i, θ))

Ex: (Nonlinear) Least-Squares

min_θ Σ_{i=1}^N ‖ẋ_i − g(u_i, x_i, θ)‖²₂

or, more generally,

min_θ Σ_{i=1}^N ‖Y(u_i, x_i, ẋ_i) − G(u_i, x_i, ẋ_i, θ)‖²₂

14/49
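A minimal sketch of the equation-error idea on a toy scalar ODE, assuming ẋ is approximated by finite differences from sampled states; the trajectory, noise level and true parameter below are synthetic.

```python
import numpy as np

# Equation-error identification on a toy scalar ODE xdot = theta * x.
rng = np.random.default_rng(0)
theta_true = -0.5
t = np.linspace(0.0, 5.0, 200)
x = np.exp(theta_true * t)                    # exact solution of xdot = theta * x

# Approximate xdot by finite differences, as if only x(t_i) were observed,
# with a small synthetic measurement noise.
xdot = np.gradient(x, t) + 1e-3 * rng.standard_normal(t.size)

# min_theta sum_i (xdot_i - theta * x_i)^2  ->  ordinary least squares
theta_hat, *_ = np.linalg.lstsq(x[:, None], xdot, rcond=None)
theta_hat = float(theta_hat[0])
```

The appeal stated on the slide: each sample contributes one algebraic residual, so cost grows linearly with the number of observations regardless of how many trajectories they come from.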
LEVERAGING THE DYNAMICS STRUCTURE

ḣ = V sin γ
V̇ = [T(u, x, θ_T) cos α − D(u, x, θ_D) − mg sin γ] / m
γ̇ = [T(u, x, θ_T) sin α + L(u, x, θ_L) − mg cos γ] / (mV)
ṁ = − T(u, x, θ_T) / I_sp(u, x, θ_Isp)

Nonlinear in states and controls; nonlinear in parameters. Rearranging:

ḣ = V sin γ
m V̇ + mg sin γ = T(u, x, θ_T) cos α − D(u, x, θ_D)
m V γ̇ + mg cos γ = T(u, x, θ_T) sin α + L(u, x, θ_L)
0 = T(u, x, θ_T) + ṁ I_sp(u, x, θ_Isp)

Substituting the linear parameterizations (left-hand sides Y(u, x, ẋ), right-hand sides G(u, x, ẋ, θ)):

m V̇ + mg sin γ = (X_T · θ_T) cos α − X_D · θ_D + ε₁
m V γ̇ + mg cos γ = (X_T · θ_T) sin α + X_L · θ_L + ε₂
0 = X_T · θ_T + ṁ (X_Isp · θ_Isp) + ε₃

that is, with X_T1 = X_T cos α, X_T2 = X_T sin α and X_Ispm = ṁ X_Isp:

Y₁ = X_T1 · θ_T − X_D · θ_D + ε₁
Y₂ = X_T2 · θ_T + X_L · θ_L + ε₂
Y₃ = X_T · θ_T + X_Ispm · θ_Isp + ε₃

→ Linear in parameters, with structured coupling → Multi-task Learning

15/49
MULTI-TASK REGRESSION

General:
Y₁ = X_c,1 · θ_c + X₁ · θ₁ + ε₁
Y₂ = X_c,2 · θ_c + X₂ · θ₂ + ε₂
⋮
Y_K = X_c,K · θ_c + X_K · θ_K + ε_K

Aircraft:
Y₁ = X_T1 · θ_T − X_D · θ_D + ε₁
Y₂ = X_T2 · θ_T + X_L · θ_L + ε₂
Y₃ = X_T · θ_T + X_Ispm · θ_Isp + ε₃

θ_c: coupling parameters; θ₁, …, θ_K: task-specific parameters.

Many other examples: giant squid neurons [FitzHugh, 1961; Nagumo et al., 1962], susceptible-infectious-recovered models [Anderson and May, 1992], mechanical systems, …

Multi-task Linear Least-Squares:

min_θ Σ_{k=1}^K Σ_{i=1}^N (Y_{k,i} − X_{c,k,i} · θ_c − X_{k,i} · θ_k)²

Block-sparse coupling structure:

min_θ Σ_{i=1}^N ‖ (Y_{1,i}, …, Y_{K,i})ᵀ − [ X_{c,1,i}ᵀ  X_{1,i}ᵀ  0  …  0
                                             X_{c,2,i}ᵀ  0  X_{2,i}ᵀ  …  0
                                             ⋮
                                             X_{c,K,i}ᵀ  0  …  0  X_{K,i}ᵀ ] (θ_c, θ₁, …, θ_K)ᵀ ‖²₂

Compactly:

min_θ Σ_{i=1}^N ‖Y_i − X_i θ‖²₂

with θ = (θ_c, θ₁, …, θ_K) ∈ R^p, p = d_c + Σ_{k=1}^K d_k, Y_i ∈ R^K and X_i ∈ R^{K×p}.

16/49
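The block-sparse design above can be assembled explicitly and solved with ordinary least squares. The sketch below uses synthetic dimensions and data (K = 3 tasks, a shared θ_c and one θ_k per task); everything numerical is made up for illustration.

```python
import numpy as np

# Multi-task least squares: K tasks share theta_c, each has its own theta_k.
rng = np.random.default_rng(1)
K, N, d_c, d_k = 3, 500, 2, 2
p = d_c + K * d_k

theta_c = np.array([1.0, -2.0])
theta_tasks = [rng.standard_normal(d_k) for _ in range(K)]
theta = np.concatenate([theta_c, *theta_tasks])      # (theta_c, theta_1, ..., theta_K)

rows, ys = [], []
for _ in range(N):
    for k in range(K):
        x_c = rng.standard_normal(d_c)               # X_{c,k,i}
        x_k = rng.standard_normal(d_k)               # X_{k,i}
        row = np.zeros(p)                            # one block-sparse row of X_i
        row[:d_c] = x_c
        row[d_c + k * d_k: d_c + (k + 1) * d_k] = x_k
        rows.append(row)
        ys.append(row @ theta + 0.01 * rng.standard_normal())
X, Y = np.array(rows), np.array(ys)

theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Stacking the K rows per observation reproduces the ‖Y_i − X_i θ‖²₂ formulation: the zeros outside each task's block are exactly the coupling structure on the slide.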
FEATURE SELECTION

Our model:

T = N1 (θ_T,1 + θ_T,2 ρ + θ_T,3 M + θ_T,4 ρ² + θ_T,5 ρM + θ_T,6 M² + θ_T,7 ρ³ + θ_T,8 ρ²M + θ_T,9 ρM² + θ_T,10 M³ + θ_T,11 ρ⁴ + θ_T,12 ρ³M + θ_T,13 ρ²M² + θ_T,14 ρM³ + θ_T,15 M⁴).

Mattingly's model [Mattingly et al., 1992]:

T = N1 (θ_T,1 ρ + θ_T,2 ρM³).

⇒ High risk of overfitting.

Sparse models are: less susceptible to overfitting, more compliant with physical models, more interpretable, lighter/faster.

17/49
BLOCK-SPARSE LASSO

{(X_i, Y_i)}_{i=1}^N ⊂ R^{d+1}: i.i.d. sample. Lasso [Tibshirani, 1994]:

min_θ Σ_{i=1}^N (Y_i − X_i · θ)² + λ ‖θ‖₁

FIGURE 1: Sparsity induced by the L1 norm in the Lasso. Source: Wikipedia, Lasso (statistics).

Block-sparse structure preserved:

min_θ Σ_{k=1}^K Σ_{i=1}^N (Y_{k,i} − X_{c,k,i} · θ_c − X_{k,i} · θ_k)² + λ_c ‖θ_c‖₁ + Σ_{k=1}^K λ_k ‖θ_k‖₁

Equivalent to a Lasso problem:

min_β Σ_{i=1}^N ‖Y_i − B_i β‖²₂ + λ_c ‖β‖₁

with β = (θ_c, (λ₁/λ_c) θ₁, …, (λ_K/λ_c) θ_K) ∈ R^p, p = d_c + Σ_{k=1}^K d_k, Y_i ∈ R^K and B_i ∈ R^{K×p}.

In practice, we choose λ_k = λ_c for all k = 1, …, 3, so β = θ and B_i = X_i with

Y_i = (Y_{1,i}, Y_{2,i}, Y_{3,i})ᵀ,   X_i = [ X_T1,iᵀ  −X_D,iᵀ  0  0
                                             X_T2,iᵀ  0  X_L,iᵀ  0
                                             X_T,iᵀ   0  0  X_Ispm,iᵀ ]

giving min_θ Σ_{i=1}^N ‖Y_i − X_i θ‖²₂ + λ_c ‖θ‖₁.

18/49
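The equivalence between the block-sparse penalty and a plain Lasso penalty comes from rescaling each task-specific block: β_k = (λ_k/λ_c) θ_k with the matching columns scaled by λ_c/λ_k. A quick numerical check of that identity, on entirely synthetic dimensions and data:

```python
import numpy as np

# Check: sum-of-squares + lambda_c*||theta_c||_1 + sum_k lambda_k*||theta_k||_1
# equals the plain Lasso objective after the block rescaling.
rng = np.random.default_rng(2)
d_c, dims, lam_c = 2, [3, 2], 0.7
lams = [0.5, 1.4]                                     # per-block penalties lambda_k

theta_c = rng.standard_normal(d_c)
blocks = [rng.standard_normal(d) for d in dims]
theta = np.concatenate([theta_c, *blocks])
X = rng.standard_normal((10, d_c + sum(dims)))
Y = rng.standard_normal(10)

# beta_k = (lambda_k/lambda_c) * theta_k ; columns of B scaled by lambda_c/lambda_k
beta = np.concatenate([theta_c, *[(l / lam_c) * b for l, b in zip(lams, blocks)]])
scales = np.concatenate([np.ones(d_c),
                         *[np.full(d, lam_c / l) for l, d in zip(lams, dims)]])
B = X * scales                                        # broadcast column scaling

obj_block = (np.sum((Y - X @ theta) ** 2)
             + lam_c * np.abs(theta_c).sum()
             + sum(l * np.abs(b).sum() for l, b in zip(lams, blocks)))
obj_lasso = np.sum((Y - B @ beta) ** 2) + lam_c * np.abs(beta).sum()
```

Since B β = X θ by construction, only the penalty bookkeeping changes; with λ_k = λ_c (the choice made on the slide) the rescaling is the identity.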
BOOTSTRAP IMPLEMENTATION

High correlations between features… ⇒ inconsistent selections via the Lasso!

Bolasso [Bach, 2008]
Require: training data T = {(X_i, Y_i)}_{i=1}^N ⊂ R^{K×(K+1)} × R^K, number of bootstrap replicates b, L1 penalty parameter λ_c.
1: for k = 1 to b do
2:   Generate bootstrap sample T_k,
3:   Compute block-sparse Lasso estimate θ̂^k from T_k,
4:   Compute support J_k = { j : θ̂^k_j ≠ 0 },
5: end for
6: Compute intersection J = ∩_{k=1}^b J_k,
7: Compute θ̂_J from the selected features using least squares.

Consistency even under high correlations proved in Bach [2008].
Efficient implementations exist: LARS [Efron et al., 2004].

19/49
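A minimal Bolasso sketch of the algorithm above. For self-containedness the Lasso subproblem is solved here by plain ISTA (proximal gradient) rather than LARS; data, penalty value and number of replicates are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

def lasso_ista(X, y, lam, n_iter=2000):
    """min_theta ||y - X theta||_2^2 + lam * ||theta||_1, solved by ISTA."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = theta - 2.0 * X.T @ (X @ theta - y) / L  # gradient step
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return theta

def bolasso(X, y, lam, b=16):
    """Bootstrap the Lasso, intersect supports, refit by least squares."""
    N = X.shape[0]
    support = None
    for _ in range(b):
        idx = rng.integers(0, N, size=N)             # bootstrap resample with replacement
        J_k = set(np.flatnonzero(np.abs(lasso_ista(X[idx], y[idx], lam)) > 1e-8))
        support = J_k if support is None else support & J_k
    J = sorted(support)
    sol, *_ = np.linalg.lstsq(X[:, J], y, rcond=None)
    theta = np.zeros(X.shape[1])
    theta[J] = sol
    return theta, J

# Synthetic demo: sparse truth plus one moderately correlated nuisance feature.
N, p = 200, 8
theta_true = np.zeros(p)
theta_true[0], theta_true[3] = 2.0, -1.5
X = rng.standard_normal((N, p))
X[:, 1] = 0.6 * X[:, 0] + 0.8 * rng.standard_normal(N)
y = X @ theta_true + 0.05 * rng.standard_normal(N)

theta_hat, J = bolasso(X, y, lam=10.0)
```

The support intersection is what suppresses features that the Lasso only selects on some resamples; the final least-squares refit removes the L1 shrinkage bias on the retained features.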
PROBLEM WITH INTRA-GROUP CORRELATIONS

min_θ Σ_{i=1}^N ‖Y_i − X_i θ‖²₂ + λ_c ‖θ‖₁ ⇒ θ̂_T = θ̂_Isp = 0 !

Y₁ = X_T1 · θ_T − X_D · θ_D + ε₁
Y₂ = X_T2 · θ_T + X_L · θ_L + ε₂
0 = X_T · θ_T + X_Ispm · θ_Isp + ε₃

20/49
PROBLEM WITH INTRA-GROUP CORRELATIONS

⇒ θ ↦ Σ_{i=1}^N ‖Y_i − X_i θ‖²₂ is not injective… Ill-posed problem!

FIGURE: Feature correlations higher than 0.9 in absolute value shown in white.

21/49
PROBLEM WITH INTRA-GROUP CORRELATIONS

Add a prior regression equation on I_sp:

Y₁ = X_T1 · θ_T − X_D · θ_D + ε₁
Y₂ = X_T2 · θ_T + X_L · θ_L + ε₂
0 = X_T · θ_T + X_Ispm · θ_Isp + ε₃
√λ_t Ĩ_sp = √λ_t X_Isp · θ_Isp + ε₄

min_θ Σ_{i=1}^N ( ‖Y_i − X_i θ‖²₂ + λ_t ‖Ĩ_sp,i − X_Isp,i · θ_Isp‖²₂ ) + λ_c ‖θ‖₁

or, equivalently,

min_θ Σ_{i=1}^N ‖Ỹ_i − X̃_i θ‖²₂ + λ_c ‖θ‖₁

with

Ỹ_i = (Y_{1,i}, Y_{2,i}, 0, √λ_t Ĩ_sp,i)ᵀ,   X̃_i = [ X_T1,iᵀ  −X_D,iᵀ  0  0
                                                      X_T2,iᵀ  0  X_L,iᵀ  0
                                                      X_T,iᵀ   0  0  X_Ispm,iᵀ
                                                      0        0  0  √λ_t X_Isp,iᵀ ]

Prior model Ĩ_sp from Roux [2005]: Ĩ_sp,i = Ĩ_sp(u_i, x_i), i = 1, …, N.

22/49
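The prior term folds into the least-squares system simply by appending a row scaled by √λ_t, which is exactly the augmented (Ỹ_i, X̃_i) construction. A quick numerical check of that identity, with synthetic placeholder values throughout:

```python
import numpy as np

# Check: sum-of-squares + lambda_t * (i_tilde - x_prior . theta)^2 equals the
# residual of the system augmented with a row scaled by sqrt(lambda_t).
rng = np.random.default_rng(4)
p, lam_t = 4, 2.5

X = rng.standard_normal((3, p))       # one observation's K = 3 equation rows
y = rng.standard_normal(3)
x_prior = rng.standard_normal(p)      # regressor row for the prior equation (dense here for simplicity)
i_tilde = rng.standard_normal()       # prior model value I_tilde_sp,i
theta = rng.standard_normal(p)

obj_penalized = np.sum((y - X @ theta) ** 2) + lam_t * (i_tilde - x_prior @ theta) ** 2

X_aug = np.vstack([X, np.sqrt(lam_t) * x_prior])
y_aug = np.append(y, np.sqrt(lam_t) * i_tilde)
obj_stacked = np.sum((y_aug - X_aug @ theta) ** 2)
```

Because the penalty becomes ordinary stacked rows, any Lasso solver for the compact form handles the prior with no modification.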
FEATURE SELECTION RESULTS

25 different B737-800, 10 471 flights = 8 261 619 observations.
Block-sparse Bolasso used for T, D, L and I_sp; we expect similar model structures.

TABLE: Feature selection results for the thrust, drag, lift and specific impulse models.

23/49
ACCURACY OF DYNAMICS PREDICTIONS 24/49
REALISM OF HIDDEN ELEMENTS 25/49
FLIGHT RESIMULATION

Last assessment criterion = static:
- it does not incorporate the fact that the observations are time dependent;
- it does not take into account the goal of optimally controlling the aircraft system.

Another possible dynamic criterion:

min_{(x,u)} ∫_{t₀}^{t_n} ( ‖u(t) − u_test(t)‖²_u + ‖x(t) − x_test(t)‖²_x ) dt
s.t. ẋ(t) = g(x(t), u(t), θ̂),

where ‖·‖_u, ‖·‖_x denote scaling norms.

26/49
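The integrand of this dynamic criterion can be evaluated numerically for a candidate trajectory against the recorded one, e.g. with the trapezoidal rule. The sketch below is evaluation only: trajectories, offsets and weights are synthetic, and the dynamics constraint ẋ = g(x, u, θ̂) is not enforced here.

```python
import numpy as np

# Trapezoidal evaluation of the tracking cost
#   integral over [t0, tn] of ||u - u_test||_u^2 + ||x - x_test||_x^2.
t = np.linspace(0.0, 10.0, 101)
u_test, x_test = np.sin(t), np.cos(t)          # recorded test-flight signals
u = u_test + 0.1                               # candidate control: constant offset
x = x_test                                     # candidate state: matches exactly

w_u, w_x = 1.0, 2.0                            # diagonal scaling "norms"
integrand = w_u * (u - u_test) ** 2 + w_x * (x - x_test) ** 2
cost = float(np.sum((integrand[:-1] + integrand[1:]) * np.diff(t)) / 2.0)
```

Here the state error is zero and the control error is a constant 0.01, so the cost is 0.01 × 10 = 0.1; in the actual criterion, x and u would instead come from resimulating the identified dynamics.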