
DATA EXPLORATION FOR AIRCRAFT TRAJECTORY OPTIMIZATION
Cédric Rommel
Thesis advisors: Frédéric Bonnans, Pierre Martinon
Safety Line supervisor: Baptiste Gregorutti
PhD defense, 26 October


1. PHYSICAL MODELS OF NESTED FUNCTIONS

$$T(x, u, \theta_T) = N_1 \times P_T(\rho, M) = X_T \cdot \theta_T$$
$$D(x, u, \theta_D) = q \times P_D(\alpha, M) = X_D \cdot \theta_D$$
$$L(x, u, \theta_L) = q \times P_L(\alpha, M) = X_L \cdot \theta_L$$
$$I_{sp}(x, u, \theta_{Isp}) = SAT \times P_{Isp}(h, M) = X_{Isp} \cdot \theta_{Isp}$$

with monomial feature vectors

$$X_T = N_1 \, (1, \rho, M, \rho^2, \rho M, M^2, \dots)^\top, \qquad X_D = X_L = q \, (1, \alpha, M, \alpha^2, \alpha M, M^2, \dots)^\top, \qquad X_{Isp} = SAT \, (1, h, M, h^2, hM, M^2, \dots)^\top.$$
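As an illustration of how such monomial feature vectors might be assembled, here is a minimal NumPy sketch. The variable names (`rho`, `M`, `alpha`, `h`, `N1`, `q_dyn`, `SAT`), the sample values and the polynomial degree are all assumptions for illustration, not values from the talk.

```python
import numpy as np

def monomial_features(a, b, degree=4):
    """All monomials a^p * b^q with p + q <= degree, in the
    (1, a, b, a^2, ab, b^2, ...) order used on the slide."""
    return np.array([a ** p * b ** q
                     for d in range(degree + 1)
                     for p, q in ((d - j, j) for j in range(d + 1))])

# Hypothetical flight point (all values made up for illustration):
rho, M, alpha, h = 0.7, 0.78, 0.04, 9000.0   # density, Mach, AoA, altitude
N1, q_dyn, SAT = 0.9, 1.2e4, 250.0           # fan speed, dyn. pressure, temperature

X_T   = N1    * monomial_features(rho,   M)  # thrust features
X_D   = q_dyn * monomial_features(alpha, M)  # drag features
X_L   = q_dyn * monomial_features(alpha, M)  # lift features
X_Isp = SAT   * monomial_features(h,     M)  # specific impulse features
```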

2. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method

3. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method
→ Less scalable to many trajectories

4. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method
→ Less scalable to many trajectories

- Equation-Error Method:
$$\dot{x}(t) = g(u(t), x(t), \theta) + \varepsilon(t), \qquad t \in [0, t_f]$$

5. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method
→ Less scalable to many trajectories

- Equation-Error Method:
$$\dot{x}_i = g(u_i, x_i, \theta) + \varepsilon_i, \qquad i = 1, \dots, N$$

6. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method
→ Less scalable to many trajectories

- Equation-Error Method:
$$\min_\theta \; \sum_{i=1}^{N} \ell\big(\dot{x}_i, \, g(u_i, x_i, \theta)\big)$$

7. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method
→ Less scalable to many trajectories

- Equation-Error Method, e.g. (nonlinear) least squares:
$$\min_\theta \; \sum_{i=1}^{N} \big\| \dot{x}_i - g(u_i, x_i, \theta) \big\|_2^2$$

8. STATE OF THE ART [Jategaonkar, 2006]

- Output-Error Method
- Filter-Error Method
→ Less scalable to many trajectories

- Equation-Error Method, e.g. (nonlinear) least squares:
$$\min_\theta \; \sum_{i=1}^{N} \big\| Y(u_i, x_i, \dot{x}_i) - G(u_i, x_i, \dot{x}_i, \theta) \big\|_2^2$$
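To make the equation-error idea concrete, the sketch below fits a toy scalar model with `scipy.optimize.least_squares`. The model `g` and the synthetic data are placeholders for illustration, not the aircraft dynamics of the talk.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy equation-error fit: xdot_i ~ g(u_i, x_i, theta).
def g(u, x, theta):
    return theta[0] * x + theta[1] * u

rng = np.random.default_rng(0)
u = rng.normal(size=200)
x = rng.normal(size=200)
xdot = -0.5 * x + 2.0 * u + 0.01 * rng.normal(size=200)  # synthetic data

def residuals(theta):
    # One residual per observation, as in the least-squares criterion above.
    return xdot - g(u, x, theta)

fit = least_squares(residuals, x0=np.zeros(2))
print(fit.x)  # recovers approximately [-0.5, 2.0]
```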

9. LEVERAGING THE DYNAMICS STRUCTURE

$$\begin{cases} \dot{h} = V \sin\gamma \\[2pt] \dot{V} = \dfrac{T(u, x, \theta_T)\cos\alpha - D(u, x, \theta_D) - mg\sin\gamma}{m} \\[2pt] \dot{\gamma} = \dfrac{T(u, x, \theta_T)\sin\alpha + L(u, x, \theta_L) - mg\cos\gamma}{mV} \\[2pt] \dot{m} = -\dfrac{T(u, x, \theta_T)}{I_{sp}(u, x, \theta_{Isp})} \end{cases}$$

10. LEVERAGING THE DYNAMICS STRUCTURE

(System as above.) Nonlinear in states and controls.

11. LEVERAGING THE DYNAMICS STRUCTURE

(System as above.) Nonlinear in states and controls. Nonlinear in parameters.

12. LEVERAGING THE DYNAMICS STRUCTURE

$$\begin{cases} \dot{h} = V \sin\gamma \\ m\dot{V} + mg\sin\gamma = T(u, x, \theta_T)\cos\alpha - D(u, x, \theta_D) \\ mV\dot{\gamma} + mg\cos\gamma = T(u, x, \theta_T)\sin\alpha + L(u, x, \theta_L) \\ 0 = T(u, x, \theta_T) + \dot{m}\, I_{sp}(u, x, \theta_{Isp}) \end{cases}$$

Nonlinear in states and controls. Nonlinear in parameters.
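A direct transcription of this point-mass system into code could look as follows. The `models` dictionary, its call signatures, and the control layout are assumptions standing in for the identified maps $T$, $D$, $L$, $I_{sp}$.

```python
import numpy as np

G0 = 9.81  # gravitational acceleration, m/s^2

def flight_dynamics(state, controls, models):
    """Right-hand side of the point-mass longitudinal dynamics above.
    `models` bundles the identified maps T, D, L, Isp; everything here
    is a sketch with assumed call signatures, not the talk's code."""
    h, V, gamma, m = state
    T   = models['T'](state, controls)
    D   = models['D'](state, controls)
    L   = models['L'](state, controls)
    Isp = models['Isp'](state, controls)
    alpha = controls['alpha']
    h_dot     = V * np.sin(gamma)
    V_dot     = (T * np.cos(alpha) - D - m * G0 * np.sin(gamma)) / m
    gamma_dot = (T * np.sin(alpha) + L - m * G0 * np.cos(gamma)) / (m * V)
    m_dot     = -T / Isp
    return np.array([h_dot, V_dot, gamma_dot, m_dot])
```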

13. LEVERAGING THE DYNAMICS STRUCTURE

$$\begin{cases} \dot{h} = V \sin\gamma \\ m\dot{V} + mg\sin\gamma = (X_T \cdot \theta_T)\cos\alpha - X_D \cdot \theta_D + \varepsilon_1 \\ mV\dot{\gamma} + mg\cos\gamma = (X_T \cdot \theta_T)\sin\alpha + X_L \cdot \theta_L + \varepsilon_2 \\ 0 = X_T \cdot \theta_T + \dot{m}\,(X_{Isp} \cdot \theta_{Isp}) + \varepsilon_3 \end{cases}$$

e.g. for the second equation:

$$\underbrace{mV\dot{\gamma} + mg\cos\gamma}_{Y(u, x, \dot{x})} = \underbrace{(X_T \cdot \theta_T)\sin\alpha + X_L \cdot \theta_L}_{G(u, x, \dot{x}, \theta)} + \; \varepsilon_2$$

Nonlinear in states and controls. Nonlinear in parameters → Linear in parameters.


15. LEVERAGING THE DYNAMICS STRUCTURE

$$\begin{cases} \dot{h} = V \sin\gamma \\ Y_1 = X_{T1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ Y_3 = X_T \cdot \theta_T + X_{Ispm} \cdot \theta_{Isp} + \varepsilon_3 \end{cases}$$

(with $X_{T1} = X_T \cos\alpha$, $X_{T2} = X_T \sin\alpha$, $X_{Ispm} = \dot{m}\, X_{Isp}$ and $Y_3 = 0$, following the previous slide)

Nonlinear in states and controls. Nonlinear in parameters → Linear in parameters.

16. LEVERAGING THE DYNAMICS STRUCTURE

(System as above.) Nonlinear in states and controls. Nonlinear in parameters → Linear in parameters. Structured coupling.

17. LEVERAGING THE DYNAMICS STRUCTURE

(System as above.) Nonlinear in states and controls. Nonlinear in parameters → Linear in parameters. Structured coupling → Multi-task learning.

18. MULTI-TASK REGRESSION

General:
$$\begin{cases} Y_1 = X_{c,1} \cdot \theta_c + X_1 \cdot \theta_1 + \varepsilon_1 \\ Y_2 = X_{c,2} \cdot \theta_c + X_2 \cdot \theta_2 + \varepsilon_2 \\ \quad\vdots \\ Y_K = X_{c,K} \cdot \theta_c + X_K \cdot \theta_K + \varepsilon_K \end{cases}$$

Aircraft:
$$\begin{cases} Y_1 = X_{T1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ Y_3 = X_T \cdot \theta_T + X_{Ispm} \cdot \theta_{Isp} + \varepsilon_3 \end{cases}$$

Coupling parameters $\theta_c$, task-specific parameters $\theta_k$.

Many other examples: giant squid neurons [FitzHugh, 1961; Nagumo et al., 1962], susceptible-infectious-recovered models [Anderson and May, 1992], mechanical systems, ...

19. MULTI-TASK REGRESSION

(General and aircraft systems as above; coupling parameters $\theta_c$, task-specific parameters $\theta_k$.)

Multi-task linear least squares:

$$\min_\theta \; \sum_{k=1}^{K} \sum_{i=1}^{N} \big( Y_{k,i} - X_{c,k,i} \cdot \theta_c - X_{k,i} \cdot \theta_k \big)^2$$

20. MULTI-TASK REGRESSION

(General and aircraft systems as above.)

Multi-task linear least squares, block-sparse coupling structure:

$$\min_\theta \; \sum_{i=1}^{N} \left\| \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ \vdots \\ Y_{K,i} \end{pmatrix} - \begin{pmatrix} X_{c,1,i}^\top & X_{1,i}^\top & 0 & \cdots & 0 \\ X_{c,2,i}^\top & 0 & X_{2,i}^\top & \cdots & 0 \\ \vdots & & & \ddots & \\ X_{c,K,i}^\top & 0 & \cdots & 0 & X_{K,i}^\top \end{pmatrix} \begin{pmatrix} \theta_c \\ \theta_1 \\ \vdots \\ \theta_K \end{pmatrix} \right\|_2^2$$

21. MULTI-TASK REGRESSION

(General and aircraft systems as above.)

Multi-task linear least squares, compact form:

$$\min_\theta \; \sum_{i=1}^{N} \| Y_i - X_i \theta \|_2^2$$

with $\theta = (\theta_c, \theta_1, \dots, \theta_K) \in \mathbb{R}^p$, $p = d_c + \sum_{k=1}^{K} d_k$, $Y_i \in \mathbb{R}^K$ and $X_i \in \mathbb{R}^{K \times p}$.
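A minimal NumPy sketch of this compact form: build the $K \times p$ block design matrix of slide 20 for each observation, stack, and solve. The list-based data layout is an assumption made for illustration.

```python
import numpy as np

def stack_design(Xc_list, Xk_list):
    """K x p block design matrix of one observation:
    row k = [Xc_k | 0 ... 0 | Xk_k | 0 ... 0].
    Xc_list[k]: common features of task k; Xk_list[k]: task-specific ones."""
    K = len(Xc_list)
    dc = len(Xc_list[0])
    dks = [len(x) for x in Xk_list]
    X = np.zeros((K, dc + sum(dks)))
    offset = dc
    for k in range(K):
        X[k, :dc] = Xc_list[k]
        X[k, offset:offset + dks[k]] = Xk_list[k]
        offset += dks[k]
    return X

def multitask_lstsq(X_blocks, Y_blocks):
    """Solve min_theta sum_i ||Y_i - X_i theta||^2 by stacking blocks."""
    A = np.vstack(X_blocks)        # shape (N*K, p)
    b = np.concatenate(Y_blocks)   # shape (N*K,)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta
```

Stacking all $N$ blocks turns the multi-task problem into one ordinary least-squares solve, which is also what makes the Lasso reduction on the following slides straightforward.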

22. FEATURE SELECTION

Our model:
$$T = N_1 (\theta_{T,1} + \theta_{T,2}\rho + \theta_{T,3}M + \theta_{T,4}\rho^2 + \theta_{T,5}\rho M + \theta_{T,6}M^2 + \theta_{T,7}\rho^3 + \theta_{T,8}\rho^2 M + \theta_{T,9}\rho M^2 + \theta_{T,10}M^3 + \theta_{T,11}\rho^4 + \theta_{T,12}\rho^3 M + \theta_{T,13}\rho^2 M^2 + \theta_{T,14}\rho M^3 + \theta_{T,15}M^4).$$

Mattingly's model [Mattingly et al., 1992]:
$$T = N_1 (\theta_{T,1}\rho + \theta_{T,2}\rho M^3).$$

23. FEATURE SELECTION

(Models as above.) ⇒ High risk of overfitting

24. FEATURE SELECTION

(Our model, now to be made sparse, vs. Mattingly's model, as above.) ⇒ High risk of overfitting

25. FEATURE SELECTION

(Models as above.)

Sparse models are:
- less susceptible to overfitting,
- more compliant with physical models,
- more interpretable,
- lighter/faster.

26. BLOCK-SPARSE LASSO

$\{(X_i, Y_i)\}_{i=1}^{N} \subset \mathbb{R}^{d+1}$ i.i.d. sample. Lasso [Tibshirani, 1994]:

$$\min_\theta \; \sum_{i=1}^{N} (Y_i - X_i \cdot \theta)^2 + \lambda \|\theta\|_1.$$

FIGURE 1: Sparsity induced by the L1 norm in the Lasso. Source: Wikipedia, Lasso (statistics).

27. BLOCK-SPARSE LASSO

Block-sparse structure preserved:

$$\min_\theta \; \sum_{k=1}^{K} \sum_{i=1}^{N} (Y_{k,i} - X_{c,k,i} \cdot \theta_c - X_{k,i} \cdot \theta_k)^2 + \lambda_c \|\theta_c\|_1 + \sum_{k=1}^{K} \lambda_k \|\theta_k\|_1$$

28. BLOCK-SPARSE LASSO

Block-sparse structure preserved (same problem as above) → equivalent to a Lasso problem.

29. BLOCK-SPARSE LASSO

Block-sparse structure preserved → equivalent to a Lasso problem:

$$\min_\beta \; \sum_{i=1}^{N} \| Y_i - B_i \beta \|_2^2 + \lambda_c \|\beta\|_1$$

with $\beta = (\theta_c, \tfrac{\lambda_1}{\lambda_c}\theta_1, \dots, \tfrac{\lambda_K}{\lambda_c}\theta_K) \in \mathbb{R}^p$, $p = d_c + \sum_{k=1}^{K} d_k$, $Y_i \in \mathbb{R}^K$ and $B_i \in \mathbb{R}^{K \times p}$ (the columns of $X_i$ rescaled accordingly).

30. BLOCK-SPARSE LASSO

Block-sparse structure preserved → equivalent to a Lasso problem:

$$\min_\theta \; \sum_{i=1}^{N} \| Y_i - X_i \theta \|_2^2 + \lambda_c \|\theta\|_1$$

with $\theta = (\theta_c, \tfrac{\lambda_1}{\lambda_c}\theta_1, \dots, \tfrac{\lambda_K}{\lambda_c}\theta_K) \in \mathbb{R}^p$, $p = d_c + \sum_{k=1}^{K} d_k$, $Y_i \in \mathbb{R}^K$ and $X_i \in \mathbb{R}^{K \times p}$. In practice, we choose $\lambda_k = \lambda_c$ for all $k = 1, \dots, 3$, and

$$Y_i = \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ Y_{3,i} \end{pmatrix}, \qquad X_i = \begin{pmatrix} X_{T1,i}^\top & -X_{D,i}^\top & 0 & 0 \\ X_{T2,i}^\top & 0 & X_{L,i}^\top & 0 \\ X_{T,i}^\top & 0 & 0 & X_{Ispm,i}^\top \end{pmatrix}$$
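Since the problem reduces to an ordinary Lasso on the stacked data (exactly so when $\lambda_k = \lambda_c$, otherwise after rescaling the task-specific columns by $\lambda_k / \lambda_c$), scikit-learn can solve it directly. A sketch, assuming the stacked layout from the earlier sketch and minding the $1/(2n)$ factor in scikit-learn's objective:

```python
import numpy as np
from sklearn.linear_model import Lasso

def block_sparse_lasso(X_blocks, Y_blocks, lam_c):
    """With lambda_k = lambda_c for all tasks (as on the slide), the
    block-sparse problem is an ordinary Lasso on the stacked data."""
    A = np.vstack(X_blocks)
    b = np.concatenate(Y_blocks)
    # sklearn minimizes (1/(2n)) * ||b - A theta||^2 + alpha * ||theta||_1,
    # so divide lam_c by 2n to match the slide's objective.
    n = A.shape[0]
    model = Lasso(alpha=lam_c / (2 * n), fit_intercept=False)
    model.fit(A, b)
    return model.coef_
```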

31. BOOTSTRAP IMPLEMENTATION

High correlations between features...

32. BOOTSTRAP IMPLEMENTATION

High correlations between features... ⇒ Inconsistent selection via the Lasso!

33. BOOTSTRAP IMPLEMENTATION

High correlations between features... ⇒ Inconsistent selection via the Lasso!

Bolasso [Bach, 2008]
Require: training data $T = \{(X_i, Y_i)\}_{i=1}^{N} \subset \mathbb{R}^{K \times (K+1)} \times \mathbb{R}^K$, number of bootstrap replicates $b$, L1 penalty parameter $\lambda_c$
1: for $k = 1$ to $b$ do
2:   Generate bootstrap sample $T_k$
3:   Compute the block-sparse Lasso estimate $\hat{\theta}^k$ from $T_k$
4:   Compute the support $J_k = \{ j : \hat{\theta}^k_j \neq 0 \}$
5: end for
6: Compute the intersection $J = \bigcap_{k=1}^{b} J_k$
7: Compute $\hat{\theta}_J$ from the selected features using least squares.

34. BOOTSTRAP IMPLEMENTATION

(Bolasso algorithm as above.) Consistency even under high correlations, proved in Bach [2008].

35. BOOTSTRAP IMPLEMENTATION

(Bolasso algorithm as above.) Consistency even under high correlations, proved in Bach [2008]. Efficient implementations exist: LARS [Efron et al., 2004].
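Following the pseudocode above, a compact Bolasso sketch. scikit-learn's coordinate-descent `Lasso` is used here; the LARS solver mentioned on the slide could be swapped in via `sklearn.linear_model.LassoLars`. The $\alpha$ rescaling matches the earlier stacked formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def bolasso(A, b, lam_c, n_boot=100, seed=0):
    """Bolasso sketch [Bach, 2008]: run the Lasso on bootstrap replicates,
    intersect the selected supports, then refit by least squares.
    A, b: stacked design matrix and targets."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    support = np.ones(A.shape[1], dtype=bool)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # bootstrap resample
        lasso = Lasso(alpha=lam_c / (2 * n), fit_intercept=False)
        lasso.fit(A[idx], b[idx])
        support &= (lasso.coef_ != 0)                # intersect supports
    theta = np.zeros(A.shape[1])
    if support.any():
        ols = LinearRegression(fit_intercept=False).fit(A[:, support], b)
        theta[support] = ols.coef_                   # least-squares refit
    return theta, support
```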

36. PROBLEM WITH INTRA-GROUP CORRELATIONS

$$\min_\theta \; \sum_{i=1}^{N} \| Y_i - X_i \theta \|_2^2 + \lambda_c \|\theta\|_1 \;\;\Rightarrow\;\; \hat{\theta}_T = \hat{\theta}_{Isp} = 0 \,!$$

37. PROBLEM WITH INTRA-GROUP CORRELATIONS

$$\min_\theta \; \sum_{i=1}^{N} \| Y_i - X_i \theta \|_2^2 + \lambda_c \|\theta\|_1 \;\;\Rightarrow\;\; \hat{\theta}_T = \hat{\theta}_{Isp} = 0 \,!$$

$$\begin{cases} Y_1 = X_{T1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ 0 = X_T \cdot \theta_T + X_{Ispm} \cdot \theta_{Isp} + \varepsilon_3 \end{cases}$$

38. PROBLEM WITH INTRA-GROUP CORRELATIONS

FIGURE: Feature correlations higher than 0.9 in absolute value shown in white.

39. PROBLEM WITH INTRA-GROUP CORRELATIONS

⇒ $\theta \mapsto \sum_{i=1}^{N} \| Y_i - X_i \theta \|_2^2$ is not injective... Ill-posed problem!

FIGURE: Feature correlations higher than 0.9 in absolute value shown in white.
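The white cells of the figure can be reproduced by thresholding the empirical correlation matrix. A minimal sketch; the 0.9 threshold comes from the figure caption, while the data layout (features as columns) is an assumption.

```python
import numpy as np

def high_corr_pairs(A, threshold=0.9):
    """Return index pairs of features whose absolute empirical
    correlation exceeds the threshold. A: (n_samples, n_features)."""
    C = np.corrcoef(A, rowvar=False)
    iu = np.triu_indices_from(C, k=1)      # upper triangle, no diagonal
    mask = np.abs(C[iu]) > threshold
    return list(zip(iu[0][mask], iu[1][mask]))
```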

40. PROBLEM WITH INTRA-GROUP CORRELATIONS

$$\begin{cases} Y_1 = X_{T1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ 0 = X_T \cdot \theta_T + X_{Ispm} \cdot \theta_{Isp} + \varepsilon_3 \\ \lambda_t \tilde{I}_{sp} = \lambda_t X_{Isp} \cdot \theta_{Isp} + \varepsilon_4 \end{cases}$$

$$\min_\theta \; \sum_{i=1}^{N} \| Y_i - X_i \theta \|_2^2 + \lambda_c \|\theta\|_1$$

Prior model $\tilde{I}_{sp}$ from Roux [2005]: $\tilde{I}_{sp,i} = \tilde{I}_{sp}(u_i, x_i)$, $i = 1, \dots, N$.

41. PROBLEM WITH INTRA-GROUP CORRELATIONS

(Augmented system as above.)

$$\min_\theta \; \sum_{i=1}^{N} \Big( \| Y_i - X_i \theta \|_2^2 + \lambda_t \| \tilde{I}_{sp,i} - X_{Isp,i} \cdot \theta_{Isp} \|_2^2 \Big) + \lambda_c \|\theta\|_1$$

Prior model $\tilde{I}_{sp}$ from Roux [2005]: $\tilde{I}_{sp,i} = \tilde{I}_{sp}(u_i, x_i)$, $i = 1, \dots, N$.

42. PROBLEM WITH INTRA-GROUP CORRELATIONS

Fourth equation rescaled: $\sqrt{\lambda_t}\, \tilde{I}_{sp} = \sqrt{\lambda_t}\, X_{Isp} \cdot \theta_{Isp} + \varepsilon_4$, with the same penalized objective as above.

43. PROBLEM WITH INTRA-GROUP CORRELATIONS

$$\min_\theta \; \sum_{i=1}^{N} \| \tilde{Y}_i - \tilde{X}_i \theta \|_2^2 + \lambda_c \|\theta\|_1$$

with

$$\tilde{Y}_i = \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ Y_{3,i} \\ \sqrt{\lambda_t}\, \tilde{I}_{sp,i} \end{pmatrix}, \qquad \tilde{X}_i = \begin{pmatrix} X_{T1,i}^\top & -X_{D,i}^\top & 0 & 0 \\ X_{T2,i}^\top & 0 & X_{L,i}^\top & 0 \\ X_{T,i}^\top & 0 & 0 & X_{Ispm,i}^\top \\ 0 & 0 & 0 & \sqrt{\lambda_t}\, X_{Isp,i}^\top \end{pmatrix}.$$

Prior model $\tilde{I}_{sp}$ from Roux [2005]: $\tilde{I}_{sp,i} = \tilde{I}_{sp}(u_i, x_i)$, $i = 1, \dots, N$.
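In stacked form, the prior simply contributes extra $\sqrt{\lambda_t}$-weighted rows to the design, after which the same block-sparse (Bo)Lasso applies unchanged. A sketch, where the argument layout and the `isp_cols` index set (the positions of $\theta_{Isp}$ inside $\theta$) are assumptions:

```python
import numpy as np

def augment_with_prior(A, b, X_isp_rows, isp_prior, lam_t, isp_cols):
    """Append sqrt(lam_t)-weighted rows tying theta_Isp to the prior
    specific-impulse model, as in the augmented system above.
    X_isp_rows: (n_extra, d_Isp) features; isp_prior: prior values."""
    n_extra, p = X_isp_rows.shape[0], A.shape[1]
    rows = np.zeros((n_extra, p))
    rows[:, isp_cols] = np.sqrt(lam_t) * X_isp_rows
    A_aug = np.vstack([A, rows])
    b_aug = np.concatenate([b, np.sqrt(lam_t) * np.asarray(isp_prior)])
    return A_aug, b_aug
```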

44. FEATURE SELECTION RESULTS

25 different B737-800 aircraft, 10,471 flights = 8,261,619 observations.

45. FEATURE SELECTION RESULTS

25 different B737-800 aircraft, 10,471 flights = 8,261,619 observations. Block-sparse Bolasso used for $T$, $D$, $L$ and $I_{sp}$. We expect similar model structures.

46. FEATURE SELECTION RESULTS

FIGURE: Feature selection results for the thrust, drag, lift and specific impulse models.

47. ACCURACY OF DYNAMICS PREDICTIONS (results figure)

48. REALISM OF HIDDEN ELEMENTS (results figure)

49. FLIGHT RESIMULATION (diagram relating controls u and states x)

50. FLIGHT RESIMULATION

The last assessment criterion is static.

51. FLIGHT RESIMULATION

The last assessment criterion is static:
- it does not incorporate the fact that the observations are time dependent.

52. FLIGHT RESIMULATION

The last assessment criterion is static:
- it does not incorporate the fact that the observations are time dependent;
- it does not take into account the goal of optimally controlling the aircraft system.

53. FLIGHT RESIMULATION

(Shortcomings of the static criterion as above.)

Another possible, dynamic criterion:

$$\min_{(x, u)} \; \int_{t_0}^{t_f} \big( \| u(t) - u_{test}(t) \|_u^2 + \| x(t) - x_{test}(t) \|_x^2 \big) \, dt \qquad \text{s.t.} \quad \dot{x}(t) = g(x(t), u(t), \hat{\theta}),$$

where $\|\cdot\|_u$, $\|\cdot\|_x$ denote scaling norms.
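Minimizing this criterion is an optimal control problem that would normally be handed to a dedicated solver; the sketch below only evaluates the criterion for one candidate control under the identified dynamics, which is the building block such a solver needs. All names, the uniform time grid, and the assumption of vector-valued states and controls are illustrative choices, not the talk's implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp

def resimulation_cost(u_func, x_test, u_test, g_hat, x0, t0, tf,
                      w_x=1.0, w_u=1.0, n_grid=200):
    """Tracking criterion for one candidate control u_func, under the
    identified dynamics xdot = g_hat(x, u) (theta_hat baked in).
    This only scores a given control; the slide's problem minimizes
    the same integral over (x, u)."""
    sol = solve_ivp(lambda t, x: g_hat(x, u_func(t)), (t0, tf), x0,
                    dense_output=True, rtol=1e-8)
    ts = np.linspace(t0, tf, n_grid)
    dx = sol.sol(ts) - np.stack([x_test(t) for t in ts], axis=1)
    du = np.stack([u_func(t) - u_test(t) for t in ts], axis=1)
    integrand = w_x * (dx ** 2).sum(axis=0) + w_u * (du ** 2).sum(axis=0)
    dt = ts[1] - ts[0]  # trapezoidal rule on the uniform grid
    return dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
```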
