Nonparametric inference of interaction laws in particle/agent systems
Fei Lu, Department of Mathematics, Johns Hopkins University
Joint with: Mauro Maggioni, Sui Tang, Ming Zhong
July 11, 2019, Applied Math and Comp Sci Colloquium, University of Pennsylvania
FL acknowledges support from JHU and NSF
Outline
1. Motivation and problem statement
2. Learning via nonparametric regression
3. Numerical examples
4. Ongoing work and open problems
Motivation
Q: What is the law of interaction between particles/agents?
Motivation
Q: What is the law of interaction between particles/agents?

$m\ddot{x}_i(t) = -\nu\,\dot{x}_i(t) + \frac{1}{N}\sum_{j=1,\, j\neq i}^{N} K(x_i, x_j)$

- Newton's law of gravitation: $K(x,y) = \frac{G m_1 m_2}{r^2}$, with $r = |x - y|$
- Molecular fluid: $K(x,y) = \nabla_x\,[\Phi(|x-y|)]$, with the Lennard-Jones potential $\Phi(r) = \frac{c_1}{r^{12}} - \frac{c_2}{r^6}$
- Flocking birds / schools of fish: $K(x,y) = \phi(|x-y|)\,\frac{x-y}{|x-y|}$
- Opinion/voter models, bacteria/cells, ...

(1) Cucker, Smale: On the mathematics of emergence, 2007. (2) Vicsek, Zafeiris: Collective motion, 2012. (3) Motsch, Tadmor: Heterophilious Dynamics Enhances Consensus, 2014.
An inference problem: infer the rule of interaction in the system

$m\ddot{x}_i(t) = -\nu\,\dot{x}_i(t) + \frac{1}{N}\sum_{j=1,\, j\neq i}^{N} K(x_i - x_j), \quad i = 1,\dots,N, \quad x_i(t)\in\mathbb{R}^d$

from observations of trajectories. Here $x_i$ is the position of the i-th particle/agent.
- Data: many independent trajectories $\{x^m(t): t\in\mathbb{T}\}_{m=1}^{M}$
- Goal: infer $\phi:\mathbb{R}_+\to\mathbb{R}$ in $K(x) = -\nabla\Phi(|x|) = -\phi(|x|)\,\frac{x}{|x|}$
- For simplicity, we consider only first-order systems ($m = 0$)
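For concreteness, here is a minimal simulation sketch of this first-order system (not from the slides; the kernel `phi_true`, the forward-Euler step, and all parameter values are illustrative assumptions).

```python
import numpy as np

def rhs(x, phi):
    """Right-hand side f_phi(x) for positions x of shape (N, d)."""
    N = x.shape[0]
    diff = x[None, :, :] - x[:, None, :]        # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=2)            # pairwise distances, shape (N, N)
    np.fill_diagonal(r, np.inf)                 # exclude self-interaction
    weights = phi(r) / r                        # phi(r_ij) / r_ij
    return (weights[:, :, None] * diff).sum(axis=1) / N

def simulate(x0, phi, dt=1e-3, n_steps=1000):
    """Forward-Euler trajectory; returns an array of shape (n_steps + 1, N, d)."""
    traj = [x0.copy()]
    x = x0.copy()
    for _ in range(n_steps):
        x = x + dt * rhs(x, phi)
        traj.append(x.copy())
    return np.array(traj)

# Illustrative smooth attraction-repulsion kernel and initial configuration
phi_true = lambda r: 1.0 / (1.0 + r**2) - 0.5
x0 = np.random.randn(10, 2)                     # N = 10 agents in d = 2
traj = simulate(x0, phi_true)
```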
$\dot{x}_i(t) = \frac{1}{N}\sum_{j=1,\, j\neq i}^{N} \phi_{true}(|x_i - x_j|)\,\frac{x_j - x_i}{|x_j - x_i|} \quad\Longleftrightarrow\quad \dot{x} = f_{\phi_{true}}(x(t))$

Least squares regression: with $\mathcal{H}_n = \mathrm{span}\{e_i\}_{i=1}^{n}$,

$\hat{\phi}_n = \arg\min_{\phi\in\mathcal{H}_n} \mathcal{E}_M(\phi) := \sum_{m=1}^{M} \|\dot{x}^m - f_\phi(x^m)\|^2$

- Choice of $\mathcal{H}_n$ and function space of learning?
- Is the inverse problem well-posed / identifiable?
- Consistency and rate of "convergence"? → hypothesis testing and model selection
Outline
1. Motivation and problem statement
2. Learning via nonparametric regression:
   ◮ Function space of regression
   ◮ Identifiability: a coercivity condition
   ◮ Consistency and rate of convergence
3. Numerical examples
4. Ongoing work and open problems
Learning via nonparametric regression
The dynamical system: $\dot{x} = f_{\phi_{true}}(x(t))$
Data: M trajectories $\{x^m(t): t\in\mathbb{T}\}_{m=1}^{M}$, with $x^m(0) \overset{i.i.d.}{\sim} \mu_0 \in \mathcal{P}(\mathbb{R}^{dN})$, and $\mathbb{T} = [0,T]$ or $\{t_1,\dots,t_L\}$ with $\dot{x}(t_i)$ observed
Goal: nonparametric inference of $\phi_{true}$

(1) Bongini, Fornasier, Hansen, Maggioni: Inferring interaction rules for mean field equations, M3AS, 2017. (2) Binev, Cohen, Dahmen, DeVore, Temlyakov: Universal algorithms for learning theory, JMLR, 2005. (3) Cucker, Smale: On the mathematical foundations of learning, Bulletin of the AMS, 2002.
$\hat{\phi}_{M,\mathcal{H}} = \arg\min_{\phi\in\mathcal{H}} \mathcal{E}_M(\phi) := \frac{1}{ML}\sum_{l,m=1}^{L,M} \|f_\phi(X^m(t_l)) - \dot{X}^m(t_l)\|^2$

- $\mathcal{E}_M(\phi)$ is quadratic in $\phi$, and $\mathcal{E}_M(\phi) \geq \mathcal{E}_M(\phi_{true}) = 0$
- The minimizer exists for any $\mathcal{H} = \mathcal{H}_n = \mathrm{span}\{e_1,\dots,e_n\}$

Tasks:
- Choice of $\mathcal{H}_n$ and function space of learning?
- Identifiability / well-posedness of the inverse problem: do $\mathcal{E}_M(\cdot) \to \mathcal{E}_\infty(\cdot)$ and $\hat{\phi}_{M,\mathcal{H}} \to \hat{\phi}_{\infty,\mathcal{H}}$ as $M\to\infty$?
- Consistency and rate of "convergence": does $\hat{\phi}_{M,\mathcal{H}} \to \phi_{true}$ as $M\to\infty$ and $\mathrm{dist}(\mathcal{H},\phi_{true})\to 0$?
Review of classical nonparametric regression:
Estimate $y = \phi(z)$, $\phi:\mathbb{R}^D\to\mathbb{R}$, from data $\{z_i, y_i\}_{i=1}^{M}$.
- $\{z_i, y_i\}$ are i.i.d. samples; $\hat{\phi}_n := \arg\min_{f\in\mathcal{H}_n} \mathcal{E}_M(f) := \sum_{i=1}^{M} \|y_i - f(z_i)\|^2 \to \mathbb{E}[Y \mid Z = z]$
- Optimal rate: if $\mathrm{dist}(\mathcal{H}_n, \phi_{true}) \lesssim n^{-s}$ and $n_* = (M/\log M)^{\frac{1}{2s+1}}$, then $\|\hat{\phi}_{n_*} - \phi\|_{L^2(\rho_Z)} \lesssim M^{-\frac{s}{2s+D}}$

(1) F. Cucker and S. Smale: On the mathematical foundations of learning, Bulletin of the AMS, 2002. (2) L. Györfi, M. Kohler, A. Krzyzak, H. Walk: A Distribution-Free Theory of Nonparametric Regression, Springer, 2002.
Review of classical nonparametric regression (continued):
Our case: learning the kernel $\phi:\mathbb{R}_+\to\mathbb{R}$ from trajectory data $\{x^m(t)\}$ of

$\dot{x}_i(t) = \frac{1}{N}\sum_{j=1,\, j\neq i}^{N} \phi(|x_i - x_j|)\,\frac{x_j - x_i}{|x_j - x_i|}$

- the pairwise distances $\{r^m_{ij}(t) := |x^m_i(t) - x^m_j(t)|\}$ are not i.i.d.
- the values $\phi(r^m_{ij}(t))$ are unknown
Regression measure
Distribution of pairwise distances, $\rho_T:\mathbb{R}_+\to\mathbb{R}$:

$\rho_T(r) = \mathbb{E}_{\mu_0}\Big[\frac{1}{\binom{N}{2} L}\sum_{l=1}^{L}\;\sum_{1\leq i<i'\leq N} \delta_{r_{ii'}(t_l)}(r)\Big]$

- unknown, estimated by the empirical distribution $\rho_T^M \xrightarrow{M\to\infty} \rho_T$ (LLN)
- intrinsic to the dynamics

Regression function space: $L^2(\rho_T)$
- the admissible set $\subset L^2(\rho_T)$
- $\mathcal{H}$ = piecewise polynomials $\subset L^2(\rho_T)$
- singular kernels $\subset L^2(\rho_T)$
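A simple way to approximate $\rho_T$ in practice is to histogram all observed pairwise distances over all trajectories; the sketch below is illustrative, assuming trajectories stored as arrays of shape (L, N, d).

```python
import numpy as np

def pairwise_distances(traj):
    """All distances r_ij(t_l), i < j, over one trajectory of shape (L, N, d)."""
    L, N, _ = traj.shape
    iu = np.triu_indices(N, k=1)
    dists = []
    for l in range(L):
        diff = traj[l][:, None, :] - traj[l][None, :, :]
        r = np.linalg.norm(diff, axis=2)
        dists.append(r[iu])
    return np.concatenate(dists)

def empirical_rho(trajectories, n_bins=100):
    """Histogram approximation of rho_T from M trajectories (list of (L, N, d) arrays)."""
    all_r = np.concatenate([pairwise_distances(tr) for tr in trajectories])
    density, edges = np.histogram(all_r, bins=n_bins, density=True)
    return density, edges   # support(rho_T) is roughly [edges[0], edges[-1]]
```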
Identifiability: a coercivity condition
$\hat{\phi}_{M,\mathcal{H}} = \arg\min_{\phi\in\mathcal{H}} \mathcal{E}_M(\phi)$: does $\hat{\phi}_{M,\mathcal{H}} \xrightarrow{M\to\infty} \hat{\phi}_{\infty,\mathcal{H}}$, and is $\hat{\phi}_{\infty,\mathcal{H}}$ close to $\phi_{true}$?

$\mathcal{E}_\infty(\hat{\phi}) - \mathcal{E}_\infty(\phi_{true}) = \frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\|f_{\hat{\phi}-\phi_{true}}(X(t))\|^2\,dt \;\overset{?}{\geq}\; c\,\|\hat{\phi}-\phi_{true}\|^2_{L^2(\rho_T)}$

Coercivity condition: there exists $c_{T,\mathcal{H}} > 0$ such that for all $\varphi\in\mathcal{H}\subset L^2(\rho_T)$,

$\frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\|f_\varphi(x(t))\|^2\,dt = \langle\!\langle \varphi,\varphi\rangle\!\rangle \;\geq\; c_{T,\mathcal{H}}\,\|\varphi\|^2_{L^2(\rho_T)}$

- coercivity of the bilinear functional $\langle\!\langle \varphi,\psi\rangle\!\rangle := \frac{1}{NT}\int_0^T \mathbb{E}_{\mu_0}\langle f_\varphi, f_\psi\rangle(x(t))\,dt$
- controls the condition number of the regression matrix
Consistency of the estimator
Theorem (L., Maggioni, Tang, Zhong). Assume the coercivity condition. Let $\{\mathcal{H}_n\}$ be a sequence of compact convex subsets of $L^\infty([0,R])$ such that $\inf_{\varphi\in\mathcal{H}_n}\|\varphi - \phi_{true}\|_\infty \to 0$ as $n\to\infty$. Then

$\lim_{n\to\infty}\lim_{M\to\infty} \|\hat{\phi}_{M,\mathcal{H}_n} - \phi_{true}\|_{L^2(\rho_T)} = 0, \quad \text{almost surely}.$

- For each n, compactness of $\{\hat{\phi}_{M,\mathcal{H}_n}\}$ and coercivity imply that $\hat{\phi}_{M,\mathcal{H}_n} \to \hat{\phi}_{\infty,\mathcal{H}_n}$ in $L^2(\rho_T)$
- Increasing $\mathcal{H}_n$ and coercivity imply consistency
- In general, truncation is used to make $\mathcal{H}_n$ compact
Optimal rate of convergence
Theorem (L., Maggioni, Tang, Zhong). Let $\{\mathcal{H}_n\}$ be a sequence of compact convex subspaces of $L^\infty([0,R])$ such that $\dim(\mathcal{H}_n) \leq c_0 n$ and $\inf_{\varphi\in\mathcal{H}_n}\|\varphi - \phi_{true}\|_\infty \leq c_1 n^{-s}$. Assume the coercivity condition. Choose $n_* = (M/\log M)^{\frac{1}{2s+1}}$; then

$\mathbb{E}_{\mu_0}\big[\|\hat{\phi}_{T,M,\mathcal{H}_{n_*}} - \phi_{true}\|_{L^2(\rho_T)}\big] \leq C\,\Big(\frac{\log M}{M}\Big)^{\frac{s}{2s+1}}.$

- The second condition is about regularity: $\phi\in C^s$
- The choice of $\dim(\mathcal{H}_n)$ adapts to s and M
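As a small illustration of the dimension choice in the theorem (assuming the smoothness level s is supplied by the user, which is itself an assumption in practice):

```python
import numpy as np

def optimal_dimension(M, s):
    """n_* = (M / log M)^(1/(2s+1)), rounded up to an integer."""
    return int(np.ceil((M / np.log(M)) ** (1.0 / (2 * s + 1))))

# e.g. M = 1000 trajectories and s = 1 (Lipschitz kernel) give n_* = 6
```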
Prediction of future evolution
Theorem (L., Maggioni, Tang, Zhong). Denote by $\hat{X}(t)$ and $X(t)$ the solutions of the systems with kernels $\hat{\phi}$ and $\phi$ respectively, starting from the same initial conditions drawn i.i.d. from $\mu_0$. Then

$\mathbb{E}_{\mu_0}\Big[\sup_{t\in[0,T]} \|\hat{X}(t) - X(t)\|^2\Big] \;\lesssim\; \sqrt{N}\,\|\hat{\phi} - \phi_{true}\|^2_{L^2(\rho_T)}$

- Follows from Gronwall's inequality
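A hedged numerical check of this kind of bound, reusing the `simulate` helper from the earlier sketch (the error metric and all names are illustrative, not the paper's code):

```python
import numpy as np

def trajectory_error(x0, phi_true, phi_hat, dt=1e-3, n_steps=1000):
    """sup over t of the distance between trajectories driven by phi_true and phi_hat."""
    traj_true = simulate(x0, phi_true, dt, n_steps)   # from the earlier sketch
    traj_hat = simulate(x0, phi_hat, dt, n_steps)
    diff = (traj_true - traj_hat).reshape(n_steps + 1, -1)
    return np.max(np.linalg.norm(diff, axis=1))
```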
Outline
1. Motivation and problem statement
2. Learning via nonparametric regression:
   ◮ A regression measure and function space
   ◮ Learnability: a coercivity condition
   ◮ Consistency and rate of convergence
3. Numerical examples
   ◮ A general algorithm
   ◮ Lennard-Jones model
   ◮ Opinion dynamics and multiple-agent systems
4. Ongoing work and open problems
Numerical examples
The regression algorithm:

$\mathcal{E}_M(\varphi) = \frac{1}{LMN}\sum_{l,m,i=1}^{L,M,N} \Big\|\dot{x}^{(m)}_i(t_l) - \frac{1}{N}\sum_{i'=1}^{N} \varphi(r^m_{i,i'}(t_l))\,\mathbf{r}^m_{i,i'}(t_l)\Big\|^2,$

where $\mathbf{r}^m_{i,i'}(t_l)$ denotes the unit vector $\frac{x^m_{i'}(t_l) - x^m_i(t_l)}{r^m_{i,i'}(t_l)}$,

$\mathcal{H}_n := \Big\{\varphi = \sum_{p=1}^{n} a_p\,\psi_p(r):\ a = (a_1,\dots,a_n)\in\mathbb{R}^n\Big\},$

$\mathcal{E}_{L,M}(\varphi) = \mathcal{E}_{L,M}(a) = \frac{1}{M}\sum_{m=1}^{M} \|d^m_L - \Psi^m_L a\|^2_{\mathbb{R}^{LNd}}.$

- Rewrite as $A_M\,a = b_M$ with $A_M = \frac{1}{M}\sum_{m=1}^{M} A^m_L$ and $b_M = \frac{1}{M}\sum_{m=1}^{M} b^m_L$
- can be computed in parallel
- Caution: the choice of $\{\psi_p\}$ affects cond($A_M$)
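Below is a minimal sketch of this assembly with a piecewise-constant basis and finite-difference velocities; the partition, the helper names, and the solver are assumptions, not the authors' implementation.

```python
import numpy as np

def basis_eval(r, edges):
    """Indicator (piecewise-constant) basis on the partition `edges`.
    Returns an array of shape r.shape + (n,), with n = len(edges) - 1."""
    n = len(edges) - 1
    out = np.zeros(r.shape + (n,))
    idx = np.clip(np.searchsorted(edges, r, side="right") - 1, 0, n - 1)
    np.put_along_axis(out, idx[..., None], 1.0, axis=-1)
    return out

def assemble_normal_equations(traj, dt, edges):
    """Per-trajectory contribution (A^m_L, b^m_L) from one trajectory of shape (L, N, d)."""
    L, N, d = traj.shape
    n = len(edges) - 1
    A, b = np.zeros((n, n)), np.zeros(n)
    vel = (traj[1:] - traj[:-1]) / dt                  # finite-difference velocities (O(dt) error)
    for l in range(L - 1):
        x = traj[l]
        diff = x[None, :, :] - x[:, None, :]           # x_j - x_i
        r = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(r, np.inf)
        unit = diff / r[:, :, None]                    # unit vectors (zero on the diagonal)
        psi = basis_eval(r, edges)                     # shape (N, N, n)
        # predicted velocity of agent i per basis function: (1/N) sum_j psi_p(r_ij) * unit_ij
        Phi = np.einsum('ijp,ijd->idp', psi, unit) / N
        G = Phi.reshape(N * d, n)
        y = vel[l].reshape(N * d)
        A += G.T @ G
        b += G.T @ y
    return A / (L - 1), b / (L - 1)

def fit_kernel(trajectories, dt, edges):
    """Average the contributions over M trajectories and solve A_M a = b_M."""
    n = len(edges) - 1
    A, b = np.zeros((n, n)), np.zeros(n)
    for traj in trajectories:                          # embarrassingly parallel over m
        A_m, b_m = assemble_normal_equations(traj, dt, edges)
        A, b = A + A_m, b + b_m
    M = len(trajectories)
    a_hat = np.linalg.lstsq(A / M, b / M, rcond=None)[0]
    return a_hat                                       # coefficients of phi_hat in {psi_p}
```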
Assume the coercivity condition: $\langle\!\langle \varphi,\varphi\rangle\!\rangle \geq c_{T,\mathcal{H}}\,\|\varphi\|^2_{L^2(\rho_T)}$.

Proposition (Lower bound on the smallest singular value of $A_M$). Let $\{\psi_1,\dots,\psi_n\}$ be a basis of $\mathcal{H}_n$ such that $\langle\psi_p,\psi_{p'}\rangle_{L^2(\rho^L_T)} = \delta_{p,p'}$ and $\|\psi_p\|_\infty \leq S_0$. Let $A_\infty = \big(\langle\!\langle\psi_p,\psi_{p'}\rangle\!\rangle\big)_{p,p'} \in \mathbb{R}^{n\times n}$. Then $\sigma_{\min}(A_\infty) \geq c_{T,\mathcal{H}}$. Moreover, $A_\infty$ is the a.s. limit of $A_M$. Therefore, for large M, the smallest singular value of $A_M$ satisfies, with high probability, $\sigma_{\min}(A_M) \geq (1-\epsilon)\,c_{T,\mathcal{H}}$.

- Choose $\{\psi_p\}$ linearly independent in $L^2(\rho_T)$
- Piecewise polynomials: defined on a partition of support($\rho_T$)
- Finite differences $\approx$ derivatives $\Rightarrow$ an $O(\Delta t)$ error in the estimator
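A quick, illustrative diagnostic consistent with this proposition: compute the smallest singular value and condition number of the assembled matrix $A_M$ (here passed in as a plain array).

```python
import numpy as np

def conditioning_report(A_M):
    """Smallest singular value and condition number of the regression matrix A_M."""
    sigma = np.linalg.svd(A_M, compute_uv=False)
    return sigma[-1], sigma[0] / sigma[-1]
```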
Implementation
1. Approximate the regression measure:
   ◮ estimate $\rho_T$ from large datasets
   ◮ partition support($\rho_T$)
2. Construct the hypothesis space $\mathcal{H}$:
   ◮ choose the degree of the piecewise polynomials
   ◮ set the dimension of $\mathcal{H}$ according to the sample size
3. Regression:
   ◮ assemble the arrays (in parallel)
   ◮ solve the normal equations
An end-to-end sketch of these steps is given below.
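The following sketch ties the three steps together, reusing the helpers defined in the previous snippets; the data generation, step size, partition, and smoothness level are all illustrative assumptions.

```python
import numpy as np

# Synthetic data (illustrative): M short trajectories of the first-order system
M, L, N, d, dt = 200, 100, 10, 2, 1e-3
trajectories = [simulate(np.random.randn(N, d), phi_true, dt, L - 1) for _ in range(M)]

# 1. Approximate the regression measure and find the support of rho_T
_, rho_edges = empirical_rho(trajectories, n_bins=200)
r_min, r_max = rho_edges[0], rho_edges[-1]

# 2. Hypothesis space: piecewise-constant functions on n_* bins
n_star = optimal_dimension(M, s=1)
edges = np.linspace(r_min, r_max, n_star + 1)

# 3. Regression: assemble (in parallel, in principle) and solve the normal equations
a_hat = fit_kernel(trajectories, dt, edges)
phi_hat = lambda r: a_hat[np.clip(np.searchsorted(edges, r, side="right") - 1, 0, n_star - 1)]
```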