Nonparametric inference of interaction laws in particle/agent systems
Fei Lu, Department of Mathematics, Johns Hopkins University
Joint with: Mauro Maggioni, Sui Tang and Ming Zhong
February 13, 2019, CSCAMM Seminar, UMD
Motivation
Q: What is the law of interaction between particles/agents?
The interacting-particle model:

    m \ddot{x}_i(t) = -\nu \dot{x}_i(t) + \frac{1}{N} \sum_{j=1, j \neq i}^{N} K(x_i, x_j)

◮ Newton's law of gravitation: K(x, y) = G m_1 m_2 / r^2, with r = |x − y|
◮ Molecular fluids: K(x, y) = ∇_x [Φ(|x − y|)], with the Lennard-Jones potential Φ(r) = c_1 r^{−12} − c_2 r^{−6}
◮ Flocking birds / schools of fish: K(x, y) = φ(|x − y|) ψ(⟨x, y⟩)
◮ Opinion/voter models, bacteria models, ...

(1) Cucker, Smale: On the mathematics of emergence, 2007. (2) Vicsek, Zafeiris: Collective motion, 2012. (3) Motsch, Tadmor: Heterophilious dynamics enhances consensus, 2014.
An inference problem
Infer the rule of interaction in the system

    m \ddot{x}_i(t) = -\nu \dot{x}_i(t) + \frac{1}{N} \sum_{j=1, j \neq i}^{N} K(x_i - x_j), \quad i = 1, \ldots, N, \quad x_i(t) \in R^d,

from observations of trajectories.
◮ x_i is the position of the i-th particle/agent
◮ Data: many independent trajectories {x^m(t) : t ∈ T}, m = 1, ..., M
◮ Goal: infer φ in K(x) = −∇Φ(|x|) = −φ(|x|) x
◮ m = 0 ⇒ a first-order system
The first-order system (m = 0):

    \dot{x}_i(t) = \frac{1}{N} \sum_{j=1, j \neq i}^{N} \phi_{true}(|x_i - x_j|)(x_j - x_i) =: [f_\phi(x(t))]_i

Least squares regression: with H_n = span{e_i}_{i=1}^n,

    \hat{\phi}_n = \arg\min_{\phi \in H_n} E_M(\phi) := \sum_{m=1}^{M} \| \dot{x}^m - f_\phi(x^m) \|^2

◮ How to choose the hypothesis space H_n?
◮ Is the inverse problem well-posed? Identifiability?
◮ Consistency and rate of "convergence"?
Outline
1. Learning via nonparametric regression
   ◮ A regression measure and function space
   ◮ Identifiability: a coercivity condition
   ◮ Consistency and rate of convergence
2. Numerical examples
   ◮ A general algorithm
   ◮ Lennard-Jones model
   ◮ Opinion dynamics and multiple-agent systems
3. Open problems
Learning via nonparametric regression
The dynamical system:

    \dot{x}_i(t) = \frac{1}{N} \sum_{j=1, j \neq i}^{N} \phi(|x_i - x_j|)(x_j - x_i) =: [f_\phi(x(t))]_i

◮ Admissible set (≈ globally Lipschitz):

    K_{R,S} := \{ \varphi \in W^{1,\infty} : \operatorname{supp} \varphi \subset [0, R], \ \sup_{r \in [0,R]} [\, |\varphi(r)| + |\varphi'(r)| \,] \le S \}

◮ Data: M trajectories {x^m(t) : t ∈ T}, m = 1, ..., M, with x^m(0) i.i.d. ∼ μ_0 ∈ P(R^{dN}), and T = [0, T] or {t_1, ..., t_L} with the velocities \dot{x}(t_i) observed
◮ Goal: nonparametric inference of φ (a data-generation sketch follows below)

(1) Bongini, Fornasier, Hansen, Maggioni: Inferring interaction rules for mean field equations, M3AS, 2017. (2) Binev, Cohen, Dahmen, DeVore, Temlyakov: Universal algorithms for learning theory, JMLR, 2005. (3) Cucker, Smale: On the mathematical foundations of learning, Bulletin of the AMS, 2001.
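For concreteness, here is a minimal Python/NumPy sketch (my own illustration, not code from the talk) of generating such trajectory data: it simulates the first-order system for a chosen kernel φ with i.i.d. initial conditions. The function names (`rhs`, `simulate`), the forward-Euler discretization, the stand-in kernel, and the uniform μ_0 are all assumptions.

```python
import numpy as np

def rhs(X, phi):
    """Right-hand side f_phi(X) of the first-order system; X has shape (N, d)."""
    N = X.shape[0]
    diff = X[None, :, :] - X[:, None, :]          # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=2)              # pairwise distances |x_i - x_j|
    np.fill_diagonal(r, 1.0)                      # dummy value; the diagonal term is removed below
    w = phi(r)
    np.fill_diagonal(w, 0.0)                      # exclude the j = i term
    return (w[:, :, None] * diff).sum(axis=1) / N

def simulate(phi, X0, dt, L):
    """Forward-Euler trajectory: positions and velocities at t_0, ..., t_L."""
    X, V = [X0], [rhs(X0, phi)]
    for _ in range(L):
        X.append(X[-1] + dt * V[-1])
        V.append(rhs(X[-1], phi))
    return np.array(X), np.array(V)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, M, L, dt = 10, 2, 50, 100, 0.01
    phi_true = lambda r: np.exp(-r)               # stand-in kernel for illustration only
    # M independent trajectories with x^m(0) i.i.d. from mu_0 (here uniform on [-1, 1]^{dN})
    data = [simulate(phi_true, rng.uniform(-1, 1, (N, d)), dt, L) for _ in range(M)]
```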
The estimator:

    \hat{\phi}_{M,H} = \arg\min_{\phi \in H} E_M(\phi) := \frac{1}{ML} \sum_{l,m=1}^{L,M} \| f_\phi(X^m(t_l)) - \dot{X}^m(t_l) \|^2

◮ E_M(φ) is quadratic in φ, and E_M(φ) ≥ E_M(φ_true) = 0
◮ The minimizer exists for any H = H_n = span{φ_1, ..., φ_n}

Agenda:
◮ As M → ∞, E_M(·) → E_∞(·) and \hat{φ}_{M,H} → \hat{φ}_{∞,H}
◮ A function space with a metric dist(\hat{φ}, φ_true)
◮ Learnability: does \hat{φ}_{∞,H} → φ_true as dist(H, φ_true) → 0?
◮ Convergence of the estimators? Convergence rate?
Review of classical nonparametric regression
Estimate φ(z) = E[Y | Z = z] : R^D → R from data {z_i, y_i}_{i=1}^M:
◮ {z_i, y_i} are i.i.d. samples
◮ \hat{\phi}_n := \arg\min_{f \in H_n} E_M(f) := \sum_{i=1}^{M} \| y_i - f(z_i) \|^2
◮ Optimal rate: if dist(H_n, φ_true) ≲ n^{−s} and n_* = (M / log M)^{1/(2s+1)}, then

    \| \hat{\phi}_{n_*} - \phi \|_{L^2(\rho_Z)} \lesssim M^{-s/(2s+D)}
In contrast, our problem is to learn the kernel φ : R^+ → R from trajectory data {x^m(t)} of

    \dot{x}_i(t) = \frac{1}{N} \sum_{j=1, j \neq i}^{N} \phi(|x_i - x_j|)(x_j - x_i)

◮ the pairwise distances {r^m_{ij}(t) := |x^m_i(t) − x^m_j(t)|} are not i.i.d.
◮ the values φ(r^m_{ij}(t)) are unknown
Regression measure
Distribution of pairwise distances ρ_T : R^+ → R:

    \rho_T(r) = \frac{1}{\binom{N}{2} L} \, E_{\mu_0} \Big[ \sum_{l=1}^{L} \sum_{i < i'} \delta_{r_{ii'}(t_l)}(r) \Big]

◮ unknown, but estimated by the empirical distribution ρ_T^M → ρ_T as M → ∞ (law of large numbers); see the sketch below
◮ intrinsic to the dynamics

Regression function space L^2(ρ_T):
◮ the admissible set ⊂ L^2(ρ_T)
◮ piecewise polynomials ⊂ L^2(ρ_T)
◮ singular kernels ⊂ L^2(ρ_T)
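A minimal sketch (my own illustration, not the talk's code) of estimating the empirical regression measure: collect all pairwise distances r^m_{ii'}(t_l) over trajectories, times, and pairs, and histogram them. The helper name `empirical_rho` and the binning are assumptions.

```python
import numpy as np

def empirical_rho(trajectories, n_bins=50):
    """Empirical pairwise-distance distribution (an estimate of rho_T).

    trajectories: array of shape (M, L, N, d) of observed positions x^m_i(t_l).
    Returns histogram bin edges and normalized weights.
    """
    M, L, N, d = trajectories.shape
    i, j = np.triu_indices(N, k=1)                      # all pairs i < i'
    diff = trajectories[:, :, i, :] - trajectories[:, :, j, :]
    r = np.linalg.norm(diff, axis=-1).ravel()           # all r^m_{ii'}(t_l)
    hist, edges = np.histogram(r, bins=n_bins, density=True)
    return edges, hist

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_data = rng.uniform(-1, 1, (20, 10, 8, 2))      # stand-in for real trajectory data
    edges, rho_hat = empirical_rho(fake_data)
    print("estimated support of rho_T: [%.2f, %.2f]" % (edges[0], edges[-1]))
```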
Identifiability: a coercivity condition
As M → ∞, E_M(·) → E_∞(·) and \hat{φ}_{M,H} = \arg\min_{φ ∈ H} E_M(φ) → \hat{φ}_{∞,H}. When is φ_true recovered? Note that

    E_\infty(\hat{\phi}) - E_\infty(\phi) = \frac{1}{NT} E_{\mu_0} \int_0^T \| f_{\hat{\phi} - \phi}(X(t)) \|^2 \, dt \ \ge \ c \, \| (\hat{\phi} - \phi)(\cdot)\cdot \|^2_{L^2(\rho_T)}

Coercivity condition. There exists c_T > 0 such that for all ϕ(·)· ∈ L^2(ρ_T)

    \frac{1}{NT} \int_0^T E_{\mu_0} \| f_\varphi(x(t)) \|^2 \, dt = \langle\langle \varphi, \varphi \rangle\rangle \ \ge \ c_T \, \| \varphi(\cdot)\cdot \|^2_{L^2(\rho_T)}

◮ coercivity of the bilinear functional \langle\langle \varphi, \psi \rangle\rangle := \frac{1}{NT} \int_0^T E_{\mu_0} \langle f_\varphi, f_\psi \rangle(x(t)) \, dt
◮ controls the condition number of the regression matrix
Consistency of the estimator
Theorem (L., Maggioni, Tang, Zhong). Assume the coercivity condition. Let {H_n} be a sequence of compact convex subsets of L^∞([0, R]) such that inf_{ϕ ∈ H_n} ‖ϕ − φ_true‖_∞ → 0 as n → ∞. Then

    \lim_{n \to \infty} \lim_{M \to \infty} \| \hat{\phi}_{M,H_n}(\cdot)\cdot - \phi_{true}(\cdot)\cdot \|_{L^2(\rho_T)} = 0, \quad \text{almost surely.}

◮ For each n, compactness of {\hat{φ}_{M,H_n}} and coercivity imply \hat{φ}_{M,H_n} → \hat{φ}_{∞,H_n} in L^2
◮ Increasing H_n and coercivity imply consistency
◮ In general, truncation is used to make H_n compact
Optimal rate of convergence
Theorem (L., Maggioni, Tang, Zhong). Assume the coercivity condition. Let {H_n} be a sequence of compact convex subspaces of L^∞([0, R]) such that dim(H_n) ≤ c_0 n and inf_{ϕ ∈ H_n} ‖ϕ − φ_true‖_∞ ≤ c_1 n^{−s}. Choose n_* = (M / log M)^{1/(2s+1)}; then

    E_{\mu_0} \big[ \| \hat{\phi}_{T,M,H_{n_*}}(\cdot)\cdot - \phi_{true}(\cdot)\cdot \|_{L^2(\rho_T)} \big] \le C \Big( \frac{\log M}{M} \Big)^{\frac{s}{2s+1}}.

◮ The second condition is about regularity: φ ∈ C^s
◮ Choose H_n according to s and M
Prediction of future evolution
Theorem (L., Maggioni, Tang, Zhong). Denote by \hat{X}(t) and X(t) the solutions of the systems with kernels \hat{φ} and φ respectively, starting from the same initial conditions, drawn i.i.d. from μ_0. Then

    E_{\mu_0} \big[ \sup_{t \in [0,T]} \| \hat{X}(t) - X(t) \|^2 \big] \ \lesssim \ \sqrt{N} \, \| \hat{\phi}(\cdot)\cdot - \phi(\cdot)\cdot \|^2_{L^2(\rho_T)}

◮ Follows from Gronwall's inequality
◮ A numerical check of this trajectory error is sketched below
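A minimal sketch (my own, not from the talk) of checking this prediction bound empirically: evolve the system from the same initial condition under a "true" kernel and a perturbed stand-in for an estimated kernel, and record the worst-case squared trajectory discrepancy. The kernels, step size, and function names are assumptions.

```python
import numpy as np

def rhs(X, phi):
    # f_phi(X) for the first-order system; X has shape (N, d)
    N = X.shape[0]
    diff = X[None, :, :] - X[:, None, :]
    r = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(r, 1.0)
    w = phi(r)
    np.fill_diagonal(w, 0.0)
    return (w[:, :, None] * diff).sum(axis=1) / N

def simulate(phi, X0, dt, L):
    X = [X0]
    for _ in range(L):
        X.append(X[-1] + dt * rhs(X[-1], phi))
    return np.array(X)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, d, dt, L = 10, 2, 0.01, 200
    phi_true = lambda r: np.exp(-r)
    phi_hat = lambda r: np.exp(-r) + 0.05 * np.cos(r)         # stand-in for an estimated kernel
    X0 = rng.uniform(-1, 1, (N, d))                           # same initial condition for both systems
    X_true = simulate(phi_true, X0, dt, L)
    X_hat = simulate(phi_hat, X0, dt, L)
    err = np.max(np.sum((X_hat - X_true) ** 2, axis=(1, 2)))  # sup_t ||X_hat(t) - X(t)||^2
    print("sup_t squared trajectory error:", err)
```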
Outline
1. Learning via nonparametric regression
   ◮ A regression measure and function space
   ◮ Learnability: a coercivity condition
   ◮ Consistency and rate of convergence
2. Numerical examples
   ◮ A general algorithm
   ◮ Lennard-Jones model
   ◮ Opinion dynamics and multiple-agent systems
3. Open problems
Numerical examples
The regression algorithm:

    E_{L,M}(\varphi) = \frac{1}{LMN} \sum_{l,m,i=1}^{L,M,N} \Big\| \dot{x}^{(m)}_i(t_l) - \frac{1}{N} \sum_{i'=1}^{N} \varphi\big(r^m_{i,i'}(t_l)\big) \, \mathbf{r}^m_{i,i'}(t_l) \Big\|^2,

where r^m_{i,i'}(t_l) = |x^m_{i'}(t_l) − x^m_i(t_l)| and \mathbf{r}^m_{i,i'}(t_l) = x^m_{i'}(t_l) − x^m_i(t_l). With

    H_n := \Big\{ \varphi = \sum_{p=1}^{n} a_p \psi_p(r) : a = (a_1, \ldots, a_n) \in R^n \Big\},

    E_{L,M}(\varphi) = E_{L,M}(a) = \frac{1}{M} \sum_{m=1}^{M} \| d^m_L - \Psi^m_L a \|^2_{R^{LNd}}.

The normal equations \frac{1}{M} \sum_{m=1}^{M} A^m_L \, a = \frac{1}{M} \sum_{m=1}^{M} b^m_L can be rewritten as A_M a = b_M.
◮ the per-trajectory arrays can be computed in parallel (a sketch follows below)
◮ Caution: the choice of {ψ_p} affects the condition number of A_M
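A minimal sketch (my own illustration, not the authors' implementation) of assembling and solving A_M a = b_M: each trajectory contributes its own arrays A^m_L, b^m_L, which are then averaged and solved. The basis is passed as a list of callables, and all names are assumptions.

```python
import numpy as np

def assemble_one(X, V, psis):
    """Per-trajectory arrays A^m_L, b^m_L.

    X, V: arrays of shape (L, N, d) with positions and velocities at the observation times.
    psis: list of n basis functions psi_p : r -> psi_p(r), vectorized over arrays.
    """
    L, N, d = X.shape
    n = len(psis)
    G = np.zeros((L, N, d, n))       # G[l, i, :, p] = (1/N) sum_{i'} psi_p(r_{ii'}) (x_{i'} - x_i)
    for l in range(L):
        diff = X[l][None, :, :] - X[l][:, None, :]       # diff[i, i'] = x_{i'} - x_i
        r = np.linalg.norm(diff, axis=2)
        for p, psi in enumerate(psis):
            w = psi(r)
            np.fill_diagonal(w, 0.0)                     # drop the i' = i term
            G[l, :, :, p] = (w[:, :, None] * diff).sum(axis=1) / N
    Psi = G.reshape(L * N * d, n)                        # the matrix Psi^m_L
    dvec = V.reshape(L * N * d)                          # the vector d^m_L of velocities
    return Psi.T @ Psi, Psi.T @ dvec                     # A^m_L, b^m_L

def regress(data, psis):
    """Average the per-trajectory arrays (this loop is embarrassingly parallel) and solve A_M a = b_M."""
    n = len(psis)
    A_M, b_M = np.zeros((n, n)), np.zeros(n)
    for X, V in data:                                    # data: list of (positions, velocities) per trajectory
        A_m, b_m = assemble_one(X, V, psis)
        A_M += A_m / len(data)
        b_M += b_m / len(data)
    return np.linalg.lstsq(A_M, b_M, rcond=None)[0]      # coefficients a of phi_hat in the basis
```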
Assume the coercivity condition: ⟨⟨ϕ, ϕ⟩⟩ ≥ c_T ‖ϕ(·)·‖^2_{L^2(ρ_T)}.

Proposition (lower bound on the smallest singular value of A_M). Let {ψ_1, ..., ψ_n} be a basis of H_n such that ⟨ψ_p(·)·, ψ_{p'}(·)·⟩_{L^2(ρ_T^L)} = δ_{p,p'} and ‖ψ_p‖_∞ ≤ S_0. Let A_∞ = ( ⟨⟨ψ_p, ψ_{p'}⟩⟩ )_{p,p'} ∈ R^{n×n}. Then σ_min(A_∞) ≥ c_L. Moreover, A_∞ is the a.s. limit of A_M; therefore, for large M, the smallest singular value of A_M satisfies, with high probability, σ_min(A_M) ≥ (1 − ε) c_L.

◮ Choose {ψ_p(·)·} linearly independent in L^2(ρ_T)
◮ Piecewise polynomials on a partition of supp(ρ_T) (a basis-construction sketch follows below)
◮ Finite differences ≈ derivatives ⇒ an O(Δt) error in the estimator
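To illustrate the "piecewise polynomials on a partition of supp(ρ_T)" choice and the role of the basis in conditioning, here is a small sketch (my own, with assumed helper names): it builds piecewise-constant indicator functions on a uniform partition of the estimated support and inspects the condition number of the empirical Gram matrix of {ψ_p(·)·} in L^2(ρ_T), a rough proxy for the conditioning discussed above.

```python
import numpy as np

def piecewise_constant_basis(r_min, r_max, n):
    """Indicator functions of n equal-width bins partitioning [r_min, r_max]."""
    edges = np.linspace(r_min, r_max, n + 1)
    def make(lo, hi):
        return lambda r: ((r >= lo) & (r < hi)).astype(float)
    return [make(edges[p], edges[p + 1]) for p in range(n)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # stand-in pairwise-distance samples playing the role of draws from rho_T
    r_samples = rng.uniform(0.1, 2.0, 10_000)
    psis = piecewise_constant_basis(r_samples.min(), r_samples.max() + 1e-9, n=12)
    # empirical Gram matrix of {psi_p(.)·} in L^2(rho_T); a large condition number signals a poor partition
    Phi = np.stack([psi(r_samples) * r_samples for psi in psis], axis=1)
    gram = Phi.T @ Phi / len(r_samples)
    print("condition number of the empirical Gram matrix:", np.linalg.cond(gram))
```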
Implementation
1. Approximate the regression measure
   ◮ estimate ρ_T from large datasets
   ◮ build a partition of supp(ρ_T)
2. Construct the hypothesis space H
   ◮ choose the degree of the piecewise polynomials
   ◮ set the dimension of H according to the sample size (see the sketch below)
3. Regression
   ◮ assemble the arrays (in parallel)
   ◮ solve the normal equations
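To illustrate "set the dimension of H according to the sample size", here is a short sketch of the choice suggested by the rate theorem, n_* = (M / log M)^{1/(2s+1)}; the rounding and the default regularity s are assumptions made for the example.

```python
import numpy as np

def choose_dimension(M, s=1.0, c=1.0):
    """Hypothesis-space dimension suggested by the rate theorem: n_* ~ (M / log M)^(1/(2s+1))."""
    return max(1, int(np.ceil(c * (M / np.log(M)) ** (1.0 / (2.0 * s + 1.0)))))

# e.g., with M = 10_000 trajectories and s = 1 (roughly C^1 regularity):
print(choose_dimension(10_000, s=1.0))   # about 11 basis functions
```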
Examples: Lennard-Jones dynamics
The Lennard-Jones potential

    V_{LJ}(r) = 4\epsilon \Big[ \Big(\frac{\sigma}{r}\Big)^{12} - \Big(\frac{\sigma}{r}\Big)^{6} \Big] \ \Rightarrow \ \phi(r)\, r = V_{LJ}'(r),

driving the first-order dynamics

    \dot{x}_i(t) = \frac{1}{N} \sum_{j=1, j \neq i}^{N} \phi(|x_i - x_j|)(x_j - x_i).

[Figure: particle positions at time t = 0.010, plotted on [−2, 2] × [−2, 2].]

A simulation sketch is given below.
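A minimal sketch (my own, not the talk's code) of generating Lennard-Jones trajectory data for the first-order system above, using the kernel φ(r) = V_LJ'(r) / r; the parameter values, step size, and particle count are assumptions for illustration.

```python
import numpy as np

eps, sigma = 1.0, 0.95            # assumed Lennard-Jones parameters, for illustration only

def phi_LJ(r):
    """Interaction kernel phi(r) = V_LJ'(r) / r for the Lennard-Jones potential."""
    dV = 4.0 * eps * (-12.0 * sigma**12 / r**13 + 6.0 * sigma**6 / r**7)
    return dV / r

def step(X, dt):
    """One forward-Euler step of dx_i/dt = (1/N) sum_{j != i} phi(|x_i - x_j|)(x_j - x_i)."""
    N = X.shape[0]
    diff = X[None, :, :] - X[:, None, :]          # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(r, 1.0)                      # dummy value; the diagonal is zeroed below
    w = phi_LJ(r)
    np.fill_diagonal(w, 0.0)
    return X + dt * (w[:, :, None] * diff).sum(axis=1) / N

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # particles near a loose grid to avoid near-zero initial separations
    grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)[:7]
    X = grid + 0.1 * rng.standard_normal((7, 2))
    for _ in range(1000):
        X = step(X, dt=1e-5)                      # tiny step: the LJ kernel is very stiff at short distances
    print(X)
```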