Learning Temporal Point Processes via Reinforcement Learning
Shuang Li¹, Shuai Xiao², Shixiang Zhu¹, Nan Du³, Yao Xie¹, Le Song¹,²
¹ Georgia Institute of Technology  ² Ant Financial  ³ Google Brain
Motivation

[Figure: one user's timeline of social-media events ("Cool picture", "Funny joke", "Dinner together?") at 1:00 pm, 1:18 pm, and 1:30 pm, with the history ℋ_t of past event times and the unknown time of the next event]

• Event data: tweets/retweets, crime events, earthquakes, patient visits to hospitals, financial transactions, ….
• Goal: learn the temporal pattern of the event data.
• Event times are random.
• Events have a complex dependency structure.
Point Process Model

• Intensity function: λ(t | ℋ_t) dt = 𝔼[ N[t, t + dt) | ℋ_t ], where N[t, t + dt) is the number of events falling in the interval [t, t + dt).
• Point processes, their temporal patterns, and their intensities λ(t | ℋ_t):
  • Poisson: constant μ
  • Inhomogeneous Poisson: μ(t)
  • Hawkes: μ + β Σ_{t_i ∈ ℋ_t} exp(−|t − t_i|)

[Figure: example event sequences on a timeline for each of the three processes]
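To make these intensity definitions concrete, here is a minimal Python sketch (not part of the poster; the function names, parameter values, and the unit decay rate in the Hawkes kernel are illustrative assumptions) that evaluates each of the three intensities at a time t given the past event times.

```python
import numpy as np

def poisson_intensity(t, mu=1.0):
    # Homogeneous Poisson: constant intensity mu, independent of time and history.
    return mu

def inhomogeneous_poisson_intensity(t, mu_fn=lambda s: 1.0 + 0.5 * np.sin(s)):
    # Inhomogeneous Poisson: intensity is a deterministic function of time, mu(t).
    return mu_fn(t)

def hawkes_intensity(t, history, mu=0.5, beta=0.8):
    # Hawkes: baseline mu plus an exponentially decaying boost from each past event,
    # lambda(t | H_t) = mu + beta * sum_{t_i < t} exp(-(t - t_i)).
    past = np.asarray([s for s in history if s < t])
    return mu + beta * np.exp(-(t - past)).sum()

print(poisson_intensity(3.0),
      inhomogeneous_poisson_intensity(3.0),
      hawkes_intensity(3.0, history=[0.7, 1.9, 2.5]))
```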
Traditional Maximum-Likelihood Framework

• Model the conditional intensity λ_θ(t | ℋ_t) with a parametric or non-parametric form. Risk: model misspecification!
• Learn the model by maximizing the likelihood
  L(t_1, t_2, …, t_n) = exp( −∫ λ_θ(t | ℋ_t) dt ) · ∏_i λ_θ(t_i | ℋ_{t_i})

[Figure: an observed event sequence on a timeline]
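As a worked illustration of this likelihood, the sketch below evaluates its logarithm for the exponential Hawkes intensity from the previous slide. This is not code from the poster; it assumes a unit decay rate, for which the integral term (the compensator) has the closed form used below.

```python
import numpy as np

def hawkes_log_likelihood(event_times, T, mu, beta):
    """Log-likelihood of events on [0, T] under an exponential Hawkes model:
    sum_i log lambda(t_i | H_{t_i})  -  integral_0^T lambda(t | H_t) dt."""
    ts = np.asarray(event_times)
    log_term = 0.0
    for i, t in enumerate(ts):
        lam = mu + beta * np.exp(-(t - ts[:i])).sum()  # intensity just before t_i
        log_term += np.log(lam)
    # Compensator (the integral term) in closed form for the unit-decay exponential kernel.
    compensator = mu * T + beta * (1.0 - np.exp(-(T - ts))).sum()
    return log_term - compensator

print(hawkes_log_likelihood([0.7, 1.9, 2.5], T=4.0, mu=0.5, beta=0.8))
```

Maximizing this quantity over (mu, beta) is the traditional approach; the misspecification risk arises when the chosen parametric family does not match the true generating process.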
New Reinforcement Learning Framework

• Learn a policy π_θ(a | s_t) = p(t_i | t_{i−1}, …, t_1), where a ∈ ℝ_+ is the next event time, to maximize the cumulative reward.
• Learn a reward r(a) that guides the policy to imitate the observed event data (the expert).

[Figure: the policy π_θ(a | s_t) is an LSTM whose hidden states h_1, h_2, h_3, … encode the history ℋ_t of past event times; the sampled actions a_1, a_2, a_3 imitate the expert density p*(a) := p(a | ℋ_t)]
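Below is a minimal sketch of such an LSTM policy, assuming PyTorch and an exponential distribution over the next inter-event time; the hidden size, output head, and distribution choice are illustrative assumptions rather than the poster's exact parameterization.

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Stochastic policy pi_theta(a | s_t): the LSTM hidden state summarizes the
    history of past event times, and the next inter-event time is sampled from an
    exponential distribution whose rate is read off that state."""
    def __init__(self, hidden_dim=32):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden_dim)
        self.rate_head = nn.Linear(hidden_dim, 1)

    def forward(self, prev_gap, state):
        h, c = self.cell(prev_gap, state)
        rate = nn.functional.softplus(self.rate_head(h)) + 1e-6  # keep the rate positive
        dist = torch.distributions.Exponential(rate)
        gap = dist.sample()                      # next inter-event time (the action a)
        return gap, dist.log_prob(gap), (h, c)

policy = LSTMPolicy()
state = (torch.zeros(1, 32), torch.zeros(1, 32))
gap, log_prob, state = policy(torch.zeros(1, 1), state)
print(gap.item(), log_prob.item())
```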
Optimal Reward: Inverse Reinforcement Learning

• Optimal reward:
  r* = argmax_{r ∈ ℱ} ( 𝔼_expert[ Σ_i r(t_i) ] − max_{π_θ} 𝔼_{π_θ}[ Σ_i r(a_i) ] )
• Choose ℱ to be the unit ball in a Reproducing Kernel Hilbert Space (RKHS); this yields an analytical optimal reward.
• Given L expert trajectories and M trajectories generated by the policy,
  r̂*(a) ∝ (1/L) Σ_{l=1}^{L} Σ_i k(t_i^{(l)}, a) − (1/M) Σ_{m=1}^{M} Σ_i k(a_i^{(m)}, a),
  the difference between the mean embedding of the expert intensity function and the mean embedding of the policy intensity function, where k(t, t′) is a universal RKHS kernel.
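Here is a minimal sketch of evaluating the estimated reward r̂*(a), assuming a Gaussian kernel; the poster only requires a universal RKHS kernel, so the kernel choice, bandwidth, and toy sequences below are illustrative.

```python
import numpy as np

def gaussian_kernel(t, s, bandwidth=1.0):
    # One concrete choice of universal RKHS kernel k(t, s).
    return np.exp(-((t - s) ** 2) / (2.0 * bandwidth ** 2))

def estimated_reward(a, expert_seqs, policy_seqs, bandwidth=1.0):
    """r_hat*(a) up to a constant: mean embedding of the expert event times at a
    minus the mean embedding of the policy-generated event times at a."""
    expert_term = np.mean([gaussian_kernel(np.asarray(seq), a, bandwidth).sum()
                           for seq in expert_seqs])
    policy_term = np.mean([gaussian_kernel(np.asarray(seq), a, bandwidth).sum()
                           for seq in policy_seqs])
    return expert_term - policy_term

expert_seqs = [[0.5, 1.2, 2.0], [0.6, 1.1, 1.9]]   # L observed (expert) sequences
generated_seqs = [[0.9, 2.4], [1.5, 2.8, 3.3]]     # M sequences sampled from the policy
print(estimated_reward(1.0, expert_seqs, generated_seqs))
```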
Modeling Framework

[Figure: the learning loop alternates between two steps: sample event times a_1, a_2, a_3, … from the policy π_θ(a | s_t) and use them together with the expert data to update the optimal reward r̂*(a_1), …, r̂*(a_n); then update the policy with a policy-gradient step]
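A minimal sketch of this alternating loop, building on the policy and reward sketches above; the `sample_sequence` helper, rollout count, horizon, and optimizer settings are illustrative assumptions, not the authors' released implementation.

```python
import torch

# Assumes LSTMPolicy and estimated_reward from the earlier sketches, plus observed
# expert sequences; all concrete values here are placeholders.
policy = LSTMPolicy()
expert_seqs = [[0.5, 1.2, 2.0], [0.6, 1.1, 1.9]]
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def sample_sequence(policy, horizon=5.0, hidden_dim=32):
    """Roll out one event sequence from the policy until the horizon is reached;
    returns a list of (event_time, log_prob_of_that_action) pairs."""
    state = (torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim))
    prev_gap, t, seq = torch.zeros(1, 1), 0.0, []
    while True:
        gap, log_prob, state = policy(prev_gap, state)
        t += gap.item()
        if t > horizon:
            return seq
        seq.append((t, log_prob))
        prev_gap = gap.detach()

for iteration in range(200):
    # Step 1: generate M sequences from the current policy pi_theta.
    rollouts = [sample_sequence(policy) for _ in range(16)]
    gen_seqs = [[t for t, _ in seq] for seq in rollouts]

    # Step 2: the reward is re-estimated analytically from expert vs. generated
    # sequences, and the policy takes a policy-gradient step that raises the
    # log-probability of each sampled event time in proportion to its reward.
    loss = torch.zeros(())
    for seq in rollouts:
        for t, log_prob in seq:
            r = estimated_reward(t, expert_seqs, gen_seqs)
            loss = loss - float(r) * log_prob.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```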
Numerical Results

• Our method: RLPP
• Baselines:
  • State-of-the-art methods: RMTPP (Du et al., KDD 2016), WGANTPP (Xiao et al., NIPS 2017)
  • Parametric baselines: Inhomogeneous Poisson (IP), Hawkes / self-exciting (SE), Self-correcting (SC)
• Comparison of learned empirical intensity:

[Figure: three panels of empirical intensity vs. time index, each comparing Real, RLPP, WGAN, IP, SE, RMTPP, and SC]

• Comparison of runtime:

  Method | RLPP | WGANTPP | RMTPP | SE  | SC  | IP
  Time   | 80 m | 1560 m  | 60 m  | 2 m | 2 m | 2 m
  Ratio  | 40x  | 780x    | 30x   | 1x  | 1x  | 1x
Poster

• Tue Dec 4th, 05:00 -- 07:00 PM
• Room 210 & 230 AB, Poster #124