Anticipating Concept Drift in Online Learning
Michał Dereziński (speaker), Badri Narayan Bhaskar

Online setting: predict with $\hat\theta_t \in \Theta$, get loss $f_t(\hat\theta_t)$.

Tracking regret: compare losses to a good sequence $\theta_t$:
$$R_T(\theta) = \sum_t f_t(\hat\theta_t) - \sum_t f_t(\theta_t).$$

(Incurred regret) ∝ (Variability of $\theta_t$)

What if the drift trajectory of $\theta_t$ can be anticipated? Use a window of past predictions $\hat\theta_{t-k}$ to get a drift estimate.
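To make the tracking-regret definition concrete, here is a minimal sketch that evaluates $R_T(\theta)$ for a given list of losses, learner predictions, and comparator points. The function name `tracking_regret` and the toy squared losses centred on a linearly drifting target are illustrative assumptions, not part of the talk.

```python
def tracking_regret(losses, predictions, comparators):
    """Sum of learner losses minus sum of comparator losses.

    losses[t] is the loss function f_t, predictions[t] is the learner's
    point at round t, comparators[t] is the comparator's point theta_t.
    """
    learner_total = sum(f(p) for f, p in zip(losses, predictions))
    comparator_total = sum(f(c) for f, c in zip(losses, comparators))
    return learner_total - comparator_total


# Hypothetical toy data: squared losses centred on a drifting target.
targets = [0.0, 1.0, 2.0, 3.0]
losses = [lambda x, c=c: (x - c) ** 2 for c in targets]

# A learner that lags one step behind the target pays for the drift.
predictions = [0.0, 0.0, 1.0, 2.0]
regret = tracking_regret(losses, predictions, targets)
print(regret)
```

In this toy run the comparator follows the target exactly and has zero loss, so all of the regret comes from the learner's one-step lag behind the drift.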
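Linear Drift Model

[Figure: trajectories of the Comparator, GD, and AMGD.]

Drift estimation: $\hat\Phi_t(\theta;\, \hat\theta_{t-k}) = \theta + \frac{1}{k}\,(\hat\theta_t - \hat\theta_{t-k})$

AMGD: $\hat\theta_{t+1} = \hat\theta_t - \eta_t \nabla f_t(\hat\theta_t) + \frac{1}{k}\,(\hat\theta_t - \hat\theta_{t-k})$

Two-Track: maintain two tracks, one without drift, $\hat\theta_t^{(1)}$, and one with drift, $\hat\theta_t^{(2)}$.

Final prediction: $\hat\theta_t = (1 - w_t)\,\hat\theta_t^{(1)} + w_t\,\hat\theta_t^{(2)}$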
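The following sketch compares plain gradient descent with an anticipating update on a one-dimensional problem. It assumes the AMGD step is a gradient step plus the windowed drift estimate $(\hat\theta_t - \hat\theta_{t-k})/k$, and it uses hypothetical quadratic losses $f_t(x) = \tfrac12 (x - c_t)^2$ with a linearly drifting centre $c_t = v\,t$. The helper names, the constants, and the choice to skip the drift term until the window is full are all assumptions made for illustration. The rule for choosing the Two-Track weight $w_t$ is not given on the slide, so `two_track` only forms the convex combination.

```python
def run(T, eta, k, v, anticipate):
    """Run T rounds and return |theta_T - c_T|, the final tracking error."""
    theta = 0.0
    history = [theta]                 # history[t] is the prediction at round t
    for t in range(T):
        c_t = v * t                   # centre of the loss at round t
        grad = theta - c_t            # gradient of 0.5 * (x - c_t)**2
        new_theta = theta - eta * grad
        if anticipate and len(history) > k:
            # Windowed drift estimate: average movement over the last k rounds.
            new_theta += (history[-1] - history[-1 - k]) / k
        theta = new_theta
        history.append(theta)
    return abs(theta - v * T)


def two_track(theta_no_drift, theta_drift, w):
    """Convex combination of the two tracks for a given weight w in [0, 1]."""
    return (1 - w) * theta_no_drift + w * theta_drift


gd_err = run(T=600, eta=0.5, k=5, v=0.1, anticipate=False)
amgd_err = run(T=600, eta=0.5, k=5, v=0.1, anticipate=True)
print(gd_err, amgd_err)
```

Under these assumptions plain GD settles at a constant lag of about $v/\eta$ behind the target, while the anticipating update removes that lag for linear drift. With a short window (for example $k = 1$) the same update behaves like momentum with coefficient 1 and can oscillate, which is one way to see why stability is a concern.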
Simulations: Trajectories

[Figure: trajectories of the Comparator, GD, TTND, TTMP, and AMGD; horizontal axis 0 to 500, vertical axis −30 to 30.]
Simulations: Losses

[Figure: loss (log scale, $10^{-1}$ to $10^{2}$) versus time (0 to 250) for GD, TTND, TTMP, and AMGD.]
Analysis

Regret bounds for Two-Track:
$$O\!\left(\sqrt{T}\,\bigl(1 + V_{\hat\Phi}(\theta)\bigr)\right), \quad \text{where } V_{\hat\Phi}(\theta) = \sum_{t=1}^{T} \bigl\|\theta_{t+1} - \hat\Phi_t(\theta_t)\bigr\|.$$

How good is the drift estimation $\hat\Phi$? We show bounds for $V_{\hat\Phi}(\theta)$ against the optimal $\Phi^*$.

Can we prove the linear convergence observed in simulations? We prove convergence for a special case of AMGD.
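As a rough illustration of the quantity $V_{\hat\Phi}(\theta)$, the sketch below sums the distances between each comparator point and the point a drift map predicts from its predecessor. The Euclidean norm, the function name `drift_variability`, and the two example drift maps are assumptions; the slide does not specify the norm.

```python
import math


def drift_variability(thetas, phi):
    """Sum over t of ||theta_{t+1} - phi(t, theta_t)|| (Euclidean norm assumed).

    thetas is a list of T + 1 comparator points (tuples of floats);
    phi(t, theta) returns the predicted next point as a tuple.
    """
    total = 0.0
    for t in range(len(thetas) - 1):
        predicted = phi(t, thetas[t])
        squared = sum((a - b) ** 2 for a, b in zip(thetas[t + 1], predicted))
        total += math.sqrt(squared)
    return total


# Hypothetical comparator moving linearly by (2, -1) per round, T = 5.
thetas = [(2.0 * t, -1.0 * t) for t in range(6)]


def no_drift(t, theta):
    return theta                              # predicts no movement


def linear_drift(t, theta):
    return (theta[0] + 2.0, theta[1] - 1.0)   # predicts the true linear step


v_static = drift_variability(thetas, no_drift)
v_linear = drift_variability(thetas, linear_drift)
print(v_static, v_linear)
```

With the no-drift map the quantity reduces to the total path length of the comparator, while a drift map that matches the true linear movement drives it to zero, which is the regime where the bound above is smallest.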
Many Open Questions

◮ Full regret and convergence analysis.
◮ Generalization to nonlinear drift models.
◮ How to select the learning rate $\eta$?
◮ What if we use time-stamps instead of the index $t$?
◮ How to best avoid instability in AMGD?