Online Collective Inference
Jay Pujara (U. Maryland, College Park), Ben London (U. Maryland, College Park), Lise Getoor (U. California, Santa Cruz)
BIRS Workshop: New Perspectives for Relational Learning, 4/23/2015
Real-world problems…
…benefit from relational models
Collaborative Filtering
Likes(U1, M1) ∧ Friends(U1, U2) → Likes(U2, M1)
Genre(M1, G) ∧ Genre(M2, G) ∧ Likes(U1, M1) → Likes(U1, M2)
Link Prediction
Coworkers(U1, C) ∧ Coworkers(U2, C) → Coworkers(U1, U2)
Knowledge Graph Identification
Rel(R, E1, T) ∧ Rel(R, E2, T) ∧ Label(E1, L) → Label(E2, L)
MutEx(L1, L2) ∧ Label(E, L1) → ¬Label(E, L2)
(Jiang et al., ICDM 2012; Pujara et al., ISWC 2013)
Real-world problems are big!
• Millions of users, thousands of movies
• Millions of users, thousands of genes
• Millions of facts, thousands of ontological constraints
What happens when…
• A user rates a new movie?
• A new genetic similarity is discovered?
• New user links form?
• New facts are extracted from the Web?
Repeat inference!
Why can't we repeat inference?
• We want rich, collective models!
• But 10M–1B factors means inference takes 1 to 100s of hours
• Ideal: inference time balances the update cycle
• Insanity is doing the same thing over and over…
Online Collective Inference
PROBLEM SETTING
Key Problem
• Real-world problems → large graphical models
• Changing evidence → repeated inference
• What happens when we only partially update inference?
• Can we scalably approximate the MAP state without recomputing full inference?
Generic Answer: NO!
• Nodes can take one of two values
• The model has probability mass only when all nodes take the same value
• Fix some nodes to one value, then observe evidence for the other: full inference flips every node, but the fixed nodes cannot follow
Previous Work
• Belief Revision – e.g., Gärdenfors, 1992
• Bayesian Network Updates – e.g., Buntine, 1991; Friedman & Goldszmidt, 1997
• Dynamic / Sequential Models – e.g., Murphy, 2002; Fine et al., 1998
• Adaptive Inference – e.g., Acar et al., 2008
• BP Message Passing – e.g., Nath & Domingos, 2010
• Collective Stability – e.g., London et al., 2013
Problem Setting
• Fixed model: dependencies & weights are known
• Online: changing evidence or observations
• Closed world: all variables are identified
• Budget: infer only m variables in each epoch
• Strongly convex inference objective (e.g., PSL)
Questions:
• What guarantees can we offer?
• Which m variables should we infer?
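To make this setting concrete, here is a minimal Python sketch of the epoch loop under a budget of m variables. It stands in a strongly convex quadratic objective for PSL's inference objective; the names (full_map, partial_map, the coupling matrix A) and the random activation rule are illustrative assumptions, not the authors' implementation or PSL's API.

```python
# Minimal sketch of budgeted online collective inference, assuming a
# strongly convex quadratic stand-in for PSL's MAP objective:
#   h(x) = argmin_y  0.5 * y'Ay - b(x)'y + (w_p / 2) * ||y||^2
import numpy as np

rng = np.random.default_rng(0)
n, m, w_p = 20, 5, 1.0                    # variables, per-epoch budget, prior weight
A = np.eye(n) + 0.1 * rng.random((n, n))  # variable couplings ("relational" terms)
A = 0.5 * (A + A.T)                       # symmetrize so the objective is convex
H = A + w_p * np.eye(n)                   # Hessian of the full objective

def full_map(b):
    """Full inference: minimize the objective over all n variables."""
    return np.linalg.solve(H, b)

def partial_map(b, y_prev, free):
    """Partial inference: re-optimize only `free`, conditioning on the rest."""
    S = np.setdiff1d(np.arange(n), free)             # fixed variables y_S
    rhs = b[free] - H[np.ix_(free, S)] @ y_prev[S]   # condition on the fixed values
    y = y_prev.copy()
    y[free] = np.linalg.solve(H[np.ix_(free, free)], rhs)
    return y

b = rng.random(n)                         # initial evidence
y = full_map(b)                           # one expensive full inference up front
for epoch in range(10):                   # each epoch: evidence changes
    b += 0.05 * rng.standard_normal(n)    # new observations arrive
    free = rng.choice(n, size=m, replace=False)  # placeholder activation rule
    y = partial_map(b, y, free)           # budgeted partial update
```

The random choice of `free` is exactly the gap the activation algorithms later in the talk aim to fill.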
Approach
• Define "regret" for online collective inference
• Introduce regret bounds for strongly convex inference objectives (like PSL!)
• Develop algorithms to activate a subset of the variables during inference, given a budget
Online Collective Inference
REGRET BOUNDS
Inference Regret
• General inference problem: estimate P(Y | X)
• In online collective inference: fix Y_S, infer the remaining variables given Y_S
• Regret (learning): captures distance to the optimal hypothesis
• Regret (inference): the distance between the full inference result and the partial inference update (when conditioning on Y_S)
Defining Regret
• Regret: distance between full and approximate inference:

$$R_n(x, y_S; \hat{w}) \triangleq \frac{1}{n} \big\| h(x; \hat{w}) - h(x, y_S; \hat{w}) \big\|_1$$

where $h(x; \hat{w}) = \arg\min_y \hat{w} \cdot f(x, y) + \frac{w_p}{2} \|y\|_2^2$ and $w_p$ is the prior weight.
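Continuing the toy quadratic sketch from the Problem Setting slide (reusing its full_map, partial_map, and n), the inference regret is just the normalized L1 gap between the two estimates:

```python
# Inference regret: normalized L1 distance between the full-inference MAP
# state h(x) and the partial update h(x, y_S) that keeps y_S fixed.
def inference_regret(b, y_prev, free):
    y_full = full_map(b)                       # h(x; w): re-infer everything
    y_part = partial_map(b, y_prev, free)      # h(x, y_S; w): condition on y_S
    return np.abs(y_full - y_part).sum() / n   # R_n = (1/n) * ||...||_1
```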
Regret Bound

$$R_n(x, y_S; \hat{w}) \leq O\left( \sqrt{ \frac{B \, \|\hat{w}\|_2}{n \cdot w_p} \, \big\| y_S - \hat{y}_S \big\|_1 } \right)$$

Regret ingredients: the Lipschitz constant $B$, the 2-norm of the model weights, the weight $w_p$ of the $L_2$ prior, and the $L_1$ distance between the values of the fixed variables and their values in full inference.
Key takeaway: regret depends on the $L_1$ distance between the fixed variables and their "true" values in the MAP state.
Validating Regret Bounds
[Plot: inference regret vs. # epochs (0–50) for HighLocal, Balanced, and HighRelational models, compared against the scaled regret bound]
Measure the regret of performing no updates versus full inference, varying the importance of relational features
Online Collective Inference
ACTIVATION ALGORITHMS
Which variables to fix?
• Knapsack: a combinatorial problem over regrets/costs under a budget
• Theory: fix the variables that won't change
• Practice: how can we know what will change?
• Idea: can we use features of past inferences?
• Explore the optimization itself (case study: ADMM & PSL)
ADMM Inference in PSL (Boyd et al., 2011; Bach et al., 2012)
[Diagram, built up over four slides: variables y1, y2, y3 connect to potentials f1–f4; each potential keeps local copies of its variables (e.g., y11, y12); the copies are tied to consensus estimates y1, y2, y3; each copy also carries a Lagrange multiplier (e.g., α11, α12) measuring its disagreement with the consensus]
ADMM Features

$$\min_{\tilde{y}_g} \; w_g f_g(x, \tilde{y}_g) + \frac{\rho}{2} \Big\| \tilde{y}_g - y_g + \frac{1}{\rho} \alpha_g \Big\|_2^2$$

• Weight: how important is the potential?
• Potential: what loss do we incur?
• Consensus: what is the variable's value?
• Lagrange multiplier: how much disagreement is there across potentials?
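Below is a hedged Python sketch of the consensus-ADMM loop that this slide and the preceding diagram describe: local copies, consensus estimates, and Lagrange multipliers. The hinge losses stand in for PSL's ground-rule distances to satisfaction; the specific rules, weights, and the use of scipy's Nelder-Mead solver are illustrative choices, not PSL's implementation.

```python
# Consensus ADMM in the style of PSL inference (Boyd et al., 2011;
# Bach et al., 2012). Each potential g keeps local copies of its
# variables and a Lagrange multiplier alpha_g; the consensus step
# averages the copies; the dual step accumulates disagreement.
import numpy as np
from scipy.optimize import minimize

rho, n = 1.0, 3
potentials = [
    # (weight w_g, global variable indices, loss on the *local* copy vector)
    (2.0, [0, 1], lambda v: max(0.0, v[0] - v[1])),  # rule y0 => y1
    (1.0, [1, 2], lambda v: max(0.0, v[0] - v[1])),  # rule y1 => y2
    (1.0, [0],    lambda v: max(0.0, 0.8 - v[0])),   # prior pushing y0 up
]
y = np.zeros(n)                                      # consensus estimates
copies = [y[idx].copy() for _, idx, _ in potentials] # local variable copies
alphas = [np.zeros(len(idx)) for _, idx, _ in potentials]

for _ in range(50):
    # Local step: each potential solves the subproblem on this slide,
    #   min_v  w_g f_g(x, v) + (rho/2) ||v - y_g + alpha_g / rho||^2
    for g, (w, idx, f) in enumerate(potentials):
        target = y[idx] - alphas[g] / rho
        obj = lambda v: w * f(v) + (rho / 2) * np.sum((v - target) ** 2)
        copies[g] = minimize(obj, copies[g], method="Nelder-Mead").x
    # Consensus step: average each variable's copies (plus scaled duals).
    for i in range(n):
        vals = [copies[g][pidx.index(i)] + alphas[g][pidx.index(i)] / rho
                for g, (_, pidx, _) in enumerate(potentials) if i in pidx]
        y[i] = np.clip(np.mean(vals), 0.0, 1.0)      # PSL variables live in [0, 1]
    # Dual step: multipliers grow with disagreement from the consensus.
    for g, (_, idx, _) in enumerate(potentials):
        alphas[g] += rho * (copies[g] - y[idx])
```

At convergence, the weights w_g, potential values f_g, consensus values y_i, and multipliers alpha_g are exactly the four features listed on this slide.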
Two heuristics for activation
• Truth-value: activate variables whose value is near 0.5
• Weighted Lagrangian: activate variables where rule weight × Lagrange multiplier is high
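A minimal sketch of the two heuristics, assuming the `potentials` and `alphas` structures from the ADMM sketch above; the function names are illustrative, not from the paper's code.

```python
import numpy as np

def truth_value_scores(y):
    """Truth-value heuristic: prefer variables near 0.5 (most uncertain)."""
    return -np.abs(y - 0.5)                  # closer to 0.5 => higher score

def weighted_lagrangian_scores(y, potentials, alphas):
    """Weighted-Lagrangian heuristic: prefer variables whose attached
    potentials carry large (rule weight x Lagrange multiplier) mass."""
    scores = np.zeros_like(y)
    for g, (w, idx, _) in enumerate(potentials):
        for j, i in enumerate(idx):
            scores[i] += w * abs(alphas[g][j])
    return scores

def activate(scores, m):
    """Spend the budget on the m highest-scoring variables."""
    return np.argsort(scores)[-m:]
```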
Using Model Structure
• Variable dependencies matter!
• Perform BFS, starting from the new evidence
• Use the heuristics plus a decay factor to prioritize exploration
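One possible reading of this slide as code: BFS outward from the variables touched by new evidence, damping each variable's heuristic score by a per-hop decay so that nearby, high-scoring variables are activated first. The interface (`neighbors` adjacency map, `explore` multiplier) is an assumption for illustration.

```python
from collections import deque

def bfs_activate(new_vars, neighbors, scores, m, decay=0.9, explore=3):
    """Return up to m variables to activate, preferring high-scoring
    variables close (in the dependency graph) to new evidence."""
    frontier = deque((v, 1.0) for v in new_vars)
    seen = set(new_vars)
    candidates = []                          # (decayed heuristic score, variable)
    while frontier and len(candidates) < explore * m:
        v, damp = frontier.popleft()
        candidates.append((damp * scores[v], v))
        for u in neighbors[v]:               # expand along variable dependencies
            if u not in seen:
                seen.add(u)
                frontier.append((u, damp * decay))
    candidates.sort(reverse=True)            # highest decayed score first
    return [v for _, v in candidates[:m]]
```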
EXPERIMENTAL EVALUATION
Two Online Inference Tasks
• Collective Classification (Synthetic): infer attributes of users in a social network as progressively more information is shared
• Collaborative Filtering (Jester; Goldberg et al., 2001): infer user ratings of jokes as users provide ratings for an increasing number of jokes
Two Online Inference Tasks
• Collective Classification (Synthetic):
  – 100 total trials (10 networks × 10 series)
  – Network evolves from 10% to 60% observed
  – Fix 50% of variables at each epoch
• Collaborative Filtering (Jester):
  – 10 trials, 100 users, 100 jokes
  – Evolves from 25% to 75% revealed ratings
  – Fix {25, 50, 75}% of variables at each epoch
Collective Classification: Approximate Inference
[Plots: MAE vs. # epochs and inference regret vs. # epochs, comparing Full Inference and Do Nothing against Random, Value, WLM, and Relational activation of 50% of variables]
• Regret diminishes over time
• Error decreases, approaching full inference
• 69% reduction in inference time
Collaborative Filtering
[Plots: inference regret and RMSE vs. % of observed ratings (25%–75%), with 25% and 50% of variables activated per epoch, comparing Full Inference and Do Nothing against Random, Value, WLM, and Relational activation]
Collaborative Filtering
• Value: high regret, but lower error than full inference
• Preserves polarized ratings
• 66% reduction in time for approximate inference
Online Collective Inference
CONCLUSION
Summary
• Extremely relevant to modern problems
• Necessity: approximate the MAP state in PGMs
• Inference regret: bounds the approximation error
• Approximation algorithms: use optimization features
• Results: low regret, low error, faster inference
• New possibilities: rich models, fast inference
Future Work
• Better bounds for approximate inference?
• Dealing with changing models/weights
• Explicitly modeling change in models
• Applications:
  – Drug targeting
  – Knowledge graph construction
  – Context-aware mobile devices