

  1. Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting
Yulong Gu and Paolo Missier
Presenter: Yulong Gu, School of Computing, Newcastle University, UK

  2. Outline
• Background
• Relational Functional Gradient Boosting (RFGB)
• Top-down Induction of First-order Logical Decision Trees (TILDE)
• Concept-adapting Very Fast Decision Tree (CVFDT)
• Hoeffding Relational Regression Tree (HRRT)
• Rule Stability Metric for CVFDT
• Relational Incremental Boosting (RIB)
• Relational Boosted Forest (RBF)

  3. Problem
Supervised learning with a dataset that is:
• Incomplete – contains missing values
• Imbalanced – negative instances far outnumber positive instances
• Large-scale – more cost-efficient to update the model than to rebuild it
• Evolving – subject to concept drift
• Multi-relational – objects are connected in meaningful ways

  4. Solution System Design
A data-driven model in which each component addresses one of the data properties:
• Multi-relational → Statistical Relational Model (Relational Dependency Network, Markov Logic Network), learned via the Relational Functional Gradient Boosting framework
• Imbalanced → Relational Soft Margin Approach
• Large-scale, Evolving → Adaptive Incremental Learning
• Incomplete → Structural Expectation Maximization

  5. Relational Functional Gradient Boosting
Want to build a statistical relational model out of predicates such as Study Hard, Go to College, Start a Startup Company, Academic Awards, Profit more than N, and Work at fast food joint (Y)?
• Learn a Relational Regression Tree (RRT) for each predicate, encoding both the dependencies and the parameters: structure and parameters are learned together.
• Boosting: learn multiple weak models rather than a single complex model; the final model is the sum of the weak RRTs (sketched below).
[Figure: a sum of relational regression trees over the example predicates, each a weak model in the boosted ensemble]
Natarajan, S. (2012). RFGB. Machine Learning.
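A minimal sketch of the RFGB loop in Python, assuming a hypothetical fit_rrt helper that stands in for a TILDE-style weak learner (the slides do not specify its interface); the pointwise functional gradient I(y) − P(y | Pa(y)) follows Natarajan (2012):

    import math

    def p_true(trees, example):
        # RFGB models P(Y = True | Pa(Y)) as a sigmoid of the sum of the
        # regression values returned by all weak trees learned so far.
        psi = sum(tree.predict(example) for tree in trees)
        return 1.0 / (1.0 + math.exp(-psi))

    def rfgb(examples, labels, n_rounds, fit_rrt):
        trees = []
        for _ in range(n_rounds):
            # Pointwise functional gradient: I(y) - P(y | Pa(y)).
            gradients = [y - p_true(trees, x) for x, y in zip(examples, labels)]
            # Fit the next weak relational regression tree to the gradients.
            trees.append(fit_rrt(examples, gradients))
        return trees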

  6. Hoeffding Relational Regression Tree (HRRT)
How can we learn a relational regression tree incrementally? HRRT = TILDE + CVFDT.
• TILDE learns a relational regression tree top-down and allows conjunctions of predicates at a node; HRRT's extensions also allow recursive and aggregated predicates.
• CVFDT learns a tree incrementally, choosing the predicate at each node from a fraction of the streaming data, and is concept-adapting.
• HRRT maintains a sliding window over the stream and updates sufficient statistics at each node; for the positive example person(Eric), workatFFJ(Eric), college(Eric), distinction(Eric), the counter for distinction gets +1 and the counter for startup +0.
• Candidate tests are forked and their regression values calculated, but a node is only split when the Hoeffding bound is satisfied (the CVFDT splitting strategy; see the sketch below).
[Figure: a partially grown tree over Work at fast food joint, Distinction, and Go to College, with a leaf awaiting its split]
Blockeel, H., & De Raedt, L. (1998). TILDE. Artificial Intelligence; Hulten, G. (2001). CVFDT. KDD.
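A minimal sketch of the split decision at an HRRT leaf, assuming each candidate predicate keeps simple sufficient statistics over the examples it covers (CandidateTest and its scoring rule are illustrative, not taken from the paper):

    import math

    class CandidateTest:
        # Sufficient statistics for one candidate predicate at a leaf.
        def __init__(self, predicate):
            self.predicate = predicate
            self.n, self.total = 0, 0.0

        def update(self, value):
            self.n += 1
            self.total += value

        def score(self):
            # Illustrative proxy for the splitting criterion.
            return 0.0 if self.n == 0 else abs(self.total) / self.n

    def hoeffding_bound(value_range, confidence, n):
        # With probability 1 - confidence, the true and observed means of a
        # variable with this range differ by less than the bound.
        return math.sqrt(value_range ** 2 * math.log(1.0 / confidence) / (2.0 * n))

    def should_split(candidates, value_range=1.0, confidence=0.01):
        best, second = sorted(candidates, key=lambda c: c.score(), reverse=True)[:2]
        eps = hoeffding_bound(value_range, confidence, best.n)
        # Split only when the best test beats the runner-up by more than
        # the Hoeffding bound (the CVFDT splitting strategy).
        return best.score() - second.score() > eps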

  7. Hoeffding Relational Regression Tree (HRRT)
Hoeffding Bound: with a desired confidence, the upper bound on the difference between the true mean and the observed mean of a random variable depends only on the variable's range and the number of observations.
Example: after an update of the sufficient statistics, the node has seen 100 examples. With 99% certainty, the difference between the true gain gap diff(Gain_distinction − Gain_startup) and the observed one is less than the pre-defined ϑ, so the Hoeffding bound is satisfied and the node is split (a worked computation follows below).
[Figure: the sliding window and the tree after splitting on Distinction, with leaf regression values]
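Plugging the slide's numbers into the standard Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), and assuming the gain values are scaled to a range R = 1 (an assumption; the slides do not state the range):

    import math

    R = 1.0       # assumed range of the gain values
    delta = 0.01  # 99% confidence
    n = 100       # examples seen at the node

    eps = math.sqrt(R ** 2 * math.log(1.0 / delta) / (2.0 * n))
    print(round(eps, 3))  # 0.152: split once the observed gain gap exceeds this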

  8. Hoeffding Relational Regression Tree (HRRT)
How does CVFDT adapt to concept drift?
• Maintain a set of alternative subtrees for each node, rooted on different predicates than the original one.
• Periodically re-check the Hoeffding bound at each node; if it fails, add a new subtree, rooted on the best predicate at that moment, to the node's subtree set.
• Once one of the alternative subtrees outperforms the original one, the winning subtree replaces the original, and the original subtree is discarded entirely (sketched below).
[Figure: a tree rooted on Study Hard with an alternative subtree rooted on Start a Startup Company; after substitution, the alternative replaces the original]
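A sketch of this drift-handling step at one inner node; the node's methods (split_still_valid, error, replace_with) and grow_alternative are illustrative placeholders for bookkeeping the slides leave implicit:

    def check_for_drift(node, grow_alternative, window):
        # Periodic check: if the original split no longer satisfies the
        # Hoeffding bound on the current window, start a new alternative
        # subtree rooted on the best predicate at this moment.
        if not node.split_still_valid(window):
            node.alternatives.append(grow_alternative(window))

        # As soon as an alternative outperforms the original subtree,
        # it wins: the original is replaced and discarded entirely.
        for alt in node.alternatives:
            if alt.error(window) < node.error(window):
                node.replace_with(alt)
                break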

  9. Hoeffding Relational Regression Tree (HRRT)
Why is CVFDT not good enough?
• Less responsive – a new concept needs many counter-examples before it invalidates an old one.
• Larger prediction variance – old concepts are discarded entirely on the basis of a relatively small amount of data.
• Hard to maintain and analyse – everything is kept in one single complex model.
[Figure: the alternative-subtree substitution from the previous slide]
Kolter, J. (2007). DWM. J. Mach. Learn. Res.

  10. Ensemble Methods for Relational Adaptive Incremental Learning
Ensemble methods for concept drift: Boosting, Bagging, Weighted Majority, ...
• Train multiple weak models to represent conflicting rules.
• Each weak model contributes to the final prediction (see the sketch below).
[Figure: weak model 1 (rooted on Study Hard, leaves A, B, C) with weight β, and weak model 2 (rooted on Start a Startup Company, leaves D, E) with weight γ; a given example reaches leaf A in model 1 and leaf D in model 2]
Boosting: P(Y = True | Pa(Y)) = βA + γD, with β = γ = 1.
Weighted Majority: P(Y = True | Pa(Y)) = βA + γD, with β and γ updated according to each model's performance.
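A minimal sketch of the weighted combination, where each weak model returns the regression value of the leaf an example reaches (A and D above); squashing the weighted sum through a sigmoid follows RFGB's convention and is an assumption here:

    import math

    def ensemble_p_true(models, weights, example):
        # Weighted sum of leaf values, e.g. beta*A + gamma*D.
        psi = sum(w * m.predict(example) for m, w in zip(models, weights))
        return 1.0 / (1.0 + math.exp(-psi))

    # Boosting keeps fixed weights (beta = gamma = 1); a weighted-majority
    # scheme would instead decay the weight of a model when it errs:
    def penalise(weights, made_mistake, decay=0.5):
        return [w * (decay if wrong else 1.0)
                for w, wrong in zip(weights, made_mistake)]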

  11. Rule Stability Metric
• Definition 1. Define the Rule Stability of a model as n, the size of the smallest change in the sample D that may cause a new rule r′ to become superior to the working rule r. In the following equation, D′ is D after the change:

Learner: Diff(D, D′) = n ⟹ r → r′   (1)

• When we apply Rule Stability to a tree trained with HRRT, we can prove that, with confidence 1 − ε, the size of the smallest change that may cause r′ to become superior to r is:

Tolerance = ΔḠ(X_a, X_b) − ϑ   (2)

where ΔḠ(X_a, X_b) is the average difference between the scores of tests X_a and X_b evaluated by the splitting function G(X_i), and ϑ is the parameter obtained from the Hoeffding inequality given n and a desired confidence ε.
• Tolerance measures the rule stability of an inner node, and we define

TreeTol = Σ_{node ∈ Tree} Tolerance_node   (3)

as the stability of the tree (equations (2) and (3) are sketched in code below).
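A sketch of equations (2) and (3) in code, assuming each inner node tracks how many examples it has seen and the average score gap between its chosen test X_a and the runner-up X_b (the attribute names are illustrative):

    import math

    def hoeffding_parameter(value_range, n, confidence):
        # The theta of equation (2), from the Hoeffding inequality.
        return math.sqrt(value_range ** 2 * math.log(1.0 / confidence) / (2.0 * n))

    def tolerance(node, value_range=1.0, confidence=0.01):
        # Equation (2): Tolerance = avg gap between X_a and X_b, minus theta.
        theta = hoeffding_parameter(value_range, node.n_seen, confidence)
        return node.avg_score_gap - theta

    def tree_tolerance(inner_nodes):
        # Equation (3): TreeTol, the stability of the whole tree.
        return sum(tolerance(node) for node in inner_nodes)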

  12. Established Rules
Combine HRRT and the Rule Stability metric to let ensemble methods handle concept drift.
When is a weak model good enough to represent the current rules?
• It passes the rule stability check on the current sliding-window data, and
• it has been boosted using the current sliding-window data.
Established Rules: we boost an initial HRRT only once it is stable, so that the objective functional is best optimised for the current sliding-window data and the stable rules are turned into established rules (see the sketch below).
[Figure: pipeline from training examples through the initial HRRT and the rule stability check to functional gradient boosting, yielding a Functional Gradient Tree]
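A sketch of the gate this slide describes, reusing tree_tolerance from the slide 11 sketch; the stability_threshold, boost callback, and hrrt interface are illustrative:

    def establish_rules(hrrt, window, stability_threshold, boost):
        # Promote a weak model only once its rules are stable on the
        # current sliding window; boosting it then turns the stable
        # rules into established rules (a Functional Gradient Tree).
        if tree_tolerance(hrrt.inner_nodes()) >= stability_threshold:
            return boost(hrrt, window)
        return None  # not stable yet: keep training the HRRT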

  13. Relational Incremental Boosting
[Figure: a sequence of trees over predicates such as Work at fast food joint (Y), Go to College, Start a Startup Company, Distinction, and Profit more than N, with leaves A0/B0 … An/Bn, trained on data streams d0, d1, …, dn]
• The initial HRRT t0 is trained on stream d0; once it passes the rule stability check (RC), it is boosted: t0 → b0.
• Each subsequent HRRT ti is trained on the functional gradients of the ensemble built so far; once it passes RC, it is boosted into the ensemble: b0 + t1 → b1, …, b0 + b1 + ⋯ + tn → bn (the pipeline is sketched below).
• The final prediction sums the leaf values reached in each Functional Gradient Tree:

P(Y = True | Pa(Y)) = A_0 + A_1 + ⋯ + A_n
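A sketch of the whole RIB pipeline over a sequence of sliding windows; grow_hrrt, stability_ok, and boost stand in for the components introduced on the previous slides:

    def relational_incremental_boosting(streams, grow_hrrt, stability_ok, boost):
        ensemble = []  # established Functional Gradient Trees b0, b1, ...
        for window in streams:
            # Grow a fresh HRRT on the functional gradients of the
            # current ensemble, using the current window (d0, d1, ...).
            candidate = grow_hrrt(window, ensemble)
            # Once it passes the rule stability check (RC), boost it in:
            # t0 -> b0, b0 + t1 -> b1, ..., b0 + ... + tn -> bn.
            if stability_ok(candidate, window):
                ensemble.append(boost(candidate, window, ensemble))
        return ensemble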

  14. Relational Incremental Boosting: Evaluation Centre
The Evaluation Centre for RIB:
• Monitors global performance over time; strong consistency with the training data over time indicates no concept drift.
• Monitors the contribution to the error of each Functional Gradient Tree (FGT).
• Discards poorly performing FGTs over time, keeping the model's complexity bounded (one possible realisation is sketched below).
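One plausible way to operationalise "contribution to error" is a leave-one-out check on the current window; the slides do not give the exact criterion, so this is only a sketch:

    def evaluation_centre(ensemble, window, error_of):
        # Discard any FGT whose removal lowers the ensemble error on
        # the current window, i.e. a tree that now adds error.
        total = error_of(ensemble, window)
        kept = []
        for tree in ensemble:
            rest = [t for t in ensemble if t is not tree]
            if rest and error_of(rest, window) < total:
                continue  # poorly performing FGT: discard it
            kept.append(tree)
        return kept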
