

  1. Adaptive Incremental Learning for Statistical Relational Models Using Gradient-Based Boosting
Yulong Gu and Paolo Missier
Presenter: Yulong Gu, School of Computing, Newcastle University, UK

  2. Outline
• Background
• Relational Functional Gradient Boosting (RFGB)
• Top-down Induction of First-order Logical Decision Trees (TILDE)
• Concept-adapting Very Fast Decision Tree (CVFDT)
• Hoeffding Relational Regression Tree (HRRT)
• Rule Stability Metric for CVFDT
• Relational Incremental Boosting (RIB)
• Relational Boosted Forest (RBF)

  3. Problem
Supervised learning with a dataset that is:
• Incomplete – contains missing values
• Imbalanced – negative instances far outnumber positive instances
• Large-scale – more cost-efficient to update the model than to rebuild it
• Evolving – subject to concept drift
• Multi-relational – objects are connected in meaningful ways

  4. Solution System Design
A data-driven model in which each component addresses one of the data properties:
• Multi-relational → Statistical Relational Model (Relational Dependency Network, Markov Logic Network), learned via the Relational Functional Gradient Boosting framework
• Imbalanced → Relational Soft Margin Approach
• Large-scale, Evolving → Adaptive Incremental Learning
• Incomplete → Structural Expectation Maximization

  5. Relational Functional Gradient Boosting
Want to build a statistical relational model out of predicates such as Study Hard, Go to College, Start a Startup Company, Academic Awards, Profit more than N, and Work at fast food joint (Y)?
• Learn a Relational Regression Tree (RRT) for each predicate, encoding both the dependencies and the parameters: structure and parameters are learned together.
• Boosting: learn multiple weak models rather than a single complex model; the final model is the sum of the weak RRTs (sketched below).
[Figure: a sum of relational regression trees over the example predicates, each a weak model in the boosted ensemble]
Natarajan, S. (2012). RFGB. Machine Learning.
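A minimal sketch of the RFGB loop in Python, assuming a hypothetical fit_rrt helper that stands in for a TILDE-style weak learner (the slides do not specify its interface); the pointwise functional gradient I(y) − P(y | Pa(y)) follows Natarajan (2012):

    import math

    def p_true(trees, example):
        # RFGB models P(Y = True | Pa(Y)) as a sigmoid of the sum of the
        # regression values returned by all weak trees learned so far.
        psi = sum(tree.predict(example) for tree in trees)
        return 1.0 / (1.0 + math.exp(-psi))

    def rfgb(examples, labels, n_rounds, fit_rrt):
        trees = []
        for _ in range(n_rounds):
            # Pointwise functional gradient: I(y) - P(y | Pa(y)).
            gradients = [y - p_true(trees, x) for x, y in zip(examples, labels)]
            # Fit the next weak relational regression tree to the gradients.
            trees.append(fit_rrt(examples, gradients))
        return trees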

  6. Hoeffding Relational Regression Tree (HRRT)
How can we learn a relational regression tree incrementally? HRRT = TILDE + CVFDT.
• TILDE learns a relational regression tree top-down and allows conjunctions of predicates at a node; HRRT's extensions also allow recursive and aggregated predicates.
• CVFDT learns a tree incrementally, choosing the predicate at each node from a fraction of the streaming data, and is concept-adapting.
• HRRT maintains a sliding window over the stream and updates sufficient statistics at each node; for the positive example person(Eric), workatFFJ(Eric), college(Eric), distinction(Eric), the counter for distinction gets +1 and the counter for startup +0.
• Candidate tests are forked and their regression values calculated, but a node is only split when the Hoeffding bound is satisfied (the CVFDT splitting strategy; see the sketch below).
[Figure: a partially grown tree over Work at fast food joint, Distinction, and Go to College, with a leaf awaiting its split]
Blockeel, H., & De Raedt, L. (1998). TILDE. Artificial Intelligence; Hulten, G. (2001). CVFDT. KDD.
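A minimal sketch of the split decision at an HRRT leaf, assuming each candidate predicate keeps simple sufficient statistics over the examples it covers (CandidateTest and its scoring rule are illustrative, not taken from the paper):

    import math

    class CandidateTest:
        # Sufficient statistics for one candidate predicate at a leaf.
        def __init__(self, predicate):
            self.predicate = predicate
            self.n, self.total = 0, 0.0

        def update(self, value):
            self.n += 1
            self.total += value

        def score(self):
            # Illustrative proxy for the splitting criterion.
            return 0.0 if self.n == 0 else abs(self.total) / self.n

    def hoeffding_bound(value_range, confidence, n):
        # With probability 1 - confidence, the true and observed means of a
        # variable with this range differ by less than the bound.
        return math.sqrt(value_range ** 2 * math.log(1.0 / confidence) / (2.0 * n))

    def should_split(candidates, value_range=1.0, confidence=0.01):
        best, second = sorted(candidates, key=lambda c: c.score(), reverse=True)[:2]
        eps = hoeffding_bound(value_range, confidence, best.n)
        # Split only when the best test beats the runner-up by more than
        # the Hoeffding bound (the CVFDT splitting strategy).
        return best.score() - second.score() > eps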

  7. Hoeffding Relational Regression Tree (HRRT)
Hoeffding Bound: with a desired confidence, the upper bound on the difference between the true mean and the observed mean of a random variable depends only on the variable's range and the number of observations.
Example: after an update of the sufficient statistics, the node has seen 100 examples. With 99% certainty, the difference between the true gain gap diff(Gain_distinction − Gain_startup) and the observed one is less than the pre-defined ϑ, so the Hoeffding bound is satisfied and the node is split (a worked computation follows below).
[Figure: the sliding window and the tree after splitting on Distinction, with leaf regression values]
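Plugging the slide's numbers into the standard Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), and assuming the gain values are scaled to a range R = 1 (an assumption; the slides do not state the range):

    import math

    R = 1.0       # assumed range of the gain values
    delta = 0.01  # 99% confidence
    n = 100       # examples seen at the node

    eps = math.sqrt(R ** 2 * math.log(1.0 / delta) / (2.0 * n))
    print(round(eps, 3))  # 0.152: split once the observed gain gap exceeds this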

  8. Hoeffding Relational Regression Tree (HRRT)
How does CVFDT adapt to concept drift?
• Maintain a set of alternative subtrees for each node, rooted on different predicates than the original one.
• Periodically re-check the Hoeffding bound at each node; if it fails, add a new subtree, rooted on the best predicate at that moment, to the node's subtree set.
• Once one of the alternative subtrees outperforms the original one, the winning subtree replaces the original, and the original subtree is discarded entirely (sketched below).
[Figure: a tree rooted on Study Hard with an alternative subtree rooted on Start a Startup Company; after substitution, the alternative replaces the original]
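A sketch of this drift-handling step at one inner node; the node's methods (split_still_valid, error, replace_with) and grow_alternative are illustrative placeholders for bookkeeping the slides leave implicit:

    def check_for_drift(node, grow_alternative, window):
        # Periodic check: if the original split no longer satisfies the
        # Hoeffding bound on the current window, start a new alternative
        # subtree rooted on the best predicate at this moment.
        if not node.split_still_valid(window):
            node.alternatives.append(grow_alternative(window))

        # As soon as an alternative outperforms the original subtree,
        # it wins: the original is replaced and discarded entirely.
        for alt in node.alternatives:
            if alt.error(window) < node.error(window):
                node.replace_with(alt)
                break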

  9. Hoeffding Relational Regression Tree (HRRT)
Why is CVFDT not good enough?
• Less responsive – a new concept needs many counter-examples before it invalidates an old one.
• Larger prediction variance – old concepts are discarded entirely on the basis of a relatively small amount of data.
• Hard to maintain and analyse – everything is kept in one single complex model.
[Figure: the alternative-subtree substitution from the previous slide]
Kolter, J. (2007). DWM. J. Mach. Learn. Res.

  10. Ensemble Methods for Relational Adaptive Incremental Learning
Ensemble methods for concept drift: Boosting, Bagging, Weighted Majority, ...
• Train multiple weak models to represent conflicting rules.
• Each weak model contributes to the final prediction (see the sketch below).
[Figure: weak model 1 (rooted on Study Hard, leaves A, B, C) with weight β, and weak model 2 (rooted on Start a Startup Company, leaves D, E) with weight γ; a given example reaches leaf A in model 1 and leaf D in model 2]
Boosting: P(Y = True | Pa(Y)) = βA + γD, with β = γ = 1.
Weighted Majority: P(Y = True | Pa(Y)) = βA + γD, with β and γ updated according to each model's performance.
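A minimal sketch of the weighted combination, where each weak model returns the regression value of the leaf an example reaches (A and D above); squashing the weighted sum through a sigmoid follows RFGB's convention and is an assumption here:

    import math

    def ensemble_p_true(models, weights, example):
        # Weighted sum of leaf values, e.g. beta*A + gamma*D.
        psi = sum(w * m.predict(example) for m, w in zip(models, weights))
        return 1.0 / (1.0 + math.exp(-psi))

    # Boosting keeps fixed weights (beta = gamma = 1); a weighted-majority
    # scheme would instead decay the weight of a model when it errs:
    def penalise(weights, made_mistake, decay=0.5):
        return [w * (decay if wrong else 1.0)
                for w, wrong in zip(weights, made_mistake)]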

  11. Rule Stability Metric
• Definition 1. Define the Rule Stability of a model as n, the size of the smallest change in the sample D that may cause a new rule r′ to become superior to the working rule r. In the following equation, D′ is D after the change:

Learner: Diff(D, D′) = n ⟹ r → r′   (1)

• When we apply Rule Stability to a tree trained with HRRT, we can prove that, with confidence 1 − ε, the size of the smallest change that may cause r′ to become superior to r is:

Tolerance = ΔḠ(X_a, X_b) − ϑ   (2)

where ΔḠ(X_a, X_b) is the average difference between the scores of tests X_a and X_b evaluated by the splitting function G(X_i), and ϑ is the parameter obtained from the Hoeffding inequality given n and a desired confidence ε.
• Tolerance measures the rule stability of an inner node, and we define

TreeTol = Σ_{node ∈ Tree} Tolerance_node   (3)

as the stability of the tree (equations (2) and (3) are sketched in code below).
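A sketch of equations (2) and (3) in code, assuming each inner node tracks how many examples it has seen and the average score gap between its chosen test X_a and the runner-up X_b (the attribute names are illustrative):

    import math

    def hoeffding_parameter(value_range, n, confidence):
        # The theta of equation (2), from the Hoeffding inequality.
        return math.sqrt(value_range ** 2 * math.log(1.0 / confidence) / (2.0 * n))

    def tolerance(node, value_range=1.0, confidence=0.01):
        # Equation (2): Tolerance = avg gap between X_a and X_b, minus theta.
        theta = hoeffding_parameter(value_range, node.n_seen, confidence)
        return node.avg_score_gap - theta

    def tree_tolerance(inner_nodes):
        # Equation (3): TreeTol, the stability of the whole tree.
        return sum(tolerance(node) for node in inner_nodes)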

  12. Established Rules
Combine HRRT and the Rule Stability metric to let ensemble methods handle concept drift.
When is a weak model good enough to represent the current rules?
• It passes the rule stability check on the current sliding-window data, and
• it has been boosted using the current sliding-window data.
Established Rules: we boost an initial HRRT only once it is stable, so that the objective functional is best optimised for the current sliding-window data and the stable rules are turned into established rules (see the sketch below).
[Figure: pipeline from training examples through the initial HRRT and the rule stability check to functional gradient boosting, yielding a Functional Gradient Tree]
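A sketch of the gate this slide describes, reusing tree_tolerance from the slide 11 sketch; the stability_threshold, boost callback, and hrrt interface are illustrative:

    def establish_rules(hrrt, window, stability_threshold, boost):
        # Promote a weak model only once its rules are stable on the
        # current sliding window; boosting it then turns the stable
        # rules into established rules (a Functional Gradient Tree).
        if tree_tolerance(hrrt.inner_nodes()) >= stability_threshold:
            return boost(hrrt, window)
        return None  # not stable yet: keep training the HRRT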

  13. Relational Incremental Boosting
[Figure: a sequence of trees over predicates such as Work at fast food joint (Y), Go to College, Start a Startup Company, Distinction, and Profit more than N, with leaves A0/B0 … An/Bn, trained on data streams d0, d1, …, dn]
• The initial HRRT t0 is trained on stream d0; once it passes the rule stability check (RC), it is boosted: t0 → b0.
• Each subsequent HRRT ti is trained on the functional gradients of the ensemble built so far; once it passes RC, it is boosted into the ensemble: b0 + t1 → b1, …, b0 + b1 + ⋯ + tn → bn (the pipeline is sketched below).
• The final prediction sums the leaf values reached in each Functional Gradient Tree:

P(Y = True | Pa(Y)) = A_0 + A_1 + ⋯ + A_n
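A sketch of the whole RIB pipeline over a sequence of sliding windows; grow_hrrt, stability_ok, and boost stand in for the components introduced on the previous slides:

    def relational_incremental_boosting(streams, grow_hrrt, stability_ok, boost):
        ensemble = []  # established Functional Gradient Trees b0, b1, ...
        for window in streams:
            # Grow a fresh HRRT on the functional gradients of the
            # current ensemble, using the current window (d0, d1, ...).
            candidate = grow_hrrt(window, ensemble)
            # Once it passes the rule stability check (RC), boost it in:
            # t0 -> b0, b0 + t1 -> b1, ..., b0 + ... + tn -> bn.
            if stability_ok(candidate, window):
                ensemble.append(boost(candidate, window, ensemble))
        return ensemble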

  14. Relational Incremental Boosting: Evaluation Centre
The Evaluation Centre for RIB:
• Monitors global performance over time; strong consistency with the training data over time indicates no concept drift.
• Monitors the contribution to the error of each Functional Gradient Tree (FGT).
• Discards poorly performing FGTs over time, keeping the model's complexity bounded (one possible realisation is sketched below).
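One plausible way to operationalise "contribution to error" is a leave-one-out check on the current window; the slides do not give the exact criterion, so this is only a sketch:

    def evaluation_centre(ensemble, window, error_of):
        # Discard any FGT whose removal lowers the ensemble error on
        # the current window, i.e. a tree that now adds error.
        total = error_of(ensemble, window)
        kept = []
        for tree in ensemble:
            rest = [t for t in ensemble if t is not tree]
            if rest and error_of(rest, window) < total:
                continue  # poorly performing FGT: discard it
            kept.append(tree)
        return kept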
