

  1. Memory Models for Incremental Learning Architectures Viktor Losing, Heiko Wersing and Barbara Hammer

  2. Outline ➢ Motivation ➢ Case study: Personalized Maneuver Prediction at Intersections ➢ Handling of Heterogeneous Concept Drift

  3. Motivation ➢ Personalization − adaptation to user habits / environments ➢ Lifelong learning

  4.–7. Challenges – Personalized online learning (bullets revealed incrementally) ➢ Learning from few data ➢ Sequential data with predefined order ➢ Concept drift ➢ Cooperation between average and personalized model

  8. Change is everywhere ➢ Coping with “arbitrary” changes

  9. Change of taste / interest

  10. Seasonal changes

  11. Change of context

  12. Rialto task: Change of lighting conditions

  13.–18. Setting ➢ Supervised stream classification − Predict for an incoming stream of features x_1, …, x_j, x_i ∈ ℝ^n the corresponding labels y_1, …, y_j, y_i ∈ {1, …, c} ➢ On-line learning scheme − After each tuple (x_i, y_i), generate a new model h_i to predict the next incoming example ➢ Precondition for application: − Labels obtainable in retrospect
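
This interleaved test-then-train (prequential) scheme can be written down compactly. Below is a minimal Python sketch; the predict_one / learn_one classifier interface is illustrative, not from the talk:

```python
def prequential_error(stream, model):
    """Interleaved test-then-train: each example is first predicted by
    the current model h_{i-1}, then used to train the next model h_i."""
    errors, n = 0, 0
    for x_i, y_i in stream:
        if model.predict_one(x_i) != y_i:  # test on the new example first
            errors += 1
        model.learn_one(x_i, y_i)          # then update with the revealed label
        n += 1
    return errors / n if n else 0.0       # fraction of misclassified examples
```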

  19.–23. Definition ➢ Concept drift is given when the joint distribution changes over time: ∃ t_0, t_1 : P_{t_0}(X, Y) ≠ P_{t_1}(X, Y) ➢ Real drift: P(Y | X) changes ➢ Virtual drift: P(X) changes (Image source: Gama et al., “A survey on concept drift adaptation”, ACM Computing Surveys 2014)
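
The real/virtual distinction follows from factoring the joint distribution; a short LaTeX restatement, in the notation of Gama et al. 2014:

```latex
% Factorization underlying the drift taxonomy
P_t(X, Y) = P_t(Y \mid X)\, P_t(X)
% Real drift:    P_t(Y \mid X) changes over time (the decision boundary moves)
% Virtual drift: P_t(X) changes while P_t(Y \mid X) remains fixed
```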

  24.–25. Related work ➢ Dynamic sliding window techniques − PAW: Bifet et al., “Efficient Data Stream Classification Via Probabilistic Adaptive Windows”, ACM 2013 ➢ Ensemble methods with various weighting schemes − LVGB: Bifet et al., “Leveraging Bagging for Evolving Data Streams”, ECML-PKDD 2010 − Learn++.NSE: Elwell et al., “Incremental Learning in Non-Stationary Environments”, IEEE-TNN 2011 − DACC: Jaber et al., “Online Learning: Searching for the Best Forgetting Strategy Under Concept Drift”, ICONIP 2013 ➢ Drawbacks: − Target specific drift types − Require hyperparameter settings according to the expected drift − Discard former knowledge that may still be valuable

  26. Drawbacks

  27. Drawbacks – Usual result

  28.–30. Drawbacks – Desired behavior

  31.–33. Self Adaptive Memory (SAM) – architecture built from two memories, a Short-Term Memory (STM) and a Long-Term Memory (LTM), each queried by its own kNN model
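
A schematic of this dual-memory layout as a Python sketch; this is not the authors' implementation, and the field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    X: list = field(default_factory=list)  # stored feature vectors
    y: list = field(default_factory=list)  # corresponding labels

@dataclass
class SAM:
    stm: Memory = field(default_factory=Memory)  # short-term memory: recent examples
    ltm: Memory = field(default_factory=Memory)  # long-term memory: compressed, STM-consistent knowledge
    k: int = 5                                   # neighbors used by each kNN model
```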

  34. Moving squares dataset

  35.–40. STM size adaptation – candidate STM sizes are evaluated by their error: 27.12 %, 13.12 %, 7.12 %, 0.0 % (the size with the lowest error is adopted)
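
A sketch of the size-adaptation idea: evaluate the interleaved kNN error of the full window and of successively halved suffixes, then keep the suffix with the lowest error. The halving scheme and the brute-force helper below are simplifications, not the original code:

```python
import numpy as np

def interleaved_knn_error(X, y, k=5):
    """Interleaved test-then-train error of a brute-force kNN on a window."""
    errors = 0
    for i in range(1, len(X)):
        d = np.linalg.norm(np.asarray(X[:i]) - np.asarray(X[i]), axis=1)
        votes = [y[j] for j in np.argsort(d)[:k]]     # labels of nearest neighbors
        errors += max(set(votes), key=votes.count) != y[i]
    return errors / max(len(X) - 1, 1)

def adapt_stm_size(X, y, k=5, min_size=50):
    """Keep the window suffix whose interleaved error is minimal."""
    best_err, best_start, start = np.inf, 0, 0
    while len(X) - start >= min_size:
        err = interleaved_knn_error(X[start:], y[start:], k)
        if err < best_err:
            best_err, best_start = err, start
        start += (len(X) - start) // 2                # halve the remaining window
    return X[best_start:], y[best_start:]
```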

  41.–48. Distance-based cleaning – samples that contradict the STM are removed from the data to clean, so that only STM-consistent data is kept (animated example: STM vs. data to clean)
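
One way to read the cleaning rule, sketched below: for every STM sample, points in the set being cleaned that carry a different label yet lie closer than that STM sample's k-th same-label STM neighbor are treated as contradicting and dropped. The exact threshold choice is an assumption based on the accompanying paper:

```python
import numpy as np

def clean_against_stm(stm_X, stm_y, X, y, k=5):
    """Remove samples from (X, y) that locally contradict the STM."""
    stm_X, stm_y = np.asarray(stm_X), np.asarray(stm_y)
    X, y = np.asarray(X), np.asarray(y)
    keep = np.ones(len(X), dtype=bool)
    for xi, yi in zip(stm_X, stm_y):
        same = stm_X[stm_y == yi]                     # same-label STM points (includes xi itself)
        d_same = np.sort(np.linalg.norm(same - xi, axis=1))
        thresh = d_same[min(k, len(d_same)) - 1]      # distance to k-th same-label neighbor
        d = np.linalg.norm(X - xi, axis=1)
        keep &= ~((d <= thresh) & (y != yi))          # drop close, differently labeled points
    return X[keep], y[keep]
```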

  49.–50. Adaptive compression – cleaning yields STM-consistent data, which is transferred to the Long-Term Memory and compressed by class-wise clustering
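
The compression step could look like the following sketch, using scikit-learn's KMeans as a stand-in for the clustering; the per-class budget split is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_classwise(X, y, target_size):
    """Replace each class by cluster centroids so that roughly
    target_size points remain in the long-term memory."""
    X, y = np.asarray(X), np.asarray(y)
    Xc, yc = [], []
    for label in np.unique(y):
        pts = X[y == label]
        n_clusters = min(len(pts), max(1, target_size * len(pts) // len(X)))
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(pts)
        Xc.append(km.cluster_centers_)                # centroids replace raw points
        yc.extend([label] * n_clusters)
    return np.vstack(Xc), np.asarray(yc)
```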

  51.–54. Prediction – model selection between the memories for each query (animated example)
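
Model selection for prediction (cf. slide 59) can be sketched as tracking the recent hit rate of each kNN sub-model and answering with the currently best one; the sub-model names and the predict_one interface are illustrative:

```python
def sam_predict(x, models, recent_hits):
    """models: {'stm': knn, 'ltm': knn, 'combined': knn} (illustrative);
    recent_hits: per-model lists of 0/1 outcomes on recent examples."""
    def hit_rate(name):
        hits = recent_hits[name]
        return sum(hits) / len(hits) if hits else 0.0
    best = max(models, key=hit_rate)        # pick the currently best sub-model
    return models[best].predict_one(x)
```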

  55. Moving squares by SAM

  56. Results: Error rates / ranks

  57. SAM achieves best results

  58. SAM is robust

  59. Reasons for robustness ➢ Adaptation guided through error minimization − Dynamic size of the STM − Model selection for prediction − Reduction of hyperparameters ➢ Consistency between STM and LTM ➢ LTM acts as safety net

  60. Q & A
