Memory Models for Incremental Learning Architectures
Viktor Losing, Heiko Wersing and Barbara Hammer
Outline ➢ Motivation ➢ Case study: Personalized Maneuver Prediction at Intersections ➢ Handling of Heterogeneous Concept Drift
Motivation ➢ Personalization − adaptation to user habits / environments ➢ Lifelong learning
Challenges - Personalized online learning ➢ Learning from few data ➢ Sequential data with predefined order ➢ Concept drift ➢ Cooperation between average and personalized model
Change is everywhere ➢ Coping with "arbitrary" changes
Change of taste / interest
Seasonal changes
Change of context
Rialto task: Change of lighting conditions
Setting ➢ Supervised stream classification − Predict for an incoming stream of features x_1, …, x_j, x_i ∈ ℝ^n the corresponding labels y_1, …, y_j, y_i ∈ {1, …, c} ➢ On-line learning scheme − After each tuple (x_i, y_i), generate a new model h_i to predict the next incoming example ➢ Precondition for application: − Labels are obtainable in retrospect
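The on-line scheme above is the usual test-then-train (prequential) protocol: predict on the incoming example first, then use its label for an update. A minimal sketch in Python, assuming only a model exposing predict() and partial_fit(); the toy 1-NN learner and the synthetic two-class stream are hypothetical stand-ins, not part of the slides:

```python
# Minimal sketch of the test-then-train protocol described on this slide.
# "IncrementalClassifier" is a hypothetical stand-in for any model exposing
# predict() and partial_fit(); the stream is an iterable of (x_i, y_i) tuples.
import numpy as np

class IncrementalClassifier:
    """Toy 1-nearest-neighbour learner, only here to make the loop runnable."""
    def __init__(self):
        self.X, self.y = [], []

    def predict(self, x):
        if not self.X:                        # nothing seen yet: guess class 0
            return 0
        d = np.linalg.norm(np.asarray(self.X) - x, axis=1)
        return self.y[int(np.argmin(d))]

    def partial_fit(self, x, y):
        self.X.append(np.asarray(x))
        self.y.append(y)

def prequential_error(stream, model):
    """For each tuple (x_i, y_i): predict with h_{i-1}, then update to h_i."""
    errors, n = 0, 0
    for x, y in stream:
        errors += int(model.predict(x) != y)  # test on the incoming example ...
        model.partial_fit(x, y)               # ... then train on its label
        n += 1
    return errors / n

# Tiny synthetic two-class stream (labels determine the Gaussian mean).
rng = np.random.default_rng(0)
stream = [(rng.normal(c, 0.5, size=2), int(c)) for c in rng.integers(0, 2, 200)]
print(prequential_error(stream, IncrementalClassifier()))
```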
Definition ➢ Concept drift occurs when the joint distribution changes: ∃ t_0, t_1 : P_{t_0}(x, y) ≠ P_{t_1}(x, y) − Real drift: P(y|x) changes − Virtual drift: P(x) changes (Image source: Gama et al., "A survey on concept drift adaptation", ACM Computing Surveys 2014)
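A toy sketch contrasting the two drift types via the factorisation P_t(x, y) = P_t(x) P_t(y|x); all distributions and decision boundaries below are made up purely for illustration:

```python
# Illustrative only: the joint distribution factorises as P_t(x, y) = P_t(x) * P_t(y | x).
import numpy as np

rng = np.random.default_rng(1)

def sample_before_drift(n):
    x = rng.normal(0.0, 1.0, n)      # P(x): inputs centred at 0
    y = (x > 0.0).astype(int)        # P(y|x): class boundary at 0
    return x, y

def sample_virtual_drift(n):
    x = rng.normal(2.0, 1.0, n)      # only P(x) shifts ...
    y = (x > 0.0).astype(int)        # ... the boundary stays the same
    return x, y

def sample_real_drift(n):
    x = rng.normal(0.0, 1.0, n)      # P(x) unchanged ...
    y = (x > 1.0).astype(int)        # ... but P(y|x) changes: new boundary
    return x, y
```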
Related work ➢ Dynamic sliding window techniques − PAW: Bifet et al., "Efficient Data Stream Classification Via Probabilistic Adaptive Windows", ACM SAC 2013 ➢ Ensemble methods with various weighting schemes − LVGB: Bifet et al., "Leveraging Bagging for Evolving Data Streams", ECML-PKDD 2010 − Learn++.NSE: Elwell et al., "Incremental Learning in Non-Stationary Environments", IEEE TNN 2011 − DACC: Jaber et al., "Online Learning: Searching for the Best Forgetting Strategy Under Concept Drift", ICONIP 2013 ➢ Drawbacks: − Target specific drift types − Require hyperparameter settings according to the expected drift − Discard former knowledge that may still be valuable
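The windowing methods above keep only recent examples; a simplified fixed-size sliding-window kNN (not PAW itself, just the underlying idea) also shows why the window size becomes a drift-dependent hyperparameter, which is exactly the drawback listed on this slide:

```python
# Simplified sliding-window kNN (not PAW): only the newest `window_size`
# examples are kept, so old concepts are forgotten automatically. The window
# size must match the (unknown) drift speed, which is the drawback named above.
from collections import deque
import numpy as np

class SlidingWindowKNN:
    def __init__(self, window_size=500, k=5):
        self.window = deque(maxlen=window_size)   # oldest examples fall out
        self.k = k

    def partial_fit(self, x, y):
        self.window.append((np.asarray(x), y))

    def predict(self, x):
        if not self.window:
            return 0
        X = np.stack([xi for xi, _ in self.window])
        labels = np.array([yi for _, yi in self.window])
        nearest = np.argsort(np.linalg.norm(X - np.asarray(x), axis=1))[: self.k]
        values, counts = np.unique(labels[nearest], return_counts=True)
        return values[np.argmax(counts)]          # majority vote of the neighbours
```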
Drawbacks – Usual result
Drawbacks – Desired behavior
Self Adaptive Memory (SAM) − two memories, the STM and the LTM, each used as a kNN model
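A structural sketch of the architecture on this slide, assuming two plain in-memory point sets queried by the same kNN routine; parameter names and default values are assumptions, and the behaviour of the memories is filled in on the following slides:

```python
# Structural sketch only: SAM keeps a Short-Term Memory (recent concept) and a
# Long-Term Memory (consistent older knowledge), each queried by a kNN
# classifier. Parameter names and defaults are assumptions.
import numpy as np

class SAMSketch:
    def __init__(self, k=5, min_stm_size=50, max_ltm_size=5000):
        self.k = k
        self.min_stm_size = min_stm_size   # smallest STM size considered
        self.max_ltm_size = max_ltm_size   # LTM budget before compression
        self.stm_X, self.stm_y = [], []    # short-term memory
        self.ltm_X, self.ltm_y = [], []    # long-term memory

    def knn_predict(self, X, y, query):
        """Plain kNN majority vote over one memory (or the union of both)."""
        if len(X) == 0:
            return 0
        d = np.linalg.norm(np.asarray(X) - np.asarray(query), axis=1)
        labels = np.asarray(y)[np.argsort(d)[: self.k]]
        values, counts = np.unique(labels, return_counts=True)
        return values[np.argmax(counts)]
```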
Moving squares dataset
STM size adaptation − error for different candidate STM sizes (figure): 27.12 %, 13.12 %, 7.12 %, 0.0 %
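The error values on this slide illustrate how different STM sizes yield different errors on the recent data. A hedged sketch of one way to realise the size adaptation: evaluate the full STM and repeatedly halved suffixes with an interleaved test-then-train error and keep the best one. The bisection scheme and the `predict(X, y, query, k)` helper signature are assumptions:

```python
# Sketch of error-driven STM size adaptation: candidate sizes are the current
# STM and repeated halvings of its newest part; the size with the lowest
# interleaved test-then-train error on its own data is kept. The bisection
# scheme and the `predict(X, y, query, k)` helper are assumptions.

def interleaved_error(X, y, k, predict):
    """Test-then-train error when replaying the window in its original order."""
    errors = 0
    for i in range(1, len(X)):
        errors += int(predict(X[:i], y[:i], X[i], k) != y[i])
    return errors / max(1, len(X) - 1)

def adapt_stm_size(X, y, k, predict, min_size=50):
    """Return the suffix of (X, y) that minimises the interleaved error."""
    best_X, best_y = X, y
    best_err = interleaved_error(X, y, k, predict)
    size = len(X) // 2
    while size >= min_size:
        cand_X, cand_y = X[-size:], y[-size:]     # keep only the newest examples
        err = interleaved_error(cand_X, cand_y, k, predict)
        if err < best_err:
            best_X, best_y, best_err = cand_X, cand_y, err
        size //= 2
    return best_X, best_y
```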
Distance-based cleaning − cleaning keeps only STM-consistent data (figure: STM and the data to clean)
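One plausible realisation of the cleaning shown in the figure: for a reference STM example, points in the set being cleaned that contradict its label while lying inside the radius of its k nearest same-label STM neighbours are discarded. The exact rule below is an assumption based on the slide, not taken verbatim from it:

```python
# Hedged sketch of distance-based cleaning: around an STM example (x, y), any
# point in the set being cleaned that carries a different label but lies within
# the distance of the k-th nearest same-label STM neighbour of x is dropped,
# since it contradicts the current (STM) concept. Details are assumptions.
import numpy as np

def clean_against_stm(stm_X, stm_y, clean_X, clean_y, x, y, k=5):
    stm_X, stm_y = np.asarray(stm_X), np.asarray(stm_y)
    clean_X, clean_y = np.asarray(clean_X), np.asarray(clean_y)
    if len(clean_X) == 0:
        return clean_X, clean_y
    same = stm_X[stm_y == y]                     # same-label STM neighbours of x
    if len(same) < k:                            # not enough evidence to clean
        return clean_X, clean_y
    radius = np.sort(np.linalg.norm(same - x, axis=1))[k - 1]
    d = np.linalg.norm(clean_X - x, axis=1)
    keep = ~((clean_y != y) & (d <= radius))     # drop contradicting points inside
    return clean_X[keep], clean_y[keep]
```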
Adaptive compression − cleaned, STM-consistent data is transferred to the Long Term Memory, which is compressed by class-wise clustering
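A sketch of the class-wise clustering compression mentioned on this slide, using scikit-learn's KMeans as a stand-in for the actual clustering variant; the shrink factor and budget handling are assumptions:

```python
# Sketch of adaptive compression: when the LTM grows beyond its budget, each
# class is clustered separately and only the cluster centres are kept.
# KMeans and the shrink factor of 2 are assumptions, not taken from the slides.
import numpy as np
from sklearn.cluster import KMeans

def compress_ltm(ltm_X, ltm_y, shrink_factor=2):
    ltm_X, ltm_y = np.asarray(ltm_X), np.asarray(ltm_y)
    new_X, new_y = [], []
    for label in np.unique(ltm_y):               # class-wise clustering
        points = ltm_X[ltm_y == label]
        n_clusters = max(1, len(points) // shrink_factor)
        centres = KMeans(n_clusters=n_clusters, n_init=10).fit(points).cluster_centers_
        new_X.append(centres)
        new_y.append(np.full(len(centres), label))
    return np.vstack(new_X), np.concatenate(new_y)
```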
Prediction
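The prediction slide itself is a figure; a hedged sketch of the error-based model selection referred to later ("Model selection for prediction"): track the recent accuracy of the STM model, the LTM model and the combined memory, and let the currently best one answer. The rolling-history length is an assumption:

```python
# Sketch of prediction by model selection: the recent accuracy of three
# sub-models (STM only, LTM only, combined memory) is tracked, and the one
# with the best recent record predicts the next example. The fixed history
# length of 50 is an assumption.
from collections import deque
import numpy as np

class ModelSelector:
    def __init__(self, history=50):
        self.hits = {m: deque(maxlen=history) for m in ("stm", "ltm", "combined")}

    def best(self):
        """Name of the sub-model with the highest recent accuracy."""
        accuracies = {m: (np.mean(h) if h else 0.0) for m, h in self.hits.items()}
        return max(accuracies, key=accuracies.get)

    def update(self, predictions, true_label):
        """predictions: dict mapping sub-model name to its prediction."""
        for m, pred in predictions.items():
            self.hits[m].append(int(pred == true_label))

# Usage: compute all three predictions for x_i, output the one chosen by
# best(), then call update(...) once the true label y_i becomes available.
```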
Moving squares by SAM
Results: Error rates / ranks
SAM achieves best results
SAM is robust
Reasons for robustness ➢ Adaptation guided by error minimization − Dynamic size of the STM − Model selection for prediction − Reduction of hyperparameters ➢ Consistency between STM and LTM ➢ LTM acts as a safety net
Q & A