
Learning Decision Trees Adaptively from Data Streams with Time Drift (PowerPoint Presentation)



1. Learning Decision Trees Adaptively from Data Streams with Time Drift

Albert Bifet and Ricard Gavaldà
LARCA: Laboratori d’Algorísmica Relacional, Complexitat i Aprenentatge
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
September 2007

2. Introduction: Data Streams

Data Streams
- The sequence is potentially infinite
- High amount of data: sublinear space
- High speed of arrival: sublinear time per example
- Once an element from a data stream has been processed, it is discarded or archived

Example Puzzle: Finding Missing Numbers
- Let π be a permutation of {1, ..., n}.
- Let π−1 be π with one element missing.
- π−1[i] arrives in increasing order.
- Task: determine the missing number.

3. Introduction: Data Streams

Same puzzle. Naive solution: use an n-bit vector to memorize all the numbers seen (O(n) space).

4. Introduction: Data Streams

Data-stream solution: O(log(n)) space suffices.

5. Introduction: Data Streams

Data-stream solution in O(log(n)) space: after seeing the first i elements, store only the running value

  n(n+1)/2 − Σ_{j ≤ i} π−1[j]

Once the stream ends, this value is exactly the missing number.
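The trick is easy to see in code. A minimal sketch (illustrative Python, not from the slides; note the running-difference trick works whether or not the elements arrive in increasing order):

```python
def find_missing(stream, n):
    """Find the missing number of {1, ..., n} in one pass.
    Only the running difference is stored: O(log n) bits, since it
    never exceeds n(n+1)/2."""
    remainder = n * (n + 1) // 2   # sum of the complete permutation
    for x in stream:
        remainder -= x             # subtract each arriving element
    return remainder               # what is left is the missing number

print(find_missing([1, 2, 3, 5, 6], n=6))  # -> 4
```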

6. Data Streams

At any time t in the data stream, we would like the per-item processing time and storage to be simultaneously O(log^k(N, t)).

Approximation algorithms
- Small error rate with high probability
- An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.

7. Data Streams: Approximation Algorithms

Frequency moments of a stream A = {a_1, ..., a_N}:

  F_k = Σ_{i=1}^{v} f_i^k

where f_i is the frequency of i in the sequence, and k ≥ 0.
- F_0: number of distinct elements in the sequence
- F_1: length of the sequence
- F_2: self-join size, the repeat rate, or Gini’s index of homogeneity

Sketches can approximate F_0, F_1, F_2 in O(log v + log N) space.

Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. 1996.
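As a concrete illustration of sketching, here is a toy AMS-style estimator for F_2 (a Python sketch under simplifying assumptions, not the paper's algorithm verbatim: lazily memoized random signs stand in for the 4-wise independent hash family, so this version does not actually achieve the O(log v + log N) space bound):

```python
import random
from collections import defaultdict

def ams_f2(stream, groups=9, per_group=20, seed=42):
    """Median-of-means AMS estimator for F2 = sum_i f_i^2.
    Each basic estimator keeps one counter Z = sum_i f_i * s(i) with
    random signs s(i) in {-1, +1}; E[Z^2] = F2."""
    rng = random.Random(seed)

    def basic_estimate():
        sign = defaultdict(lambda: rng.choice((-1, 1)))  # stand-in for a hash
        z = sum(sign[item] for item in stream)
        return z * z

    means = sorted(
        sum(basic_estimate() for _ in range(per_group)) / per_group
        for _ in range(groups)
    )
    return means[len(means) // 2]  # median over groups reduces variance

stream = [1, 2, 1, 3, 1, 2]        # frequencies: f1=3, f2=2, f3=1
print(ams_f2(stream))              # close to the true F2 = 9 + 4 + 1 = 14
```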

8. Classification

Example: a data set that describes e-mail features for deciding if it is spam.

  Contains "Money"   Domain type   Has attach.   Time received   spam
  yes                com           yes           night           yes
  yes                edu           no            night           yes
  no                 com           yes           night           yes
  no                 edu           no            day             no
  no                 com           no            day             no
  yes                cat           no            day             yes

Assume we have to classify the following new instance:

  Contains "Money"   Domain type   Has attach.   Time received   spam
  yes                edu           yes           day             ?

9. Classification

Assume we have to classify the following new instance:

  Contains "Money"   Domain type   Has attach.   Time received   spam
  yes                edu           yes           day             ?

10. Decision Trees

Basic induction strategy:
- A ← the "best" decision attribute for the next node
- Assign A as the decision attribute for the node
- For each value of A, create a new descendant of the node
- Sort training examples to the leaf nodes
- If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
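Below is a compact sketch of this induction loop, run on the spam table from slide 8 (illustrative Python; the slide does not fix a split criterion, so information gain, i.e. minimum weighted child entropy, is assumed):

```python
from collections import Counter
from math import log2

# The spam data set from slide 8: attributes -> class label
DATA = [
    ({"money": "yes", "domain": "com", "attach": "yes", "time": "night"}, "yes"),
    ({"money": "yes", "domain": "edu", "attach": "no",  "time": "night"}, "yes"),
    ({"money": "no",  "domain": "com", "attach": "yes", "time": "night"}, "yes"),
    ({"money": "no",  "domain": "edu", "attach": "no",  "time": "day"},   "no"),
    ({"money": "no",  "domain": "com", "attach": "no",  "time": "day"},   "no"),
    ({"money": "yes", "domain": "cat", "attach": "no",  "time": "day"},   "yes"),
]

def entropy(examples):
    counts = Counter(label for _, label in examples)
    return -sum(c / len(examples) * log2(c / len(examples))
                for c in counts.values())

def best_attribute(examples, attributes):
    """'Best' = the attribute whose split minimizes weighted child
    entropy, i.e. maximizes information gain."""
    def remainder(a):
        score = 0.0
        for v in {x[a] for x, _ in examples}:
            subset = [(x, y) for x, y in examples if x[a] == v]
            score += len(subset) / len(examples) * entropy(subset)
        return score
    return min(attributes, key=remainder)

def build_tree(examples, attributes):
    labels = {label for _, label in examples}
    if len(labels) == 1:                  # perfectly classified: STOP
        return labels.pop()
    if not attributes:                    # no attributes left: majority vote
        return Counter(l for _, l in examples).most_common(1)[0][0]
    a = best_attribute(examples, attributes)   # A <- the "best" attribute
    children = {                               # one descendant per value of A
        v: build_tree([(x, y) for x, y in examples if x[a] == v],
                      [b for b in attributes if b != a])
        for v in {x[a] for x, _ in examples}
    }
    return (a, children)

def classify(tree, instance):
    while isinstance(tree, tuple):        # walk down to a leaf
        attribute, children = tree
        tree = children[instance[attribute]]
    return tree

tree = build_tree(DATA, ["money", "domain", "attach", "time"])
new = {"money": "yes", "domain": "edu", "attach": "yes", "time": "day"}
print(classify(tree, new))                # -> "yes"
```

VFDT, on the next slide, adapts exactly this loop to streams by deciding each split from a sample of the data rather than from all of it.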

11. VFDT / CVFDT

Very Fast Decision Tree: VFDT
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000.
- With high probability, constructs a model identical to the one a traditional (greedy) method would learn
- With theoretical guarantees on the error rate
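The probabilistic tool behind VFDT's guarantee is the Hoeffding bound. A minimal sketch of the split test (illustrative Python with assumed names; R is the range of the split criterion, e.g. log2 of the number of classes for information gain):

```python
from math import log, log2, sqrt

def hoeffding_bound(value_range, delta, n):
    """With probability >= 1 - delta, the true mean of a random variable
    with range `value_range` is within epsilon of its mean over n samples."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))

def should_split(gain_best, gain_second, n, n_classes=2, delta=1e-7):
    """Split when the best attribute's observed gain advantage over the
    runner-up exceeds epsilon: then, with probability >= 1 - delta, the
    batch learner would have chosen the same attribute."""
    epsilon = hoeffding_bound(log2(n_classes), delta, n)
    return gain_best - gain_second > epsilon

# After 5000 examples, a 0.05 gain advantage is already decisive:
print(should_split(0.25, 0.20, n=5000))  # True (epsilon ~= 0.040)
```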

12. VFDT / CVFDT

Concept-adapting Very Fast Decision Trees: CVFDT
G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001.
- Keeps its model consistent with a sliding window of examples
- Constructs "alternative branches" as preparation for changes
- If an alternative branch becomes more accurate, the tree switches to that branch

13. Decision Trees: CVFDT

No theoretical guarantees on the error rate of CVFDT.

CVFDT parameters:
1. W: the example window size.
2. T_0: number of examples used to check at each node if the splitting attribute is still the best.
3. T_1: number of examples used to build the alternate tree.
4. T_2: number of examples used to test the accuracy of the alternate tree.

14. Decision Trees: ADWIN-DT

The ADWIN-DT improvements consist of:
- replacing the frequency-statistics counters by estimators: no window of stored examples is needed, because the estimators maintain the required statistics
- changing the way the substitution of alternate subtrees is checked, by using a change detector with theoretical guarantees

Summary:
1. Theoretical guarantees
2. No parameters

15. Time Change Detectors and Predictors: A General Framework

[Diagram: the input x_t feeds an Estimator, which outputs an estimation.]

16. Time Change Detectors and Predictors: A General Framework

[Diagram: the Estimator's output also feeds a Change Detector, which outputs an alarm.]

17. Time Change Detectors and Predictors: A General Framework

[Diagram: a Memory module is added, exchanging data with both the Estimator and the Change Detector.]
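Read as code, the diagram suggests an interface like the following (an illustrative Python stub, not the authors' implementation; the threshold detector is a placeholder for the role that ADWIN fills with guarantees):

```python
class Estimator:
    """Consumes the input stream x_t and outputs an estimation
    (here simply the running mean)."""
    def __init__(self):
        self.total, self.count = 0.0, 0   # this pair plays the "Memory" role

    def update(self, x):
        self.total += x
        self.count += 1
        return self.total / self.count    # the estimation output

class ChangeDetector:
    """Watches the estimation and outputs an alarm when it moves too far
    from the baseline recorded at the last alarm. Placeholder logic only."""
    def __init__(self, threshold=0.3):
        self.threshold, self.baseline = threshold, None

    def detect(self, estimation):
        if self.baseline is None:
            self.baseline = estimation
            return False
        if abs(estimation - self.baseline) > self.threshold:
            self.baseline = estimation    # re-anchor after signalling change
            return True
        return False

est, det = Estimator(), ChangeDetector()
for t, x in enumerate([0.0] * 50 + [1.0] * 50):   # abrupt drift at t = 50
    if det.detect(est.update(x)):
        print(f"alarm at t={t}")                  # fires some steps after t=50
```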

18. Window Management Models

W = 101010110111111

- Equal & fixed-size subwindows [Kifer+ 04]: 1010 | 1011011 | 1111
- Total window against subwindow [Gama+ 04]: 10101011011 | 1111
- Equal-size adjacent subwindows [Dasu+ 06]: 1010101 | 1011 | 1111
- ADWIN (all adjacent subwindows): 1 | 01010110111111

19-24. Window Management Models

(Animation: ADWIN slides the cut point over every split of W into two adjacent subwindows, while the other three models stay fixed: 10 | 1010110111111, 101 | 010110111111, 1010 | 10110111111, 10101 | 0110111111, 101010 | 110111111, 1010101 | 10111111.)
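A simplified sketch of that "all adjacent subwindows" check (illustrative Python; the real ADWIN keeps W compressed in an exponential histogram of O(log |W|) buckets rather than storing it whole, while the ε_cut below is the Hoeffding-style bound from the ADWIN paper, with m the harmonic mean of the two subwindow sizes):

```python
from math import log, sqrt

def adwin_step(window, x, delta=0.01):
    """Append x to W, then test every split of W into adjacent subwindows
    W0 . W1; whenever their means differ by more than eps_cut, drop the
    older subwindow W0 and re-check.  Simplified: W is stored in full."""
    window.append(x)
    cut = True
    while cut:
        cut = False
        n, total = len(window), sum(window)
        ln_term = log(4 * n / delta)
        head = 0.0
        for i in range(1, n):                 # every adjacent split W0 | W1
            head += window[i - 1]
            n0, n1 = i, n - i
            m = 1 / (1 / n0 + 1 / n1)         # harmonic mean of the sizes
            eps_cut = sqrt(ln_term / (2 * m))
            if abs(head / n0 - (total - head) / n1) > eps_cut:
                del window[:i]                # forget the stale prefix
                cut = True
                break

window = []
for t in range(2000):
    adwin_step(window, 0.2 if t < 1000 else 0.8)   # abrupt drift at t = 1000
print(len(window))   # ~1000, not 2000: pre-drift data was dropped
```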
