
Measuring the on-lineness of data streams

Measuring the on-lineness of data streams. Manfred K. Warmuth, Jiazhong Nie, University of California - Santa Cruz. Dec. 10, 2015, NIPS workshop on Easy Data. Includes some earlier work with Corrie Scalisi, Robert Gramacy, Scott Brandt


  1. Measuring the “on-lineness” of data streams. Manfred K. Warmuth, Jiazhong Nie, University of California - Santa Cruz. Dec. 10, 2015, NIPS workshop on Easy Data. Includes some earlier work with Corrie Scalisi, Robert Gramacy, Scott Brandt and Ismail Ari.

  2. Goals: Design on-line algorithms in domains that are outside the reach of theory. Design good comparators that exploit the on-lineness of the data.

  3. 1. Disk spindown problem [HLSS]: When to spin down the disk on your laptop? The best time-out is time, user and usage dependent.

  4. Non-convex loss: If idle times are expected to be short, then a long timeout is better; if they are expected to be long, then a short timeout is better.
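To make the short/long trade-off concrete, here is a minimal sketch of a per-idle-period cost. The cost model is an assumption for illustration and is not taken from the slides: the disk spends one unit of energy per second while spinning, and a spin-down followed by a spin-up costs a fixed `spin_cost`. The names `spindown_cost`, `total_cost` and `spin_cost` are hypothetical.

```python
def spindown_cost(idle_time, timeout, spin_cost):
    """Assumed energy cost of one idle period for a given timeout."""
    if idle_time <= timeout:
        # The disk keeps spinning for the whole idle period.
        return idle_time
    # The disk spins until the timeout, then pays for spin-down and spin-up.
    return timeout + spin_cost


def total_cost(idle_times, timeout, spin_cost=10.0):
    """Sum of the assumed costs over a list of idle periods."""
    return sum(spindown_cost(x, timeout, spin_cost) for x in idle_times)


short_idles = [1.0, 2.0, 1.5]
long_idles = [60.0, 90.0, 45.0]
```

Under this assumed model a long timeout wins on `short_idles` and a short timeout wins on `long_idles`. As a function of the timeout, the cost of a single idle period jumps down at `timeout == idle_time`, which is why it is not convex.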

  5. 2. Caching [BWBA]: Want to build a combined caching policy from 12 base policies (our experts): LRU, RAND, FIFO, LIFO, LFU, MFU, SIZE, GDS, GD*, GDSF, LFUDA

  6. Characteristics Vary with Time

  7. Best Policy Varies with Time

  8. Permuting trick for disk spindown data. Figure panels: on-line :-) and not on-line :-(

  9. Permuting caching data: highly on-line data; some caching policies already on-line
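The permuting trick can be sketched as follows: score the stream with a comparator that is allowed to change its choice over time, then score a randomly permuted copy of the same stream. If the order matters (the data is "on-line"), the permuted copy should look worse. The comparator below is a deliberately simple stand-in (best single expert inside each of `k` equal-length blocks), not the BestShift comparator from the slides, and it assumes the data is given as a table `losses[t][i]` of per-trial losses for each expert. All function names are hypothetical.

```python
import random


def blockwise_best(losses, k):
    """Sum over k equal-length blocks of the best single expert's loss in the block."""
    t = len(losses)
    n = len(losses[0])
    total = 0.0
    for b in range(k):
        lo, hi = b * t // k, (b + 1) * t // k
        total += min(sum(losses[s][i] for s in range(lo, hi)) for i in range(n))
    return total


def permuted_copy(losses, seed=0):
    """Return the same trials in a random order (seeded for repeatability)."""
    rows = list(losses)
    random.Random(seed).shuffle(rows)
    return rows


# Toy stream: expert 0 is perfect in the first half, expert 1 in the second half.
stream = [[0, 1]] * 20 + [[1, 0]] * 20
original_score = blockwise_best(stream, k=2)
permuted_score = blockwise_best(permuted_copy(stream), k=2)
```

On this toy stream the original order scores 0, while the permuted copy mixes both kinds of trials into each block and can do no better than the best single expert would allow.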

  10. Using comparators to measure on-lineness of data. Properties: Should exploit on-lineness of data. Might be too expensive to compute in practice, but can serve as a goal to compare against. Might rely on information not available to the on-line algorithm.

  11. Idea 1: Use dynamic programming to compute the BestShift(K) curve. Partition of the timeline into K segments; BestFixed in each segment. (Figure: example timeline with segments labeled 2, 4, 7.)

  12. BestFixed(K). Dynamic programming: O(K N^2 T) [H], where K = # of partitions, N = # of discrete idle times, T = # of trials
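A minimal sketch of such a dynamic program, assuming the data is available as a table `losses[t][i]` giving the loss of choice `i` (for example a discretized timeout) on trial `t`, and assuming losses add up over trials. This is an illustration of the idea, not the authors' implementation; `best_shift_curve` and its arguments are hypothetical names. The inner minimum is left in its naive form, so the running time is O(K N^2 T); caching that minimum per layer would save a factor of N.

```python
def best_shift_curve(losses, max_k):
    """curve[k-1] = least total loss of any partition into at most k segments,
    where each segment uses one fixed choice throughout."""
    n = len(losses[0])
    inf = float("inf")
    # dp[k][i]: least loss on the trials seen so far using exactly k segments,
    # with choice i in the current (last) segment. dp[0] stays at infinity.
    dp = [[inf] * n for _ in range(max_k + 1)]
    dp[1] = list(losses[0])
    for row in losses[1:]:
        new = [[inf] * n for _ in range(max_k + 1)]
        for k in range(1, max_k + 1):
            for i in range(n):
                stay = dp[k][i]                               # extend the current segment
                switch = min(dp[k - 1][j] for j in range(n))  # start a new segment here
                new[k][i] = row[i] + min(stay, switch)
        dp = new
    curve, best = [], inf
    for k in range(1, max_k + 1):
        best = min(best, min(dp[k]))
        curve.append(best)
    return curve


# Toy data: choice 0 is best for three trials, then choice 1 is best.
toy = [[0, 1]] * 3 + [[1, 0]] * 3
curve = best_shift_curve(toy, max_k=3)
```

The first entry of the curve is the loss of the best fixed choice; how quickly the curve drops as more segments are allowed is what the comparison between original and permuted data looks at.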

  13. BestShift curves. Figure panels: on-line and not on-line

  14. Comparators for caching. BestFixed: a posteriori best of 12 policies on the entire request stream. BestRefetching(R): minimum number of misses with at most R refetches in any sequence of switching policies.

  15. Refetches & Policy Switches. Comparator: all sequences of the form [figure not recovered]. We plot miss rate vs. refetches.

  16. BestRefetching(R). Dynamic programming: O(R N^2 T) [H]

  17. Our theoretically sound algorithms become heuristics: Use loss and share updates on non-convex losses. Build a merged cache that does not correspond to the mixture.
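For readers unfamiliar with the two updates, here is a minimal sketch of one common form: an exponential loss update followed by a share update that mixes in a little of the uniform distribution. The slides do not say which variant was used, so the learning rate `eta`, the share rate `alpha` and the uniform mixing are assumptions made for illustration.

```python
import math


def loss_update(weights, losses, eta):
    """Multiply each expert's weight by exp(-eta * loss) and renormalize."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(scaled)
    return [s / z for s in scaled]


def share_update(weights, alpha):
    """Move a fraction alpha of the total weight back to the uniform distribution."""
    n = len(weights)
    return [(1 - alpha) * w + alpha / n for w in weights]


w = loss_update([0.5, 0.5], [0.0, 1.0], eta=1.0)
s = share_update(w, alpha=0.1)
```

The share step keeps every weight at or above `alpha / n`, so an expert that did badly in the past can recover quickly once it starts doing well, which is what allows the weights to follow a changing best expert.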

  18. Spindown results. Figure panels: on-line :-) and not on-line :-(

  19. Caching: we “track” the best policy

  20. WWk

  21. UMo

  22. SMoLRU

  23. Idea 2: Split into even/odd requests. (Figure: requests R1 to R10 grouped into Pair1 to Pair5; within each pair one request is used for training and the other for testing.) Best partition based on training set; performance based on test set.
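The even/odd idea can be sketched on a simplified setting. The assumptions here are strong and are not from the slides: per-request losses for each policy are given as a table `losses[t][i]` and simply add up (real cache misses depend on cache state), and only partitions into two segments are searched, by brute force. All function names are hypothetical.

```python
def split_even_odd(losses):
    """Even-indexed requests form the training set, odd-indexed ones the test set."""
    return losses[0::2], losses[1::2]


def segment_loss(losses, lo, hi, policy):
    """Total loss of one policy on requests lo .. hi-1."""
    return sum(losses[t][policy] for t in range(lo, hi))


def best_two_segment_partition(losses):
    """Brute-force search for the change point c and policies (a, b) minimizing
    the loss of policy a on [0, c) plus policy b on [c, T)."""
    t, n = len(losses), len(losses[0])
    best = None
    for c in range(t + 1):
        for a in range(n):
            for b in range(n):
                total = segment_loss(losses, 0, c, a) + segment_loss(losses, c, t, b)
                if best is None or total < best[0]:
                    best = (total, c, a, b)
    return best


def evaluate_partition(losses, c, a, b):
    """Loss of a partition chosen elsewhere when applied to this set."""
    c = min(c, len(losses))
    return segment_loss(losses, 0, c, a) + segment_loss(losses, c, len(losses), b)


stream = [[0, 1]] * 6 + [[1, 0]] * 6
train, test = split_even_odd(stream)
train_loss, c, a, b = best_two_segment_partition(train)
test_loss = evaluate_partition(test, c, a, b)
```

Because the partition is chosen on the training half only, a low test loss indicates structure in the request order rather than overfitting; on randomly permuted data the test loss would not be expected to follow the training loss down.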

  24. Miss Rate of Testing Requests. No overfitting to random data: testing miss rate goes up immediately. (Figure: miss rate, about 0.035 to 0.055, vs. refetch rate, 0 to 0.08; curves: random permuted data train, random permuted data test, original data train, original data test.)

  25. Upshot! Don’t be afraid to use your algorithms as heuristics in domains where the theory breaks down
