Efficient Computation of Parsimonious Temporal Aggregation
Giovanni Mahlknecht, Anton Dignös, Johann Gamper
Free University of Bozen-Bolzano, Italy
ADBIS 2015, September 8-11, 2015, Futuroscope, Poitiers, France
ADBIS 2015 1/23 G. Mahlknecht et al.

Outline
◮ Introduction
◮ Diagonal Pruning
◮ Split Point Graph
◮ Experimental Evaluation

Instant Temporal Aggregation (ITA)
Patient treatment periods with daily cost (P = patient, C = daily cost, T = treatment period in days):

      P     C    T
r1    Bob   600  [1,4]
r2    Mary  400  [1,2]
r3    Eve    40  [3,3]
r4    Eric  310  [3,4]
r5    Joe    30  [6,6]
r6    John  300  [6,9]
r7    Alex  100  [9,9]

ITA: SUM(C) at each time point:

      Val   T
s1    1000  [1,2]
s2     950  [3,3]
s3     910  [4,4]
s4     330  [6,6]
s5     300  [7,8]
s6     400  [9,9]

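The ITA step on this slide can be reproduced with a naive day-by-day scan. This is a minimal sketch, not the authors' implementation: the function name `ita_sum` and the tuple layout are hypothetical, and it assumes that consecutive days with the same aggregate value are coalesced into one result tuple.

```python
from typing import List, Tuple

# Input tuple: (patient, daily cost, start day, end day), interval inclusive.
Treatment = Tuple[str, int, int, int]

def ita_sum(r: List[Treatment]) -> List[Tuple[int, int, int]]:
    """Return (SUM(C), start, end) tuples; days with no treatment are skipped."""
    first = min(t[2] for t in r)
    last = max(t[3] for t in r)
    result: List[Tuple[int, int, int]] = []
    for day in range(first, last + 1):
        costs = [c for _, c, s, e in r if s <= day <= e]
        if not costs:
            continue  # gap: no tuple is valid on this day
        total = sum(costs)
        # Assumption: coalesce with the previous tuple if it ends the day
        # before and carries the same aggregate value.
        if result and result[-1][0] == total and result[-1][2] == day - 1:
            result[-1] = (total, result[-1][1], day)
        else:
            result.append((total, day, day))
    return result

r = [("Bob", 600, 1, 4), ("Mary", 400, 1, 2), ("Eve", 40, 3, 3),
     ("Eric", 310, 3, 4), ("Joe", 30, 6, 6), ("John", 300, 6, 9),
     ("Alex", 100, 9, 9)]
s = ita_sum(r)
for value, start, end in s:
    print(value, [start, end])
```

On the slide's input this yields the six tuples s1 to s6, with day 5 left uncovered.
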
Parsimonious Temporal Aggregation (PTA)
Input: ITA result
Output: ITA tuples merged down to size c with minimum error
Rules:
◮ Merge only adjacent tuples
◮ The merged value is the weighted mean (weighted by interval length)
◮ The error is the sum squared error (SSE)
◮ The result is of size c
Examples on the ITA result (1000 [1,2], 950 [3,3], 910 [4,4], 330 [6,6], 300 [7,8], 400 [9,9]):
◮ Merging 1000 [1,2] and 950 [3,3]: value = 983.33, error = 1667
◮ Merging 300 [7,8] and 400 [9,9]: value = 333.33, error = 6667
◮ Merging 330 [6,6], 300 [7,8] and 400 [9,9]: value = 332.5, error = 6675

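The merge rules can be checked numerically. The sketch below assumes that each tuple is weighted by the number of days it covers, which is the reading that reproduces the values on the slide; the helper name `merge` is hypothetical.

```python
from typing import List, Tuple

# ITA tuple: (value, start day, end day), interval inclusive.
ItaTuple = Tuple[float, int, int]

def merge(tuples: List[ItaTuple]) -> Tuple[float, float]:
    """Return (weighted mean, sum squared error) of merging adjacent tuples."""
    weights = [end - start + 1 for _, start, end in tuples]
    mean = sum(v * w for (v, _, _), w in zip(tuples, weights)) / sum(weights)
    error = sum(w * (v - mean) ** 2 for (v, _, _), w in zip(tuples, weights))
    return mean, error

s = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]

for group in (s[0:2], s[4:6], s[3:6]):
    value, error = merge(group)
    print(f"value = {value:.2f}, error = {error:.0f}")
# value = 983.33, error = 1667
# value = 333.33, error = 6667
# value = 332.50, error = 6675
```
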
PTA Optimal Solution
Result of ITA for SUM(C) (size n = 6): 1000 [1,2], 950 [3,3], 910 [4,4], 330 [6,6], 300 [7,8], 400 [9,9]
◮ Merging 950 and 910: error = 800
◮ Merging 330 and 300: error = 600
Result of PTA (c = 4), total error 1,400 (optimal solution): 1000 [1,2], 930 [3,4], 310 [6,8], 400 [9,9]

Split Points / Split Path
◮ PTA computes a split path (a sequence of split points)
◮ Tuples between consecutive split points are merged
[Figure: ITA result (1000, 950, 910, 330, 300, 400 over days 1-9) with split 1, split 3 and split 5 marked; the two merges have error = 800 and error = 600]
Split Path: [1, 3, 5]
Merged result: 1,000, 930, 310, 400

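To illustrate how a split path determines the result, the following sketch applies the path [1, 3, 5] to the ITA result. It assumes that a split point j denotes a cut after the j-th ITA tuple, which matches the groups merged on the slide; `apply_split_path` is a hypothetical helper name.

```python
from typing import List, Tuple

ItaTuple = Tuple[float, int, int]  # (value, start day, end day), inclusive

def apply_split_path(s: List[ItaTuple], path: List[int]) -> List[ItaTuple]:
    """Merge the tuples between consecutive split points into one tuple each."""
    bounds = [0] + list(path) + [len(s)]
    merged = []
    for lo, hi in zip(bounds, bounds[1:]):
        group = s[lo:hi]
        weights = [end - start + 1 for _, start, end in group]
        value = sum(v * w for (v, _, _), w in zip(group, weights)) / sum(weights)
        # The merged tuple spans from the first start to the last end.
        merged.append((value, group[0][1], group[-1][2]))
    return merged

s = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]
result = apply_split_path(s, [1, 3, 5])
print(result)
```

With three split points the six ITA tuples collapse into four result tuples, i.e. c = 4.
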
PTA Existing Algorithm
Dynamic Programming Algorithm

Error Matrix E (only the last two rows are used)
       i=1     2     3     4     5      6
k=1      0  1667  5700     ∞     ∞      ∞
  2      -     0   800  5700  6300  12375
  3      -     -     0   800  1400   6300
  4      -     -     -     0   600   1400
E_{i,k}: minimum error in reducing the first i tuples to size k

Split Point Matrix J (the whole matrix is used)
       i=1  2  3  4  5  6
k=1      0  0  0  0  0  0
  2      0  1  1  3  3  3
  3      0  0  2  3  3  5
  4      0  0  0  3  3  5
J_{i,k}: optimum split point when reducing the first i tuples to size k

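The two matrices can be reproduced with a straightforward dynamic program. This is a minimal sketch under stated assumptions, not the original PTA code: merging across a temporal gap is given infinite error (consistent with the ∞ entries in row k = 1), the error of a group is recomputed naively, and the full matrix E is kept for readability even though only two rows are required. The names `sse` and `pta` are hypothetical.

```python
import math
from typing import List, Tuple

ItaTuple = Tuple[float, int, int]  # (value, start day, end day), inclusive

def sse(group: List[ItaTuple]) -> float:
    """Error of merging `group` into one tuple; inf if a gap separates them."""
    for (_, _, end), (_, start, _) in zip(group, group[1:]):
        if end + 1 != start:
            return math.inf
    weights = [e - s + 1 for _, s, e in group]
    mean = sum(v * w for (v, _, _), w in zip(group, weights)) / sum(weights)
    return sum(w * (v - mean) ** 2 for (v, _, _), w in zip(group, weights))

def pta(s: List[ItaTuple], c: int) -> Tuple[float, List[int]]:
    """Return (minimum total error, split path) for reducing s to c tuples."""
    n = len(s)
    # E[k][i] and J[k][i] are 1-based in k and i; index 0 is padding.
    E = [[math.inf] * (n + 1) for _ in range(c + 1)]
    J = [[0] * (n + 1) for _ in range(c + 1)]
    for i in range(1, n + 1):
        E[1][i] = sse(s[:i])
    for k in range(2, c + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # last group is tuples j+1..i
                err = E[k - 1][j] + sse(s[j:i])
                if err < E[k][i]:
                    E[k][i], J[k][i] = err, j
    # Backtrack through J from cell (c, n) to recover the split path.
    path, i = [], n
    for k in range(c, 1, -1):
        i = J[k][i]
        path.append(i)
    return E[c][n], path[::-1]

s = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]
error, path = pta(s, 4)
print(error, path)
```

For the running example this gives total error 1400 and split path [1, 3, 5], matching cell (k = 4, i = 6) of both matrices.
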
Problem and Contribution
Problem
◮ The runtime and space requirements of the existing algorithm do not scale
Contribution
◮ Diagonal Pruning: reduces the computational complexity by avoiding unnecessary computations
◮ Split Point Graph: reduces the space complexity
◮ The result remains optimal

Diagonal Pruning
Lemma (Diagonal Pruning): For the computation of the error matrix E and the split point matrix J, there exists an upper bound for the variable i.

Split point matrix J (cells beyond the bound are marked red on the slide):
       i=1  2  3  4  5  6
k=1      0  0  0  0  0  0
  2      -  1  1  3  3  3
  3      -  -  2  3  3  5
  4      -  -  -  3  3  5

◮ The red cells need not be computed, which reduces runtime
◮ The corresponding parts of the matrices can be eliminated, which reduces memory

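The slide does not state the bound itself. The sketch below assumes it is i ≤ n - c + k for row k, i.e. enough tuples must remain after position i to form the other c - k result tuples; this assumption is consistent with the "Not computed nodes: 12" figure reported for the running example later in the talk. The names `sse` and `pta_diagonal` are hypothetical.

```python
import math
from typing import List, Tuple

ItaTuple = Tuple[float, int, int]  # (value, start day, end day), inclusive

def sse(group: List[ItaTuple]) -> float:
    """Error of merging `group` into one tuple; inf if a gap separates them."""
    for (_, _, end), (_, start, _) in zip(group, group[1:]):
        if end + 1 != start:
            return math.inf
    weights = [e - s + 1 for _, s, e in group]
    mean = sum(v * w for (v, _, _), w in zip(group, weights)) / sum(weights)
    return sum(w * (v - mean) ** 2 for (v, _, _), w in zip(group, weights))

def pta_diagonal(s: List[ItaTuple], c: int) -> Tuple[float, int]:
    """Return (minimum error, number of matrix cells actually evaluated)."""
    n = len(s)
    E = [[math.inf] * (n + 1) for _ in range(c + 1)]
    cells = 0
    for k in range(1, c + 1):
        upper = n - c + k  # assumed diagonal bound on i for row k
        for i in range(k, upper + 1):
            cells += 1
            if k == 1:
                E[k][i] = sse(s[:i])
            else:
                E[k][i] = min(E[k - 1][j] + sse(s[j:i]) for j in range(k - 1, i))
    return E[c][n], cells

s = [(1000, 1, 2), (950, 3, 3), (910, 4, 4), (330, 6, 6), (300, 7, 8), (400, 9, 9)]
error, cells = pta_diagonal(s, 4)
print(error, cells)
```

Under this assumption every cell a later row reads has already been filled, so the optimal error of 1400 is unchanged while only 12 of the 24 matrix cells are evaluated.
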
Split Point Graph
Challenge: replace the split point matrix J with an alternative structure to reduce memory consumption
Idea: do not store unnecessary nodes
Split Point Graph
◮ Only necessary nodes are inserted
◮ Nodes are removed when they become obsolete (Path Pruning)

Graph Evolution
[Figure: step-by-step construction of the split point graph, one row of nodes 1 to 6 per level, four levels; legend: diagonal pruned nodes, active path, path pruned nodes]
◮ Total number of nodes: 24
◮ Not computed nodes: 12
◮ Path pruned nodes: 4

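One way to realise such a graph is with predecessor pointers and reference counts. The slides do not spell out the pruning rule, so this sketch assumes that a node becomes obsolete once no node in the newest row points to it, and that removing a node may in turn make its predecessor obsolete. The class `SplitPointGraph` and its methods are hypothetical; the split points are taken from the matrix J of the running example, restricted to the cells left after diagonal pruning.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # (k, i): first i tuples reduced to size k

class SplitPointGraph:
    def __init__(self) -> None:
        self.parent: Dict[Node, Optional[Node]] = {}
        self.refs: Dict[Node, int] = {}
        self.pruned = 0

    def add(self, node: Node, parent: Optional[Node]) -> None:
        self.parent[node] = parent
        self.refs[node] = 0
        if parent is not None:
            self.refs[parent] += 1

    def prune_row(self, k: int) -> None:
        """Remove unreferenced row-k nodes, cascading to their predecessors."""
        stack = [nd for nd in self.parent if nd[0] == k and self.refs[nd] == 0]
        while stack:
            nd = stack.pop()
            par = self.parent.pop(nd)
            del self.refs[nd]
            self.pruned += 1
            if par is not None:
                self.refs[par] -= 1
                if self.refs[par] == 0:
                    stack.append(par)

    def path(self, node: Node) -> List[int]:
        """Follow predecessor pointers and return the split path."""
        splits = []
        while self.parent[node] is not None:
            node = self.parent[node]
            splits.append(node[1])
        return splits[::-1]

# Split points J[k][i] of the running example inside the diagonal band.
J = {1: {1: 0, 2: 0, 3: 0}, 2: {2: 1, 3: 1, 4: 3},
     3: {3: 2, 4: 3, 5: 3}, 4: {4: 3, 5: 3, 6: 5}}

g = SplitPointGraph()
for k in sorted(J):
    for i, j in J[k].items():
        g.add((k, i), (k - 1, j) if k > 1 else None)
    if k > 1:
        g.prune_row(k - 1)  # row k-1 is complete; drop what row k does not use

print(g.pruned, g.path((4, 6)))
```

Under these assumptions four nodes are pruned, in line with the count on the slide, and the path from node (4, 6) is still [1, 3, 5].
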
Experimental Configuration
Synthetic Datasets
◮ SYNTH: randomly distributed values
◮ ETDS: evolution of employees in a company
Algorithm Comparisons
◮ PTA: original algorithm
◮ DP: PTA with diagonal pruning
◮ SPG: split point graph with diagonal and path pruning

Runtime: PTA vs Diagonal Pruning
[Figure: two panels, ETDS and SYNTH; x-axis: Reduction size [k]; y-axis: Runtime [sec]; curves: PTA, DP]
Diagonal pruning substantially reduces runtime

Runtime: Split Point Graph vs PTA with Diagonal Pruning
[Figure: two panels, ETDS and SYNTH; x-axis: Reduction size [k]; y-axis: Runtime [sec]; curves: SPG, DP]
The overhead of the dynamic graph structure and path pruning is very small

Space Efficiency: PTA vs SPG (compression to 10%)
[Figure: two panels, ETDS and SYNTH; x-axis: Input cardinality n [k]; y-axis: Memory [MB]; curves: PTA, c=10% and SPG, c=10%]
The graph implementation with diagonal pruning and path pruning substantially reduces space consumption

Space Efficiency: PTA vs SPG (compression to 1%)
[Figure: two panels, ETDS and SYNTH; x-axis: Input cardinality n [k]; y-axis: Memory [MB]; curves: PTA, c=1% and SPG, c=1%]
The graph implementation with diagonal pruning and path pruning substantially reduces space consumption

Space Efficiency: Effect of Path Pruning
[Figure: two panels, ETDS and SYNTH; x-axis: Reduction size c [10^3]; y-axis: Memory [MB]; curves: SPG without Path Pruning, PTA, SPG]
Path pruning is highly effective: it prunes about 2/3 of the graph

Related Work
◮ Tuma, P.: Implementing Historical Aggregates in TempIS. Ph.D. thesis, Wayne State University, Detroit, Michigan (1992)
◮ Kline, N., Snodgrass, R.T.: Computing temporal aggregates. In: ICDE. pp. 222-231 (1995)
◮ Moon, B., Vega Lopez, I.F., Immanuel, V.: Efficient algorithms for large-scale temporal aggregation. IEEE Trans. Knowl. Data Eng. 15(3), 744-759 (2003)
◮ Tao, Y., Papadias, D., Faloutsos, C.: Approximate temporal aggregation. In: ICDE. pp. 190-201 (2004)
◮ Gordevičius, J., Gamper, J., Böhlen, M.H.: Parsimonious temporal aggregation. VLDB J. 21(3), 309-332 (2012)

Conclusion
◮ Diagonal Pruning reduces the runtime of the computation by reducing the search space of the DP scheme adopted by PTA
◮ Split Point Graph in combination with Path Pruning reduces memory consumption
◮ Experiments showed that the two optimizations reduce memory requirements to one third of the original PTA implementation