Parallelization of the PC Algorithm


  1. Parallelization of the PC Algorithm
     Anders L. Madsen (1, 2), Frank Jensen (1), Antonio Salmerón (3), Helge Langseth (4), Thomas D. Nielsen (2)
     (1) Hugin Expert A/S, Aalborg, Denmark
     (2) Dept. of Computer Science, Aalborg University, Denmark
     (3) Dept. of Mathematics, University of Almería, Spain
     (4) Dept. of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
     CAEPIA 2015, Albacete, November 7, 2015

  4. Introduction
     ◮ The AMiDST project: Analysis of MassIve Data STreams, http://www.amidst.eu
     ◮ Large number of variables
     ◮ Massive datasets
     ◮ Hybrid Bayesian networks (involving discrete and continuous variables)
     ◮ Conditional linear Gaussian (CLG) networks
     Objectives
     ◮ Scale up the PC algorithm for learning CLG networks from large volumes of data.
     ◮ Take advantage of parallel computing environments with shared memory.

  6. The PC algorithm
     1. Determine pairwise (conditional) independence I(X, Y; S).
     2. Identify the skeleton of G.
     3. Identify v-structures in G.
     4. Identify derived directions in G.
     5. Complete the orientation of G, making it a DAG.
     Remarks
     ◮ Step 1 takes most of the computing time
     ◮ Marginal independence (S = ∅) is tested first
     ◮ Only potential neighbours are included in the conditioning set
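Steps 1 and 2 can be illustrated with a minimal sketch of a level-wise skeleton search. This is not the HUGIN implementation: the function names, the `independent` callback (standing in for the statistical test of I(X, Y; S)) and the choice to draw conditioning sets from the current neighbours of X and then of Y are assumptions made for illustration.

```python
from itertools import combinations


def find_sepset(x, y, adj, size, independent):
    """Return a conditioning set of the given size that separates x and y, or None."""
    tried = set()
    # Only current (potential) neighbours of x or of y may enter the conditioning set.
    for base in (adj[x] - {y}, adj[y] - {x}):
        for s in combinations(sorted(base), size):
            if s in tried:
                continue  # do not repeat a test already made from the other side
            tried.add(s)
            if independent(x, y, set(s)):
                return set(s)
    return None


def pc_skeleton(variables, independent, max_size=3):
    """Start from the complete graph and drop X - Y once I(X, Y; S) is found."""
    adj = {v: set(variables) - {v} for v in variables}
    sepsets = {}
    for size in range(max_size + 1):  # size 0 is the marginal test, S = empty set
        for x, y in combinations(variables, 2):
            if y not in adj[x]:
                continue  # edge already removed at an earlier step
            s = find_sepset(x, y, adj, size, independent)
            if s is not None:
                adj[x].discard(y)
                adj[y].discard(x)
                sepsets[frozenset((x, y))] = s
    return adj, sepsets


# Hypothetical oracle for the chain A - B - C: A and C are independent given B.
def chain_oracle(x, y, s):
    return {x, y} == {"A", "C"} and "B" in s


skeleton, sepsets = pc_skeleton(["A", "B", "C"], chain_oracle)
```

The stored separating sets are what Step 3 would consult when identifying v-structures.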

  7. Our proposal for parallelisation
     We propose to parallelise Step 1 (pairwise conditional independence tests):
     1. Test all pairs X and Y for marginal independence.
        ◮ Use BIB designs
     2. Perform the most promising higher-order conditional independence tests.
        ◮ We create an edge index array, which the threads iterate over to select the next edge to evaluate in each iteration.
        ◮ The edge index array contains all edges that have not been removed at an earlier step, and it is sorted in decreasing order of the test score
        ◮ Tests of size |S| = 1, 2, 3 may be performed.
     3. Perform the remaining tests of conditional independence I(X, Y; S) where |S| = 1, 2, 3.
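The way threads share the edge index array in Step 2 can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: `run_edge_tests`, `edge_scores` and the `test_edge` callback are hypothetical names, and Python threads are used only to show the control flow (CPython's global interpreter lock prevents a real speed-up on CPU-bound tests).

```python
import threading


def run_edge_tests(edge_scores, test_edge, num_threads=4):
    """Let several threads pull edges from a shared, score-sorted index array.

    edge_scores maps each surviving edge to its marginal test score;
    test_edge(edge) returns True when an independence is found for the edge.
    """
    # Edge index array: surviving edges in decreasing order of test score.
    edge_index = sorted(edge_scores, key=edge_scores.get, reverse=True)
    found = [False] * len(edge_index)
    next_pos = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_pos
        while True:
            with lock:  # selecting the next edge is the only synchronised step
                if next_pos >= len(edge_index):
                    return
                pos = next_pos
                next_pos += 1
            # Each position is written by exactly one thread, so no lock is needed here.
            found[pos] = test_edge(edge_index[pos])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    removed = {edge for edge, hit in zip(edge_index, found) if hit}
    return edge_index, removed


# Hypothetical scores and test outcome for three edges.
scores = {("A", "B"): 5.0, ("A", "C"): 1.0, ("B", "C"): 3.0}
order, removed = run_edge_tests(scores, lambda edge: edge == ("A", "C"))
```

In this sketch each test is independent of the others; once the outcome of one test influences which tests are run for another edge, the result can depend on how the threads interleave.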

  8. Balanced Incomplete Block (BIB) designs
     ◮ A concept from the statistical design of experiments that provides a way of arranging experimental units when testing the effectiveness of a treatment
     A design is a pair (X, A) such that the following properties are satisfied:
     1. X is a set of elements called points, and
     2. A is a collection of nonempty subsets of X called blocks.
     Let v, k and λ be positive integers such that v > k ≥ 2. A (v, k, λ)-BIB design is a design (X, A) such that the following properties are satisfied:
     1. |X| = v,
     2. each block contains exactly k points, and
     3. every pair of distinct points is contained in exactly λ blocks.
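The three defining properties can be checked mechanically. The following is a small sketch; the function name `is_bib_design` is an assumption made for illustration, and v is taken to be the number of points supplied.

```python
from collections import Counter
from itertools import combinations


def is_bib_design(points, blocks, k, lam):
    """Check the (v, k, lam)-BIB properties, with v = number of points."""
    points = set(points)
    if not (len(points) > k >= 2):
        return False
    # Property 2: each block contains exactly k points, all drawn from X.
    for block in blocks:
        if len(set(block)) != k or not set(block) <= points:
            return False
    # Property 3: every pair of distinct points lies in exactly lam blocks.
    counts = Counter(
        frozenset(pair)
        for block in blocks
        for pair in combinations(sorted(set(block)), 2)
    )
    return all(
        counts[frozenset(pair)] == lam
        for pair in combinations(sorted(points), 2)
    )


# A (7, 3, 1) design obtained by shifting the base block {0, 1, 3} modulo 7.
cyclic_blocks = [{i % 7, (i + 1) % 7, (i + 3) % 7} for i in range(7)]
print(is_bib_design(range(7), cyclic_blocks, k=3, lam=1))
```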

  9. BIB Design Example
     Consider the (7, 3, 1)-BIB design for 14 variables
     ◮ Each point represents two variables
     ◮ Each process is assigned six variables
     The seven blocks (b = 7) are: {0, 1, 3}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 0}, {5, 6, 1}, {6, 0, 2}
     [Figure: the pairwise scoring performed for each block; not recoverable from this transcript]
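One way to turn the seven blocks into pairwise scoring tasks is sketched below. Two details are assumptions not stated on the slide: point p is taken to stand for variables 2p and 2p + 1, and the pair formed by the two variables of a single point, which occurs in every block containing that point, is assigned to the block that lists the point first.

```python
from itertools import combinations

BLOCKS = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 0), (5, 6, 1), (6, 0, 2)]


def variables_of(point):
    """Assumed mapping: point p represents variables 2p and 2p + 1."""
    return (2 * point, 2 * point + 1)


def block_task(block):
    """Variable pairs scored by the process that is assigned this block."""
    # The within-point pair is owned by the block whose first point it is (assumption).
    pairs = [variables_of(block[0])]
    # Pairs across two distinct points: lam = 1 places them in exactly one block.
    for p, q in combinations(block, 2):
        for x in variables_of(p):
            for y in variables_of(q):
                pairs.append(tuple(sorted((x, y))))
    return pairs


tasks = [block_task(block) for block in BLOCKS]
all_pairs = [pair for task in tasks for pair in task]
```

Under these assumptions every process handles six variables, the tasks have equal size, and together they cover each of the 91 variable pairs once.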

  10. Balanced Incomplete Block (BIB) designs
      ◮ The testing is divided into tasks of equal size such that exactly all pairs X, Y are tested for marginal independence
      ◮ This is achieved using BIB designs of the form (q, 6, 1) and then (3, 2, 1), where q is at least the number of variables
      [Figure: the variables X1, X2, ..., Xn are divided into blocks of six, e.g. {X1, X2, X7, X19, X23, X30}; each block is divided into the sub-blocks {X1, X2, X7, X19}, {X7, X19, X23, X30} and {X1, X2, X23, X30}; each sub-block is divided into pairs (X1, X2), (X1, X7), (X1, X19), ...]

  11. Extra heuristics
      ◮ For each edge, we compute the set of most promising tests
      ◮ For each edge (X, Y), the best candidate variables to include in S are identified using the weight of a candidate variable Z, which is equal to the sum of the test scores for (X, Z) and (Y, Z):
            w(Z | (X, Y)) = 2N (MI(Z, X) + MI(Z, Y))
        where MI(·, ·) is the mutual information.
      ◮ We create an array of best candidates with ≤ 7 variables (counts stored in memory), sorted by the sum of the edge weights
      ◮ The threads iterate over the edge index array. A thread performs all tests for a selected edge (with |S| = 1, 2, 3) from the best candidate array. Testing stops as soon as an independence hypothesis is not rejected
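The candidate weight can be computed directly from counts. The sketch below assumes discrete variables stored as equal-length lists, takes N to be the number of cases, and uses the natural logarithm for the mutual information; the names `mutual_information` and `best_candidates` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (natural log) of two discrete samples."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (cx[x] * cy[y]))
        for (x, y), c in cxy.items()
    )


def best_candidates(data, x, y, limit=7):
    """Rank candidates Z for the edge (x, y) by w(Z | (X, Y)), keeping at most `limit`."""
    n = len(data[x])  # N is assumed to be the number of cases
    weights = {
        z: 2 * n * (mutual_information(data[z], data[x])
                    + mutual_information(data[z], data[y]))
        for z in data
        if z not in (x, y)
    }
    ranked = sorted(weights, key=weights.get, reverse=True)[:limit]
    return ranked, weights


# Toy data: Z1 copies X, while Z2 is independent of both X and Y.
data = {
    "X": [0, 0, 1, 1],
    "Y": [0, 1, 0, 1],
    "Z1": [0, 0, 1, 1],
    "Z2": [0, 1, 1, 0],
}
ranked, weights = best_candidates(data, "X", "Y")
```

With the natural logarithm, 2N·MI is the G² (likelihood-ratio) statistic of the corresponding pairwise independence test, which matches the slide's description of the weight as a sum of two test scores.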

  12. Empirical evaluation
      Data set     |X|      Total CPT size
      ship-ship    50       130,478
      Munin1       189      19,466
      Diabetes     413      461,069
      Munin2       1,003    83,920
      sacso        2,371    44,274
      ◮ Software implementation based on HUGIN software
      ◮ Three data sets generated at random for each network with 100,000, 250,000, and 500,000 cases
      ◮ The empirical evaluation is performed on a Linux computer running Red Hat Enterprise Linux 7 with a six-core Intel (TM) i7-5820K 3.3 GHz processor and 64 GB RAM
      ◮ The computer has 6 physical cores and 12 logical cores

  13. Empirical evaluation
      [Figure: average run time in seconds and average speed-up factor against the number of threads (0 to 12). Panels: (a) ship-ship, 500,000 cases; (b) Munin1, 250,000 cases]

  14. Empirical evaluation
      [Figure: average run time in seconds and average speed-up factor against the number of threads (0 to 12). Panels: (c) Diabetes, 250,000 cases; (d) Diabetes, 500,000 cases]

  15. Empirical evaluation
      [Figure: average run time in seconds and average speed-up factor against the number of threads (0 to 12). Panels: (e) Munin2, 250,000 cases; (f) Munin2, 500,000 cases]

  16. Empirical evaluation
      [Figure: average run time in seconds and average speed-up factor against the number of threads (0 to 12). Panels: (g) sacso, 250,000 cases; (h) sacso, 500,000 cases]

  17. Empirical evaluation
      Data set     Skeleton (Step 2)    v-structures (Step 3)    Orientation (Steps 4 and 5)
      ship-ship    0                    0                        0
      Munin1       0.005                0                        0.001
      Diabetes     0.001                0.004                    0.002
      Munin2       0.006                0.002                    0.034
      sacso        0.051                5.692                    0.502

  18. Conclusions
      ◮ Parallelisation of structure learning using the PC algorithm
      ◮ The edge index array is the central bottleneck of the approach, as it is the only element that requires synchronization
      ◮ The number of threads used by the algorithm may impact the result, as the order of tests is not invariant under the number of threads used. This is a topic of future research.
      ◮ The results of the empirical evaluation show a significant improvement in time performance over the purely sequential method.

  19. This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209
