Memory minimization vs. I/O volume minimization in sparse matrix factorization methods



  1. Memory minimization vs. I/O volume minimization in sparse matrix factorization methods. Abdou Guermouche, LaBRI Bordeaux, May 2010

  2. Context: solving sparse linear systems
     Ax = b ⇒ Direct methods: A = LU
     Typical matrix: BRGM matrix
     • 3.7 × 10^6 variables
     • 156 × 10^6 nonzeros in A
     • 4.5 × 10^9 nonzeros in LU
     • 26.5 × 10^12 flops
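To put the BRGM figures in perspective, a rough storage estimate can be sketched as follows. The 8-byte entry size (real double precision) and the absence of index or workspace overhead are assumptions made for illustration; the slide states only the nonzero counts.

```python
# Rough storage estimate for the BRGM example.
# Assumption (not from the slides): real double-precision entries of 8 bytes,
# with no index or workspace overhead counted.
BYTES_PER_ENTRY = 8

nnz_A = 156 * 10**6    # nonzeros in A
nnz_LU = 45 * 10**8    # 4.5 x 10^9 nonzeros in the LU factors

bytes_A = nnz_A * BYTES_PER_ENTRY
bytes_LU = nnz_LU * BYTES_PER_ENTRY
fill_ratio = nnz_LU / nnz_A   # how much larger the factors are than A

print(f"A:  {bytes_A / 1e9:.2f} GB")
print(f"LU: {bytes_LU / 1e9:.2f} GB")
print(f"fill ratio: {fill_ratio:.1f}x")
```

Under these assumptions the factors alone occupy roughly 36 GB, which is the kind of footprint that motivates writing factors to disk.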


  4. Context
     Physical constraint
     [Figure: core memory versus memory required, ending in a memory crash.]
     Software challenge
     • Implementation of an out-of-core execution scheme within MUMPS

  5. Context
     Out-of-core
     [Figure: core memory and disks; the memory required is covered through the use of disks.]
     Software challenge
     • Implementation of an out-of-core execution scheme within MUMPS

  6. Outline
     • Multifrontal method
     • Active memory minimization
     • Algorithm (Liu's Algorithm)
     • Memory issues
     • Limitation of the approach
     • New multifrontal schedules and algorithms
     • Flexible allocation scheme
     • A new memory minimization algorithm
     • Results
     • Total memory minimization
     • How about Volume of I/O?
     • Computing Volume of I/O
     • Minimizing I/O volume
     • Towards an out-of-core flexible allocation
     • Conclusion and Future work


  8. The multifrontal method (Duff, Reid '83)
     [Figure: a 5 × 5 example matrix A and the filled pattern of L+U−I (non-zero and fill-in entries), with the corresponding elimination tree.]
     Storage divided into two parts:
     • Factors systematically written to disk;
     • Active Storage kept in memory.
     [Figure: storage layout showing the factors and the active storage, the latter made of the active frontal matrix and the stack of contribution blocks.]


  15. Memory behaviour (serial postorder traversal)
      [Figure: animation of the memory usage during a postorder traversal of a three-node tree, with nodes labelled 1, 2 and 3.]


  23. Sequential case results
      [Figure: two panels marking the memory peak of a tree traversal, "Worst case." and "Best case."]
      → Algorithms to find the optimal tree traversal have been proposed.


  25. Sequential case: Memory behavior (2/2)
      Consider a parent node in the tree:
      • $n$ is the number of children.
      • $j$ denotes the $j$-th child of the node.
      • $\mathrm{cb}_j$ is the size of the contribution block of child $j$.
      • $m$ is the memory size of the frontal matrix of the parent.
      • $A$ (resp. $A_j$) is the amount of active memory needed to process the parent (resp. child $j$).
      The assembly step requires a storage of:
      $$ m + \sum_{j=1}^{n} \mathrm{cb}_j $$
      The storage required to process child $j$ is:
      $$ A_j + \sum_{k=1}^{j-1} \mathrm{cb}_k $$
      $A$ is thus defined by:
      $$ A = \max\left( \max_{j=1,n}\left( A_j + \sum_{k=1}^{j-1} \mathrm{cb}_k \right),\; m + \sum_{j=1}^{n} \mathrm{cb}_j \right) $$
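The recurrence for A can be evaluated directly on a tree. The following is a minimal sketch, not the MUMPS implementation: the `Node` class, its field names and the example sizes are hypothetical, and a leaf is taken to need only its own frontal matrix (A = m), which is what the formula gives for n = 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: m = frontal matrix size, cb = contribution block size."""
    m: int
    cb: int
    children: List["Node"] = field(default_factory=list)


def active_memory(node: Node) -> int:
    """Peak active memory A of the subtree, children processed in list order."""
    stacked = 0  # sum of cb_k over the children already processed
    peak = 0
    for child in node.children:
        # Processing child j needs A_j on top of the stacked contribution blocks.
        peak = max(peak, active_memory(child) + stacked)
        stacked += child.cb
    # Assembly step: parent frontal matrix plus all contribution blocks.
    return max(peak, node.m + stacked)


# Two leaves under one parent, evaluated in both child orders.
a = Node(m=10, cb=2)
b = Node(m=4, cb=3)
peak_ab = active_memory(Node(m=5, cb=0, children=[a, b]))
peak_ba = active_memory(Node(m=5, cb=0, children=[b, a]))
print(peak_ab, peak_ba)
```

In this toy case the order (a, b) peaks at 10 while (b, a) peaks at 13, which illustrates why the child order matters.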


  29. Liu's Algorithm
      Liu's Theorem (tree pebbling theorem): the minimum of $\max_j \left( x_j + \sum_{i=1}^{j-1} y_i \right)$ is obtained when the sequence $(x_i, y_i)$ is sorted in decreasing order of $x_i - y_i$.
      Consequence: an optimal child sequence is obtained by rearranging the children nodes in decreasing order of $A_i - \mathrm{cb}_i$.
      Algorithm:
      • Bottom-up greedy process.
      • Apply Liu's theorem at each level of the tree.
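The bottom-up process can be sketched as below. This is an illustrative reading of the slide, not the author's code: the `Node` class, the function name `liu_reorder` and the example tree are hypothetical. Since the assembly term (frontal matrix plus all contribution blocks) does not depend on the child order, sorting the children by decreasing A_i − cb_i minimizes the peak at each node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: m = frontal matrix size, cb = contribution block size."""
    m: int
    cb: int
    children: List["Node"] = field(default_factory=list)


def liu_reorder(node: Node) -> int:
    """Reorder children in place, bottom-up, and return the resulting peak A."""
    # Children are reordered first, so each A_i is already minimal.
    scored = [(liu_reorder(child), child) for child in node.children]
    # Liu's theorem: sort by decreasing A_i - cb_i.
    scored.sort(key=lambda pair: pair[0] - pair[1].cb, reverse=True)
    node.children = [child for _, child in scored]

    stacked = 0  # contribution blocks of children already processed
    peak = 0
    for a_i, child in scored:
        peak = max(peak, a_i + stacked)
        stacked += child.cb
    return max(peak, node.m + stacked)


# Small two-level example, deliberately given in a poor initial order.
b = Node(m=4, cb=3)
c = Node(m=8, cb=5)
s = Node(m=2, cb=4, children=[b, c])
a = Node(m=10, cb=2)
root = Node(m=1, cb=0, children=[s, a])

best_peak = liu_reorder(root)
print(best_peak)
```

On this example the initial order peaks at 14, while the reordered tree (c before b, a before s) peaks at 12.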


  31. Limitation of the Classical scheme
      [Figure: two memory profiles, "Classical approach." and "Flexible scheme.", each marking the allocation of the father and the memory peak.]
      → Decoupling the allocation and the computations can improve the memory behavior.

