Memory minimization vs. I/O volume minimization in sparse matrix factorization methods
Abdou Guermouche, LaBRI Bordeaux, May 2010
Context
Solving sparse linear systems Ax = b ⇒ Direct methods: A = LU
Typical matrix: BRGM matrix
• 3.7 × 10^6 variables
• 156 × 10^6 nonzeros in A
• 4.5 × 10^9 nonzeros in LU
• 26.5 × 10^12 flops
Context: physical constraint
[Figure: the memory required exceeds the core memory, leading to a memory crash.]
Software challenge:
• Implementation of an out-of-core execution scheme within MUMPS
Context: out-of-core
[Figure: the memory required is covered by the core memory plus disks (use of disks).]
Software challenge:
• Implementation of an out-of-core execution scheme within MUMPS
Outline
• Multifrontal method
• Active memory minimization Algorithm (Liu's Algorithm)
• Memory issues
• Limitation of the approach
• New multifrontal schedules and algorithms
• Flexible allocation scheme
• A new memory minimization algorithm
• Results
• Total memory minimization
• How about Volume of I/O?
• Computing Volume of I/O
• Minimizing I/O volume
• Towards an out-of-core flexible allocation
• Conclusion and Future work
The multifrontal method (Duff, Reid '83)
[Figure: sparsity pattern of a 5 × 5 matrix A and of L+U−I (non-zeros and fill-in), the corresponding elimination tree, and the memory layout: factors, active frontal matrix, stack of contribution blocks.]
Storage divided into two parts:
• Factors: systematically written to disk;
• Active Storage (active frontal matrix and stack of contribution blocks): kept in memory.
Memory behaviour (serial postorder traversal)
[Figure: animation of the memory usage while a tree with nodes 1, 2 and 3 is traversed.]
Sequential case results
[Figure: two panels marking the memory peak, captioned "Worst case." and "Best case."]
→ Algorithms to find the optimal tree traversal have been proposed
Sequential case: Memory behavior (2/2)
Consider a parent node in the tree:
• $n$ is the number of children.
• $j$ denotes the $j$-th child of the node.
• $cb_j$ is the size of the contribution block of child $j$.
• $m$ is the memory size of the frontal matrix of the parent.
• $A$ (resp. $A_j$) is the amount of active memory needed to process the parent (resp. child $j$).
[Figure: parent node with children 1, 2, ..., n and their contribution blocks $cb_1$, $cb_2$, ..., $cb_n$.]
The assembly step requires a storage:
$$m + \sum_{j=1}^{n} cb_j$$
The storage required to process child $j$ is:
$$A_j + \sum_{k=1}^{j-1} cb_k$$
$A$ is thus defined by:
$$A = \max\left( \max_{j=1,n}\left( A_j + \sum_{k=1}^{j-1} cb_k \right),\; m + \sum_{j=1}^{n} cb_j \right)$$
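The recurrence for A can be evaluated directly on a tree. The following is a minimal sketch, assuming each node stores its frontal matrix size `m`, its contribution block size `cb`, and its children in processing order; the `Node` class and the function name `active_memory` are illustrative and do not come from the slides or from MUMPS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    m: int    # memory size of the frontal matrix of this node
    cb: int   # size of the contribution block sent to the parent
    children: list = field(default_factory=list)  # children in processing order


def active_memory(node: Node) -> int:
    """Peak active memory A for `node`, children processed in list order."""
    stacked = 0  # sum of cb_k over the children already processed
    peak = 0
    for child in node.children:
        # Processing child j needs A_j on top of the stacked contribution blocks.
        peak = max(peak, active_memory(child) + stacked)
        stacked += child.cb
    # Assembly step: parent frontal matrix plus all contribution blocks.
    return max(peak, node.m + stacked)


# Hypothetical sizes, chosen only to show that the child order matters.
a = Node(m=10, cb=1)
b = Node(m=3, cb=3)
good = Node(m=4, cb=0, children=[a, b])
bad = Node(m=4, cb=0, children=[b, a])
print(active_memory(good), active_memory(bad))  # prints: 10 13
```

For a leaf the sum is empty, so A reduces to m.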
Liu's Algorithm
Liu's Theorem (Tree pebbling theorem)
The minimum of $\max_j \left( x_j + \sum_{i=1}^{j-1} y_i \right)$ is obtained when the sequence $(x_i, y_i)$ is sorted in decreasing order of $x_i - y_i$.
Consequence: An optimal child sequence is obtained by rearranging the children nodes in decreasing order of $A_i - cb_i$.
Algorithm:
• Bottom-up greedy process.
• Apply Liu's theorem at each level of the tree.
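The ordering rule and the bottom-up process can be sketched as follows. This is an illustration under stated assumptions, not the MUMPS implementation: a tree is a hypothetical tuple `(m, cb, children)`, all sizes are non-negative, and the helper names `peak`, `liu_order` and `min_active_memory` are invented for this example. A brute-force search over all permutations checks the ordering on a small instance.

```python
from itertools import permutations


def peak(seq):
    """max_j (x_j + sum_{i<j} y_i) for a sequence of (x, y) pairs."""
    stacked, worst = 0, 0
    for x, y in seq:
        worst = max(worst, x + stacked)
        stacked += y
    return worst


def liu_order(pairs):
    """Sort (x, y) pairs in decreasing order of x - y."""
    return sorted(pairs, key=lambda p: p[0] - p[1], reverse=True)


def min_active_memory(tree):
    """Bottom-up greedy: order the children of every node by A_i - cb_i."""
    m, cb, children = tree
    # x = active memory of the (already optimized) child, y = its cb.
    pairs = [(min_active_memory(child), child[1]) for child in children]
    return max(peak(liu_order(pairs)), m + sum(y for _, y in pairs))


# Hypothetical (A_i, cb_i) values for three children.
pairs = [(3, 3), (10, 1), (6, 2)]
best = min(peak(p) for p in permutations(pairs))
print(peak(pairs), peak(liu_order(pairs)), best)  # prints: 13 10 10

tree = (4, 0, [(3, 3, []), (10, 1, [])])
print(min_active_memory(tree))  # prints: 10
```

The sort key is the only place where the theorem is used; everything else is the recurrence for A applied from the leaves to the root.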
Limitation of the Classical scheme
[Figure: two memory-usage plots, each marking the allocation of the father and the memory peak, captioned "Classical approach." and "Flexible scheme."]
→ Decoupling the allocation and the computations can improve the memory behavior
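The effect of decoupling can be illustrated with a small model. The sketch below rests on assumptions that are not stated on this slide: the father's frontal matrix is allocated after the first `p` children, the contribution blocks stacked so far are then assembled and freed, and each remaining child is processed with the father's frontal matrix resident and its contribution block assembled immediately. The function names and the sizes are hypothetical.

```python
def classical_peak(m, children):
    """Father allocated only after all children; children = [(A_j, cb_j), ...]."""
    stacked, worst = 0, 0
    for a, cb in children:
        worst = max(worst, a + stacked)
        stacked += cb
    return max(worst, m + stacked)


def flexible_peak(m, children, p):
    """Assumed model: father allocated after the first p children."""
    stacked, worst = 0, 0
    for a, cb in children[:p]:
        worst = max(worst, a + stacked)
        stacked += cb
    # Allocation of the father while the first p blocks are still stacked.
    worst = max(worst, m + stacked)
    for a, _cb in children[p:]:
        # Father's frontal matrix is resident; the child's block is
        # assembled right away, so nothing accumulates on the stack.
        worst = max(worst, m + a)
    return worst


m = 10
children = [(6, 5)] * 4  # hypothetical (A_j, cb_j) values
n = len(children)
print(classical_peak(m, children))                            # prints: 30
print(min(flexible_peak(m, children, p) for p in range(n + 1)))  # prints: 16
```

With `p = n` the model coincides with the classical scheme, so in this model the best choice of `p` never does worse than the classical approach.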