Communication-avoiding sparse direct solvers for linear systems & graph problems
Piyush Sao (ORNL), Sherry Li (LBNL), Ramki Kannan (ORNL), Richard (Rich) Vuduc
February 28, 2020, CSL Student Conference @ UIUC
Slides are here right now, get ‘em while they’re hot 🔦: hpcgarage.org/uiuc20
“The problems are solved, not by giving new information, but by arranging what we have known since long.” — Ludwig Wittgenstein
https://twitter.com/aydoz/status/1130559070627348480
$Ax = b$, where $\mathrm{nnz}(A) = O(N)$
hpcgarage.org/pp20 unit lower triangular L = PA = LU upper triangular U = permutation (pivoting) P = 5
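Baseline: SuperLU_DIST (X. Li & J. Demmel, ACM TOMS ’03)
[Figure: heat map of performance in Teraflop/s (color scale 0.4 to 2.0). Horizontal axis: 2D-grid MPI processes (2D process grid; 4 cores / process). Vertical axis: $P_z$.]
Teraflop/s by $P_z$, for 24, 48, 96, 192, 384, 768 2D-grid MPI processes (left to right):
$P_z = 32$: 1.7, 2, 2, 1.5 (only four values appear; their columns are not recoverable)
$P_z = 16$: 1.1, 1.3, 1.5, 1.7, 1.3, 0.81
$P_z = 8$: 0.67, 0.93, 0.91, 1.1, 0.97, 0.94
$P_z = 4$: 0.28, 0.42, 0.49, 0.48, 0.49, 0.53
$P_z = 2$: 0.093, 0.12, 0.17, 0.21, 0.23, 0.24
$P_z = 1$: 0.036, 0.044, 0.055, 0.065, 0.064, 0.073
The $P_z = 1$ row is the “2D” algorithm (strong scaling): 32x procs → 2x speedup.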
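The "32x procs → 2x speedup" annotation can be checked with a little arithmetic. The sketch below assumes that the six bottom-row values (0.036 to 0.073 Teraflop/s) correspond, in order, to the six process counts on the horizontal axis (24 to 768), and that the problem size is fixed so that the ratio of rates equals the speedup. The function name `strong_scaling` is a hypothetical helper for this example.

```python
# Assumed mapping: bottom row of the heat map (the 2D algorithm),
# one Teraflop/s value per 2D-grid process count.
procs = [24, 48, 96, 192, 384, 768]
tflops = [0.036, 0.044, 0.055, 0.065, 0.064, 0.073]


def strong_scaling(procs, rate):
    """Return (processes, speedup, parallel efficiency) relative to the first entry."""
    base_p, base_r = procs[0], rate[0]
    rows = []
    for p, r in zip(procs, rate):
        speedup = r / base_r               # fixed problem size: rate ratio = speedup
        efficiency = speedup / (p / base_p)  # speedup per unit of added processes
        rows.append((p, speedup, efficiency))
    return rows


table = strong_scaling(procs, tflops)
for p, s, e in table:
    print(f"{p:4d} procs: speedup {s:.2f}x, efficiency {e:.1%}")
```

Under these assumptions the last row works out to roughly a 2x speedup on 32x the processes, so the parallel efficiency relative to the 24-process run is in the single-digit percent range.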
Example: $P_x \times P_y = 96$ (best configuration shown).
Communication-avoiding idea
For matrix multiplication, $C \mathrel{+}= A \cdot B$, on $P$ processors.
[Figure: the matrices A, B, and C.]
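The slide's figure is not recoverable, so the following is one standard reading of the idea rather than the speaker's own derivation: compare how many matrix words each process must move in a 2D block layout versus a 3D layout that replicates data. The sketch uses a deliberately simplified cost model (bandwidth only, square $n \times n$ matrices, perfect-square or perfect-cube $P$, no latency or collective-algorithm details). The function names `words_2d` and `words_3d` are hypothetical.

```python
import math


def words_2d(n, P):
    """Words received per process on a sqrt(P) x sqrt(P) grid (simplified model).

    Each process owns one b x b block of A, B, and C (b = n / sqrt(P)) and needs
    a full block row of A and block column of B to compute its block of C.
    """
    q = math.isqrt(P)
    assert q * q == P and n % q == 0
    b = n // q
    return 2 * (q - 1) * b * b


def words_3d(n, P):
    """Words moved per process on a cbrt(P)^3 grid (simplified model).

    Each process multiplies one b x b block of A by one b x b block of B
    (b = n / cbrt(P)), then contributes a b x b partial result of C to a sum.
    This counts at most one block each for A, B, and C.
    """
    q = round(P ** (1.0 / 3.0))
    assert q ** 3 == P and n % q == 0
    b = n // q
    return 3 * b * b


n = 4096
for P in (64, 4096):
    print(P, words_2d(n, P), words_3d(n, P))
```

In this model the 2D count grows like $n^2/\sqrt{P}$ while the 3D count grows like $n^2/P^{2/3}$, at the price of storing extra copies of the data. The gap widens as $P$ grows.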