fast multipole methods in arbitrary dimensions
play

Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu - PowerPoint PPT Presentation

Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu James Levitt Severin Riez Bill March Bo Xiao GEORGE BIROS padas.ices.utexas.edu Problem statement and contributions FMM for kernel matrices given points in high D


  1. Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu James Levitt Severin Riez Bill March Bo Xiao GEORGE BIROS padas.ices.utexas.edu

  2. Problem statement and contributions • FMM for kernel matrices given points in high D • FMM for SPD matrices—no points given • Four components 
 - Matrix permutations to expose low-rank structure — O(N) 
 - Compress blocks — O(N logN) 
 - Fast Matvec — O(N) 
 - HPC implementation (MPI async + ARM, x86/KNL, GPUs) 2

  3. Motivation: Kernel classification 3

  4. Motivation: arbitrary SPD matrices • Hessian matrix for large scale optimization • Schur-complement operators for computing inverse graph Laplacians • Factorization of dense covariance matrices No available geometry information (i.e., points) 5

  5. Two algorithms • ASKIT: Algebraically Skeletonized Kernel Independent Treecode • GOFMM: Geometry Oblivious Fast Multipole Method 6

  6. Highlights ASKIT CFD:12B/3D ~ 700 GB Kernels: 1B/128D ~ 1TB Malhotra, Gholami, & B’ SC14 March, Yu, Xiao, B. KDD’15 7

  7. Highlights GOFMM 8

  8. Achieving O(N log a N) complexity HIERARCHICAL NYSTROM ENSEMBLE NYSTROM MATRICES 9

  9. Sparse + low-rank 10

  10. C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> Hierarchical matrices, basic idea  K 11 � K 12 K 21 K 22  0  K 11 � � K 12 0 = + K 22 K 21 0 0 11

  11. Constructing the approximation 12

  12. Idea I: far-field —> low rank x i x j x w 13

  13. Idea II: Near/Far field split 14

  14. Idea III: recursion 1 2 3 4 15

  15. Challenges in high-dimensions • Constructing the far-field approximations 
 polynomial in ambient-D • Near-far field decomposition 
 polynomial in ambient-D • No scalable algorithms (other than Nystrom) • Nystrom method assumes low rank 
 provably not the case with increasing N 16

  16. Randomized linear algebra — Nystrom method • Low-rank decomposition of G • Random sampling of O(s) points, s: target rank 
 • Work • Error 17

  17. ASKIT • Randomized Linear Algebra — far field approximation • Parallel binary trees — permutation, partitioning • Nearest neighbors — pruning and sampling • Treecode / FMM SISC’15,16 • MPI / OpenMP / SIMD / GPU acceleration ACHA’15 • Inspired by 
 KDD’15 Ying & B. & Zorin’03 
 SC’15 Haiko & Martinsson & Tropp’11 
 Drineas & Kahan & Mahoney'06 IPDPS’15,16,17 18

  18. Far-field s-rank approximation • SVD is too expensive — use sampling • Sample rows 
 leverage, norm, range-space • Interpolative decomposition • ASKIT: approximate norm adaptive sampling using nearest-neighbors + adaptive rank selection 19

  19. Skeletonization (far field approximation) 20

  20. Evaluation 21

  21. Evaluation 22

  22. Complexity and error • Work off-diagonal • Error • Nystrom diagonal 23

  23. Parallel complexity Points per MPI task n = N Tree depth D = log N p s Tree construction ≤ ( t s + t w ) log 2 p log N + ( t w log p ) ( d + k ) n ⇣ n ⌘ s 3 s + log p Skeletonization ≤ t f Evaluation ≤ t s p + ( t w + t f ) d k s D n 24

  24. Summary of ASKIT features • Binary tree for matrix permutation • Approximate randomized nearest neighbors • Nearest neighbors for skeletonization • Bottom-up recursive low-rank approximation • Top-down pass for fast evaluation • Adaptive sampling and rank selection 25

  25. Gaussian 3D, 1M points 64D/20D intr, 1M points 26

  26. Kernel acceleration 27

  27. Nystrom vs ASKIT (8M/784D) NYSTROM ASKIT 28

  28. Kernel regression scaling MNIST dataset for OCR strong scaling, 8M points d=784 29

Recommend


More recommend