Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu James Levitt Severin Riez Bill March Bo Xiao GEORGE BIROS padas.ices.utexas.edu
Problem statement and contributions • FMM for kernel matrices given points in high D • FMM for SPD matrices—no points given • Four components - Matrix permutations to expose low-rank structure — O(N) - Compress blocks — O(N logN) - Fast Matvec — O(N) - HPC implementation (MPI async + ARM, x86/KNL, GPUs) 2
Motivation: Kernel classification 3
Motivation: arbitrary SPD matrices • Hessian matrix for large scale optimization • Schur-complement operators for computing inverse graph Laplacians • Factorization of dense covariance matrices No available geometry information (i.e., points) 5
Two algorithms • ASKIT: Algebraically Skeletonized Kernel Independent Treecode • GOFMM: Geometry Oblivious Fast Multipole Method 6
Highlights ASKIT CFD:12B/3D ~ 700 GB Kernels: 1B/128D ~ 1TB Malhotra, Gholami, & B’ SC14 March, Yu, Xiao, B. KDD’15 7
Highlights GOFMM 8
Achieving O(N log a N) complexity HIERARCHICAL NYSTROM ENSEMBLE NYSTROM MATRICES 9
Sparse + low-rank 10
C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> Hierarchical matrices, basic idea K 11 � K 12 K 21 K 22 0 K 11 � � K 12 0 = + K 22 K 21 0 0 11
Constructing the approximation 12
Idea I: far-field —> low rank x i x j x w 13
Idea II: Near/Far field split 14
Idea III: recursion 1 2 3 4 15
Challenges in high-dimensions • Constructing the far-field approximations polynomial in ambient-D • Near-far field decomposition polynomial in ambient-D • No scalable algorithms (other than Nystrom) • Nystrom method assumes low rank provably not the case with increasing N 16
Randomized linear algebra — Nystrom method • Low-rank decomposition of G • Random sampling of O(s) points, s: target rank • Work • Error 17
ASKIT • Randomized Linear Algebra — far field approximation • Parallel binary trees — permutation, partitioning • Nearest neighbors — pruning and sampling • Treecode / FMM SISC’15,16 • MPI / OpenMP / SIMD / GPU acceleration ACHA’15 • Inspired by KDD’15 Ying & B. & Zorin’03 SC’15 Haiko & Martinsson & Tropp’11 Drineas & Kahan & Mahoney'06 IPDPS’15,16,17 18
Far-field s-rank approximation • SVD is too expensive — use sampling • Sample rows leverage, norm, range-space • Interpolative decomposition • ASKIT: approximate norm adaptive sampling using nearest-neighbors + adaptive rank selection 19
Skeletonization (far field approximation) 20
Evaluation 21
Evaluation 22
Complexity and error • Work off-diagonal • Error • Nystrom diagonal 23
Parallel complexity Points per MPI task n = N Tree depth D = log N p s Tree construction ≤ ( t s + t w ) log 2 p log N + ( t w log p ) ( d + k ) n ⇣ n ⌘ s 3 s + log p Skeletonization ≤ t f Evaluation ≤ t s p + ( t w + t f ) d k s D n 24
Summary of ASKIT features • Binary tree for matrix permutation • Approximate randomized nearest neighbors • Nearest neighbors for skeletonization • Bottom-up recursive low-rank approximation • Top-down pass for fast evaluation • Adaptive sampling and rank selection 25
Gaussian 3D, 1M points 64D/20D intr, 1M points 26
Kernel acceleration 27
Nystrom vs ASKIT (8M/784D) NYSTROM ASKIT 28
Kernel regression scaling MNIST dataset for OCR strong scaling, 8M points d=784 29
Recommend
More recommend