Progress with MATLAB Source Transformation AD (MSAD)

Rahul Kharche
Cranfield University, Shrivenham
R.V.Kharche@Cranfield.ac.uk

AD Fest 2005, Nice, 14th - 15th April 2005
Agenda

- Project Goals
- Previous work on MSAD
- Further developments
- Test results from:
  - MATLAB ODE examples
  - MINPACK optimisation problems
  - bvp4cAD
- Summary
- Future Directions
- References
Project Goals

- Enhance performance by eliminating the overheads introduced by operator overloading in MAD [For04]
- Explore MATLAB* source analysis and transformation techniques to aid AD
- Create a portable tool that integrates easily with MATLAB-based solvers
- Provide control over the readability of the generated code
- Provide an array of selectable AD-specific optimisations

* MATLAB is a trademark of The MathWorks, Inc.
Previous work on MSAD

- Shown to successfully compute the gradient/Jacobian of MATLAB programs involving vector-valued functions using the forward mode of AD and source transformation [Kha04]
- Augmented code generated by inlining the fmad class operations from MAD (see the sketch below)
  - the derivvec class continued to hold the derivatives and perform derivative combinations
  - resulting in a hybrid approach analogous to [Veh01]
- Simple forward-dependence-based activity analysis
- Active independent variables and supplementary shape/size information can be provided through user directives inserted in the code
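To give a feel for what inlined forward-mode code looks like, here is a hypothetical hand-written sketch (not actual MSAD output; the function and variable names are invented). In MSAD the derivatives were held in derivvec objects rather than plain arrays, but the generated statements follow the same pattern of propagating a derivative alongside each value:

```matlab
% Hypothetical sketch of inlined forward-mode code for y = x.^2 .* sin(x).
% The d_* variables hold the directional derivatives of their counterparts.
function [y, d_y] = f_ad(x, d_x)
    t1   = x.^2;                      % t1 = x^2
    d_t1 = 2 .* x .* d_x;             % derivative of x^2
    t2   = sin(x);                    % t2 = sin(x)
    d_t2 = cos(x) .* d_x;             % derivative of sin(x)
    y    = t1 .* t2;                  % function value
    d_y  = d_t1 .* t2 + t1 .* d_t2;   % product rule
end
```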
Previous work on MSAD (contd.)

- Rudimentary size (shape) and type (constant, real, imaginary) inference
- This removed one level of the overheads encountered in MAD
  - giving discernible savings over MAD for small problem sizes
  - but these savings grew insignificant as the problem size was increased
Further developments

- Now uses size and type inference to specialise and further inline derivvec class operations
- Optionally generates code for holding and propagating sparse derivatives
- Incorporated sparsity inference (propagating MATLAB sparse types for derivative variables); with S denoting a sparse operand and F a full one, rules such as the following are applied (see the sketch below):
  - S + F → F,  S * F → F
  - S .* F → S,  S & F → S
  - T = S(i,j) → T is sparse, if i and j are vectors
  - T(i,j) = S → T retains its full or sparse storage type
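A minimal MATLAB sketch (not from the slides) showing that MATLAB's own storage behaviour matches the inference rules above:

```matlab
S = sparse([1 0; 0 2]);   % sparse operand
F = ones(2);              % full operand

issparse(S + F)           % false: S + F  -> full
issparse(S * F)           % false: S * F  -> full
issparse(S .* F)          % true:  S .* F -> sparse
issparse(S & F)           % true:  S & F  -> sparse

T1 = S([1 2], [1 2]);     % indexing with vectors
issparse(T1)              % true:  T = S(i,j) is sparse

T2 = zeros(2);            % full destination
T2([1 2], [1 2]) = S;     % assignment into a full matrix
issparse(T2)              % false: T retains its full storage type
```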
Further developments (contd.)

- Run-times are obtained using MATLAB 7.0 on a Linux machine with a 2.8 GHz Pentium 4 processor and 512 MB of RAM.
Previous results - Brusselator ODE

[Figures: CPU(JF)/CPU(F) vs n for the Brusselator ODE, comparing NUMJAC, MSAD and MAD in full, compressed and sparse modes; Jacobian sparsity pattern of brussode for n = 32 (nz = 124)]

- 30% improvement over MAD for small n, down to 4% for large n
- with compression, performance matches that of finite-differencing, numjac(vec), only asymptotically
- using sparse derivatives, performance converges asymptotically to that of MAD
- almost exponentially increasing savings over full evaluation with increasing n
Results - Brusselator ODE

[Figures: CPU(JF)/CPU(F) vs n for the Brusselator ODE, comparing NUMJAC, MSAD and MAD in compressed and sparse modes; Jacobian sparsity pattern of brussode for n = 32 (nz = 124)]

- 91% → 30% speedup over MAD with increasing n using compression (see the compression sketch below)
- outperforms numjac(vec) from n = 640 onward, with gains of up to 25%
- decreasing relative speedup, but a small constant saving, over MAD using sparse derivatives
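For readers unfamiliar with Jacobian compression, the sketch below (illustrative only, not the MSAD implementation) shows the column-grouping idea on a tridiagonal pattern: three directional derivatives recover the whole Jacobian regardless of n.

```matlab
% Hypothetical sketch of Jacobian compression by column grouping
% (Curtis-Powell-Reid).  For a tridiagonal Jacobian, columns 1,4,7,...,
% 2,5,8,... and 3,6,9,... never share a row, so three colours suffice.
n      = 9;
J      = spdiags(rand(n, 3), -1:1, n, n);       % stand-in tridiagonal Jacobian
groups = mod((0:n-1)', 3) + 1;                  % colour (group) of each column
V      = full(sparse((1:n)', groups, 1, n, 3)); % seed matrix, n-by-3

C = J * V;   % compressed Jacobian: only 3 directional derivatives needed

% Recover the entries of J from C using the known sparsity pattern
[i, j] = find(J);
Jrec   = sparse(i, j, C(sub2ind(size(C), i, groups(j))), n, n);
disp(norm(full(J - Jrec)))                      % ~0: recovery is exact
```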
Results - Burgers ODE

[Figures: CPU(JF)/CPU(F) vs n for the Burgers ODE, comparing NUMJAC, MSAD and MAD in compressed and sparse modes; Jacobian sparsity pattern of burgersode for n = 32 (nz = 340)]

- 87% → 37% speedup over MAD with increasing n using compression
- outperforms numjac from n = 64 onward, with gains between 28% and 45%
- decreasing relative speedup, but a small constant saving, over MAD using sparse derivatives
Results - Data fitting problem

[Figures: CPU(JF)/CPU(F) vs n (m = 4) for the data-fitting problem, comparing NUMJAC, MSAD and MAD in full and sparse modes; Jacobian sparsity pattern of the intermediate Vandermonde matrix for n = 10, m = 4 (nz = 30)]

- outperforms both MAD and numjac in direct evaluation of the Jacobian by > 60%
- if we take note of the sparsity in the Jacobian of the intermediate Vandermonde matrix [For04] and use sparse derivatives, we get an order of magnitude improvement over numjac, but a decreasing relative improvement over MAD
Observations

- Significantly better performance using Jacobian compression compared to numjac, MAD and the previous approach using compression, even for large n
- MSAD using full evaluation of the Jacobian performs well compared to MAD and numjac using full evaluation
- When using the full or the compressed mode, the generated code contains only native data types, qualifying it for any MATLAB JIT acceleration
- Decrease in relative performance with increasing n when using sparse derivatives
  - This can be attributed to the larger overheads in manipulating the internal sparse representation of a matrix, making any savings relatively small (see the measurement sketch below)
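One way to observe this sparse-storage overhead directly is the measurement sketch below (illustrative only; the matrix size and loop count are arbitrary, and the actual timings depend on the machine and MATLAB version). At this size the arithmetic is negligible, so the timings are dominated by manipulation of the internal sparse representation rather than floating-point work.

```matlab
% Per-operation overhead: elementwise multiply on a small matrix,
% stored full versus sparse, repeated many times.
A  = rand(10);
As = sparse(A);

tic; for k = 1:1e5, B  = A  .* A;  end; t_full   = toc;
tic; for k = 1:1e5, Bs = As .* As; end; t_sparse = toc;

fprintf('full: %.3fs   sparse: %.3fs\n', t_full, t_sparse);
```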
Results - MINPACK problems

[Figures: CPU(gf)/CPU(f) vs n for the MINPACK 2-D Ginzburg-Landau (DGL2) and steady-state combustion (DSSC) problems, comparing NUMJAC, MSAD and MAD in full and sparse modes]

- Results from the 2-D Ginzburg-Landau and steady-state combustion problems
- using full derivatives to evaluate the gradient shows an 80% → 50% improvement over MAD, and outperforms numjac by a similar margin over medium and large n