Performance-Influence Models of Multigrid Codes
Alexander Grebhahn, Norbert Siegmund, Sven Apel
University of Passau
ExaStencils @ Dagstuhl, April 2015
Which Is the Optimal Configuration for a Given Hardware Platform?
How to Identify Optimal Configurations?
350 optional binary options lead to more configurations than the estimated number of atoms in the universe
Numeric options make things much worse
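Here is a quick check of the size claim, as a sketch. It assumes all 350 binary options can be combined freely (real constraints remove some configurations) and uses the commonly cited estimate of roughly 10^80 atoms in the observable universe. The factor of 7 for one numeric option is illustrative and matches a smoothing-step range of [0, …, 6].

```python
# Back-of-the-envelope check: n freely combinable binary options give 2**n
# configurations (constraints between options would reduce this number).
BINARY_OPTIONS = 350
ATOMS_IN_UNIVERSE = 10**80  # commonly cited order-of-magnitude estimate

configs = 2**BINARY_OPTIONS
print(f"2^{BINARY_OPTIONS} has {len(str(configs))} decimal digits")
print("more than atoms in the universe:", configs > ATOMS_IN_UNIVERSE)

# One numeric option with k admissible values multiplies the space by k,
# e.g. a smoothing-step count in [0, ..., 6] contributes a factor of 7.
with_numeric = configs * 7
print("growth factor from one 7-valued numeric option:", with_numeric // configs)
```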
What Can We Do?
Use Machine Learning
Pros: automated; many tools, much research
Cons: overfitting, underfitting; not tailored to the application domain
Use Domain Knowledge
Pros: knowledge about asymptotic behavior; no measurement overhead
Cons: expensive, hard to incorporate; sometimes misleading
Performance-Influence Model
Simple example: a multigrid solver with 320 configurations ($c \in C$)
Performance-influence model $\Pi: C \to \mathbb{R}$
$\Pi(c) = 77 - 4.5 \cdot \mathit{GS} + 24.5 \cdot \mathit{pre} + 32 \cdot \mathit{GS} \cdot \mathit{pre} - 1.3 \cdot \mathit{pre} \cdot \mathit{AMG} + \dots$
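To make the notation concrete, here is a minimal sketch that evaluates the truncated model. It assumes GS and AMG are binary options encoded as 0/1 and that pre is the number of pre-smoothing steps. The terms hidden behind "…" are unknown and simply left out, so the predicted value only illustrates how Π is evaluated, not a real measurement. The names MODEL and predict are hypothetical.

```python
from math import prod

# Truncated performance-influence model: each term maps a set of options to a
# coefficient; the empty set is the constant (base) term. The unknown "..."
# terms of the original model are omitted.
MODEL = {
    frozenset(): 77.0,
    frozenset({"GS"}): -4.5,
    frozenset({"pre"}): 24.5,
    frozenset({"GS", "pre"}): 32.0,
    frozenset({"pre", "AMG"}): -1.3,
}

def predict(model, config):
    """Evaluate Pi(c) as the sum of coefficient * product of the term's option values.

    Binary options are encoded as 0/1, numeric options use their value, and
    options missing from `config` count as 0 (deselected).
    """
    return sum(coef * prod(config.get(opt, 0) for opt in term)
               for term, coef in model.items())

# Hypothetical configuration: Gauss-Seidel, 2 pre-smoothing steps, no AMG.
c = {"GS": 1, "pre": 2, "AMG": 0}
print(predict(MODEL, c))  # 185.5
```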
Learning Π step by step:
Start: {MGS}
Candidates: {MGS}+{pre-smoothing}, {MGS}+{post-smoothing}, {MGS}+{GS}, {MGS}+{Jac}, {MGS}+{AMG}, {MGS}+{CG}
Selected: {MGS}+{pre-smoothing}
Candidates (new term): {MGS}+{pre-smoothing}+{pre-smoothing}, {MGS}+{pre-smoothing}+{post-smoothing}, {MGS}+{pre-smoothing}+{GS}, {MGS}+{pre-smoothing}+{Jac}, {MGS}+{pre-smoothing}+{AMG}, {MGS}+{pre-smoothing}+{CG}
Candidates (extended term): {MGS}+{pre-smoothing,pre-smoothing}, {MGS}+{pre-smoothing,post-smoothing}, {MGS}+{pre-smoothing,GS}, {MGS}+{pre-smoothing,Jac}, {MGS}+{pre-smoothing,AMG}, {MGS}+{pre-smoothing,CG}
Selected: {MGS}+{pre-smoothing}+{GS}
Candidates (new term): {MGS}+{pre-smoothing}+{GS}+{pre-smoothing}, {MGS}+{pre-smoothing}+{GS}+{post-smoothing}, {MGS}+{pre-smoothing}+{GS}+{GS}, {MGS}+{pre-smoothing}+{GS}+{Jac}, {MGS}+{pre-smoothing}+{GS}+{AMG}, {MGS}+{pre-smoothing}+{GS}+{CG}
Candidates (extended first term): {MGS}+{pre-smoothing,pre-smoothing}+{GS}, {MGS}+{pre-smoothing,post-smoothing}+{GS}, {MGS}+{pre-smoothing,GS}+{GS}, {MGS}+{pre-smoothing,Jac}+{GS}, {MGS}+{pre-smoothing,AMG}+{GS}, {MGS}+{pre-smoothing,CG}+{GS}
Candidates (extended second term): {MGS}+{pre-smoothing}+{GS,pre-smoothing}, {MGS}+{pre-smoothing}+{GS,post-smoothing}, {MGS}+{pre-smoothing}+{GS,GS}, {MGS}+{pre-smoothing}+{GS,Jac}, {MGS}+{pre-smoothing}+{GS,AMG}, {MGS}+{pre-smoothing}+{GS,CG}
Selected: {MGS}+{pre-smoothing}+{GS}+{GS,pre-smoothing}
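The stepwise refinement above resembles greedy forward selection: start from the base term, try every candidate that adds a new option term or extends an existing term by one option, keep the candidate that reduces the error most, and stop when nothing helps. The sketch below is a simplified illustration, not the authors' implementation. It assumes ordinary least-squares fitting and uses the sum of squared errors as the selection criterion, with an arbitrary absolute stopping threshold. Terms are sorted tuples so that repeated options such as {pre-smoothing,pre-smoothing} are expressible. The function names and the synthetic measurements are hypothetical.

```python
from math import prod

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial
    pivoting; return None if A is (numerically) singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-9:
            return None  # candidate term is redundant with existing ones
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit(terms, data):
    """Least-squares fit (normal equations) of one coefficient per term.
    Returns (coefficients, sum of squared errors)."""
    X = [[prod(cfg.get(o, 0) for o in t) for t in terms] for cfg, _ in data]
    y = [perf for _, perf in data]
    n = len(terms)
    AtA = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    coefs = solve(AtA, Aty)
    if coefs is None:
        return None, float("inf")
    sse = sum((sum(c * v for c, v in zip(coefs, row)) - yi) ** 2
              for row, yi in zip(X, y))
    return coefs, sse

def forward_selection(data, options, max_terms=6, min_gain=1e-6):
    """Greedily add the candidate term that most reduces the squared error.
    The empty tuple () plays the role of the base term {MGS}."""
    terms = [()]
    coefs, sse = fit(terms, data)
    while len(terms) < max_terms:
        candidates = {(o,) for o in options}                      # new terms
        candidates |= {tuple(sorted(t + (o,)))                     # extended terms
                       for t in terms if t for o in options}
        candidates -= set(terms)
        best = None
        for cand in sorted(candidates):
            c, s = fit(terms + [cand], data)
            if c is not None and (best is None or s < best[2]):
                best = (cand, c, s)
        if best is None or sse - best[2] < min_gain:
            break
        terms.append(best[0])
        coefs, sse = best[1], best[2]
    return dict(zip(terms, coefs))

# Hypothetical measurements generated from the first terms of the example model.
data = [({"GS": gs, "pre": pre}, 77 - 4.5 * gs + 24.5 * pre + 32 * gs * pre)
        for gs in (0, 1) for pre in range(4)]
model = forward_selection(data, ["GS", "pre"])
for term, coef in model.items():
    print(term or "(base)", round(coef, 2))
```

Because the synthetic data come from an exact four-term model, the search recovers the base term, pre, GS, and the GS·pre interaction and then stops. With real, noisy measurements the stopping threshold would need to be relative to the scale of the data.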
Sampling
Binary and Numeric Options
[Figure: binary options (e.g., Jac, GS) take only the values 0 and 1, giving the points (0,0), (1,0), (0,1), (1,1); numeric options (e.g., pre-smoothing, post-smoothing) span a value range with corners (0,0), (8,0), (0,8), (8,8).]
Structured sampling approaches for the different kinds of options
Heuristics for Binary Options
Random sampling? Unlikely to select a valid configuration.
SAT-based optimizations yield only locally clustered solutions.
Heuristics, illustrated with the options vectorize, unroll, colorSplitting, and tileOuterLoop:
Option-Wise (OW): {vectorize}, {unroll}, … (each option enabled on its own)
Negative Option-Wise (nOW): {vectorize, unroll, tileOuterLoop}, … (all options except one)
Pair-Wise (PW): {vectorize, unroll}, {colorSplitting, unroll}, … (each pair of options)
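The three heuristics can be written as small generators over the set of binary options. In this sketch a configuration is the set of enabled options. The constraint in is_valid is hypothetical, and invalid samples are simply dropped, whereas a practical tool would more likely ask a constraint solver for a nearby valid configuration.

```python
from itertools import combinations

OPTIONS = ["vectorize", "unroll", "colorSplitting", "tileOuterLoop"]

def option_wise(options):
    """OW: each option enabled on its own."""
    return [{o} for o in options]

def negative_option_wise(options):
    """nOW: all options enabled except one."""
    return [set(options) - {o} for o in options]

def pair_wise(options):
    """PW: every pair of options enabled together."""
    return [set(pair) for pair in combinations(options, 2)]

def is_valid(config):
    # Hypothetical constraint, for illustration only: colorSplitting and
    # tileOuterLoop are assumed to be mutually exclusive.
    return not {"colorSplitting", "tileOuterLoop"} <= config

for name, heuristic in [("OW", option_wise),
                        ("nOW", negative_option_wise),
                        ("PW", pair_wise)]:
    sample = [c for c in heuristic(OPTIONS) if is_valid(c)]
    print(name, len(sample), sample)
```

With n options, OW and nOW each produce n samples and PW produces n(n-1)/2, which is why PW needs considerably more measurements as the number of options grows.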
Heuristics for Numeric Options (Design of Experiments)
Response surface models: identify the influence of independent variables on a response variable (here, performance)
Scale to multiple numeric options
Central Composite Design (CCD) [figure: sample points over pre-smoothing × post-smoothing]
Plackett-Burman Design (PBD) [figure: sample points over pre-smoothing × post-smoothing]
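As an illustration of one design-of-experiments plan, the sketch below generates a face-centered central composite design for two numeric options. The slides do not say which CCD variant is used; the face-centered variant (axial distance 1) is assumed here because all of its points stay inside the option ranges. PBD is not shown. The function name is hypothetical.

```python
from itertools import product

def face_centered_ccd(ranges):
    """Face-centered central composite design (axial distance alpha = 1).

    `ranges` maps each numeric option to (min, max). In coded units the design
    has the 2^k factorial corners, 2k axial points (one option at -1 or +1,
    all others at 0), and one center point.
    """
    names = list(ranges)
    k = len(names)
    coded = set(product((-1, 1), repeat=k))            # factorial corners
    for i in range(k):                                 # axial points
        for sign in (-1, 1):
            coded.add(tuple(sign if j == i else 0 for j in range(k)))
    coded.add((0,) * k)                                # center point

    def decode(level, lo, hi):
        # Map coded levels -1 / 0 / +1 to min / midpoint / max of the range.
        return lo + (level + 1) * (hi - lo) / 2

    return [{name: decode(level, *ranges[name]) for name, level in zip(names, point)}
            for point in sorted(coded)]

design = face_centered_ccd({"pre-smoothing": (0, 6), "post-smoothing": (0, 6)})
for point in design:
    print(point)
```

For pre-smoothing and post-smoothing in [0, 6] this yields 9 points built from the levels 0, 3, and 6. The corner (0, 0) violates the constraint sum(pre-smoothing, post-smoothing) > 0 of the Dune MGS and HSMGP subject systems, so it would have to be dropped or replaced there.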
Experiments: Subject Systems

Dune MGS
• 2,304 configurations
• Intel i5-4570 quad-core and 32 GB RAM
Options: pre-smoothing [0, …, 6], post-smoothing [0, …, 6], number of cells [50, …, 55], preconditioner (GS, SOR), solver (BiCGSTAB, Gradient, Loop, CG)
Constraint: sum(pre-smoothing, post-smoothing) > 0

HIPAcc
• 13,485 configurations
• nVidia Tesla K20 with 5 GB RAM and 2,496 CUDA cores
Options: API (CUDA, OpenCL), texture memory (Linear1D, Linear2D, Array2D, Ldg), local memory, block size (32x1, 32x2, 32x4, 32x8, 32x16, 32x32, 64x1, 64x2, 64x4, 64x8, 64x16, 128x1, 128x2, 128x4, 128x8, 256x1, 256x2, 256x4, 512x1, 512x2, 1024x1), padding [0, 32, …, 512], pixels per thread [1, 2, 3, 4]
Constraints:
¬(Local Memory ∧ 1024x1 ∧ Pixels per Thread = 2)
¬(Local Memory ∧ B ∧ Pixels per Thread = p) for p ∈ {3, 4} and B ∈ {32x32, 64x16, 128x8, 256x4, 512x2, 1024x1}
Array2D ⇒ Padding = 0

HSMGP
• 3,456 configurations
• JuQueen at Jülich
Options: pre-smoothing [0, …, 6], post-smoothing [0, …, 6], number of cores [64, 256, 1024, 4096], coarse grid solver (IP_CG, RED_AMG, IP_AMG), smoother (Jac, GS, GSAC, RBGS, RBGSAC, BS)
Constraint: sum(pre-smoothing, post-smoothing) > 0
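The configuration counts of Dune MGS and HSMGP can be reproduced from their option domains and the smoothing constraint, which makes a useful sanity check of the feature models. The sketch assumes the grouping of the Dune MGS values into preconditioners (GS, SOR) and solvers (BiCGSTAB, Gradient, Loop, CG) as read from the slide. HIPAcc is omitted because its constraints are more involved. The helper names are hypothetical.

```python
from itertools import product

SMOOTHING = range(7)  # pre-/post-smoothing steps in [0, ..., 6]

def count(*domains, constraint=lambda cfg: True):
    """Count the configurations in the Cartesian product that satisfy `constraint`."""
    return sum(1 for cfg in product(*domains) if constraint(cfg))

def smoothing_ok(cfg):
    # sum(pre-smoothing, post-smoothing) > 0; both are the first two entries
    return cfg[0] + cfg[1] > 0

dune = count(SMOOTHING, SMOOTHING,
             range(50, 56),                                  # number of cells
             ["GS", "SOR"],                                  # preconditioner
             ["BiCGSTAB", "Gradient", "Loop", "CG"],         # solver
             constraint=smoothing_ok)
hsmgp = count(SMOOTHING, SMOOTHING,
              [64, 256, 1024, 4096],                         # number of cores
              ["IP_CG", "RED_AMG", "IP_AMG"],                # coarse grid solver
              ["Jac", "GS", "GSAC", "RBGS", "RBGSAC", "BS"], # smoother
              constraint=smoothing_ok)
print(dune, hsmgp)  # 2304 3456
```

In both systems the smoothing constraint removes only the (0, 0) combination, leaving 48 of the 49 smoothing settings.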
Experimental Results

System     Design      Option-Wise    Pair-Wise       Negative Option-Wise
                       ē / |C|        ē / |C|         ē / |C|
Dune MGS   PBD(9,3)    14.1% / 45     14.9% / 72      15.8% / 45
           PBD(49,7)   11.4% / 245    11.9% / 392     11.6% / 245
           CCD         11.1% / 75     11.9% / 120     10.8% / 75
HIPAcc     PBD(9,3)    14.7% / 240    13.8% / 1221    49.3% / 85
           PBD(49,7)   13.9% / 736    11.1% / 3645    41.4% / 161
           CCD         14.2% / 242    10.5% / 1247    48.2% / 102
HSMGP      PBD(9,3)    2.0% / 72      2.4% / 162      3.3% / 72
           PBD(49,7)   2.1% / 392     1.5% / 882      2.4% / 392
           CCD         3.2% / 120     2.7% / 270      3.7% / 120

ē: average prediction error, |C|: number of measurements
PBD: Plackett-Burman Design, CCD: Central Composite Design
Option-Wise offers the best trade-off between prediction accuracy and measurement overhead.
Option-Wise combined with PBD(49,7) has the best accuracy relative to its measurement overhead (average error of about 9.1% across the three systems).
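The summary figure of about 9.1% appears to be the unweighted mean of the Option-Wise/PBD(49,7) errors over the three systems: (11.4 + 13.9 + 2.1) / 3 ≈ 9.13. The sketch below recomputes this mean and the total number of measurements for every combination, using only the values from the results table; the data layout and names are hypothetical.

```python
from statistics import mean

# (heuristic, design) -> (avg. error in % per system, measurements per system);
# systems in the order Dune MGS, HIPAcc, HSMGP, values copied from the table.
RESULTS = {
    ("OW", "PBD(9,3)"):   ([14.1, 14.7, 2.0], [45, 240, 72]),
    ("OW", "PBD(49,7)"):  ([11.4, 13.9, 2.1], [245, 736, 392]),
    ("OW", "CCD"):        ([11.1, 14.2, 3.2], [75, 242, 120]),
    ("PW", "PBD(9,3)"):   ([14.9, 13.8, 2.4], [72, 1221, 162]),
    ("PW", "PBD(49,7)"):  ([11.9, 11.1, 1.5], [392, 3645, 882]),
    ("PW", "CCD"):        ([11.9, 10.5, 2.7], [120, 1247, 270]),
    ("nOW", "PBD(9,3)"):  ([15.8, 49.3, 3.3], [45, 85, 72]),
    ("nOW", "PBD(49,7)"): ([11.6, 41.4, 2.4], [245, 161, 392]),
    ("nOW", "CCD"):       ([10.8, 48.2, 3.7], [75, 102, 120]),
}

# Unweighted mean error over the systems and total number of measurements.
summary = {key: (mean(err), sum(meas)) for key, (err, meas) in RESULTS.items()}
for (heuristic, design), (err, meas) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{heuristic:>3} + {design:<9}  mean error {err:5.1f}%  measurements {meas:5d}")
```

Sorting by mean error makes the trade-off visible: Pair-Wise with PBD(49,7) reaches a slightly lower mean error (about 8.2%) but needs 4,919 measurements in total, compared with 1,373 for Option-Wise with PBD(49,7).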
What about Domain Knowledge?
Learn separate models for independent configuration spaces ($\Pi_1 + \Pi_2 + \Pi_3$)
Tailor numeric option sampling to the known shape of the function
Tailor binary option sampling to known interactions
Tailor numeric option sampling to the known absence of interactions
Learn specific functions (rather than probing for arbitrary functions)
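The first idea, separate models for independent configuration spaces, can be sketched as additive composition. The partial models below use made-up coefficients and the subspace sizes are hypothetical; only the structure matters. If the subspaces truly do not interact, each part can be learned while the others stay at a fixed setting, so the number of measurements grows roughly with the sum of the subspace sizes instead of their product.

```python
from math import prod

def compose(*partial_models):
    """Additive composition Pi = Pi1 + Pi2 + ... of models of independent
    configuration subspaces; each partial model reads only its own options."""
    def pi(config):
        return sum(m(config) for m in partial_models)
    return pi

# Hypothetical partial models with made-up coefficients, for illustration only.
def pi1(c):
    return 10.0 + 2.0 * c["pre"]

def pi2(c):
    return 5.0 * c["GS"]

def pi3(c):
    return 0.5 * c["cells"]

pi = compose(pi1, pi2, pi3)
print(pi({"pre": 2, "GS": 1, "cells": 2}))  # 20.0

# Hypothetical subspace sizes: exhaustively covering the joint space costs the
# product of the sizes; covering each subspace separately costs roughly the sum.
sizes = [10, 20, 30]
print(prod(sizes), sum(sizes))  # 6000 60
```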