Locality-Aware Laplacian Mesh Smoothing


  1. Locality-Aware Laplacian Mesh Smoothing. Guillaume Aupy, Jeonghyung Park, Padma Raghavan

  2. Laplacian Mesh Smoothing
An iterative process used to improve the quality of 2D meshes:
0. Choose an internal non-visited vertex.
1. Move it to the barycenter of its neighbors.
2. Pick its lowest-quality non-visited neighbor and GOTO 1; if there is none, GOTO 0.
GOAL: improve mesh quality (edge-length ratio), measured as (1 / |triangles|) · Σ_triangles (min edge / max edge).
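The steps above can be sketched as follows. This is a minimal sketch with a hypothetical mesh representation (a coordinate dict, an adjacency dict, and a set of interior vertices); the slide's lowest-quality-neighbor selection is simplified to an arbitrary unvisited neighbor, since the per-triangle quality bookkeeping is omitted:

```python
# Minimal sketch of one Laplacian-smoothing pass (hypothetical mesh
# representation; quality-driven neighbor choice simplified away).

def barycenter(points):
    """Average of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def smooth_pass(coords, neighbors, interior):
    """Visit each interior vertex once, moving it to the barycenter of
    its neighbors, then continue from an unvisited interior neighbor
    (step 2 of the slide, minus the quality bookkeeping)."""
    visited = set()
    for start in interior:
        v = start
        while v is not None and v not in visited:
            visited.add(v)
            coords[v] = barycenter([coords[u] for u in neighbors[v]])
            unvisited = [u for u in neighbors[v]
                         if u in interior and u not in visited]
            v = unvisited[0] if unvisited else None
    return coords
```

For a single interior vertex surrounded by the four corners of a unit square, the pass moves it to the center (0.5, 0.5); boundary vertices are never touched.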

  3. Data Locality
◮ Data for a computation is stored in cache
◮ If it is not: a cache miss occurs (additional cost)
Caches are governed by a Least Recently Used (LRU) replacement policy.
[Figure: high-level view of one socket of an Intel Westmere-EX processor.]
Measure of data locality: reuse distance.
Spatial locality: reuse within a cache line. Temporal locality: reuse of a node already in cache.
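The reuse-distance measure can be made concrete with a small sketch. This computes the standard LRU stack distance of each access in a trace (trace elements are hypothetical labels; the quadratic formulation is for clarity, not performance):

```python
def reuse_distances(trace):
    """For each access, the number of distinct elements touched since
    the previous access to the same element (None for a cold first
    access). An access hits an LRU cache of capacity C iff its reuse
    distance is < C. O(n^2) sketch for clarity."""
    last = {}       # element -> index of its most recent access
    dists = []
    for i, x in enumerate(trace):
        if x in last:
            dists.append(len(set(trace[last[x] + 1:i])))
        else:
            dists.append(None)
        last[x] = i
    return dists
```

For the trace a, b, c, b, a this yields [None, None, None, 1, 2]: the second b has only c in between, while the second a has b and c in between.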

  4. Data Locality in LMS
Hypothesis: cache misses play an important role in the LMS algorithm.
→ [Strout+Hovland 04] The data ordering of irregular HPC applications impacts performance.
Orderings:

  5. Data Locality in LMS
Hypothesis: cache misses play an important role in the LMS algorithm.
→ [Strout+Hovland 04] The data ordering of irregular HPC applications impacts performance.
Quick check: [Figure: reuse distance vs. index of access under three orderings.]
Execution times under random, original, and BFS orderings were 7.6 s, 10.3 s, and 6.59 s: the ordering alone changes the execution time substantially.

  6. This work
[Figure: reuse-distance profile of the LMS algorithm on a Carabiner mesh (reuse distance vs. time steps).]

  7. This work
[Figure: reuse-distance profile of the LMS algorithm on a Carabiner mesh (reuse distance vs. time steps).]
Conjecture: the access pattern of LMS is controlled by the initial quality of each node in the mesh, so a reordering based on the initial iteration should work well.

  8. Mesh reordering scheme RDR
◮ From a given node already ordered: sort all its unordered neighbors by increasing quality
◮ Append them to the list of already ordered nodes
◮ Mark the node processed; iterate from the unprocessed neighbor with the worst quality.
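The three steps above can be sketched as follows (the adjacency dict, per-node quality dict, and `seed` node are hypothetical inputs; `seed` stands in for the "given node already ordered"):

```python
def rdr_order(neighbors, quality, seed):
    """RDR sketch: from each processed node, append its unordered
    neighbors sorted by increasing quality, then continue from the
    worst-quality one just appended."""
    ordered, seen = [seed], {seed}
    stack = [seed]
    while stack:
        v = stack.pop()
        fresh = sorted((u for u in neighbors[v] if u not in seen),
                       key=quality.__getitem__)
        ordered.extend(fresh)
        seen.update(fresh)
        # pop the worst-quality (first) new neighbor next; the rest
        # are processed later
        stack.extend(reversed(fresh))
    return ordered
```

On a small tree rooted at node 0 with children 1 (quality 0.9) and 2 (quality 0.3), the ordering visits 2 before 1 and then descends under 2 first, mirroring the "iterate from the worst-quality neighbor" rule.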

  9. Evaluation
◮ Meshes are generated by Triangle [Shewchuk'02]
◮ LMS is done with Mesquite [Brewer et al.'03]
Comparisons are made with respect to:
◮ ORI: the original ordering given by Mesquite
◮ BFS: a breadth-first-search ordering [Strout+Hovland'05]

  10. Experimental Setup
Runs done on an Intel Westmere-EX: four eight-core processors (up to 32 concurrent threads).

Cache    Size   Latency (cycles per access)
L1 (P)   32K    4
L2 (P)   256K   10
L3 (S)   24M    38-170
Mem      ∞      175-290

  11. Results
[Figure: execution time on one core (seconds) for ORI, BFS, and RDR on meshes M1–M9.]

  12. Results
[Figure: mean speedup of each ordering (ori, bfs, rdr) versus T_ORI(1), as a function of the number of cores (up to 32).]


  14. Cache Performance
Using the PAPI software, we can measure cache performance.
[Figure: three panels of miss rates (%) for ORI, BFS, and RDR on meshes M1–M9, measured on one core after each reordering was applied.]
Better orderings will be characterized by better cache performance. Can we find better orderings (or show that we cannot)?

  15. First-Order Approx.
By tracing all data accesses, we can measure the reuse distance of every access. Assuming each node is 66 bytes¹, in a 24 MB L3 cache, misses occur for all accesses with an RD greater than 372k (FOA).
¹ Coordinates (two floats), connectivity (5/6 longs), and fixed/boundary state (an integer).
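Under this first-order approximation, the miss count follows directly from the reuse-distance profile; a sketch, with the capacity threshold in elements (e.g. the slide's 372k for the 24 MB L3) passed as a parameter:

```python
def foa_miss_count(reuse_dists, capacity):
    """First-order approximation: an access misses iff its reuse
    distance exceeds the cache capacity (in data elements), or it is
    a cold first access (distance None), which always misses."""
    return sum(1 for d in reuse_dists if d is None or d > capacity)
```

For a profile [None, 3, 10, 2] and a capacity of 5 elements, this counts two misses: the cold access and the access with distance 10.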

  16. First-Order Approx.
By tracing all data accesses, we can measure the reuse distance of every access. Quantiles of the reuse distance per mesh and ordering:

mesh       #accesses    Ordering    50%    75%     90%        100%
carabiner  15,566,520   ORI           8     52   1,168   1,924,021
                        BFS           1     11      99   1,923,989
                        RDR           1      4       6       1,942
crake      14,226,264   ORI           8     43     642   1,767,468
                        BFS           1     11      80   1,767,488
                        RDR           1      4       6       3,903
dialog     14,614,336   ORI           7     39     306   1,819,234
                        BFS           1     10      79   1,803,850
                        RDR           1      5      11       6,198


  18. FOA (II)
We know:
◮ L3 misses are due to external factors
◮ We can compute the application's reuse distances
◮ We have access to PAPI cache-miss counts
We can estimate the "real" number of data elements that fit in a cache: assuming there are n_X LX misses, the n_X accesses with the largest reuse distance are the ones that missed.

  19. FOA (II)
We can estimate the "real" number of data elements that fit in a cache: assuming there are n_X LX misses, the n_X accesses with the largest reuse distance are the ones that missed.

Estimated max number of elements (×10³):

mesh       Ordering    L1      L2      L3
carabiner  ORI         13.2    21.3    330
           BFS         10.2    21.2    1060
           RDR         1.6     1.88    1.94
crake      ORI         24.6    40.9    198
           BFS         18.3    39.2    986
           RDR         3.4     3.77    3.9
dialog     ORI         59      87.7    108
           BFS         53.2    89.3    157
           RDR         5.84    6.05    6.2


  21. Reordering cost
[Figure: gain in execution time (%) of RDR over ORI and BFS at 1 to 32 cores.]
The performance gain is (T_algo(x) − T_RDR(x)) / T_algo(x), for algo being either ORI or BFS and x being the number of cores.
Reordering costs roughly one iteration of the algorithm: it basically adds one iteration and saves between 10% and 40% per iteration. It is only worth it if you expect several iterations (> 3).
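The gain formula and the break-even argument can be sketched as follows (the one-extra-iteration cost model is the slide's rough estimate, not an exact accounting):

```python
def relative_gain(t_algo, t_rdr):
    """Performance gain of RDR over a baseline ordering, as on the
    slide: (T_algo(x) - T_RDR(x)) / T_algo(x), in percent."""
    return 100.0 * (t_algo - t_rdr) / t_algo

def worth_reordering(n_iters, gain_frac):
    """Break-even sketch: reordering costs roughly one extra iteration
    and saves gain_frac of each iteration's time, so it pays off once
    n_iters * gain_frac exceeds 1 (hypothetical cost model)."""
    return n_iters * gain_frac > 1.0
```

With a typical 30% per-iteration gain, the model pays off only beyond about 3 iterations, matching the slide's "> 3" rule of thumb.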

  22. Conclusion
Reordering strategies are known to be an efficient way to improve data locality (and hence performance).
Simple conjecture: each iteration of LMS follows roughly the same execution order.
◮ A simple reordering strategy based on this conjecture;
◮ We give an intuition that it may be hard to find better reordering strategies.
