Scalable Asynchronous Contact Mechanics using Charm++



  1. Scalable Asynchronous Contact Mechanics using Charm++
      Xiang Ni*, Laxmikant V. Kale* and Rasmus Tamstorf^
      * University of Illinois at Urbana-Champaign
      ^ Walt Disney Animation Studios

  10. Asynchronous Contact Mechanics
      • Necessary Guarantees
          • Safety: no missed collisions
          • Correctness: follow the laws of physics
          • Progress: finish in a finite amount of time
      • Problems with other existing algorithms
          • An object can end up going through itself or another object
          • Physical properties can be violated

  14. [Figure: "What you get" (incorrect handling of collisions) versus "What you want". Pictures from Yi Wang at VT.]

  15. Parallelization Challenges
      • Highly irregular communication pattern
          • Message driven execution in Charm++
      • Dynamic load imbalance
          • Adaptive runtime system
      • Very fine grained computation
          • Overlapping computation and communication

  16. Parallelization Challenges
      [Figure: Number of Active Contacts (color scale, 0 to 12K) per Core ID over Simulated Time (s), 0 to 30]

  24. Overall Flow
      • Within each collision window: apply internal forces and any penalty forces, then run collision detection
      • Collisions detected?
          • No: proceed to the next window
          • Yes: add penalty forces and rollback (collision response)
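
The window loop above can be sketched with a toy problem: a unit-mass particle falling toward a wall at x = 0. This is only an illustration under stated assumptions, not the authors' implementation. The names `integrate_window` and `simulate_window`, the layer thickness and stiffness rule, and all numeric values are hypothetical.

```python
def integrate_window(x, v, layers, duration=1.0, dt=1e-4, eta=0.1, k=400.0):
    """Integrate a unit-mass particle near a wall at x = 0 for one window.

    `layers` penalty layers are active. Returns the end position, the end
    velocity, and the lowest position reached during the window.
    """
    lowest = x
    for _ in range(int(round(duration / dt))):
        # Toy choice (an assumption): layer l has thickness eta / l
        # and stiffness k * l**3.
        force = sum(k * l**3 * max(0.0, eta / l - x)
                    for l in range(1, layers + 1))
        v += force * dt  # symplectic Euler: update velocity, then position
        x += v * dt
        lowest = min(lowest, x)
    return x, v, lowest


def simulate_window(x0, v0, max_layers=8):
    """Repeat one collision window until no collision is detected."""
    layers = 0
    while True:
        # Rollback: every attempt restarts from the saved state (x0, v0).
        x, v, lowest = integrate_window(x0, v0, layers)
        if lowest >= 0.0:  # collision detection: the wall was never crossed
            return x, v, layers
        if layers == max_layers:
            raise RuntimeError("penalty layers exhausted")
        layers += 1  # collision response: add a penalty layer and retry


x_end, v_end, layers_used = simulate_window(1.0, -3.0)
```

Each failed attempt discards its trajectory entirely, which is what makes the scheme safe: a window is only accepted once detection reports no collision.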

  28. Collision Detection: Broad Phase
      • Locally, inside each partition, we use a 26-DOP hierarchy fitted to the swept volumes of the triangles to detect potential collisions.
      • Globally, among all the partitions, we fit the trajectory of each triangle to a 3D bounding box and pass the boxes to the existing collision detection library in Charm++.
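
A 26-DOP bounds a shape by its extent along 13 fixed directions (the three axes alone would give the 3D bounding box). The sketch below builds one around a triangle's swept volume and tests two of them for overlap. It assumes each vertex moves linearly over the window, so bounding the six endpoint positions bounds the whole sweep. The function names are hypothetical, and the hierarchy itself is omitted.

```python
from itertools import product

# One direction from each +/- pair of the 26 nonzero vectors in {-1, 0, 1}^3:
# 3 axes, 6 face diagonals and 4 corner diagonals, 13 in total.
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dop26(points):
    """Return the (min, max) extent of `points` along each direction."""
    return [(min(dot(d, p) for p in points), max(dot(d, p) for p in points))
            for d in DIRECTIONS]


def swept_dop(tri_start, tri_end):
    """Bound a triangle moving linearly from `tri_start` to `tri_end`."""
    return dop26(list(tri_start) + list(tri_end))


def overlaps(a, b):
    """Two DOPs can only intersect if their extents overlap on every direction."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


tri_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
tri_b = [(1, 1, 0), (1, 0.8, 0), (0.8, 1, 0)]
tri_b_end = [(0.2, 0.2, 0), (0.2, 0, 0), (0, 0.2, 0)]

static_pair = overlaps(swept_dop(tri_a, tri_a), swept_dop(tri_b, tri_b))
moving_pair = overlaps(swept_dop(tri_a, tri_a), swept_dop(tri_b, tri_b_end))
```

In this example the axis-aligned boxes of `tri_a` and the static `tri_b` overlap, but the diagonal direction (1, 1, 0) separates them, which is the kind of pair a 26-DOP culls and a plain bounding box does not.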

  30. Collision Detection: Narrow Phase
      • We apply the space-time separating planes method to filter out potential collisions.

  38. Narrow Phase, First Challenge: Computation Imbalance
      [Figure: timeline of communication and computation; axis: Time (ms), 0 to 800]
      • Time spent on each potential collision pair is not uniform
      • Detection time depends on the trajectory length of each vertex in the potential pair
      • A profiling-based load balancer (810 ms -> 150 ms)
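
One way to act on profiled costs is a greedy longest-processing-time assignment: sort tasks by measured cost and always give the next task to the least loaded core. The slides do not say which strategy the balancer uses, so this is a stand-in technique. The function name and the cost values are hypothetical.

```python
import heapq


def balance_by_profile(costs, num_cores):
    """Assign tasks to cores greedily, most expensive task first.

    `costs` maps a task id to the time measured for it in an earlier window.
    """
    heap = [(0.0, core) for core in range(num_cores)]  # (load, core id)
    assignment = {}
    for task, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)  # least loaded core so far
        assignment[task] = core
        heapq.heappush(heap, (load + cost, core))
    return assignment


profiled = {"t0": 9, "t1": 1, "t2": 1, "t3": 1, "t4": 3, "t5": 3}
assignment = balance_by_profile(profiled, num_cores=2)

loads = {}
for task, core in assignment.items():
    loads[core] = loads.get(core, 0) + profiled[task]
```

With these costs the single expensive task gets a core to itself and the five cheap ones share the other, so both cores end up with the same load.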

  40. Narrow Phase, Second Challenge: Communication Imbalance
      [Figure: timeline of communication and computation; axis: Time (ms), 0 to 800; 810 ms -> 150 ms]
      • The more widely the potential collision pairs are spread, the more communication requests are needed.

  42. Narrow Phase: Communication Imbalance, Locality Aware Load Balancer
      • Potential Collisions: Partition 2 & Partition 3; Partition 3 & Partition 4; Partition 5 & Partition 2

  44. Narrow Phase: Communication Imbalance, Locality Aware Load Balancer
      • Collision Tasks: Partition 2 & Partition 3; Partition 3 & Partition 4; Partition 5 & Partition 2
      • Assignment: Node 3, Node 4, Node 5, Node 2
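
A locality-aware assignment can be sketched as follows: each collision task involves two partitions, and it is placed on a node that already owns one of them, so at most one partition's data has to be fetched. The tie-breaking rule (prefer the less loaded owner), the function name, and the one-partition-per-node layout are assumptions for illustration. The result is not meant to reproduce the assignment shown on the slide.

```python
def assign_tasks(tasks, owner):
    """Place each (partition_a, partition_b) task on a node owning one of them.

    `owner` maps a partition id to the node that holds it.
    """
    load = {}
    assignment = {}
    for a, b in tasks:
        candidates = (owner[a], owner[b])
        # Prefer the owner with fewer tasks so far; ties go to the first owner.
        node = min(candidates, key=lambda n: load.get(n, 0))
        assignment[(a, b)] = node
        load[node] = load.get(node, 0) + 1
    return assignment


# Hypothetical layout: partition i lives on node i.
owner = {p: p for p in (2, 3, 4, 5)}
tasks = [(2, 3), (3, 4), (5, 2)]
placement = assign_tasks(tasks, owner)
```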

  52. Narrow Phase: Communication Imbalance, Overlapping Computation and Communication
      Let's look at the flow on one node, where L is the list of potential collision tasks:
      1. Send data request for the external vertices in L
      2. On receiving message M, check M.type():
          • Data request: sendDataReply()
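
The message-driven flow above can be mimicked with a small single-process simulation. Only the data-request branch appears in the slides. The reply branch, the `Node` class, the tuple message layout, and the `run` driver are assumptions made to complete the sketch, and none of this is Charm++ API.

```python
from collections import deque


class Node:
    def __init__(self, node_id, local_vertices, tasks):
        self.id = node_id
        self.local = local_vertices  # vertex id -> data owned by this node
        self.tasks = tasks           # L: pending (vertex, vertex) collision tasks
        self.remote = {}             # external vertex data received so far
        self.done = []               # tasks in the order they were executed

    def start(self, owner, network):
        # Step 1: request every external vertex that appears in L.
        needed = {v for task in self.tasks for v in task if v not in self.local}
        for v in sorted(needed):
            network.append((owner[v], ("request", self.id, v)))
        self.run_ready()  # work on local-only tasks while requests are in flight

    def on_message(self, msg, network):
        # Step 2: dispatch on the message type.
        kind, sender, vertex = msg[:3]
        if kind == "request":
            network.append((sender, ("reply", self.id, vertex, self.local[vertex])))
        elif kind == "reply":  # assumed branch, not shown in the slides
            self.remote[vertex] = msg[3]
            self.run_ready()

    def run_ready(self):
        known = self.local.keys() | self.remote.keys()
        for task in list(self.tasks):
            if all(v in known for v in task):
                self.tasks.remove(task)
                self.done.append(task)


def run(nodes, owner):
    network = deque()  # stands in for the runtime's message queue
    for node in nodes.values():
        node.start(owner, network)
    while network:
        dest, msg = network.popleft()
        nodes[dest].on_message(msg, network)


owner = {"a": 0, "b": 0, "c": 1}
nodes = {
    0: Node(0, {"a": 1.0, "b": 2.0}, [("a", "c"), ("a", "b")]),
    1: Node(1, {"c": 3.0}, []),
}
run(nodes, owner)
```

Node 0 executes the purely local task `("a", "b")` before the reply for vertex `c` arrives, which is the overlap of computation and communication that the slide describes.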
