
Parallel Algorithms for Solving Large Assignment Problems on GPU


  1. Parallel Algorithms for Solving Large Assignment Problems on GPU Clusters. 2018 Blue Waters Symposium. Ketan Date and Rakesh Nagi (PI), Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign. June 6, 2018.

  2. Outline
     ◮ Assignment Problems: Introduction and Impact
     ◮ Research Tasks and Role of Blue Waters
     ◮ The Linear Assignment Problem
     ◮ The Quadratic Assignment Problem

  4. Introduction
     ◮ Assignment problems: Fundamental optimization problems in Operations Research with prominent applications in science and engineering. Our inability to solve large instances of these problems efficiently can greatly inhibit discovery in these domains.
     ◮ Objectives: Design faster, parallel algorithms for the Linear Assignment Problem (LAP) and the Quadratic Assignment Problem (QAP) using GPUs and large computational clusters such as Blue Waters.
     ◮ Future work: Extend the proposed methodology to the Generalized Assignment Problem (GAP), Traveling Salesman Problem (TSP), Vehicle Routing Problem (VRP), and Graph Association/Matching (GA/GM).

  5. Impact of Assignment Problems
     1. Data sciences: Data association in information fusion and multi-target tracking.
     2. Bioinformatics: Alignment of protein-protein interaction (PPI) networks.
     3. Engineering: Facility location, routing and scheduling problems, etc.
     [Figures: TSP and VRP; Facility Location; Association of complex information from heterogeneous data sources using Graph Association Formulation. The last figure pairs a sample text report (dated 3/18/2013, describing suspected bank robber John Dillinger and his black 2010 Ford Focus on Indianapolis Road) with attributed graphs of Person, Vehicle, and Location nodes.]

  7. Research Tasks and Role of Blue Waters
     Research tasks
     1. Develop a GPU-accelerated Hungarian algorithm for the LAP.
     2. Develop a GPU-accelerated Dual Ascent procedure for QAP-RLT2.
     3. Couple both algorithms and deploy them on Blue Waters to obtain exact solutions to the QAP in a parallel branch-and-bound scheme.
     Role of Blue Waters
     ◮ Branch-and-bound explores an exponential number of tree nodes.
     ◮ The lower bounding procedure requires solving $O(n^3)$ LAPs and adjusting $O(n^6)$ Lagrange multipliers.
     ◮ Solving the benchmark Nug30 QAP required over 1200 XK compute nodes for over 110 hours (15 years' worth of computation).
     ◮ We are grateful to Blue Waters and the NCSA staff for providing this invaluable service to the scientific community.
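The "15 years" figure on the Nug30 bullet can be checked with simple arithmetic. The sketch below uses the slide's lower bounds (1200 nodes, 110 hours) and assumes a 365-day year; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the "15 years" figure quoted on the slide.
nodes = 1200  # lower bound from the slide ("over 1200 XK compute nodes")
hours = 110   # lower bound from the slide ("over 110 hours")

node_hours = nodes * hours
hours_per_year = 24 * 365  # assumption: a 365-day year
node_years = node_hours / hours_per_year

print(f"{node_hours} node-hours is about {node_years:.1f} node-years")
```

Since both inputs are lower bounds, the result is a lower bound on the single-node equivalent time.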

  9. Linear Assignment Problem: Introduction
     ◮ Also known as the weighted bipartite matching problem.
     ◮ Objective: Minimize the total cost of assigning n resources to n tasks.
     ◮ Important subproblem of many NP-hard optimization problems, e.g.,
        ◮ Quadratic Assignment Problem
        ◮ Traveling Salesperson Problem
        ◮ Graph Matching and Association Problems, etc.
     Formulation:
     $$
     \begin{aligned}
     \min \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \\
     \text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} = 1 \quad \forall\, i = 1, \dots, n; \\
     & \sum_{i=1}^{n} x_{ij} = 1 \quad \forall\, j = 1, \dots, n; \\
     & x_{ij} \in \{0, 1\} \quad \forall\, i, j = 1, \dots, n.
     \end{aligned}
     $$
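To make the formulation concrete, here is a minimal sketch that solves a tiny LAP by enumerating all permutations. This is not the method from the talk; it is an exhaustive check that is only practical for very small n. The function name `solve_lap_brute_force` and the 3x3 cost matrix are made up for illustration.

```python
from itertools import permutations


def solve_lap_brute_force(cost):
    """Return (best_cost, assignment) where assignment[i] is the task of resource i.

    Each permutation corresponds to one feasible 0/1 matrix x in the formulation:
    x[i][perm[i]] = 1 and all other entries are 0.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)


# Hypothetical 3x3 cost matrix c[i][j].
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
best_cost, assignment = solve_lap_brute_force(cost)
print(best_cost, assignment)  # 5 [1, 0, 2]
```

Enumeration visits n! permutations, which is why polynomial-time methods such as the Hungarian algorithm matter for the problem sizes discussed in the talk.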

  10. Literature Review
      Sequential algorithms
      ◮ Hungarian algorithm [Kuhn, 1955, Munkres, 1957].
      ◮ Shortest path algorithms [Jonker and Volgenant, 1987].
      ◮ Auction algorithm [Bertsekas, 1990].
      Parallel implementations
      ◮ Parallel synchronous/asynchronous Hungarian algorithms [Bertsekas and Castañon, 1993].
      ◮ Parallel shortest path algorithms [Balas et al., 1991, Storøy and Sørevik, 1997].
      ◮ Parallel synchronous/asynchronous Auction algorithm [Wein and Zenios, 1990, Bertsekas and Castañon, 1991].
      ◮ Parallel Auction algorithm using GPUs [Vasconcelos and Rosenhahn, 2009].

  11. Sequential Hungarian Algorithm: With opportunities for acceleration
      [Flowchart: Start; Initialization (Partial Assignment); Optimality Check (All Assigned? Yes: End; No: Augmenting Path Search); Augmenting Path Found? (Yes: Augmentation; No: Dual Update). Legend: high-granularity steps are scalable to multiple GPUs; low-granularity steps are executed on a single GPU.]
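For reference, the steps in the flowchart (augmenting path search, dual update, augmentation, repeated until all rows are assigned) can be sketched sequentially as follows. This is a widely used textbook O(n^3) formulation with dual variables, not the authors' GPU code; the function name `hungarian` and the sample matrix are illustrative assumptions.

```python
INF = float("inf")


def hungarian(cost):
    """Sequential Hungarian method for a square cost matrix.

    Returns (total_cost, assignment) with assignment[i] = column of row i.
    Arrays are 1-based internally; index 0 is a sentinel.
    """
    n = len(cost)
    u = [0] * (n + 1)    # row dual variables
    v = [0] * (n + 1)    # column dual variables
    p = [0] * (n + 1)    # p[j] = row currently assigned to column j (0 = none)
    way = [0] * (n + 1)  # way[j] = previous column on the alternating path to j

    for i in range(1, n + 1):        # insert one unassigned row at a time
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)       # smallest reduced cost seen per column
        used = [False] * (n + 1)
        while True:                  # augmenting path search
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):   # dual update
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:           # reached an unassigned column
                break
        while j0 != 0:               # augmentation along the path found
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Hypothetical 3x3 instance.
print(hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, [1, 0, 2])
```

This variant augments one row at a time, whereas the accelerated version described in the talk searches for several vertex-disjoint augmenting paths per iteration.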

  12. Accelerated Hungarian: Augmenting Path Search (Forward Pass)
      ◮ The goal is to find vertex-disjoint augmenting paths from unassigned rows to unassigned columns.
      ◮ In the forward pass, threads traverse the graph one hop at a time and construct augmenting trees.
      ◮ More than one augmenting tree may be found per iteration.
      ◮ Due to the race condition, the trees are guaranteed to be vertex disjoint (our innovation).
      [Figure: two BFS iterations on a bipartite graph with rows R1-R4 and columns C1-C4, with Threads 11, 12, 21, and 22 expanding frontier rows. BFS Iteration 1: Frontier R2 and R3; New Frontier R1 and R4. BFS Iteration 2: Frontier R1 and R4; New Frontier: none. Augmenting path(s) found: R3-C2-R1-C1 and R3-C2-R1-C4.]

  13. Accelerated Hungarian: Reverse Pass and Augmentation
      ◮ A reverse pass is performed to extract augmenting paths from the augmenting trees (each leaf vertex is processed by one thread).
      ◮ Due to the "race" condition, only one path survives per tree (our innovation).
      ◮ All such paths can be used to augment the current solution and increase the number of assignments.
      [Figure: Reverse Pass with Threads 21 and 22 on rows R1-R4 and columns C1-C4; Survivor Path: C1-R1-C2-R3. Augmentation: number of assignments increased by 1.]
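The forward and reverse passes can be imitated sequentially to see why the trees are vertex disjoint and why one path survives per tree. The sketch below is a single-threaded stand-in, not the authors' CUDA kernels: "first thread to claim a column wins" is modeled by a dictionary lookup, and "first leaf to claim a root wins" by a set. The function names and the adjacency lists are assumptions chosen to reproduce the R1-R4 / C1-C4 example from the slide figures (0-based indices).

```python
def forward_pass(adj, row_to_col, col_to_row):
    """Level-synchronous BFS from all unassigned rows over admissible edges."""
    pred_row = {}  # column -> row that claimed it (its parent in the tree)
    frontier = [r for r, c in enumerate(row_to_col) if c is None]
    root = {r: r for r in frontier}  # row -> root row of its tree
    leaves = []    # unassigned columns reached (ends of augmenting paths)
    while frontier:
        next_frontier = []
        for r in frontier:           # on a GPU: handled by parallel threads
            for c in adj[r]:
                if c in pred_row:    # column already claimed: this visit loses
                    continue
                pred_row[c] = r      # claim the column (atomic on a GPU)
                mate = col_to_row[c]
                if mate is None:
                    leaves.append(c)
                else:                # follow the matched edge to the next row
                    root[mate] = root[r]
                    next_frontier.append(mate)
        frontier = next_frontier
    return pred_row, root, leaves


def reverse_pass_and_augment(pred_row, root, leaves, row_to_col, col_to_row):
    """Trace one surviving path per tree back to its root and flip its edges."""
    claimed_roots = set()
    augmented = 0
    for leaf in leaves:              # on a GPU: one thread per leaf
        tree_root = root[pred_row[leaf]]
        if tree_root in claimed_roots:
            continue                 # another leaf of this tree already won
        claimed_roots.add(tree_root)
        path, c = [], leaf
        while True:
            r = pred_row[c]
            path.append((r, c))
            if r == tree_root:
                break
            c = row_to_col[r]        # step back along the matched edge
        for r, c in path:            # augmentation
            row_to_col[r] = c
            col_to_row[c] = r
        augmented += 1
    return augmented


# Rows R1..R4 are 0..3 and columns C1..C4 are 0..3; R1-C2 and R4-C3 are assigned.
adj = [[0, 1, 3], [2], [1], [2]]
row_to_col = [1, None, None, 2]
col_to_row = [None, 0, 3, None]
pred_row, root, leaves = forward_pass(adj, row_to_col, col_to_row)
augmented = reverse_pass_and_augment(pred_row, root, leaves, row_to_col, col_to_row)
print(leaves, augmented, row_to_col)  # [0, 3] 1 [0, None, 1, 2]
```

Both leaves C1 and C4 belong to the tree rooted at R3, so only the path C1-R1-C2-R3 is applied and the number of assignments grows from 2 to 3, matching the slide's example.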

  14. Computational Experiments
      Experimental setup
      ◮ Small scale: n = 500 to n = 5000 in increments of 500. Cost matrices of randomly generated integers in $[0, n]$.
      ◮ Large scale: n = 5000 to n = 20000 in increments of 5000. Cost matrices of randomly generated integers in $[0, n/10]$, $[0, n]$, and $[0, 10n]$.
      Hardware details
      ◮ Computational resources from the Blue Waters supercomputing facility at the University of Illinois at Urbana-Champaign.
      ◮ CPU: AMD Interlagos 6376, 2.30 GHz clock speed, 32 GB memory.
      ◮ GPU: NVIDIA GK110 "Kepler" K20X, with 2688 processor cores and 6 GB memory.

  15. [Figure: Execution Times and Speedup Profiles (Small Scale)]

  16. [Figure: Execution Times and Speedup Profiles (Large Scale)]

  17. Contributions
      1. Developed parallel versions of two variants of the Hungarian algorithm for solving the LAP on GPU(s).
      2. The accelerated algorithms leverage the "race condition" to find multiple vertex-disjoint augmenting paths.
      3. The single-GPU variant can solve problems with up to 400 million variables in 13 seconds.
      4. The multi-GPU variant can solve problems with up to 1.6 billion variables in 24 seconds.
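Because an LAP with n resources and n tasks has n^2 binary variables, the variable counts above translate directly into matrix dimensions. The sketch below does that conversion and estimates raw cost-matrix storage, assuming 4 bytes per entry (an assumption; the slides do not state the data type). The helper `lap_size` is hypothetical.

```python
import math


def lap_size(num_variables, bytes_per_cost=4):
    """Return (n, gigabytes) for an n x n LAP with n*n = num_variables."""
    n = math.isqrt(num_variables)
    gigabytes = num_variables * bytes_per_cost / 1e9
    return n, gigabytes


for label, num_vars in [("single GPU", 400_000_000), ("multi-GPU", 1_600_000_000)]:
    n, gb = lap_size(num_vars)
    print(f"{label}: n = {n}, cost matrix about {gb:.1f} GB at 4 bytes per entry")
```

Under this assumption, 400 million variables corresponds to n = 20000, the largest size listed in the experimental setup, and 1.6 billion corresponds to n = 40000.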

  19. Quadratic Assignment Problem: Introduction
      ◮ Introduced by [Koopmans and Beckmann, 1957] as a mathematical model for facility location.
      ◮ Objective: Place n facilities on n locations such that the total cost (distance times flow) is minimized.
      ◮ Strongly NP-hard problem. No polynomial-time optimal or $\epsilon$-optimal algorithm exists unless P = NP.
      Formulation:
      $$
      \begin{aligned}
      \min \quad & \sum_{i=1}^{n} \sum_{p=1}^{n} b_{ip} x_{ip} + \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{p=1}^{n} \sum_{q=1}^{n} f_{ij} d_{pq} x_{ip} x_{jq} \\
      \text{s.t.} \quad & \sum_{i=1}^{n} x_{ip} = 1 \quad \forall\, p = 1, \dots, n; \\
      & \sum_{p=1}^{n} x_{ip} = 1 \quad \forall\, i = 1, \dots, n; \\
      & x_{ip} \in \{0, 1\} \quad \forall\, i, p = 1, \dots, n.
      \end{aligned}
      $$
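As an illustration of the objective above, the following sketch evaluates the QAP cost of a placement and finds the optimum of a tiny instance by enumeration. It is not the branch-and-bound or RLT2 dual ascent approach from the talk. The names `qap_cost` and `solve_qap_brute_force` and the 3x3 flow and distance matrices are made up; `perm[i]` is the location p of facility i, and the linear term b is optional.

```python
from itertools import permutations


def qap_cost(perm, flow, dist, linear=None):
    """Objective value when facility i is placed at location perm[i]."""
    n = len(perm)
    total = 0
    for i in range(n):
        if linear is not None:
            total += linear[i][perm[i]]                  # b_ip term
        for j in range(n):
            total += flow[i][j] * dist[perm[i]][perm[j]]  # f_ij * d_pq term
    return total


def solve_qap_brute_force(flow, dist, linear=None):
    """Enumerate all n! placements; only practical for very small n."""
    n = len(flow)
    return min(
        (qap_cost(p, flow, dist, linear), list(p)) for p in permutations(range(n))
    )


# Hypothetical symmetric instance with zero diagonals.
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
print(solve_qap_brute_force(flow, dist))  # (38, [0, 1, 2])
```

Each objective evaluation already costs O(n^2), and the search space grows as n!, which gives some intuition for why exact solution of Nug30 (n = 30) needs large-scale parallel resources.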
