FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR PROGRAMMING
Daniel Thuerck 1,2 (advisors Michael Goesele 1,2 and Marc Pfetsch 1), Maxim Naumov 3
1 Graduate School of Computational Engineering, TU Darmstadt
2 Graphics, Capture and Massively Parallel Computing, TU Darmstadt
3 NVIDIA Research
10.05.2017 | TU Darmstadt | GCC / GSC CE | Daniel Thuerck, Maxim Naumov | 1

INTRODUCTION TO LINEAR PROGRAMMING

Linear Programs
min $c^\top x$ (linear objective function)
s.t. $Ax \le b$ (linear constraints), $x \ge 0$
where $A = \begin{pmatrix} a_1^\top \\ \vdots \\ a_m^\top \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}$

Linear Programs: Applications [3P Logistics]

INTERNALS OF AN LP SOLVER
Lower-level parallelism in LP

Solving LPs
"Standard" LP format: min $c^\top x$ s.t. $Ax = b$, $x \ge 0$
• $A$ is an $m \times n$ matrix, with $m \ll n$
• $A$ is sparse and has full row rank.
• Variables may be bounded: $l \le x \le u$

Solving LPs
[Figure: two panels, "Simplex" and "Interior Point", each marked with the objective direction $c$]

Solving LPs
Simplex: "Basis" (active set) $A_B$
Interior Point (IPM): "Augmented (Newton) System" $\begin{pmatrix} D & A^\top \\ A & 0 \end{pmatrix}$ or "Normal Equations" $A D^{-1} A^\top$

Solving LPs
IPM / Augmented system: $\begin{pmatrix} D & A^\top \\ A & 0 \end{pmatrix}$
• $(m + n) \times (m + n)$, sparse
• Symmetric, indefinite
• Solution: indefinite $LDL^\top$ factorization or MINRES method
IPM / Normal equations: $A D^{-1} A^\top$
• $m \times m$, SPD, might be dense
• Squared condition number
• Solution: Cholesky factorization or CG method
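
The link between the two systems can be made explicit by block elimination. The following is a sketch, writing the augmented matrix with diagonal block $D$ and constraint matrix $A$ as on the slide; the step names $\Delta x$, $\Delta y$ and the right-hand sides $r_1$, $r_2$ are assumptions introduced here for illustration, and the sign convention of an actual solver may differ.

```latex
\begin{aligned}
\begin{pmatrix} D & A^\top \\ A & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
  &= \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} \\
\Delta x &= D^{-1}\left(r_1 - A^\top \Delta y\right) \\
A D^{-1} A^\top \, \Delta y &= A D^{-1} r_1 - r_2
\end{aligned}
```

Eliminating $\Delta x$ shrinks the system from $(m + n) \times (m + n)$ to $m \times m$, at the price of forming $A D^{-1} A^\top$, which can be much denser than $A$ and has a worse condition number.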

Introducing culip-lp …
An ongoing implementation of Mehrotra's primal-dual interior point algorithm [1], featuring...
• (Iterative) linear algebra based on the "augmented matrix" approach,
• Full-rank guarantees,
• Comprehensive preprocessing & prescaling.
Towards solving large-scale LPs on the GPU as open source for everybody

IMPLEMENTING CULIP-LP
Progress report

Solver architecture
Preprocess → Scale → Standardize → IPM loop

Solver architecture
Input data:
• Constraints $A_{eq} x = b_{eq}$
• Constraints $A_{le} x \le b_{le}$
• Objective vector $c$
• Bounds (on some variables) $l, u$

Solver architecture
Storage format: CSR
• Compressed sparse row format
• Provides efficient row-based access by 3 arrays:
  row_ptr = [0, 2, 3, 4, 5]
  col_ind = [0, 1, 1, 2, 0]
  val     = [a, b, c, d, e]
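
To make the row-based access concrete, here is a minimal pure-Python sketch of CSR. It is an illustration only, not culip-lp code: the helper names `csr_row` and `csr_matvec` are hypothetical, the index arrays are read off the slide's small example as far as the extraction allows, and the symbolic values a..e are replaced by made-up numbers.

```python
# Minimal CSR sketch (hypothetical helpers, not taken from culip-lp).
row_ptr = [0, 2, 3, 4, 5]         # row i occupies val[row_ptr[i]:row_ptr[i + 1]]
col_ind = [0, 1, 1, 2, 0]         # column index of each stored value
val = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up numbers standing in for a, b, c, d, e


def csr_row(i):
    """Return the nonzeros of row i as (column, value) pairs."""
    start, end = row_ptr[i], row_ptr[i + 1]
    return list(zip(col_ind[start:end], val[start:end]))


def csr_matvec(x):
    """Compute y = A x with one independent sparse dot product per row."""
    n_rows = len(row_ptr) - 1
    return [sum(v * x[j] for j, v in csr_row(i)) for i in range(n_rows)]


first_row = csr_row(0)
y = csr_matvec([1.0, 1.0, 1.0])
```

Each row's dot product touches only its own slice of `col_ind` and `val`, which is why rows can be processed independently of each other.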

Solver architecture
Preprocessing example: LP "pb-simp-nonunif" (see [2])
• Input matrix: 1.4 million × 23k with 4.36 million nonzeros
Preprocessing executes in rounds:
• Removed 1 singleton inequality
• Removed 3629 low-forcing constraints
• Removed 1 fixed variable
• Removed 1.1 million (!) singleton inequalities
• Result: approx. 3.6 million nonzeros removed

Solver architecture
Goal: Reduce element variance in matrices
• Scaling [3] makes a difference
1. Geometric scaling (1x – 4x): $A_{i,\cdot} = \dfrac{A_{i,\cdot}}{\sqrt{\max |A_{i,\cdot}| \cdot \min |A_{i,\cdot}|}}$
2. Equilibration (1x): $A_{i,\cdot} = \dfrac{A_{i,\cdot}}{\lVert A_{i,\cdot} \rVert_2}$
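
The two row scalings can be sketched in a few lines of Python. This is a dense, illustrative version under stated assumptions: the function names are hypothetical, geometric scaling is taken to divide by the square root of max times min (the usual geometric-mean form), and the minimum is taken over nonzero entries only, since a sparse row would otherwise always have minimum 0.

```python
import math


def geometric_scale_rows(A):
    """Divide each row by sqrt(max|a| * min|a|) over its nonzero entries."""
    scaled = []
    for row in A:
        mags = [abs(a) for a in row if a != 0.0]
        factor = math.sqrt(max(mags) * min(mags)) if mags else 1.0
        scaled.append([a / factor for a in row])
    return scaled


def equilibrate_rows(A):
    """Divide each row by its Euclidean norm (all-zero rows are left as-is)."""
    scaled = []
    for row in A:
        norm = math.sqrt(sum(a * a for a in row))
        scaled.append([a / norm if norm > 0.0 else a for a in row])
    return scaled


A = [[100.0, 1.0, 0.0],
     [0.0, 0.5, 8.0]]
G = geometric_scale_rows(A)   # row 0 is divided by sqrt(100 * 1) = 10
E = equilibrate_rows(G)       # every nonzero row now has 2-norm 1
```

In this sketch the geometric pass narrows the spread of magnitudes within a row, and the equilibration pass then normalizes the row lengths.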

Solver architecture
Goal: Format LP in standard form
• Shift variables: $l \le x \le u \;\to\; 0 \le x' \le u - l$ with $x' = x - l$
• Split (free) variables: $x \to x = x^+ - x^-$ with $x^+, x^- \ge 0$
• Build standard-form matrix: $\begin{pmatrix} A_{le} & I \\ A_{eq} & 0 \end{pmatrix} \begin{pmatrix} x \\ s \end{pmatrix} = \begin{pmatrix} b_{le} \\ b_{eq} \end{pmatrix}$ with slack variables $s \ge 0$
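
A small dense sketch of two of these standardization steps, shifting lower bounds and adding slack variables, is given below. The function names are hypothetical and the data is made up; splitting free variables is omitted, and a real implementation would work on sparse storage.

```python
def shift_lower_bounds(A, b, l, u):
    """Substitute x = x' + l, so that 0 <= x' <= u - l.

    Returns the shifted right-hand side b - A l and the new upper bounds u - l.
    """
    b_shift = [bi - sum(aij * lj for aij, lj in zip(row, l))
               for row, bi in zip(A, b)]
    u_shift = [uj - lj for uj, lj in zip(u, l)]
    return b_shift, u_shift


def build_standard_form(A_le, b_le, A_eq, b_eq):
    """Stack [A_le I; A_eq 0] and [b_le; b_eq], one slack per inequality."""
    k = len(A_le)
    top = [row + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(A_le)]
    bottom = [row + [0.0] * k for row in A_eq]
    return top + bottom, b_le + b_eq


# Made-up example: one inequality, one equality, two variables.
A_le, b_le = [[1.0, 2.0]], [4.0]
A_eq, b_eq = [[1.0, -1.0]], [0.0]
b_le_shift, u_shift = shift_lower_bounds(A_le, b_le, l=[1.0, 0.0], u=[3.0, 5.0])
A_std, b_std = build_standard_form(A_le, b_le, A_eq, b_eq)
```

The identity block appends one slack column per inequality row, so every constraint of the stacked system is an equality.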

Solver architecture
Ensure $A$ has full rank (symbolically)
[Figure: block structure of the permuted matrix $PAQ$]

Solver architecture
IPM loop
• Goal: Solve KKT conditions by Newton steps
• Steps:
  • Augmented matrix assembly
  • Solving the (indefinite) augmented matrix
  • Solve twice: predictor and corrector
  • Stepsize along $v = v_p + v_c$
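
The last step, choosing a step size along the combined predictor and corrector direction, is commonly done with a damped ratio test that keeps the iterate strictly positive. The sketch below shows that generic technique; it is not taken from culip-lp, the function name `max_step` and the damping factor 0.99 are assumptions, and the direction vectors are made-up numbers.

```python
def max_step(x, dx, damping=0.99):
    """Largest alpha in (0, 1] that keeps x + alpha * dx > 0 (damped ratio test)."""
    ratios = [-xi / di for xi, di in zip(x, dx) if di < 0.0]
    if not ratios:
        return 1.0  # no component decreases, so the full step is safe
    return min(1.0, damping * min(ratios))


x = [1.0, 2.0, 0.5]
v_p = [-0.5, 1.0, -0.2]     # predictor direction (made-up numbers)
v_c = [-1.5, 0.0, -0.05]    # corrector direction (made-up numbers)
v = [p + c for p, c in zip(v_p, v_c)]

alpha = max_step(x, v)
x_new = [xi + alpha * vi for xi, vi in zip(x, v)]
```

The first component limits the step here: it would hit zero at a step of 0.5, so the damped step stops just short of that boundary.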

Solving the augmented system $\begin{pmatrix} D & A^\top \\ A & 0 \end{pmatrix}$
Iterative strategy:
• Symmetric, indefinite: use MINRES [4] (in parts)
• Equilibrate system implicitly
• Preconditioner: Experiments ongoing
Direct strategy:
• Symmetric, indefinite: use SPRAL SSIDS [5]
• Reordering by METIS [6]
• Scaling for large pivots

PERFORMANCE EVALUATION
Intermediate findings

Benchmark problems

Problem name [7]   M           N           NNZ
ex9                40,962      10,404      517,112
ex10               696,608     17,680      1,162,000
neos-631710        169,576     167,056     834,166
bley_xl1           175,620     5,831       869,391
map06              328,818     164,547     549,920
map10              328,818     164,547     549,920
nb10tb             150,495     73,340      1,172,289
neos-142912        58,726      416,040     1,855,220
in                 1,526,202   1,449,074   6,811,639

Performance

Problem name [7]   NNZ         CLP barr [sec]   culip-lp [sec]
ex9                517,112     X (NC)           81
ex10               1,162,000   X (NS)           141
neos-631710        834,166     172              478
bley_xl1           869,391     X (NS)           1,492
map06              549,920     X (NC)           466
map10              549,920     X (NC)           615
nb10tb             1,172,289   X (NC)           2,461
neos-142912        1,855,220   356              447
in                 6,811,639   X (NS)           NC

X – failed, NS – did not start 1st iteration, NC – did not converge within 1 hour

Runtime breakdown
[Figure: time [sec] (0 to 10) per IPM step (1 to 91) for problem map10 [7]; legend: MINRES predictor, corrector, SPRAL]

Iterative vs. direct methods
[Figure, two panels over IPM steps 1 to 13, each with predictor and corrector series: "MINRES Iterations" (0 to 6000) and "MINRES relative residual" (1.E-07 to 1.E+00). Example: map10 [7]]

Numerical difficulty
Condition of matrix $\begin{pmatrix} D & A^\top \\ A & 0 \end{pmatrix}$ depends mainly on $D = \mathrm{diag}(x) \cdot \mathrm{diag}(s)$, with strong duality towards the end often yielding $\dfrac{\max(x_i s_i)}{\min(x_i s_i)} \approx 10^{10}$, where $x^\top = [x_1, \dots, x_n]$ are solution and $s^\top = [s_1, \dots, s_n]$ are slack variables.
Remedies:
• 2x2 pivoting in factorizations (e.g. $LDL^\top$ in SPRAL)
• Preconditioning for MINRES or GMRES
