ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications
Kinjal Basu (LinkedIn AI), Amol Ghoting (LinkedIn AI), Rahul Mazumder (MIT), Yao Pan (LinkedIn AI)
Agenda
1. Overview
2. ECLIPSE: Extreme Scale LP Solver
3. Applications
4. System Architecture
5. Experimental Results
Overview
Introduction
Large-Scale Linear Programs (LPs) have several applications on the web.
Problems of Extreme Scale: Billions to Trillions of Variables
● Ad-hoc solutions: splitting the problem into smaller sub-problems → no guarantee of optimality
● Our approach: exploit the structure of the problem and solve a perturbation of the primal problem
  ● Smooth gradient
  ● Efficient computation
Motivating Example: Friend or Connection Matching Problem
Maximize value, subject to:
● Total invites sent is greater than a threshold
● Limit on invitations per member, to prevent overwhelming members
Models:
● q_1 - value model
● q_2 - invitation model
● y_ij - probability of showing user j to user i
Scale: J and K are both large, with n = JK ≈ 10^12 (1 trillion decision variables)
General Framework

min_x c^T x   s.t.   Ax ≤ b,   x_i ∈ C_i, i ∈ [I]

● Users j, items k, and y_jk is the association between (j, k)
● n = JK can range from 100s of millions to 10s of trillions
● C_i are simple constraints (i.e., they allow for efficient projections)

A = [ A^(1) ; A^(2) ], where

● A^(1): global and cohort-level constraints (e.g., the total-invite constraint)
● A^(2): item-level constraints (e.g., limits on invitations per user), with block structure

A^(2) = [ D_11      …  D_1I
          ⋮         ⋱  ⋮
          D_{m_2 1} …  D_{m_2 I} ]
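A tiny, fully hypothetical instance of this framework can make the structure concrete. The sketch below builds a toy matching LP (J members, K items, x[j,k] the probability of showing item k to member j) with one global invite-threshold row and block member-level cap rows, and solves it with `scipy.optimize.linprog`; all model scores, thresholds, and sizes are made-up stand-ins, not values from the talk.

```python
# Toy instance of the general framework: global + block-diagonal constraints.
import numpy as np
from scipy.optimize import linprog

J, K = 4, 3                       # tiny stand-ins for huge J, K (n = J*K variables)
rng = np.random.default_rng(0)
value = rng.random((J, K))        # hypothetical q_1: value-model scores
invite = rng.random((J, K))       # hypothetical q_2: invitation-model scores

c = -value.ravel()                # linprog minimizes, so negate to maximize value
# A^(1): one global row, -sum(invite * x) <= -threshold  (total invites >= threshold)
A_global = -invite.ravel()[None, :]
b_global = np.array([-1.0])
# A^(2): block rows, sum_k x[j, k] <= cap for each member j (here cap = 2)
A_member = np.kron(np.eye(J), np.ones((1, K)))
b_member = np.full(J, 2.0)

res = linprog(c,
              A_ub=np.vstack([A_global, A_member]),
              b_ub=np.concatenate([b_global, b_member]),
              bounds=[(0, 1)] * (J * K),     # C_i: simple box constraints
              method="highs")
```

The block-diagonal shape of `A_member` is exactly what makes the per-member projections on the real problem cheap and parallelizable.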
ECLIPSE: Extreme Scale LP Solver
Solving The Problem

Primal LP:  P*_0 := min_x c^T x   s.t.  Ax ≤ b,  x_i ∈ C_i, i ∈ [I]

Old idea: perturbation of the LP (Mangasarian & Meyer '79; Nesterov '05; Osher et al. '11; …)

Primal QP:  P*_γ := min_x c^T x + (γ/2) x^T x   s.t.  Ax ≤ b,  x_i ∈ C_i, i ∈ [I]

Dualize:

Dual QP:  g_γ(λ) := min_{x ∈ ∏ C_i} { c^T x + (γ/2) x^T x + λ^T (Ax − b) }

Key observation: length(λ) is small.

Solve the Dual QP:  g*_γ := max_{λ ≥ 0} g_γ(λ) = P*_γ   (strong duality)
Solving The Problem

Primal:  P*_0 := min_x c^T x   s.t.  Ax ≤ b,  x_i ∈ C_i, i ∈ [I]

x*_γ ∈ argmin_x { c^T x + (γ/2) x^T x   s.t.  Ax ≤ b,  x_i ∈ C_i, i ∈ [I] }

● Observation 1: Exact Regularization (Mangasarian & Meyer '79; Friedlander & Tseng '08)
  ∃ γ̄ > 0 such that x*_γ solves the LP for all γ ≤ γ̄

Dual:  g_γ(λ) := min_{x ∈ ∏ C_i} { c^T x + (γ/2) x^T x + λ^T (Ax − b) },   g*_γ := max_{λ ≥ 0} g_γ(λ)

● Observation 2: Error Bound (Nesterov '05)
  |g*_γ − P*_0| = O(γ)
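Both observations can be checked numerically on a two-variable toy LP. The sketch below (all numbers invented; SLSQP stands in for a QP solver) solves the unperturbed LP, then the γ-perturbed QP for shrinking γ: the value gap |g*_γ − P*_0| shrinks linearly with γ, and for small γ the QP minimizer is already an LP solution.

```python
# Numerical check of the two observations on a tiny LP with C_i = [0, 1] boxes:
#   min c^T x  s.t.  a^T x <= b0,  x in [0, 1]^2.
import numpy as np
from scipy.optimize import linprog, minimize

c = np.array([-1.0, -2.0])
a = np.array([1.0, 1.0])
b0 = 1.5

lp = linprog(c, A_ub=a[None, :], b_ub=[b0], bounds=[(0, 1)] * 2, method="highs")
P0 = lp.fun                                   # P*_0, the unperturbed LP value

def qp_value(gamma):
    """Solve P*_gamma = min c^T x + (gamma/2) x^T x under the same constraints."""
    res = minimize(lambda x: c @ x + 0.5 * gamma * x @ x,
                   x0=np.array([0.5, 0.5]),
                   jac=lambda x: c + gamma * x,
                   bounds=[(0, 1), (0, 1)],
                   constraints=[{"type": "ineq", "fun": lambda x: b0 - a @ x,
                                 "jac": lambda x: -a}],
                   method="SLSQP")
    return res.fun, res.x

# Error bound: gap shrinks roughly linearly in gamma (Observation 2);
# for small gamma, qp_value's minimizer solves the LP (Observation 1).
gaps = [abs(qp_value(g)[0] - P0) for g in (1.0, 0.1, 0.01)]
```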
Solving The Problem

ECLIPSE Algorithm: solve  max_{λ ≥ 0} g_γ(λ)  with proximal-gradient-based methods (acceleration, restarts), at optimal convergence rates.

● Observation 1: The dual objective is smooth (implicitly defined) [Nesterov '05]
  λ ↦ g_γ(λ) is O(1/γ)-smooth.

● Observation 2: Gradient expression (Danskin's Theorem)
  ∇g_γ(λ) = A x̂(λ) − b,  where  x̂(λ) ∈ argmin_{x ∈ ∏ C_i} { c^T x + (γ/2) x^T x + λ^T (Ax − b) }
  x̂_i(λ) = Π_{C_i}( −(1/γ) (A^T λ + c)_i )

● Key bottleneck: matrix-vector multiplication
● Simple projection operation
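The gradient expression is easy to sanity-check. In the special case where each C_i is a [0, 1] box, the projection Π_{C_i} is a clip, so x̂(λ) has a closed form; the sketch below (toy random data) compares the Danskin gradient A x̂(λ) − b against a finite-difference estimate of g_γ.

```python
# Check grad g_gamma(lam) = A x_hat(lam) - b against finite differences,
# assuming C_i = [0, 1] boxes so that the projection is a simple clip.
import numpy as np

rng = np.random.default_rng(1)
m, n, gamma = 3, 6, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

def x_hat(lam):
    """Closed-form inner minimizer: project -(A^T lam + c)/gamma onto the boxes."""
    return np.clip(-(A.T @ lam + c) / gamma, 0.0, 1.0)

def g(lam):
    """Dual objective g_gamma(lam), evaluated via the inner minimizer."""
    x = x_hat(lam)
    return c @ x + 0.5 * gamma * x @ x + lam @ (A @ x - b)

lam = np.abs(rng.standard_normal(m))
grad = A @ x_hat(lam) - b                     # Danskin gradient
eps = 1e-6
fd = np.array([(g(lam + eps * np.eye(m)[i]) - g(lam - eps * np.eye(m)[i])) / (2 * eps)
               for i in range(m)])
```

Note the two costs visible here: one matrix-vector product per gradient (the bottleneck at trillion scale) and a cheap elementwise projection.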
Overall Algorithm

Input: A, b, c, γ, step size η. Initialize the dual λ^0 ≥ 0.

At iteration k:
● Get primal:  x̂_i(λ^k) = Π_{C_i}( −(1/γ) (A^T λ^k + c)_i )
● Compute gradient:  ∇g_γ(λ^k) = A x̂(λ^k) − b
● Update dual (next iteration):
  GD:  λ^{k+1} = ( λ^k + η ∇g_γ(λ^k) )_+
  AGD: the same step applied at an extrapolated (momentum) point
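The loop above can be sketched in a few lines. This is a minimal, hypothetical implementation assuming C_i = [0, 1] boxes and the plain GD dual update (the talk's AGD variant adds Nesterov momentum and restarts); the step size uses the O(1/γ)-smoothness of the dual.

```python
# Minimal sketch of the overall ECLIPSE loop (GD variant, box constraints).
import numpy as np

def eclipse_gd(A, b, c, gamma=0.1, iters=2000):
    """Projected gradient ascent on the dual g_gamma; returns (primal, dual)."""
    lam = np.zeros(A.shape[0])                          # length(lam) is small
    step = gamma / np.linalg.norm(A, 2) ** 2            # 1/L for the (||A||^2/gamma)-smooth dual
    for _ in range(iters):
        x = np.clip(-(A.T @ lam + c) / gamma, 0.0, 1.0) # get primal (projection)
        lam = np.maximum(lam + step * (A @ x - b), 0.0) # gradient step, keep lam >= 0
    return x, lam

# Tiny instance: maximize x1 + 2*x2 subject to x1 + x2 <= 1.5, 0 <= x <= 1.
x, lam = eclipse_gd(np.array([[1.0, 1.0]]), np.array([1.5]), np.array([-1.0, -2.0]))
```

By exact regularization, running this with a small enough γ recovers an exact LP solution, not just an approximation.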
Applications
Volume Optimization

Maximize sessions, subject to:
● Total number of emails / notifications bounded
● Clicks above a threshold
● Disablement below a threshold

Generalized from global to cohort-level and member-level systems.
Multi-Objective Optimization

● Maximize Metric 1
● Metric 2 is greater than a minimum
● Metric 3 is bounded
● …

Covers most product applications:
● Engagement vs. Revenue
● Sessions vs. Notification / Email Volume
● Member Value vs. Annoyance
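The pattern above maps directly onto the LP framework: the primary metric becomes the objective and the other metrics become rows of A. A hedged sketch with invented per-decision scores and thresholds:

```python
# Multi-objective pattern as an LP: maximize Metric 1 while Metric 2 stays
# above a floor and Metric 3 stays below a cap. All numbers are toy values.
import numpy as np
from scipy.optimize import linprog

m1 = np.array([3.0, 1.0, 2.0])   # Metric 1 score per decision (e.g., engagement)
m2 = np.array([1.0, 2.0, 1.0])   # Metric 2 score per decision (e.g., sessions)
m3 = np.array([2.0, 1.0, 2.0])   # Metric 3 score per decision (e.g., annoyance)

res = linprog(-m1,                           # linprog minimizes, so negate Metric 1
              A_ub=np.vstack([-m2, m3]),     # m2 @ x >= 1.5  and  m3 @ x <= 4.0
              b_ub=[-1.5, 4.0],
              bounds=[(0, 1)] * 3, method="highs")
```

Adding another metric is just another row, which is why most product trade-offs fit this template.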
System Infrastructure
System Architecture

● Data is collected from different sources and restructured to form the input (A, b, c)
● The solver is called and runs the overall iterations
● The data is split across multiple executors, which perform matrix-vector multiplications in parallel
● The driver collects the dual and broadcasts it back to continue the iterations
● On convergence, the final duals are returned and used in online serving
Detailed Spark Implementation

Data Representation:
● Customized DistributedMatrix API: BlockMatrix API from Apache MLlib
● DistributedVector API implemented using RDD[(index, Vector)], leveraging the diagonal structure

Estimating Primal:
● Component-wise matrix multiplications and projections are done in parallel
● The overall complexity to get the primal is O(K)

Estimating Gradient:
● The most computationally expensive step; the worst-case complexity is O(n) = O(JK)
● We cache A in the executors and broadcast the duals to minimize communication cost
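The executor-side computation can be illustrated without a cluster. The sketch below simulates the data layout with numpy instead of Spark (block sizes and data are invented): A is split into column blocks that stay "cached" on executors, the small dual λ is "broadcast" each iteration, each block computes its primal slice and partial gradient A_i x_i locally, and the driver reduces the partials.

```python
# Numpy simulation of the distributed gradient step: cache column blocks of A,
# broadcast the (small) dual, reduce the partial products on the driver.
import numpy as np

rng = np.random.default_rng(3)
m, n, gamma, n_blocks = 4, 12, 0.5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)
blocks = np.array_split(np.arange(n), n_blocks)       # column partition, one per "executor"

lam = np.abs(rng.standard_normal(m))                  # broadcast once per iteration

partials = []
for idx in blocks:                                    # would run in parallel on executors
    A_i, c_i = A[:, idx], c[idx]                      # cached on the executor
    x_i = np.clip(-(A_i.T @ lam + c_i) / gamma, 0, 1) # local primal block (projection)
    partials.append(A_i @ x_i)                        # local partial gradient
grad = sum(partials) - b                              # driver-side reduce

# matches the centralized computation
x_full = np.clip(-(A.T @ lam + c) / gamma, 0, 1)
```

Only the length-m vectors λ and A_i x_i cross the network; the huge A and x never move, which is what makes the per-iteration communication cheap.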
Experimental Results
Comparative Results

● We compare with the state-of-the-art technique of splitting the problem into sub-problems.
● Please see the full paper for other comparisons.
Real Data Results

● Tested on large-scale volume optimization and matching problems
● Spark 2.3 with up to 800 executors
● The 1-trillion-variable use case converged within 12 hours
● SCS baseline: O'Donoghue et al. (2016)
Key Takeaways
Key Takeaways

● A framework for solving structured LP problems arising in several applications from the internet industry
● Most multi-objective optimization problems can be framed through this.
● Given the computational resources, we can scale to extremely large problems.
● We can easily scale up to 1 trillion variables on real data.
Thank you