Massively Parallel Optimization on a Cluster Environment
Stratis Ioannidis
Data, Networks, and Algorithms Lab
- Machine Learning
- Optimization
- Distributed Computing
- Privacy
5000-Level Course:
- Parallel Processing for Data Analytics
DNAL Research on MGHPCC
[Figure: project highlights]
- Deep Learning Image Analysis for Retinopathy of Prematurity
- Machine Learning for Scalable Graph Distances
- Distributed Caching Algorithms
- Privacy-Preserving Machine Learning (garbled circuits)
Supported by NSF-1741197, NSF-1622536, NSF-1718355, NSF-1717213, and Google Research.
Distributed Optimization and Big Data
- Optimization over large datasets
  - TB of data
  - Millions of variables
  - 1000's of CPUs
- Computational Frameworks
  - Map-Reduce/Spark
  - GraphLab
  - TensorFlow
  - MPI
  - …
- Optimization Methods
  - ADMM
  - SGD
  - SDCA
  - …
Alternating Direction Method of Multipliers (ADMM)
"Distributed optimization and statistical learning via the alternating direction method of multipliers," S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, 2011

$$\min_{\beta \in \mathbb{R}^d} \; \sum_{i=1}^{n} \ell(\beta; x_i, y_i) + \lambda \|\beta\|_1$$

[Figure: the dataset is split into partitions, each producing a local estimate $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3$]
Alternating Direction Method of Multipliers (continued)

$$\min_{\beta \in \mathbb{R}^d} \; \sum_{i=1}^{n} \ell(\beta; x_i, y_i) + \lambda \|\beta\|_1$$

Each partition solves its local problem again, forcing agreement with the consensus value $\bar{\beta}$.
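To make the consensus step concrete, the distributed updates can be written as below, following the consensus formulation of Boyd et al., 2011; the penalty parameter $\rho$, dual variables $u_k$, and partition index $k$ are standard ADMM notation and are not taken from the slides.

\begin{align*}
\beta_k^{t+1} &= \arg\min_{\beta_k}\; \sum_{i \in \mathcal{D}_k} \ell(\beta_k; x_i, y_i) + \frac{\rho}{2}\,\big\|\beta_k - \bar{\beta}^{\,t} + u_k^{t}\big\|_2^2 \\
\bar{\beta}^{\,t+1} &= S_{\lambda/(K\rho)}\!\Big(\tfrac{1}{K}\textstyle\sum_{k=1}^{K}\big(\beta_k^{t+1} + u_k^{t}\big)\Big) \\
u_k^{t+1} &= u_k^{t} + \beta_k^{t+1} - \bar{\beta}^{\,t+1}
\end{align*}

where $\mathcal{D}_k$ is the data held by partition $k$, $K$ is the number of partitions, and $S_\kappa$ is the elementwise soft-thresholding operator arising from the $\ell_1$ penalty.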
ADMM properties
- Converges if the loss ℓ is convex
- Admits many regularization penalties ‖·‖
- Message complexity determined by data sparsity
[Figure: dependence graph linking features feature_1, …, feature_d of β to data points 1, …, n in the dataset]
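To illustrate the last point, here is a minimal sketch (not from the slides; the scipy-based helper and its names are assumptions) that builds the bipartite dependence graph from a sparse design matrix and counts its edges, which is what governs the per-iteration message count in a consensus ADMM implementation.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def dependence_graph_edges(X):
    """Return (data point, feature) pairs with a nonzero entry in X.

    In consensus ADMM over a sparse dataset, data point i only needs to
    exchange messages about feature j when X[i, j] != 0, so the number of
    edges in this bipartite graph bounds the per-iteration messages.
    """
    coo = X.tocoo()
    return list(zip(coo.row.tolist(), coo.col.tolist()))

# Hypothetical example: 1,000 data points, 50 features, 1% density.
X = sparse_random(1000, 50, density=0.01, format="csr", random_state=0)
edges = dependence_graph_edges(X)
print(f"{len(edges)} edges vs. {X.shape[0] * X.shape[1]} in the dense case")
```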
Our Research
- Parallel implementation: https://github.com/yahoo/SparkADMM
- Applications:
  - Time-series Forecasting [I., Jiang, Amizadeh, Laptev, 2016]
  - Scalable Graph Distances [I., Bento, 2017]
Frank-Wolfe Algorithm
Minimize $F(\theta)$ subject to $\theta \in \mathcal{D}$.
FW iteration (a sketch follows below):
$$s_k = \arg\min_{s \in \mathcal{D}} \; s^\top \nabla F(\theta_k) \qquad \text{(minimize a linear function over } \mathcal{D}\text{)}$$
$$\theta_{k+1} = (1 - \gamma_k)\,\theta_k + \gamma_k\, s_k \qquad \text{(interpolate between solutions)}$$
- Marguerite Frank & Philip Wolfe, 1956
- Sparse convex optimization
- Continuous greedy algorithm for submodular maximization
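A minimal serial sketch of the iteration above, here applied to minimizing a quadratic over the probability simplex; the objective, the simplex constraint set, and the step size $\gamma_k = 2/(k+2)$ are illustrative choices, not taken from the slides.

```python
import numpy as np

def frank_wolfe_simplex(grad_F, d, iters=100):
    """Frank-Wolfe over the probability simplex D = {θ >= 0, Σθ = 1}.

    The linear subproblem argmin_{s in D} s·∇F(θ_k) has a closed-form
    solution: the vertex e_j with j = argmin_j ∂F/∂θ_j.
    """
    theta = np.ones(d) / d                       # feasible starting point
    for k in range(iters):
        g = grad_F(theta)
        s = np.zeros(d)
        s[np.argmin(g)] = 1.0                    # linear minimization oracle
        gamma = 2.0 / (k + 2.0)                  # standard step size
        theta = (1 - gamma) * theta + gamma * s  # interpolate between solutions
    return theta

# Illustrative objective: F(θ) = 0.5 * ||Aθ - b||², so ∇F(θ) = Aᵀ(Aθ - b).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
print(frank_wolfe_simplex(lambda t: A.T @ (A @ t - b), d=5))
```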
Our Research [Moharrer, I., 2017]
- Parallelize FW via map-reduce
- Formal conditions under which map-reduce applies
- Several problems amenable to parallelization:
  - Experiment Design, AdaBoost, Convex Approximation
- Implementation over Spark (see the sketch after this list):
  - Solves problems with 10M variables in 44 minutes using 210 CPUs
  - Serial execution would take 3.4 days
- https://github.com/neu-spiral/FrankWolfe
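A sketch of the map-reduce pattern, not the actual code in the repository above: assuming a separable objective $F(\theta) = \sum_i f_i(\theta)$, each Spark partition maps its local gradient contribution, a single reduce yields $\nabla F(\theta_k)$, and the driver solves the linear subproblem. All names, the least-squares objective, and the simplex constraint set below are illustrative assumptions.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fw-mapreduce-sketch").getOrCreate()
sc = spark.sparkContext

# Illustrative separable objective: F(θ) = 0.5 * Σ_i (x_iᵀθ - y_i)², so
# ∇F(θ) = Σ_i (x_iᵀθ - y_i) x_i decomposes into per-record contributions.
rng = np.random.default_rng(0)
data = sc.parallelize([(rng.standard_normal(5), rng.standard_normal())
                       for _ in range(1000)], numSlices=8)

theta = np.ones(5) / 5                           # start inside the simplex D
for k in range(50):
    theta_bc = sc.broadcast(theta)               # ship θ_k to all workers
    grad = data.map(lambda r: (r[0] @ theta_bc.value - r[1]) * r[0]) \
               .reduce(lambda a, b: a + b)       # map-reduce gradient
    s = np.zeros(5)
    s[np.argmin(grad)] = 1.0                     # linear oracle on the driver
    gamma = 2.0 / (k + 2.0)
    theta = (1 - gamma) * theta + gamma * s      # interpolate between solutions

print(theta)
spark.stop()
```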
Proposal
- Evaluate ADMM + FW over a heterogeneous cluster architecture
- Communication, computation, & memory profiling
  - Data partitioning
  - Communication
  - Convergence
- Reads/Writes:
  - Hard disk/RAM
  - Multi-tier caches
Thank You!