
High-Performance Machine Learning for Weather Prediction

High-Performance Machine Learning for Weather Prediction Applications. Hatem Ltaief, Senior Research Scientist, Extreme Computing Research Center, King Abdullah University of Science and Technology. NVIDIA GTC, San Jose, CA, May 8-11, 2017.


  1. High-Performance Machine Learning for Weather Prediction Applications. Hatem Ltaief, Senior Research Scientist, Extreme Computing Research Center, King Abdullah University of Science and Technology. NVIDIA GTC, San Jose, CA, May 8-11, 2017. H. Ltaief 1 / 35

  2. Outline: (1) Computational Statistics for Climate/Weather Prediction Applications; (2) Dense Cholesky-based Matrix Computations; (3) Tile Low-Rank Cholesky-based Matrix Approximation; (4) KBLAS; (5) What’s Next?

  3. Computational Statistics for Climate/Weather Prediction Applications: Outline (section 1 of 5).

  4. Computational Statistics for Climate/Weather Prediction Applications. Applications from climate and weather science often deal with a very large number of measurements, regularly or irregularly located across a geographical region. In geospatial statistics, these data are usually modeled as a realization of a Gaussian spatial random field. This translates into evaluating the log-likelihood function, which involves a large dense (but data-sparse) covariance matrix.
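
For reference, the log-likelihood mentioned on this slide is usually written as below for a zero-mean Gaussian field. The notation is assumed here, not taken from the slides: Z is the vector of n measurements and Σ(θ) is the covariance matrix built from a parameter vector θ.

```latex
\ell(\theta) = -\frac{n}{2}\log(2\pi)
               - \frac{1}{2}\log\lvert\Sigma(\theta)\rvert
               - \frac{1}{2}\, Z^{T}\, \Sigma(\theta)^{-1} Z
% With a Cholesky factorization \Sigma(\theta) = L L^{T}:
\log\lvert\Sigma(\theta)\rvert = 2\sum_{i=1}^{n}\log L_{ii},
\qquad
Z^{T}\Sigma(\theta)^{-1}Z = \lVert L^{-1}Z \rVert_2^{2}
```

Both expensive terms reduce to one Cholesky factorization and one triangular solve, which is presumably why the rest of the deck concentrates on Cholesky.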

  5. Computational Statistics for Climate/Weather Prediction Applications: Geospatial Statistics: Learning using Cholesky. Multivariate large spatial data sets in climate/weather modeling. Figure: Climate/weather model, with panels (a) Problem Definition and (b) Soil moisture.

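  6. Computational Statistics for Climate/Weather Prediction Applications: Geospatial Statistics: Prediction using Schur Complement.

     \[
     \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim N_{m+n}\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)
     \]
     \[
     Z_1 \mid Z_2 \sim N_m\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (Z_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)
     \]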
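
The conditional distribution on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function names (`cholesky`, `chol_solve`, `conditional_gaussian`) and the tiny test data are hypothetical, and a production code would call LAPACK-style routines instead of these loops.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a symmetric positive-definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def conditional_gaussian(mu1, mu2, s11, s12, s22, z2):
    """Mean and covariance of Z1 | Z2 = z2 for a jointly Gaussian (Z1, Z2)."""
    L = cholesky(s22)
    m, n = len(mu1), len(mu2)
    # w = Sigma22^{-1} (z2 - mu2)
    w = chol_solve(L, [z2[j] - mu2[j] for j in range(n)])
    mean = [mu1[i] + sum(s12[i][j] * w[j] for j in range(n)) for i in range(m)]
    # Column c of Sigma21 is row c of Sigma12, so x_cols[c] = Sigma22^{-1} Sigma21[:, c].
    x_cols = [chol_solve(L, s12[c]) for c in range(m)]
    # Schur complement: Sigma11 - Sigma12 Sigma22^{-1} Sigma21
    cov = [[s11[i][c] - sum(s12[i][j] * x_cols[c][j] for j in range(n))
            for c in range(m)] for i in range(m)]
    return mean, cov

# Hypothetical data: one unknown location, two observed locations.
mean, cov = conditional_gaussian(
    mu1=[0.0], mu2=[0.0, 0.0],
    s11=[[1.0]], s12=[[0.5, 0.25]], s22=[[1.0, 0.5], [0.5, 1.0]],
    z2=[1.0, 0.0],
)
print(mean, cov)
```

For this toy input the predicted mean is about 0.5 and the conditional variance about 0.75. Σ22 is factored once and reused for both the mean and the covariance.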

  7. Dense Cholesky-based Matrix Computations: Outline (section 2 of 5).

  8. Dense Cholesky-based Matrix Computations: Matrix Form. The Cholesky factorization of an N × N real symmetric, positive-definite matrix A has the form A = L L^T, where L is an N × N real lower triangular matrix with positive diagonal elements.
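
As a small illustration of this definition, the sketch below factors a matrix with an unblocked right-looking Cholesky and uses the factor to evaluate a zero-mean Gaussian log-likelihood. It is plain Python for readability. The names `cholesky_right_looking` and `gaussian_loglik` are hypothetical, and real runs would use a tuned routine such as DPOTRF.

```python
import math

def cholesky_right_looking(a):
    """Factor a symmetric positive-definite matrix as a = L L^T.

    Works on a copy. Each step finalizes one column (the panel) and then
    subtracts its outer product from the trailing submatrix (the update).
    """
    n = len(a)
    w = [row[:] for row in a]
    for k in range(n):
        if w[k][k] <= 0.0:
            raise ValueError("matrix is not positive-definite")
        w[k][k] = math.sqrt(w[k][k])
        for i in range(k + 1, n):        # panel: scale column k
            w[i][k] /= w[k][k]
        for j in range(k + 1, n):        # update: trailing lower triangle
            for i in range(j, n):
                w[i][j] -= w[i][k] * w[j][k]
    # Zero the strictly upper triangle so the result is exactly L.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]

def gaussian_loglik(sigma, z):
    """Log-likelihood of a zero-mean Gaussian vector z with covariance sigma."""
    n = len(z)
    L = cholesky_right_looking(sigma)
    # Forward substitution: solve L y = z, so z^T sigma^{-1} z = y^T y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (z[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in y)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

L = cholesky_right_looking([[4.0, 2.0], [2.0, 3.0]])
print(L)
```

The panel and update loops are the unblocked counterpart of the PANEL and UPDATE regions in block algorithms.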

  9. Dense Cholesky-based Matrix Computations: LAPACK DPOTRF. Figure: Block Algorithms, with panels (a) First step, (b) Second step, (c) Third step; regions are labeled PANEL, UPDATE, and FINAL.

  10. Dense Cholesky-based Matrix Computations: PLASMA/CHAMELEON DPOTRF. Figure: Tile Algorithms.

  11. Dense Cholesky-based Matrix Computations: Matérn Kernel: θ1. [Plot: θ1 (axis 0.6 to 1.6) versus matrix size (20000 to 80000).]

  12. Dense Cholesky-based Matrix Computations: Matérn Kernel: θ2. [Plot: θ2 (axis 0.05 to 0.20) versus matrix size (20000 to 80000).]

  13. Dense Cholesky-based Matrix Computations: Matérn Kernel: θ3. [Plot: θ3 (axis 0.46 to 0.54) versus matrix size (20000 to 80000).]

  14. Dense Cholesky-based Matrix Computations: Maximum Likelihood Performance on 32 HSW cores + 8 K80 GPUs w/ StarPU. [Plot: Gflop/s (axis 1000 to 5500) versus matrix size (25000 to 100000); series: Haswell+8K80 Chameleon-Likelihood perf.]

  15. Dense Cholesky-based Matrix Computations: Maximum Likelihood Performance on 20 BDW cores + 8 P100 GPUs w/ StarPU (DGX-1). [Plot: time in seconds (axis 0.1 to 10000, ticks at powers of ten) versus matrix size (10000 to 100000).]

  16. Dense Cholesky-based Matrix Computations: Real Datasets w/ Mississippi Basin (Soil moisture).

  17. Dense Cholesky-based Matrix Computations: Covariance Matrix Problems. Ubiquitous in computational science and engineering; symmetric, positive-definite matrix structure; (apparently) dense matrices; often data-sparse: decay of parameter correlations with distance, hierarchically of low rank.

  18. Tile Low-Rank Cholesky-based Matrix Approximation: Outline (section 3 of 5).

  19. Tile Low-Rank Cholesky-based Matrix Approximation: Matrix Rank X-ray: Hierarchically Low Rank. [Figure: map of per-tile ranks on a 20 × 20 tile grid (axis ticks at 0, 5, 10, 15). Diagonal tiles are labeled with rank 500; off-diagonal tiles are labeled with ranks between 27 and 172.]

  20. Tile Low-Rank Cholesky-based Matrix Approximation: Dense Linear Algebra Renaissance.

  21. Tile Low-Rank Cholesky-based Matrix Approximation: HiCMA DPOTRF. The low-rank tile Cholesky algorithm can be expressed with the following four computational kernels:
      HCORE_DPOTRF: The kernel performs the Cholesky factorization of a diagonal (lower triangular) tile. It is similar to DPOTRF since the diagonal tiles are dense.
      HCORE_DTRSM: The operation applies an update to an off-diagonal low-rank tile of the input matrix, resulting from the factorization of the diagonal tile above it, and overwrites it with the final elements of the output matrix: V(i,k) = V(i,k) × D^{-1}(k,k). The operation is a triangular solve.
      HCORE_DSYRK: The kernel applies updates to a diagonal (lower triangular) tile of the input matrix, resulting from the factorization of the low-rank tiles to the left of it: D(j,j) = D(j,j) − (U(j,k) × V^T(j,k)) × (U(j,k) × V^T(j,k))^T. The operation is a symmetric rank-k update.
      HCORE_DGEMM: The operation applies updates to an off-diagonal low-rank tile of the input matrix, resulting from the factorization of the low-rank tiles to the left of it. The operation involves two QR factorizations, one reduced SVD (depending on the rank and/or the accuracy parameter) and two matrix-matrix multiplications.
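
To make the symmetric rank-k update concrete, here is a minimal plain-Python sketch of D = D − (U V^T)(U V^T)^T that never forms the full tile U V^T. It is an illustration under assumptions, not HiCMA code: `matmul`, `transpose`, and `lowrank_syrk` are hypothetical helpers, the factors are stored as lists of rows, and the whole tile is updated, not just its lower triangle.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def lowrank_syrk(d, u, v):
    """Return D - (U V^T)(U V^T)^T for nb x r factors u and v.

    The product is regrouped as U (V^T V) U^T, so the intermediate
    results are r x r and nb x r; only the last step is nb x nb.
    """
    vtv = matmul(transpose(v), v)        # r x r
    w = matmul(u, vtv)                   # nb x r
    upd = matmul(w, transpose(u))        # nb x nb
    n = len(d)
    return [[d[i][j] - upd[i][j] for j in range(n)] for i in range(n)]

# Hypothetical 3 x 3 diagonal tile and a rank-1 off-diagonal tile.
u = [[1.0], [2.0], [3.0]]
v = [[1.0], [0.0], [1.0]]
d = [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]]
out = lowrank_syrk(d, u, v)
print(out)
```

When the rank r is much smaller than the tile size nb, the regrouped product costs on the order of nb·r² + nb²·r operations. Forming U V^T and squaring it costs on the order of nb³.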

  22. Tile Low-Rank Cholesky-based Matrix Approximation: HiCMA DPOTRF.

  23. Tile Low-Rank Cholesky-based Matrix Approximation: Tile Low Rank Cholesky: Memory Footprint. Akbudak et al., accepted at ISC17.

  24. Tile Low-Rank Cholesky-based Matrix Approximation: Dense Linear Algebra Renaissance.

  25. Tile Low-Rank Cholesky-based Matrix Approximation: KAUST BLAS. Poster@GTC17 P7223 (Ali Charara).

  26. KBLAS: Outline (section 4 of 5).

  27. KBLAS: Advanced Batched BLAS Operations: HBLAS. Context: very small sizes! Batch operation executions at each level of the tree; currently fixed sizes (need to handle variable sizes); recursive formulation, stressing register usage; convert into batch of large GEMMs; minimize data transfer; enhance data locality; increase arithmetic intensity; state-of-the-art implementations not well optimized for this scope or not supported.

  28. KBLAS: Advanced Batched BLAS Operations: HBLAS. HBLAS matrix computations: Level 3 BLAS (SYRK, TRMM, TRSM); factorizations (POTRF); solves (POTRS, POSV, POTRI, POTI). HBLAS matrix compression: batch QR factorizations; batch SVD.

  29. KBLAS: Advanced Batched BLAS Operations: HBLAS. Batches of Batched: Rec. Batch DPOTRF, Rec. Batch DTRSM, Rec. Batch DSYRK, Rec. Batch DPOTRF. Profiling shows that 76% of the time is spent in batch DGEMM (MAGMABLAS).
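
The sequence POTRF, TRSM, SYRK, POTRF on this slide reads like the usual recursive formulation of Cholesky. The sketch below shows that recursion for a single small matrix in plain Python. It is an assumption-laden illustration: it is not batched, does not run on a GPU, and the name `rec_potrf` is hypothetical, not a KBLAS routine.

```python
import math

def rec_potrf(a, lo, hi):
    """In-place recursive Cholesky of the diagonal block a[lo:hi][lo:hi].

    On return the lower triangle of the block holds L; the strictly
    upper triangle is left untouched.
    """
    n = hi - lo
    if n == 1:
        a[lo][lo] = math.sqrt(a[lo][lo])
        return
    mid = lo + n // 2
    rec_potrf(a, lo, mid)                      # POTRF on A11
    # TRSM: A21 <- A21 * L11^{-T}, one row of A21 at a time.
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]
    # SYRK: A22 <- A22 - L21 L21^T (lower triangle only).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))
    rec_potrf(a, mid, hi)                      # POTRF on the updated A22

# Hypothetical 4 x 4 symmetric positive-definite test matrix.
a = [[4.0, 2.0, 0.0, 2.0],
     [2.0, 10.0, 3.0, 1.0],
     [0.0, 3.0, 2.0, 2.0],
     [2.0, 1.0, 2.0, 9.0]]
rec_potrf(a, 0, len(a))
lower = [[a[i][j] if j <= i else 0.0 for j in range(4)] for i in range(4)]
print(lower)
```

In this formulation most of the arithmetic lands in the TRSM and SYRK steps, which are themselves rich in matrix-matrix products. That is consistent with the slide's observation that batch DGEMM dominates the profile.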

  30. KBLAS: Performance Results: Batched Level 3 BLAS on NVIDIA K40 GPUs.
