A Parallel Method for the Computation of Matrix Exponential based on - PowerPoint PPT Presentation

A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series V. S. Dimitrov 12 , V. Ariyarathna 3 , D. F. G. Coelho 1 , L. Rakai 1 , A. Madanayake 3 , R. J. Cintra 4 1 ECE Department, University of Calgary, Canada 2 Computer Modelling Group, Canada 3 ECE Department, University of Akron, USA 4 Statistics Department, Universidade Federal de Pernambuco, Brazil July 20, 2017 Coelho and Dimitrov (UofC) July 20, 2017 1 / 21

Introduction Problems in many areas require the solution of sets of linear, constant coefficient differential equations in the form: x ( t ) = Ax ( t ) = ˙ ⇒ x ( t ) = exp ( t A ) x 0 When multiple inputs are used for the same system, it might be advantageous compute the matrix exponential. Coelho and Dimitrov (UofC) July 20, 2017 2 / 21

Methods for Matrix Exponential Series expansion: ◮ Taylor; ◮ Padé; ◮ Scaling & Squaring. Newton Interpolation; Cayley-Hamilton method; Eigenvectors decomposition. Coelho and Dimitrov (UofC) July 20, 2017 3 / 21

Matrix Exponential Series Expansion The problem can be treated as the evaluation of a polynomial. Existing methods: ◮ Horner rule; ◮ Estrin method; ◮ Binary tree. Coelho and Dimitrov (UofC) July 20, 2017 4 / 21

Conventions Let A be a square matrix of size n × n . Let p N ( · ) be a polynomial of degree N − 1 over the real numbers. Let also g N ( · ) be a geometric series with N terms. Coelho and Dimitrov (UofC) July 20, 2017 5 / 21

Definition The critical path associated with the computation of a matrix polynomial p N ( A ) is the largest chain of matrix multiplications (MM) in order to evaluate p N ( A ) . Definition (Critical Path for Matrix Polynomial) Horner rule: N − 1 MM; Estrin method: 2 log 2 ( N − 1 ) MM; Binary tree: 2 log 2 ( N − 1 ) MM. Coelho and Dimitrov (UofC) July 20, 2017 6 / 21

Geometric Series Geometric series of matrix arguments can be computed efficiently with the use of different polynomial factorizations. � ( I + A 2 ) · g N / 2 ( A 2 ) , if N ≡ 0 mod 2 g N ( A ) = I + ( A + A 2 ) · g ( N − 1 ) / 2 ( A 2 ) , if N ≡ 1 mod 2 .  ( I + A + A 2 ) · g N / 3 ( A 3 ) , if N ≡ 0 mod 3 ,  I + ( A + A 2 + A 3 ) · g ( N − 1 ) / 3 ( A 3 ) , g N ( A ) = if N ≡ 1 mod 3 , I + A + ( A 2 + A 3 + A 4 ) · g ( N − 2 ) / 3 ( A 3 ) ,  if N ≡ 2 mod 3 . In general, the use of basis P demands P ′ log 2 ( N ) − 2 < 2 log 2 ( N ) − 2. Coelho and Dimitrov (UofC) July 20, 2017 7 / 21

Geometric Series In general, the use of basis P demands P ′ log 2 ( N ) − 2 < 2 log 2 ( N ) − 2. Examples: Basis 2: 2 log 2 ( N ) − 2; Basis 3: 1 . 89 . . . log 2 ( N ) − 2; Basis 5: 1 . 72 . . . log 2 ( N ) − 2; Basis 6: 1 . 92 . . . log 2 ( N ) − 2; Basis 26: 1 . 70 . . . log 2 ( N ) − 2. Coelho and Dimitrov (UofC) July 20, 2017 8 / 21

The Matrix Exponential as Several Neumann Series We write the matrix exponential truncated series expansion p N ( A ) as a linear combination of different geometric series on α k A , k = 0 , 1 , . . . , N − 1: N − 1 � p N ( A ) = g n + 1 ( α n A ) n = 0 N − 1 � n − 1 � � � α k n A k = n = 0 k = 0 N − 1 � N − 1 � � � α n · A n . = k n = 0 k = n Coelho and Dimitrov (UofC) July 20, 2017 9 / 21

The Matrix Exponential as Several Neumann Series If the coefficients of p N ( · ) are p 0 , p 1 , . . . , p N − 1 , we have he system α 0 + α 1 + α 2 + . . . + α N − 1 = p 0 α 2 1 + α 2 2 + . . . + α 2 N − 1 = p 1 α 3 2 + . . . + a 3 N − 1 = p 2 . . . α N N − 1 = p N − 1 . This system has several complex solutions that can be found by back substitution. Coelho and Dimitrov (UofC) July 20, 2017 10 / 21

A Numerical Example Small degree polynomials does not require complex solutions. Considering N = 4, we have α 0 = 0 . 451801 α 1 = 0 . 420627 α 2 = 0 . 344837 α 3 = − 0 . 217308 Coelho and Dimitrov (UofC) July 20, 2017 11 / 21

Another Numerical Example Table: Calculated coefficients for N = 9. Coefficient Value 0 . 265650069930254 α 8 0 . 270164634258582 α 7 α 6 0 . 294213807881822 α 5 0 . 320211897615876 0 . 339931939777992 α 4 0 . 312847569943447 α 3 − 0 . 803019919407973 β 1 β 0 − 0 . 046083639081852 Coelho and Dimitrov (UofC) July 20, 2017 12 / 21

A Different Approach If we modify the formulation to N − 1 N − 1 � N − 1 � � � � α n · A n . p N ( A ) = g N ( α n A ) = k n = 0 n = 0 k = 0 we obtain α 0 + α 1 + α 2 + . . . + α N − 1 = 1 N − 1 = 1 α 2 0 + α 2 1 + α 2 2 + . . . + α 2 2 . . . 1 α N − 1 + α N − 1 + α N − 1 + . . . + α N − 1 N − 1 = 0 1 2 ( N − 1 )! Coelho and Dimitrov (UofC) July 20, 2017 13 / 21

Algorithmic Example for N = 9 Pre Computation: B = A 2 , C = B 2 , broadcast A , B , C , β 0 , and β 1 Processor 0 computes H 9 ( A ) ← β 0 A + β 1 B − ( N − 4 ) I Processor 1 computes g 4 ( α 3 A ) ← ( I + α 3 A )( I + α 2 3 B ) Processor 2 computes g 5 ( α 4 A ) ← I + ( α 4 A + α 2 4 B )( I + α 2 4 B ) Processor 3 computes g 6 ( α 5 A ) ← ( I + α 5 A )( I + α 2 5 B + α 4 5 C ) Processor 4 computes g 7 ( α 6 A ) ← I + ( α 6 A + α 2 6 B )( I + α 2 6 B + α 4 6 C ) Processor 5 computes g 8 ( α 7 A ) ← ( I + α 7 A )( I + α 2 7 B )( I + α 4 7 C ) Processor 6 computes g 9 ( α 8 A ) ← I + ( α 8 A + α 2 8 B )( I + α 2 8 B )( I + α 4 8 C ) Return E 9 ( A ) = � 9 n = 3 g n + 1 ( α n A ) + H 9 ( A ) Figure: Fragment of the algorithm for computing E 9 ( A ) . Coelho and Dimitrov (UofC) July 20, 2017 14 / 21

Computing Time Trade-Off in Software Error × 10 -4 m =10 Time 10 0 2 expm time 10 -5 1 10 -10 0 × 10 -3 m =100 10 0 2 Time (s) Error 10 -5 1 10 -10 0 m =1000 10 0 0.4 10 -5 0.2 10 -10 0 1 2 3 4 5 6 7 8 9 N Figure: Illustration of the accuracy versus computing time trade-off for different values of N and m . Coelho and Dimitrov (UofC) July 20, 2017 15 / 21

Hardware Realization 4 H 9 ( A ) 4 4 S/P G 4 ( α 3 A ) a 12 a 11 4 Re−arrange 4 G 5 ( α 4 A ) A · A E 9 ( A ) a 22 a 21 Addition block 4 G 6 ( α 5 A ) 2 t = 1 t = 0 P/S 4 G 7 ( α 6 A ) 4 4 G 8 ( α 7 A ) A 2 · A 2 4 G 9 ( α 8 A ) Figure: Top level view of the implementation of the proposed algorithm. Coelho and Dimitrov (UofC) July 20, 2017 16 / 21

Hardware Realization v 22 v 12 v 21 v 11 D D u 12 u 11 w 11 w 12 u 21 u 22 D D t = 1 t = 0 w 21 w 22 Figure: Multiplication block for 2 × 2 matrices. Coelho and Dimitrov (UofC) July 20, 2017 17 / 21

Hardware Realization Results: FPGA Table: Timing and resource consumption comparison for Xilinx xc6vlx240t-ff1156 FPGA Figure of merit Horner’s Rule New Algorithm Latency 16 6 (clock cycles) Critical path 14.392 11.160 delay (ns) Slice LUTs used 24537 90993 No. of adders 48 118 No. of multipliers 32 112 Coelho and Dimitrov (UofC) July 20, 2017 18 / 21

Hardware Realization Results: ASIC Table: ASIC synthesis results Horner’s New Percentage Figure of merit Rule Algorithm Change T (ns) 3.556 1.350 ↓ 62.04 % Occupied area 1.355 4.080 ↑ 201.10% (A, mm 2 ) Dynamic power 3199.088 2403.661 ↓ 24.86% ( mW / GHz ) AT ( mm 2 · ns ) 4.8175 5.5083 ↑ 14.34% AT 2 ( mm 2 · ns 2 ) 17.1313 7.4363 ↓ 56.59% Max frequency 0.2812 0.7407 ↑ 163.40% (GHz) Latency 16 6 ↓ 62.5 % (Clock cycles) Total gate count 61009 131874 ↑ 116.15% Coelho and Dimitrov (UofC) July 20, 2017 19 / 21

Final Comments Advantages: ◮ the proposed method reduce critical path; Disadvantages: ◮ requires more processors and memory (software); ◮ requires more hardware resources such as LUT and gates (hardware). Future works: ◮ consider different combinations of Neumann series for different solutions (real solution possible?); ◮ consider more matrix functions and general polynomials; ◮ provide accurate error analysis. Coelho and Dimitrov (UofC) July 20, 2017 20 / 21

Questions? Coelho and Dimitrov (UofC) July 20, 2017 21 / 21

A Parallel Method for the Computation of Matrix Exponential based on - PowerPoint PPT Presentation

A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series V. S. Dimitrov 12 , V. Ariyarathna 3 , D. F. G. Coelho 1 , L. Rakai 1 , A. Madanayake 3 , R. J. Cintra 4 1 ECE Department, University of Calgary, Canada 2

Parallel Scientific Computing Matrix-vector multiplication. Matrix-matrix multiplication.

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

Column-Based Matrix Partitioning for Parallel Matrix Multiplication on Heterogeneous Processors

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Models of Parallel Computation Mark Greenstreet CpSc 418 Oct. 10, 2013 The RAM Model of

CSL 860: Modern Parallel Computation Computation Hello OpenMP #pragma omp parallel { // I am

Building an IoT Platform with Matrix matthew@matrix.org http://www.matrix.org What is Matrix?

Introductory Matrix Operations Matrix Entries Defn. For matrix A , notation a ij means the en-

Gov 2000: 10. Multiple Regression in Matrix Form Matthew Blackwell Fall 2016 1 / 64 1. Matrix

Liberating Communication with Matrix matthew@matrix.org http://www.matrix.org What is Matrix?

Exploiting Matrix Reuse and Data Locality in Sparse Matrix-Vector and Matrix-Transpose-Vector

CS 140 : Matrix multiplication Warmup: Matrix times vector: communication volume Matrix

Parallel Linear Algebra Our goals: Fast and efficient parallel algorithms for the matrix-vector

Formal Definition of Computation Formal Definition of Computation p.1/28 Computation

Massively Parallel Computation Philip Bille Sequential Computation Computation. Read and

Quillen metrics on modular curves Mathieu Dutour Institut de Mathmatiques de Jussieu - Paris

PostgreSQL Replication Christophe Pettus PostgreSQL Experts PerconaLive, April 25, 2018

Projected Spatial Rich Features on a GPU Andrew Ker adk @ cs.ox.ac.uk Department of Computer

Getting precise about precision Kiran S. Kedlaya ( kedlaya@mit.edu ) Department of Mathematics,

Martingale Difference Central Limit Theorem Yichen Zhou May 9, 2016 Intuition Why martingale

Tr I nc: Small Trusted Hardware for Large Distributed Systems Dave Levin University of Maryland

Distance Sampling Simulations Overview Why simulate? How it works Automated survey

Vectorized Bloom Filters for Advanced SIMD Processors Orestis Polychroniou Kenneth A. Ross