VLSI Programming: Systolic Design. Book: Parhi, Chp. 7. Rudolf Mak, r.h.mak@tue.nl. 18-May-16 Rudolf Mak TU/e Computer Science Systolic 1
Agenda • Systolic arrays (what, where) • Regular Iterative Algorithms (RIAs) • Dependence graphs (regular, reduced) • Systolic design techniques – Binding (computations to PEs) – Scheduling (computations to time slots) • Examples – FIR filters, matrix multipliers
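FSM reminder [Figure: a Moore machine and a Mealy machine, each consisting of combinational logic (CL) and a state register; the Moore output depends on the state only, the Mealy output on the state and the current input.] Chaining Mealy machines may lead to long critical paths!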
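A rough way to see why chaining hurts is to count delays along the longest combinational path. The sketch below assumes a simplified timing model that is not from the slides: every machine has an output-logic delay `d_out` and a next-state-logic delay `d_ns`, and registers are ideal. In a Mealy chain an input can ripple combinationally through the output logic of every stage. In a Moore chain each path crosses at most one output block and one next-state block. The function names are hypothetical.

```python
def mealy_chain_critical_path(n_stages: int, d_out: int, d_ns: int) -> int:
    """Longest combinational path through n chained Mealy machines (simplified model)."""
    if n_stages < 1:
        raise ValueError("need at least one stage")
    # A Mealy output depends combinationally on its input, so a signal can pass
    # through the output logic of all n stages, or through n-1 of them and then
    # into the next-state logic of the last stage.
    return max(n_stages * d_out, (n_stages - 1) * d_out + d_ns)


def moore_chain_critical_path(n_stages: int, d_out: int, d_ns: int) -> int:
    """Longest combinational path through n chained Moore machines (simplified model)."""
    if n_stages < 1:
        raise ValueError("need at least one stage")
    if n_stages == 1:
        # input -> next-state logic, or state register -> output logic
        return max(d_out, d_ns)
    # A Moore output depends on the state register only:
    # register -> output logic -> next machine's next-state logic -> register,
    # which does not grow with the number of stages.
    return d_out + d_ns
```

With `d_out = 2` and `d_ns = 3`, five chained Mealy machines give a path of 11 time units, while five Moore machines stay at 5 regardless of chain length.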
Systolic system (Leiserson) A systolic system is a set of interconnected Moore machines that operate synchronously and satisfy certain smallness (boundedness) conditions: 1. # states is bounded 2. # input ports is bounded 3. # output ports is bounded 4. # neighbor machines is bounded (“#” stands for “number of”)
Systolic = Uniform Pipelined SDF • Uniform: – Each PE (Moore machine) computes the same set of combinational functions. • Regular: – All PEs are connected to a small, finite number of neighboring PEs via one or more D-elements (delays), according to a regular topology. All connections are point-to-point. • Synchronous operation: – All PEs operate in lock step (fire concurrently); data is pumped through the system, much like the heart pumps blood through the body (hence the name systolic).
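Lock-step operation can be made concrete with a minimal sketch. It assumes a linear array of identical Moore PEs, each holding one integer state. The class and function names are hypothetical. All outputs are computed from the current states before any state is updated, which is what lets every PE fire concurrently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MoorePE:
    """One processing element: a Moore machine with a single state register."""
    state: int
    output: Callable[[int], int]           # output depends on the state only
    next_state: Callable[[int, int], int]  # (state, input) -> new state


def tick(array: List[MoorePE], left_input: int) -> int:
    """Advance a linear array by one clock cycle and return the rightmost output."""
    # Phase 1: every PE presents its output, computed from its current state.
    outputs = [pe.output(pe.state) for pe in array]
    # Phase 2: all PEs update in lock step; PE k consumes the output of PE k-1.
    inputs = [left_input] + outputs[:-1]
    for pe, value in zip(array, inputs):
        pe.state = pe.next_state(pe.state, value)
    return outputs[-1]


def make_uniform_array(n: int) -> List[MoorePE]:
    """Uniform array: every PE uses the same functions (here: latch the input)."""
    return [MoorePE(0, lambda s: s, lambda s, x: x) for _ in range(n)]
```

For example, `arr = make_uniform_array(3)` followed by `[tick(arr, v) for v in [1, 2, 3, 4, 5, 6]]` yields `[0, 0, 0, 1, 2, 3]`: each value is pumped one PE further per clock cycle.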
Relaxations • To obtain better systems, small relaxations of the systolic model are allowed: 1. Not all PEs need to be identical; small deviations are allowed, especially for PEs at the border of the system. 2. A limited form of broadcasting is allowed. This means that the PEs become Mealy machines. – Leiserson calls such systems semi-systolic. – Parhi does not make this distinction; instead, he uses the term fully pipelined for the Moore machine variant. 3. Connections need not be to nearest neighbors, but locality must be maintained.
Systolic system [Figure: a host (a Turing-equivalent machine, such as a PowerPC on an FPGA) attached to a systolic array of PEs (Moore machines), such as a dedicated computing engine on an FPGA.]
Application areas • Computationally intensive, regular – Basic linear algebra operations – Signal processing – Image processing – Order statistics, sorting – Dynamic programming – High-performance computing • e.g., many-particle simulations (in chemistry, physics or astronomy)
FIR filter (N-tap)
Spec: $y(i) = \sum_{j=0}^{N-1} h(j)\,x(i-j)$, $0 \le i$ (with $x(i) = 0$ for $i < 0$).
Partial sums: $y(i,j) = \sum_{k=j}^{N-1} h(k)\,x(i+j-k)$, $0 \le j < N$, $0 \le i$, so that $y(i) = y(i,0)$.
RIA: a recurrence that expresses $y(i,j)$ via $y(i-1,j-1)$ does not work!!! Peeling off the term $k = j$ does: $y(i,N) = 0$, $y(-1,j) = 0$, $y(i,j) = h(j)\,x(i) + y(i-1,j+1)$.
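The spec and the partial-sum recurrence can be cross-checked numerically. This sketch assumes our reading of the slide: partial sums $y(i,j) = \sum_{k=j}^{N-1} h(k)\,x(i+j-k)$, the recurrence $y(i,j) = h(j)\,x(i) + y(i-1,j+1)$ with boundary values $y(i,N) = y(-1,j) = 0$, and output $y(i) = y(i,0)$. The function names are hypothetical.

```python
from typing import List, Sequence


def fir_spec(h: Sequence[int], x: Sequence[int]) -> List[int]:
    """Direct N-tap FIR: y(i) = sum_j h(j) * x(i - j), with x(i) = 0 for i < 0."""
    return [sum(h[j] * x[i - j] for j in range(len(h)) if i - j >= 0)
            for i in range(len(x))]


def fir_partial_sums(h: Sequence[int], x: Sequence[int]) -> List[int]:
    """Same filter via y(i, j) = h(j) * x(i) + y(i - 1, j + 1); output y(i) = y(i, 0)."""
    n_taps = len(h)
    prev = [0] * (n_taps + 1)        # row i - 1; initially y(-1, j) = 0
    result = []
    for i in range(len(x)):
        cur = [0] * (n_taps + 1)     # cur[n_taps] is the boundary y(i, N) = 0
        for j in range(n_taps):
            cur[j] = h[j] * x[i] + prev[j + 1]
        result.append(cur[0])        # y(i) = y(i, 0)
        prev = cur
    return result
```

Feeding an impulse `x = [1, 0, 0, 0, 0]` through `h = [1, 2, 3]` returns the coefficients `[1, 2, 3, 0, 0]` from both functions.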
Regular Iterative Algorithm
A RIA is a triple consisting of:
1. An index space, e.g., $\{(i,j) \mid 0 \le i,\ 0 \le j < N\}$
2. A finite set of variables, e.g., $\{x, h, y\}$ ($x(i)$ and $h(j)$ are inputs)
3. A set of direct dependencies among indexed variables (given as equalities) • with associated index displacement vectors • also called fundamental edges by Parhi
Canonical forms: 1. Standard input 2. Standard output
FIR filter: RIA description
Standard output canonical form:
$y(i,j) = h(i,j)\,x(i,j) + y(i-1,j+1)$, $y(-1,j) = 0$, $y(i,N) = 0$
$h(i,j) = h(i-1,j)$, $h(-1,j) = h(j)$
$x(i,j) = x(i,j-1)$, $x(i,-1) = x(i)$
Output: $y(i) = y(i,0)$
Index displacement vectors (LHS = RHS + IDV): $x \to x$: $(0,1)$; $h \to h$: $(1,0)$; $h \to y$: $(0,0)$; $x \to y$: $(0,0)$; $y \to y$: $(1,-1)$; $y(i,0) \to$ output $y(i)$ (placed at $(i,-1)$): $(0,-1)$
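The index displacement vectors become concrete in a small evaluator for the FIR RIA in standard output form. It assumes the equations as we read them from the slide: $h(i,j) = h(i-1,j)$ with $h(-1,j) = h(j)$, $x(i,j) = x(i,j-1)$ with $x(i,-1) = x(i)$, and $y(i,j) = h(i,j)\,x(i,j) + y(i-1,j+1)$ with zero boundary values. Every right-hand-side index is computed as LHS index minus IDV. The names `IDV`, `rhs` and `eval_fir_ria` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Index = Tuple[int, int]

# Index displacement vectors: LHS index = RHS index + IDV.
IDV: Dict[Tuple[str, str], Index] = {
    ("x", "x"): (0, 1),
    ("h", "h"): (1, 0),
    ("h", "y"): (0, 0),
    ("x", "y"): (0, 0),
    ("y", "y"): (1, -1),
}


def rhs(lhs: Index, src: str, dst: str) -> Index:
    """Index of the src value that is used when computing dst at index lhs."""
    di, dj = IDV[(src, dst)]
    return (lhs[0] - di, lhs[1] - dj)


def eval_fir_ria(h: Sequence[int], x: Sequence[int]) -> List[int]:
    n_taps, n_out = len(h), len(x)
    H: Dict[Index, int] = {(-1, j): h[j] for j in range(n_taps)}   # h enters at i = -1
    X: Dict[Index, int] = {(i, -1): x[i] for i in range(n_out)}    # x enters at j = -1
    Y: Dict[Index, int] = {(-1, j): 0 for j in range(n_taps + 1)}  # y(-1, j) = 0
    Y.update({(i, n_taps): 0 for i in range(n_out)})               # y(i, N) = 0
    # Row by row (i ascending), j ascending within a row: every RHS is already known.
    for i in range(n_out):
        for j in range(n_taps):
            p = (i, j)
            H[p] = H[rhs(p, "h", "h")]
            X[p] = X[rhs(p, "x", "x")]
            Y[p] = H[rhs(p, "h", "y")] * X[rhs(p, "x", "y")] + Y[rhs(p, "y", "y")]
    return [Y[(i, 0)] for i in range(n_out)]  # outputs y(i) = y(i, 0)
```

Driving all index arithmetic through the IDV table makes it easy to experiment with other dependency choices: changing one vector changes where every value is fetched from.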
Computational node [Figure: a computational node g at index point (k, l), with the indexed variables it consumes and produces attached to it.]
Computational node from RIA [Figure: node g at index point $I(g) = (i,j)$, with inputs $y(i-1,j+1)$, $h(i-1,j)$, $x(i,j-1)$ and outputs $y(i,j)$, $h(i,j)$, $x(i,j)$.] $I(g)$ is the index vector, i.e., the sequence of coordinates of g in index space. $y(i,j) = h(i,j)\,x(i,j) + y(i-1,j+1)$
Dependence graphs 1. The nodes of a dependence graph represent (small) computations. There is a separate node for each computation. 2. The edges of a dependence graph represent causal dependencies between computations, i.e., an edge from node u to node v indicates that the result of the computation performed by u is used in the computation performed by v. 3. There is no notion of time in a dependence graph. It is an (index-)space representation.
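Because a dependence graph carries no time, any topological order of its nodes is a legal execution order. The sketch below builds the FIR graph for nodes $(i,j)$ and checks candidate orders. It assumes the three edge kinds we read off the RIA: the predecessor of $(i,j)$ is $(i-1,j)$ for $h$, $(i,j-1)$ for $x$, and $(i-1,j+1)$ for $y$. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The helper names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def fir_dependence_graph(n_outputs: int, n_taps: int) -> Dict[Node, Set[Node]]:
    """Map each node (i, j) to the set of nodes whose results it uses."""
    nodes = {(i, j) for i in range(n_outputs) for j in range(n_taps)}
    graph: Dict[Node, Set[Node]] = {}
    for i, j in nodes:
        # h from (i-1, j), x from (i, j-1), partial sum y from (i-1, j+1)
        candidates = [(i - 1, j), (i, j - 1), (i - 1, j + 1)]
        graph[(i, j)] = {c for c in candidates if c in nodes}
    return graph


def is_valid_order(order: List[Node], graph: Dict[Node, Set[Node]]) -> bool:
    """True if every node comes after all nodes it depends on."""
    position = {node: k for k, node in enumerate(order)}
    return all(position[u] < position[v] for v, preds in graph.items() for u in preds)
```

With `g = fir_dependence_graph(5, 3)`, the order from `TopologicalSorter(g).static_order()` and the row-major order `sorted(g)` are both valid. The column-major order is not, because node (1, 0) needs the partial sum produced by node (0, 1).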
FIR: Dependence graph $y(i) = h(0)\,x(i) + h(1)\,x(i-1) + h(2)\,x(i-2)$ [Figure: dependence graph for N = 3 with inputs x(0) to x(4) along the top, one row of nodes for each coefficient h(2), h(1), h(0), and outputs y(0) to y(4) along the bottom.]
FIR: Dependence graph $y(i) = h(0)\,x(i) + h(1)\,x(i-1) + h(2)\,x(i-2)$ [Figure: the same dependence graph, now with the boundary values 0 (initial partial sums) entering along the top and the left side.]
Regular dependence graphs A dependence graph G is regular when: 1. There is an injective mapping I from the nodes of G to a grid of points in the n-dimensional index space. 2. There exists a finite set E of vectors, called fundamental edges, such that every pair (u, v) of neighboring nodes (an edge from u to v) is mapped to a pair of grid locations that differ by a fundamental edge $e \in E$, i.e., $I(v) = I(u) + e$.
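Both regularity conditions can be checked directly on an explicit graph. A minimal sketch follows. The node names `g_i_j` and the FIR edge set are illustrative assumptions based on our reading of the RIA, and `is_regular` and `fir_graph` are hypothetical helpers.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Vec = Tuple[int, ...]


def is_regular(edges: Iterable[Tuple[Hashable, Hashable]],
               index: Dict[Hashable, Vec],
               fundamental: Set[Vec]) -> bool:
    """Check the two regularity conditions for a dependence graph."""
    # 1. I must be injective: distinct nodes land on distinct grid points.
    if len(set(index.values())) != len(index):
        return False
    # 2. For every edge u -> v, I(v) - I(u) must be a fundamental edge e in E.
    for u, v in edges:
        e = tuple(a - b for a, b in zip(index[v], index[u]))
        if e not in fundamental:
            return False
    return True


def fir_graph(n_outputs: int, n_taps: int):
    """Hypothetical FIR example: nodes 'g_i_j' mapped to (i, j), edges from the RIA."""
    index = {f"g_{i}_{j}": (i, j) for i in range(n_outputs) for j in range(n_taps)}
    edges: List[Tuple[str, str]] = []
    for i in range(n_outputs):
        for j in range(n_taps):
            for pi, pj in [(i - 1, j), (i, j - 1), (i - 1, j + 1)]:
                src = f"g_{pi}_{pj}"
                if src in index:          # skip predecessors outside the grid
                    edges.append((src, f"g_{i}_{j}"))
    return edges, index
```

With `edges, index = fir_graph(5, 3)`, the call `is_regular(edges, index, {(1, 0), (0, 1), (1, -1)})` succeeds. Dropping `(1, -1)` from the set makes it fail, because the partial-sum edges are no longer covered.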
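FIR: DG in space representation [Figure: the FIR dependence graph drawn on the index grid, with inputs x(0) to x(4) along the top, rows h(2), h(1), h(0), outputs y(0) to y(4) along the bottom, and the edge directions (1,0), (0,1) and (1,−1) marked.] Fundamental edges: $E = \{(1,0), (0,1), (1,-1)\}$, i.e., the columns of $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \end{pmatrix}$.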
Systolic array design The design of a systolic array for a computation given in the form of a regular dependence graph involves: 1. Choosing a processor space, i.e., a set of dimensions and a number of PEs per dimension (the array). 2. Mapping each computational node of the graph to a PE of the array (similar to folding). 3. For each PE, scheduling the computations of the nodes mapped onto it, i.e., assigning each individual computation to a distinct time slot.
Design parameters An (n − 1)-dimensional systolic design for an n-dimensional regular dependence graph is characterized by: 1. An $n \times (n-1)$ processor space matrix $P$: $P^T I(x)$ is the processor that executes node $x$. 2. An $n$-dimensional scheduling vector $s$: $s^T I(x)$ is the time slot at which node $x$ is executed. 3. A projection (iteration) vector $d$: $I(x) - I(y) = \alpha d$ implies $P^T I(x) = P^T I(y)$.
Design constraints • Computations whose grid locations differ by a multiple of the projection vector execute on the same PE: $I(x) - I(y) = \alpha d$ implies $P^T I(x) = P^T I(y)$ – hence $P^T d = 0$ (since $P^T (I(x) - I(y)) = \alpha\, P^T d$). • Computations that execute on the same PE must be scheduled in different time slots: $s^T I(x)$ is the time slot at which node $x$ is executed – hence $s^T d \neq 0$ (nodes that differ by $\alpha d$, $\alpha \neq 0$, share a PE, and their time slots differ by $\alpha\, s^T d$).
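Both constraints are easy to check mechanically. A minimal sketch, assuming integer vectors given as tuples and $P$ given by its $n-1$ columns. The function names are hypothetical. `conflicts` also searches a finite index set for two nodes that land on the same PE in the same time slot. The example below uses the FIR choices $p^T = (0,1)$, $s^T = (1,0)$, $d = (1,0)$ from this chapter's example.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vec = Tuple[int, ...]


def dot(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(u * v for u, v in zip(a, b))


def satisfies_constraints(p_columns: List[Vec], s: Vec, d: Vec) -> bool:
    """P^T d = 0 (same PE along d) and s^T d != 0 (distinct time slots)."""
    return all(dot(col, d) == 0 for col in p_columns) and dot(s, d) != 0


def conflicts(nodes: Iterable[Vec], p_columns: List[Vec], s: Vec) -> List[Tuple[Vec, Vec]]:
    """Pairs of nodes mapped to the same processor in the same time slot."""
    first_seen: Dict[Tuple[Vec, int], Vec] = {}
    clashes: List[Tuple[Vec, Vec]] = []
    for node in nodes:
        key = (tuple(dot(col, node) for col in p_columns), dot(s, node))
        if key in first_seen:
            clashes.append((first_seen[key], node))
        else:
            first_seen[key] = node
    return clashes
```

For the 5 × 3 FIR grid, the choices above pass both checks and produce no conflicts. Switching to $s^T = (0,1)$ violates $s^T d \neq 0$, and every row of nodes then collides on a single PE in a single slot.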
Processor allocation: $p^T = (0,1)$, $d = (1,0)$ [Figure: the FIR dependence graph projected along d onto a row of processors, one per coefficient h(2), h(1), h(0).] $p^T (i,j)^T = j$: node $(i,j)$ executes on processor $j$.
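The allocation also determines how data moves between PEs: a value travelling along fundamental edge $e$ moves $p^T e$ processors. The sketch below assumes our pairing of fundamental edges with variables ($h$ on $(1,0)$, $x$ on $(0,1)$, $y$ on $(1,-1)$). The names are hypothetical.

```python
from typing import Dict, Tuple

Vec = Tuple[int, int]

P: Vec = (0, 1)  # p^T: node (i, j) runs on PE j
D: Vec = (1, 0)  # projection vector d
# Fundamental edges of the FIR DG, labelled by the variable they carry (our reading).
EDGES: Dict[str, Vec] = {"h": (1, 0), "x": (0, 1), "y": (1, -1)}


def dot(a: Vec, b: Vec) -> int:
    return a[0] * b[0] + a[1] * b[1]


def processor(node: Vec, p: Vec = P) -> int:
    """PE that executes the node: p^T I(node)."""
    return dot(p, node)


def pe_displacements(p: Vec, edges: Dict[str, Vec]) -> Dict[str, int]:
    """PEs travelled along each edge: 0 = stays in place, +1/-1 = neighbour link."""
    return {name: dot(p, e) for name, e in edges.items()}
```

`pe_displacements(P, EDGES)` gives `{"h": 0, "x": 1, "y": -1}`. Each coefficient h(j) therefore stays in PE j, each input moves from PE j to PE j+1, and partial sums move from PE j+1 to PE j, leaving the array at PE 0.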
Scheduling: $s^T = (1,0)$, $d = (1,0)$ [Figure: each column $i$ of the dependence graph is labeled with time slot $i = 0, 1, 2, 3, 4$, identically in the rows h(2), h(1), h(0).] $s^T (i,j)^T = i$: node $(i,j)$ executes in time slot $i$.
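The schedule fixes the number of delay elements on each link: an edge $e$ is realised with $s^T e$ delays. The sketch below computes these delays and a schedule table. It assumes the same edge pairing as our reading of the RIA ($h$ on $(1,0)$, $x$ on $(0,1)$, $y$ on $(1,-1)$). The helper names are hypothetical.

```python
from typing import Dict, Tuple

Vec = Tuple[int, int]

S: Vec = (1, 0)  # scheduling vector s^T
P: Vec = (0, 1)  # processor vector p^T
EDGES: Dict[str, Vec] = {"h": (1, 0), "x": (0, 1), "y": (1, -1)}  # our reading


def dot(a: Vec, b: Vec) -> int:
    return a[0] * b[0] + a[1] * b[1]


def edge_delays(s: Vec, edges: Dict[str, Vec]) -> Dict[str, int]:
    """Number of delay elements (clock cycles) on the link that realises each edge."""
    return {name: dot(s, e) for name, e in edges.items()}


def schedule_table(n_outputs: int, n_taps: int, s: Vec, p: Vec) -> Dict[int, Dict[int, Vec]]:
    """table[t][pe] = node (i, j) executed by processor pe in time slot t."""
    table: Dict[int, Dict[int, Vec]] = {}
    for i in range(n_outputs):
        for j in range(n_taps):
            t, pe = dot(s, (i, j)), dot(p, (i, j))
            table.setdefault(t, {})[pe] = (i, j)
    return table
```

`edge_delays(S, EDGES)` gives `{"h": 1, "x": 0, "y": 1}`. The zero delay on the x edge means x(t) reaches every PE in the same time slot, i.e., it is broadcast, so this design uses the limited broadcasting relaxation (Leiserson's semi-systolic case). In `schedule_table(5, 3, S, P)` every slot $t$ keeps all three PEs busy with nodes $(t,0)$, $(t,1)$, $(t,2)$.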