Generic Circuit Operators
Jean Vuillemin, École Normale Supérieure, Paris
Motivations
• Fit design to technology constraints:
  – Minimal area meets IO/bandwidth.
  – Maximal IO/bandwidth meets area.
• Synthesis tools to ease exploring area/speed hardware trade-offs.
Jean.Vuillemin@ens.fr DCC/ETAPS02 1
Overview
• Multi-media data flow: MPEG, FHT, …
• Focus on feed-forward networks.
• Generic operators: P, ¬, ∪, ∩, ⊕, ⊗, +, −, ×.
• Synthesis driven by input types.
• Types implement identical semantics.
• Systematic area/time trade-offs.
• Efficient software synthesis.
Digital Number
$\mathbb{D} = \mathbb{N} \to \mathbb{F}_2 \simeq \mathbb{F}_2[[z]] \simeq \mathbb{Z}_2$
• Synchronous signal: $b(t)$, $t \in \mathbb{N}$.
• Binary sequence: $b = [b_0\, b_1\, b_2 \cdots]$, $b_n \in \{0, 1\}$.
• Integer set: $b = \{n : b_n = 1\}$.
• z-series: $b(z) = \sum_{n \in \mathbb{N}} b_n z^n = \int_0^\infty b(t)\, z^t\, dt$.
• 2-adic integer: $b(2) = \sum_{n \in \mathbb{N}} b_n 2^n = \int_0^\infty b(t)\, 2^t\, dt$.
• Norm: $|b| = 2^{-n}$, where $b_n$ is the first nonzero bit of $b$; $|0| = 0$.
• Distance: $d(b, b') = |b \oplus b'| = |b - b'|$, so $|b - b'| \le 2^{-n}$ iff $b_k = b'_k$ for all $k < n$.
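To make these definitions concrete, here is a minimal Python sketch that works on finite prefixes of a digital number (the first `width` bits, LSB first). The helper names `bits_of`, `value_2adic`, `integer_set`, `norm` and `distance` are hypothetical, and because only a prefix is kept, every identity holds modulo $2^{width}$.

```python
from typing import List, Set

def bits_of(x: int, width: int) -> List[int]:
    """First `width` bits b_0 ... b_{width-1} of the 2-adic expansion of x.
    Python's >> is arithmetic, so a negative x yields two's-complement bits."""
    return [(x >> n) & 1 for n in range(width)]

def value_2adic(bits: List[int]) -> int:
    """Truncated b(2) = sum_n b_n 2^n."""
    return sum(b << n for n, b in enumerate(bits))

def integer_set(bits: List[int]) -> Set[int]:
    """The set view {n : b_n = 1}."""
    return {n for n, b in enumerate(bits) if b}

def norm(x: int, width: int) -> float:
    """|b| = 2^-n for the first nonzero bit b_n; 0.0 if the whole prefix is zero."""
    low = x & ((1 << width) - 1)
    if low == 0:
        return 0.0
    n = (low & -low).bit_length() - 1   # index of the lowest set bit
    return 2.0 ** -n

def distance(x: int, y: int, width: int) -> float:
    """d(b, b') = |b XOR b'|: set by the first bit where the sequences differ."""
    return norm(x ^ y, width)

# -1 is the all-ones sequence [1 1 1 ...]; its 8-bit prefix reads back as 2^8 - 1 = 255.
print(bits_of(-1, 8), value_2adic(bits_of(-1, 8)))
```

The XOR distance and the 2-adic distance agree because the lowest bit where `x` and `y` differ is exactly the 2-adic valuation of `x - y`, so `distance(x, y, w)` equals `norm(x - y, w)`.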
Minus
Circuit for $y = -x$ (one cell, one register):
• $y_N = x_N \oplus r_N$, $u_N = x_N \cup r_N$, $r = z\,u$, i.e. $r_0 = 0$ and $r_N = u_{N-1}$.
• Values: $x = \sum_N x_N 2^N$, $y = \sum_N y_N 2^N$, $r = \sum_N r_N 2^N$, $u = \sum_N u_N 2^N$, and $p = \sum_N x_N r_N 2^N$.
Proof:
• $y_N = x_N + r_N - 2 x_N r_N$, hence $y = x + r - 2p$.
• $r_N = x_{N-1} + r_{N-1} - x_{N-1} r_{N-1}$, hence $r = 2u = 2(x + r) - 2p$.
• Subtracting the two: $y - r = -x - r$, hence $y = -x$.
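A quick way to check the derivation is to simulate the cell. In this sketch (hypothetical names `negate_serial`, `to_bits`, `to_int`) a `width`-bit prefix of x is fed LSB first, so the result is $-x \bmod 2^{width}$. Since $r_N$ is the OR of all earlier input bits, the circuit implements the familiar rule: copy bits up to and including the first 1, then complement the rest.

```python
from typing import List

def negate_serial(x_bits: List[int]) -> List[int]:
    """Bit-serial minus circuit (LSB first): one cell and one register r.
    Cycle N reads x_N and emits y_N = x_N XOR r_N; the register then
    stores u_N = x_N OR r_N, so r = z*u with r_0 = 0."""
    r = 0                      # register initialised to r_0 = 0
    y_bits = []
    for x in x_bits:
        y_bits.append(x ^ r)   # y_N = x_N ⊕ r_N
        r = x | r              # r_{N+1} = u_N = x_N ∪ r_N
    return y_bits

def to_bits(x: int, width: int) -> List[int]:
    return [(x >> n) & 1 for n in range(width)]

def to_int(bits: List[int]) -> int:
    return sum(b << n for n, b in enumerate(bits))

# 6 = [0 1 1 0] becomes [0 1 0 1] = 10, and 10 = -6 mod 16.
print(negate_serial(to_bits(6, 4)))
```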
Digital Algebra
$\mathbb{D} = \mathbb{N} \to \mathbb{F}_2 \simeq \mathbb{F}_2[[z]] \simeq \mathbb{Z}_2$
• $(\mathbb{D}, \neg, \cup, \cap)$ is a Boolean algebra isomorphic to the subsets of $\mathbb{N}$.
• $(\mathbb{D}, z, \oplus, \otimes)$ is an integral domain isomorphic to the power series $\mathbb{F}_2[[z]]$.
• $(\mathbb{D}, -, +, \times)$ is an integral domain isomorphic to the 2-adic integers $\mathbb{Z}_2$.
$\mathbb{F} \subset \mathbb{B} \subset \mathbb{N} \subset \mathbb{Z} \subset \mathbb{P} \subset \mathbb{P}_2 \subset \mathbb{A}_2 \subset \mathbb{D}_c \subset \mathbb{D}$ (finite, integer, rational, algebraic, computable)
Type T implements D:
1. T supports some subset of the digital operators.
2. For each supported operator, the semantics is that of D.
Operators:
  – T.input, T.constant
  – T.not, T.and, T.or
  – T.xor, T.shift, T.conv
  – T.add, T.sub, T.mul
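The three structures share the same bit sequences and differ only in their operators. The minimal Python sketch below works on W-bit prefixes (W = 8 and all function names are illustrative assumptions): the Boolean and power-series operators never propagate carries, while the 2-adic ones do.

```python
W = 8                      # hypothetical prefix width: results are mod 2^W (or mod z^W)
MASK = (1 << W) - 1

# Boolean algebra (D, ¬, ∪, ∩): bitwise operations on the set view {n : b_n = 1}.
def b_not(a: int) -> int: return ~a & MASK
def b_or(a: int, b: int) -> int: return a | b
def b_and(a: int, b: int) -> int: return a & b

# Power series (D, z, ⊕, ⊗): coefficients in F_2, no carries.
def ps_shift(a: int) -> int: return (a << 1) & MASK     # multiplication by z
def ps_xor(a: int, b: int) -> int: return a ^ b
def ps_mul(a: int, b: int) -> int:
    """Carry-less product: XOR together a*z^n for every n with b_n = 1."""
    acc = 0
    for n in range(W):
        if (b >> n) & 1:
            acc ^= a << n
    return acc & MASK

# 2-adic integers (D, -, +, ×): ordinary arithmetic with carries.
def z2_neg(a: int) -> int: return -a & MASK
def z2_add(a: int, b: int) -> int: return (a + b) & MASK
def z2_mul(a: int, b: int) -> int: return (a * b) & MASK

# Same sequence 3 = [1 1 0 ...], two products: (1+z)^2 = 1+z^2 gives 5, while 3*3 = 9.
print(ps_mul(3, 3), z2_mul(3, 3))
```

Two bridges between the structures are easy to check with these helpers: the shift is both multiplication by z and by 2 (`ps_shift(a) == z2_mul(a, 2)`), and complement is $\neg a = -a - 1$.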
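Area vs. Time
Negation $y = -x$, with $x = \sum_N x_N 2^N$ and $y = \sum_N y_N 2^N$.
• W[1], one bit per cycle: $y = x \oplus r$, $r = z\,(x \cup r)$, $z = 2$.
• W[n], n bits per cycle (shown for n = 2): split $x = x[0] + 2\,x[1]$ and $y = y[0] + 2\,y[1]$, where $x[0] = \sum_N x_{2N} 4^N$, $x[1] = \sum_N x_{2N+1} 4^N$, $y[0] = \sum_N y_{2N} 4^N$, $y[1] = \sum_N y_{2N+1} 4^N$. Two cells share one register:
  $y[0] = x[0] \oplus r_0$, $r_1 = x[0] \cup r_0$, $y[1] = x[1] \oplus r_1$, $r_0 = z\,(x[1] \cup r_1)$, $z = 2^n$.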
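The trade-off can be simulated directly. In this sketch (hypothetical `negate_wn`), `n` chained cells are evaluated per clock cycle with a single register between cycles: more cells means more area, and the run takes `width / n` cycles. The inner computation is the same for every n, which is the point: each layout computes the same $-x$.

```python
from typing import Tuple

def negate_wn(x: int, width: int, n: int) -> Tuple[int, int]:
    """Negate x mod 2**width with the W[n] layout: n minus cells chained
    combinationally inside one cycle and one register r between cycles
    (so the register weight is z = 2**n). Returns (y, cycles)."""
    if width % n != 0:
        raise ValueError("width must be a multiple of n")
    r, y, cycles = 0, 0, 0
    for base in range(0, width, n):          # one clock cycle per n-bit digit
        for j in range(base, base + n):      # cells 0 .. n-1 of the chain
            xj = (x >> j) & 1
            y |= (xj ^ r) << j               # y_j = x_j ⊕ r_j
            r = xj | r                       # next r: a wire, or the register after the last cell
        cycles += 1
    return y, cycles

for n in (1, 2, 4, 16):
    print(n, negate_wn(6, 16, n))
```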
Jazz http://www.exentis.com/jazz/
• Goals
  – High-level language for synchronous circuits.
  – Single source from specification to synthesis.
  – Invariant 2-adic semantics.
  – Circuit proofs by symbolic evaluation.
• Means
  – Strong types & inference = ML, Haskell, Lava
  – Higher types & lazy evaluation = ML, Haskell, Lava
  – Objects & classes = Java, Haskell
  – Generic operators & overloading = C++, PamDC
  – Nets as a first-class type > Lava, JHDL
  – Net-lists are not first-class < Lava, JHDL
  – Symbolic net-lists can be programmed = Lava, JHDL
Example
Generic code:
    fun SumDiff (a,b) = (s,d) {
      s = a+b;
      d = a-b;
    }
Interface:
    fun SumDiff@(a,b:T)->(s,d:T)
Default implementation for type T:
    { s = T.add(a,b);
      d = T.sub(a,b); }
Implementation for a specific type H:
    fun SumDiff@H { H(s,d) = Haddsub(H(a,b)); }
All types implement the same 2-adic semantics.
• SumDiff@N generates a bit-level simulator (from N.and, …, N.add, N.sub, …), as efficient as a BigNum package.
• SumDiff@W(1) generates a bit-serial circuit: least area A(1), least bandwidth B(1); z=2.
• SumDiff@W(4) generates a nibble-serial circuit: area <16A(1), bandwidth <16B(1); z=16.
• SumDiff@H generates a hyper-serial circuit: area > A(1)/2, bandwidth > B(1)/2; z=1/2.
• SumDiff@W(∞) generates an ∞-parallel Boolean circuit.
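The dispatch idea can be sketched in Python, with duck typing standing in for Jazz's `@T` instantiation. `Packed`, `Serial` and `sum_diff` are hypothetical names, not Jazz APIs: the generic function is written once, and each type supplies `add`/`sub` with the same semantics modulo $2^{width}$, differing only in how many bits it processes per cycle.

```python
from typing import Tuple

class Packed:
    """Hypothetical word-level type (in the spirit of SumDiff@N): ints mod 2**width."""
    def __init__(self, width: int) -> None:
        self.width = width
    def add(self, a: int, b: int) -> int:
        return (a + b) % (1 << self.width)
    def sub(self, a: int, b: int) -> int:
        return (a - b) % (1 << self.width)

class Serial:
    """Hypothetical W(n) type: n full adders per cycle plus one carry register."""
    def __init__(self, width: int, n: int) -> None:
        if width % n != 0:
            raise ValueError("width must be a multiple of n")
        self.width, self.n, self.cycles = width, n, 0
    def _run(self, a: int, b: int, subtract: bool) -> int:
        flip = 1 if subtract else 0
        c = flip                                    # a - b = a + NOT b + 1 (carry-in 1)
        out = 0
        for base in range(0, self.width, self.n):   # one clock cycle per n-bit digit
            for j in range(base, base + self.n):    # n chained full adders
                aj = (a >> j) & 1
                bj = ((b >> j) & 1) ^ flip
                out |= (aj ^ bj ^ c) << j           # sum bit
                c = (aj & bj) | (c & (aj ^ bj))     # carry to the next bit
            self.cycles += 1
        return out
    def add(self, a: int, b: int) -> int:
        return self._run(a, b, False)
    def sub(self, a: int, b: int) -> int:
        return self._run(a, b, True)

def sum_diff(T, a: int, b: int) -> Tuple[int, int]:
    """Generic code, written once: s = a + b, d = a - b for any type T."""
    return T.add(a, b), T.sub(a, b)

print(sum_diff(Packed(16), 7, 9), sum_diff(Serial(16, 1), 7, 9), sum_diff(Serial(16, 4), 7, 9))
```

All three calls return the same pair; only `Serial.cycles` differs (twice `width / n` per `sum_diff` call), which is the area/bandwidth knob the slide describes.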
JPEG DCT
Types: for $a, b : T$, $a + b \in T$ and $a - b \in T$; for $p : \mathbb{P}$, $p(T) \in T$ and $a \times p(T) \in T$.

Software:
  Packing     add  sub  mul  mask  cycles/px
  1px / 32b    14   15    5     5         39
  2px / 32b     7    8    3     4         21
  4px / 64b     4    4    3     4         14

Hardware:
  Rate            fulladd  reg  cycles/px  add/px
  0.5 bit/cycle      29.5  178         24     708
  1 bit/cycle          59   89         12     708
  2 bit/cycle         118   45          6     708
  4 bit/cycle         236   22          3     708
  12 bit/cycle        622    0          1     622
In every hardware row, add/px = fulladd × cycles/px.

Packed multiplication by a constant (bits LSB first; * marks high-order product bits):
  [b0 b1 b2 0 0 0 0 0 | c0 c1 c2 0 0 0 0 0] × [p0 p1 p2 0] = [q0 q1 q2 * * * 0 0 | r0 r1 r2 * * * 0 0]
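The packed multiplication can be checked with ordinary integers. This minimal sketch assumes unsigned 3-bit values b, c and a 3-bit constant p in 8-bit lanes, as in the diagram; each lane product needs at most 6 bits, so one multiply yields both products without the lanes overlapping. Signed values and the mask operations counted in the software table are not modelled, and `pack`/`unpack` are hypothetical helpers.

```python
from typing import List

def pack(values: List[int], lane: int) -> int:
    """Place small unsigned values in `lane`-bit lanes, values[0] in the lowest lane."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << lane):
            raise ValueError("value does not fit in its lane")
        word |= v << (i * lane)
    return word

def unpack(word: int, lane: int, count: int) -> List[int]:
    """Read back `count` lanes of `lane` bits each."""
    mask = (1 << lane) - 1
    return [(word >> (i * lane)) & mask for i in range(count)]

# b = 5 and c = 3 in 8-bit lanes, multiplied by the constant p = 6 in one multiply.
word = pack([5, 3], 8)
print(unpack(word * 6, 8, 2))   # [30, 18]
```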
Linear Hough Transform
For each line $L$ of the image, $h(L) = \sum_{p \in L} p$ sums the pixel values along $L$; the detected line $L_{\max}$ satisfies $h(L_{\max}) = \max_L \{h(L)\}$.
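Read literally, the definition is a sum per line followed by a maximum. The sketch below computes it by brute force for any family of lines given as pixel lists; the line family, the image and the function name `hough` are illustrative assumptions. Its cost is one addition per pixel per line, which is what the FHT recursion reduces by sharing partial sums between lines.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

def hough(image: List[List[int]],
          lines: Dict[Hashable, Iterable[Tuple[int, int]]]) -> Tuple[Dict[Hashable, int], Hashable]:
    """h(L) = sum of pixel values over the (row, col) pixels of each line L,
    plus a line L_max with h(L_max) = max_L h(L) (the first one on ties)."""
    h = {name: sum(image[r][c] for r, c in pixels) for name, pixels in lines.items()}
    l_max = max(h, key=lambda name: h[name])
    return h, l_max

image = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
lines = {"row0": [(0, 0), (0, 1), (0, 2)],
         "col1": [(0, 1), (1, 1), (2, 1)],
         "diag": [(0, 0), (1, 1), (2, 2)]}
print(hough(image, lines))
```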
Fast Hough Transform (FHT)
    fun FHT(n: int)(in : _[n]) = ht : _[n] {
      if (n==1) ht = in;                      // end of recursion
      else {
        m = n div 2;                          // middle point
        lh = FHT(m)(in[0..m-1]);              // left histogram
        rh = FHT(m)(in[m..n-1]);              // right histogram
        for (k<m) {                           // FHT butterfly
          dh[k] = lh[k] << k;                 // delay k lines
          ht[2*k] = dh[k] + rh[k];            // even histogram
          ht[2*k+1] = rh[k] + (dh[k] << 1);   // odd histogram: one more line of delay
        }
      }
    }
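The recursion can be transcribed into Python and checked against a direct computation. Assumptions not stated on the slide: each input `rows[i]` is a finite sequence indexed by line number, `<< k` is read as a delay of k lines (as the slide's comment says), and the odd histogram is `rh[k] + (dh[k] << 1)`. The hypothetical `offset` unrolls the recursion into the delay applied to row i in output s, and `direct` recomputes every histogram from scratch.

```python
from typing import List

Seq = List[int]

def delay(v: Seq, k: int) -> Seq:
    """v << k read as a delay of k steps: result[t] = v[t - k]."""
    return [0] * k + v

def add(a: Seq, b: Seq) -> Seq:
    """Pointwise sum of two finite sequences, zero-padded to the same length."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def fht(rows: List[Seq]) -> List[Seq]:
    """Transcription of the FHT recursion; len(rows) must be a power of two."""
    n = len(rows)
    if n == 1:
        return [list(rows[0])]                    # end of recursion
    m = n // 2                                    # middle point
    lh = fht(rows[:m])                            # left histograms
    rh = fht(rows[m:])                            # right histograms
    ht: List[Seq] = [[] for _ in range(n)]
    for k in range(m):                            # butterfly
        dh = delay(lh[k], k)                      # delay k lines
        ht[2 * k] = add(dh, rh[k])                # even histogram
        ht[2 * k + 1] = add(rh[k], delay(dh, 1))  # odd histogram
    return ht

def offset(n: int, s: int, i: int) -> int:
    """Delay applied to input row i in output histogram s, unrolled from the recursion."""
    if n == 1:
        return 0
    m = n // 2
    if i < m:
        return offset(m, s // 2, i) + (s + 1) // 2
    return offset(m, s // 2, i - m)

def direct(rows: List[Seq]) -> List[Seq]:
    """Same histograms without sharing: one delayed add per row and per output."""
    n = len(rows)
    out = []
    for s in range(n):
        acc: Seq = []
        for i, row in enumerate(rows):
            acc = add(acc, delay(row, offset(n, s, i)))
        out.append(acc)
    return out
```

Each level of `fht` performs n sequence additions over log2(n) levels, against n·n for `direct`; that sharing is what makes the transform fast.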
FHT Circuit
[Figure: FHT circuit built from line delays (z) and pixel sums (+); word widths grow from 1 bit to 2 bits to 3 bits through the stages.]
• Numeric
• Serial
• Parallel
• Symbolic ⇒ circuit proofs
TRT Circuit
[Figure: TRT circuit with input line receiver, bit reverse, max tree and max serial stages.]
Minimize registers:
• 2a + 2b = 2(a + b)
• max(2a, 2b) = 2 max(a, b)
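Why these identities save registers: on a bit-serial stream a one-bit register multiplies the value by z = 2, so the two registers in front of an adder can be replaced by a single register on its output. The sketch below checks 2a + 2b = 2(a + b) at the bit level; `delay`, `serial_add`, `to_bits` and `to_int` are illustrative helpers.

```python
from typing import List

def delay(bits: List[int]) -> List[int]:
    """One register on an LSB-first bit stream: multiplies the value by z = 2."""
    return [0] + bits

def serial_add(a: List[int], b: List[int]) -> List[int]:
    """LSB-first ripple adder; the result is one bit longer than the longer input."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    out, c = [], 0
    for x, y in zip(a, b):
        out.append(x ^ y ^ c)
        c = (x & y) | (c & (x ^ y))
    return out + [c]

def to_bits(x: int, width: int) -> List[int]:
    return [(x >> i) & 1 for i in range(width)]

def to_int(bits: List[int]) -> int:
    return sum(b << i for i, b in enumerate(bits))

a, b = to_bits(11, 5), to_bits(6, 5)
two_registers = serial_add(delay(a), delay(b))   # 2a + 2b: one register per input
one_register = delay(serial_add(a, b))           # 2(a + b): one register on the sum
print(two_registers == one_register, to_int(one_register))
```

The max identity holds for the same reason, since doubling preserves order, so a register can likewise be moved past a max node.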
Conclusions
• Methodology
  – All hardware synthesis from a single source code.
  – All types (must) implement the same 2-adic semantics.
  – Software synthesis from the same source.
• JPEG synthesis
  – Compare JPEG layouts for W(4), W(8) and W(12).
  – Dynamic instructions support hyper-serial implementations: half-size/half-rate JPEG on CHESS.
• FHT synthesis
  – Compare FHT layouts for W(1) through W(8).
  – Uses simple symbolic simplification during synthesis: 0-fold, register swap.
• Software synthesis
  – Efficiency comes from the underlying BigNum package.
  – Limited by I/O corner turning.
  – Symbolic evaluation can lead to circuit proofs: periodic, algebraic, …