Flexible Hardware Design at Low Levels of Abstraction


  1. Flexible Hardware Design at Low Levels of Abstraction
     Emil Axelsson
     Hardware Description and Verification, May 2009

  2. Why low-level?
     Related question: Why is some software written in C?
     (The difference between high and low level is much greater in hardware.)
     Ideal: software-like code → magic compiler → chip masks

         gadget a b = case a of
           2 -> thing (b+10)
           3 -> thing (b+20)
           _ -> fixNumber a

  3. Why low-level?
     Reality: “ASCII schematic” → chain of synthesis tools → chip masks
     Reiterate to improve timing, power, area, etc.
     Very costly and time-consuming: each fabrication run costs roughly $1,000,000.

  4. Failing abstraction
     A realistic flow cannot avoid low-level awareness.
     Paradox: modern designs require a higher abstraction level, but modern
     chip technologies make abstraction harder.
     Main problem: routing wires dominate signal delays and power consumption.
     Controlling the wires is key to performance!

  5. Gate vs. wire delay under scaling
     [Figure: relative delay plotted against process technology node [nm]]

  6. Physical design level
     Certain high-performance components (e.g. arithmetic) need to be designed
     at an even lower level. The physical level comprises:
     - A set of connected standard cells (implemented gates)
     - Absolute or relative positions of cells (placement)
     - The shape of connecting wires (routing)

  7. Physical design level
     Design by interfacing to physical CAD tools: automatic tools are called
     for certain tasks (mainly routing), often through scripting code.
     This is tedious, makes it hard to explore the design space, and limits
     design reuse.
     Aim of this work: raise the abstraction level of physical design!

  8. Two ways to raise abstraction
     Automatic synthesis:
     + Powerful abstraction
     – May not be optimal for e.g. high-performance arithmetic
     – Opaque (hard to control the result)
     – Unstable (heuristics-based)
     Language-based techniques (higher-order functions, recursion, etc.),
     which is our approach:
     + Transparent, stable
     – Still quite low-level
     – Somewhat limited to regular circuits

  9. Lava
     Gate-level hardware description in Haskell.
     Parameterized module generators: Haskell programs that generate circuits.
     Generators can be smart, e.g. optimize for speed in a given environment.
     Basic placement is expressed through combinators.
     Used successfully to generate high-performance FPGA cores.
     (A small Lava-style example follows below.)
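     As a flavor of this style, a minimal sketch (not from the slides),
     assuming the chalmers-lava2000 library (import Lava) with its
     pair-taking gate primitives xor2/and2/or2:

         import Lava

         -- Half adder: sum and carry of two bits.
         halfAdd :: (Signal Bool, Signal Bool) -> (Signal Bool, Signal Bool)
         halfAdd (a, b) = (xor2 (a, b), and2 (a, b))

         -- Full adder built from two half adders.
         fullAdd :: (Signal Bool, (Signal Bool, Signal Bool))
                 -> (Signal Bool, Signal Bool)
         fullAdd (cin, (a, b)) = (s, or2 (c1, c2))
           where
             (s1, c1) = halfAdd (a, b)
             (s,  c2) = halfAdd (cin, s1)

         -- A parameterized module generator: ripple-carry adder of any width.
         adder :: (Signal Bool, [(Signal Bool, Signal Bool)]) -> [Signal Bool]
         adder (cin, [])        = [cin]
         adder (cin, ab : rest) = s : adder (c, rest)
           where (s, c) = fullAdd (cin, ab)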

  10. Wired: extension to Lava
      Finer control over geometry and more accurate performance models.
      Feedback from timing/power analysis enables self-optimizing generators.
      Wire-awareness (unique to Wired):
      - Performance analysis based on wire-length estimates
      - Control of routing through “guides” (experimental)
      - ...

  11. Monads in Haskell
      Haskell functions are pure; side effects can be “simulated” using monads.
      The do-notation is syntactic sugar that expands to a pure program with
      explicit state passing:

          add a b = do
            as <- get
            put (a:as)
            return (a+b)

          prog = do
            a <- add 5 6
            b <- add a 7
            add b 8

          *Main> runState prog []
          (26,[18,11,5])   -- (result, side effect)

      Monads can also be used to model e.g. IO, exceptions, non-determinism, etc.
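      For reference, a runnable version of the example above; a minimal
      sketch assuming the State monad from the mtl package (the import is
      not shown on the slide):

          import Control.Monad.State

          -- 'add' returns the sum and records its first argument in the state.
          add :: Int -> Int -> State [Int] Int
          add a b = do
            as <- get
            put (a : as)
            return (a + b)

          prog :: State [Int] Int
          prog = do
            a <- add 5 6
            b <- add a 7
            add b 8

          main :: IO ()
          main = print (runState prog [])   -- prints (26,[18,11,5])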

  12. Monad combinators
      Haskell has a general and well-understood combinator library for monadic
      programs:

          *Main> runState (mapM (add 2) [11..13]) []
          ([13,14,15],[2,2,2])
          *Main> runState (mapM (add 2 >=> add 4) [11..13]) []
          ([17,18,19],[4,2,4,2,4,2])
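      The same calls as a self-contained sketch (mine, not the slides'),
      with >=> (Kleisli composition) imported from Control.Monad:

          import Control.Monad ((>=>))
          import Control.Monad.State

          -- Same 'add' as before, written with >>= for brevity.
          add :: Int -> Int -> State [Int] Int
          add a b = get >>= \as -> put (a : as) >> return (a + b)

          demo1, demo2 :: ([Int], [Int])
          demo1 = runState (mapM (add 2) [11..13]) []
                  -- ([13,14,15],[2,2,2])
          demo2 = runState (mapM (add 2 >=> add 4) [11..13]) []
                  -- ([17,18,19],[4,2,4,2,4,2])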

  13. Example: parallel prefix
      Given inputs x1, x2, ..., xn, compute
          y1 = x1
          y2 = x1 ∘ x2
          ...
          yn = x1 ∘ x2 ∘ ... ∘ xn
      for ∘ an associative (but not necessarily commutative) operator.
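      As a functional specification (a sketch, not on the slide), parallel
      prefix is exactly Haskell's scanl1; it says what to compute, not how:

          prefix :: (a -> a -> a) -> [a] -> [a]
          prefix = scanl1

          -- prefix (+)  [1,2,3,4] == [1,3,6,10]
          -- prefix (||) [False,False,False,True,False,True,True,False]
          --          == [False,False,False,True,True,True,True,True]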

  14. Parallel prefix
      A very central component in microprocessors; the most common use is
      computing carries in fast adders.
      Trying different operators:
          Addition:   prefix (+) [1,2,3,4]
                        = [1, 1+2, 1+2+3, 1+2+3+4]
                        = [1,3,6,10]
          Boolean OR: prefix (||) [F,F,F,T,F,T,T,F]
                        = [F,F,F,T,T,T,T,T]

  15. Parallel prefix
      Implementation choices (relying on associativity), for
      prefix (∘) [x1,x2,x3,x4] = [y1,y2,y3,y4]:
          Serial:   y4 = ((x1 ∘ x2) ∘ x3) ∘ x4
          Parallel: y4 = (x1 ∘ x2) ∘ (x3 ∘ x4)
          Sharing:  y4 = y3 ∘ x4
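      To make the trade-offs concrete, a small sketch (mine, not from the
      slides) with (+) as the operator; all groupings agree by associativity
      but differ in depth and sharing:

          (x1, x2, x3, x4) = (1, 2, 3, 4) :: (Int, Int, Int, Int)

          y3, y4serial, y4parallel, y4shared :: Int
          y4serial   = ((x1 + x2) + x3) + x4   -- depth 3
          y4parallel = (x1 + x2) + (x3 + x4)   -- depth 2, more parallelism
          y3         = (x1 + x2) + x3
          y4shared   = y3 + x4                 -- reuses y3: one extra operator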

  16. There are many of them...
      Sklansky, Brent-Kung, Ladner-Fischer

  17. Parallel prefix: Sklansky
      Simplest approach (divide and conquer). Purely structural (no geometry);
      this could have been (monadic) Lava:

          sklansky op [a] = return [a]
          sklansky op as  = do
            let k        = length as `div` 2
                (ls, rs) = splitAt k as
            ls'  <- sklansky op ls
            rs'  <- sklansky op rs
            rs'' <- sequence [ op (last ls', r) | r <- rs' ]
            return (ls' ++ rs'')
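      To see the generator compute, a sketch (not from the slides):
      instantiate the monad with Identity and the operator with addition,
      then compare with the scanl1 specification:

          import Data.Functor.Identity

          -- The slide's generator, defined for non-empty input lists.
          sklansky :: Monad m => ((a, a) -> m a) -> [a] -> m [a]
          sklansky op [a] = return [a]
          sklansky op as  = do
            let k        = length as `div` 2
                (ls, rs) = splitAt k as
            ls'  <- sklansky op ls
            rs'  <- sklansky op rs
            rs'' <- sequence [ op (last ls', r) | r <- rs' ]
            return (ls' ++ rs'')

          main :: IO ()
          main = print (runIdentity (sklansky (Identity . uncurry (+)) [1,2,3,4]))
          -- prints [1,3,6,10] == scanl1 (+) [1,2,3,4]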

  18. Refinement: add placement

          sklansky op [a] = space cellWidth [a]
          sklansky op as  = downwards 1 $ do
            let k        = length as `div` 2
                (ls, rs) = splitAt k as
            (ls', rs') <- rightwards 0 $
                            liftM2 (,) (sklansky op ls) (sklansky op rs)
            rs''       <- rightwards 0 $
                            sequence [ op (last ls', r) | r <- rs' ]
            return (ls' ++ rs'')

  19. Sklansky with placement
      Simple PostScript output allows interactive development of the placement.

  20. Refinement: add routing guides

          bus = rightwards 0 . mapM bus1
            where
              bus1 = space 2750 >=> guide 3 500 >=> space 1250

          sklanskyIO op = downwards 0 $
                inputList 16 "in"
            >>= bus
            >>= space 1000
            >>= sklansky op
            >>= space 1000
            >>= bus
            >>= output "out"

      This reuses standard (monadic) Haskell combinators; nothing here is
      Wired-specific.

  21. Sklansky with guides
      [Figure: resulting layout]

  22. Refinement: more guides

          sklansky op [a] = space cellWidthD [a]
          sklansky op as  = downwards 1 $ do
            bus as
            let k        = length as `div` 2
                (ls, rs) = splitAt k as
            (ls', rs') <- rightwards 0 $
                            liftM2 (,) (sklansky op ls) (sklansky op rs)
            rs''       <- rightwards 0 $
                            sequence [ op (last ls', r) | r <- rs' ]
            bus (ls' ++ rs'')

  23. Sklansky with guides
      [Figure: resulting layout]

  24. Experiment: compaction
      Buses were compacted separately. The base case drops its explicit
      spacing:

          sklansky op [a] = space cellWidthD [a]   -- before
          sklansky op [a] = return [a]             -- after

  25. Export to CAD tool (Cadence SoC Encounter)
      Designs are exchanged using the DEF file format and auto-routed in
      Encounter. Odd rows are flipped to share power rails, which is a simple
      change in the recursive call:

          sklansky (flipY . op) ls

  26. Fast, low-power prefix networks
      Mary Sheeran has developed circuit generators in Lava that search for
      fast, low-power parallel prefix networks.
      Initially with crude performance models (delay: logical depth; power:
      number of operators), which still gave good results.
      Now Wired is used to improve accuracy: static timing/power analysis with
      models from the cell library. (A sketch of the crude model follows below.)
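      To illustrate the crude model, a self-contained sketch (mine, not from
      the talk): run the structural Sklansky generator in a State monad that
      counts operators, with each signal carrying its logical depth:

          import Control.Monad.State

          -- The structural generator from slide 17, repeated verbatim.
          sklansky :: Monad m => ((a, a) -> m a) -> [a] -> m [a]
          sklansky op [a] = return [a]
          sklansky op as  = do
            let k        = length as `div` 2
                (ls, rs) = splitAt k as
            ls'  <- sklansky op ls
            rs'  <- sklansky op rs
            rs'' <- sequence [ op (last ls', r) | r <- rs' ]
            return (ls' ++ rs'')

          -- A "gate" that adds values, tracks depth, and counts itself.
          gate :: ((Int, Int), (Int, Int)) -> State Int (Int, Int)
          gate ((v1, d1), (v2, d2)) = do
            modify (+1)                       -- power proxy: operator count
            return (v1 + v2, 1 + max d1 d2)   -- value and logical depth

          main :: IO ()
          main = do
            let inputs      = [ (x, 0) | x <- [1 .. 8 :: Int] ]
                (outs, ops) = runState (sklansky gate inputs) 0
            print (map fst outs)            -- prefix sums [1,3,6,10,15,21,28,36]
            print (maximum (map snd outs))  -- delay estimate (depth) = 3
            print ops                       -- power estimate (operators) = 12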

  27. Minimal change to search algorithm
      Plug in cost functions that analyze the placed network through Wired:

          prefix f p = memo pm
            where
              pm ([],  w) = perhaps id' ([],  w)
              pm ([i], w) = perhaps id' ([i], w)
              pm (is,  w) | 2^(maxd (is, w)) < length is = Fail
              pm (is,  w) =
                (bestOn is f . dropFail)
                  [ wrpC ds (prefix f p) (prefix p p) | ds <- igen ... ]
                where
                  wrpC ds p1 p2 = wrp ds (perhaps id' c) (p1 c1) (p2 c2)
          ...
