Is the 2nd Wave of HLS the One Industry Will Surf on?

  1. DATE 2009 Panel Session: Is the 2nd Wave of HLS the One Industry Will Surf on?
     Jason Cong
     Chancellor's Professor, UCLA Computer Science Department (cong@cs.ucla.edu)
     Chief Technology Advisor, AutoESL Design Technologies, Inc. (www.autoesl.com)

  2. The Demand for High-Level Synthesis Is Real
     - Embedded processors are in almost every SoC
       - Need SW/HW co-design and exploration
       - C/C++/SystemC is a more natural starting point
     - Huge silicon capacity requires a higher level of abstraction
       - 700,000 lines of RTL for a 10M-gate design is too much!
     - Verification drives the acceptance of SystemC
       - Need an executable model to verify the RTL design against
       - More and more SystemC models are available
     - Need and opportunity for aggressive power optimization
       - Simultaneous functional, structural, and temporal optimization for power
     - Accelerated computing and reconfigurable computing also need C/C++-based compilation/synthesis to FPGAs

  3. Opportunity for High-Level Synthesis
     - The life of an RTL designer is getting more and more miserable
       - Complexity (80+M gates)
       - Correctness: first-time working silicon ($2M mask cost)
       - Performance (interconnects dominate)
       - Routability (what to measure at the RTL level, and how?)
       - Power (yet another dimension)
       - …
     - Real opportunity for automation/exploration by high-level synthesis
       - with BETTER quality

  4. Significant Progress on HLS
     - Wide acceptance of C/C++/SystemC for design modeling and simulation
       - Paves the way for C/C++/SystemC-based HLS
     - Better compilation infrastructure
       - Leveraging the progress in the compiler community
     - Advancements in core HLS algorithms, e.g., research from UCLA:
       - SDC-based scheduling
       - Distributed register-file-based architecture
       - Simultaneous computation and communication synthesis
       - Pattern-based synthesis
       - HLS for power
       - …

  5. A New Generation of HLS Tools: e.g., AutoESL
     [Diagram: AutoESL synthesis flow. Inputs: C/C++/SystemC source, user constraints & directives, platform libraries → Compilation & Elaboration → Advanced Code Transformation → Behavior & Interface Synthesis → Performance/Power/Area Optimizations → Microarchitecture Generation → outputs: RTL HDLs, RTL SystemC, RTL constraints (timing/layout) → Simulator/Verifier and ASIC/FPGA RTL synthesis → place-and-route]
     Unique ESL synthesis technology:
     - Best language coverage
       - Pure ANSI C/C++ synthesis
       - SystemC/TLM synthesis
     - Aggressive power optimization
       - Clock gating
       - Operation gating
       - Frequency scaling
       - Power/performance trade-off …
     - Best QoR
       - Leveraging 8+ years of research from UCLA on ESL synthesis
     - Ideal for reuse and architecture exploration
       - Platform-based synthesis
       - Separate source & constraints
       - Link to implementation flows

  6. MPEG4 4CIF by AutoPilot vs. Manual Design
     - Frame rate
       - 60 fps based on estimation for 4CIF video
       - 200 fps on a V2P board for 1CIF video

     Block             Manual Design            AutoPilot
                       BRAM#  MULT#  SLICE#     BRAM#  MULT#  SLICE#
     Parser/VLD        0      1      1700       0      2      2156
     Copy Control      0      1      340        0      2      264
     Motion Comp       0      1      340        0      1      262
     Texture/IDCT      6      25     1710       6      19     1560
     Texture Update    0      2      150        0      2      133
     Total             6      30     4240       6      26     4375
     AutoPilot vs. manual                       0.0%   -13.3%  3.2%
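As a quick sanity check of the MPEG4 table, the short Python sketch below re-adds the per-block resource counts and recomputes the relative change of AutoPilot versus the manual design. The counts are transcribed from the table; the helper names `totals` and `percent_change` are illustrative only and are not part of any AutoPilot tooling.

```python
# Resource counts transcribed from the MPEG4 4CIF table:
# (BRAM, MULT, SLICE) for each block of each design.
manual = {
    "Parser/VLD": (0, 1, 1700),
    "Copy Control": (0, 1, 340),
    "Motion Comp": (0, 1, 340),
    "Texture/IDCT": (6, 25, 1710),
    "Texture Update": (0, 2, 150),
}
autopilot = {
    "Parser/VLD": (0, 2, 2156),
    "Copy Control": (0, 2, 264),
    "Motion Comp": (0, 1, 262),
    "Texture/IDCT": (6, 19, 1560),
    "Texture Update": (0, 2, 133),
}


def totals(design):
    """Sum each resource column (BRAM, MULT, SLICE) over all blocks."""
    return tuple(sum(column) for column in zip(*design.values()))


def percent_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return 100.0 * (after - before) / before


manual_total = totals(manual)        # (6, 30, 4240)
autopilot_total = totals(autopilot)  # (6, 26, 4375)
deltas = [round(percent_change(m, a), 1) for m, a in zip(manual_total, autopilot_total)]

for name, m, a, d in zip(("BRAM", "MULT", "SLICE"), manual_total, autopilot_total, deltas):
    print(f"{name:6s} manual={m:5d} autopilot={a:5d} change={d:+.1f}%")
```

Running it reproduces the totals (6/30/4240 vs. 6/26/4375) and the 0.0%, -13.3%, and +3.2% changes listed in the table.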

  7. 1M+ Gate Wireless Communication Module by AutoPilot
     - Quickly generate multiple solutions with the same sample rate but different area/power profiles
     - Manual design took 4 months, while C-based synthesis using AutoPilot took two weeks

     Architecture   Latency   Area (mm^2)   Clock (MHz)
     manual         96        1.17          150
     config1        116       1.10          150
     config2        86        1.12          150
     config3        81        1.30          100
     config4        64        1.55          75

     Library: TSMC65nmLP
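To make the trade-off in the wireless-module table concrete, this hedged Python sketch computes each configuration's area relative to the manual design and lists the configurations that run at the manual design's 150 MHz clock with no more latency and no more area. It assumes the latency values share one unit across rows (the slide does not state the unit); the variable names are illustrative only.

```python
# Rows transcribed from the wireless-module table:
# name -> (latency, area in mm^2, clock in MHz).
designs = {
    "manual": (96, 1.17, 150),
    "config1": (116, 1.10, 150),
    "config2": (86, 1.12, 150),
    "config3": (81, 1.30, 100),
    "config4": (64, 1.55, 75),
}

base_latency, base_area, base_clock = designs["manual"]


def area_change(area):
    """Area relative to the manual design, in percent, rounded to 0.1."""
    return round(100.0 * (area - base_area) / base_area, 1)


area_deltas = {name: area_change(area) for name, (_, area, _) in designs.items()}

# Configurations at the manual design's clock that are no worse in
# both latency and area (assumes latency is in the same unit for every row).
dominating = [
    name
    for name, (latency, area, clock) in designs.items()
    if name != "manual"
    and clock == base_clock
    and latency <= base_latency
    and area <= base_area
]

for name, delta in area_deltas.items():
    print(f"{name:8s} area change vs. manual: {delta:+.1f}%")
print("no worse than manual at 150 MHz:", dominating)
```

With these numbers only config2 qualifies (latency 86 vs. 96 and 1.12 vs. 1.17 mm^2, both at 150 MHz); config3 and config4 reach lower latency at a lower clock in exchange for more area.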

  8. Next Challenges
     - Even better QoR, out-of-the-box success
       - Further algorithmic innovation for HLS
     - Aggressive power optimization
     - Physical synthesis above RTL
     - Integrated synthesis and verification
     - Synthesis support for variability and reliability