The Future of EDA: Methodology, Tools and Solutions
Sharad Malik, Princeton University
NSF Future of EDA Workshop, July 8-9, 2009
Essence of EDA
• Tools follow methodology
• ASIC Design Methodology
  – Standard Cells
  – Synchronous Timing
  Source: vlsitechnology.org
  Defined sub-problems based on what needed to be solved, and what could be reasonably solved
• Tools support methodology
  – Provide
    • Design productivity
    • Design quality
  Source: chipdesignhome.com
Design in the Late- and Post-Silicon Era
Our Charter: Enable Moore’s Law
• Reduce cost/unit-function
• Functionality includes all aspects of design quality
  – power, performance, reliability, usability
• Significant threats to all aspects of reducing cost and increasing functionality
  – Design verification and test
  – Staying within power budgets
  – Reliable designs on unreliable fabrics
  – Usability through efficient programmability
Moore’s Law and Design Verification
Moore’s Law: Growth rate of transistors/IC is exponential
  – Corollary 1: Growth rate of state bits/IC is exponential
  – Corollary 2: Growth rate of state space (proxy for complexity) is doubly exponential
But…
  – Corollary 3: Growth rate of compute power is exponential
Thus…
  – Growth rate of complexity is still doubly exponential relative to our ability to deal with it
Design methodology must adapt to deal with this.
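The corollaries can be put in rough symbolic form. This is only a sketch: the doubling periods T (transistors) and T_c (compute power) and the constants n_0 and C_0 are illustrative symbols, not values from the slides.

```latex
\begin{align*}
n(t) &= n_0 \, 2^{t/T} && \text{state bits per IC (Corollary 1)} \\
S(t) &= 2^{n(t)} = 2^{\,n_0 2^{t/T}} && \text{state space (Corollary 2)} \\
C(t) &= C_0 \, 2^{t/T_c} && \text{compute power (Corollary 3)} \\
\log_2 \frac{S(t)}{C(t)} &= n_0 \, 2^{t/T} - \frac{t}{T_c} - \log_2 C_0
\end{align*}
```

The exponential term on the right dominates the linear term t/T_c, so the ratio S(t)/C(t) still grows doubly exponentially in t.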
Possible Solution Direction: Runtime Validation
• Increasingly need to reconcile ourselves to the fact that hardware, like software, will be shipped with bugs
• Runtime validation (through error detection and recovery) offers a potentially scalable solution
  – Provide robustness in the face of inevitable bug escapes
• Significantly reduce verification costs
  – Verify chips “to life” rather than “to death”
Solution Direction: Runtime Validation
[Figure panels: Transient Faults due to Cosmic Rays & Alpha Particles (increase exponentially with number of devices on chip); Parametric Variability (uncertainty in device and environment), illustrated by intra-die variations in ILD thickness. Figure Source: T. Austin]
• Dynamic errors which occur at runtime
• Will need runtime solutions
• Combine with runtime solutions for functional errors (design bugs)
Example: Checking Memory Consistency
[D. Shasha et al., TOPLAS’88] [H. W. Cain et al., PACT’03]
• A directed graph that models memory ordering constraints
  – Vertices: dynamic memory instruction instances
  – Edges:
    • Consistency edges
    • Dependence edges
A cycle in the graph indicates a memory ordering violation
[Figure: example constraint graphs for two processors (P1, P2) issuing loads (LD), stores (ST) and a memory barrier (MB) to locations A through D, under three models: Sequential Consistency, Total Store Ordering, Weak Ordering]
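The cycle criterion can be illustrated with a small sketch. The edge lists below are a hypothetical two-processor litmus test, not the example from the slide's figure, and `has_cycle` is an illustrative helper name; consistency edges here assume Sequential Consistency, so every pair of consecutive instructions on a processor is ordered.

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    WHITE, GREY, BLACK = 0, 1, 2           # unvisited, on DFS stack, finished
    color = {n: WHITE for n in nodes}

    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            advanced = False
            for nxt in successors:
                if color[nxt] == GREY:     # back edge: a cycle exists
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:               # all successors explored
                color[node] = BLACK
                stack.pop()
    return False

# Consistency (program-order) edges under Sequential Consistency.
consistency_edges = [("P1:ST A", "P1:LD B"), ("P2:ST B", "P2:LD A")]

# Dependence edges when both loads return the initial (old) values:
# each load is ordered before the store that later overwrites its location.
both_old = [("P1:LD B", "P2:ST B"), ("P2:LD A", "P1:ST A")]

# Dependence edges when P1's load sees P2's store, and P2's load sees the old A.
one_new = [("P2:ST B", "P1:LD B"), ("P2:LD A", "P1:ST A")]

violation_found = has_cycle(consistency_edges + both_old)
legal_run_has_cycle = has_cycle(consistency_edges + one_new)
```

In the first execution the edges form the cycle ST A → LD B → ST B → LD A → ST A, so it cannot be sequentially consistent; the second execution stays acyclic.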
Extensions for Transactional Memory
• Extended constraint graph for transaction semantics
  – Non-transactional code assumes Sequential Consistency
TransOpOp:
  [Op1; Op2] => Op1 ≤ Op2
TransMembar:
  Op1; [Op2] => Op1 ≤ Op2
  [Op1]; Op2 => Op1 ≤ Op2
TransAtomicity:
  [Op1; Op2] ∧ ¬[Op1; Op; Op2] => (Op ≤ Op1) ∨ (Op2 ≤ Op)
[Figure: two-processor example (P1, P2) with transactions delimited by TStart/TEnd; instructions shown: LD A, ST B, ST C, LD C, ST D, LD D, LD B, ST A, ST F, LD E]
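One way to read the TransAtomicity rule is that an operation outside a transaction must be ordered either before the transaction's first operation or after its last. The sketch below checks that reading over a single observed total order, which is a simplification of the constraint-graph formulation on the slide. The function name, operation labels and data layout are all hypothetical.

```python
def atomicity_violations(order, transactions):
    """Find operations that interleave with a transaction.

    order:        list of operation ids in one observed global order
    transactions: dict mapping a transaction id to the set of its operation ids
    Returns a list of (transaction id, offending operation id) pairs.
    """
    position = {op: i for i, op in enumerate(order)}
    violations = []
    for txn, ops in transactions.items():
        first = min(position[op] for op in ops)
        last = max(position[op] for op in ops)
        # Any foreign op between the first and last op breaks atomicity.
        for op in order[first:last + 1]:
            if op not in ops:
                violations.append((txn, op))
    return violations

transactions = {"T1": {"p1_st_c", "p1_st_d"}}

atomic_order = ["p2_ld_a", "p1_st_c", "p1_st_d", "p2_ld_c"]
broken_order = ["p1_st_c", "p2_ld_c", "p1_st_d"]

atomic_result = atomicity_violations(atomic_order, transactions)
broken_result = atomicity_violations(broken_order, transactions)
```

In `broken_order`, `p2_ld_c` falls between the two stores of T1, so neither Op ≤ Op1 nor Op2 ≤ Op holds and the pair is reported.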
On-the-fly Graph Checking
DFS search based cycle checker for sparse graphs
[Figure: block diagram with Processor Cores, each with a Local Observer, L1 Cache and Cache Controller, plus a Central Graph Checker, an Interconnection Network and L2 Cache]
• Local observer:
  – Local instruction ordering
  – Local access history
  – Locally observed inter-processor edges
• Central checker:
  – Build the global constraint graph
  – Check for the acyclic property
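The division of work between local observers and the central checker might be modeled in software roughly as follows. This is a sketch under stated assumptions: the class and method names are invented for illustration, program-order edges assume Sequential Consistency, and Python's standard `graphlib.TopologicalSorter` stands in for the DFS-based cycle checker named on the slide.

```python
from graphlib import CycleError, TopologicalSorter

class LocalObserver:
    """Per-core observer: records local ordering and observed remote edges."""

    def __init__(self, proc_id):
        self.proc_id = proc_id
        self.seq = 0
        self.last_node = None
        self.edges = []

    def observe(self, op):
        """Record a local memory op and a program-order edge to it."""
        node = (self.proc_id, self.seq, op)   # unique dynamic instance
        self.seq += 1
        if self.last_node is not None:
            self.edges.append((self.last_node, node))
        self.last_node = node
        return node

    def observe_remote(self, src_node, dst_node):
        """Record an inter-processor dependence edge seen at this core."""
        self.edges.append((src_node, dst_node))

    def drain(self):
        """Hand the collected edges to the central checker."""
        edges, self.edges = self.edges, []
        return edges

class CentralChecker:
    """Builds the global constraint graph and checks that it is acyclic."""

    def __init__(self):
        self.predecessors = {}

    def merge(self, edges):
        for src, dst in edges:
            self.predecessors.setdefault(src, set())
            self.predecessors.setdefault(dst, set()).add(src)

    def is_acyclic(self):
        try:
            TopologicalSorter(self.predecessors).prepare()
        except CycleError:
            return False
        return True

p1, p2 = LocalObserver("P1"), LocalObserver("P2")
st_a = p1.observe("ST A")
st_b = p1.observe("ST B")
ld_b = p2.observe("LD B")
ld_a = p2.observe("LD A")

checker = CentralChecker()
checker.merge(p1.drain())
checker.merge(p2.drain())
acyclic_before = checker.is_acyclic()

# P2 sees the new value of B but the old value of A.
p2.observe_remote(st_b, ld_b)   # LD B reads from ST B
p2.observe_remote(ld_a, st_a)   # LD A is ordered before ST A
checker.merge(p2.drain())
acyclic_after = checker.is_acyclic()
```

With only program-order edges the graph is acyclic; once the two dependence edges arrive, the cycle ST A → ST B → LD B → LD A → ST A appears and the checker reports a violation.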
Practical Design Challenges
A naively built constraint graph that includes all executed memory instructions
• Billions of vertices
• Unbounded graph size
Key Enabling Techniques
• Graph Reduction
• Graph Slicing
Enables checking of graphs of a few hundred vertices every 10K cycles
Runtime Validation: Key Advantages
• Common framework for a range of defects
• Manage pre-silicon verification costs
  – Have predictable verification schedules
  – Support bug escapes through runtime validation
• Complexity, Performance Tradeoffs
  – Common mode
    • High performance, high complexity
  – (Infrequent) Recovery mode
    • Low complexity, low performance
• Leverage check-pointing support
  – Backward error recovery through rollback
  – Relevant for high-performance to support speculation
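The common-mode / recovery-mode split with checkpoint and rollback can be sketched as a software analogy. Everything here is hypothetical: the function names, the epoch length, the injected bug and the check are chosen only to show the control flow, not to model any real hardware mechanism.

```python
import copy

def run_with_rollback(state, inputs, fast_step, safe_step, check, epoch=3):
    """Process inputs in epochs; roll back and replay an epoch if its check fails."""
    recoveries = 0
    for start in range(0, len(inputs), epoch):
        chunk = inputs[start:start + epoch]
        checkpoint = copy.deepcopy(state)          # take a checkpoint
        for value in chunk:                        # common mode: fast, complex
            fast_step(state, value)
        if not check(checkpoint, state, chunk):    # runtime error detection
            state.clear()                          # backward recovery: rollback
            state.update(copy.deepcopy(checkpoint))
            for value in chunk:                    # recovery mode: slow, simple
                safe_step(state, value)
            recoveries += 1
    return state, recoveries

def fast_step(state, value):
    # Hypothetical design bug: the fast path mishandles the rare input 7.
    state["sum"] += (value + 1) if value == 7 else value

def safe_step(state, value):
    # Simple reference behavior used only during recovery.
    state["sum"] += value

def check(before, after, chunk):
    # Illustrative assertion: the epoch must add exactly the chunk total.
    return after["sum"] - before["sum"] == sum(chunk)

final_state, recoveries = run_with_rollback(
    {"sum": 0}, [1, 2, 7, 3, 4, 5], fast_step, safe_step, check
)
```

Only the epoch containing the buggy input pays the cost of rollback and slow replay; the other epoch runs entirely in common mode.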
Pre-Silicon vs. Runtime Validation
• Complementary Strengths
  – Large state space
    • Pre-silicon: Incomplete formal verification, simulation
    • Runtime: Easy (observe only actual state)
  – State observability
    • Runtime: Challenging to observe
      – Distributed state, large number of variables
    • Pre-silicon: Easy (just variables in software models for simulation or formal verification)
Future Challenges
• Keep costs low, with increasing complexity and failure modes
• A discipline for runtime validation?
  – Mature from one-off solutions to a general methodology
  – General checking and recovery mechanisms
    • Checking
      – Design assertions
    • Recovery
      – Generalized check-pointing and rollback
  – Analysis and synthesis tool support for the above