Synthesis Challenges for Next-Generation High-Performance and High-Density PLDs
Jason Cong, Department of Computer Science, University of California, Los Angeles, USA
Songjie Xu, Aplus Design Technologies, Inc., Los Angeles, USA
Slide 2: Outline
- Introduction
- Synthesis Challenges for New Architectures
- Synthesis Challenges for High Density and High Performance
- Concluding Remarks
Slide 3 (Introduction): PLD Industry Growth
- Enjoyed the same exponential growth as the rest of the semiconductor industry
- At an even faster rate
[Figure: bar chart of annual growth rate (1994-1998) by company/industry (Semiconductor Industry, Altera, Intel, LSI Logic); value labels shown in the chart: 36.07%, 27.78%, 24.50%, 15.71%]
Slide 4: Definitions
- PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)
Slide 5: CPLD
- Example: Altera MAX 7000
Slide 6: Macrocell
- Example: Altera MAX 7000
  - Each macrocell has a logic array, a product-term select matrix, and a programmable register
Slide 7: Definitions
- PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)
Slide 8: FPGA
- Example: Xilinx XC 4000
Slide 9: PLB
- Xilinx XC 4000
  - Each PLB has two 4-LUTs, one 3-LUT, and 2 FFs
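To make the LUT-based block concrete, here is a minimal sketch of such a PLB as truth tables. It assumes the arrangement described later in the deck (the two 4-LUTs feed the 3-LUT) and leaves out the two flip-flops; the names `lut`, `clb`, and `h1` are hypothetical, and truth tables are stored as integers whose bit i is the output for input index i.

```python
from typing import Sequence

def lut(table: int, inputs: Sequence[int]) -> int:
    """Evaluate a k-input LUT. Bit i of `table` is the output for input index i,
    where inputs[0] is the least significant bit of the index."""
    index = 0
    for pos, bit in enumerate(inputs):
        index |= (bit & 1) << pos
    return (table >> index) & 1

def clb(f_table: int, g_table: int, h_table: int,
        f_in: Sequence[int], g_in: Sequence[int], h1: int) -> int:
    """Combinational part of the block: 4-LUTs F and G feed 3-LUT H,
    which also sees one extra input h1 (flip-flops are not modeled)."""
    f = lut(f_table, f_in)
    g = lut(g_table, g_in)
    return lut(h_table, (f, g, h1))

# Illustration: a 9-input AND built from AND4, AND4, and AND3 tables.
AND4 = 1 << 15  # only input index 15 (all ones) gives 1
AND3 = 1 << 7   # only input index 7 (all ones) gives 1
all_ones = clb(AND4, AND4, AND3, (1, 1, 1, 1), (1, 1, 1, 1), 1)
one_zero = clb(AND4, AND4, AND3, (1, 1, 1, 1), (1, 0, 1, 1), 1)
```

Here `all_ones` evaluates to 1 and `one_zero` to 0, since a single 0 input forces G, and therefore H, to 0.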
Slide 10: Advance of PLD Architectures
Altera
- 1980s, MAX 5000: 32-192 P-terms; 600-3,750 usable gates
- 1998/1999, APEX 20K: 51,840 logic elements (LUTs); 442,368 RAM bits; 3,456 P-term macrocells; 60,000-1.5M usable gates
Xilinx
- 1980s, XC 2000: 64-100 LUTs; 1,200-1,800 logic gates
- 1998/1999, Virtex: 58K-4M system gates; 1Mb distributed RAM; 832Kb embedded memory
Slide 11: PLD Synthesis Tends to Fall Behind ...
- Additional features and capabilities in a new architecture often place new requirements on synthesis tools
- Higher density and higher performance demand better scalability and more efficient optimization
- The devil is always in the software ...
  - Tool effort is often underestimated
  - Quick customization of an ASIC or existing PLD synthesis tool leads to considerably inferior results
  - Software is often the bottleneck of a new PLD product release ...
Slide 12: Challenges to PLD Synthesis
- Support for new PLD architectures
  - Hierarchical architectures
  - Heterogeneous architectures
- Support for high-performance and high-density PLD designs
  - Layout-driven synthesis
  - Incremental synthesis
  - IP-based synthesis
Slide 13: Outline
- Introduction
- Synthesis Challenges for New Architectures
- Synthesis Challenges for High Density and High Performance
- Concluding Remarks
Slide 14 (Synthesis Challenges for New Architectures): PLD Architecture Development
- Two important trends
  - Hierarchical architectures
  - Heterogeneous architectures
- Synthesis needs
Slide 15: PLD Architecture Development Trend: Hierarchical Architectures
- Basic idea
  - Group basic logic blocks into clusters
  - Fast local programmable interconnects inside clusters
  - May have multiple levels of hierarchy
- Benefits
  - Exploit the inherent locality of interconnections in most applications
  - Lead to improvements in both performance and density
Slide 16: Example Hierarchical Architectures
- Altera FLEX 10K
  - Each LAB has 8 LEs
  - Each LE has a 4-LUT and a programmable register
Slide 17: Two Types of Clusters
- Hard-wired connection based cluster (HCC)
  - Intra-cluster connection is formed by hard wires
  - e.g. CLB in XC4000
- Programmable interconnection based cluster (PIC)
  - Intra-cluster connection is formed by a local programmable interconnection array
  - e.g. LAB in FLEX 10K and APEX 20K
Slide 18: Existing Synthesis Results for HCC
- Traditional approach
  - Map into LUTs and then combine the LUTs to form HCCs in a heuristic post-processing step
- Recent advance [Cong & Hwang, FPGA'97]
  - Use Boolean matching techniques to completely characterize the set of functions that can be implemented in an HCC
  - Map a netlist directly into HCCs
Slide 19: Hard-Wired Connection Based Clusters (HCCs)
- Example: Xilinx XC 4000 CLB
  - Each CLB has two 4-LUTs connected to a 3-LUT
Slide 20: Example: Boolean Matching for XC4K CLB
- Characterization based on functional decomposition
  - f(X) = H(F(X1), G(X2))
  - f(X) = H(F(X1), G(X2), x)
  - f(X) = H(F(X1, x), G(X2), x)
  - f(X) = H(F(X1, x), G(X2, x), x)
- Conditions
  - F and G input sizes ≤ 4
- Result: matched all "difficult examples" (over 1,700) from Xilinx
  - Best known tool produced only about 70% match
[Figure: XC4K CLB in which LUTs F and G, together with input x, feed LUT H to produce f(X)]
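As an illustration of the first form only, f(X) = H(F(X1), G(X2)), the sketch below tests whether such a decomposition exists for a given split of the inputs. It uses the textbook decomposition-chart argument (single-output F and G exist exactly when the chart has at most two distinct rows and at most two distinct columns), which is an assumption of this sketch and not necessarily the procedure of [Cong & Hwang, FPGA'97]. The function name `decomposes` and its parameters are hypothetical.

```python
from itertools import product

def decomposes(f, n, x1, x2, max_lut_inputs=4):
    """Return True if f(x_0, ..., x_{n-1}) can be written as H(F(X1), G(X2)).

    f  : callable taking n bits (0/1) and returning 0/1
    x1 : input indices routed to F; x2 : input indices routed to G
    x1 and x2 must be disjoint and together cover all n inputs.
    """
    if len(x1) > max_lut_inputs or len(x2) > max_lut_inputs:
        return False  # F or G would not fit in a 4-LUT
    if sorted(list(x1) + list(x2)) != list(range(n)):
        raise ValueError("x1 and x2 must partition the inputs")

    def value(a, b):
        # Place the X1 assignment a and the X2 assignment b into one input vector.
        bits = [0] * n
        for i, v in zip(x1, a):
            bits[i] = v
        for i, v in zip(x2, b):
            bits[i] = v
        return f(*bits)

    a_all = list(product((0, 1), repeat=len(x1)))
    b_all = list(product((0, 1), repeat=len(x2)))
    rows = {tuple(value(a, b) for b in b_all) for a in a_all}
    cols = {tuple(value(a, b) for a in a_all) for b in b_all}
    # F labels each row with one bit, G labels each column with one bit.
    return len(rows) <= 2 and len(cols) <= 2

# Made-up test functions, not taken from the slides.
fits = decomposes(lambda a, b, c, d, e, g, h: (a & b & c & d) ^ (e | g | h),
                  7, [0, 1, 2, 3], [4, 5, 6])
does_not_fit = decomposes(lambda a, b, c, d: (a & c) | (b & d),
                          4, [0, 1], [2, 3])
```

The first function splits cleanly into an AND over X1 and an OR over X2, so `fits` is True; the second has four distinct chart rows for that split, so `does_not_fit` is False. The three forms with the shared input x would need the same test applied to each cofactor with respect to x.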
Slide 21: Example: Mapping to XC4K CLB
- Given a function f(0,1,2,3,4,5), where a = 1' + 3 and b = 1 + 3:
  f = 0'245b' + 0'245'b + 0'145b + 012'5'a + 0'2'4'5a + 025b + 0'2'5'a' + 045a' + 05'b'
- How many XC4K CLBs are needed to implement f(0,1,2,3,4,5)?
Slide 22: Example: Mapping to XC4K CLB (Cont'd)
Mapping           Packing     #CLBs  #Levels
Chortle-crf       simple      9      4
FlowMap           simple      8      3
FlowMap           functional  6      3
Boolean matching              1      1
[Figure: the Boolean matching result, a single CLB whose LUTs F, G, and H are fed by inputs 0 through 5]
Slide 23: Programmable Interconnection Based Cluster (PIC)
- Example: Altera APEX 20K
  - Each LAB has 10 LEs (LUT + FF) connected through a fully programmable matrix
Slide 24: Existing Synthesis Results for PIC
- Common approaches
  - Map into basic logic blocks and then group them into clusters under size and pin constraints
  - Recent progress on circuit clustering
    - Performance-driven clustering for combinational circuits [Lawler'69] [Yang & Wong, T-CAD'97]
    - Simultaneous clustering with retiming for sequential circuits [Pan, et al, T-CAD'98] [Cong, et al, DAC'99]
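The "group into clusters under size and pin constraints" step can be sketched with a deliberately naive greedy pass. This is only an illustration of the two constraints, not any of the cited algorithms; the netlist representation (a dict from net name to the set of blocks on that net) and the names `pin_count` and `greedy_cluster` are assumptions of this sketch.

```python
def pin_count(cluster, nets):
    """Number of nets that touch the cluster and also leave it."""
    return sum(1 for blocks in nets.values()
               if blocks & cluster and blocks - cluster)

def greedy_cluster(blocks, nets, max_size, max_pins):
    """Walk the blocks in the given order and keep filling the current
    cluster while both the size and the pin constraint still hold.
    A single block that already exceeds max_pins is still placed alone."""
    clusters = []
    current = set()
    for block in blocks:
        trial = current | {block}
        if len(trial) <= max_size and pin_count(trial, nets) <= max_pins:
            current = trial
        else:
            if current:
                clusters.append(current)
            current = {block}
    if current:
        clusters.append(current)
    return clusters

# Made-up netlist: a chain a - b - c - d.
nets = {"n1": {"a", "b"}, "n2": {"b", "c"}, "n3": {"c", "d"}}
result = greedy_cluster(["a", "b", "c", "d"], nets, max_size=2, max_pins=2)
```

With these inputs `result` is `[{"a", "b"}, {"c", "d"}]`: the size limit closes the first cluster after two blocks, and each cluster exposes one pin (net n2). A performance-driven method would additionally choose the grouping to keep critical paths inside clusters.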
Slide 25: Benefits of Considering Retiming during Clustering
- Proper clustering allows retiming to hide inter-cluster delays (e.g., assume gate_delay = 1, inter_cluster_delay = 2)
[Figure: two clusterings, A and B, with the same cut size and the same initial clock period Φ = 8; after retiming, one reaches Φ = 6 (retiming reduces delay) while the other stays at Φ = 8 (retiming cannot help)]
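The delay model in this example can be written out as a small calculation. The sketch below only computes the clock period for a fixed register placement (it does not perform retiming), and the paths are made up for illustration since the circuit in the figure is not recoverable here. The names `path_delay` and `clock_period` are hypothetical.

```python
def path_delay(clusters_along_path, gate_delay=1, inter_cluster_delay=2):
    """Delay of one register-to-register path, given the cluster of each
    gate in path order: every gate costs gate_delay and every hop between
    different clusters costs inter_cluster_delay."""
    crossings = sum(1 for a, b in zip(clusters_along_path,
                                      clusters_along_path[1:]) if a != b)
    return gate_delay * len(clusters_along_path) + inter_cluster_delay * crossings

def clock_period(paths, gate_delay=1, inter_cluster_delay=2):
    """Clock period = delay of the slowest register-to-register path."""
    return max(path_delay(p, gate_delay, inter_cluster_delay) for p in paths)

# Two made-up 4-gate paths under different clusterings.
one_crossing = path_delay(["A", "A", "B", "B"])   # 4 gates + 1 crossing
two_crossings = path_delay(["A", "B", "B", "C"])  # 4 gates + 2 crossings
period = clock_period([["A", "A", "B", "B"], ["A", "B", "B", "C"]])
```

Here `one_crossing` is 6, `two_crossings` is 8, and `period` is 8. Retiming changes which gates lie between consecutive registers, so whether it can shorten the slowest path depends on where the cluster boundaries fall, which is the point the slide makes.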
Slide 26: Major Challenge in Synthesis for Hierarchical Architectures
- Can we synthesize a design directly into a multi-level hierarchical architecture?
  - Most existing PLD synthesis algorithms transform a given design into a flat netlist of basic PLBs and then go through a separate clustering/partitioning step
  - Very few consider synthesizing directly for hierarchical architectures
Slide 27: PLD Architecture Development Trend: Heterogeneous Architectures
- Three types of heterogeneous architectures
  - Type 1: Multiple sizes and/or configurations of the same type of logic blocks
    - e.g. ORCA 2C, VF1, XC4000
  - Type 2: Multiple types of logic blocks
    - LUTs, macrocells, and MUXes
    - e.g. APEX 20K
  - Type 3: Different kinds of resources on the same chip
    - Programmable logic blocks
    - Embedded memory blocks (EMBs)
    - Embedded processors
Slide 28: Type 1 Heterogeneous Architectures
- Example: Xilinx XC 4000
  - Each CLB can implement two 4-LUTs or one 5-LUT
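One way to see why two 4-LUTs plus a 3-LUT can act as a single 5-LUT is Shannon expansion on the fifth input: f = x4' · f(x4 = 0) + x4 · f(x4 = 1). The sketch below assumes that both 4-LUTs see the same four inputs and that the 3-LUT is programmed as a 2:1 multiplexer; the exact CLB wiring is not given in the slides, and the names `split_5lut`, `eval_as_clb`, and `matches` are hypothetical.

```python
from itertools import product

def lut(table, inputs):
    """Bit i of `table` is the output for input index i (inputs[0] = LSB)."""
    index = 0
    for pos, bit in enumerate(inputs):
        index |= (bit & 1) << pos
    return (table >> index) & 1

# 3-LUT over (f, g, sel): output f when sel == 0, g when sel == 1.
MUX = 0xCA

def split_5lut(table32):
    """Split a 32-bit truth table into the x4 = 0 and x4 = 1 cofactors."""
    return table32 & 0xFFFF, (table32 >> 16) & 0xFFFF

def eval_as_clb(table32, x):
    """Evaluate a 5-input function using two 4-LUTs and a mux 3-LUT."""
    f_table, g_table = split_5lut(table32)
    f = lut(f_table, x[:4])  # cofactor with x4 = 0
    g = lut(g_table, x[:4])  # cofactor with x4 = 1
    return lut(MUX, (f, g, x[4]))

def matches(table32):
    """Check the CLB-style evaluation against the flat 5-LUT on all 32 inputs."""
    return all(lut(table32, x) == eval_as_clb(table32, x)
               for x in product((0, 1), repeat=5))

ok = matches(0x96696996) and matches(0xDEADBEEF)
```

Because the split is an exact identity, `ok` is True for any 32-bit table, which is why the same block can be counted either as two independent 4-LUTs or as one 5-LUT.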