BabyBEE: Defining the Silicon Circuit Board
CASPER Workshop, August 4, 2008
Bob Conn, CTO
siXis
• Spin-out of Research Triangle Institute
• Started July 3, 2008
• $5.2M
• MiniBEE Alpha product already delivered
• BabyBEE available early 2009
• Derivative of BEE2
BabyBEE
• Started with BEE2 at BWRC
• BabyBEE
  • Smaller
  • Faster
  • Cooler
  • Less expensive
  • Lower operating costs
• Xilinx V5
• Futures with CPUs, other FPGAs, RF, etc.
SiCB vs. Multi-Chip Module vs. PCB

                                            SiCB     MCM*     PCB 2007*
Board-to-board interconnect density         1600%    625%     100%
Layers for interconnect to exit BGA         25%      63%      100%
Pad pitch                                   24%      40%      100%
Trace pitch (width plus space)              16%      32%      100%
Via diameter (through substrate)            60%      90%      100%
Microvia diameter (not through substrate)   13%      100%     100%
Maximum substrate size                      10%      20%      100%
Functionality per unit area                 10x      10x      1
Cost per unit functionality                 41%      54%      100%
Reliability                                 Better   Worse    Average

*The Information Network, Internal RTI analysis
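The dimensional rows in this comparison are percentages of a 2007 PCB baseline, so they can also be read as improvement factors by inverting the ratio. The sketch below is only an illustration of that reading; the dictionary and the function name `improvement_factor` are assumptions made for the example, not part of the original analysis.

```python
# Relative feature sizes from the comparison slide (PCB 2007 = 100%).
# For pitch and diameter rows smaller is better, so factor = 100 / percent.
SMALLER_IS_BETTER = {
    "pad pitch": {"SiCB": 24, "MCM": 40},
    "trace pitch": {"SiCB": 16, "MCM": 32},
    "through-substrate via diameter": {"SiCB": 60, "MCM": 90},
    "microvia diameter": {"SiCB": 13, "MCM": 100},
}


def improvement_factor(percent_of_pcb: float) -> float:
    """Return how many times smaller a feature is than the PCB baseline."""
    return 100.0 / percent_of_pcb


for row, values in SMALLER_IS_BETTER.items():
    factors = {tech: round(improvement_factor(p), 2) for tech, p in values.items()}
    print(f"{row}: {factors}")
```

Read this way, a 16% trace pitch is about 6.25 times finer than the PCB baseline.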
MiniBEE
• FR-4 based version of BabyBEE
• Architectural verification
• Early adopter software development platform
[Figure: MiniBEE board, 5” x 10”, 22 layer FR-4, V5LX220]
BabyBEE
[Figure: BabyBEE assembly with labeled parts: FPGA die, cooling, power, SiCB]
BabyBEE = SiCB technology validating application
[Figure: BabyBEE module, 32 FPGAs, 256 memory chips, 2” x 3 ½” x 3 ½”]
High performance reconfigurable computing
Excellent application of the SiCB platform
• Designs are scalable
• FPGAs available as known (mostly) good die today
• Rapidly developing market
• High value placed on low power and small size
BabyBEE Architecture: A Typical Small System
[Figure: block diagram of a Compute Board and an I/O Board. Compute Board: V5LX220 FPGAs with 1GB memory blocks and ports labeled A, B, C, D. I/O Board: V5LX110T FPGAs labeled W, X, Y, Z, each with Ethernet, CX4, and 16 channels. Legend: Shared Bus (black), Local Bus, Torus Bus (red).]
BabyBEE Advantages
• Lower parasitic capacitances (and inductance)
• Less need for bypassing
• Vertical interconnect density 12:1
• Horizontal wire density 12:1
• Wires are RC, not LC
• Termination resistors almost eliminated
• Easier to design
• FR4 already requires a microwave engineer
• Increased I/O density on FPGAs because of smaller I/O drivers
• Limited by package constraints today
• I/O buffers can be smaller 6pF to 3pF
• New chip designs
• Alignment
• Approximately 1mil for FR4 to 1 micron for SiCB
• Stacking memory – the wiring problem is minimum
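Two of these bullets can be made concrete with simple arithmetic. The sketch below converts the alignment figures to a common unit (1 mil is 25.4 micrometres) and applies the standard CMOS switching-power relation P = a·C·V²·f to the 6pF and 3pF figures. The supply voltage, toggle rate, and function names are assumptions chosen only for illustration; the slide gives none of them, and it is also an assumption that the pF figures describe switched capacitance.

```python
MICRONS_PER_MIL = 25.4  # 1 mil = 0.001 inch = 25.4 micrometres


def alignment_ratio(fr4_mil: float = 1.0, sicb_um: float = 1.0) -> float:
    """Ratio of FR4 alignment tolerance to SiCB alignment tolerance."""
    return fr4_mil * MICRONS_PER_MIL / sicb_um


def switching_power_w(c_farads: float, v_volts: float, f_hz: float,
                      activity: float = 1.0) -> float:
    """Dynamic switching power, P = activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hz


# Hypothetical operating point, NOT from the slides.
V_IO = 1.8       # volts (assumed)
F_TOGGLE = 200e6  # hertz (assumed)

p_6pf = switching_power_w(6e-12, V_IO, F_TOGGLE)
p_3pf = switching_power_w(3e-12, V_IO, F_TOGGLE)

print(f"alignment improvement: about {alignment_ratio():.1f}x")
print(f"power ratio 3pF vs 6pF: {p_3pf / p_6pf:.2f}")
```

Whatever operating point is assumed, halving the switched capacitance halves the dynamic power, since voltage and frequency cancel in the ratio.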
Desktop Brick
• 4” x 6” x 8”
• 0.35 TFlops (dp)
• 30 TOps (16 bit)
• 16 FPGAs
• 16 GB memory
• 50Gb/s I/O
• < $100k
The Cube: Just for Fun
• 8x8x8
• 4096 BabyBEE Boards
• 16,384 FPGAs
• 18 TB memory
• 500kW
• 360 Tflops (dp)
• 37 petaOps (16 bit int)
• #1 Supercomputer
• < $100M
• 27" x 34" x 28"
• I/O latency = 12ns
• Maximum latency across The Cube < 20ns
[Figure: stacked layers, each a row of "8x BabyBEE" groups with I/O latency = 12ns, separated by copper power distribution; dimension labels 884mm and 720mm]
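The board and FPGA counts for The Cube follow from simple multiplication. The sketch below assumes the figure is read as 8 BabyBEE boards at each position of the 8x8x8 arrangement, takes 4 FPGAs per board from the products slide, and scales the Desktop Brick's 0.35 TFlops over 16 FPGAs linearly. The constant names are illustrative, and linear scaling is an assumption.

```python
POSITIONS = 8 * 8 * 8        # "8x8x8" arrangement
BOARDS_PER_POSITION = 8      # "8x BabyBEE" (assumed reading of the figure)
FPGAS_PER_BOARD = 4          # per-board count from the products slide

boards = POSITIONS * BOARDS_PER_POSITION
fpgas = boards * FPGAS_PER_BOARD

# Per-FPGA double-precision rate implied by the Desktop Brick slide
# (0.35 TFlops across 16 FPGAs), scaled linearly to the whole Cube.
tflops_per_fpga = 0.35 / 16
cube_tflops = fpgas * tflops_per_fpga

print(f"boards: {boards}")
print(f"FPGAs: {fpgas}")
print(f"estimated TFlops (dp): {cube_tflops:.1f}")
```

Under these assumptions the estimate comes to roughly 358 TFlops, in line with the slide's rounded 360 Tflops (dp).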
BabyBEE Products

              PCB Package      Brick              Cube
Size (mm)     80 x 100 x 10    100 x 100 x 50     1000 x 1000 x 1000
FPGAs         4                32                 16,384
Memory        4GB              32GB               16 TB
Power         150W +12V        1200W +48V         500kW +48V
GP I/O        560              0                  0
Serial I/O    0                40-80 channels     more if needed
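A quick way to compare the three packages is to normalize memory and power by FPGA count. The sketch below does that with the figures from the products slide; treating 16 TB as 16 x 1024 GB is an assumption, and the names `PRODUCTS` and `per_fpga` are illustrative only.

```python
# Figures from the products slide; 16 TB is taken as 16 * 1024 GB (assumed).
PRODUCTS = {
    "PCB Package": {"fpgas": 4, "memory_gb": 4, "power_w": 150},
    "Brick": {"fpgas": 32, "memory_gb": 32, "power_w": 1200},
    "Cube": {"fpgas": 16384, "memory_gb": 16 * 1024, "power_w": 500_000},
}


def per_fpga(product: dict) -> tuple[float, float]:
    """Return (memory in GB per FPGA, power in W per FPGA)."""
    return (product["memory_gb"] / product["fpgas"],
            product["power_w"] / product["fpgas"])


for name, spec in PRODUCTS.items():
    mem_gb, power_w = per_fpga(spec)
    print(f"{name}: {mem_gb:.1f} GB/FPGA, {power_w:.1f} W/FPGA")
```

All three packages work out to 1 GB of memory per FPGA, with a power budget between roughly 30 W and 37.5 W per FPGA.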
6 BabyBEE I/O Modules
• Size is connector limited
• I/O: Expect 5Gb/s per channel
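Combining the expected 5Gb/s per channel with the 40-80 serial channels listed for the Brick on the products slide gives a rough aggregate I/O range. This is a sketch of raw line rate only: protocol and encoding overhead are not modeled, and the function name is an assumption for the example.

```python
GBPS_PER_CHANNEL = 5  # "Expect 5Gb/s per channel"


def aggregate_gbps(channels: int, per_channel: float = GBPS_PER_CHANNEL) -> float:
    """Raw aggregate serial bandwidth in Gb/s (no protocol overhead)."""
    return channels * per_channel


low = aggregate_gbps(40)
high = aggregate_gbps(80)
print(f"aggregate serial I/O: {low} to {high} Gb/s")
```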
BEE2 Through BabyBEE
[Figure: progression of hardware from BEE2 to BabyBEE]
Core Technology and Intellectual Property
Large-area SiCBs attempted and failed 15 years ago
• Reliability an issue > 1”x1”
• Delamination/cracking occurred under routine thermal cycling
• Inadequate interconnect density
• Known good die issue
BEECo’s technology allows reliable 4”x5” SiCBs (min.)
• Higher interconnect density, fewer signal layers
• High aspect ratio through-silicon-via technology
• Mechanical reinforcement (rivets)
• Stress-relieving structures (spongy oxide and service loops)
Eight patents filed, 16 disclosures
[Figure: cross-sections. High aspect ratio vias: copper, low-k #1, SiO2 #1, silicon substrate, with 12 µ and 10 µ dimension labels. Spongy oxide: SiO2. Service loops: low-k #2, copper, low-k #1, SiO2, silicon substrate.]
BabyBEE vs. MiniBEE
[Figure: size comparison of MiniBEE (FR4 PCB) and BabyBEE (SiCB), each showing regions A and B; dimension labels 215mm, 100mm, 10mm, 83mm, 50mm]
• 0.7 TFlops (dp)
• 60 TOps (16bit)
• 300 Gb/s I/O
• 8GB memory
• 800W
MiniBEE I/O Board
[Figure: block diagram. FPGAs: V5LX110 and two V5FX130T. Memory: DDR2 64M x 32 (MEM1A, MEM2A, MEM3A, MEM1B, MEM2B, MEM3B, MEM1C, MEM2C) and RLDRAM 8M x 36. Links: Share A-D, Torus A-D, A2B, A2C, B2C, CX4. Configuration and control: JTAG, CFG/STAT A-D, CFG/STAT I/O A and B, boot SPI PROM, SYSACE, RTC. Peripherals: 10/100 Ethernet PHY with RJ-45, USB 2.0 (CY7C6801), RS232, debug header, LEDs, fan control, level shift. Clocks: 25, 125, 156, 200, 333 MHz.]