
Niagara (T1): A CMT Processor. Rao Shoaib, Solaris Core Technology


  1. Niagara (T1): A CMT Processor. Rao Shoaib, Solaris Core Technology group, rao.shoaib@sun.com

  2. Agenda: ● Why CMT Processors ● Highlights of Sun Niagara Processor ● Performance characteristics of T1 ● Need for Virtualization ● CMT & Virtualization ● Sun Virtualization Solutions ● HW and Software Network Virtualization. Sun Proprietary Information

  3. Case for CMT Processors

  4. Traditional processor behavior [Figure: timeline of a single thread alternating compute (C) and memory latency (M) phases, shown for a single scalar processor and for a processor optimized for ILP; the second timeline marks the "Time Saved".]

  5. Characteristics of Commercial Workloads ● High degree of thread-level parallelism (TLP) ● Large working sets result in poor locality of reference, leading to high cache miss rates ● Significant data sharing among threads results in coherence misses ● Low instruction-level parallelism (ILP) due to high cache miss rates, hard-to-predict branches, etc. ● Performance is bottlenecked by stalls on memory access

  6. Sun's Solution: Niagara, a Chip Multithreaded (CMT) Processor

  7. Niagara (T1) ● Uses CPU threads to exploit TLP – Memory and pipeline stall times are hidden by running multiple threads – Shared L2 cache allows efficient data sharing between threads ● Memory system is designed for high throughput – High-bandwidth interface to L2 cache for L1 misses – Highly associative L2 cache – High-bandwidth interface to DRAM

  8. Designed for Performance and Efficiency [Figure: block diagram of the chip. Eight cores (C1 to C8) connect through a crossbar (Xbar) to four L2$ banks and an FPU; four integrated on-chip memory controllers drive four DDR-2 SDRAM channels; a system interface buffer switch core connects to the bus. Callouts: simplicity means no wait latency; dedicated on-chip integrated memory controllers; integrated internal communications; clean-sheet design delivers highest performance and efficiency.]

  9. Niagara Specs ● Up to 32 threads on 8 cores ● Per-core L1$: 16 KB instruction, 8 KB data ● Shared L2$: 3 MB, 134 GB/s, 12-way associative ● Radically changed cache coherency processing ● 4 on-chip DDR2 memory controllers, 23 GB/s ● Up to 128 GB memory ● SSL support: 7X the RSA throughput of Xeon ● Requires about 70 watts ● Each thread requires just about 2.0 watts ● No recompilation required

  10. Thread Selection Policy ● The CPU switches between available threads every cycle, giving priority to the least recently executed thread ● Threads become unavailable due to: – Long-latency ops: loads, branches, mul, div – Pipeline stalls such as cache misses, traps, and resource conflicts ● Loads are speculated as cache hits, and the thread is switched in with lower priority
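
The selection policy on slide 10 can be sketched as a small simulation. This is an illustrative model only, not the T1 hardware logic: the names `pick_thread` and `simulate`, the four-thread core, and the fixed stall lengths are assumptions made for the example, and the lower-priority handling of speculated loads is not modeled.

```python
def pick_thread(last_run, stalled_until, cycle):
    """Return the available thread that executed least recently, or None."""
    ready = [t for t in range(len(last_run)) if stalled_until[t] <= cycle]
    if not ready:
        return None
    return min(ready, key=lambda t: last_run[t])


def simulate(num_threads, cycles, stalls):
    """Run the policy for a number of cycles.

    stalls is a hypothetical input mapping (thread, cycle) to the number
    of cycles the thread is unavailable after issuing in that cycle.
    """
    last_run = [-1] * num_threads      # cycle in which each thread last issued
    stalled_until = [0] * num_threads  # first cycle the thread is available again
    trace = []
    for cycle in range(cycles):
        t = pick_thread(last_run, stalled_until, cycle)
        trace.append(t)
        if t is None:
            continue  # every thread is stalled: the pipeline idles this cycle
        last_run[t] = cycle
        stall = stalls.get((t, cycle), 0)
        if stall:
            stalled_until[t] = cycle + 1 + stall
    return trace


# With no stalls the four threads simply rotate.
print(simulate(4, 8, {}))
# Thread 1 stalls for 4 cycles after issuing in cycle 1; the others fill its slots.
print(simulate(4, 8, {(1, 1): 4}))
```

In the second run the pipeline never idles: the three remaining threads absorb the cycles that thread 1 cannot use, which is the behavior the slide describes.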

  11. Multithreaded Process on Niagara [Figure: timelines for eight pipelines (Pipe0 to Pipe7), each interleaving four threads (Thread 1 to Thread 4) that alternate between compute and memory latency phases.] A larger number of memory references outstanding from overlapping h/w threads leads to higher throughput
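
A back-of-the-envelope model shows why overlapping threads raise throughput. It is an idealization and not taken from the slides: it assumes every thread alternates a fixed number of compute cycles with a fixed number of memory-latency cycles, and that switching threads costs nothing. The function name and the cycle counts are hypothetical.

```python
def pipeline_utilization(threads, compute_cycles, memory_cycles):
    """Fraction of cycles in which one pipeline does useful work.

    Each thread keeps the pipeline busy for compute_cycles out of every
    (compute_cycles + memory_cycles); interleaving N threads can fill at
    most 100% of the pipeline.
    """
    per_thread = compute_cycles / (compute_cycles + memory_cycles)
    return min(1.0, threads * per_thread)


# One thread that computes for 1 cycle and waits 3 cycles on memory
# leaves the pipeline idle 75% of the time; four such threads fill it.
for n in (1, 2, 4):
    print(n, pipeline_utilization(n, compute_cycles=1, memory_cycles=3))
```

Under these assumptions, four threads per pipeline hide the memory latency completely only when the latency is at most three times the compute time; with longer stalls the pipeline still idles part of the time.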

  12. SWaP (Space, Watts and Performance) ● SWaP Rating = Performance / (Space × Watts) ● Sun Fire T2000: Performance 19,000 users (1), Space 2 RU, Watts 312, SWaP Rating = 30.4 ● (1) Lotus R6iNotes Sun Confidential: Sun Employees and Authorized Partners Only
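
The SWaP rating on slide 12 is plain arithmetic, and the Sun Fire T2000 figures can be checked with a few lines. The function name `swap_rating` is an assumption made for this sketch; the inputs are the numbers quoted on the slide.

```python
def swap_rating(performance, space_ru, watts):
    """SWaP rating = performance / (space in rack units * power in watts)."""
    return performance / (space_ru * watts)


# Sun Fire T2000 figures from the slide: 19,000 users, 2 RU, 312 W.
rating = swap_rating(19_000, 2, 312)
print(round(rating, 1))  # 30.4
```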

  13. Sun Fire T1000 Crushes Xeon and p5+ (SPECjbb2005) ● Sun Fire T1000 vs. Dell SC1425: Performance 2.1X, Power usage 1/2, Space same, SWaP 4.4X ● Sun Fire T1000 vs. IBM p5+ 520: Performance 1.6X, Power usage 1/2, Space 1/4, SWaP 14X

  14. Niagara-2 (T2): True System on a Chip ● Better performance than Niagara-1 ● Up to 8 cores ● Up to 64 threads per CPU ● Same power envelope as T1 ● On-chip NICs ● And much more that I cannot state

  15. Performance Characteristics of T1

  16. Positive Characteristics ● If a strand is stalled, its cycles can be used by other threads ● Multiple threads running the same application benefit from sharing text and data in the L2 cache ● These characteristics make CMT ideal for throughput computing

  17. Not So Positive Characteristics ● If one thread is thrashing the L1 instruction cache, data cache, or TLBs on a core, it can adversely affect the other threads on that core ● If all threads run on the same core, each gets only one-quarter of that core's CPU time ● So CMT is not ideal for real-time applications

  18. Scaling issues to be aware of ● Hot locks are the most common reason applications fail to scale on CMT processors ● Tune critical sections ● Apply more threads, as CMT is a thread-rich environment
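
One common way to cool a hot lock is lock striping: the shared state is split into several parts, each guarded by its own lock, so that threads contend only when they land on the same part. The slides do not name a specific technique, so the sketch below is an assumption chosen for illustration. `StripedCounter` and `run` are hypothetical names, and in CPython the example shows the structure of the fix, not a measured speedup.

```python
import threading


class StripedCounter:
    """Counter split into stripes, each protected by its own lock."""

    def __init__(self, stripes=8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [0] * stripes

    def increment(self, key):
        i = hash(key) % len(self._locks)
        with self._locks[i]:  # short critical section touching one stripe only
            self._counts[i] += 1

    def total(self):
        # Acquire every stripe lock in a fixed order for a consistent snapshot.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in self._locks:
                lock.release()


def run(workers=32, increments=1000):
    """Start one thread per worker and return the final count."""
    counter = StripedCounter()

    def work(worker_id):
        for _ in range(increments):
            counter.increment(worker_id)

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()


print(run())  # 32 workers x 1000 increments
```

The design choice is the trade-off in `total()`: updates stay cheap and mostly uncontended, while the rare full read pays for taking every lock.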

  19. Server Virtualization

  20. Benefits of Virtualization ● Virtualization is the masking and sharing of server resources ● Results in: – Server consolidation – Higher server utilization – Increased operational efficiency – Improved manageability

  21. CMT and Virtualization ● CMT provides hooks for server virtualization ● Each strand can be a virtual CPU ● Niagara-2 also provides support for network virtualization

  22. Solaris Virtualization Solutions ● Containers (comparable to BSD Jails) ● Logical Domains (individual OS instance per domain) ● Xen

  23. Logical Domains + Zones ● Partitioning capability: create virtual machines, each with a sub-set of resources ● Protection and isolation using a HW+firmware combination [Figure: three logical domains (LDom 1 and LDom 2 running Solaris 10, LDom 3 running Solaris 11), each hosting apps and zones, on top of a hypervisor and hardware with shared CPU, memory, and I/O.]

  24. Network Virtualization

  25. HW-Based Network Virtualization ● Niagara-2 (T2) has on-chip network interfaces ● Supports network virtualization/partitioning – Multiple partitions can co-exist within a port – Only the cable, MAC, and RX FIFOs are shared ● Virtualization/partitioning can be based on: – VLANs: up to 4K per port – MAC addresses: up to 16 per port – Service addresses (IP addresses, TCP/UDP ports): up to 256 per device ● Interrupts for a flow are sent to a particular CPU ● Full register sets are provided to control RX rings

  26. NIU RX Classification Model ● Incoming flows are classified at layer 2, 3, or 4 and put into an RX DMA channel according to the classification rules that matched the flow ● Solaris classification interface: m_l2_classify_add(), m_l2_classify_remove(), m_classify_add(), m_classify_remove() [Figure: incoming traffic passes through the MAC and the NIU flow classifier into multiple RX DMA channels.]
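
The classification model on slide 26 can be pictured with a toy classifier that maps packets to RX DMA channel numbers. This is a conceptual sketch only: it does not reproduce the Solaris `m_classify_add()` family, whose signatures are not given in the slides. The `Packet`, `Rule`, and `FlowClassifier` names, the first-match-wins ordering, and the default channel are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst_mac: str
    vlan: Optional[int]
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    """A field left as None acts as a wildcard."""
    channel: int
    dst_mac: Optional[str] = None   # layer 2
    vlan: Optional[int] = None      # layer 2
    dst_ip: Optional[str] = None    # layer 3
    dst_port: Optional[int] = None  # layer 4

    def matches(self, pkt):
        fields = ("dst_mac", "vlan", "dst_ip", "dst_port")
        return all(
            getattr(self, f) is None or getattr(self, f) == getattr(pkt, f)
            for f in fields
        )


class FlowClassifier:
    def __init__(self, default_channel=0):
        self.default_channel = default_channel
        self.rules = []

    def add(self, rule):
        self.rules.append(rule)

    def remove(self, rule):
        self.rules.remove(rule)

    def classify(self, pkt):
        for rule in self.rules:  # assumption: the first matching rule wins
            if rule.matches(pkt):
                return rule.channel
        return self.default_channel


clf = FlowClassifier()
clf.add(Rule(channel=2, dst_ip="10.0.0.5", dst_port=443))  # layer 3/4 rule
clf.add(Rule(channel=1, vlan=100))                         # layer 2 rule
print(clf.classify(Packet("00:11:22:33:44:55", 100, "10.0.0.5", 443)))
print(clf.classify(Packet("00:11:22:33:44:55", 100, "10.0.0.9", 80)))
print(clf.classify(Packet("00:11:22:33:44:55", None, "10.0.0.9", 80)))
```

Each channel number stands in for one RX DMA channel; in the hardware described on slide 25, the interrupt for that channel's flow would then be delivered to a particular CPU.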

  27. Software-Based Network Virtualization ● Not all NICs have HW support for virtualization ● Software creates virtual stacks over 1Gb and 10Gb NICs ● Virtual stacks are isolated from each other (for both resource and security purposes) ● Each virtual stack can be tuned separately

  28. Virtualized Networking [Figure: a NIC with a flow classifier feeds per-zone memory areas and virtual NICs (global zone, zone 1 through zone n), a layer common to all containers and virtual machines. Above it, zone-specific squeues and network stacks serve the global zone, zone 1, and zone 2; a container's stack is either exclusive or shared with the global zone.]

  29. Virtual Network with Xen [Figure: a NIC with a flow classifier feeds separate memory areas and virtual NICs for the host OS, guest OS 1, and guest OS 2. Each Solaris instance (host, guest 1, guest 2) has its own NIC virtualization engine and virtual squeues; traffic is either split into HTTP, HTTPS, and default squeues or carried as all traffic.]

  30. Future Work ● More work is needed to characterize different workloads on CMT processors and define best practices ● Open interfaces are needed to implement virtualization ● Network bandwidth/resource control support is needed in HW

  31. References ● Various Sun internal and external documents and publications on Niagara

