Challenging the Intel Xeon: ARM and OpenPower


  1. Challenging the Intel Xeon: ARM and OpenPower Now you really have to optimize

  2. Mighty Intel … • “Intel had a 99.2 percent market share in server chips” (IDC, 2015 – Quoted on InfoWorld) • “We started experimenting with SoCs two years ago. … didn't work well because the single-thread performance was too low, resulting in higher latency for our web platform” – Facebook Engineering

  3. …sits solid on the Throne • Best & most mature process technology in the world – 14 nm FinFET (tri-gate), 2014 • Power management the competition can only dream of • Richest software ecosystem

  4. Sizing Servers • Established in 2006 at Howest*, funded by the Flemish government since 2007 • 4 – 6 FTE (2007-2016) • 2 – 3 trainees • Specialized in independent performance optimization research • *Howest = Technical University in West Flanders (Kortrijk, Belgium)

  5. March 2016 IWT VIS TR 135096

  6. March 2012 • Java performance – +60% for Xeon E5 v1 – +19% for Xeon E5 v4 • OLTP – +51% for Xeon E5 v1 – +19% for Xeon E5 v4

  7. Recognize this one? • Moore’s law • “… were shrinking so fast that every year twice as many could fit onto a chip.” • 1975: “adjusted the pace to a doubling every two years”

  8. There is Moore • CPU processing power per dollar • DRAM & NAND: price per megabit – a 35% per year reduction in price • Also drives the Cloud / Internet • “Google will do anything to beat Moore’s law”
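
To put the 35% per year figure in perspective, the short sketch below compounds it over several years and derives the time it takes for the price to halve. It assumes the reduction is constant and compounds annually, which the slide does not state explicitly; the function names are illustrative only.

```python
import math

# From the slide: DRAM & NAND price per megabit drops by 35% per year.
ANNUAL_PRICE_REDUCTION = 0.35


def relative_price(years: float, reduction: float = ANNUAL_PRICE_REDUCTION) -> float:
    """Price after `years`, as a fraction of the starting price.

    Assumes a constant reduction that compounds once per year.
    """
    return (1.0 - reduction) ** years


def halving_time(reduction: float = ANNUAL_PRICE_REDUCTION) -> float:
    """Years needed for the price to fall to half of its starting value."""
    return math.log(0.5) / math.log(1.0 - reduction)


if __name__ == "__main__":
    print(f"Price halves roughly every {halving_time():.2f} years")
    for years in (1, 2, 5, 10):
        print(f"after {years:2d} years: {relative_price(years):.4f} of the starting price")
```

Under these assumptions the halving time comes out somewhat shorter than the two-year doubling pace quoted on the previous slide.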

  9. MOORE'S LAW IS “SILICON VALLEY'S BEATING HEART”

  10. The Thermal Wall: 2004

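  11. A few examples today

      | Group | Power density (W/cm²) | Product line | Cores | Clock (GHz) | Year | Name | Process | Min die size (mm²) | Power (W) |
      |---|---|---|---|---|---|---|---|---|---|
      | Historical ref points | 103 | Pentium 4 | 1 | 3,8 | 2004 | "Prescott" | 90 nm | 112 | 115 |
      | Historical ref points | 27 | Pentium 3 | 1 | 1 | 1999 | "Coppermine" | 180 nm | 106 | 29 |
      | Today | 75 | Core i7-6xxx | 4 | 4 | 2016 | "Skylake" | 14 nm | 122 | 91 |
      | Today | 57 | Xeon E5 | 8 | 3,4 | 2016 | "Broadwell" | 14 nm | 246 | 140 |
      | Today | 50 | Core i7 4xxx | 4 | 4 | 2014 | "Haswell" | 22 nm | 177 | 88 |
      | GPUs | 58 | GeForce 1000 | 3584 | 1,6 | 2016 | "Pascal" | 16 nm | 520 | 300 |
      | GPUs | 44 | GeForce 800 | 2880 | 0,9 | 2014 | "Kepler" | 28 nm | 571 | 250 |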
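
The power density column on this slide can be reproduced from the die size and power columns. The sketch below assumes die size is given in mm² and power in watts, so that density comes out in W/cm²; the slide does not label its units, but the numbers match under that reading.

```python
# (product, minimum die size in mm^2, power in W), copied from the slide.
# Units are an assumption: the slide does not label them.
CHIPS = [
    ("Pentium 4 (Prescott)", 112, 115),
    ("Pentium 3 (Coppermine)", 106, 29),
    ("Core i7-6xxx", 122, 91),
    ("Xeon E5 (Broadwell)", 246, 140),
    ("Core i7 4xxx", 177, 88),
    ("GeForce 1000 (Pascal)", 520, 300),
    ("GeForce 800 (Kepler)", 571, 250),
]


def power_density_w_per_cm2(die_mm2: float, power_w: float) -> float:
    """Power density in W/cm^2 (1 cm^2 = 100 mm^2)."""
    return power_w / (die_mm2 / 100.0)


if __name__ == "__main__":
    for name, die_mm2, power_w in CHIPS:
        density = power_density_w_per_cm2(die_mm2, power_w)
        print(f"{name:24s} {density:6.1f} W/cm^2")
```

Seen this way, the 2004 Pentium 4 still has the highest power density of the list, which is the point of the thermal wall slide that precedes it.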

  12. A bumpy road • 90 nm (2004): strained silicon (35% faster switching) • 45 nm (2008): “high-k dielectric” – reduced leakage • 22 nm (2012): “tri-gate” (reduces both switching and leakage power) – Research started in 2002! • THE WALL: the photolithography process uses light with a 193 nanometre wavelength – next step: EUV (13,5 nm)

  13. 2013 • Still optimistic • Intel, AMD, TSMC, GlobalFoundries, and IBM => “Moore’s Law Roadmap”

  14. 2016 • 10 nm Postponed to late 2017 • 7 nm: Big Question mark! • NO more Silicon, but Indium Gallium Arsenide (InGaAs) at 7 nm • Nanotubes? Graphene?

  15. • 4% loss per generation!

  16. Problem: big data gets brains • Data gets too complex for humans to analyze

  17. And Now? • Field Programmable Gate Array (FPGA) • ASIC (Application-Specific IC) • Graphics Processing Unit (GPU) • MIC (Many Integrated Core)

  18. IWT VIS TR 135096

  19. The market has changed too EVOLVING MARKET, NEW PLAYERS

  20. Total Market: something has changed

  21. Cavium Thunder-X • First 64-bit ARM server vs “mid-range” Xeon E5 • 48 “simple 2 IPC” cores @ 2 GHz @ 120W – Single-thread performance is 3-5x lower • 28 nm technology • Gigabyte servers

  22. Software ecosystem • No Java Native Access libraries • Spark crashes with machine language message • MySQL, LAMP, most Java applications work

  23. Performance / watt

  24. Conclusion ARMv8 (64) • Niche-oriented Cavium Thunder-X • Future chips of Qualcomm, Cavium (maybe Avago/Broadcom) • AMD & AppliedMicro not competitive (yet?) • A few big customers: – PayPal (VPN, firewall, some web services) – Already conquering the Chinese market (HiSilicon, Huawei) • Fragmented market • Still immature ecosystem: – JNA & ElasticSearch, Spark

  25. OPENPOWER

  26. POWER8 disadvantages • Very power hungry: 10 cores @ 190 W TDP + Mem buffers (60-80W) vs 22 cores @ 145W Xeon • JNA not supported • Some software still a bit unoptimized (MySQL)
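
The power comparison above can be normalised per core and per hardware thread. The sketch below uses the TDP and core counts from this slide and the 8 threads per POWER8 core from the next one; the 2 threads per Xeon core (Hyper-Threading) is an assumption, and the result is a power budget only, not a performance-per-watt measurement.

```python
def watts_per_unit(total_watts: float, units: int) -> float:
    """Power budget divided evenly over cores or hardware threads."""
    return total_watts / units


# POWER8: 10 cores, 190 W TDP plus 60-80 W for memory buffers (from the slide).
POWER8_CORES = 10
POWER8_THREADS_PER_CORE = 8          # from the "most complex core" slide
POWER8_WATTS_LOW = 190 + 60
POWER8_WATTS_HIGH = 190 + 80

# Xeon: 22 cores at 145 W (from the slide).
XEON_CORES = 22
XEON_THREADS_PER_CORE = 2            # assumption: Hyper-Threading enabled
XEON_WATTS = 145

if __name__ == "__main__":
    p8_threads = POWER8_CORES * POWER8_THREADS_PER_CORE
    xeon_threads = XEON_CORES * XEON_THREADS_PER_CORE

    print("W per core:")
    print(f"  POWER8: {watts_per_unit(POWER8_WATTS_LOW, POWER8_CORES):.1f}"
          f" - {watts_per_unit(POWER8_WATTS_HIGH, POWER8_CORES):.1f}")
    print(f"  Xeon:   {watts_per_unit(XEON_WATTS, XEON_CORES):.1f}")

    print("W per hardware thread:")
    print(f"  POWER8: {watts_per_unit(POWER8_WATTS_LOW, p8_threads):.2f}"
          f" - {watts_per_unit(POWER8_WATTS_HIGH, p8_threads):.2f}")
    print(f"  Xeon:   {watts_per_unit(XEON_WATTS, xeon_threads):.2f}")
```

Per core the POWER8 budget is roughly four times that of the Xeon, while per hardware thread the two end up in the same range.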

  27. When OpenPOWER makes sense • Based upon most complex core on the market (8 threads, 8 IPC, 3.5+ GHz) • (Some) Pricing competitive with HP/Dell • 32 DIMM slots per CPU (Intel: 12) • Open from firmware to software • Google & Rackspace have a new OpenPOWER server • Some software runs as fast as on the best Xeons (MongoDB, PostgreSQL) • Software ecosystem has grown fast …

  28. OpenPower Ecosystem

  29. IBM: first integrator of NVLink

  30. “Deep Learning” P100

  31. Page Migration Engine & POWER8 with NVLink: Barriers to Entry Removed • Far easier to create new applications on Tesla P100 • NVIDIA Page Migration Engine ensures unified memory space – Unified memory: address space spans CPU and GPU, 1TB+ – Hardware managed transfers: eliminates explicit data transfers – Testing program implementing these advantages – Close code-base to parallel CPU code • POWER8 with NVLink ensures speedy data throughput – 1TB memory space requires faster CPU:GPU data movement – Bus masks transfer times • [Diagram: barriers removed] Too large a memory space required; too complicated to move data; too much custom coding for GPU data movement; moves too much data; software UVM requires page faulting support, feature too limiting

  32. Percona MySQL 5.7

  33. Few Large or many small nodes? SPARK TESTING

  34. Our test • 300 GB of GZIP “Common Crawl” web archives • Body text extracted by “BoilerPipe” • Natural Language Processing (Stanford) • Aggregate: group by & sort entity counts • Generate recommendations with Alternating Least Squares
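
The aggregation step of this pipeline (group by entity, then sort by count) can be illustrated in miniature. The sketch below uses plain Python rather than Spark, and the sample entities are hypothetical; it only shows the shape of the computation, not the code used in the test.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def entity_counts(docs: Iterable[Iterable[str]]) -> List[Tuple[str, int]]:
    """Count how often each entity occurs across all documents.

    `docs` is an iterable of per-document entity lists, as an NLP step
    might produce them. Returns (entity, count) pairs sorted by
    descending count, with ties broken alphabetically.
    """
    counts: Counter = Counter()
    for entities in docs:
        counts.update(entities)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical entities extracted from three documents.
    sample = [
        ["Intel", "IBM"],
        ["Intel", "ARM"],
        ["IBM", "Intel"],
    ]
    for entity, count in entity_counts(sample):
        print(f"{entity}: {count}")
```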

  35. Realtime in-memory processing with Spark

  36. Spark Optimization • Number of virtual cores per executor (JVM): – 1 per 2 logical cores (Intel: 1, IBM: 4) • Number of executors = number of physical cores – 1 • spark.default.parallelism = roughly 1,5-2 tasks per executor • GCThreads = 1 per virtual core per executor • Speed-up = 10-20%
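
The sizing rules on this slide can be turned into a small helper that derives Spark settings from the hardware layout. This is one possible reading of the slide, not the configuration used in the test: the helper name `spark_settings` is hypothetical, and mapping "GCThreads" to the JVM flag `-XX:ParallelGCThreads` is an assumption.

```python
import math
from typing import Dict


def spark_settings(physical_cores: int,
                   threads_per_core: int,
                   tasks_per_executor: float = 2.0) -> Dict[str, str]:
    """Derive Spark settings from the slide's rules of thumb.

    - virtual cores per executor: 1 per 2 logical cores of a physical core
    - executors: number of physical cores minus 1
    - default parallelism: about 1.5-2 tasks per executor
    - GC threads: 1 per virtual core per executor
      (assumed to mean -XX:ParallelGCThreads)
    """
    executor_cores = max(1, threads_per_core // 2)
    executors = max(1, physical_cores - 1)
    parallelism = math.ceil(executors * tasks_per_executor)
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.instances": str(executors),
        "spark.default.parallelism": str(parallelism),
        "spark.executor.extraJavaOptions": f"-XX:ParallelGCThreads={executor_cores}",
    }


if __name__ == "__main__":
    # 22-core Xeon with 2 threads per core vs 10-core POWER8 with 8 threads per core.
    for label, cores, threads in (("Xeon", 22, 2), ("POWER8", 10, 8)):
        print(label)
        for key, value in spark_settings(cores, threads).items():
            print(f"  {key}={value}")
```

With these inputs the helper yields 1 virtual core per executor on the Xeon and 4 on POWER8, matching the "Intel: 1, IBM: 4" note on the slide. The resulting pairs could be passed as `--conf key=value` arguments to `spark-submit`.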

  37. • 20% gain per generation

  38. Conclusions so far • Moore’s law is dead: opportunity for niche players • OpenPower has some tangible advantages • Next generation of ARM servers should be watched • New innovations … – Combining streaming, sensor data & static data – Deep learning • … will require much more tuning & specialized chips

  39. Rate My Session!
