Cut Power Consumption by 5x Without Losing Performance
A big.LITTLE Software Strategy
Klaas van Gend
FAE, Trainer & Consultant
The mandatory Klaas-in-a-Plane picture LINUXCON EUROPE 2014 2 | October 10, 2014
Quad Core vs. Dual Core – Why isn’t it Twice as Fast?
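One common way to reason about the question on this slide is Amdahl's law. The slide itself does not name it, so treat the sketch below as an illustration under that assumption. The function name `amdahl_speedup` and the 60% parallel fraction are hypothetical values picked for the example, not measurements from this talk.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 60% of the run time can be spread over cores.
dual = amdahl_speedup(0.6, 2)
quad = amdahl_speedup(0.6, 4)
print(f"dual-core speedup: {dual:.2f}x")
print(f"quad-core speedup: {quad:.2f}x")
print(f"quad vs. dual:     {quad / dual:.2f}x")
```

Under these assumed numbers the serial 40% dominates, so doubling the core count yields far less than a doubling in speed.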
The GHz race
Why GHz++ costs power^2
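The slide carries only a title, so the sketch below fills in the usual textbook approximation for CMOS switching power, P ≈ a·C·V²·f, as an assumption rather than as the speaker's own derivation. The capacitance, voltages and frequencies are made-up operating points for illustration only.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Textbook approximation of CMOS switching power, in watts."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points: a higher clock usually needs a higher voltage.
slow = dynamic_power(capacitance_f=1e-9, voltage_v=0.9, frequency_hz=1.0e9)
fast = dynamic_power(capacitance_f=1e-9, voltage_v=1.2, frequency_hz=1.6e9)

print(f"1.0 GHz at 0.9 V: {slow:.3f} W")
print(f"1.6 GHz at 1.2 V: {fast:.3f} W")
print(f"clock ratio 1.6x, power ratio {fast / slow:.2f}x")
```

Because voltage enters squared and typically has to rise with the clock, power grows faster than frequency in this model.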
ARM big.LITTLE “OK, heavy work costs power. Let’s not waste power on light work…”
ARM playing it cool: big.LITTLE
Source: http://community.arm.com/groups/processors/blog/2013/06/18/ten-things-to-know-about-biglittle/
A7 vs A15
Cortex A7:
• Less silicon area
• Less work done per cycle
• Fewer cycles/second
• More power efficient
How to use big.LITTLE today
Source: http://community.arm.com/groups/processors/blog/2013/06/18/ten-things-to-know-about-biglittle/
Some available big.LITTLE hardware
• AllWinner A80
• Renesas automotive silicon
• Samsung Galaxy S4 for South-Korean market
• Samsung Galaxy S5 for South-Korean market
  Exynos5
• Hardkernel ODROID-XU board
  Built-in power measurement
Use Case: Chromium
Chrome / Chromium / ChromeShell
• Chromium: open source browser based on Blink (previously WebKit, originally KHTML)
• Google Chrome: closed-source browser based on Chromium
• ChromeShell: open source Chromium “browser” for Android
• Chrome for Android: closed-source browser for Android
Chromium workload visualized
[Figure labels: Loading, Parsing, Layouting/Rendering, Painting, JavaScript, Canvas]
HTML5 Canvas
“graphics device for JavaScript”
Parallelizing Canvas
“not as easy as it looks”
Canvas Parallelization - Performance Results

Benchmark (on quad-core)   Standard Blink   Parallelized Blink   Performance improvement
Flashcanvas perf           1.69 score       2.44 score           44%
Fc perf w/ alpha           1.04 score       1.52 score           50%
Guimark2 Vector            9.5 fps          13.3 fps             40%
Canvasmark ‘13             3475 score       4116 score           53%
Average improvement                                              47%

With parallelism you can improve performance of even the most complex applications!
Google Chrome
Google Chrome on Odroid-XU+E
Using Google’s Chrome (version 33 for Android):
• 2 cores active: 54% and 84%
• Power use A15+A7 cores: 2.374 Watts
• Test average: 9.44 fps
Our ChromeShell on Odroid-XU+E
Using our optimized ChromeShell:
• 3 A15 cores active: 59%, 63% and 38%
• Power use A15+A7 cores: 3.116 Watts
• Test average: around 14 fps
Canvas Parallelization - works even on ‘normal’ silicon like Qualcomm Snapdragon 800
Default Chrome: average 7.12 fps
“Our” ChromeShell: average 14.48 fps
• LG’s NEXUS 5 phone
• Quad core Qualcomm Snapdragon 800
• Phone heating up similarly in both cases
Canvas Parallelization - Power Consumption on “Flashcanvas perf”

Benchmark             Standard Blink   Parallelized Blink   Difference
                      on A15+GPU       on quad-A7
No optimization       29 fps           17 fps               -40%
Performance           29 fps           26 fps               -10%
Power consumption     2.2 W            0.4 W                550%
Performance / Watt    13               65                   490%

With parallelism and the right chip choices
• you can get 5x power savings
• without losing performance!
Comparing performance / watt:
Using Google’s Chrome (version 33 for Android):
• 2 cores active: 54% and 84%
• Power use A15+A7 cores: 2.374 Watts
• Test average: 9.44 fps
Using our optimized ChromeShell:
• 3 A7 cores active: 73%, 80% and 44%
• Power use A15+A7 cores: 0.472 Watts
• Test average: 10.04 fps
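The comparison on this slide can be reduced to frames per second per watt. The sketch below only redoes that arithmetic with the two measurements quoted above; the `Measurement` class and its field names are illustrative helpers, not part of any tool used in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    label: str
    fps: float
    watts: float

    @property
    def fps_per_watt(self) -> float:
        """Performance per watt for this run."""
        return self.fps / self.watts


# Figures quoted on the slide (Odroid-XU+E, A15+A7 core power).
chrome = Measurement("Google Chrome 33", fps=9.44, watts=2.374)
chromeshell = Measurement("Optimized ChromeShell on A7 cores", fps=10.04, watts=0.472)

power_ratio = chrome.watts / chromeshell.watts
efficiency_ratio = chromeshell.fps_per_watt / chrome.fps_per_watt

for m in (chrome, chromeshell):
    print(f"{m.label}: {m.fps_per_watt:.2f} fps/W")
print(f"power ratio:      {power_ratio:.2f}x")
print(f"efficiency ratio: {efficiency_ratio:.2f}x")
```

With these inputs the optimized run draws roughly a fifth of the power while delivering a slightly higher frame rate.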
1x A15 or 4x A7?
1x A15 < 4x A7!
[Figure annotations: less W; 20000 MIPS; more than twice the performance]
Back to big.LITTLE Making these results work outside a lab
State of big.LITTLE in Linux - 1
What’s in the kernel today?
State of big.LITTLE in Linux - 2
What else is relevant?
• IKS (In-Kernel Switcher)
  – First available in Linaro kernel trees
  – Merged in 3.11 kernel
• Qualcomm / LG / etc. power daemons
  – Throttle performance if cores overheat
  – Usually “secret”
• Not-in-mainline schedulers:
  – Linaro’s GTS (Global Task Scheduling),
  – a.k.a. HMP (Heterogeneous Multi-Processing)
• Kernel Summit 2014 “Energy-Aware Scheduling Workshop”
Feedback loop
[Figure label: Setpoint]
• We know when we want to have 4xA7 or 1xA15
• If we can tell the kernel, it can anticipate
  – instead of noticing an increase in workload
  – and by accident turning on the A15s
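The slide asks for a way to tell the kernel which cores the application wants. A crude user-space stand-in, and not the mechanism the talk proposes, is CPU affinity. The sketch below is Linux-only and assumes a system where both clusters are visible, with CPUs 0-3 as the LITTLE cores and 4-7 as the big ones; that numbering, and the names `cpus_for_workload` and `apply_hint`, are assumptions for illustration.

```python
import os

# Assumed CPU numbering; check the real topology of your SoC before relying on it.
LITTLE_CPUS = frozenset({0, 1, 2, 3})
BIG_CPUS = frozenset({4, 5, 6, 7})

# CPUs this process was allowed to use at startup, so later hints can widen again.
STARTUP_CPUS = frozenset(os.sched_getaffinity(0))


def cpus_for_workload(parallel_threads: int) -> frozenset:
    """Pick one big core for a serial burst, the LITTLE cluster for parallel work."""
    if parallel_threads < 1:
        raise ValueError("parallel_threads must be at least 1")
    if parallel_threads == 1:
        return frozenset({min(BIG_CPUS)})
    return LITTLE_CPUS


def apply_hint(parallel_threads: int) -> set:
    """Pin this process to the chosen CPUs; do nothing if none of them exist here."""
    usable = cpus_for_workload(parallel_threads) & STARTUP_CPUS
    if usable:
        os.sched_setaffinity(0, usable)
    return set(usable)
```

Calling `apply_hint(4)` before a parallel phase and `apply_hint(1)` before a short serial burst expresses the setpoint up front, rather than waiting for the governor to react to load.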
Where to go?
• Qualcomm MARE: “Feedback loop” (in user space)
  – Research project
  – Framework to aid parallelization
  – Should assist kernel in scheduling/cpufreq
• Deadline scheduler: “Feedback loop” (in kernel space)
  – Merged in Linux 3.14
  – Application sets SCHED_DEADLINE
  – Application sets scheduling attributes
    – Task repetition in microseconds
    – Task start within repetition
    – Task completion deadline within repetition
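As a rough illustration of the deadline-scheduler bullets, the sketch below builds the kernel's `struct sched_attr` with `ctypes` and passes it to the `sched_setattr` system call. The kernel expresses the request as runtime, deadline and period in nanoseconds; mapping the slide's bullets onto those three fields is an interpretation, and the helper names, the microsecond inputs and the syscall-number table (only three architectures) are assumptions of this example.

```python
import ctypes
import os
import platform

SCHED_DEADLINE = 6  # policy number from the Linux UAPI headers

# sched_setattr syscall numbers for a few architectures; extend as needed.
SYSCALL_SCHED_SETATTR = {"x86_64": 314, "aarch64": 274, "armv7l": 380}


class SchedAttr(ctypes.Structure):
    """Mirror of the first version of the kernel's struct sched_attr (48 bytes)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),
        ("sched_deadline", ctypes.c_uint64),
        ("sched_period", ctypes.c_uint64),
    ]


def deadline_attr(runtime_us: int, deadline_us: int, period_us: int) -> SchedAttr:
    """Build a SCHED_DEADLINE request, converting microseconds to nanoseconds."""
    if not 0 < runtime_us <= deadline_us <= period_us:
        raise ValueError("need 0 < runtime <= deadline <= period")
    return SchedAttr(
        size=ctypes.sizeof(SchedAttr),
        sched_policy=SCHED_DEADLINE,
        sched_runtime=runtime_us * 1000,
        sched_deadline=deadline_us * 1000,
        sched_period=period_us * 1000,
    )


def apply_to_current_thread(attr: SchedAttr) -> None:
    """Issue sched_setattr(0, &attr, 0); usually needs root or CAP_SYS_NICE."""
    nr = SYSCALL_SCHED_SETATTR[platform.machine()]  # KeyError on unlisted machines
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    result = libc.syscall(ctypes.c_long(nr), ctypes.c_int(0),
                          ctypes.byref(attr), ctypes.c_uint(0))
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# Hypothetical frame task: 2 ms of CPU time, due within 10 ms, every 16.667 ms.
frame_attr = deadline_attr(runtime_us=2_000, deadline_us=10_000, period_us=16_667)
```

Only `deadline_attr` runs at import time here; `apply_to_current_thread(frame_attr)` is left for the caller because it changes the scheduling policy of the calling thread and fails without the necessary privileges.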
Is parallelism going to stay?
Actually, is big.LITTLE going to stay???
• The GHz race has come to an end
  – Now also for ARM
• The speed of light limits “clock domain size”
• Thus many clock islands on a die
  – Multicore is just an “easy” way to improve performance
  – At the cost of the programmer
  – Who needs extra training
• ARM big.LITTLE
  – Is a mechanism to skip heavy power consumption
  – At the cost of more mm² silicon
  – Is it worth it???
My ideal ARM-based design:
big: 1x A57
LITTLE: 4x A53
Why is no-one designing this chip?
Conclusions
Conclusion
• big.LITTLE works
• IFF
  – Short bursts can be handled by one ‘big’ core
  – Heavier workloads are parallelizable and run on clusters of LITTLEs
  – APIs become available: programs must indicate what the workload will be
BTW: Chromium is parallelizable – we did it.
Vector Fabrics – the Company
• Founded February 2007
• Founding team
  – Strong in SoC design and multi-core software
  – Currently 15 FTE: 6 PhD, 7 MSc
• Protected technology
  – 3 patents filed in US & Europe
• Recognition
  – “Hot Startup” in EE Times Silicon 60, since 2011
  – Selected by Gartner as “Cool vendor in Embedded Systems & Software” 2013
  – Global Semiconductors Alliance award, March 2013
Contact Information
• Web: www.vectorfabrics.com
• Email: klaas@vectorfabrics.com
• Tel: +31 40 8200960
• Address: Vector Fabrics B.V., Vonderweg 22, 5616RM Eindhoven, The Netherlands
Thank You!
(drop your business card if you want the slides and the to-be-released whitepaper)
Klaas van Gend
FAE, Trainer & Consultant
klaas@vectorfabrics.com