Leveraging GPUs and Self-Tuning Systems on the Road to Exascale: a view from ARM

Oct 27, 2022

Leveraging GPUs and Self-Tuning Systems on the Road to Exascale: a view from ARM
Anton Lokhmotov, Media Processing Division
Joint GPGPU-5 / EXADAPT-2 Round Table, 3 March 2012, London

ARM
- A company licensing IP to all major semiconductor companies (a form of R&D outsourcing)
- Established in 1990 (spin-out of Acorn Computers)
- Headquartered in Cambridge, UK, with 27 offices in 13 countries and 2000+ employees
- ARM is the most widely used 32-bit CPU architecture
  - Dates back to the mid 1980s (Acorn RISC Machine)
  - Dominant in mobile devices (e.g. on average 3 processors per phone)
- Mali is the most widely licensed GPU architecture
  - Dates back to the early 2000s (developed by Falanx, Norway)
  - Media Processing Division established in 2006 (acquisition of Falanx)
  - Released products:
    - Mali-55 (OpenGL ES 1.1), Mali-200, Mali-400 (OpenGL ES 2.0)
    - Mali-T604 (OpenGL ES 2.0 + OpenCL 1.1)

Landscape of accelerator programming

Interface      | CUDA                   | OpenCL                         | DirectCompute      | RenderScript
Originator     | NVIDIA                 | Khronos (Apple)                | Microsoft          | Google
Year           | 2007                   | 2008                           | 2009               | 2011
Area           | HPC, desktop           | Desktop, mobile, embedded, HPC | Desktop            | Mobile
OS             | Windows, Linux, Mac OS | Windows, Linux, Mac OS (10.6+) | Windows (Vista+)   | Android (3.0+)
Devices        | GPUs (NVIDIA)          | CPUs, GPUs, custom             | GPUs (NVIDIA, AMD) | CPUs, GPUs, DSPs
Work unit      | Kernel                 | Kernel                         | Compute shader     | Compute script
Language       | CUDA C/C++             | OpenCL C                       | HLSL               | Script C
Distributed as | Source, PTX            | Source                         | Source, bytecode   | LLVM bitcode

- Portability is likely to remain an issue despite standardisation efforts.
- Performance portability is perhaps even more of an issue!

Performance tuning
- Why should we tune?
  - To improve selected performance metrics (e.g. energy efficiency)
- What should we tune for?
  - Architecture & implementation
  - System behaviour (resource contention, failures)
  - Data (probability distribution of characteristic inputs)
- When should we tune?
  - Compile-time, run-time, install-time, idle-time…
- How should we tune?
  - By automatically synthesising software based on extracted features
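
The tuning questions on this slide all rest on empirical feedback: run candidate variants on characteristic inputs and keep whichever measures best. A minimal Python sketch of that search loop follows, assuming a toy tunable kernel whose only parameter is a block size. `blocked_sum`, `autotune` and the candidate values are hypothetical illustrations, not code from the talk; a real GPU tuner would time device kernels (e.g. different OpenCL work-group sizes) rather than a Python loop, and would typically optimise a chosen metric such as energy rather than only wall-clock time.

```python
import time
from typing import Callable, Dict, Iterable, Sequence, Tuple


def blocked_sum(data: Sequence[float], block: int) -> float:
    """Hypothetical tunable kernel: sums `data` in chunks of `block` elements.

    The block size only changes how the work is partitioned; the result is the
    same up to floating-point rounding.
    """
    total = 0.0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(
    kernel: Callable[[Sequence[float], int], float],
    data: Sequence[float],
    candidates: Iterable[int],
    repeats: int = 3,
) -> Tuple[int, Dict[int, float]]:
    """Time `kernel` for each candidate parameter and return the fastest.

    Each candidate is run `repeats` times and its minimum wall-clock time is
    kept, which damps noise from other processes competing for the machine.
    """
    timings: Dict[int, float] = {}
    for param in candidates:
        fastest = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            fastest = min(fastest, time.perf_counter() - t0)
        timings[param] = fastest
    if not timings:
        raise ValueError("no candidate parameters to tune over")
    best_param = min(timings, key=lambda p: timings[p])
    return best_param, timings


if __name__ == "__main__":
    # A "characteristic input"; the winning block size may differ per machine.
    sample = [float(i % 7) for i in range(100_000)]
    choice, table = autotune(blocked_sum, sample, [16, 256, 4096, 65536])
    for block, seconds in sorted(table.items()):
        print(f"block={block:6d}  {seconds * 1e3:.3f} ms")
    print("fastest block size:", choice)
```

Running this loop at install-time or idle-time, rather than on every call, would keep the measurement cost off the critical path; the sketch deliberately leaves that scheduling decision out.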

Challenges
- Understanding and exploring the search space (cf. EXADAPT)
  - Extracting and interpreting features
  - Using collective knowledge
- Enabling synthesis of efficient software (cf. GPGPU)
  - Compilers have always had a limited ability to invent new algorithms
  - Algorithm representation must be clear and flexible enough to enable powerful transformations
  - What you generate may change over time (e.g. system and driver improvements)
- Marrying the above (cf. GPGPU+EXADAPT)
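
The first challenge on this slide, extracting features and using collective knowledge, can be illustrated as a lookup: describe each input by a few features and reuse the configuration that won on the most similar previously tuned input. The sketch below assumes a shared database of earlier tuning results and uses 1-nearest-neighbour matching, which is only one simple choice and not a method proposed in the talk. `Record`, `extract_features`, `predict_config`, the two features (log2 of problem size, density) and the configuration names are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """One entry of a hypothetical shared tuning database: the features of an
    input seen in an earlier tuning run and the configuration that won."""
    features: Tuple[float, ...]
    best_config: str


def extract_features(n_elements: int, n_nonzeros: int) -> Tuple[float, float]:
    """Hypothetical feature extractor: log2 of the problem size and its density."""
    density = n_nonzeros / n_elements if n_elements else 0.0
    return (math.log2(max(n_elements, 1)), density)


def predict_config(db: List[Record], features: Tuple[float, ...]) -> str:
    """1-nearest-neighbour lookup: reuse the configuration of the past input
    whose feature vector is closest (Euclidean distance) to the new one."""
    if not db:
        raise ValueError("empty tuning database")
    nearest = min(db, key=lambda r: math.dist(r.features, features))
    return nearest.best_config


if __name__ == "__main__":
    shared_db = [
        Record((10.0, 0.90), "dense-small"),
        Record((24.0, 0.90), "dense-large"),
        Record((24.0, 0.01), "sparse-large"),
    ]
    feats = extract_features(n_elements=2**23, n_nonzeros=2**23 // 100)
    print(predict_config(shared_db, feats))  # prints sparse-large
```

The features here are not normalised, so the log2 size term dominates the distance; a practical system would scale the features or replace the lookup with a learned model, and would have to refresh stale entries as systems and drivers improve.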

Final note
- Building exascale supercomputers is probably less important than making petascale embedded and mobile devices work together!
