OPTIMISING PARALLEL PROGRAMS ON XEON PHI Adrian Jackson

May 13, 2023

OPTIMISING PARALLEL PROGRAMS ON XEON PHI Adrian Jackson adrianj@epcc.ed.ac.uk @adrianjhpc
Specialised Optimisations
• Some optimisations are specific to Xeon Phi only
  • Offloading
  • MPI performance
  • Thread and process placement
  • Filesystems
Offload memory
• By default, memory for all data is allocated before the offload and deallocated when the offload completes
• Can use the offload_transfer directive to manage data explicitly
    #pragma offload_transfer target(mic:1) in(a)
    !dir$ offload_transfer target(mic:1) in(a)
• Can specify allocation and free status for device memory
    !dir$ offload target(mic:0) in(p : alloc_if(.true.) free_if(.false.))
    #pragma offload target(mic) out(p : alloc_if(1) free_if(0))
• Can be combined with the length attribute (length(0) would specify no transfer)
• Also possible to send data asynchronously using the signal and wait attributes/directives
• Can get information on data transfer
    export OFFLOAD_REPORT=2
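The alloc_if and free_if clauses can be combined so that a device buffer survives between two offloads. The following is a minimal sketch of that pattern, assuming a compiler that supports the Intel offload pragmas. The function name scale_twice and its parameters are hypothetical and not taken from the slides. A compiler without offload support ignores the pragmas and runs both loops on the host, which gives the same result.

```c
#include <stddef.h>

/* Hypothetical helper: multiply every element of p by factor twice,
 * keeping the device copy of p alive between the two offloads. */
void scale_twice(float *p, int n, float factor)
{
    /* First offload: allocate p on the device, copy it in, and keep the
     * device buffer afterwards (free_if(0)). Nothing is copied back. */
    #pragma offload target(mic:0) in(p : length(n) alloc_if(1) free_if(0))
    {
        for (int i = 0; i < n; i++) {
            p[i] *= factor;
        }
    }

    /* Second offload: reuse the existing device buffer (alloc_if(0)),
     * copy the result back to the host, then free the device memory. */
    #pragma offload target(mic:0) out(p : length(n) alloc_if(0) free_if(1))
    {
        for (int i = 0; i < n; i++) {
            p[i] *= factor;
        }
    }
}
```

Calling scale_twice(a, n, 2.0f) leaves every element of a multiplied by four. With export OFFLOAD_REPORT=2 set, the report should show one transfer of p to the device and one back, rather than a transfer in both directions for each offload.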
MPI fabric choice
• Intel MPI can choose different mechanisms for sending data:
  • shm: Shared memory
  • dapl: DAPL-capable network fabric (InfiniBand etc.)
  • ofa: OFA-capable network fabric (InfiniBand etc.)
  • tcp: TCP/IP-capable network fabrics (Ethernet etc.)
• Can specify which fabric to use:
    export I_MPI_FABRICS=shm:dapl
MPI fabric choice
• By default inside a single Phi:
  • If dapl is installed (or an InfiniBand card is installed)
    • shm:dapl
• May be beneficial in some circumstances to select a specific one
Thread placement
• The KMP_AFFINITY variable controls thread placement
    export KMP_AFFINITY=[attribute]
• Attribute can be:
  • compact, scatter, balanced, or explicit
• Can specify granularity as well
  • fine, thread, and core (default)
    export KMP_AFFINITY=compact,granularity=fine
    export KMP_AFFINITY=scatter
• Compute-bound application:
  • compact (2 or more threads per core)
• Bandwidth-bound application:
  • scatter (1 thread per core)
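The difference between compact and scatter can be illustrated with a simplified model of which core each OpenMP thread lands on. This is only a sketch under stated assumptions: it is not the runtime's actual placement code, it ignores the balanced and explicit settings, and the function names and parameters are hypothetical.

```c
/* Simplified model (an assumption, not the OpenMP runtime's algorithm).
 * compact: fill every hardware thread context of a core before moving on,
 * so consecutive thread ids share a core. */
int core_compact(int tid, int hw_threads_per_core)
{
    return tid / hw_threads_per_core;
}

/* scatter: deal threads out round-robin across cores, so each core gets
 * one thread until there are more threads than cores. */
int core_scatter(int tid, int ncores)
{
    return tid % ncores;
}
```

Taking 60 cores with 4 hardware threads each as illustrative figures, 8 threads under this model occupy cores 0 and 1 with compact (4 threads per core) and cores 0 to 7 with scatter (1 thread per core), which matches the compute-bound and bandwidth-bound recommendations above.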
File systems
• RAM file system
  • Stored in memory
  • Fastest
  • Volatile
• Local host drives
  • Mount disk from host on Xeon Phi
  • Persistent, not as fast as RAM file system
• Network storage
  • Gives access to larger data systems
  • Even slower
Conclusions
• Setup of hardware and software on Phi can make a performance difference
  • Communication hardware or libraries
  • Filesystems
• Placement of threads is critical for performance
• If offloading, looking at data persistence is a good optimisation option
