- DMAPP in context
- Basic features of the API
- Memory allocation and sample API calls
- Preliminary Gemini performance measurements
The Distributed Memory Application (DMAPP) API
- Supports features of the Gemini Network Interface
- Used by higher levels of the software stack:
  - PGAS compiler runtime
  - SHMEM library
- Balance between portability and hardware intimacy
- Intended to be used by system software developers; application developers should use SHMEM
[Software stack diagram: applications run over MPICH2, Cray SHMEM, and the PGAS compilers, which sit on user-level GNI and DMAPP at the PE level; kernel-level GNI sits in the Linux core; the Gemini Hardware Abstraction Layer sits between the kernel and the Gemini network processor hardware.]
Distributed memory model
- One-sided model for participating (SPMD) processes launched by the ALPS aprun command
- Each PE has local memory but has one-sided access (PUT/GET) to remote memory
- Remote memory has to be in an accessible memory segment
[Diagram: a put from a source buffer on one PE to a destination buffer on another PE.]
- The network supports direct remote get/put from user process to user process
- Mechanisms:
  - Block Transfer Engine (BTE)
  - Fast Memory Access (FMA), including Atomic Memory Operations (AMOs)
[Diagram: a process performing remote operations on another process's exported memory segments.]
- Remote source or destination must be in either the data or the symmetric-heap segment
- Symmetry means we can use local address information in the remote context
dmapp_init
- Sets up access to the data and symmetric heap (exports memory)
- Acts as a barrier
- You can set or read available resource limits

dmapp_get_jobinfo
- Returns a structure with useful information:
  - number of PEs
  - index of this PE
  - pointers to the data and symmetric-heap segments required in other calls
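As a concrete starting point, here is a minimal initialization sketch in C. The attribute-struct usage and the jobinfo field names (pe, npes, data_seg, sheap_seg) are assumptions based on the Cray DMAPP man pages and should be checked against dmapp.h.

    #include <stdio.h>
    #include <dmapp.h>

    int main(void)
    {
        dmapp_rma_attrs_t requested = {0}, actual;
        dmapp_jobinfo_t   job;

        /* Initialize DMAPP: exports the data and symmetric-heap
         * segments and acts as a barrier across the PEs. */
        if (dmapp_init(&requested, &actual) != DMAPP_RC_SUCCESS)
            return 1;

        /* Query the job layout needed by later calls. */
        dmapp_get_jobinfo(&job);
        printf("PE %d of %d\n", (int)job.pe, (int)job.npes);

        dmapp_finalize();
        return 0;
    }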
dmapp_put(*target_addr, *target_seg, target_pe, source_addr, nelems, type)
- Remote locations are defined by: address, segment, PE
- This is a blocking operation
- type can be DMAPP_{BYTE,DW,QW,DQW} for 1, 4, 8 and 16 bytes
- There is an analogous get call
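A hedged sketch of the blocking put: each PE writes one double into a static (and therefore exported) buffer on its right-hand neighbour. It reuses the job structure from the initialization sketch; the data_seg field name is an assumption.

    /* Static data lives in the exported data segment, so every PE can
     * address it remotely at the same virtual address. */
    static double remote_buf;

    dmapp_pe_t right = (job.pe + 1) % job.npes;
    double     val   = 3.14;

    /* Blocking put of one 8-byte (QW) element to the neighbour. */
    dmapp_put(&remote_buf, &job.data_seg, right, &val, 1, DMAPP_QW);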
- Blocking (no suffix): dmapp_put, dmapp_get
- Non-blocking explicit (_nb suffix): dmapp_put_nb(…, syncid)
- Non-blocking implicit (_nbi suffix): no handle to test for completion
- Synchronization (memory completion/visibility):
  - can wait on a specific syncid
  - can wait for all implicit operations to complete
[Diagram: iput gathering strided local data into remote data.]
Strided calls
- dmapp_iput …, dmapp_iget …
- Additional arguments define the source and destination strides in terms of elements
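A sketch of a strided put, assuming the argument order of the dmapp_iput man page (target stride, then source stride, in elements, after the usual put arguments); verify against dmapp.h before relying on it.

    static double remote_col[4];    /* contiguous remote target */
    double local[4][64];            /* source: one column, stride 64 */

    /* Assumed signature (check dmapp.h):
     *   dmapp_iput(target, tseg, pe, source, tstride, sstride, nelems, type)
     * Copies local[0][0], local[1][0], ... into consecutive remote slots. */
    dmapp_iput(remote_col, &job.data_seg, right, &local[0][0],
               1 /* target stride */, 64 /* source stride */,
               4, DMAPP_QW);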
[Diagram: ixput scattering contiguous local data to indexed remote offsets.]
Scatter/gather calls
- dmapp_ixput …, dmapp_ixget …
- Local data is contiguous
- Remote data is distributed as defined by an array of offsets
[Diagram: put_ixpe distributing nelems = 3 to PE 0, PE 1, and PE 2.]
Put with indexed PE-stride calls
- dmapp_put_ixpe …, dmapp_get_ixpe …
- Local data is contiguous
- Remote data is distributed (as defined by an array of PE offsets) to the same address on each PE
- Use for small amounts of data
- These are not collective operations
[Diagram: scatter_ixpe putting nelems = 1 at a time to PE 2, PE 4, and PE 6.]
Scatter/gather with indexed PE-stride calls
- dmapp_scatter_ixpe, dmapp_gather_ixpe
- Local data is contiguous
- Source is scattered to (or gathered from) PEs nelems elements at a time
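A sketch of the PE-indexed put; the argument order and the one-element-per-listed-PE semantics are assumptions read off the diagram above, so check the dmapp_put_ixpe man page.

    static int64_t slot;                 /* same address on every PE */
    int64_t    vals[3] = {10, 20, 30};
    dmapp_pe_t pes[3]  = {0, 1, 2};

    /* Assumed semantics: element i of the contiguous local source is
     * put to &slot on PE pes[i]; nelems = 3 matches the diagram.
     * Not a collective call; only this PE participates. */
    dmapp_put_ixpe(&slot, &job.data_seg, pes, vals, 3, DMAPP_QW);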
Atomic operations on 8-byte (QW) remote data

Command   Operation
AADD      Atomic ADD
AAND      Atomic AND
AOR       Atomic OR
AXOR      Atomic EXCLUSIVE OR
AFADD     Atomic fetch and ADD
AFAND     Atomic fetch and AND
AFOR      Atomic fetch and OR
AFXOR     Atomic fetch and XOR
AFAX      Atomic fetch AND-EXCLUSIVE OR
ACSWAP    Compare and SWAP
[Diagram: AADD and AFADD operating on a remote variable over time.]
- Direct support in the NIC
- Be careful to read AMO-targeted values only via the DMAPP API
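A fetch-and-add sketch, following the argument pattern of the dmapp_aadd_qw and dmapp_acswap_qw calls in the barrier example later in this deck (local result first, then target address, segment, PE, operand); treat the exact signature as an assumption.

    static int64_t counter;   /* target variable, in PE 0's data segment */
    int64_t old;

    /* Atomically add 1 to the counter on PE 0 and fetch the previous
     * value into 'old'. Per the caution above, read the counter only
     * through DMAPP calls, not directly from the CPU. */
    dmapp_afadd_qw(&old, &counter, &job.data_seg, 0, 1);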
- Some calls return a syncid (_nb):
  - can test or wait on completion
  - dmapp_syncid_wait(*syncid)
  - dmapp_syncid_test(*syncid, *flag)
- For implicit non-blocking (_nbi):
  - dmapp_gsync_wait()
  - dmapp_gsync_test(*flag)
  - use for many small messages
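A sketch contrasting the two non-blocking flavours, reusing remote_buf, val, right, and job from the put sketch earlier; the dmapp_syncid_handle_t type name is an assumption from the man pages.

    static double remote_arr[64];
    double local_arr[64];
    dmapp_syncid_handle_t syncid;

    /* Explicit non-blocking: returns immediately with a handle. */
    dmapp_put_nb(&remote_buf, &job.data_seg, right, &val, 1,
                 DMAPP_QW, &syncid);
    /* ... overlap computation here ... */
    dmapp_syncid_wait(&syncid);      /* wait for this one transfer */

    /* Implicit non-blocking: no handles; suits many small messages. */
    for (int i = 0; i < 64; i++)
        dmapp_put_nbi(&remote_arr[i], &job.data_seg, right,
                      &local_arr[i], 1, DMAPP_QW);
    dmapp_gsync_wait();              /* wait for all implicit transfers */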
DMAPP applications can allocate memory in the symmetric heap:

    double *a;
    a = (double *) dmapp_sheap_malloc(N * sizeof(double));

- There are associated realloc and free calls
- The application is responsible for maintaining the symmetry of allocations
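Because the heap stays symmetric only if every PE allocates identically and in the same order, a typical usage sketch looks like the following; the sheap_seg field name and dmapp_sheap_free are assumptions consistent with the slide above.

    /* All PEs execute the same allocation sequence, so 'a' refers to
     * the same symmetric-heap offset everywhere. */
    double src[8] = {0};
    double *a = (double *) dmapp_sheap_malloc(8 * sizeof(double));

    /* 'a' can now serve as a remote address on any PE, paired with the
     * symmetric-heap segment descriptor from dmapp_get_jobinfo. */
    dmapp_put(a, &job.sheap_seg, right, src, 8, DMAPP_QW);

    dmapp_sheap_free(a);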
DMAPP exports the data segment and symmetric heap for you. This means:

For C:
- file-scope data and static data inside functions
- data allocated in the symmetric heap

For Fortran (there is no Fortran API, but if there were one):
- SAVEd data
- data in COMMON
[Diagram: several PEs each performing an atomic +1 on a barrier counter held by the master PE.]
- Each PE does an atomic add to the master's counter (a fetch-and-add can be used for testing)
- The master compares the counter with npes-1 and swaps it with 0
- … then the master releases the other PEs
    static uint64_t barrier_counter, bc;

    if (mype == master) {
        do {   /* wait until the counter reaches npes-1, swap it with 0 */
            dmapp_acswap_qw(&bc, (void *)&barrier_counter, seg_data,
                            mype, npes - 1, 0);
        } while (bc != (npes - 1));
    } else {
        /* each non-master PE atomically adds 1 to the master's counter */
        dmapp_aadd_qw((void *)&barrier_counter, seg_data, master, 1);
    }
    /* now release barrier… */
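The release step is elided on the slide. One hedged way to finish it with the calls already shown (release_flag is a hypothetical variable, not from the deck):

    static volatile uint64_t release_flag;   /* hypothetical, data segment */

    if (mype == master) {
        uint64_t one = 1;
        /* Wake each waiting PE by putting into its release flag. */
        for (int pe = 0; pe < npes; pe++)
            if (pe != master)
                dmapp_put((void *)&release_flag, seg_data, pe,
                          &one, 1, DMAPP_QW);
    } else {
        while (release_flag == 0)    /* spin until the master's put lands */
            ;
        release_flag = 0;            /* reset for the next barrier */
    }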
SHMEM
- Has the same SPMD model
- Requires use of symmetric memory
- Original interface is blocking
- Non-standard extensions for non-blocking put/get
- Varying-sized data items with a typed API
- Get/put with strided and scatter/gather variants
- Barrier and collective operations on sets of PEs
- Has the same atomic memory operations
- SHMEM is implemented using DMAPP on Gemini systems
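For contrast, the same style of neighbour put in SHMEM; these are long-standing Cray SHMEM calls, though the snippet is an illustrative sketch rather than deck material.

    #include <shmem.h>

    static double dest;   /* static, hence symmetric, like DMAPP's data segment */

    int main(void)
    {
        start_pes(0);
        int me   = _my_pe();
        int npes = _num_pes();

        double src = (double) me;
        /* One-sided blocking put of one double to the right-hand neighbour. */
        shmem_double_put(&dest, &src, 1, (me + 1) % npes);

        shmem_barrier_all();   /* ensure delivery before anyone reads dest */
        return 0;
    }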
- Data measured on a prototype system during Q1 2010
- 2100 MHz Opteron processors
- 2400 MHz HyperTransport interface
- Dual-node tests run between PEs on neighbouring Gemini routers
[Figure: latency (microseconds) versus transfer size (8 to 1024 bytes) for PUT ping-pong, PUT measured at source, and GET.]
[Figure: bandwidth (Mbytes/sec) versus element size (8 bytes to 64K) for 1, 2, and 4 PEs per node (PPN).]
[Figure: bandwidth (Mbytes/sec) versus number of non-blocking puts (1 to 64) for 8-, 64-, and 256-byte elements.]
[Figure: transfer rate (millions of elements/sec) versus stride (64-bit words, 2 to 4096) for vector lengths 16, 64, and 4096.]
[Figure: AMO rate (millions/sec) versus number of processes (0 to 1024) for 1 AMO and for 8192 AMOs.]
- Latency (~1 µs) far better than SeaStar
- Good aggregate bandwidths on small transfers
- High AMO rates, especially when multiple processes target the same variables
- Strided puts are an important case for CAF
- Ongoing optimization effort (for example, reducing the number of FMA descriptor updates)
- What is DMAPP and where does it fit?
- Basic features of the API
- Memory allocation and sample API calls
- Preliminary Gemini performance data