NVIDIA Application Lab at Jülich, Dirk Pleiter | Jülich Supercomputing

Member of the Helmholtz Association
NVIDIA Application Lab at Jülich
Dirk Pleiter | Jülich Supercomputing Centre (JSC)

Forschungszentrum Jülich at a Glance (status 2010)
- Budget: 450 million euros
- Staff: 4,800 (of whom 1,630 are scientists)
- Visiting scientists: 900 per year
- Trainees: 90
- Publications: 1,800
- Protective rights and licences: 14,800
- Research fields: health, energy and environment, and information technology; key technologies for tomorrow
14.11.2012 Dirk Pleiter | NVIDIA Application Lab at Jülich

Jülich Supercomputing Centre
Supercomputer operation for:
- Centre: FZJ
- Regional: JARA
- Helmholtz & National: NIC, GCS
- Europe: PRACE, EU projects
Application support
- User support; coordination with SimLabs
- Scientific visualization
- Peer review support and coordination
R&D work
- Algorithms, performance analysis and tools
- Community data management service
- Computer architectures, Exascale Laboratories: EIC, ECL, NVIDIA
Education and training

Supercomputer Systems: Dual Track Approach
General-purpose track
- 2004: IBM Power 4+, JUMP, 9 TFlop/s
- 2006-8: IBM Power 6, JUMP, 9 TFlop/s
- 2009: JUROPA, 200 TFlop/s; HPC-FF, 100 TFlop/s
- 2014: JUROPA++ cluster, 1-2 PFlop/s, + Booster
Highly scalable track
- 2006-8: IBM Blue Gene/L, JUBL, 45 TFlop/s
- 2009: IBM Blue Gene/P, JUGENE, 1 PFlop/s
- 2012: IBM Blue Gene/Q, JUQUEEN, 5.7 PFlop/s (target)
Also shown
- JUDGE, 240 TFlop/s
- File server: GPFS, Lustre

JUDGE Cluster
System
- 206 IBM iDataPlex nodes
- 2 Tesla M2050 or M2070 GPUs per node
- InfiniBand QDR network
- Peak performance: 239 TFlop/s
Users
- Institute for Advanced Simulation
- Molecular dynamics and mechanics, micro-magnetism simulations, medical image reconstruction
- JuBrain partition
- Milky Way partition

NVIDIA Application Lab at Jülich
Collaboration between JSC and NVIDIA since July 2012
- Enable scientific applications for GPU-based architectures
- Provide support for their optimization
- Investigate performance and scaling
Work focus
- Application requirements analysis
- Kepler and CUDA feature analysis
- Parallelization on many GPUs
- Collaboration with performance tools developers
- Training

Pilot Application: JuBrain
Application developed at the Institute of Neuroscience and Medicine (INM-1) at Forschungszentrum Jülich: Katrin Amunts, Markus Axer, Marcel Huysegoms
Research goal
- Accurate, highly detailed computer model of the human brain

Brain Section Images
Blockface pictures
- Created while cutting the brain into sections
Histological images
- Polarized light images
- Low resolution vs. high resolution
- 100 μm → 3 μm pixel size
- 30 MBytes → 40 GBytes of data (exceeds GPU memory capacity)
Challenge: 3D reconstruction
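The jump in data volume on this slide can be checked with a rough scaling estimate. The sketch below assumes (this is not stated on the slide) that the imaged area and the number of bytes per pixel stay the same, so the data volume grows with the square of the linear resolution ratio:

```latex
\[
\left(\frac{100\,\mu\mathrm{m}}{3\,\mu\mathrm{m}}\right)^{2} \approx 1.1 \times 10^{3},
\qquad
30\ \mathrm{MBytes} \times 1.1 \times 10^{3} \approx 33\ \mathrm{GBytes}
\]
```

Under these assumptions the estimate lands in the same range as the 40 GBytes quoted on the slide; the remaining difference would depend on details the slide does not give.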

3D Reconstruction
Registration components (diagram): fixed image, moving image, metric, optimizer, interpolator, transformation
Registration algorithms
- Rigid registration → 3 parameters
- Affine registration → 6 parameters
- Elastic registration → O(100) parameters
O(30) speedup on GPU

Fluid dynamics on Fermi and Kepler
Lattice Boltzmann method
- D2Q37 model
- Application developed at U Rome Tor Vergata/INFN, U Ferrara/INFN, TU Eindhoven
- Reproduces the dynamics of a fluid by simulating virtual particles that collide and propagate
- Simulation of large systems requires double-precision computation on many GPUs
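To make the propagate step concrete, here is a minimal CPU sketch in C. It is not the application's code: it swaps in the much smaller D2Q9 velocity set for the D2Q37 model, assumes periodic boundaries, and uses hypothetical names (`site_index`, `propagate`, `density`). The storage layout, one contiguous array per population, is an assumption chosen to resemble the `p_prv[i*NX*NY + idx]` indexing shown later in the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { NPOP = 9 };  /* D2Q9 stand-in; the slides use D2Q37 (37 populations) */

/* Discrete velocities of the D2Q9 set. */
static const int CX[NPOP] = { 0, 1, 0, -1,  0, 1, -1, -1,  1 };
static const int CY[NPOP] = { 0, 0, 1,  0, -1, 1,  1, -1, -1 };

/* Structure-of-arrays layout: population i of site (x, y). */
static size_t site_index(int i, int x, int y, int nx, int ny) {
    return ((size_t)i * nx + x) * ny + y;
}

/* Propagate: each population moves one step along its velocity,
 * with periodic wrap-around at the lattice edges. */
void propagate(const double *src, double *dst, int nx, int ny) {
    assert(nx > 0 && ny > 0);
    for (int i = 0; i < NPOP; i++) {
        for (int x = 0; x < nx; x++) {
            for (int y = 0; y < ny; y++) {
                int xd = (x + CX[i] + nx) % nx;
                int yd = (y + CY[i] + ny) % ny;
                dst[site_index(i, xd, yd, nx, ny)] =
                    src[site_index(i, x, y, nx, ny)];
            }
        }
    }
}

/* Density at one site: sum of all populations at that site. */
double density(const double *f, int nx, int ny, int x, int y) {
    double rho = 0.0;
    for (int i = 0; i < NPOP; i++)
        rho += f[site_index(i, x, y, nx, ny)];
    return rho;
}
```

Propagate only copies data, which is why such a kernel tends to be limited by memory bandwidth, while the collide step (not shown) performs the arithmetic at each site.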

Collide kernel on Fermi
- Kernel dominated by arithmetic operations
- Plot: floating-point performance [GFlop/s] as a function of the number of threads/block
Excellent performance on Fermi
Implementation: F. Schifano (U Ferrara/INFN)

Kepler Performance Tuning
Original loop:
    for (i = 0; i < NPOP-1; i++) {
        lPop = p_prv[i*NX*NY + idx];
        u = u + param_cx[i] * lPop;
        v = v + param_cy[i] * lPop;
    }
Performance analysis observations
- Significant increase of L1 cache misses: 17% (Tesla M2090) → 67% (Tesla K20)
- SM performance increased, but L1 cache capacity remained unchanged
Problem mitigation by simple code change
- Enforce loop unrolling to eliminate indirect memory accesses
Loop with enforced unrolling:
    #pragma unroll
    for (i = 0; i < NPOP-1; i++) {
        lPop = p_prv[i*NX*NY + idx];
        u = u + param_cx[i] * lPop;
        v = v + param_cy[i] * lPop;
    }
J. Kraus (NVIDIA Lab)
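One plausible reading of why unrolling helps: once the loop is unrolled, the index `i` is a compile-time constant in every copy of the body, so `param_cx[i]` and `param_cy[i]` no longer need a run-time indexed load. The plain C sketch below illustrates this by unrolling by hand. The tiny sizes and coefficient values are hypothetical, and `moments_loop` and `moments_unrolled` are illustrative names, not functions from the application.

```c
#include <assert.h>

enum { NPOP = 5, NX = 2, NY = 2 };  /* tiny hypothetical sizes */

/* Hypothetical velocity components, one per population. */
static const double param_cx[NPOP] = { 1.0, 0.0, -1.0,  0.0, 0.0 };
static const double param_cy[NPOP] = { 0.0, 1.0,  0.0, -1.0, 0.0 };

/* Rolled form: i is a run-time value, so param_cx[i] and param_cy[i]
 * are indexed loads on every iteration. */
void moments_loop(const double *p_prv, int idx, double *u, double *v) {
    assert(idx >= 0 && idx < NX * NY);
    double su = 0.0, sv = 0.0;
    for (int i = 0; i < NPOP - 1; i++) {
        double lPop = p_prv[i * NX * NY + idx];
        su += param_cx[i] * lPop;
        sv += param_cy[i] * lPop;
    }
    *u = su;
    *v = sv;
}

/* Unrolled by hand: every index is a constant, so the coefficients
 * can be folded into the instructions instead of being loaded. */
void moments_unrolled(const double *p_prv, int idx, double *u, double *v) {
    assert(idx >= 0 && idx < NX * NY);
    const double p0 = p_prv[0 * NX * NY + idx];
    const double p1 = p_prv[1 * NX * NY + idx];
    const double p2 = p_prv[2 * NX * NY + idx];
    const double p3 = p_prv[3 * NX * NY + idx];
    *u = param_cx[0] * p0 + param_cx[1] * p1 + param_cx[2] * p2 + param_cx[3] * p3;
    *v = param_cy[0] * p0 + param_cy[1] * p1 + param_cy[2] * p2 + param_cy[3] * p3;
}
```

In the CUDA kernel, `#pragma unroll` asks the compiler to perform this transformation itself, which requires the trip count (`NPOP-1`) to be known at compile time.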

Collide kernel on Kepler GK110
Comparison Fermi vs. Kepler
- Grid size considered here: 252 x 16384
- Plot: floating-point performance as a function of the number of threads/block
Performance improvement: 1.7x

Propagate kernel
Kernel dominated by memory access
- Grid size considered here: 252 x 16384
- Plot: memory bandwidth [GByte/s] as a function of the number of threads/block
Performance improvement: 1.4x
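An effective bandwidth figure like the one plotted here is typically the number of bytes moved divided by the kernel run time. The C sketch below shows that arithmetic under stated assumptions: each of the 37 populations of the D2Q37 model is read once and written once per site, values are 8-byte doubles, and 252 x 16384 is the number of lattice sites. The function names are hypothetical and the slides do not say how the plotted bandwidth was measured.

```c
#include <assert.h>

/* Bytes moved in one propagate step, assuming every population of every
 * site is read once and written once. */
unsigned long long propagate_bytes(unsigned long long nx, unsigned long long ny,
                                   unsigned npop, unsigned bytes_per_value) {
    assert(nx > 0 && ny > 0 && npop > 0 && bytes_per_value > 0);
    return 2ULL * nx * ny * npop * bytes_per_value;
}

/* Effective bandwidth in GByte/s (10^9 bytes per second). */
double effective_gbyte_per_s(unsigned long long bytes, double seconds) {
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e9;
}
```

For the grid on the slide, `propagate_bytes(252, 16384, 37, 8)` gives 2,444,230,656 bytes, about 2.44 GBytes per step; dividing by a measured kernel time then yields the effective bandwidth.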

Summary
NVIDIA Application Lab at Jülich
- New and fruitful model for collaboration
- We are just at the beginning ...
Application requirements analysis
- JuBrain: project aiming for a realistic model of the human brain
Kepler feature analysis
- Initial performance results for Lattice Boltzmann application on GK110
- Very high performance level reached on Fermi can be sustained
