Introduction to Parallel Computing (CMSC498X / CMSC818X)
Lecture 17: Performance Issues
Abhinav Bhatele, Department of Computer Science

Announcements
• Assignment 3 is due on Nov 9
• Interim report for the group project is due on Nov 16
  • Provide more details about the project: serial algorithm, parallel algorithm, languages being used
  • Deliverables and metrics for success
  • Contributions of individual group members
Abhinav Bhatele (CMSC498X/CMSC818X)

Performance metrics
• Time to solution
• Time per step (iteration)
• Science progress (figure of merit per unit time)
• Floating point operations per second (flop/s)
• When comparing multiple data points:
  • Speedup, efficiency
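The slide names speedup and efficiency without defining them. A minimal sketch follows, assuming the usual definitions: speedup is the baseline time divided by the parallel time, and efficiency is speedup divided by the factor by which the process count grew. The function names and all timings are hypothetical, chosen only for illustration.

```python
def speedup(t_base: float, t_parallel: float) -> float:
    """Speedup relative to a baseline run: t_base / t_parallel."""
    return t_base / t_parallel


def efficiency(t_base: float, t_parallel: float, p_base: int, p_parallel: int) -> float:
    """Speedup divided by the factor by which the process count grew."""
    return speedup(t_base, t_parallel) / (p_parallel / p_base)


# Hypothetical timings (seconds) for the same problem on 1, 4, and 16 processes.
timings = {1: 120.0, 4: 36.0, 16: 15.0}

for p, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, 1, p)
    print(f"p={p:3d}  time={t:6.1f}s  speedup={s:5.2f}  efficiency={e:5.2f}")
```

With these made-up numbers, efficiency drops as processes are added, which is the pattern the rest of the lecture tries to explain.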

What is the best performance we can get?
• Peak flop/s
• Peak memory bandwidth
• Peak network bandwidth
• Why do we not achieve peak performance?

What is happening in a program
• Integer operations
• Floating point operations
• Conditional instructions (branches)
• Loads/stores
• Data movement across the network (messages + I/O)

Performance issues
• Algorithmic overhead
  • More computation when running in parallel (e.g. prefix sum)
• Speculative loss
  • Perform extra computation speculatively but not use all of it for the result
• Critical paths
  • Dependencies between computations spread across processes / threads
• Bottlenecks
  • Serial bottlenecks: one process doing some computation and holding everyone up
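To make the prefix sum example of algorithmic overhead concrete, the sketch below counts additions in a serial scan and in a step-synchronous parallel-style scan. The slide does not say which parallel algorithm it has in mind; the Hillis-Steele scheme used here is an assumption, and it is simulated serially so only operation counts are compared, not timings.

```python
def serial_prefix_sum(values):
    """Inclusive prefix sum; returns (result, number of additions)."""
    out = list(values)
    adds = 0
    for i in range(1, len(out)):
        out[i] = out[i - 1] + out[i]
        adds += 1
    return out, adds


def hillis_steele_prefix_sum(values):
    """Step-synchronous scan, simulated serially.

    In each step every element i >= stride adds in the element `stride`
    positions to its left. All additions within one step are independent,
    so they could run in parallel.
    Returns (result, number of additions, number of steps).
    """
    out = list(values)
    adds = 0
    steps = 0
    stride = 1
    while stride < len(out):
        prev = out[:]  # snapshot: every read in a step sees the old values
        for i in range(stride, len(out)):
            out[i] = prev[i - stride] + prev[i]
            adds += 1
        stride *= 2
        steps += 1
    return out, adds, steps


data = list(range(1, 17))  # 16 elements
s_out, s_adds = serial_prefix_sum(data)
p_out, p_adds, p_steps = hillis_steele_prefix_sum(data)
assert s_out == p_out
print(f"serial: {s_adds} additions, {s_adds} dependent steps")
print(f"scan:   {p_adds} additions, {p_steps} steps")
```

The parallel-style scan needs far fewer dependent steps but performs more additions in total; that extra work is the algorithmic overhead.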

Performance issues
• Sequential performance issues
  • Inefficient memory access: data movement in the memory hierarchy
  • Inefficient floating point operations
• Load imbalance
  • Some processes doing more work than most
• Communication performance
  • Spending increasing proportion of time on communication
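One common way to quantify load imbalance is the ratio of the maximum per-process load to the mean load. The slides do not prescribe a metric, so treat this as an assumed definition; the function name and the per-process times are hypothetical.

```python
def load_imbalance(work_per_process):
    """Ratio of the maximum load to the mean load; 1.0 means perfectly balanced."""
    mean = sum(work_per_process) / len(work_per_process)
    return max(work_per_process) / mean


# Hypothetical per-process compute times in seconds.
balanced = [10.0, 10.0, 10.0, 10.0]
skewed = [8.0, 8.0, 8.0, 16.0]

print(f"balanced: {load_imbalance(balanced):.2f}")
print(f"skewed:   {load_imbalance(skewed):.2f}")
```

Both lists hold the same total work, but in the skewed case every process waits for the slowest one, so the step takes as long as the maximum rather than the mean.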

Communication performance
• Overhead and grainsize (Lots of tiny messages or a few very large messages)
• No overlap between communication and computation
• Increasing amounts of communication as we run on more processes
• Frequent global synchronization
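The per-message overhead side of the grainsize point can be illustrated with a simple cost model in which each message costs a fixed latency plus its size divided by the bandwidth. This model and the latency and bandwidth values are assumptions for illustration, not figures from the lecture.

```python
def transfer_time(num_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Total time to send num_messages messages, one after another."""
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return num_messages * per_message


LATENCY = 1e-6     # hypothetical: 1 microsecond of overhead per message
BANDWIDTH = 1e10   # hypothetical: 10 GB/s

total_bytes = 1_000_000
tiny = transfer_time(total_bytes // 8, 8, LATENCY, BANDWIDTH)   # one 8-byte value per message
large = transfer_time(1, total_bytes, LATENCY, BANDWIDTH)       # everything in one message

print(f"many tiny messages: {tiny:.6f} s")
print(f"one large message:  {large:.6f} s")
```

Under this model the tiny-message run is dominated by per-message overhead. The sketch does not capture the downside of very large messages, such as losing the chance to overlap communication with computation.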

Critical paths
• A long chain of dependencies across processes
• We want to identify and avoid having long critical paths
• Solutions:
  • Eliminate completely if possible
  • Shorten the critical path
  • Reduce time spent in a path by removing work on the critical path
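A critical path can be identified as the longest cost-weighted chain in the task dependency graph. The sketch below assumes a small, hypothetical task graph (the task names, costs, and dependencies are made up) and computes the critical path length alongside the total work.

```python
from functools import lru_cache

# Hypothetical task graph: task -> (cost, tasks it depends on).
tasks = {
    "a": (2.0, []),
    "b": (3.0, ["a"]),
    "c": (1.0, ["a"]),
    "d": (4.0, ["b", "c"]),
    "e": (1.0, ["c"]),
}


@lru_cache(maxsize=None)
def finish_time(task):
    """Earliest finish time of a task given unlimited processes."""
    cost, deps = tasks[task]
    return cost + max((finish_time(d) for d in deps), default=0.0)


critical_path_length = max(finish_time(t) for t in tasks)
total_work = sum(cost for cost, _ in tasks.values())

print(f"total work:           {total_work}")
print(f"critical path length: {critical_path_length}")
print(f"best possible speedup: {total_work / critical_path_length:.2f}")
```

No number of processes can finish faster than the critical path, which is why the solutions on the slide focus on shortening it or moving work off it.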

Bottlenecks
• Detect bottlenecks
  • One process busy while all others wait
• Examples:
  • Reduce to one process and then broadcast
  • One process responsible for input/output
  • One process responsible for assigning work to others
• Solutions:
  • Parallelize as much as possible, use hierarchical schemes
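As a rough illustration of why hierarchical schemes help, the sketch below compares a flat reduction, where one root combines the values of all other processes one after another, with a pairwise tree reduction. Both are simulated serially and only count dependent steps; the function names are hypothetical, and real communication libraries may already use tree-based schemes internally.

```python
def flat_reduce_steps(num_processes):
    """Root combines the other num_processes - 1 values one after another."""
    return num_processes - 1


def tree_reduce(values):
    """Pairwise (binary-tree) reduction, simulated serially.

    Returns (sum, number of rounds). Additions within a round are
    independent, so each round could run in parallel.
    """
    vals = list(values)
    rounds = 0
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # odd element carries over to the next round
        vals = nxt
        rounds += 1
    return vals[0], rounds


p = 16
total, rounds = tree_reduce(range(p))
print(f"flat: {flat_reduce_steps(p)} dependent steps at the root")
print(f"tree: {rounds} rounds, sum = {total}")
```

In the flat scheme the root is the one busy process while the others wait; in the tree scheme the number of dependent steps grows logarithmically with the process count.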

Sequential performance issues
• Identify issues using performance tools
• Solutions:
  • Minimize data movement
  • Data reuse
  • Optimize floating point calculations

Abhinav Bhatele 5218 Brendan Iribe Center (IRB) / College Park, MD 20742 phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu
