HPC Systems Engineering in the Interaction Room
Dr. Matthias Book
Professor for Software Engineering

Software Engineering Challenges
- Need to ensure we are building the right software, and we are building it right
- But many sources of miscommunication between domain & technology experts:
  - Different vocabulary / areas of competence
  - Struggling to convey / understand requirements precisely
  - Struggling to realize what is non-obvious / implicit knowledge / unknown
  - Struggling to realize what bears particular value / effort / risk
  - Struggling to convey / understand what is fixed, what is flexible, what is variable
- Specification documents often do not solve these problems, but just mask them
  - Same struggles put in writing
  - Issues surface later, when they are more expensive to fix
- Agile approaches encourage (and actually depend on) more interaction, but provide little guidance for communicating about what is really crucial in a project
Matthias Book: HPC Systems Engineering in the Interaction Room

The Nature of Software Development
"Because software is embodied knowledge, and that knowledge is initially dispersed, tacit, latent, and incomplete, software development is a social learning process."
Howard Baetjer, Jr.: Software as Capital. IEEE Computer Society Press, 1998

The Interaction Room
- Successful projects require
  - personal, focused discussion of critical project aspects
  - thorough understanding of application domain, and how it is modeled in software
  - early recognition of value and effort drivers
  - early elimination of risks and uncertainties
- The Interaction Room is
  - a dedicated room for the project team
  - where domain and technical stakeholders feel at home
  - with large whiteboards on the walls
  - but without a classic conference table
  - to visualize and discuss key project aspects informally
  - instead of going over tedious documents
[Figure: Example IR for information system development, with Process Canvas, Object Canvas, Integration Canvas and Interaction Canvas]

Interaction Room Annotations
- Highlight model elements that merit particular consideration:
  - Value annotations
    - Scientific value
  - Risk annotations
    - Complexity
    - Innovation
    - Uncertainty
  - Effort annotations
    - Quality requirements
    - Boundary conditions
    - Interfaces
- Shift attention from what is visible in models to what is implied, what is assumed, what is unknown
  - i.e. those aspects that often make or break a project

Example: Annotated Process Canvas for an Information System

A Pragmatic Approach to Conceptualizing Software
- Informal, high-level sketches of software models
  - sacrifice formality, consistency, completeness (no strict UML necessary)
  - in favor of focus, pragmatism, interdisciplinary understanding, value-orientation
- Not a replacement for formal software specifications!
  - Formal specifications may well be necessary for certain aspects in later stages, and can then be delegated to expert groups
- Informal sketches serve as catalysts for the identification, understanding and discussion of the most critical project aspects
  - Interdisciplinary communication about domain and technology
  - High-level orientation about project goals, dependencies, conflicts, trade-offs
  - Early identification of value and complexity drivers, risks, uncertainties

Crucial Interdisciplinary Communication Points in HPC Simulation Science Projects
- Domain experts need to help systems engineers understand:
  - What research question are we trying to answer? What context, what boundary conditions?
  - What parameters and variables are there? How are they evolving over time? Initial values?
  - How do the variables affect each other? Is interaction long- or short-range?
  - What are particularly interesting segments of the simulation space? Are these variable?
  - etc.
- Systems engineers need to validate technical decisions with domain experts:
  - Cluster architecture: Memory-intensive or compute-intensive? Many-core, multi-core, GPUs?
  - Domain decomposition: How to map the problem most efficiently to the cluster?
  - Communication patterns: Choice of communication type? Ghosts and halos?
  - Memory model: Distributed (MPI), shared (OpenMP) or hybrid?
  - Data structures: What can be transient / must be persistent? Checkpointing? Parallel I/O?
  - etc.

Typical Pitfalls in HPC Simulation Science Projects
- Choosing appropriate solvers vs. reinventing the wheel
- Inefficient domain decomposition; load imbalance
- Dealing with differences between and unique strengths of individual architectures
- Dealing with different schedulers and their job scripts
- Debugging costs a large amount of (possibly expensive) time
- Approximation of the real world, insufficient validation data
- Integrating different physical models/processes with each other (multi-physics)
- Constant change of hardware, software, modus operandi
- Constant need for porting, always an early adopter, changing code ownership
- Many of these are revealed only in late (i.e. expensive-to-fix) stages

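The load-imbalance pitfall can be made concrete with a small sketch. It assumes one common definition of imbalance (maximum load divided by mean load) and a plain contiguous block partition; the function names and the per-cell cost figures are hypothetical and are not taken from the slides.

```python
def block_partition(n_cells, n_ranks):
    """Split n_cells contiguous cells into n_ranks nearly equal blocks.

    Returns a list of (start, stop) index pairs, one per rank.
    """
    base, extra = divmod(n_cells, n_ranks)
    bounds = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def load_imbalance(work_per_rank):
    """Return max load / mean load; 1.0 means perfectly balanced."""
    if not work_per_rank:
        raise ValueError("need at least one rank")
    mean = sum(work_per_rank) / len(work_per_rank)
    if mean == 0:
        raise ValueError("total work must be positive")
    return max(work_per_rank) / mean


# Hypothetical cost per cell: the last four cells (say, near a point of
# interest) are five times as expensive to update as the others.
cell_cost = [1] * 12 + [5] * 4

bounds = block_partition(len(cell_cost), 4)
loads = [sum(cell_cost[a:b]) for a, b in bounds]
print(loads, load_imbalance(loads))
```

Splitting by cell count alone gives every rank four cells, yet the last rank carries most of the work. Whether cells differ in cost like this is the kind of question only the domain experts can answer early.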
Software Process for HPC Simulation Science Projects
1. Understand the problem domain
2. Perform appropriate domain decomposition and choose appropriate communicators, helpful libraries, data structures etc.
3. Implement a correct code framework for communication between processes; integrate correct problem-domain code into the communication code
4. Test and validate the simulation model
5. Optimize accuracy and tune performance

Conceptual Levels in HPC Simulation Science Projects
- Problem level: Statement of research question / project goal and scope
  - Goal, context, scope: Research question, boundary conditions, assumptions, abstractions
  - Quality requirements: Accuracy, generalizability, performance
- Scientific level: Description of the pertinent aspects of the domain
  - Static aspects: Coordinates, variables, sources of influence, points of interest, physical laws
  - Dynamic aspects: Forces, interactions, events, timing, discontinuities
- Distribution level: Breakdown of the scientific model into parallelizable units
  - Static aspects: Domain decomposition, data structure, initial conditions
  - Dynamic aspects: Communication patterns, stencils, halos, ghosts, adaptive mesh refinements, iterative numerical methods
- Technical level: Implementation of distribution model on particular architecture
  - Static aspects: Cluster architecture, (parallel) file system, memory model, interconnect
  - Dynamic aspects: Communication protocols, I/O operations, available libraries, solvers

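To illustrate what the distribution level talks about (domain decomposition, stencils, ghost cells, halo exchange), here is a minimal single-process sketch. It assumes a 1D grid, a three-point Jacobi stencil and one ghost cell per side; plain Python lists stand in for the per-process arrays, and list assignments stand in for the messages a real code would send (for example with MPI). All function names are hypothetical.

```python
def jacobi_step(u):
    """Apply one three-point Jacobi step; the two end values stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def split_with_ghosts(u, n_parts):
    """Decompose the interior of u into n_parts blocks.

    Each block is [left ghost, owned cells..., right ghost]. The outermost
    ghosts hold the fixed boundary values u[0] and u[-1].
    """
    n_interior = len(u) - 2
    if n_parts < 1 or n_parts > n_interior:
        raise ValueError("each part needs at least one owned cell")
    base, extra = divmod(n_interior, n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        blocks.append(u[start:start + size + 2])
        start += size
    return blocks


def exchange_halos(blocks):
    """Copy each block's edge cells into its neighbours' ghost cells."""
    for left, right in zip(blocks, blocks[1:]):
        left[-1] = right[1]    # right neighbour's first owned cell
        right[0] = left[-2]    # left neighbour's last owned cell


def decomposed_jacobi(u, n_parts, steps):
    """Run Jacobi steps block by block, exchanging halos after each step."""
    blocks = split_with_ghosts(u, n_parts)
    for _ in range(steps):
        blocks = [jacobi_step(b) for b in blocks]
        exchange_halos(blocks)
    owned = [x for b in blocks for x in b[1:-1]]
    return [u[0]] + owned + [u[-1]]


def serial_jacobi(u, steps):
    """Reference: the same steps on the undivided grid."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


grid = [100.0] + [20.0] * 10 + [0.0]
assert decomposed_jacobi(grid, 3, 5) == serial_jacobi(grid, 5)
```

The final check shows the point of the ghost cells: with one halo exchange per step, the decomposed run reproduces the undivided run exactly. A wider stencil would need wider halos, which is why the stencil is worth discussing between domain experts and systems engineers.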
Interaction Room Canvases for HPC Simulation Science
- Problem canvas: Goal and scope of research question about the domain
- Real-world canvas: Description of the pertinent aspects of the domain
- Decomposition canvas: Breakdown of scientific model into parallelizable units
- Architecture canvas: Implementation of simulation on suitable HPC technology

Problem Canvas
Goal and scope of research question about the domain
- Domain experts collect on note cards:
  - Research question
  - Boundary conditions
  - Assumptions
  - Abstractions
  - Quality requirements
- Example: Heat dissipation problem
  - Question: What will the temperature in the middle of a room be like after running an air conditioner on one side and a fire on the other for several hours?
  - Boundary conditions: Room size, starting temperature, A/C and fire size
  - Abstractions: Consider heat transfer by air flow / convection only, not by radiation
  - Assumptions: No moving objects in the room, no windows
  - Quality requirements: Temperature must be determined with double precision

Real-World Canvas
Description of the pertinent aspects of the domain
- Domain experts sketch static properties of the simulation space
  - Spatial setup
  - Locations and properties of simulation elements
- Domain experts sketch dynamic properties of simulation process
  - Forces
  - Events
  - Points of interest (actors, sensors)
  - Changes over time
- Example: Heat dissipation problem
  - Room geometry, placement of fire, A/C, monitor
  - Working of convection forces
  - Working of air flows, times of A/C operation
  - Appropriate formulae, numerical methods

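As a rough illustration of how the heat dissipation example might turn into a numerical method, the sketch below reduces the room to a 1D line of cells between the fire and the A/C and applies an explicit finite-difference update. This is a deliberately simplified stand-in: it models pure diffusion rather than the convective air flow named on the problem canvas, and every temperature, cell count and parameter is an assumed placeholder, not a value from the slides.

```python
def simulate_midpoint_temperature(n_cells=21, t_fire=60.0, t_ac=18.0,
                                  t_start=25.0, r=0.4, steps=5000):
    """Return the temperature at the middle cell after `steps` updates.

    r is the dimensionless ratio alpha * dt / dx**2; the explicit scheme
    used here is only stable for r <= 0.5.
    """
    if n_cells < 3:
        raise ValueError("need at least one interior cell")
    if not 0.0 < r <= 0.5:
        raise ValueError("explicit scheme requires 0 < r <= 0.5")

    # Boundary conditions: fixed temperatures at the fire and A/C ends.
    u = [t_start] * n_cells
    u[0] = t_fire
    u[-1] = t_ac

    for _ in range(steps):
        new = u[:]
        for i in range(1, n_cells - 1):
            # Three-point stencil: each cell relaxes towards its neighbours.
            new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new

    # Python floats are IEEE 754 doubles, matching the quality requirement
    # of double precision stated on the problem canvas.
    return u[n_cells // 2]


print(simulate_midpoint_temperature())
```

With these placeholder values the run settles to a straight-line temperature profile, so the middle cell ends up halfway between the two boundary temperatures. A convection model would need a velocity field and additional terms, which is the kind of detail the real-world canvas is meant to surface.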