Self-Adaptive Architectures for Autonomic Computational Science
http://wiki.esi.ac.uk/Distributed_Programming_Abstractions
Shantenu Jha (1), Manish Parashar (2), Omer Rana (3)
(1) CCT and CS, LSU & e-Science Institute, Edinburgh
(2) Rutgers University & NSF Center for Autonomic Computing, USA
(3) Cardiff University & Welsh e-Science Centre, UK
Sep. 21 EGEE’09 Barcelona
Shantenu Jha (LSU and eSI) Grid Observatory Sep. 21 EGEE’09 Barcelona 1 / 31
Outline
1 Background
2 Elements of ACS: Conceptual Framework
3 Conceptual Architectures: Tuning of Application; Tuning by Application
4 Distributed Autonomic Applications: Abstractions for Distributed Systems and Applications; Ensemble Kalman Filters; Ensemble Kalman Filters – Mode II
5 Analysis
Background: Context
Grid infrastructures present unprecedented opportunities for computational science and engineering, with the potential for fundamental insights into complex phenomena. Where can autonomics be of benefit?
There are various existing investments in Grid computing infrastructure, but uptake has been limited, due to (i) the complexity of developing applications and (ii) changes in the infrastructure. Applications need some support for system adaptation.
How can autonomics help computational science utilize distributed resources? The DPA theme takes an application-centric view; it is intended to initiate discussion about this theme, not to be prescriptive.
Autonomic distributed computational applications: break free of the static (execution) model and enable dynamic execution of applications.
Autonomics + abstractions: demonstrate effectiveness in scaling out across multiple sites (Grids).
Through empirical development, experience and analysis of applications on the infrastructure, understand the role and advantages of autonomics.
Elements of ACS: Conceptual Framework
ACS Framework Elements
Application-level Objective (AO): a user-identified application requirement, e.g. increase throughput, reduce task failures, balance load.
Mechanism: an action used by the application or resource manager to achieve an AO, written $m : (\{m_i\}, \{m_i^e\}, \{m_o\}, \{m_o^e\})$. For example, for file staging, $\{m_i\}$ and $\{m_o\}$ are the file references before and after the staging process, $\{m_i^e\}$ are the input events that trigger the start of file staging, and $\{m_o^e\}$ are the output events raised once file staging has completed.
Strategy: a collection of mechanisms, either defined manually or constructed dynamically by an autonomic approach.
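The mechanism tuple can be read as an event-driven component: when one of its input events $\{m_i^e\}$ fires, it turns its inputs $\{m_i\}$ into outputs $\{m_o\}$ and raises its output events $\{m_o^e\}$. The Python sketch below encodes the file-staging example in that way and composes mechanisms into a manually defined strategy. It is an illustration under assumptions only: the classes `Mechanism` and `Strategy`, the event names and the `local://`/`remote://` references are all hypothetical and do not come from the framework itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Mechanism:
    """Hypothetical encoding of m : ({m_i}, {m_i^e}, {m_o}, {m_o^e})."""
    name: str
    input_events: Set[str]                    # {m_i^e}: events that trigger the action
    output_events: Set[str]                   # {m_o^e}: events raised on completion
    action: Callable[[List[str]], List[str]]  # maps {m_i} to {m_o}

    def fire(self, event: str, inputs: List[str]) -> Tuple[List[str], Set[str]]:
        """Apply the action if `event` is a trigger; otherwise pass inputs through."""
        if event not in self.input_events:
            return inputs, set()
        return self.action(inputs), set(self.output_events)


@dataclass
class Strategy:
    """A manually composed collection of mechanisms, applied in order."""
    mechanisms: List[Mechanism] = field(default_factory=list)

    def run(self, event: str, items: List[str]) -> Tuple[List[str], Set[str]]:
        raised = {event}
        for m in self.mechanisms:
            triggers = raised & m.input_events
            if triggers:
                # Output events of one mechanism may trigger the next one.
                items, emitted = m.fire(next(iter(triggers)), items)
                raised |= emitted
        return items, raised


# File staging: {m_i} are local references, {m_o} are the staged references.
file_staging = Mechanism(
    name="file_staging",
    input_events={"job_submitted"},
    output_events={"staging_done"},
    action=lambda refs: [r.replace("local://", "remote://", 1) for r in refs],
)

strategy = Strategy([file_staging])
staged, events = strategy.run("job_submitted", ["local://input.dat"])
print(staged)          # ['remote://input.dat']
print(sorted(events))  # ['job_submitted', 'staging_done']
```

An autonomic approach would build the `mechanisms` list at runtime instead of fixing it by hand; the fixed list here stands in for the "manual" case.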
Elements of ACS: Self-Adaptive Approaches
Two approaches:
(i) Top down: overall system goals are achieved by modifying the interconnectivity or behaviour of system components, realized through a system manager.
(ii) Bottom up: the local behaviour of system components is aggregated (without a centralized system manager) to generate some overall system behaviour.
The current focus is primarily on (i). However, (ii) may be used as a precursor, to dynamically form resource ensembles using clustering approaches.
Adaptation approaches (at different levels): modify code; modify structure; modify application parameters (based on previous executions, and driven by a set of external constraints).
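As an illustration of a top-down manager that modifies application parameters based on previous executions and an external constraint, the following minimal Python sketch tunes a single parameter. The choice of parameter (core count), the constraint (a core budget), the "pick the fastest feasible past run" rule and the names `Execution` and `TopDownManager` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Execution:
    """One previous run: the parameter value used and the observed runtime."""
    cores: int
    runtime_s: float


class TopDownManager:
    """Hypothetical centralized manager for approach (i).

    Tunes one application parameter (the core count) from the history of
    previous executions, subject to an external constraint (a core budget).
    """

    def __init__(self, core_budget: int):
        self.core_budget = core_budget
        self.history: List[Execution] = []

    def record(self, run: Execution) -> None:
        self.history.append(run)

    def choose_cores(self, default: int) -> int:
        # Discard past settings that violate the external constraint.
        feasible = [r for r in self.history if r.cores <= self.core_budget]
        if not feasible:
            # No usable history yet: fall back to a default within budget.
            return min(default, self.core_budget)
        # Reuse the feasible setting with the lowest observed runtime.
        return min(feasible, key=lambda r: r.runtime_s).cores


manager = TopDownManager(core_budget=64)
# Illustrative (made-up) history; 128 cores was fastest but exceeds the budget.
for cores, runtime in [(16, 410.0), (32, 230.0), (64, 250.0), (128, 120.0)]:
    manager.record(Execution(cores, runtime))
print(manager.choose_cores(default=8))  # 32
```

A real manager would typically explore untried settings as well; this sketch only exploits past runs.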
Elements of ACS: Conceptual Architectures for ACS
Tuning of Applications: SAMR and Coupled-Fusion Simulations
Tuning by Applications: EnKF
Elements of ACS: Vectors: Understanding Distributed Applications
Vectors are axes representing application characteristics; their values help us understand (a) the application requirements and (b) the design and constraints of solutions and tools.
Vector listing: Executable Unit, Communication, Coordination, Execution Environment.
Conceptual Architectures: Tuning of Application
Tuning of application & resource manager parameters
Example: dynamic structured adaptive mesh refinement (SAMR) techniques on structured meshes/grids. Compared to numerical techniques based on static uniform discretization, SAMR methods employ locally optimal approximations, leading to highly advantageous cost/accuracy ratios: they focus computational resources, at runtime, on regions with large local solution error.
The adaptive nature and inherent space-time heterogeneity of SAMR implementations require dynamic resource allocation, data distribution, load balancing, and runtime management.
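The core SAMR step of focusing resources on regions with large local error can be sketched as "flag cells, then cluster flagged cells into patches to refine". The 1-D Python sketch below assumes a simple gradient-based error indicator and contiguous-run clustering; production SAMR codes use more elaborate error estimators and clustering algorithms, so `flag_cells`, `cluster_patches` and the tolerance are illustrative stand-ins only.

```python
import numpy as np


def flag_cells(u: np.ndarray, tol: float) -> np.ndarray:
    """Flag cells whose local error indicator exceeds tol.

    Assumption: the indicator is the centred difference |u[i+1] - u[i-1]| / 2,
    a simple stand-in for the error estimators used in real SAMR codes.
    """
    indicator = np.zeros_like(u)
    indicator[1:-1] = np.abs(u[2:] - u[:-2]) / 2.0
    return indicator > tol


def cluster_patches(flags: np.ndarray) -> list:
    """Group contiguous flagged cells into [start, end) patches to refine."""
    patches, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            patches.append((start, i))
            start = None
    if start is not None:
        patches.append((start, len(flags)))
    return patches


# A smoothed step: the solution varies sharply only near x = 0.5.
x = np.linspace(0.0, 1.0, 101)
u = np.tanh((x - 0.5) / 0.02)
flags = flag_cells(u, tol=0.05)
patches = cluster_patches(flags)
print(patches)                                      # [(47, 54)]
print(f"refining {flags.sum()} of {u.size} cells")  # refining 7 of 101 cells
```

Only 7 of 101 cells are refined here, which is the source of the cost/accuracy advantage over refining the whole uniform grid; as the feature moves, the flagged set (and hence the workload) changes.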
Conceptual Architectures: Tuning of Application
SAMR example
A 3-D compressible turbulence (RM3D) SAMR simulation at 256 × 64 × 64 resolution. The RM3D application serves as a representative of the class of simulations that exhibit significant dynamism and spatiotemporal heterogeneity.
Changes in the RM3D application physics create dynamically varying simulation workloads (the load at each grid point is assumed to be uniform). The peak total workload is about 8 times larger than the minimum total workload and over two times larger than the average total workload for this simulation.
Source: Sumir Chandra’s PhD thesis.
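With the load at each grid point assumed uniform, a patch's workload is simply its number of grid points, and load balancing after each regridding step reduces to distributing patches across processors. The Python sketch below uses the standard greedy longest-processing-time (LPT) heuristic as a swapped-in example; it is not claimed to be the partitioner used for RM3D, and the patch shapes and function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Patch = Tuple[int, int, int]  # (nx, ny, nz) grid points in a 3-D patch


def patch_workload(shape: Patch) -> int:
    """Workload of a patch, assuming the same load at every grid point."""
    nx, ny, nz = shape
    return nx * ny * nz


def balance(patches: List[Patch], nprocs: int) -> Dict[int, List[int]]:
    """Greedy longest-processing-time (LPT) assignment of patches to processors."""
    heap = [(0, p) for p in range(nprocs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {p: [] for p in range(nprocs)}
    # Place the largest patches first, each on the currently least-loaded processor.
    order = sorted(range(len(patches)), key=lambda i: patch_workload(patches[i]), reverse=True)
    for i in order:
        load, p = heapq.heappop(heap)
        assignment[p].append(i)
        heapq.heappush(heap, (load + patch_workload(patches[i]), p))
    return assignment


# Hypothetical patch shapes after one regridding step.
patches = [(32, 16, 16), (16, 16, 16), (16, 16, 16), (8, 8, 8), (8, 8, 8), (16, 8, 8)]
assignment = balance(patches, nprocs=2)
loads = {p: sum(patch_workload(patches[i]) for i in idx) for p, idx in assignment.items()}
print(loads)  # {0: 9216, 1: 9216}
```

Because the total workload varies over the run, such a rebalancing would be repeated whenever the mesh hierarchy changes.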
Conceptual Architectures: Tuning of Application
Coupled Fusion Simulation
A workflow of coupled simulation codes, i.e., the edge turbulence particle-in-cell (PIC) code (GTC) and the microscopic MHD code (M3D), which run simultaneously on separate HPC resources at supercomputing centers.
Data is streamed and processed en route, e.g. data from the PIC code is filtered through “noise detection” processes before it can be coupled with the MHD code.
Data must be streamed efficiently between the live simulations so that it arrives just in time: if it arrives too early, time and resources are wasted buffering it; if it arrives too late, the application wastes resources waiting for it.
Opportunistic use of in-transit resources.
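The just-in-time requirement can be made concrete on the producer side: estimate the transfer latency and start sending so that the data is expected to arrive exactly when the consumer needs it. The Python sketch below assumes an exponentially weighted moving average (EWMA) latency estimate with an arbitrary smoothing factor; `JustInTimeSender` and `classify_arrival` are hypothetical names, not part of the fusion workflow.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class JustInTimeSender:
    """Hypothetical producer-side scheduler for just-in-time data coupling.

    The transfer latency is tracked as an exponentially weighted moving
    average (EWMA); alpha is an assumed smoothing factor.
    """
    latency_estimate_s: float
    alpha: float = 0.3

    def observe(self, measured_latency_s: float) -> None:
        # Blend the newest latency measurement into the running estimate.
        self.latency_estimate_s = (
            self.alpha * measured_latency_s + (1.0 - self.alpha) * self.latency_estimate_s
        )

    def send_time(self, consumer_need_time_s: float) -> float:
        # Start the transfer so that the data is expected to arrive exactly when needed.
        return consumer_need_time_s - self.latency_estimate_s


def classify_arrival(arrival_s: float, need_s: float) -> Tuple[str, float]:
    """Early data must be buffered; late data leaves the consumer waiting."""
    if arrival_s < need_s:
        return "early (buffered)", need_s - arrival_s
    if arrival_s > need_s:
        return "late (consumer waits)", arrival_s - need_s
    return "just in time", 0.0


sender = JustInTimeSender(latency_estimate_s=2.0)
sender.observe(4.0)  # a slower transfer raises the estimate to about 2.6 s
start = sender.send_time(consumer_need_time_s=100.0)
print(round(start, 2))                       # 97.4
print(classify_arrival(start + 4.0, 100.0))  # late, because the link stayed slow
```

The second value returned by `classify_arrival` is the buffering time (early) or waiting time (late), i.e. the waste the slide describes in each case.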
Conceptual Architectures: Tuning of Application
Coupled Fusion Application
Vector | Mechanisms
Coordination | Peer-2-Peer interaction
Communication | Data streaming, events
Execution Environment | Storage selection (local/remote); resource selection/management; task migration; checkpointing; task execution (local/remote); dynamic provisioning (provisioning of in-transit storage/processing nodes)
Table: Tuning mechanisms in the Coupled Fusion Simulation application.
Conceptual Architectures: Tuning of Application
Coupled Fusion Application
Application Objective | Autonomic Strategy
Maintain latency-sensitive data delivery | Resource management: adaptive data buffering (time, size); adaptive buffering strategy; adaptive data transmission & destination selection
Maximize data quality | Resource management: opportunistic in-transit processing; adaptive in-transit buffering
Scientific fidelity | Algorithmic adaptivity: in-time data coupling; model correction using dynamic data; solver adaptations
Table: Coupled fusion simulation application management using autonomic strategies.
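"Adaptive data buffering (time, size)" can be pictured as a buffer that flushes when either a size or an age threshold is reached, with the size threshold adjusted from consumer feedback. The Python sketch below is one possible reading under assumptions: the halving/doubling rule, the default bounds and the class name `AdaptiveBuffer` are hypothetical, not the strategy used in the coupled fusion application.

```python
from typing import List, Optional


class AdaptiveBuffer:
    """Hypothetical size- and time-triggered buffer for streamed data.

    Chunks are flushed when the buffered bytes reach max_bytes or when the
    oldest chunk has waited max_age_s. adapt() halves the size threshold
    after the consumer stalled (favouring latency) and doubles it, up to
    cap_bytes, otherwise (favouring fewer, larger transfers).
    """

    def __init__(self, max_bytes: int, max_age_s: float,
                 min_bytes: int = 1024, cap_bytes: int = 1 << 20):
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.min_bytes = min_bytes
        self.cap_bytes = cap_bytes
        self.chunks: List[bytes] = []
        self.size = 0
        self.oldest_s: Optional[float] = None

    def add(self, chunk: bytes, now_s: float) -> List[bytes]:
        """Buffer a chunk; return the flushed chunks if a threshold is reached."""
        if self.oldest_s is None:
            self.oldest_s = now_s
        self.chunks.append(chunk)
        self.size += len(chunk)
        if self.size >= self.max_bytes or now_s - self.oldest_s >= self.max_age_s:
            return self.flush()
        return []

    def flush(self) -> List[bytes]:
        out = self.chunks
        self.chunks, self.size, self.oldest_s = [], 0, None
        return out

    def adapt(self, consumer_stalled: bool) -> None:
        if consumer_stalled:
            self.max_bytes = max(self.min_bytes, self.max_bytes // 2)
        else:
            self.max_bytes = min(self.cap_bytes, self.max_bytes * 2)


buf = AdaptiveBuffer(max_bytes=4096, max_age_s=1.0)
print(len(buf.add(b"x" * 2048, now_s=0.0)))  # 0: below both thresholds
print(len(buf.add(b"x" * 2048, now_s=0.2)))  # 2: size threshold reached
buf.adapt(consumer_stalled=True)
print(buf.max_bytes)                          # 2048
```

The age threshold bounds the delay added by buffering even when data trickles in slowly, which matters for latency-sensitive delivery.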
Conceptual Architectures: Tuning by Application
EnKF: Tuning by Application
Resource reservation to achieve particular QoS criteria.
Dynamic analysis of a data stream from a scientific instrument, which may also involve the analysis of video/audio feeds.
Distributed Autonomic Applications: Abstractions for Distributed Systems and Applications
Abstractions for Distributed Computing
Distributed Autonomic Applications: Abstractions for Distributed Systems and Applications
Abstractions for Distributed Computing - 2
[Figure: SAGA-based Glide-In (BigJob) architecture. An RE Application drives an RE-Manager built on the BigJob Abstraction Framework (SAGA File, SAGA CPR/Migol, SAGA Advert), which runs over the SAGA implementation/adaptors (Migol, Globus). On each of Resource 1 and Resource 2, a big-job hosts a Replica-Agent (SAGA Advert, SAGA CPR/Migol) that runs the replicas as sub-jobs.]
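The figure describes a pilot-job ("Glide-In") pattern: a big-job first acquires an allocation on each resource, and replica sub-jobs are then late-bound into whichever allocation has room, without a new batch request per replica. The plain-Python sketch below illustrates only that scheduling idea under assumptions. The classes `BigJob`, `SubJob` and `REManager` and the "most free cores" placement rule are hypothetical and are not the SAGA or BigJob library APIs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class SubJob:
    name: str
    cores: int


@dataclass
class BigJob:
    """Hypothetical pilot: one batch allocation of `cores` on one resource."""
    resource: str
    cores: int
    running: List[SubJob] = field(default_factory=list)

    def free_cores(self) -> int:
        return self.cores - sum(j.cores for j in self.running)


class REManager:
    """Hypothetical manager that late-binds replica sub-jobs to pilots."""

    def __init__(self, pilots: List[BigJob]):
        self.pilots = pilots
        self.waiting: Deque[SubJob] = deque()

    def submit(self, job: SubJob) -> None:
        self.waiting.append(job)
        self._schedule()

    def complete(self, job: SubJob) -> None:
        for pilot in self.pilots:
            if job in pilot.running:
                pilot.running.remove(job)
        self._schedule()  # freed cores are reused without a new batch request

    def _schedule(self) -> None:
        # FIFO: start waiting sub-jobs on the pilot with the most free cores.
        while self.waiting:
            job = self.waiting[0]
            pilot = max(self.pilots, key=BigJob.free_cores)
            if pilot.free_cores() < job.cores:
                break
            pilot.running.append(self.waiting.popleft())


manager = REManager([BigJob("Resource 1", 64), BigJob("Resource 2", 32)])
replicas = [SubJob(f"replica-{i}", 16) for i in range(8)]
for replica in replicas:
    manager.submit(replica)
print([len(p.running) for p in manager.pilots], len(manager.waiting))  # [4, 2] 2
manager.complete(replicas[0])
print(len(manager.waiting))  # 1
```

Spreading replicas across two resources is what allows scaling out across sites; a failed or slow pilot could be handled by re-queuing its sub-jobs, which this sketch omits.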
Distributed Autonomic Applications: Ensemble Kalman Filters
EnKF
Ensemble Kalman filters are recursive filters for handling large, noisy data; the data can be the results and parameters of ensembles of models.
The required run time varies across models, which impacts the overall result: each model run needs to converge before the next stage can begin.
[Figure: ensemble members 1 … n run in each of Stage 1, Stage 2 and Stage 3, with a Kalman filter (KF) step between stages.]
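The staged structure (run all ensemble members, wait for every one of them, then apply the filter) can be sketched in Python with NumPy. The analysis step below is the standard stochastic (perturbed-observation) EnKF update; the toy linear model, the noise levels, the observations and the names `forecast`, `enkf_analysis` and `run_stage` are assumptions for illustration, and local threads merely stand in for distributed model runs.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Toy linear model (an assumption): the first state component drifts with the second.
MODEL = np.array([[1.0, 0.1], [0.0, 1.0]])


def forecast(member: np.ndarray, seed: int) -> np.ndarray:
    """Hypothetical model run for one ensemble member, with additive model noise."""
    rng = np.random.default_rng(seed)
    return MODEL @ member + rng.normal(0.0, 0.05, size=member.shape)


def enkf_analysis(X, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X: (n_state, n_members) forecast ensemble; y: (n_obs,) observation;
    H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) observation covariance.
    """
    n = X.shape[1]
    anomalies = X - X.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n - 1)   # sample forecast covariance
    S = H @ P @ H.T + R                     # innovation covariance
    K = np.linalg.solve(S, H @ P).T         # Kalman gain P H^T S^-1 (P and S symmetric)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n).T
    return X + K @ (Y - H @ X)


def run_stage(X, y, H, R, stage, rng):
    seeds = [stage * 1000 + j for j in range(X.shape[1])]
    # Members may take different times; list(pool.map(...)) returns only after
    # every member has finished, which is the barrier before the analysis.
    with ThreadPoolExecutor() as pool:
        members = list(pool.map(forecast, X.T, seeds))
    return enkf_analysis(np.column_stack(members), y, H, R, rng)


rng = np.random.default_rng(0)
H = np.array([[1.0, 0.0]])               # only the first component is observed
R = np.array([[0.01]])                   # observation-error covariance
X = rng.normal(0.0, 1.0, size=(2, 50))   # initial ensemble of 50 members
for stage, y in enumerate([np.array([1.0]), np.array([1.1]), np.array([1.2])], start=1):
    X = run_stage(X, y, H, R, stage, rng)
print(X.mean(axis=1))
```

Because the analysis cannot start until the slowest member finishes, run-time variation across models directly stretches each stage, which is what makes adaptive resource selection attractive here.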