Easy Deployment for Jungle Computing
Niels Drost
Computer Systems Group, Department of Computer Science
VU University, Amsterdam, The Netherlands
Requirements
● Resource independence
● Transparent / easy deployment
● Middleware independence & interoperability
● Jungle-aware middleware
● Jungle-aware communication
● Robust connectivity
● Globally unique naming
● System-support for malleability and fault-tolerance
● Transparent parallelism & application-level fault-tolerance
● Easy integration with external software
  ● MPI, OpenCL, CUDA, C, C++, scripts, …
ComplexHPC Spring School 2011
Deployment
● How to get your application running in the Jungle
● For each resource used:
  ● Find resource
  ● Reserve resource
  ● Copy input files (and possibly application itself)
  ● Configure/Compile application
  ● Run application
  ● Copy back output files
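The per-resource steps above can be sketched as a small driver loop. This is only an illustration: the step names, the `deploy` function, and the `actions` callables are hypothetical stand-ins for real middleware calls, not part of any tool named in these slides.

```python
# The six per-resource steps from the slide, in order (names are illustrative).
STEPS = ["find", "reserve", "copy_input", "configure", "run", "copy_output"]


def deploy(resources, actions):
    """Run each step on each resource; a resource stops at its first failing step.

    `actions` maps a step name to a callable that takes a resource name and
    returns True on success. Returns, per resource, the steps that completed.
    """
    results = {}
    for resource in resources:
        completed = []
        for step in STEPS:
            if not actions[step](resource):
                break
            completed.append(step)
        results[resource] = completed
    return results


def always_ok(resource):
    return True


def reserve_fails_on_cluster_b(resource):
    # Simulate one site where the reservation step fails.
    return resource != "cluster-b"


actions = {step: always_ok for step in STEPS}
actions["reserve"] = reserve_fails_on_cluster_b
results = deploy(["cluster-a", "cluster-b"], actions)
# cluster-a completes all six steps; cluster-b stops after "find".
```

Even this toy version shows why deployment effort grows with the number of resources: every step has to succeed on every resource, and each one may need a different middleware call behind it.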
Middleware
● Resources invariably use some sort of middleware
● Provides remote access to resources
  ● File copy, running applications, etc.
● Many different middleware available:
  ● Globus (de facto standard, in 4 flavors)
  ● gLite, NAREGI, UNICORE, Legion
  ● SSH (poor man’s middleware)
Problems (1): Too little Middleware
● All resources need to have some middleware
● Hard to install
● Hard to maintain
● Low fault-tolerance
● Assumes a very static setup
A full-fledged middleware on a resource may require an almost full-time maintainer
Problems (2): Too much Middleware
● Jungle computing applications use multiple different resources
● With different middleware
● With wildly different interfaces
● Which are too low level
Using multiple different resources at the same time is nigh impossible using middleware directly
Problems (3): Too much everything
● Large number of steps required to deploy an application
● Middleware level interface too low level for users
● Deploying an application requires the user to write another application!
● Users want to simply “press a button” to deploy
Deployment is not very user friendly
Ibis Software Stack
[Figure: Ibis software stack diagram, with parts labeled 3, 2, 1]
Zorilla: A P2P Middleware
Current middleware
● Hard to install and maintain
● Centralized implementation (not very fault-tolerant)
● Usually no global functionality
  ● No global file system
  ● No co-allocation (though Koala could also fix this)
  ● Not even possible unless exactly the same middleware everywhere
Zorilla
● Alternative middleware developed at the VU
● Based on Peer-to-Peer (P2P) technology
● Little to no configuration
● Highly fault-tolerant
● Trust issues
● Hardly any requirements (JVM)
● Easy to install, little to no maintenance
● Explicitly supports Jungle computing applications
● Plays nice with existing middleware
● Prototype
Life of a Job (1)
Life of a Job (2)
Life of a Job (3)
Life of a Job (4)
Zorilla Overview
[Figure: Zorilla overview diagram; includes the label "Clouds"]
Zorilla Components (1)
● Bootstrap
  ● Initial set of contact points
  ● UDP broadcast or provided by user
● Gossip overlay network
  ● Actualized Robust Random Gossip (ARRG)
  ● Withstands Firewalls et al.
● Clustering
  ● Nearest neighbor list
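The slide names ARRG but gives no protocol details, so the following is only a generic gossip-style peer exchange, not ARRG itself. It sketches how nodes that start out knowing a single bootstrap contact can spread knowledge of each other. The cache size, node names, and the `gossip_round` and `trim` helpers are all assumptions made for illustration.

```python
import random


def trim(peers, cache_size, rng):
    """Keep at most cache_size peers, chosen at random."""
    peers = sorted(peers)
    if len(peers) <= cache_size:
        return set(peers)
    return set(rng.sample(peers, cache_size))


def gossip_round(caches, cache_size, rng):
    """One round: every node swaps peer lists with one random peer it knows."""
    for node in list(caches):
        if not caches[node]:
            continue  # nobody to talk to yet
        peer = rng.choice(sorted(caches[node]))
        # Both sides learn the other's contacts plus each other, never themselves.
        merged_for_node = (caches[node] | caches[peer] | {peer}) - {node}
        merged_for_peer = (caches[peer] | caches[node] | {node}) - {peer}
        caches[node] = trim(merged_for_node, cache_size, rng)
        caches[peer] = trim(merged_for_peer, cache_size, rng)


# Bootstrap: every node knows only the contact point "n0"; "n0" knows nobody.
rng = random.Random(42)
nodes = [f"n{i}" for i in range(8)]
caches = {node: {"n0"} for node in nodes if node != "n0"}
caches["n0"] = set()

for _ in range(5):
    gossip_round(caches, cache_size=4, rng=rng)
```

Because each cache is bounded and refreshed at random, no single node has to know the whole network, which is what lets this style of overlay run with little to no configuration.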
Zorilla Components (2)
● Flood scheduling
  ● Incrementally search for resources at more and more distant nodes
● Job Management
  ● Status (scheduling, running, done, etc.)
  ● File transfers
  ● Malleability / crashes
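The flood scheduling idea, searching at more and more distant nodes, can be illustrated with an expanding-radius search over a neighbor graph. This is a centralized simulation under assumed data structures (`neighbors`, `free_slots`); a real P2P system would send messages with a hop limit rather than walk a global graph, and the slides do not specify Zorilla's actual algorithm.

```python
from collections import deque


def nodes_within(neighbors, origin, radius):
    """Return {node: hop distance} for all nodes at most `radius` hops away."""
    seen = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if seen[node] == radius:
            continue  # do not expand past the current radius
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


def flood_schedule(neighbors, free_slots, origin, needed, max_radius):
    """Grow the search radius until `needed` slots are found, else return None."""
    for radius in range(max_radius + 1):
        reachable = nodes_within(neighbors, origin, radius)
        chosen = []
        remaining = needed
        # Prefer nearby nodes: order by hop distance, then name for determinism.
        for node in sorted(reachable, key=lambda n: (reachable[n], n)):
            take = min(free_slots.get(node, 0), remaining)
            if take > 0:
                chosen.append((node, take))
                remaining -= take
            if remaining == 0:
                return radius, chosen
    return None


neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
free_slots = {"a": 1, "b": 2, "c": 0, "d": 4}
found = flood_schedule(neighbors, free_slots, "a", needed=5, max_radius=3)
# found == (2, [("a", 1), ("b", 2), ("d", 2)]): radius 2 was needed to reach "d".
```

Starting with a small radius keeps a job on nearby nodes when they suffice, and only widens the search when local capacity runs out.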
Resource Discovery: ARRG
Resource Discovery: Clustering
Resource Discovery: Flood scheduling
Conclusions
● Current middleware is hard to install and maintain…
● …and does not offer the global functionality required by Jungle Computing applications
● Zorilla is a lightweight P2P alternative, offering zero maintenance, easy install, and explicit support for parallel applications
JavaGAT: Middleware independent API
Requirements
● Resource independence
● Transparent / easy deployment
● Middleware independence & interoperability
● Jungle-aware middleware
● Jungle-aware communication
● Robust connectivity
● Globally unique naming
● System-support for malleability and fault-tolerance
● Transparent parallelism & application-level fault-tolerance
● Easy integration with external software
  ● MPI, OpenCL, CUDA, C, C++, scripts, …
Typical Grid/Cloud Application
[Figure: an Application calls submitJob(...) and File.copy(...); which backend should serve each call is an open question ("?")]
● submitJob(...) backends: fork, pbs, condor, unicore, globus
● File.copy(...) backends: cp, ftp, gridftp, scp, http
Which Middleware do I use?
● A lot to choose from
● Some may not work on all sites
● Most are hard to use
● Interfaces change often
● Globus? (Obvious choice 3 years ago)
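One way to spare the application this choice is an adaptor layer: the application calls a single function, and the layer tries each middleware-specific adaptor in turn. The sketch below illustrates only that pattern. It is not the JavaGAT API, and the adaptor functions, the URL schemes they accept, and `AdaptorError` are hypothetical.

```python
class AdaptorError(Exception):
    """Raised by an adaptor that cannot handle a request."""


def copy_with_scp(source, destination):
    # Hypothetical adaptor: pretend scp only handles ssh:// destinations.
    if not destination.startswith("ssh://"):
        raise AdaptorError("scp: unsupported destination")
    return f"scp copied {source} to {destination}"


def copy_with_gridftp(source, destination):
    # Hypothetical adaptor: pretend gridftp only handles gsiftp:// destinations.
    if not destination.startswith("gsiftp://"):
        raise AdaptorError("gridftp: unsupported destination")
    return f"gridftp copied {source} to {destination}"


def copy_file(source, destination, adaptors):
    """Try each adaptor in order; return the first success, else report all failures."""
    failures = []
    for adaptor in adaptors:
        try:
            return adaptor(source, destination)
        except AdaptorError as error:
            failures.append(str(error))
    raise AdaptorError("no adaptor succeeded: " + "; ".join(failures))


ADAPTORS = [copy_with_scp, copy_with_gridftp]
message = copy_file("in.dat", "gsiftp://host/in.dat", ADAPTORS)
# scp declines the gsiftp:// URL, so the gridftp adaptor handles the copy.
```

With this shape, the application code stays the same when a site changes its middleware or an interface changes; only the list of adaptors has to be updated.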