Flooding is one of the natural disasters with the greatest impact in Vietnam: damage inflicted within minutes or hours can have long-term effects that people must cope with. Besides flood inundation, there is also salinization in the Mekong Delta, Vietnam. Over 2,300 hectares of sugarcane plantation have sustained damage, accounting for over 30 percent of the total crop.
Based on physics, flood flow and seawater salinization can both be treated as fluid flows with a free surface, so we can simulate the water depth and the movement of a flood through the variation of the height of that free surface. TELEMAC is an open-source software package based on the finite element method (FEM) that implements large-scale simulations of this kind.
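For reference, TELEMAC-2D computes the free surface by solving the depth-averaged shallow-water (Saint-Venant) equations; a commonly quoted non-conservative form (as in the TELEMAC-2D theory documentation) is

\[
\frac{\partial h}{\partial t} + \mathbf{u}\cdot\nabla h + h\,\nabla\cdot\mathbf{u} = S_h,
\]
\[
\frac{\partial u}{\partial t} + \mathbf{u}\cdot\nabla u = -g\,\frac{\partial Z}{\partial x} + S_x + \frac{1}{h}\,\nabla\cdot\!\left(h\,\nu_t\,\nabla u\right),
\]
\[
\frac{\partial v}{\partial t} + \mathbf{u}\cdot\nabla v = -g\,\frac{\partial Z}{\partial y} + S_y + \frac{1}{h}\,\nabla\cdot\!\left(h\,\nu_t\,\nabla v\right),
\]

where h is the water depth, u and v the depth-averaged velocity components, Z the free-surface elevation, g gravity, \nu_t the momentum diffusion coefficient, and S_h, S_x, S_y source terms. Salinization is handled by an additional tracer-transport equation of the same family.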
The government requires solutions for estimating the effects of flood and salinization. To address these problems, the environment team needs to simulate a large number of possible situations. Running the simulations sequentially normally takes over 15 hours, and running multiple tasks on one physical machine is not possible because of the lack of isolation. Therefore, we need a computing platform that supports multiple users simultaneously and keeps the execution time as short as possible.
Many virtualization technologies exist today. One of them is the virtual machine, built on hypervisor-based virtualization. More recently, a new platform based on lightweight virtualization technology has appeared: Docker. The key differences between these two platforms are their architecture and their operating mechanism.
1. Virtual machines include the application, the necessary binaries and libraries, and an entire guest operating system, all of which can amount to tens of GBs.
2. Containers include the application and all of its dependencies, but share the kernel with other containers, running as isolated processes in user space on the host operating system.
Docker containers are not tied to any specific infrastructure: they run on any computer, on any infrastructure, and in any cloud.
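The shared-kernel property in item 2 is easy to verify (a small illustration; ubuntu:16.04 is just an arbitrary public image):

```sh
# A container reports the *host's* kernel release: there is no guest kernel.
uname -r                                # on the host
docker run --rm ubuntu:16.04 uname -r   # same output from inside a container
```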
In short, Docker can provide a computing environment for all applications, much like a virtual machine, and a Docker container also has an isolated runtime space. However, its architecture implies a drawback: containers can conflict over resources shared with the host or with other containers. This can indeed occur.
When deploying environmental tools on Docker, we propose a model with two main parts (a minimal sketch follows):
1. the container is the passive part, acting as the running environment;
2. the host is the active part, containing the code, data, and libraries needed at runtime.
With this use of Docker's architecture, we gain the benefits of a computing platform while reducing overhead.
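A minimal sketch of this passive/active split, assuming a pre-built TELEMAC image (the image name telemac:v7, the paths, the case file, and the telemac2d.py launcher invocation are assumptions):

```sh
# The host holds the code, data and libraries (the active part); the
# container is only the passive runtime that executes them via a bind mount.
# All names below are illustrative.
docker run --rm -v /home/user/telemac_cases:/workdir -w /workdir \
    telemac:v7 telemac2d.py flood_mekong.cas
```

Because the image stays generic, the same image can serve every user while each user's code and data remain on the host.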
We examine the parallelization and speed-up achieved with TELEMAC simulations on Docker containers, VMs, and the PM. We deploy two VMs and two Docker containers on each compute node. TELEMAC runs a problem simulating a general flow with 100,000 elements in parallel mode (roughly 4,000 km2 in the real world; a small test for simulating salinization in the Mekong Delta). Docker's performance stays close to the PM's, while the VM deviates further. Furthermore, this test case shows that giving an HPC application more cores does not necessarily make the system more efficient: the drop in speed-up between 30 and 32 cores is a limit of TELEMAC on this particular problem, likely because with only 100,000 elements the per-core subdomains become small and communication between them starts to dominate, so the best core count may change for other simulation problems. Beyond that point, the execution time actually goes up and then levels off toward a constant as more cores are used.
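The speed-up and parallel efficiency behind this comparison are the standard ratios

\[
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p},
\]

where T_1 is the execution time on one core and T_p the time on p cores; an efficiency E(p) that falls as p grows is the usual signature of communication overhead outweighing the extra compute.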
Next, we evaluate the PM, VM, and Docker with TELEMAC simulating different problem sizes, from small scale to large scale. Here we are concerned with how performance, and overhead as well, change across the range of problem sizes. All runs use 32 cores.
This scenario covers the use of our system by multiple users with different simulation modules. We set up a series of simulation modules based on the sample examples of the TELEMAC system. The scenario launches an increasing number of VM/Docker instances, each instance representing a user running TELEMAC, and each performing a simulation module different from the others (a scripted sketch follows). The cost of managing VMs can rise sharply with the number of instances, because the hypervisor adds extra overhead for each VM's workspace even when no job is running. In contrast, Docker's sharing architecture pays off here: there is no such extra overhead, and instances can share resources, since not all of them occupy resources continuously over a long period.
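A hedged sketch of how this multi-user scenario can be scripted (the module names come from the TELEMAC-2D examples, but the image name and paths are placeholders):

```sh
# Launch one container per simulated user; each runs a different
# TELEMAC-2D example module. Image name and paths are illustrative.
for module in gouttedo malpasset bumpflu breach; do
    docker run -d --name "user_${module}" telemac:v7 \
        telemac2d.py "examples/telemac2d/${module}/t2d_${module}.cas"
done
```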
We evaluate the performance and efficiency of Docker and VMs with real data from environmental simulations. The problems at hand are flood and salinization. We measure the execution time of both problems on the PM, VM, and Docker, with the PM figures serving as the baseline for judging the efficiency of Docker and the VM. It is clear that Docker takes less time to run the simulations than the VM does.
This figure shows the results of simulating salinization and flood inundation in Vietnam using TELEMAC-2D; from these results, mitigation solutions can be proposed. Left side: options 1 and 2 are the best places for flood barriers because the water level there is higher than the standard point (the Z scale is height above sea level). Right side: a salinization level of 4 g/l can cause crop failure, specifically in the paddy fields; it depends on the amount of salt deposited on the ground by earlier surges.
This case highlights that if we run all jobs simultaneously, containers can conflict over resources. By default, with no restriction configured on resource allocation, Docker containers are governed by a time-sharing mechanism: each container occupies the CPU for 100 ms before the context switches to another container. Consequently, several containers may end up sharing the same processors even while other processors go unused. We tested Docker with larger time slices (0.5 s, 1 s, 2 s), but the results were no better. We also refer to the ISC 2016 paper "Resource Management for Running HPC Applications in Container Clouds": their mechanism, based on serializing the applications in containers, is efficient when CPU resources are oversubscribed, and the optimal time slice they report is 10-15 s. This is not really feasible in our situation, because such a large time slice guarantees neither the response time nor the correctness of the simulations.
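The 100 ms figure is the default period of the Linux CFS bandwidth controller (cfs_period_us = 100000). Docker exposes it through --cpu-period and --cpu-quota, so a larger slice can be approximated as follows (values illustrative; note that Docker caps --cpu-period at 1 s):

```sh
# Stretch the CFS accounting period from the default 100 ms to 1 s while
# granting the container the equivalent of 4 cores (quota = 4 x period).
# Container name, image and command are illustrative.
docker run -d --cpu-period=1000000 --cpu-quota=4000000 \
    --name flood_run telemac:v7 telemac2d.py flood_mekong.cas
```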
The scheduling algorithm keeps track of which CPUs have lighter loads and picks them for execution (using --cpuset, --cpu-shares, and similar options). We use PBS Pro to implement this dynamic scheduling algorithm.
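A minimal shell sketch of that idea, assuming sysstat's mpstat is installed (the selection logic, core count, and image name are illustrative, not the PBS Pro implementation):

```sh
# Pick the 4 idlest cores (highest %idle in mpstat's "Average:" lines)
# and pin a new container to them. All names are illustrative.
CORES=$(mpstat -P ALL 1 1 \
        | awk '$1 == "Average:" && $2 ~ /^[0-9]+$/ {print $NF, $2}' \
        | sort -rn | head -n 4 | awk '{print $2}' | paste -sd, -)
docker run -d --cpuset-cpus="$CORES" --cpu-shares=1024 \
    --name sim_user1 telemac:v7 telemac2d.py flood_mekong.cas
```

In production this decision is delegated to PBS Pro, which already tracks per-node load.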