
Optimizing Client-side Resource Utilization in Public Clouds



Swapnil Haria, Mihir Patil, Haseeb Tariq, Anup Rathi, Michael Swift
Department of Computer Sciences
University of Wisconsin-Madison
Madison, WI, USA
{swapnilh, mihir, haseeb, amrathi, swift}@cs.wisc.edu

Abstract: Public clouds such as Amazon's Elastic Compute Cloud (EC2) and the Google Compute Engine are being increasingly used by organizations as well as individuals to rapidly set up infrastructure to deploy their applications. While these cloud providers promise flexibility and performance to users, efficient and cost-effective management of Virtual Machine (VM) clusters is difficult to achieve in practice. The coarse granularity of resource allocation and dynamically varying resource requirements of applications result in excess costs due to VM under-utilization or performance degradation on undersized VMs. The growing number of applications being moved to such cloud environments magnifies the scope of this problem.

We propose a Client-side Resource Manager which uses on-demand application migration and dynamic migration policies to improve VM utilization, and extract the best possible performance from a VM cluster for a given operating cost. Two migration policies are developed and evaluated using a new simulator, created by us to rapidly validate migration policies over longer time-frames. We demonstrate improved cost-efficiency of 25% over the conventional approach.

I. INTRODUCTION

Cloud computing is a promising new technology aimed at helping clients decouple software services from hardware infrastructure. In April 2015, Amazon reported that its Amazon Web Services (AWS) division had hit annual revenues of $5 billion, and was growing rapidly at about 49% each year [4]. This spectacular rise of cloud platforms can be attributed to on-demand availability of hardware, support for various usage patterns and elimination of infrastructure management costs [6].

In the Infrastructure as a Service (IaaS) model dominant today, these end-users lease resources in fixed-sized chunks of compute, memory, and I/O units (such as instances in the Amazon EC2 cloud). However, reasoning about the resource requirements of applications is tough, especially for front-end applications like web servers whose requirements vary widely with the incoming load. Real-world observations demonstrate up to 5x CPU and 2x memory resource over-provisioning for applications in analogous situations like submitting jobs to a datacenter [11]. This benefits cloud operators, who use techniques like memory sharing and overcommitment as well as virtual CPU multiplexing to improve the utilization of their physical infrastructure [22].

The Resource as a Service model [2] has been proposed to increase the granularity of resource allocation in the cloud environment, primarily to ensure greater value for the client's money. While this helps avoid the cost of paying for unused resources, it is hard for developers and users to reason about increasingly finer amounts of resources. Other solutions include hybrid resource provisioning strategies [8, 10] to optimize the cost efficiency of cloud computing for the end user. These are limited solutions, as they tackle the problem of choosing between on-demand and reserved resources for incoming jobs to minimize running costs. However, VM utilization needs to be considered to guarantee cost efficiency, especially for long-running VM clusters in the cloud.

We believe that the problem is not in the IaaS model itself, but in the lack of control available to the end users to optimize their purchased resources. Our approach tackles this through the use of on-demand application-level migration. Application-level migration, or process migration as it used to be called, has proven to be tough to implement practically due to issues of residual dependencies and transparency [19]. Fortunately, cloud-based operating systems like Drawbridge [20] and OSv [14] have increased process isolation and narrower process-kernel interactions, which result in better application mobility. These OSes had been initially proposed to optimize the footprint of operating systems in the cloud, by building on the library OS model [5, 12]. As a result, our implementation is intended for use in such environments, and leverages the improved application mobility to facilitate efficient application-level migration.

In this paper, we propose CSRM, a client-side resource manager for optimizing VM utilization and improving the cost-efficiency of cloud computing for the end-user. CSRM uses on-demand application-level migration to shift processes to larger virtual machines (VMs) during peak demands (scale-up), or relocate processes to smaller VMs during times of light load (scale-down). This avoids the need to kill and restart processes on larger VMs on exceeding the estimated memory demand, for which existing progress gets discarded. The resource manager is also responsible for managing the number and size of VMs (scaling out or in) needed to adequately satisfy the performance requirements of these jobs.

Our contributions can be summarized as follows:
• Discussion of the opportunities and challenges in streamlining public cloud usage,
• Conception of the CSRM framework to improve VM utilization and cost-efficiency,
• Development of a simulator to rapidly evaluate the various migration policies,
• Description and evaluation of two migration policies.

This paper is structured as follows. Section II describes the feasibility issues associated with process migration, and explains how cloud OS environments are conducive for such migration. In Section III, we discuss the challenges in efficiently managing a VM cluster in a public cloud. The software architecture of CSRM is presented in Section IV, and the migration policies are detailed in Section V. Section VI describes our methodology, and the various migration policies are evaluated in Section VII. Section VIII is a summary of the related work, and we outline future directions while concluding in Section IX.

II. BACKGROUND

A. Process Migration

Process migration is the act of transferring a process along with all of its data and state to another machine and resuming execution on that machine. There are many applications of process migration such as load balancing in clusters, providing fault resilience and exploiting data locality in NUMA and other distributed systems [19]. Unfortunately, there are many challenges in adding support for process migration.

The major issues in supporting application migration are the complexity of the implementation and handling the dependencies of an application with the operating system. The level at which the support is added, kernel level or user space, has consequences for the complexity, performance and transparency of the migration. User space implementations are simpler, and have better knowledge of the application's behavior but can suffer from reduced transparency and performance. Transparency requires that neither the migrated tasks nor any other interacting tasks notice the effects of the migration.

Communications can be delayed but not disrupted, and IO channels and file descriptors should be preserved across migrations. Transparency introduces complexity, and there are trade-offs between transparency and performance as well. Particularly for networked applications, redirecting communication through old links after migration leads to residual dependencies. The forwarding costs can become significant as the number of migrations increases.

Furthermore, different applications are impacted differently by migration. Migration delays are tolerable for long running applications but could be prohibitive for short running and latency critical applications. As discussed in the next section, some of these issues are mitigated in library OSes.

B. Cloud Operating Systems

Currently, operating systems in cloud computing environments can be categorized in two groups. The majority of the virtual machines in the cloud run existing general purpose operating systems (either unmodified or slightly modified in the case
