International Journal of Computer Applications (0975 – 8887) Volume 34 – No.9, November 2011

Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments

B. Thirumala Rao, Associate Professor, Dept. of CSE, Lakireddy Bali Reddy College of Engineering
Dr. L.S.S. Reddy, Professor & Director, Dept. of CSE, Lakireddy Bali Reddy College of Engineering

ABSTRACT
Cloud computing is emerging as a new computational paradigm shift. Hadoop MapReduce has become a powerful computation model for processing large data sets on distributed commodity hardware clusters such as clouds. All Hadoop implementations provide the default FIFO scheduler, which schedules jobs in FIFO order, along with support for other priority-based schedulers. In this paper we study various scheduler improvements possible with Hadoop and also provide some guidelines on how to improve scheduling in Hadoop in cloud environments.

Keywords
Cloud Computing, Hadoop, HDFS, MapReduce

1. INTRODUCTION
Cloud computing [1] refers to the use of shared computing resources to deliver computing as a utility, and serves as an alternative to having local servers handle computation. Cloud computing groups together large numbers of commodity hardware servers and other resources to offer their combined capacity on an on-demand, pay-as-you-go basis. The users of a cloud need not know where the servers are physically located and can simply start working with their applications. This is the primary advantage of cloud computing, which distinguishes it from grid or utility computing. The concept behind cloud computing is not new. In the 1960s, John McCarthy envisioned that "computing facilities will be provided to the general public like a utility". The word "cloud" had already been used in various contexts, such as describing large ATM networks in the 1990s. Nevertheless, it was only after Google's CEO Eric Schmidt used the term in 2006 to describe the business model of providing services over the Internet that it gained wide currency. Since then, the term "cloud computing" has been used mainly as a marketing term. The lack of a standard definition of cloud computing has generated a fair amount of uncertainty and confusion, and for this reason significant work has been done on standardizing it; there are over 20 different definitions from a variety of sources. In this paper, we adopt the definition of cloud computing provided by the National Institute of Standards and Technology (NIST), as it covers, in our opinion, all the essential aspects of cloud computing [2]: "Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."

The cloud computing concept is also motivated by recent data demands, as the amount of data stored on the web has been increasing drastically. The computing resources (e.g. servers, storage, and services) in a cloud can automatically be scaled up to meet the dynamic demands of users through virtualization and distributed system technology. In addition, a cloud provides redundancy and backup features to overcome hardware failures. In cloud environments, data processing has become an important research problem. As the cloud is a proper distributed system platform, parallel programming models like MapReduce [4] are widely used for developing scalable and fault-tolerant applications deployable on clouds.

The rest of the paper is organized as follows: section 2 summarizes Hadoop, and various current schedulers are discussed in section 3. Hadoop scheduler improvements are discussed in section 4. Finally, we conclude with a discussion of future work in section 5.

2. HADOOP
Hadoop has been successfully used by many companies, including AOL, Amazon, Facebook, Yahoo and the New York Times, for running their applications on clusters. For example, AOL used it to run an application that analyzes the behavioral patterns of its users in order to offer targeted services. Apache Hadoop [3] is an open-source implementation of Google's MapReduce [4] parallel processing framework. Hadoop hides the details of parallel processing, including data distribution to processing nodes, restarting of failed subtasks, and consolidation of results after computation. This framework allows developers to write parallel processing programs that focus on their computation problem rather than on parallelization issues. Hadoop includes 1) the Hadoop Distributed File System (HDFS), a distributed file system that stores large amounts of data with high-throughput access on clusters, and 2) Hadoop MapReduce, a software framework for distributed processing of data on clusters.
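The map/shuffle/reduce structure that Hadoop manages on the developer's behalf can be illustrated with a minimal, single-machine sketch. The following Python word count is purely illustrative: the function names (`map_fn`, `shuffle`, `reduce_fn`, `word_count`) and the in-memory shuffle are assumptions for this sketch, not Hadoop APIs, and a real Hadoop job would distribute the splits and the shuffle across cluster nodes.

```python
from collections import defaultdict
from itertools import chain

# Map phase: emit (word, 1) pairs for each input line ("split").
def map_fn(line):
    return [(word, 1) for word in line.split()]

# Shuffle phase: group intermediate values by key, as the framework
# would do between the map and reduce stages.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts emitted for each word.
def reduce_fn(key, values):
    return key, sum(values)

def word_count(lines):
    intermediate = chain.from_iterable(map_fn(line) for line in lines)
    return dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())

# Example: two input splits processed by the sketch.
counts = word_count(["hadoop maps data", "hadoop reduces data"])
print(counts)  # {'hadoop': 2, 'maps': 1, 'data': 2, 'reduces': 1}
```

In Hadoop itself, only the map and reduce functions are user code; data distribution, the shuffle, and restarting of failed subtasks are handled by the framework, as described above.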