Engineering Macro-Area, Department of Civil Engineering and Computer Science Engineering
Systems for Resource Management
Course on Systems and Architectures for Big Data, Academic Year 2019/2020
Valeria Cardellini
Master's Degree in Computer Engineering

The reference Big Data stack
[Figure: the reference Big Data stack, with layers High-level Interfaces, Support / Integration, Data Processing, Data Storage, Resource Management]
Valeria Cardellini - SABD 2019/2020 1
Outline
• Cluster management system
  – Mesos
• Resource management policy
  – DRF

Motivations
• Rapid innovation in cluster computing frameworks
• No single framework is optimal for all Big Data applications
• Running each framework on its own dedicated cluster:
  – is expensive
  – makes it hard to share data among frameworks
A possible solution
• Run multiple frameworks on a single cluster
• How to share the (virtual) cluster resources among multiple, heterogeneous frameworks executed in virtual machines/containers?
• The classical solution: static partitioning
• Is it efficient?

What we need
• “The datacenter is the computer” (D. Patterson)
  – Share resources to maximize their utilization
  – Share data among frameworks
  – Provide a unified API to the outside
  – Hide the internal complexity of the infrastructure from applications
• The solution: a cluster-scale resource manager that employs dynamic partitioning
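A toy calculation shows why static partitioning answers the "Is it efficient?" question poorly. The numbers are hypothetical: a 100-core cluster shared by two frameworks whose demand fluctuates over four time slots. Under static partitioning each framework owns a fixed 50 cores; under dynamic partitioning the cores one framework leaves idle can be used by the other. The model ignores data locality, preemption, and scheduling overhead; it is a sketch, not a measurement.

```python
def served_static(capacity, demands):
    """Core-slots served when each framework owns a fixed, equal slice of the cluster."""
    share = capacity // len(demands)
    return sum(min(d, share) for series in demands for d in series)


def served_dynamic(capacity, demands):
    """Core-slots served when, at every time slot, the whole cluster is one shared pool."""
    return sum(min(sum(slot), capacity) for slot in zip(*demands))


CAPACITY = 100  # CPU cores in the cluster (hypothetical)
DEMANDS = [
    [80, 20, 60, 10],  # framework A: cores wanted in four time slots (hypothetical)
    [10, 70, 30, 90],  # framework B: cores wanted in the same slots (hypothetical)
]

if __name__ == "__main__":
    total = CAPACITY * len(DEMANDS[0])  # core-slots available overall
    print(f"static:  {served_static(CAPACITY, DEMANDS) / total:.1%}")
    print(f"dynamic: {served_dynamic(CAPACITY, DEMANDS) / total:.1%}")
```

With these numbers static partitioning serves 270 of the 400 available core-slots (67.5%), while dynamic partitioning serves 370 (92.5%): the cluster was never short of cores, only the fixed slices were.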
Apache Mesos
• Cluster manager that provides a common resource sharing layer over which diverse frameworks can run
  “Program against your datacenter like it’s a single pool of resources”
  – Abstracts the entire datacenter into a single pool of computing resources, simplifying the running of distributed systems at scale
  – A distributed system on top of which fault-tolerant and elastic distributed systems can be built and run
[Figure: dynamic partitioning]

Apache Mesos
• Originally designed and developed at UC Berkeley; now a top-level Apache open-source project (mesos.apache.org)
• Used by Twitter, Uber, and Apple (Siri), among others
• Cluster: a dynamically shared pool of resources
[Figure: static partitioning vs. dynamic partitioning]
Mesos goals
• High utilization of resources
• Support for diverse frameworks (current and future)
• Scalability to tens of thousands of nodes
• Reliability in the face of failures

Mesos in the data center
• Where does Mesos fit as an abstraction layer in the datacenter?
Computation model
• A framework (e.g., Hadoop, Spark) manages and runs one or more jobs
• A job consists of one or more tasks
• A task (e.g., map, filter) consists of one or more processes running on the same machine

What Mesos does
• Enables fine-grained sharing of resources (CPU, RAM, …) across frameworks, at the level of tasks within a job
• Provides common functionalities:
  – Failure detection
  – Task distribution
  – Task starting
  – Task monitoring
  – Task killing
  – Task cleanup
Fine-grained sharing
• Allocation at the level of tasks within a job
• Improves utilization, latency, and data locality
[Figure: coarse-grained sharing vs. fine-grained sharing]

Frameworks on Mesos
• Frameworks must be aware that they are running on Mesos
  – DevOps tooling: Vamp
    • Deployment and workflow tool for container orchestration
  – Long-running services: Aurora (service scheduler), …
  – Big Data processing: Hadoop, Flink, Spark, Storm, …
  – Batch scheduling: Chronos, …
  – Data storage: Alluxio, Cassandra, Elasticsearch, …
  – Machine learning: TFMesos
    • Framework to help run distributed TensorFlow ML tasks on Apache Mesos with GPU support
• Full list at mesos.apache.org/documentation/latest/frameworks/
Mesos: architecture
• Master-worker architecture
• Workers publish their available resources to the master
• The master sends resource offers to frameworks
• Master election and service discovery via ZooKeeper
Source: Mesos: a platform for fine-grained resource sharing in the data center, NSDI'11

Mesos component: Apache ZooKeeper
• Coordination service for maintaining configuration information and naming, providing distributed synchronization, and providing group services
• Used in many distributed systems, among which Mesos, Storm, and Kafka
• Allows distributed processes to coordinate with each other through a shared hierarchical namespace of data registers (znodes)
  – File-system-like API
  – Namespace similar to that of a standard file system
  – Only a limited amount of data can be stored in each znode
  – It is not really a file system, a database, a key-value store, or a lock service
• Provides high-throughput, low-latency, highly available, strictly ordered access to the znodes
Mesos component: ZooKeeper
• Replicated over a set of machines that maintain an in-memory image of the data tree
  – Read requests are processed locally by the ZooKeeper server the client is connected to
  – Write requests are forwarded to the leader and must be agreed upon by a majority of the servers before a response is generated (primary-backup system)
  – Uses Zab (ZooKeeper Atomic Broadcast), a Paxos-like protocol, to elect the server acting as leader and to propagate updates
• Implements atomic broadcast
  – Processes deliver the same messages (agreement) and deliver them in the same order (total order)
  – Message = state update

Mesos and framework components
• Mesos components
  – Master
  – Workers or agents
• Framework components
  – Scheduler: registers with the master to be offered resources
  – Executors: launched on agents to run the framework’s tasks
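The ZooKeeper write path described above (local reads, leader-ordered writes committed by a majority) can be made concrete with a toy model. This is a simplified sketch, not ZooKeeper's implementation: the ensemble object plays the leader and assigns each update an increasing transaction ID (zxid), a write is accepted only if a strict majority of servers is up to acknowledge it, and every live server applies updates in zxid order, so all live replicas see the same sequence (agreement and total order). Class names, the path /mesos/leader, and the "alive means acknowledges" simplification are assumptions; recovery of crashed servers is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One ZooKeeper-like replica holding an in-memory copy of the data."""
    name: str
    alive: bool = True
    log: list = field(default_factory=list)   # committed updates, in zxid order
    data: dict = field(default_factory=dict)  # in-memory image, flattened to path -> value

    def read(self, path):
        # Reads are served locally, without contacting other servers (may be stale).
        return self.data.get(path)


class Ensemble:
    """Hypothetical toy ensemble; this object acts as the leader/sequencer."""

    def __init__(self, names):
        self.servers = [Server(n) for n in names]
        self.next_zxid = 1

    def write(self, path, value):
        alive = [s for s in self.servers if s.alive]
        # Commit only if a strict majority of the ensemble can acknowledge the update.
        if 2 * len(alive) <= len(self.servers):
            raise RuntimeError("no quorum: write rejected")
        zxid = self.next_zxid  # the leader alone fixes the position in the total order
        self.next_zxid += 1
        for s in alive:
            s.log.append((zxid, path, value))
            s.data[path] = value
        return zxid


if __name__ == "__main__":
    ensemble = Ensemble(["zk1", "zk2", "zk3"])
    ensemble.write("/mesos/leader", "master-a")
    ensemble.servers[2].alive = False        # one server down: 2 of 3 is still a majority
    ensemble.write("/mesos/leader", "master-b")
    print([s.read("/mesos/leader") for s in ensemble.servers])
```

The crashed server keeps returning the old value, which mirrors the fact that a local read may lag behind the latest committed write.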
Scheduling in Mesos
• Scheduling mechanism based on resource offers
  – Mesos offers available resources to frameworks
    • Each resource offer contains a list of <agent ID, resource1: amount1, resource2: amount2, ...>
  – Each framework chooses which resources to use and which tasks to launch
• Two-level scheduler architecture
  – Mesos delegates the actual scheduling of tasks to frameworks
  – Why? To improve scalability
    • The master does not have to know the scheduling intricacies of every type of supported application

Mesos: resource offers
• Resource allocation is based on the Dominant Resource Fairness (DRF) algorithm
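To make the allocation policy concrete, here is a minimal sketch of DRF by progressive filling. A framework's dominant share is the largest fraction of any single resource it holds; the allocator repeatedly grants one more task to the framework with the lowest dominant share, until no further task fits. The function name, the per-task demand format, and the numbers (9 CPUs, 18 GB; framework A needs <1 CPU, 4 GB> per task, framework B <3 CPUs, 1 GB>) are illustrative assumptions, and the sketch simplifies Mesos's actual allocator.

```python
from fractions import Fraction


def drf(capacity, demands):
    """Progressive filling under Dominant Resource Fairness (simplified sketch).

    capacity: resource name -> total amount in the cluster
    demands:  framework name -> per-task demand (resource name -> amount)
    Returns (tasks launched per framework, resources allocated per framework).
    """
    # Every framework must demand something, otherwise the loop would never end.
    assert all(any(d.get(r, 0) > 0 for r in capacity) for d in demands.values())
    used = {r: 0 for r in capacity}
    allocated = {f: {r: 0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}
    active = sorted(demands)  # frameworks that may still receive a task

    def dominant_share(f):
        # The largest fraction of any single resource held by framework f.
        return max(Fraction(allocated[f][r], capacity[r]) for r in capacity)

    while active:
        f = min(active, key=dominant_share)  # lowest dominant share is served first
        need = demands[f]
        if all(used[r] + need.get(r, 0) <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need.get(r, 0)
                allocated[f][r] += need.get(r, 0)
            tasks[f] += 1
        else:
            active.remove(f)  # its next task no longer fits: stop allocating to it
    return tasks, allocated


if __name__ == "__main__":
    tasks, allocated = drf(
        {"cpu": 9, "mem": 18},  # 9 CPUs, 18 GB
        {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}},
    )
    print(tasks, allocated)
```

In this example A ends with 3 tasks (3 CPUs, 12 GB; dominant share 12/18 = 2/3 of memory) and B with 2 tasks (6 CPUs, 2 GB; dominant share 6/9 = 2/3 of CPU), so DRF equalizes the dominant shares rather than the shares of any single resource. In Mesos the policy decides which framework receives the next resource offer; the framework still chooses what to launch.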
Mesos: resource offers in detail
• Workers continuously send status updates about their resources to the master

Mesos: resource offers in detail (2)
Mesos: resource offers in detail (3)
• The framework scheduler can reject offers

Mesos: resource offers in detail (4)
• The framework scheduler selects resources and provides the tasks to run on them
• The master sends the tasks to the workers
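The framework-side half of the two-level scheduling steps above can be sketched as a function that receives one offer and decides which pending tasks to launch on it. This is a hypothetical helper, not the Mesos scheduler API: the Offer and Task classes, the greedy first-fit packing, and the resource amounts are assumptions. In the real Mesos API the scheduler answers an offer by accepting it (with launch operations) or declining it; here an empty result stands for a decline.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent_id: str
    cpus: float
    mem: int  # MB


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # MB


def handle_offer(offer, pending):
    """Framework-side decision for a single offer (hypothetical helper).

    Greedily packs pending tasks into the offered resources, removing the chosen
    tasks from `pending`. An empty result means the offer is declined, so its
    resources go back to the master to be offered elsewhere.
    """
    cpus, mem = offer.cpus, offer.mem
    launch = []
    for task in list(pending):  # iterate over a copy: we mutate `pending`
        if task.cpus <= cpus and task.mem <= mem:
            launch.append(task)
            pending.remove(task)
            cpus -= task.cpus
            mem -= task.mem
    return launch


if __name__ == "__main__":
    pending = [Task("t1", 2.0, 1024), Task("t2", 2.0, 1536), Task("t3", 1.0, 512)]
    print(handle_offer(Offer("agent-1", 4.0, 2048), pending))  # t1 and t3 fit
    print(handle_offer(Offer("agent-2", 1.0, 4096), pending))  # t2 needs 2 CPUs: declined
```

The master never sees the packing logic: it only learns which tasks to forward to which agent, which is exactly the scalability argument behind delegating scheduling to frameworks.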
Mesos: resource offers in detail (5)
• The framework executors launch the tasks

Mesos: resource offers in detail (6)
Mesos: resource offers in detail (7)

Mesos fault tolerance
• Task failure
• Worker failure
• Host or network failure
• Master failure
• Framework scheduler failure
Fault tolerance: task failure

Fault tolerance: task failure (2)
Fault tolerance: worker failure

Fault tolerance: worker failure (2)
Fault tolerance: host or network failure

Fault tolerance: host or network failure (2)
Fault tolerance: host or network failure (3)

Fault tolerance: master failure
Fault tolerance: master failure (2)
• When the leading master fails, the surviving masters use ZooKeeper to elect a new leader

Fault tolerance: master failure (3)
• The workers and frameworks use ZooKeeper to detect the new leader and re-register with it
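The master election described above relies on a standard ZooKeeper recipe: each candidate creates an ephemeral, sequential znode under a shared election path, and the candidate owning the znode with the lowest sequence number is the leader. When the leader crashes, its session expires, its ephemeral znode is deleted, and the next-lowest candidate takes over; workers and frameworks find the new leader by reading the same path. The sketch below simulates the recipe in memory and is not Mesos code: the class, method names, and the prefix master_ are hypothetical, and watches (which real candidates set on their predecessor znode) are not modeled.

```python
import itertools


class ElectionNode:
    """Hypothetical in-memory stand-in for a ZooKeeper election znode."""

    def __init__(self):
        self._counter = itertools.count()  # ZooKeeper keeps a sequence counter per parent znode
        self.children = {}  # child znode name -> session (candidate) that created it

    def create_ephemeral_sequential(self, prefix, session):
        # Like ZooKeeper, append a monotonically increasing 10-digit, zero-padded suffix.
        name = f"{prefix}{next(self._counter):010d}"
        self.children[name] = session
        return name

    def expire_session(self, session):
        # Ephemeral znodes disappear automatically when their creator's session ends.
        for name in [n for n, s in self.children.items() if s == session]:
            del self.children[name]

    def leader(self):
        # Recipe: the candidate owning the child with the lowest sequence number leads.
        if not self.children:
            return None
        return self.children[min(self.children)]  # same prefix, so string order = sequence order


if __name__ == "__main__":
    election = ElectionNode()
    for master in ["master-a", "master-b", "master-c"]:
        election.create_ephemeral_sequential("master_", master)
    print(election.leader())             # master-a
    election.expire_session("master-a")  # the leading master crashes
    print(election.leader())             # master-b
```

A restarted master simply creates a new znode with a higher sequence number, so it rejoins as a standby instead of preempting the current leader.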
Fault tolerance: framework scheduler failure

Fault tolerance: framework scheduler failure (2)
• When a framework scheduler fails, another instance can re-register with the master without interrupting any of the running tasks
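Scheduler failover can be modeled as master-side bookkeeping keyed by framework ID. This is a hypothetical sketch, not Mesos code: the class and method names are invented, and the timeout mirrors the failover_timeout a Mesos framework can set when registering. While the scheduler is disconnected its tasks keep running; a new scheduler instance that re-registers with the same framework ID inside the timeout inherits them, otherwise the master tears the framework down and kills its tasks.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FrameworkRecord:
    framework_id: str
    failover_timeout: float  # seconds the master waits for a scheduler to come back
    tasks: Set[str] = field(default_factory=set)
    connected: bool = True
    disconnected_at: Optional[float] = None


class Master:
    """Hypothetical master-side bookkeeping for framework scheduler failover."""

    def __init__(self):
        self.frameworks = {}

    def register(self, framework_id, failover_timeout):
        self.frameworks[framework_id] = FrameworkRecord(framework_id, failover_timeout)

    def launch(self, framework_id, task_id):
        self.frameworks[framework_id].tasks.add(task_id)

    def scheduler_disconnected(self, framework_id, now):
        fw = self.frameworks[framework_id]
        fw.connected = False
        fw.disconnected_at = now  # the framework's tasks keep running on the agents

    def tick(self, now):
        """Tear down frameworks whose scheduler did not return in time; return killed task IDs."""
        killed = set()
        for fid, fw in list(self.frameworks.items()):
            if not fw.connected and now - fw.disconnected_at > fw.failover_timeout:
                killed |= fw.tasks
                del self.frameworks[fid]
        return killed

    def reregister(self, framework_id, now):
        """A new scheduler instance takes over an existing framework ID."""
        self.tick(now)  # expire stale frameworks first
        fw = self.frameworks.get(framework_id)
        if fw is None:
            raise KeyError(f"{framework_id} expired; register as a new framework")
        fw.connected = True
        fw.disconnected_at = None
        return set(fw.tasks)  # running tasks are handed over, not restarted
```

For example, with a 60-second timeout, a scheduler that disconnects at t = 100 s and whose replacement re-registers at t = 140 s gets back all of its running tasks; if the replacement never arrives, the framework's tasks are killed once more than 60 seconds have passed.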
Fault tolerance: framework scheduler failure (3)

Fault tolerance: framework scheduler failure (4)