CO-ORDINATION WITH ZOOKEEPER
PRESENTED BY:
1. PRATAP CHANDRA DAS
2. SHORAJ TOMER
3. SOUGATA BHATTACHARYA
CONTENTS
• Distributed Computing: A brief introduction
• Problems of manageability in distributed computing
• Solution: Apache ZooKeeper
• What does ZooKeeper do?
• A brief history
• Framework of ZooKeeper
• Data model and hierarchical namespace of ZooKeeper
• Different modes for znodes
• ZooKeeper quorums
• ZooKeeper sessions, requests and transactions
• Zab: ZooKeeper Atomic Broadcast protocol
• ZooKeeper snapshots
• Leader and follower protocol
• Projects that use ZooKeeper
• Tutorial – Installation, setup, znodes (creating, editing, deleting), subnodes
DISTRIBUTED COMPUTING: INTRODUCTION
INTRODUCTION
A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.
WHY ‘DISTRIBUTED’?
The word ‘distributed’ means that data is spread out over more than one computer in a network.
SINGLE VS DISTRIBUTED
Single Machine | Distributed Application
Simple architecture | Complex architecture
Takes hours to complete a huge, complex task | Huge tasks can be done within minutes
All the processes stop if the system crashes | If one system crashes, the other systems keep running and may take over the faulty process
A DISTRIBUTED APPLICATION
[Figure: client software and server software]
CLUSTER AND NODE
• Cluster: a group of systems
• Node: each system in a cluster
ADVANTAGES
• Scalability: the system can easily be expanded by adding more machines as needed.
• Redundancy: several machines can provide the same services, so if one is unavailable, work does not stop.
• Ease of development and maintenance
• Coordination of autonomous actions
DISADVANTAGES
• Complexity: more complex than centralized systems.
• Network reliance: messages can be lost in the communication network.
• Security: more susceptible to external attacks.
• Multiple points of failure: with a large number of machines, there are many more components that can fail.
• Manageability: more effort is required for system management.
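MAJOR ISSUES IN MANAGING DISTRIBUTED SYSTEMS
• Race condition: two or more operations on shared data run at the same time, and the result depends on the order in which they happen to execute.
• Deadlock: two or more machines each wait for a shared resource held by another, so none of them can proceed.
• Partial failure of a process: a process fails partway through an operation, leaving data inconsistent.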
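The lost update is the classic race condition. The Python sketch below is a hypothetical, deterministic simulation (not ZooKeeper code; the helper names are invented for illustration): two workers each read a shared counter, add one and write the result back, but both read before either writes, so one increment disappears.

```python
# Hypothetical shared state that two workers both want to increment.
shared = {"counter": 0}


def read(store):
    return store["counter"]


def write(store, value):
    store["counter"] = value


def unlucky_interleaving(store):
    """Replay one bad ordering of two read-modify-write operations, step by step."""
    a = read(store)       # worker A reads 0
    b = read(store)       # worker B also reads 0, before A has written
    write(store, a + 1)   # A writes 1
    write(store, b + 1)   # B writes 1 too, silently overwriting A's update
    return store["counter"]


result = unlucky_interleaving(shared)
print(result)  # 1, even though two increments were performed
```

In a real cluster the interleaving is not chosen by the programmer; it depends on timing, which is why such bugs are hard to reproduce.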
SOLUTION: ZOOKEEPER
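SOLVING THE MANAGEABILITY ISSUES
• Race condition: handled by the serialization property of ZooKeeper (updates are applied in a single, agreed order)
• Deadlock: handled by the synchronization property of ZooKeeper
• Partial failure of a process: handled through atomicity (an update either succeeds completely or fails; there are no partial results)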
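One way serialization helps with races: every znode carries a data version, and a client can make an update conditional on the version it last read. If another update was applied first, the conditional write is rejected and the client retries. The sketch below models this in memory under stated assumptions: `VersionedZnode` and `BadVersionError` are hypothetical names, not the ZooKeeper client API, and only the version-check behavior is mirrored.

```python
class BadVersionError(Exception):
    """Stale expected version (modelled on ZooKeeper's BadVersion error)."""


class VersionedZnode:
    """Hypothetical in-memory stand-in for one znode's data and data version."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def get_data(self):
        return self.data, self.version

    def set_data(self, data, expected_version):
        # -1 means "any version" (as in ZooKeeper's setData); otherwise it must match.
        if expected_version not in (-1, self.version):
            raise BadVersionError(f"expected {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


node = VersionedZnode(0)
a_value, a_version = node.get_data()   # worker A reads (0, 0)
b_value, b_version = node.get_data()   # worker B reads (0, 0)
node.set_data(a_value + 1, a_version)  # A's write applies: data 1, version 1

try:
    node.set_data(b_value + 1, b_version)  # B's version 0 is stale, so it is rejected
except BadVersionError:
    b_value, b_version = node.get_data()   # B re-reads (1, 1) ...
    node.set_data(b_value + 1, b_version)  # ... and retries successfully

print(node.get_data())  # (2, 2): no increment was lost
```

The design choice here is optimistic concurrency: nobody blocks, and the loser of a race simply re-reads and tries again.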
WHAT IS APACHE ZOOKEEPER?
• ZooKeeper is a centralized service for distributed applications, used for:
  1. Maintaining configuration information
  2. Naming
  3. Providing distributed synchronization
  4. Providing group services
• ZooKeeper is a distributed, open-source coordination service for distributed applications. It is also called the 'King of Coordination'.
• It is a centralized repository where distributed applications can put data and get data out of it.
• It keeps a distributed system functioning together as a single unit through its synchronization, serialization and coordination goals.
• It is widely used in the Hadoop ecosystem to coordinate and manage services running in a cluster.
FORMAL DEFINITION: ZooKeeper is a distributed, open-source configuration and synchronization service, along with a naming registry, for distributed applications. It is used to manage and coordinate large clusters of machines.
DESIGN GOALS OF ZOOKEEPER
• Must be able to tolerate failures
• Must be able to recover from correlated recoverable failures (for example, power outages)
• Must be correct
• Must be easy to implement correctly
• Must be fast (high throughput, low latency)
WHY APACHE ZOOKEEPER?
In the past, each application was a single program running on a single computer with a single CPU. Today, things have changed. In the Big Data world, applications are made up of many independent programs running on an ever-changing set of computers. These applications are known as distributed applications. A distributed application runs on multiple systems in a network simultaneously, and these systems coordinate among themselves to complete a particular task quickly and efficiently.
Coordinating the actions of the independent programs in a distributed system is far more difficult than writing a single program to run on a single computer. It is easy for developers to get mired in coordination logic and lack the time to write their application logic properly. The converse is also a risk: spending little time on the coordination logic and writing a quick-and-dirty master coordinator that is fragile and becomes an unreliable single point of failure. ZooKeeper, an important part of the Hadoop ecosystem, takes care of these small but important issues so that developers can focus more on the functionality of the application.
WHAT DOES ZOOKEEPER DO?
• NAMING SERVICE: ZooKeeper exposes a simple interface for a naming service that identifies the nodes in a cluster by name, similar to DNS.
• LOCKING: ZooKeeper provides an easy way to implement distributed mutexes, allowing serialized access to a shared resource in your distributed system.
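The commonly described ZooKeeper lock recipe uses sequential znodes: each client creates a child such as `lock-` under a lock node, ZooKeeper appends an increasing, zero-padded counter, and the client with the lowest number holds the lock. The other clients each watch only the node just ahead of theirs. Below is a minimal in-memory sketch of that ordering logic, assuming a single process; `LockDirectory`, `holds_lock` and `node_to_watch` are hypothetical names, and real clients would create ephemeral sequential znodes so that a crashed holder's node disappears with its session.

```python
class LockDirectory:
    """Hypothetical in-memory stand-in for a lock znode such as /lock (not the ZooKeeper API)."""

    def __init__(self):
        self.children = []   # names of the current child znodes
        self.next_seq = 0    # counter the parent uses for sequential children

    def create_sequential(self, prefix):
        # ZooKeeper appends a 10-digit, zero-padded counter to sequential znode names.
        name = f"{prefix}{self.next_seq:010d}"
        self.next_seq += 1
        self.children.append(name)
        return name

    def delete(self, name):
        # Happens on unlock, or automatically when an ephemeral node's session ends.
        self.children.remove(name)


def holds_lock(lock_dir, my_node):
    """A client holds the lock when its node has the lowest sequence number."""
    # String order equals sequence order here: same prefix, fixed-width counter.
    return min(lock_dir.children) == my_node


def node_to_watch(lock_dir, my_node):
    """A waiting client watches only the node just ahead of its own (avoids a herd effect)."""
    ahead = sorted(n for n in lock_dir.children if n < my_node)
    return ahead[-1] if ahead else None


lock = LockDirectory()
a = lock.create_sequential("lock-")   # lock-0000000000
b = lock.create_sequential("lock-")   # lock-0000000001
c = lock.create_sequential("lock-")   # lock-0000000002
print(holds_lock(lock, a), holds_lock(lock, b))  # True False
print(node_to_watch(lock, c))                    # lock-0000000001
lock.delete(a)                                   # A releases the lock
print(holds_lock(lock, b))                       # True
```

Leader election can follow the same pattern: the client whose node has the lowest sequence number acts as leader.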
• CONFIGURATION MANAGEMENT: You can use ZooKeeper to centrally store and manage the configuration of your distributed system. New nodes pick up the up-to-date centralized configuration from ZooKeeper as soon as they join the system. This also allows you to change the state of the whole distributed system centrally, by changing the configuration through one of the ZooKeeper clients.
• LEADER ELECTION: ZooKeeper supports leader election through a well-known recipe, which deals with the problem of nodes going down.
• SYNCHRONIZATION: Hand in hand with distributed mutexes is the need to synchronize access to shared resources. Whether you are implementing a producer-consumer queue or a barrier, ZooKeeper provides a simple interface to do so.
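Configuration management usually relies on watches: a client reads a configuration znode and registers a watch, and ZooKeeper notifies it once when the data changes. Because watches are one-time triggers, the client re-reads and re-registers after each notification. This in-memory Python sketch mimics that flow; `ConfigZnode` and `Worker` are hypothetical names, and notifications here are delivered synchronously rather than over the network.

```python
class ConfigZnode:
    """Hypothetical in-memory stand-in for a configuration znode, e.g. /app/config."""

    def __init__(self, data):
        self.data = data
        self.watchers = []

    def get_data(self, watcher=None):
        # Like a ZooKeeper watch, a watcher registered on read fires at most once.
        if watcher is not None:
            self.watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        fired, self.watchers = self.watchers, []  # one-shot: clear before notifying
        for watcher in fired:
            watcher()


class Worker:
    """A node that keeps its local copy of the configuration up to date."""

    def __init__(self, config_node):
        self.node = config_node
        self.config = None
        self.reload()  # a newly joined node reads the current configuration

    def reload(self):
        # Re-read and re-register, since the previous watch has already fired.
        self.config = self.node.get_data(watcher=self.reload)


config = ConfigZnode("timeout=30")
workers = [Worker(config), Worker(config)]
config.set_data("timeout=60")        # one central change ...
print([w.config for w in workers])   # ['timeout=60', 'timeout=60']
late = Worker(config)                # ... and a node that joins afterwards
print(late.config)                   # timeout=60
```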
WORLD WITHOUT ZOOKEEPER
• Previously, distributed systems implemented components such as distributed lock managers, or used distributed databases, for coordination.
• While it is possible to design and implement all of these services from scratch, it is extra work, and it is difficult to debug any problems, race conditions or deadlocks.
• There was a need to spare people from writing their own naming or leader election services from scratch every time they needed one.
MOTIVATION BEHIND ZOOKEEPER
A very simple group membership service can be hacked together relatively easily, but making it reliable, replicated and scalable requires much more work. This led to the development and open-sourcing of Apache ZooKeeper, an out-of-the-box reliable, scalable and high-performance coordination service for distributed systems.
• ZooKeeper, in fact, borrows a number of concepts from these prior systems. It does not, however, expose a lock interface or a general-purpose interface for storing data. The design of ZooKeeper is specialized and very focused on coordination tasks.
• It is certainly possible to build distributed systems without ZooKeeper, but it is more difficult. ZooKeeper lets developers focus more on application logic rather than on arcane distributed systems concepts.
HISTORY
The Origin of the Name “ZooKeeper”
ZooKeeper was developed at Yahoo! Research. Yahoo had been working on ZooKeeper for a while and pitching it to other groups. At the time, the ZooKeeper group had been working with the Hadoop team and had started a variety of projects named after animals, Apache Pig being the best known. As the group discussed possible names, one member mentioned that they should avoid another animal name because the group was starting to sound like a zoo. That is when it clicked: distributed systems are a zoo. They are chaotic and hard to manage, and ZooKeeper is meant to keep them under control.
FRAMEWORK OF ZOOKEEPER
ZooKeeper is designed to be easy to program, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C. Although it is a coordination service for distributed systems, ZooKeeper is itself a distributed application.
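To illustrate the directory-tree data model, here is a small in-memory model of a hierarchical namespace. It is a sketch, not ZooKeeper itself: `Namespace` and its methods are hypothetical, and it mirrors only two real ZooKeeper rules, namely that a znode can only be created under an existing parent and that a znode with children cannot be deleted. Unlike directories in a file system, every znode (including interior ones like `/app`) can also hold data.

```python
class Namespace:
    """Hypothetical in-memory model of a hierarchical namespace (not the real API)."""

    def __init__(self):
        self.nodes = {"/": b""}  # absolute path -> data; the root always exists

    @staticmethod
    def parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if not path.startswith("/") or path.endswith("/"):
            raise ValueError(f"invalid path: {path}")
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        if self.parent(path) not in self.nodes:
            # A znode can only be created under an existing parent.
            raise KeyError(f"no parent for: {path}")
        self.nodes[path] = data

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

    def delete(self, path):
        if self.get_children(path):
            # A znode that still has children cannot be deleted.
            raise KeyError(f"node has children: {path}")
        del self.nodes[path]


ns = Namespace()
ns.create("/app")
ns.create("/app/config", b"timeout=30")
ns.create("/app/workers")
print(ns.get_children("/app"))  # ['config', 'workers']
try:
    ns.create("/missing/child")  # rejected: /missing does not exist
except KeyError as err:
    print(err)
```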
• It follows a simple client-server model, where clients are nodes (i.e., machines) that make use of the service, and servers are nodes that provide the service.
• Applications make calls to ZooKeeper through a client library, which is responsible for the interaction with ZooKeeper servers. Each client imports the client library and can then communicate with any ZooKeeper node.
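Clients typically reach the ensemble through a connect string listing several servers (for example `zk1:2181,zk2:2181,zk3:2181`, optionally followed by a chroot path such as `/app`). The client library picks one server and, if that server is unreachable, moves on to another. The sketch below shows that idea in Python; `parse_connect_string` and `connect` are hypothetical helpers, the host names are made up, and treating a missing port as 2181 is an assumption of this sketch.

```python
import random


def parse_connect_string(connect_string):
    """Split e.g. "zk1:2181,zk2:2181/app" into ([(host, port), ...], chroot or None)."""
    hosts, slash, chroot = connect_string.partition("/")
    servers = []
    for entry in hosts.split(","):
        host, _, port = entry.strip().partition(":")
        # Assumption: fall back to 2181, ZooKeeper's usual client port.
        servers.append((host, int(port) if port else 2181))
    return servers, (slash + chroot) if slash else None


def connect(servers, is_reachable, rng=random):
    """Try servers in shuffled order and return the first reachable one (hypothetical)."""
    order = list(servers)
    rng.shuffle(order)  # spreads different clients across the ensemble
    for server in order:
        if is_reachable(server):
            return server
    raise ConnectionError("no ZooKeeper server reachable")


servers, chroot = parse_connect_string("zk1:2181,zk2:2181,zk3:2181/app")
down = {("zk1", 2181)}  # pretend one server has failed
chosen = connect(servers, lambda s: s not in down, random.Random(7))
print(servers, chroot)
print(chosen in servers and chosen not in down)  # True
```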
• ZooKeeper servers run in two modes: standalone and quorum.
• Standalone mode is pretty much what the term says: there is a single server, and ZooKeeper state is not replicated.
• In quorum mode, a group of ZooKeeper servers, which we call a ZooKeeper ensemble, replicates the state, and together they serve client requests.
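Quorum mode relies on majorities: a write is committed once a majority of the ensemble has accepted it, so an ensemble of n servers keeps working as long as n // 2 + 1 servers are up. The small Python helpers below (hypothetical function names) compute these numbers and show why odd ensemble sizes are usually recommended: a fourth server raises the quorum size without tolerating any extra failures.

```python
def quorum_size(ensemble_size):
    """Smallest majority of an ensemble: more than half of the servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size, servers_up):
    return servers_up >= quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

With n = 1 (standalone mode) no failure can be tolerated, which is why production deployments use an ensemble.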