Time within Distributed Systems Time is important, but it is problematic in distributed systems because clocks cannot be synchronized perfectly
Introducing Time ● Time is a quantity that we often want to measure accurately ● Algorithms that depend upon clock synchronization have been developed in a lot of areas (not just within the distributed systems arena) ● Physical time is problematic within distributed systems (for lots of reasons)
Clocks, Events and Process States ● We define an event to be the occurrence of a single action that a process carries out as it executes ● An event is a communication action or a state-transforming action ● Clocks - every computer has one, and it can be used to timestamp any event
Clock Skew ● Computer clocks are like all other clocks in that they tend not to be in perfect agreement ● Skew is the instantaneous difference between the readings of two clocks; clock drift is the rate at which a clock diverges from a perfect reference, and it causes skew to grow over time ● For ordinary clocks based on a quartz crystal, the drift rate is about 10^-6 seconds/second, giving a difference of 1 second every 1,000,000 seconds (about 11.6 days) ● The drift rate of a "high precision" quartz clock is about 10^-7 or 10^-8 seconds/second
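The drift arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `seconds_until_drift` is made up for illustration, and the drift rates are the approximate figures quoted on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_until_drift(drift_rate, target_error=1.0):
    """Elapsed seconds before a clock with the given drift rate
    (seconds gained or lost per second) is off by target_error seconds."""
    return target_error / drift_rate


# Drift rates taken from the slide; the labels are only descriptive.
for label, rate in [("ordinary quartz", 1e-6), ("high precision quartz", 1e-8)]:
    days = seconds_until_drift(rate) / SECONDS_PER_DAY
    print(f"{label}: 1 second of error after about {days:.1f} days")
```

Two such clocks drifting in opposite directions would diverge from each other at up to twice this rate.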
Coordinated Universal Time (UTC) ● Coordinated Universal Time, abbreviated UTC, is an international standard for timekeeping, based on atomic clocks (which have a drift rate of about one part in 10^13) ● UTC timing signals are broadcast from land-based radio stations and from satellites (including GPS) ● Computers with the appropriate (and expensive) receivers attached can synchronize their clocks with UTC
Synchronizing Physical Clocks ● External synchronization - synchronizing clocks with an authoritative external source of time ● Internal synchronization - synchronizing clocks with one another by "local agreement", without necessarily matching an external source ● In a synchronous distributed system, bounds are known for the drift rate of clocks, the maximum message transmission delay, and the time to execute each processing step, so synchronizing clocks is "easier" ● Unfortunately, most distributed systems are asynchronous
Cristian's Method for Synchronizing Clocks ● Cristian suggested the use of a time server, connected to a device that receives signals from a source of UTC
More on Cristian's Algorithm ● Basic idea: clients periodically request the current time from a “time server” ● Major problem: if the time returned by the server is earlier than the client's current time, setting the client clock directly would make time run backwards on the client (which must not happen) ● Minor problem: the network request/response introduces a delay (latency), so the server's time is already out of date when the reply arrives
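One common way to compensate for the latency problem is to measure the round trip and assume the reply took half of it. The sketch below illustrates that idea only; `request_server_time` is a hypothetical callable standing in for the real request/response exchange, and the clock is injectable so the example can run without a network.

```python
import time


def cristian_estimate(request_server_time, clock=time.monotonic):
    """Return (estimated server time now, measured round trip).

    request_server_time is a hypothetical callable that performs the
    request/response exchange and returns the server's timestamp.
    Assumes the outbound and return delays are roughly equal.
    """
    sent = clock()
    server_time = request_server_time()
    received = clock()
    round_trip = received - sent
    return server_time + round_trip / 2, round_trip


# Simulated exchange: the local clock reads 10.0 when the request leaves
# and 10.5 when the reply arrives; the server reported 500.0.
ticks = iter([10.0, 10.5])
estimate, round_trip = cristian_estimate(lambda: 500.0, clock=lambda: next(ticks))
print(estimate, round_trip)
```

If the estimate turns out to be earlier than the client's current time, the client would still need to slow its clock gradually rather than set it back.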
Discussing Cristian's Algorithm ● Single point of failure (if only one server is used) ● The time server may fail and thus render synchronization temporarily impossible ● Solution: configure a group of synchronized time servers, to which clients multicast their requests ● Research showed that if F is the number of faulty server clocks out of a total of N servers, then we must have N > 3F if the other, correct, clocks are still to be able to achieve agreement
The Berkeley Algorithm ● A coordinator is chosen to act as the "master" clock ● The master periodically polls the other computers (the "slaves") to determine their local time ● An average time is then calculated by the master and distributed to the slaves to allow them to adjust their clocks to the "correct time"
Berkeley in Action Clocks that are running fast are slowed down (so that the others can catch up), while clocks that are running slow skip forward to the correct time
Discussing Berkeley's Algorithm ● Faulty clocks can be dealt with due to the master's ability to take a "fault-tolerant average" - a subset of clocks is chosen that do not differ from one another by more than a specified amount, and the average is taken of the time readings from only these clocks ● An experiment involving 15 computers showed that Berkeley could synchronize clocks to within 20-25 milliseconds ● If the master suffers a failure, protocols exist to elect a replacement (that is, a new master)
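The fault-tolerant average can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the function names are made up, the "subset" is taken to be the largest group of readings lying within `max_spread` of one another, and the master is assumed to send each machine an adjustment rather than an absolute time.

```python
def fault_tolerant_average(readings, max_spread):
    """Average the largest group of readings that differ from one
    another by no more than max_spread. Assumes readings is non-empty."""
    ordered = sorted(readings)
    best = []
    start = 0
    for end in range(len(ordered)):
        # Shrink the window until it fits within max_spread.
        while ordered[end] - ordered[start] > max_spread:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    return sum(best) / len(best)


def berkeley_adjustments(clocks, max_spread):
    """clocks maps machine name -> clock reading in seconds (master included).
    Returns machine name -> amount to add to that machine's clock."""
    target = fault_tolerant_average(clocks.values(), max_spread)
    return {name: target - reading for name, reading in clocks.items()}


# Hypothetical readings; "faulty" is far from the rest and is ignored
# when computing the average, but still receives a correction.
clocks = {"master": 180.0, "a": 185.0, "b": 175.0, "faulty": 400.0}
print(berkeley_adjustments(clocks, max_spread=20.0))
```

A negative adjustment would in practice be applied by slowing the clock down rather than stepping it backwards.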
The Network Time Protocol (NTP) Defines an architecture for a time service and a protocol to distribute time information over the Internet
NTP Design Goals ● To provide a service enabling clients across the Internet to be synchronized accurately to UTC ● To provide a reliable service that can survive lengthy losses of connectivity ● To enable clients to resynchronize sufficiently frequently to offset the rates of drift found in most computers ● To provide protection against interference with the time service, whether malicious or accidental
How NTP Works ● Provides a network of servers located across the Internet ● Primary servers - attached to a UTC time source ● Secondary servers - connected to a primary for synchronization ● The servers are connected in a logical hierarchy called a "synchronization subnet", whose levels are called "strata" ● The synchronization subnet can reconfigure as servers become unreachable or failures occur ● Messages are delivered using UDP
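Example Synchronization Subnet [Figure: a synchronization subnet with one stratum 1 server, two stratum 2 servers and three stratum 3 servers] Note: Arrows denote synchronization control, numbers denote strata.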
NTP's Modes ● Multicast mode - used on high-speed LANs; one or more servers periodically multicast the time to clients, which set their clocks assuming a small network delay (achieving relatively low accuracy) ● Procedure-call mode - a server accepts requests and replies with a timestamp, which the client then uses to update its clock (higher accuracy achievable) ● Symmetric mode - intended for the lower-numbered strata (those closest to the primary servers), where the highest accuracies are to be achieved; pairs of servers exchange messages bearing timing information, and this information is retained over time, allowing the two servers to synchronize their clocks very closely
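In the procedure-call and symmetric modes, each exchange carries timestamps from which a clock offset and a round-trip delay can be estimated. The slides do not give the formulas; the sketch below uses the standard four-timestamp calculation from the NTP specification, with a made-up function name and made-up example values in milliseconds.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate (offset, delay) from one request/reply exchange.

    t1: request sent   (client clock)
    t2: request received (server clock)
    t3: reply sent     (server clock)
    t4: reply received (client clock)
    offset is how far the server clock is ahead of the client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical exchange: the server is really 5000 ms ahead, the request
# takes 10 ms, the server spends 10 ms processing, the reply takes 20 ms.
offset, delay = ntp_offset_and_delay(t1=1000, t2=6010, t3=6020, t4=1040)
print(offset, delay)
```

The estimate is exact only when the two one-way delays are equal; here they differ by 10 ms, so the offset estimate is off by half of that.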
Logical Clocks ● Synchronization is based on “relative time”. ● Note that (with this mechanism) there is no requirement for “relative time” to have any relation to the “real time”. ● What’s important is that the processes in the Distributed System agree on the ordering in which certain events occur. ● Such “clocks” are referred to as Logical Clocks.
Lamport’s Logical Clocks ● First point: if two processes do not interact, then their clocks do not need to be synchronized – they can operate concurrently without fear of interfering with each other ● Second (critical) point: it does not matter whether two processes share a common notion of what the “real” current time is. What does matter is that the processes have some agreement on the order in which certain events occur ● Lamport used these two observations to define the “happens-before” relation (also often referred to within the context of Lamport’s Timestamps)
The Happens-Before Relation, 1 of 4 ● If A and B are events in the same process, and A occurs before B, then we can state that: A “happens-before” B is true ● Equally, if A is the event of a message being sent by one process, and B is the event of the same message being received by another process, then A “happens-before” B is also true ● Note that a message cannot be received before it is sent, since it takes a finite, nonzero amount of time to arrive … and, of course, time is not allowed to run backwards
The Happens-Before Relation, 2 of 4 ● The relation is transitive: if A “happens-before” B and B “happens-before” C, then it follows that A “happens-before” C ● If the “happens-before” relation holds, deductions about the clock “value” assigned to each event can then be made ● If C(A) is the clock value assigned to event A, then A “happens-before” B implies that C(A) is less than C(B), and so on
The Happens-Before Relation, 3 of 4 ● Now, assume three processes are in a DS: A, B and C ● All have their own physical clocks (which are running at differing rates due to “clock skew”, etc.) ● A sends a message to B and includes a “timestamp” ● If this sending timestamp is less than the time of arrival at B, things are OK, as the “happens-before” relation still holds (i.e. the send “happens-before” the receipt, and the clock values agree) ● However, if the timestamp is more than the time of arrival at B, things are NOT OK: the clock values suggest the message was received before it was sent, which cannot be, as the receipt of a message has to occur after it was sent
The Happens-Before Relation, 4 of 4 ● The question to ask is: How can some event that “happens-before” some other event possibly have occurred at a later time? ● The answer is: it can’t! ● So, Lamport’s solution is to have the receiving process adjust its clock forward to one more than the sending timestamp value whenever its own clock is behind that timestamp. This allows the “happens-before” relation to hold, and also keeps all the clocks in sync relative to each other
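The adjustment rule can be sketched as a small class. This is a minimal illustration, with hypothetical names; it uses the common formulation in which a receiver sets its clock to one more than the larger of its own value and the incoming timestamp, so the clock also ticks when no correction is needed.

```python
class LamportClock:
    """A logical clock for one process (a sketch, not a full protocol)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, sent_timestamp):
        """Move past the sender's timestamp if this clock is behind it."""
        self.time = max(self.time, sent_timestamp) + 1
        return self.time


a, b = LamportClock(), LamportClock()
for _ in range(5):
    a.tick()               # process A performs five local events
stamp = a.send()           # A's clock is now 6, carried in the message
arrival = b.receive(stamp) # B's clock was 0, so it jumps forward to 7
print(stamp, arrival)
```

With this rule the timestamp of a receive event is always greater than the timestamp of the matching send event.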
Lamport’s Clocks in Action
Problem: Totally-Ordered Multicasting ● Updating a replicated database and leaving it in an inconsistent state: Update 1 adds 100 euro to an account, Update 2 calculates and adds 1% interest to the same account. Due to network delays, the updates may not happen in the correct order. Whoops!
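The inconsistency is easy to reproduce. The sketch below applies the two updates in different orders at two replicas; the starting balance of 1,000 euro, the function names and the `(timestamp, sender)` tags are assumptions made for illustration. Amounts are held in cents to avoid floating-point rounding.

```python
def add_100_euro(balance_cents):
    return balance_cents + 100 * 100


def add_1_percent_interest(balance_cents):
    return balance_cents + balance_cents // 100


def apply_all(balance_cents, updates):
    for update in updates:
        balance_cents = update(balance_cents)
    return balance_cents


start = 100_000  # hypothetical opening balance: 1,000 euro

# Network delays deliver the updates in different orders at each replica.
replica_1 = apply_all(start, [add_100_euro, add_1_percent_interest])
replica_2 = apply_all(start, [add_1_percent_interest, add_100_euro])
print(replica_1, replica_2)  # the two replicas now disagree

# If every update carries an agreed (timestamp, sender) tag and each replica
# applies updates in tag order, arrival order no longer matters.
tagged = [((2, "site-b"), add_1_percent_interest), ((1, "site-a"), add_100_euro)]
in_order = [update for _, update in sorted(tagged, key=lambda item: item[0])]
print(apply_all(start, in_order))
```

The second half only hints at a remedy: every replica must apply the same updates in the same order, which is what a totally-ordered multicast provides.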