NETWORK EMBEDDED SYSTEMS
1. Introduction
Nikolas Wageneder

OVERVIEW
18.04.2007
- Preliminaries
- Distributed Computation
- Processes
- Communication
- Timing Assumptions
- Time

PRELIMINARIES
- Main abstractions
  - Processes
  - Communication links
  - Failure detector

DISTRIBUTED COMPUTATION
- Processes perform the computations in a distributed system
- N uniquely identified processes p_1, ..., p_N
- Processes know each other
- Every process runs the same algorithm
- Communication by uniquely identified messages through communication links

DISTRIBUTED COMPUTATION
- A distributed algorithm is viewed as a collection of automata, one per process
- An execution is a sequence of steps executed by the processes
  - Partial (finite) or infinite execution
- A global clock is assumed
  - One step per clock tick
- Process step:
  - Receive event (may be nil)
  - Local computation
  - Send event (may be nil)
- Process-internal communication is not relevant

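The step model above can be illustrated with a small simulation. This is only a sketch under assumptions: the class `Process`, the function `run`, the round-robin scheduling and the token-forwarding rule are hypothetical and chosen for illustration, not taken from the slides.

```python
from collections import deque


class Process:
    """Hypothetical automaton: one step = receive (maybe nil), compute, send."""

    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        self.inbox = deque()  # messages handed over by the links
        self.seen = []        # local state updated by the computation

    def step(self):
        # 1. Receive event: nil (None) if nothing has arrived.
        msg = self.inbox.popleft() if self.inbox else None
        # 2. Local computation: deterministic, depends only on state and msg.
        if msg is not None:
            self.seen.append(msg)
        # 3. Send event: forward an incremented token to the next process,
        #    or send nothing (nil) once the token has reached 3.
        if msg is not None and msg < 3:
            return ((self.pid + 1) % self.n, msg + 1)
        return None


def run(processes, ticks):
    """Global clock: exactly one step of one process per tick (round robin)."""
    for tick in range(ticks):
        out = processes[tick % len(processes)].step()
        if out is not None:
            dest, msg = out
            processes[dest].inbox.append(msg)
```

Because every step is deterministic and the schedule is fixed, running the same initial configuration twice yields the same execution.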
DISTRIBUTED COMPUTATION
- Communication step
  - Sending a message to another process
  - Receiving the message
- Deterministic algorithms are assumed

DISTRIBUTED COMPUTATION
- Safety
  - A property that, once violated at some time t, can never be satisfied again
  - "The algorithm never does anything wrong"
  - Example: perfect links (no message is delivered unless it was sent)
- Liveness
  - "Eventually something good happens"
  - Example: for perfect links to be meaningful, a sent message must eventually be delivered
  - Cannot be violated in any finite time; it is violated only if the message never arrives
- The challenge is to guarantee both

ABSTRACTING PROCESSES
- Arbitrary fault / Byzantine failure
  - Most general fault behaviour
  - "Worst case": most expensive to handle
  - Only acceptable option when very high coverage is required or when processes may be controlled by malicious users
  - Need not be malicious or intentional: a simple bug is enough
- Omission fault
  - Process does not send or receive a message it is supposed to
  - Typical causes: buffer overflow or network congestion
  - Messages are dropped

ABSTRACTING PROCESSES
- Crashes
  - A kind of omission fault
  - Process works correctly until some time t, then stops
  - Faulty processes are those that crash
- Crash failure / crash-stop
  - No more messages sent
  - No more computations
  - Crashed processes may recover and participate in later computations, but the algorithm must not rely on their recovery
- From here on, crash-stop is assumed

ABSTRACTING PROCESSES
- Recoveries
  - Crash-recovery abstraction
  - A process may crash and recover a finite number of times and still be correct in this model
- When the process crashes:
  - It stops sending messages
  - It behaves like a process with an omission fault
  - Possible amnesia: volatile state is lost
- To "synchronise", a <recovery> event is generated automatically on restart
  - <recovery> retrieves information from stable storage
  - If the process crashes during the atomic <init>, <init> is restarted

ABSTRACTING PROCESSES
- Saving every step in stable storage is expensive
  - A goal of the algorithm is to minimize access to stable storage
- Restarting a crashed process under a different ID is not an option
  - The set of processes would no longer be static

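The trade-off between stable-storage access and amnesia can be sketched as follows. This is a minimal illustration, assuming a JSON file stands in for stable storage; the class `RecoverableProcess`, its methods and the `checkpoint_every` parameter are hypothetical names introduced here.

```python
import json
import os


class RecoverableProcess:
    """Hypothetical process whose counter survives crashes via a file."""

    def __init__(self, path, checkpoint_every=2):
        self.path = path                      # stands in for stable storage
        self.checkpoint_every = checkpoint_every
        self.delivered = 0                    # volatile state, lost on a crash
        self.recovery()                       # runs on every (re)start

    def recovery(self):
        # Retrieve the last stored state; no file means a first start.
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.delivered = json.load(f)["delivered"]

    def store(self):
        # Write to a temporary file and rename it, so that a crash during
        # the write leaves the previous snapshot intact.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"delivered": self.delivered}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def deliver(self):
        self.delivered += 1
        # Access stable storage only every few steps to keep it cheap.
        if self.delivered % self.checkpoint_every == 0:
            self.store()
```

With `checkpoint_every=2`, a process that crashes after five deliveries recovers with a count of four: fewer writes are paid for with more amnesia.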
ABSTRACTING PROCESSES
- Interface between software modules and crash recovery
  - A module of a process passes a message or a decision to an upper-layer module and crashes while doing so
  - After recovery, the module cannot determine whether the upper layer received the message or decision
- Solution 1: change the interface so that the lower module saves messages in stable storage that is exposed to the upper layer
- Solution 2: periodically redeliver messages to the upper layer until the latter explicitly asks to stop
  - The upper layer then needs to check for duplicates

ABSTRACTING COMMUNICATION
- Link abstraction for network components
- Every pair of processes is connected by a bidirectional link
- Different topologies are possible
  - Fully connected mesh
  - Bus
  - Ring
  - Mesh of links with routers and bridges (e.g. the Internet)
- Exchanged messages are uniquely identified, and every receiver is able to identify the sender
- With request-reply messaging, a process knows which reply answers which request (e.g. via timestamps)

ABSTRACTING COMMUNICATION
- Link failures
  - Messages might be lost in transit through the network
- Three abstractions offering different degrees of reliability
  - Fair-loss links
  - Stubborn links
  - Perfect links
- Properties are stated in terms of send and deliver events (deliver is more general than receive)

ABSTRACTING COMMUNICATION
- Fair-loss links
- Consist of two events:
  - Request event to send messages
  - Indication event to deliver messages
- Fair-loss properties:
  - FL1. Fair loss: if neither the sender nor the receiver crashes and the message is sent infinitely often, it is delivered infinitely often
  - FL2. Finite duplication: the network does not perform more retransmissions than the sender does
  - FL3. No creation: the network itself does not create or corrupt messages

ABSTRACTING COMMUNICATION
- Stubborn links
  - This abstraction hides the lower-layer retransmission mechanisms used by the sender process
- Algorithm "Retransmit Forever"
  - Implements a stubborn link over a fair-loss link
- Correctness:
  - The fair-loss properties guarantee delivery when messages are sent infinitely often
- Performance:
  - Not very efficient; a showcase algorithm

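The idea behind "Retransmit Forever" can be sketched in a few lines. This is not the algorithm as printed in the course material but an illustration under assumptions: `FairLossLink` simulates loss with a seeded random generator, and the class names, the `loss` and `seed` parameters and the `on_timeout` hook are hypothetical.

```python
import random


class FairLossLink:
    """Hypothetical lossy channel: each send is dropped with probability `loss`."""

    def __init__(self, deliver, loss=0.5, seed=1):
        self.deliver = deliver          # callback invoked on delivery
        self.loss = loss
        self.rng = random.Random(seed)  # seeded so that runs are repeatable

    def send(self, dest, msg):
        # The link never creates or alters messages; it only drops some.
        if self.rng.random() >= self.loss:
            self.deliver(dest, msg)


class StubbornLink:
    """Retransmits every message ever sent whenever the timer fires."""

    def __init__(self, deliver, loss=0.5, seed=1):
        self.fll = FairLossLink(deliver, loss, seed)
        self.sent = []                  # grows forever: the inefficiency noted above

    def send(self, dest, msg):
        self.fll.send(dest, msg)
        self.sent.append((dest, msg))

    def on_timeout(self):
        # Called periodically by a timer that is assumed to exist.
        for dest, msg in self.sent:
            self.fll.send(dest, msg)
```

The receiver may see the same message many times; suppressing those duplicates is left to the layer above.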
ABSTRACTING COMMUNICATION
- Perfect or reliable link
  - Adds mechanisms for detecting and suppressing message duplicates
  - Request and indication events
- Properties:
  - PL1. Reliable delivery: if a process sends a message m to another process and neither process crashes, m is eventually delivered
  - PL2. No duplication: no message is delivered more than once
  - PL3. No creation: if a process delivers a message m, then m was previously sent by another process

ABSTRACTING COMMUNICATION
- Algorithm "Eliminate Duplicates"
  - Implements a perfect link over a stubborn link
  - Keeps a record of all messages delivered in the past
  - A received message is delivered only if it is not a duplicate
- Correctness:
  - Reliable delivery: if m is sent between non-crashing processes, the stubborn delivery property of the underlying link ensures that m is eventually delivered
  - No duplication follows from the test performed by the algorithm
  - No creation follows from the stubborn link
- Performance:
  - Suppressing duplicates improves performance, but storing every message ever delivered is not feasible
  - Possible improvement: "stop" messages that tell the sender to stop retransmitting
    - They may not reach the sending process in time, so old copies can still arrive after the record has been discarded
    - This would violate the no duplication property
    - Additional mechanisms (e.g. timestamps) are necessary

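The duplicate test can be sketched as follows. This is an illustration only: the class `PerfectLink` and the hook `on_stubborn_deliver` are hypothetical names, and messages are assumed to carry a unique identifier as the first element of a tuple.

```python
class PerfectLink:
    """Delivers each (sender, message) pair at most once."""

    def __init__(self, deliver):
        self.deliver = deliver   # callback to the upper layer
        self.delivered = set()   # record of everything delivered so far

    def on_stubborn_deliver(self, src, msg):
        # Invoked for every copy handed up by the underlying stubborn link.
        if (src, msg) not in self.delivered:
            self.delivered.add((src, msg))
            self.deliver(src, msg)
```

The `delivered` set only ever grows, which is the memory concern raised under Performance.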
ABSTRACTING COMMUNICATION
- Logged perfect links
  - Perfect links and "Eliminate Duplicates" are not suitable for crash-recovery: the delivered variable would be lost in a crash
  - Here the message is stored in a local log that an upper-layer module can read after a crash
  - Delivering here is the act of logging
- Algorithm "Log Delivered"
- Correctness:
  - As for "Eliminate Duplicates"
  - The delivered (logged) variable is kept in stable storage
- Performance:
  - As for "Eliminate Duplicates"
  - The log is written with every new message

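A possible shape for "Log Delivered" is sketched below, assuming an append-only file of JSON lines plays the role of stable storage. The class `LoggedPerfectLink` and its method names are hypothetical, and the sketch ignores the case of a crash in the middle of writing a line.

```python
import json
import os


class LoggedPerfectLink:
    """Delivery means appending the message to a log in stable storage."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.delivered = []   # entries [src, msg_id, payload], in log order
        self.seen = set()     # (src, msg_id) pairs, used for the duplicate test
        # Recovery: rebuild the delivered variable from the log, if any.
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.delivered.append(entry)
                    self.seen.add((entry[0], entry[1]))

    def on_stubborn_deliver(self, src, msg_id, payload):
        if (src, msg_id) in self.seen:
            return            # duplicate, already logged
        # One write to stable storage per new message.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([src, msg_id, payload]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.seen.add((src, msg_id))
        self.delivered.append([src, msg_id, payload])
```

An upper-layer module reads `delivered` after a restart instead of relying on having been notified before the crash.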
ABSTRACTING COMMUNICATION
- Important facts
- Network topology awareness
  - Many optimizations can be achieved if the network topology is exposed to upper layers
- Flow control
  - Resources are limited, so the sender must be aware of the receiver's capacity; otherwise messages are lost
- Heterogeneity awareness
  - Not all processes run on the same hardware; the most demanding tasks should be assigned to the most powerful processors

TIMING ASSUMPTIONS
- Asynchronous system
  - No timing assumptions are made
  - Logical time: an incrementing counter serves as the clock in asynchronous systems
  - Cause-effect relations can be captured
  - This results in the "happened-before" relation
  - It can be shown that e1 -> e2 implies t(e1) < t(e2)
  - Deterministic consensus is not possible if even one process may crash

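A logical clock of this kind can be sketched as a counter with three update rules. The class and method names below are illustrative assumptions; the rules are the usual ones for Lamport-style logical clocks.

```python
class LogicalClock:
    """Counter-based logical time for one process."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Every local step advances the counter.
        self.time += 1
        return self.time

    def send(self):
        # The returned timestamp is piggybacked on the outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump past both the local counter and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If e1 happened before e2, these rules give t(e1) < t(e2); the converse does not hold, since unrelated events can also have ordered timestamps.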
TIMING ASSUMPTIONS
- Synchronous system properties
  1. Synchronous computation: upper bound on processing delays
  2. Synchronous communication: upper bound on message delays
  3. Synchronous physical clocks: processes have local physical clocks, with an upper bound on how much each clock deviates from the global physical clock

TIMING ASSUMPTIONS
- Synchronous system services
- Timed failure detection
  - Crashed processes are detected within bounded time
- Measure of transit delays
  - From the delays it can be determined how far away the other process is, or whether a link is slow
- Coordination based on time
  - A lease makes it possible to hold exclusive rights for a limited time (e.g. file access)
- Worst-case performance
  - Worst-case response times can be determined
  - A process knows by when a message it sent will have arrived
- Synchronised clocks
  - Clocks are never more than a certain constant δ apart (the precision)
  - Events within δ of each other cannot be ordered

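Timed failure detection can be sketched with heartbeats and a timeout. This is a minimal illustration under assumptions: the class `TimedFailureDetector` is hypothetical, time is passed in explicitly as `now`, and `timeout` is assumed to be chosen larger than the heartbeat period plus the known bounds on message and processing delay.

```python
class TimedFailureDetector:
    """Suspects a peer once no heartbeat has arrived within `timeout`."""

    def __init__(self, peers, timeout):
        self.timeout = timeout
        self.last_seen = {p: 0.0 for p in peers}  # time of last heartbeat
        self.crashed = set()

    def on_heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def check(self, now):
        # In a synchronous system a missing heartbeat can only mean a crash,
        # so a detected peer is never removed from the set again.
        for peer, t in self.last_seen.items():
            if now - t > self.timeout:
                self.crashed.add(peer)
        return set(self.crashed)
```

The detection is only as reliable as the timing bounds: if a heartbeat can be delayed beyond `timeout`, a correct process would be declared crashed.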
TIMING ASSUMPTIONS
- Partial synchrony
  - There are periods of time during which a synchronous system behaves asynchronously
    - Overloaded network
    - Shortage of memory
    - Buffer overflows (incoming messages)
    - Lost messages violating the upper bounds on delivery
  - Assume that these periods are not infinite, so the system eventually becomes synchronous again