
Computer Communication Networks Transport Layer IECE / ICSI 416 - PowerPoint PPT Presentation

Computer Communication Networks Transport Layer IECE / ICSI 416 Spring 2020 Prof. Dola Saha 1 End-to-end Protocols Common properties that a transport protocol can be expected to provide Guarantees message delivery Delivers


  1. Reliable data transfer: getting started
      we'll:
      § incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
      § consider only unidirectional data transfer
        • but control info will flow in both directions!
      § use finite state machines (FSM) to specify sender, receiver
      FSM notation: each arrow from state 1 to state 2 is labeled with the event causing the state transition and the actions taken on the state transition
      state: when in this "state", next state is uniquely determined by next event

  2. rdt1.0: reliable transfer over a reliable channel
      § underlying channel perfectly reliable
        • no bit errors
        • no loss of packets
      § separate FSMs for sender, receiver:
        • sender sends data into underlying channel
        • receiver reads data from underlying channel
      sender (single state, "Wait for call from above"): on rdt_send(data): packet = make_pkt(data); udt_send(packet)
      receiver (single state, "Wait for call from below"): on rdt_rcv(packet): extract(packet,data); deliver_data(data)

  3. rdt2.0: channel with bit errors
      § underlying channel may flip bits in packet
        • checksum to detect bit errors
      § the question: how to recover from errors? (How do humans recover from "errors" during conversation?)
        • acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
        • negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
        • sender retransmits pkt on receipt of NAK
      § new mechanisms in rdt2.0 (beyond rdt1.0):
        • error detection
        • receiver feedback: control msgs (ACK,NAK) rcvr->sender

  4. rdt2.0: channel with bit errors § underlying channel may flip bits in packet • checksum to detect bit errors § the question: how to recover from errors: • acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK • negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors • sender retransmits pkt on receipt of NAK § new mechanisms in rdt2.0 (beyond rdt1.0 ): • error detection • feedback: control msgs (ACK,NAK) from receiver to sender 31

  5. rdt2.0: FSM specification
      sender:
        • Wait for call from above: on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK"
        • Wait for ACK or NAK: on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in this state
        • Wait for ACK or NAK: on rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to "Wait for call from above"
      receiver (single state, "Wait for call from below"):
        • on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
        • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

  6. rdt2.0: operation with no errors
      [Figure: the rdt2.0 FSM with the error-free path highlighted]
      sender: rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
      receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
      sender: rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); back to "Wait for call from above"

  7. rdt2.0: error scenario
      [Figure: the rdt2.0 FSM with the error path highlighted]
      sender: rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt)
      receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
      sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in "Wait for ACK or NAK"
      receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

  8. rdt2.0 has a fatal flaw!
      what happens if ACK/NAK corrupted?
      § sender doesn't know what happened at receiver!
      § can't just retransmit: possible duplicate
      handling duplicates:
      § sender retransmits current pkt if ACK/NAK corrupted
      § sender adds sequence number to each pkt
      § receiver discards (doesn't deliver up) duplicate pkt
      stop and wait: sender sends one packet, then waits for receiver response

  9. rdt2.1: sender, handles garbled ACK/NAKs
      • Wait for call 0 from above: on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 0"
      • Wait for ACK or NAK 0: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt); stay
      • Wait for ACK or NAK 0: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ (no action); go to "Wait for call 1 from above"
      • Wait for call 1 from above: on rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 1"
      • Wait for ACK or NAK 1: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt); stay
      • Wait for ACK or NAK 1: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ (no action); go to "Wait for call 0 from above"

  10. rdt2.1: receiver, handles garbled ACK/NAKs
      Wait for 0 from below:
        • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 1 from below"
        • on rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
        • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay
      Wait for 1 from below:
        • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 0 from below"
        • on rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
        • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay

  11. rdt2.1: discussion
      sender:
      § seq # added to pkt
      § must check if received ACK/NAK corrupted
      § twice as many states
        • state must "remember" whether "expected" pkt should have seq # of 0 or 1
      receiver:
      § must check if received packet is duplicate
        • state indicates whether 0 or 1 is expected pkt seq #
      § note: receiver can not know if its last ACK/NAK received OK at sender

  12. rdt2.2: a NAK-free protocol § same functionality as rdt2.1, using ACKs only § instead of NAK, receiver sends ACK for last pkt received OK • receiver must explicitly include seq # of pkt being ACKed § duplicate ACK at sender results in same action as NAK: retransmit current pkt 39

  13. rdt2.2: sender, receiver fragments
      sender FSM fragment:
        • Wait for call 0 from above: on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK 0"
        • Wait for ACK 0: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): udt_send(sndpkt); stay
        • Wait for ACK 0: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): Λ (no action); leave this state
      receiver FSM fragment:
        • Wait for 0 from below: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)): udt_send(sndpkt); stay
        • transition into "Wait for 0 from below": on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)

  14. rdt3.0: channels with errors and loss
      new assumption: underlying channel can also lose packets (data, ACKs)
      Ø checksum, seq. #, ACKs, retransmissions will be of help … but not enough
      approach: sender waits "reasonable" amount of time for ACK
      Ø retransmits if no ACK received in this time
      Ø if pkt (or ACK) just delayed (not lost):
        • retransmission will be duplicate, but seq. #'s already handle this
        • receiver must specify seq # of pkt being ACKed
      Ø requires countdown timer

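  15. rdt3.0 sender
      • Wait for call 0 from above: on rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK0"
      • Wait for call 0 from above: on rdt_rcv(rcvpkt): Λ (no action)
      • Wait for ACK0: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): Λ (no action)
      • Wait for ACK0: on timeout: udt_send(sndpkt); start_timer
      • Wait for ACK0: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer; go to "Wait for call 1 from above"
      • Wait for call 1 from above: on rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK1"
      • Wait for call 1 from above: on rdt_rcv(rcvpkt): Λ (no action)
      • Wait for ACK1: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)): Λ (no action)
      • Wait for ACK1: on timeout: udt_send(sndpkt); start_timer
      • Wait for ACK1: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer; go to "Wait for call 0 from above"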
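The rdt3.0 sender states above can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not a definitive implementation: the class name `Rdt3Sender`, the `sent` list that stands in for `udt_send`, the boolean `timer_running` that stands in for a real countdown timer, and the choice to refuse new data while an ACK is outstanding are all hypothetical choices made for the example.

```python
class Rdt3Sender:
    """Alternating-bit (stop-and-wait) sender, loosely following the rdt3.0 FSM."""

    def __init__(self):
        self.seq = 0                  # sequence bit of the current packet (0 or 1)
        self.waiting_for_ack = False  # False: "Wait for call", True: "Wait for ACK"
        self.sndpkt = None            # last packet sent, kept for retransmission
        self.timer_running = False    # stand-in for start_timer / stop_timer
        self.sent = []                # stand-in for udt_send: records every transmission

    def _transmit(self):
        self.sent.append(self.sndpkt)
        self.timer_running = True     # start_timer

    def rdt_send(self, data):
        if self.waiting_for_ack:
            return False              # assumption: refuse data while a packet is unACKed
        self.sndpkt = (self.seq, data)
        self._transmit()
        self.waiting_for_ack = True
        return True

    def timeout(self):
        if self.waiting_for_ack:
            self._transmit()          # resend the same packet and restart the timer

    def rdt_rcv(self, ack_seq, corrupt=False):
        # Corrupt ACKs, ACKs for the other sequence bit, and anything received
        # while waiting for a call from above are ignored (the Λ transitions).
        if not self.waiting_for_ack or corrupt or ack_seq != self.seq:
            return
        self.timer_running = False    # stop_timer
        self.waiting_for_ack = False
        self.seq ^= 1                 # alternate between 0 and 1


sender = Rdt3Sender()
sender.rdt_send("a")
sender.rdt_rcv(1)       # wrong ACK: ignored
sender.timeout()        # retransmit packet 0
sender.rdt_rcv(0)       # correct ACK: move on to sequence bit 1
sender.rdt_send("b")
print(sender.sent)
```

Ignoring a wrong or corrupt ACK and relying on the timer alone is what lets this sender cope with lost packets and lost ACKs using the same code path.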

  16. rdt3.0 in action
      (a) no loss: sender: send pkt0 → receiver: rcv pkt0, send ack0 → sender: rcv ack0, send pkt1 → receiver: rcv pkt1, send ack1 → sender: rcv ack1, send pkt0 → receiver: rcv pkt0, send ack0
      (b) packet loss: sender: send pkt0 → receiver: rcv pkt0, send ack0 → sender: rcv ack0, send pkt1 → pkt1 lost (X) → sender: timeout, resend pkt1 → receiver: rcv pkt1, send ack1 → sender: rcv ack1, send pkt0 → receiver: rcv pkt0, send ack0

  17. rdt3.0 in action
      (c) ACK loss: sender: send pkt0 → receiver: rcv pkt0, send ack0 → sender: rcv ack0, send pkt1 → receiver: rcv pkt1, send ack1 → ack1 lost (X) → sender: timeout, resend pkt1 → receiver: rcv pkt1 (detect duplicate), send ack1 → sender: rcv ack1, send pkt0 → receiver: rcv pkt0, send ack0
      (d) premature timeout/delayed ACK: sender: send pkt0 → receiver: rcv pkt0, send ack0 → sender: rcv ack0, send pkt1 → receiver: rcv pkt1, send ack1 (delayed) → sender: timeout, resend pkt1 → sender: rcv ack1, send pkt0 → receiver: rcv pkt1 (detect duplicate), send ack1 → sender: rcv ack1, send pkt0 → receiver: rcv pkt0, send ack0 → receiver: rcv pkt0 (detect duplicate), send ack0

  18. Performance of rdt3.0
      § rdt3.0 is correct, but performance stinks
      § e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
        $$D_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bits/sec}} = 8\ \text{microsecs}$$
      § U_sender: utilization – fraction of time sender busy sending (times in msec):
        $$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$
      § if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec throughput over 1 Gbps link
      § network protocol limits use of physical resources!
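The utilization arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the inputs are the slide's own numbers (8000-bit packet, 1 Gbps link, 30 msec RTT).

```python
def stop_and_wait_utilization(packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    transmit_s = packet_bits / rate_bps
    return transmit_s / (rtt_s + transmit_s)


def stop_and_wait_throughput_bps(packet_bits, rate_bps, rtt_s):
    """One packet is delivered every RTT + L/R seconds."""
    return packet_bits / (rtt_s + packet_bits / rate_bps)


u = stop_and_wait_utilization(8000, 1e9, 0.030)
throughput = stop_and_wait_throughput_bps(8000, 1e9, 0.030)
print(f"U_sender = {u:.5f}")
print(f"throughput = {throughput / 8 / 1000:.1f} kB/s")
```

With the slide's values the utilization rounds to 0.00027 and the throughput to roughly 33 kB/s, which is why stop-and-wait wastes almost all of a 1 Gbps link.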

  19. rdt3.0: stop-and-wait operation
      [Figure: sender/receiver timeline]
      • first packet bit transmitted, t = 0
      • last packet bit transmitted, t = L / R
      • first packet bit arrives at receiver
      • last packet bit arrives, send ACK
      • ACK arrives, send next packet, t = RTT + L / R
      $$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

  20. Pipelined protocols pipelining: sender allows multiple, “ in-flight ” , yet-to-be- acknowledged pkts • range of sequence numbers must be increased • buffering at sender and/or receiver two generic forms of pipelined protocols: go-Back-N, selective repeat 47

  21. Pipelining: increased utilization
      [Figure: sender/receiver timeline]
      • first packet bit transmitted, t = 0
      • last bit transmitted, t = L / R
      • first packet bit arrives at receiver
      • last packet bit arrives, send ACK
      • last bit of 2nd packet arrives, send ACK
      • last bit of 3rd packet arrives, send ACK
      • ACK arrives, send next packet, t = RTT + L / R
      3-packet pipelining increases utilization by a factor of 3! (times in msec):
      $$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

  22. Pipelined protocols: overview
      Go-back-N:
      § sender can have up to N unacked packets in pipeline
      § receiver only sends cumulative ack
        • doesn't ack packet if there's a gap
      § sender has a timer for oldest unacked packet
        • when timer expires, retransmit all unacked packets
      Selective Repeat:
      § sender can have up to N unacked packets in pipeline
      § rcvr sends individual ack for each packet
      § sender maintains multiple timers, one for each unacked packet
        • when timer expires, retransmit only that unacked packet

  23. Go-Back-N: sender
      § k-bit seq # in pkt header
      § "window" of up to N, consecutive unacked pkts allowed
      § ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
        • may receive duplicate ACKs (see receiver)
      § timer for oldest in-flight pkt
      § timeout(n): retransmit packet n and all higher seq # pkts in window

  24. Go-Back-N (events and actions)
      sender:
      data from above: § if the window is not full, packet is created and sent
      timeout(n): § resends all packets that have been sent but not yet been acknowledged
      received ACK(n): § mark all pkts up to n as received
      receiver:
      pkt n contains expectedSequenceNo: § send ACK(n)
      pkt n does not contain expectedSequenceNo (out-of-order): § discard the packet (the GBN receiver does not buffer) § re-send the ACK for the highest in-order seq #

  25. GBN: sender extended FSM
      initially: base=1; nextseqnum=1
      single state "Wait":
      • on rdt_send(data):
          if (nextseqnum < base+N) {
            sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
            udt_send(sndpkt[nextseqnum])
            if (base == nextseqnum) start_timer
            nextseqnum++
          } else refuse_data(data)
      • on timeout: start_timer; udt_send(sndpkt[base]); udt_send(sndpkt[base+1]); … udt_send(sndpkt[nextseqnum-1])
      • on rdt_rcv(rcvpkt) && corrupt(rcvpkt): Λ (no action)
      • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): base = getacknum(rcvpkt)+1; if (base == nextseqnum) stop_timer else start_timer
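The GBN sender FSM above translates fairly directly into Python. The sketch below is illustrative only: `GbnSender`, the `wire` list standing in for `udt_send`, and the boolean `timer_running` are hypothetical names, sequence numbers are treated as unbounded integers (no k-bit wraparound), and the `max()` guard against stale ACKs is an added assumption that the FSM itself does not spell out.

```python
class GbnSender:
    """Go-Back-N sender with a window of N packets and a single timer."""

    def __init__(self, window_size):
        self.N = window_size
        self.base = 1              # oldest unACKed sequence number
        self.nextseqnum = 1        # next sequence number to use
        self.sndpkt = {}           # seq -> packet, kept until cumulatively ACKed
        self.timer_running = False
        self.wire = []             # stand-in for udt_send

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False           # window full: refuse_data(data)
        pkt = (self.nextseqnum, data)
        self.sndpkt[self.nextseqnum] = pkt
        self.wire.append(pkt)
        if self.base == self.nextseqnum:
            self.timer_running = True   # start_timer for the oldest packet
        self.nextseqnum += 1
        return True

    def timeout(self):
        # Go back N: resend every packet that was sent but not yet ACKed.
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.wire.append(self.sndpkt[seq])

    def rdt_rcv(self, acknum, corrupt=False):
        if corrupt:
            return                 # Λ: ignore corrupted ACKs
        new_base = max(self.base, acknum + 1)   # assumption: never move base backwards
        for seq in range(self.base, new_base):
            self.sndpkt.pop(seq, None)          # cumulative ACK frees these packets
        self.base = new_base
        self.timer_running = self.base != self.nextseqnum


sender = GbnSender(window_size=4)
for payload in "abcde":
    print(payload, sender.rdt_send(payload))   # the fifth send is refused
```

Keeping one timer for the oldest unACKed packet keeps the sender simple, at the cost of retransmitting the whole window after a single loss.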

  26. GBN: receiver extended FSM
      initially: expectedseqnum=1; sndpkt = make_pkt(expectedseqnum,ACK,chksum)
      single state "Wait":
      • on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(expectedseqnum,ACK,chksum); udt_send(sndpkt); expectedseqnum++
      • default: udt_send(sndpkt)
      ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
        • may generate duplicate ACKs
        • need to only remember expectedseqnum
      § out-of-order pkt:
        • discard (don't buffer): no receiver buffering!
        • re-ACK pkt with highest in-order seq #

  27. GBN in action
      [Figure: sender/receiver timeline, sender window N=4 sliding over seq #s 0 to 8]
      sender: send pkt0; send pkt1; send pkt2 (lost, X); send pkt3; (wait)
      receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, discard, (re)send ack1
      sender: rcv ack0, send pkt4; rcv ack1, send pkt5; ignore duplicate ACK
      receiver: receive pkt4, discard, (re)send ack1; receive pkt5, discard, (re)send ack1
      sender: pkt 2 timeout: send pkt2, send pkt3, send pkt4, send pkt5
      receiver: rcv pkt2, deliver, send ack2; rcv pkt3, deliver, send ack3; rcv pkt4, deliver, send ack4; rcv pkt5, deliver, send ack5

  28. Selective repeat § receiver individually acknowledges all correctly received pkts • buffers pkts, as needed, for eventual in-order delivery to upper layer § sender only resends pkts for which ACK not received • sender timer for each unACKed pkt § sender window • N consecutive seq # ’ s • limits seq #s of sent, unACKed pkts 55

  29. Selective repeat: sender, receiver windows 56

  30. Selective repeat (events and actions)
      sender:
      data from above: § if next available seq # in window, send pkt
      timeout(n): § resend pkt n, restart timer
      ACK(n) in [sendbase, sendbase+N]: § mark pkt n as received § if n is smallest unACKed pkt, advance window base to next unACKed seq #
      receiver:
      pkt n in [rcvbase, rcvbase+N-1]: § send ACK(n) § out-of-order: buffer § in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
      pkt n in [rcvbase-N, rcvbase-1]: § ACK(n)
      otherwise: § ignore
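The receiver rules above can be sketched in Python as follows. This is a simplified illustration, not a definitive implementation: `SrReceiver` and its attribute names are hypothetical, and sequence numbers are treated as unbounded integers, so the wraparound problem is deliberately left out.

```python
class SrReceiver:
    """Selective-repeat receiver with a window of N sequence numbers."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0       # lowest sequence number not yet received
        self.buffer = {}       # out-of-order packets waiting for the gap to fill
        self.delivered = []    # stand-in for deliver_data to the upper layer

    def on_packet(self, n, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer[n] = data
            # Deliver every in-order packet and slide the window forward.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n           # already delivered: re-ACK so the sender can advance
        return None            # otherwise: ignore


rx = SrReceiver(window_size=4)
for seq in [0, 1, 3, 4, 5, 2]:      # pkt2 arrives last, after its retransmission
    print(seq, rx.on_packet(seq, f"pkt{seq}"), rx.delivered)
```

Re-ACKing packets below `rcvbase` matters because the earlier ACK may have been lost, and without it the sender's window would never advance.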

  31. Selective repeat in action
      [Figure: sender/receiver timeline, sender window N=4 sliding over seq #s 0 to 8]
      sender: send pkt0; send pkt1; send pkt2 (lost, X); send pkt3; (wait)
      receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, buffer, send ack3
      sender: rcv ack0, send pkt4; rcv ack1, send pkt5; record ack3 arrived
      receiver: receive pkt4, buffer, send ack4; receive pkt5, buffer, send ack5
      sender: pkt 2 timeout: send pkt2; record ack4 arrived; record ack5 arrived
      receiver: rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2
      Q: what happens when ack2 arrives?

  32. Selective repeat: dilemma
      example:
      § seq #'s: 0, 1, 2, 3
      § window size=3
      [Figure: sender window and receiver window (after receipt) over the repeating seq #s 0 1 2 3 0 1 2, for two scenarios]
      (a) no problem: sender sends pkt0, pkt1, pkt2; the ACKs arrive and the sender window advances; pkt3 is sent and lost (X); sender then sends a new pkt0; receiver will accept packet with seq number 0
      (b) oops!: sender sends pkt0, pkt1, pkt2; all three ACKs are lost (X); timeout, sender retransmits pkt0; receiver will accept packet with seq number 0
      § receiver sees no difference in two scenarios! receiver can't see sender side; receiver behavior identical in both cases
      § duplicate data accepted as new in (b): something's (very) wrong!
      § Q: what relationship between seq # size and window size to avoid problem in (b)?
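One way to explore the question is to check, for a given sequence-number space and window size, whether the receiver's new window can still contain a sequence number from the window the sender may be retransmitting. The helper below is a hypothetical sketch of scenario (b) only: it assumes the receiver has accepted one full window starting at 0 and that none of its ACKs reached the sender.

```python
def retransmission_looks_new(seq_space, window):
    """True if an old packet's seq # falls inside the receiver's advanced window."""
    old_window = set(range(window))                                # seq #s the sender may resend
    new_window = {(window + i) % seq_space for i in range(window)}  # what the receiver now accepts
    return bool(old_window & new_window)


for seq_space, window in [(4, 3), (4, 2), (8, 4), (8, 5)]:
    print(seq_space, window, retransmission_looks_new(seq_space, window))
```

In this sketch the overlap disappears exactly when the window size is at most half the sequence-number space, which is the usual rule for selective repeat.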

  33. Summary of rdt
      Mechanism: Use
      • Checksum: detect bit errors
      • Timer: timeout/retransmit a packet when packet (or its ACK) is lost within the channel
      • Sequence #: sequential numbering of packets of data flowing from sender to receiver, detects duplicates, in-order delivery
      • ACK: packet received correctly, has sequence numbers based on which retransmissions are done
      • NACK: a packet has not been received correctly (checksum failed)
      • Window, pipelining: allows multiple packets to be transmitted but not yet acknowledged, improves sender utilization compared to stop-and-wait mode of operation

  34. Connection-oriented Transport: TCP 61

  35. TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
      § point-to-point:
        • one sender, one receiver
      § reliable, in-order byte stream:
        • no "message boundaries"
      § pipelined:
        • TCP congestion and flow control set window size
      § full duplex data:
        • bi-directional data flow in same connection
        • MSS: maximum segment size
      § connection-oriented:
        • handshaking (exchange of control msgs) inits sender, receiver state before data exchange
      § flow controlled:
        • sender will not overwhelm receiver

  36. TCP segment structure
      header layout (each row 32 bits wide):
      • source port # | dest port #
      • sequence number
      • acknowledgement number
      • head len | not used | flags C E U A P R S F | receive window
      • checksum | Urg data pointer
      • options (variable length)
      • application data (variable length)
      field notes:
      • sequence number, acknowledgement number: counting by bytes of data (not segments!)
      • CWR: congestion window reduced; ECE: ECN Echo
      • URG: urgent data (generally not used)
      • ACK: ACK # valid
      • PSH: push data now (generally not used)
      • RST, SYN, FIN: connection estab (setup, teardown commands)
      • receive window: # bytes rcvr willing to accept (used for flow control)
      • checksum: Internet checksum (as in UDP)
      • Urg data pointer: last byte of urgent data
      In practice, the PSH, URG, and the urgent data pointer are not used.

  37. TCP seq. numbers, ACKs
      sequence numbers:
        • byte stream "number" of first byte in segment's data
      acknowledgements:
        • seq # of next byte expected from other side
        • cumulative ACK
      Q: how receiver handles out-of-order segments
        • A: TCP spec doesn't say, - up to implementor
      [Figure: outgoing segment from sender and incoming segment to sender (source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer); sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]

  38. TCP seq. numbers, ACKs
      simple telnet scenario; suppose the starting sequence numbers are 42 and 79
      • Host A: user types 'C'; sends Seq=42, ACK=79, data = 'C'
      • Host B: host ACKs receipt of 'C', echoes back 'C'; sends Seq=79, ACK=43, data = 'C'
      • Host A: host ACKs receipt of echoed 'C'; sends Seq=43, ACK=80

  39. TCP round trip time, timeout
      Q: how to set TCP timeout value?
      § longer than RTT
        • but RTT varies
      § too short: premature timeout, unnecessary retransmissions
      § too long: slow reaction to segment loss
      Q: how to estimate RTT?
      § SampleRTT: measured time from segment transmission until ACK receipt
        • ignore retransmissions
      § SampleRTT will vary, want estimated RTT "smoother"
        • average several recent measurements, not just current SampleRTT

  40. TCP round trip time, timeout
      $$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
      § exponential weighted moving average
      § influence of past sample decreases exponentially fast
      § typical value: α = 0.125
      [Figure: SampleRTT and EstimatedRTT, RTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr]
      Timeout = 2*EstimatedRTT
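The moving average can be sketched in a few lines of Python. The function name and the sample values below are made up for illustration; only the update rule and the typical weight of 0.125 come from the slide.

```python
def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=0.125):
    """One EWMA step: EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


estimate = 100.0                               # hypothetical starting estimate, in ms
for sample in [120.0, 80.0, 300.0, 100.0]:     # hypothetical SampleRTT values, in ms
    estimate = update_estimated_rtt(estimate, sample)
    print(f"sample={sample:.0f} ms -> EstimatedRTT={estimate:.2f} ms")
```

Because each new sample carries only one eighth of the weight, a single outlier such as the 300 ms sample nudges the estimate rather than replacing it.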

  41. How to calculate SampleRTT? Associating the ACK with (a) original transmission versus (b) retransmission 68

  42. Karn/Partridge Algorithm Ø Do not sample RTT when retransmitting Ø Karn-Partridge algorithm was an improvement over the original approach, but it does not eliminate congestion Ø We need to understand how timeout is related to congestion § If you timeout too soon, you may unnecessarily retransmit a segment which adds load to the network 69

  43. Karn/Partridge Algorithm Ø Main problem with the original computation is that it does not take variance of Sample RTTs into consideration. Ø If the variance among Sample RTTs is small § Then the Estimated RTT can be better trusted § There is no need to multiply this by 2 to compute the timeout 70

  44. Karn/Partridge Algorithm Ø On the other hand, a large variance in the samples suggests that the timeout value should not be tightly coupled to the Estimated RTT Ø Jacobson/Karels proposed a new scheme for TCP retransmission

  45. Jacobson/Karels Algorithm
      § timeout interval: EstimatedRTT plus "safety margin"
        • large variation in EstimatedRTT → larger safety margin
      § estimate SampleRTT deviation from EstimatedRTT (measure of variability, RFC 6298):
        $$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$$
        (typically, β = 0.25)
      § timeout = estimated RTT + "safety margin":
        $$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
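A small Python sketch of this timeout computation follows. The class name `RttEstimator` and the starting values are hypothetical, and one detail is an assumption worth flagging: DevRTT is updated before EstimatedRTT, so the deviation is measured against the previous estimate (the ordering RFC 6298 uses for its equivalent variables).

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, estimated_rtt, dev_rtt, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.alpha = alpha
        self.beta = beta

    def update(self, sample_rtt):
        # Assumption: deviation is computed against the old EstimatedRTT.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(estimated_rtt=100.0, dev_rtt=10.0)   # hypothetical values, in ms
for sample in [140.0, 100.0, 100.0, 100.0]:
    print(f"sample={sample:.0f} ms -> timeout={estimator.update(sample):.2f} ms")
```

After the one slow sample the timeout jumps, then shrinks again as steady samples reduce DevRTT, which is the behavior the "safety margin" is meant to provide.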

  46. TCP reliable data transfer
      § TCP creates rdt service on top of IP's unreliable service
        • pipelined segments
        • cumulative acks
        • single retransmission timer
      § retransmissions triggered by:
        • timeout events
        • duplicate acks
      let's initially consider simplified TCP sender:
        • ignore duplicate acks
        • ignore flow control, congestion control

  47. TCP sender events:
      data rcvd from app:
      § create segment with seq #
      § seq # is byte-stream number of first data byte in segment
      § start timer if not already running
        • think of timer as for oldest unacked segment
        • expiration interval: TimeOutInterval
      timeout:
      § retransmit segment that caused timeout
      § restart timer
      ack rcvd:
      § if ack acknowledges previously unacked segments
        • update what is known to be ACKed
        • start timer if there are still unacked segments

  48. TCP sender (simplified)
      initially: NextSeqNum = InitialSeqNum; SendBase = InitialSeqNum
      single state "wait for event":
      • data received from application above:
          create segment, seq. #: NextSeqNum
          pass segment to IP (i.e., "send")
          NextSeqNum = NextSeqNum + length(data)
          if (timer currently not running) start timer
      • timeout:
          retransmit not-yet-acked segment with smallest seq. #
          start timer
      • ACK received, with ACK field value y:
          if (y > SendBase) {
            SendBase = y /* SendBase–1: last cumulatively ACKed byte */
            if (there are currently not-yet-acked segments) start timer
            else stop timer
          }
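The three events of the simplified sender map onto three methods in the Python sketch below. It is an illustration under stated assumptions rather than real TCP: `SimpleTcpSender`, the `sent` list standing in for IP, and the boolean timer are hypothetical, and sequence-number wraparound, flow control, and congestion control are all ignored.

```python
class SimpleTcpSender:
    """Simplified TCP sender: cumulative ACKs and a single retransmission timer."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num     # oldest unACKed byte
        self.unacked = {}                    # seq # of first byte -> segment payload
        self.timer_running = False
        self.sent = []                       # stand-in for "pass segment to IP"

    def on_app_data(self, data):
        self.unacked[self.next_seq_num] = data
        self.sent.append((self.next_seq_num, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True        # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)          # not-yet-acked segment with smallest seq #
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True        # (re)start timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            # Drop every segment whose last byte is below y (cumulative ACK).
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


tcp = SimpleTcpSender(initial_seq_num=92)
tcp.on_app_data(b"x" * 8)      # Seq=92, 8 bytes of data
tcp.on_app_data(b"y" * 20)     # Seq=100, 20 bytes of data
tcp.on_ack(120)                # one cumulative ACK covers both segments
print(tcp.send_base, tcp.unacked, tcp.timer_running)
```

The single ACK=120 clears both segments at once, so a lost ACK=100 does not force any retransmission.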

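  49. TCP: retransmission scenarios
      lost ACK scenario: Host A (SendBase=92) sends Seq=92, 8 bytes of data → Host B replies ACK=100, which is lost (X) → timeout at A → A resends Seq=92, 8 bytes of data → B replies ACK=100 → SendBase=100
      premature timeout: Host A (SendBase=92) sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data → Host B replies ACK=100 and ACK=120 → timeout at A before the ACKs arrive → A resends Seq=92, 8 bytes of data → the ACKs arrive: SendBase=100, then SendBase=120 → B answers the duplicate segment with ACK=120 → SendBase=120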

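  50. TCP: retransmission scenarios
      cumulative ACK: Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data → Host B replies ACK=100, which is lost (X), and ACK=120, which arrives before the timeout → A does not retransmit and sends Seq=120, 15 bytes of data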

  51. TCP ACK generation [RFC 1122, RFC 2581]
      event at receiver → TCP receiver action
      • arrival of in-order segment with expected seq #, all data up to expected seq # already ACKed → delayed ACK: wait up to 500ms for next segment; if no next segment, send ACK
      • arrival of in-order segment with expected seq #, one other segment has ACK pending → immediately send single cumulative ACK, ACKing both in-order segments
      • arrival of out-of-order segment with higher-than-expected seq #, gap detected → immediately send duplicate ACK, indicating seq # of next expected byte
      • arrival of segment that partially or completely fills gap → immediately send ACK, provided that segment starts at lower end of gap

  52. TCP fast retransmit
      § time-out period often relatively long:
        • long delay before resending lost packet
      § detect lost segments via duplicate ACKs
        • sender often sends many segments back-to-back
        • if segment is lost, there will likely be many duplicate ACKs
      TCP fast retransmit: if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
        • likely that unacked segment lost, so don't wait for timeout

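  53. TCP fast retransmit
      [Figure: Host A/Host B timeline] Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data; the Seq=100 segment is lost (X) → Host B sends ACK=100 four times (the original ACK plus three duplicates) → Host A resends Seq=100, 20 bytes of data before the timeout expires
      fast retransmit after sender receipt of triple duplicate ACK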
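The duplicate-ACK rule can be sketched as a small counter in Python. The class name is hypothetical, and the sketch assumes the reading shown in the figure: the first ACK for a value is not a duplicate, and the retransmission fires on the third duplicate after it.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals when a fast retransmit should fire."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return True exactly when this ACK is the threshold-th duplicate."""
        if ack == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        self.last_ack = ack        # new ACK value: reset the duplicate counter
        self.dup_count = 0
        return False


detector = FastRetransmitDetector()
for ack in [100, 100, 100, 100, 100, 120]:
    print(ack, detector.on_ack(ack))
```

Returning True only on the third duplicate means further duplicates of the same ACK do not trigger repeated retransmissions of the same segment.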

  54. TCP flow control
      application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
      flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
      [Figure: receiver protocol stack: application process / OS: TCP socket receiver buffers, TCP code, IP code / segments arriving from sender]

  55. TCP flow control
      Ø receiver "advertises" free buffer space by including rwnd (receiver window) value in TCP header of receiver-to-sender segments
        § RcvBuffer size set via socket options (typical default is 4096 bytes)
        § many operating systems autoadjust RcvBuffer
      Ø sender limits amount of unacked ("in-flight") data to receiver's rwnd value
      Ø guarantees receive buffer will not overflow
      [Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus rwnd free buffer space; data leaves to the application process]

  56. Sliding Window Protocol Ø TCP uses a variant of the sliding window algorithm, which serves several purposes: § it guarantees the reliable delivery of data, § it ensures that data is delivered in order, and § it enforces flow control between the sender and the receiver.

  57. Sliding Window Byte increase Byte increase Relationship between TCP send buffer (a) and receive buffer (b). 84

  58. TCP Sliding Window Ø Sending Side § LastByteAcked ≤ LastByteSent § LastByteSent ≤ LastByteWritten Ø Receiving Side § LastByteRead < NextByteExpected § NextByteExpected ≤ LastByteRcvd + 1 85

  59. TCP Flow Control
      Ø LastByteRcvd − LastByteRead ≤ MaxRcvBuffer
      Ø AdvertisedWindow = MaxRcvBuffer − ((NextByteExpected − 1) − LastByteRead)
      Ø LastByteSent − LastByteAcked ≤ AdvertisedWindow
      Ø EffectiveWindow = AdvertisedWindow − (LastByteSent − LastByteAcked)
      Ø LastByteWritten − LastByteAcked ≤ MaxSendBuffer
      Ø If the sending process tries to write y bytes to TCP, but (LastByteWritten − LastByteAcked) + y > MaxSendBuffer, then TCP blocks the sending process and does not allow it to generate more data.
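These window formulas are plain arithmetic, so a short Python sketch can make them concrete. The function names and the byte counts in the example are hypothetical; the formulas themselves are the ones listed above.

```python
def advertised_window(max_rcv_buffer, next_byte_expected, last_byte_read):
    """Free space the receiver advertises to the sender."""
    return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)


def effective_window(adv_window, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight right now."""
    return adv_window - (last_byte_sent - last_byte_acked)


def write_would_block(last_byte_written, last_byte_acked, y, max_send_buffer):
    """True if writing y more bytes would overflow the send buffer."""
    return (last_byte_written - last_byte_acked) + y > max_send_buffer


# Hypothetical snapshot of a connection (all values in bytes).
adv = advertised_window(max_rcv_buffer=4096, next_byte_expected=1001, last_byte_read=500)
eff = effective_window(adv, last_byte_sent=3000, last_byte_acked=2000)
print(adv, eff, write_would_block(5000, 2000, 1097, max_send_buffer=4096))
```

Here the receiver holds 500 unread bytes, so it advertises 3596 bytes, and with 1000 bytes already in flight the sender may send 2596 more.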

  60. Protecting against Wraparound Ø SequenceNum: 32 bits long Ø AdvertisedWindow: 16 bits long § TCP has satisfied the requirement of the sliding window algorithm that the sequence number space be twice as big as the window size § $2^{32} \gg 2 \times 2^{16}$

  61. Protecting against Wraparound Ø Relevance of the 32-bit sequence number space § The sequence number used on a given connection might wraparound § A byte with sequence number x could be sent at one time, and then at a later time a second byte with the same sequence number x could be sent § Packets cannot survive in the Internet for longer than the MSL (maximum segment lifetime) § MSL is set to 120 sec [recommended RFC 793] § Make sure that the sequence number does not wrap around within a 120-second period of time § Depends on how fast data can be transmitted over the Internet 88

  62. Protecting against Wraparound Time until 32-bit sequence number space wraps around. 89
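The wraparound time is the size of the sequence space divided by the sending rate, which the Python sketch below computes. The link speeds are illustrative assumptions chosen for the example; the 120-second MSL is the value quoted on the previous slide.

```python
SEQ_SPACE_BYTES = 2 ** 32      # 32-bit sequence numbers count bytes
MSL_SECONDS = 120              # maximum segment lifetime from the previous slide


def wraparound_seconds(bandwidth_bps):
    """Time to send 2^32 bytes at the given rate, i.e. time until seq #s repeat."""
    return SEQ_SPACE_BYTES * 8 / bandwidth_bps


# Illustrative link speeds (assumed for this example).
for name, bps in [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    seconds = wraparound_seconds(bps)
    print(f"{name}: wraps after {seconds:.0f} s, within MSL: {seconds < MSL_SECONDS}")
```

At 10 Mbps the space lasts close to an hour, but at 1 Gbps it wraps in about 34 seconds, well inside the 120-second MSL.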

  63. Keeping the Pipe Full Ø 16-bit AdvertisedWindow field must be big enough to allow the sender to keep the pipe full Ø 16-bit field translates to max 64KB advertised window Ø Clearly the receiver is free not to open the window as large as the AdvertisedWindow field allows Ø If the receiver has enough buffer space § The window needs to be opened far enough to allow a full delay × bandwidth product’s worth of data § Assuming an RTT of 100 ms 90

  64. Keeping the Pipe Full Required window size for 100-ms RTT. 91
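The required window is the delay × bandwidth product, which can be compared against the largest value a 16-bit field can carry. In the Python sketch below the link speeds are illustrative assumptions; the 100 ms RTT and the 16-bit field size come from the slides.

```python
MAX_ADVERTISED_WINDOW = 2 ** 16 - 1    # largest value of a 16-bit field, in bytes


def required_window_bytes(bandwidth_bps, rtt_s=0.100):
    """Delay x bandwidth product: bytes in flight needed to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8


# Illustrative link speeds (assumed for this example).
for name, bps in [("1.5 Mbps", 1.5e6), ("10 Mbps", 10e6), ("1 Gbps", 1e9)]:
    needed = required_window_bytes(bps)
    print(f"{name}: need {needed:.0f} bytes, fits in 16 bits: {needed <= MAX_ADVERTISED_WINDOW}")
```

Already at 10 Mbps the pipe holds 125,000 bytes, roughly twice what the 16-bit AdvertisedWindow can express.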

  65. Connection Management
      before exchanging data, sender/receiver "handshake":
      § agree to establish connection (each knowing the other willing to establish connection)
      § agree on connection parameters
      [Figure: client and server applications above the network, each holding connection state: ESTAB; connection variables: seq # client-to-server and server-to-client, rcvBuffer size at server, client]
      client: Socket clientSocket = new Socket("hostname","port number");
      server: Socket connectionSocket = welcomeSocket.accept();

  66. TCP 3-way handshake
      client state: LISTEN → SYNSENT → ESTAB; server state: LISTEN → SYN RCVD → ESTAB
      • client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
      • server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1); enter SYN RCVD
      • client: received SYNACK(x) indicates server is live; enter ESTAB; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data
      • server: received ACK(y) indicates client is live; enter ESTAB
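The sequence and acknowledgement numbers exchanged in the handshake can be traced with a short Python sketch. The function name and the dictionary representation of a segment are hypothetical; the modulo keeps the numbers inside the 32-bit sequence space.

```python
SEQ_MODULUS = 2 ** 32   # sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"SYNbit": 1, "Seq": x}
    synack = {"SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": (x + 1) % SEQ_MODULUS}
    ack = {"ACKbit": 1, "ACKnum": (y + 1) % SEQ_MODULUS}
    return [syn, synack, ack]


for segment in three_way_handshake(x=42, y=79):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm that the peer is live before data flows.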

  67. TCP: closing a connection Ø client, server each close their side of connection § send TCP segment with FIN bit = 1 Ø respond to received FIN with ACK § on receiving FIN, ACK can be combined with own FIN Ø simultaneous FIN exchanges can be handled 94

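  68. TCP: closing a connection
      client state: ESTAB → FIN_WAIT_1 → FIN_WAIT_2 → TIMED_WAIT → CLOSED; server state: ESTAB → CLOSE_WAIT → LAST_ACK → CLOSED
      • client: clientSocket.close(); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
      • server: enter CLOSE_WAIT; send ACKbit=1; ACKnum=x+1 (can still send data)
      • client: enter FIN_WAIT_2 (wait for server close)
      • server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
      • client: enter TIMED_WAIT; send ACKbit=1; ACKnum=y+1; timed wait for 2*max segment lifetime, then CLOSED
      • server: on receiving the ACK, CLOSED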

  69. TCP State Transition Diagram Extremely simplified in this diagram 96

  70. Principles of Congestion Control 97

  71. Principles of congestion control congestion : § Informally: § “ too many sources sending too much data too fast for network to handle ” § Different from flow control! § Manifestations: § lost packets (buffer overflow at routers) § long delays (queueing in router buffers) 98

  72. Causes/costs of congestion: scenario 1
      § two senders, two receivers
      § one router, infinite buffers
      § output link capacity: R
      § no retransmission
      [Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; throughput: λ_out. Plots: λ_out vs. λ_in and delay vs. λ_in, each with λ_in up to R/2]
      § maximum per-connection throughput: R/2
      § large queuing delays as arrival rate, λ_in, approaches capacity

  73. Causes/costs of congestion: scenario 2
      § one router, finite buffers
      § sender retransmission of timed-out packet
        • application-layer input = application-layer output: λ_in = λ_out
        • transport-layer input includes retransmissions: λ'_in ≥ λ_in
      [Figure: Host A sends λ_in (original data) and λ'_in (original data, plus retransmitted data) through a router with finite shared output link buffers to Host B; output rate λ_out]
