rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel

sender FSM (single state: Wait for call from above)
  rdt_send(data)
      packet = make_pkt(data)
      udt_send(packet)

receiver FSM (single state: Wait for call from below)
  rdt_rcv(packet)
      extract(packet,data)
      deliver_data(data)
rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  - checksum to detect bit errors
• the question: how to recover from errors? (How do humans recover from "errors" during conversation?)
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
• new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) from receiver to sender
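The slide only says "checksum" and does not fix an algorithm. As a minimal sketch, assuming the 16-bit Internet checksum used by UDP and TCP, error detection could look like this. The function names `internet_checksum` and `is_corrupt` are illustrative, not part of the rdt specification.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF


def is_corrupt(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute and compare (hypothetical helper)."""
    return internet_checksum(data) != checksum


payload = b"hello, rdt2.0"
cks = internet_checksum(payload)
flipped = bytes([payload[0] ^ 0x01]) + payload[1:]   # channel flips one bit
print(is_corrupt(payload, cks), is_corrupt(flipped, cks))
```

A receiver that sees `is_corrupt(...)` return true would send a NAK; otherwise it delivers the data and sends an ACK.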
rdt2.0: FSM specification
sender
  State: Wait for call from above
    rdt_send(data)
        sndpkt = make_pkt(data, checksum)
        udt_send(sndpkt)
        -> Wait for ACK or NAK
  State: Wait for ACK or NAK
    rdt_rcv(rcvpkt) && isNAK(rcvpkt)
        udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt)
        Λ (no action)
        -> Wait for call from above

receiver
  State: Wait for call from below
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
        udt_send(NAK)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
        extract(rcvpkt,data)
        deliver_data(data)
        udt_send(ACK)
rdt2.0: operation with no errors
Path through the rdt2.0 FSM when nothing is corrupted:
1. sender: rdt_send(data); sndpkt = make_pkt(data, checksum); udt_send(sndpkt); move to Wait for ACK or NAK
2. receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
3. sender: rdt_rcv(rcvpkt) && isACK(rcvpkt); no action (Λ); return to Wait for call from above
rdt2.0: error scenario
Path through the rdt2.0 FSM when the data packet is corrupted:
1. sender: rdt_send(data); sndpkt = make_pkt(data, checksum); udt_send(sndpkt); move to Wait for ACK or NAK
2. receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt); udt_send(NAK)
3. sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt); udt_send(sndpkt); stay in Wait for ACK or NAK
4. receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
5. sender: rdt_rcv(rcvpkt) && isACK(rcvpkt); no action (Λ); return to Wait for call from above
rdt2.0 has a fatal flaw!
what happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
handling duplicates:
• sender retransmits current pkt if ACK/NAK corrupted
• sender adds sequence number to each pkt
• receiver discards (doesn't deliver up) duplicate pkt
stop and wait: sender sends one packet, then waits for receiver response
rdt2.1: sender, handles garbled ACK/NAKs
State: Wait for call 0 from above
  rdt_send(data)
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      -> Wait for ACK or NAK 0
State: Wait for ACK or NAK 0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
      udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
      Λ (no action)
      -> Wait for call 1 from above
State: Wait for call 1 from above
  rdt_send(data)
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
      -> Wait for ACK or NAK 1
State: Wait for ACK or NAK 1
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
      udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
      Λ (no action)
      -> Wait for call 0 from above
rdt2.1: receiver, handles garbled ACK/NAKs
State: Wait for 0 from below
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
      -> Wait for 1 from below
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      sndpkt = make_pkt(NAK, chksum)
      udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)    (duplicate)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
State: Wait for 1 from below
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
      -> Wait for 0 from below
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      sndpkt = make_pkt(NAK, chksum)
      udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)    (duplicate)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
rdt2.1: discussion
sender:
• seq # added to pkt
• must check if received ACK/NAK corrupted
• twice as many states
  - state must "remember" whether "expected" pkt should have seq # of 0 or 1
receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
rdt2.2: sender, receiver fragments
sender FSM fragment
  State: Wait for call 0 from above
    rdt_send(data)
        sndpkt = make_pkt(0, data, checksum)
        udt_send(sndpkt)
        -> Wait for ACK 0
  State: Wait for ACK 0
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
        udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
        Λ (no action)

receiver FSM fragment
  State: Wait for 0 from below
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) )
        udt_send(sndpkt)
  Transition into Wait for 0 from below:
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
        extract(rcvpkt,data)
        deliver_data(data)
        sndpkt = make_pkt(ACK1, chksum)
        udt_send(sndpkt)
rdt3.0: channels with errors and loss
new assumption: underlying channel can also lose packets (data, ACKs)
• checksum, seq. #, ACKs, retransmissions will be of help … but not enough
approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but seq. #'s already handle this
  - receiver must specify seq # of pkt being ACKed
• requires countdown timer
rdt3.0 sender
State: Wait for call 0 from above
  rdt_send(data)
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      start_timer
      -> Wait for ACK0
  rdt_rcv(rcvpkt)
      Λ (no action)
State: Wait for ACK0
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
      Λ (no action)
  timeout
      udt_send(sndpkt)
      start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
      stop_timer
      -> Wait for call 1 from above
State: Wait for call 1 from above
  rdt_send(data)
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
      start_timer
      -> Wait for ACK1
  rdt_rcv(rcvpkt)
      Λ (no action)
State: Wait for ACK1
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
      Λ (no action)
  timeout
      udt_send(sndpkt)
      start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
      stop_timer
      -> Wait for call 0 from above
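The alternating-bit behaviour of the rdt3.0 sender can be tried out with a small simulation. This is a sketch under simplifying assumptions, not the FSM itself: the channel only loses packets (no corruption, no reordering), a timeout is modelled as "nothing came back for this transmission", and the function name `run_rdt3` and its `drop_data` / `drop_acks` parameters are made up for the example.

```python
def run_rdt3(messages, drop_data=(), drop_acks=()):
    """Simulate stop-and-wait with 1-bit sequence numbers over a lossy channel.

    drop_data / drop_acks hold the 0-based indices of the data and ACK
    transmissions that the channel loses. Returns (delivered, data_tx),
    where data_tx counts every data transmission including retransmissions.
    """
    delivered = []
    expected = 0          # receiver: sequence number it is waiting for
    seq = 0               # sender: sequence number of the current packet
    data_tx = 0
    ack_tx = 0
    for msg in messages:
        while True:
            # sender: udt_send(sndpkt), start_timer
            lost = data_tx in drop_data
            data_tx += 1
            if lost:
                continue                  # timeout fires, retransmit
            # receiver: deliver only if this is the expected sequence number
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            # receiver ACKs the packet it just got (a duplicate is re-ACKed)
            ack_lost = ack_tx in drop_acks
            ack_tx += 1
            if ack_lost:
                continue                  # timeout fires, retransmit
            seq ^= 1                      # matching ACK: stop_timer, flip the bit
            break
    return delivered, data_tx


print(run_rdt3(["a", "b", "c"], drop_data={1}, drop_acks={1}))
```

In this run the second data transmission and the second ACK are lost. The receiver sees one duplicate, discards it, and re-ACKs, so each message is still delivered exactly once.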
rdt3.0 in action
(a) no loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0
(b) packet loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1 (X, pkt1 lost)
  sender: timeout, resend pkt1
  receiver: rcv pkt1, send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0
(c) ACK loss
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1 (X, ack1 lost)
  sender: timeout, resend pkt1
  receiver: rcv pkt1 (detect duplicate), send ack1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt0, send ack0
(d) premature timeout / delayed ACK
  sender: send pkt0
  receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1
  receiver: rcv pkt1, send ack1 (ack1 delayed)
  sender: timeout, resend pkt1
  sender: rcv ack1, send pkt0
  receiver: rcv pkt1 (detect duplicate), send ack1
  receiver: rcv pkt0, send ack0
Performance of rdt3.0
• rdt3.0 is correct, but performance stinks
• e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
    $D_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bits/sec}} = 8\ \text{microsecs}$
• $U_{sender}$: utilization, the fraction of time sender is busy sending (times in msec):
    $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
• if RTT = 30 msec, 1 KB pkt every 30 msec: 33 kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
rdt3.0: stop-and-wait operation
sender timeline:
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• ACK arrives, send next packet, t = RTT + L / R
receiver timeline:
• first packet bit arrives
• last packet bit arrives, send ACK

$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
Pipelined protocols
pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
• range of sequence numbers must be increased
• buffering at sender and/or receiver
two generic forms of pipelined protocols: Go-Back-N, selective repeat
Pipelining: increased utilization
sender timeline:
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• ACK arrives, send next packet, t = RTT + L / R
receiver timeline:
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK

3-packet pipelining increases utilization by a factor of 3 (times in msec):
$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} \approx 0.0008$
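The utilization figures for stop-and-wait and for 3-packet pipelining can be checked numerically. The sketch below assumes the slide's numbers (L = 8000 bits, R = 1 Gbps, RTT = 30 ms); the function name `sender_utilization` and its `n_pkts` parameter are illustrative.

```python
def sender_utilization(l_bits: float, r_bps: float, rtt_s: float, n_pkts: int = 1) -> float:
    """Fraction of time the sender is busy when n_pkts may be in flight per RTT."""
    d_trans = l_bits / r_bps                       # transmission delay L/R in seconds
    return min(1.0, n_pkts * d_trans / (rtt_s + d_trans))


stop_and_wait = sender_utilization(8000, 1e9, 0.030)
pipelined_3 = sender_utilization(8000, 1e9, 0.030, n_pkts=3)
print(f"stop-and-wait: {stop_and_wait:.5f}")
print(f"3 in flight:   {pipelined_3:.5f}")
```

The `min(1.0, ...)` cap reflects that a link cannot be more than fully utilized once the window covers the whole round trip.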
Pipelined protocols: overview
Go-back-N:
• sender can have up to N unacked packets in pipeline
• receiver only sends cumulative ack
  - doesn't ack packet if there's a gap
• sender has timer for oldest unacked packet
  - when timer expires, retransmit all unacked packets
Selective Repeat:
• sender can have up to N unacked packets in pipeline
• rcvr sends individual ack for each packet
• sender maintains timer for each unacked packet
  - when timer expires, retransmit only that unacked packet
Go-Back-N: sender
• k-bit seq # in pkt header
• "window" of up to N consecutive unacked pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n ("cumulative ACK")
  - may receive duplicate ACKs (see receiver)
• timer for oldest in-flight pkt
• timeout(n): retransmit packet n and all higher seq # pkts in window
GBN: sender extended FSM
initial transition (Λ): base = 1; nextseqnum = 1
single state: Wait

rdt_send(data)
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)

timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    Λ (no action)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer
GBN: receiver extended FSM
initial transition (Λ):
    expectedseqnum = 1
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
single state: Wait

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

default
    udt_send(sndpkt)

• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order pkt:
  - discard (don't buffer): no receiver buffering!
  - re-ACK pkt with highest in-order seq #
GBN in action (sender window N=4)
sender                                  receiver
send pkt0
send pkt1                               receive pkt0, send ack0
send pkt2 (X, lost)                     receive pkt1, send ack1
send pkt3
(wait)                                  receive pkt3, discard, (re)send ack1
rcv ack0, send pkt4
rcv ack1, send pkt5                     receive pkt4, discard, (re)send ack1
ignore duplicate ACK (ack1)             receive pkt5, discard, (re)send ack1
pkt 2 timeout
send pkt2
send pkt3                               rcv pkt2, deliver, send ack2
send pkt4                               rcv pkt3, deliver, send ack3
send pkt5                               rcv pkt4, deliver, send ack4
                                        rcv pkt5, deliver, send ack5
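The sender side of this trace can be replayed with a few lines of Python. This is a minimal sketch, assuming unbounded sequence numbers starting at 0 (as in the trace, not the k-bit wraparound of a real header) and no timer object. The class name `GBNSender` and the guard that ignores stale ACKs are choices made for the example.

```python
class GBNSender:
    """Window bookkeeping for a Go-Back-N sender (illustrative only)."""

    def __init__(self, window: int):
        self.N = window
        self.base = 0            # oldest unACKed sequence number
        self.nextseqnum = 0      # next sequence number to use
        self.sent = []           # log of every transmitted sequence number

    def rdt_send(self) -> bool:
        """Send a new packet if the window has room; otherwise refuse the data."""
        if self.nextseqnum < self.base + self.N:
            self.sent.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False

    def ack(self, n: int) -> None:
        """Cumulative ACK(n): everything up to and including n is acknowledged."""
        if n >= self.base:       # duplicate or stale ACKs leave the window unchanged
            self.base = n + 1

    def timeout(self) -> list:
        """Go back N: retransmit every packet that is sent but not yet ACKed."""
        resend = list(range(self.base, self.nextseqnum))
        self.sent.extend(resend)
        return resend


s = GBNSender(window=4)
for _ in range(4):
    s.rdt_send()                 # pkt0..pkt3 go out (pkt2 is lost in the trace)
window_full = not s.rdt_send()   # a fifth packet is refused
s.ack(0); s.rdt_send()           # ack0 slides the window, pkt4 goes out
s.ack(1); s.rdt_send()           # ack1 slides the window, pkt5 goes out
s.ack(1)                         # duplicate ack1 changes nothing
resent = s.timeout()             # pkt2 timeout
print(window_full, resent)
```

On the timeout the sender resends pkt2 through pkt5, which matches the second half of the trace.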
Selective repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #'s
  - limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender
• data from above:
  - if next available seq # in window, send pkt
• timeout(n):
  - resend pkt n, restart timer
• ACK(n) in [sendbase, sendbase+N-1]:
  - mark pkt n as received
  - if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
• pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N, rcvbase-1]:
  - ACK(n)
• otherwise:
  - ignore
Selective repeat in action (sender window N=4)
sender                                  receiver
send pkt0
send pkt1                               receive pkt0, send ack0
send pkt2 (X, lost)                     receive pkt1, send ack1
send pkt3
(wait)                                  receive pkt3, buffer, send ack3
rcv ack0, send pkt4
rcv ack1, send pkt5                     receive pkt4, buffer, send ack4
record ack3 arrived                     receive pkt5, buffer, send ack5
pkt 2 timeout
send pkt2
record ack4 arrived                     rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2
record ack5 arrived

Q: what happens when ack2 arrives?
Selective repeat: dilemma
example:
• seq #'s: 0, 1, 2, 3
• window size = 3

(a) no problem
  sender sends pkt0, pkt1, pkt2; all are received and ACKed; windows advance
  sender sends pkt3 (X, lost), then new pkt0
  receiver will accept packet with seq number 0
(b) oops!
  sender sends pkt0, pkt1, pkt2; all are received, but all three ACKs are lost (X)
  timeout: sender retransmits pkt0
  receiver will accept packet with seq number 0

• receiver can't see sender side: receiver behavior is identical in both cases!
• something's (very) wrong: duplicate data accepted as new in (b)
• Q: what relationship between seq # size and window size to avoid problem in (b)?
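One way to explore the question is to compare two sets of sequence numbers: those the sender may still retransmit if every ACK of its first window is lost, and those the receiver will accept as new after advancing its window. The sketch below does that for the worst case in scenario (b); the function name `sr_is_ambiguous` is made up for the example.

```python
def sr_is_ambiguous(seq_space: int, window: int) -> bool:
    """True if an old retransmission can be mistaken for a new packet."""
    # Sender's first window: may be retransmitted in full if all ACKs are lost.
    old = {n % seq_space for n in range(0, window)}
    # Receiver's window after it received the first window and advanced.
    new = {n % seq_space for n in range(window, 2 * window)}
    return bool(old & new)


print(sr_is_ambiguous(seq_space=4, window=3))   # the slide's example
print(sr_is_ambiguous(seq_space=4, window=2))
```

With 4 sequence numbers, a window of 3 is ambiguous and a window of 2 is not. In general the overlap disappears once the window is at most half the sequence number space.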
Connection-oriented Transport: TCP
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• point-to-point:
  - one sender, one receiver
• reliable, in-order byte stream:
  - no "message boundaries"
• pipelined:
  - TCP congestion and flow control set window size
• full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented:
  - handshaking (exchange of control msgs) inits sender, receiver state before data exchange
• flow controlled:
  - sender will not overwhelm receiver
TCP segment structure (each row is 32 bits wide)
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | Urg data pointer
  options (variable length)
  application data (variable length)

field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
TCP seq. numbers, ACKs
sequence numbers:
• byte stream "number" of first byte in segment's data
acknowledgements:
• seq # of next byte expected from other side
• cumulative ACK
Q: how does receiver handle out-of-order segments?
• A: TCP spec doesn't say; up to implementor

[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer.]
[Figure: sender sequence number space with window size N: sent and ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable.]
TCP seq. numbers, ACKs: simple telnet scenario
1. Host A: user types 'C'
     A -> B: Seq=42, ACK=79, data = 'C'
2. Host B: ACKs receipt of 'C', echoes back 'C'
     B -> A: Seq=79, ACK=43, data = 'C'
3. Host A: ACKs receipt of echoed 'C'
     A -> B: Seq=43, ACK=80
TCP round trip time, timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 − α)*EstimatedRTT + α*SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Timeout = 2*EstimatedRTT
[Figure: SampleRTT and EstimatedRTT, gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) versus time (seconds).]
How to calculate SampleRTT?
[Figure: associating the ACK with (a) original transmission versus (b) retransmission.]
Karn/Partridge Algorithm
• Do not sample RTT when retransmitting
• The Karn/Partridge algorithm was an improvement over the original approach, but it does not eliminate congestion
• We need to understand how timeout is related to congestion
  - If you time out too soon, you may unnecessarily retransmit a segment, which adds load to the network
Karn/Partridge Algorithm
• The main problem with the original computation is that it does not take the variance of the SampleRTTs into consideration.
• If the variance among SampleRTTs is small
  - then the EstimatedRTT can be better trusted
  - there is no need to multiply it by 2 to compute the timeout
Karn/Partridge Algorithm
• On the other hand, a large variance in the samples suggests that the timeout value should not be tightly coupled to the EstimatedRTT
• Jacobson/Karels proposed a new scheme for TCP retransmission
Jacobson/Karels Algorithm (RFC 6298)
• timeout interval: EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• estimate SampleRTT deviation from EstimatedRTT (a measure of variability):
    DevRTT = (1 − β)*DevRTT + β*|SampleRTT − EstimatedRTT|
    (typically, β = 0.25)
• TimeoutInterval = EstimatedRTT + 4*DevRTT
    where EstimatedRTT is the estimated RTT and 4*DevRTT is the "safety margin"
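The two moving averages and the timeout formula fit in a small class. This is a sketch, not a TCP implementation: the class name `RttEstimator` is made up, the initial values (EstimatedRTT set to the first sample, DevRTT to half of it) and the update order (DevRTT before EstimatedRTT) follow RFC 6298 and are assumptions beyond what the slides state, and the `retransmitted` flag applies the Karn/Partridge rule of not sampling retransmitted segments.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout interval (illustrative only)."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial value, as in RFC 6298

    def update(self, sample_rtt: float, retransmitted: bool = False) -> None:
        if retransmitted:
            return                           # Karn/Partridge: ignore ambiguous samples
        # Update the deviation first, using the estimate from before this sample.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt \
            + self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt \
            + self.alpha * sample_rtt

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)       # milliseconds
est.update(100.0)
est.update(200.0)
est.update(900.0, retransmitted=True)        # skipped
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

A single slow sample (200 ms) moves the estimate only slightly but widens the safety margin noticeably, which is the point of tracking the deviation separately.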
TCP reliable data transfer
• TCP creates rdt service on top of IP's unreliable service
  - pipelined segments
  - cumulative acks
  - single retransmission timer
• retransmissions triggered by:
  - timeout events
  - duplicate acks
• let's initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
TCP sender events:
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running
  - think of timer as for oldest unacked segment
  - expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if ack acknowledges previously unacked segments
  - update what is known to be ACKed
  - start timer if there are still unacked segments
TCP sender (simplified)
initial transition (Λ):
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum
single state: wait for event

data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer

timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer

ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase–1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else
            stop timer
    }
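The three events of the simplified sender map directly onto three methods. The sketch below is illustrative only: the class name `SimpleTcpSender`, the `wire` log standing in for "pass segment to IP", and the boolean `timer_running` standing in for a real timer are all assumptions of the example.

```python
class SimpleTcpSender:
    """Event handlers of the simplified TCP sender (no flow or congestion control)."""

    def __init__(self, initial_seq: int = 0):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}            # seq # of first byte -> payload
        self.timer_running = False
        self.wire = []               # log of (seq, payload) handed to IP

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)      # not-yet-acked segment with smallest seq #
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True    # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:       # cumulative ACK for new data
            self.send_base = y
            fully_acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for s in fully_acked:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)


tcp = SimpleTcpSender(initial_seq=92)
tcp.on_app_data(b"A" * 8)            # Seq=92, 8 bytes
tcp.on_app_data(b"B" * 20)           # Seq=100, 20 bytes
tcp.on_timeout()                     # premature timeout: resend Seq=92 only
tcp.on_ack(100)
tcp.on_ack(120)
print(tcp.send_base, tcp.timer_running, [seq for seq, _ in tcp.wire])
```

Only the oldest segment is retransmitted on the timeout, and the cumulative ACK for 120 empties the unacked set and stops the timer.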
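TCP: retransmission scenarios
lost ACK scenario
  Host A -> B: Seq=92, 8 bytes of data
  Host B -> A: ACK=100 (X, lost)
  Host A: timeout
  Host A -> B: Seq=92, 8 bytes of data (retransmission)
  Host B -> A: ACK=100
  Host A: SendBase=100
premature timeout
  Host A: SendBase=92
  Host A -> B: Seq=92, 8 bytes of data
  Host A -> B: Seq=100, 20 bytes of data
  Host B -> A: ACK=100, then ACK=120 (both arrive after the timeout)
  Host A: timeout
  Host A -> B: Seq=92, 8 bytes of data (retransmission)
  Host A: SendBase=100, then SendBase=120 as the ACKs arrive
  Host B -> A: ACK=120
  Host A: SendBase=120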
TCP: retransmission scenarios
cumulative ACK
  Host A -> B: Seq=92, 8 bytes of data
  Host A -> B: Seq=100, 20 bytes of data
  Host B -> A: ACK=100 (X, lost)
  Host B -> A: ACK=120 (arrives before the timeout)
  Host A -> B: Seq=120, 15 bytes of data
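TCP ACK generation [RFC 1122, RFC 2581]

event at receiver                              | TCP receiver action
-----------------------------------------------+-----------------------------------------------
arrival of in-order segment with expected      | delayed ACK: wait up to 500 ms for next
seq #; all data up to expected seq #           | segment; if no next segment, send ACK
already ACKed                                  |
-----------------------------------------------+-----------------------------------------------
arrival of in-order segment with expected      | immediately send single cumulative ACK,
seq #; one other segment has ACK pending       | ACKing both in-order segments
-----------------------------------------------+-----------------------------------------------
arrival of out-of-order segment with           | immediately send duplicate ACK, indicating
higher-than-expected seq #; gap detected       | seq # of next expected byte
-----------------------------------------------+-----------------------------------------------
arrival of segment that partially or           | immediately send ACK, provided that segment
completely fills gap                           | starts at lower end of gap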
TCP fast retransmit
• time-out period often relatively long:
  - long delay before resending lost packet
• detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit: if sender receives 3 duplicate ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
• likely that unacked segment lost, so don't wait for timeout
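TCP fast retransmit
fast retransmit after sender receipt of triple duplicate ACK
  Host A -> B: Seq=92, 8 bytes of data
  Host A -> B: Seq=100, 20 bytes of data (X, lost)
  Host B -> A: ACK=100
  Host B -> A: ACK=100 (duplicate)
  Host B -> A: ACK=100 (duplicate)
  Host B -> A: ACK=100 (duplicate)
  Host A -> B: Seq=100, 20 bytes of data (resent before the timeout expires)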
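Counting duplicate ACKs is simple enough to sketch in a few lines. The example assumes that "triple duplicate" means three ACKs repeating an earlier one (four identical ACKs in total, as in the trace); the function name `fast_retransmit_points` and its `threshold` parameter are made up for illustration.

```python
def fast_retransmit_points(acks, threshold: int = 3):
    """Return the ACK numbers at which a fast retransmit would be triggered."""
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:       # fire once, on the third duplicate
                triggers.append(ack)
        else:
            last_ack = ack                   # new ACK number resets the counter
            dup_count = 0
    return triggers


print(fast_retransmit_points([100, 100, 100, 100]))
print(fast_retransmit_points([100, 100, 100]))
```

The first call triggers a retransmission of the segment starting at byte 100; the second does not, because only two duplicates have arrived.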
TCP flow control
• application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
• flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments arrive from sender at IP code, pass through TCP code into the TCP socket receiver buffers (in the OS), and are read by the application process.]
TCP flow control
• receiver "advertises" free buffer space by including rwnd (receiver window) value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
• sender limits amount of unacked ("in-flight") data to receiver's rwnd value
• guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus rwnd bytes of free buffer space; data leaves to the application process.]
Sliding Window Protocol
• TCP's variant of the sliding window algorithm serves several purposes:
  - it guarantees the reliable delivery of data,
  - it ensures that data is delivered in order, and
  - it enforces flow control between the sender and the receiver.
Sliding Window
[Figure: relationship between TCP send buffer (a) and receive buffer (b); byte numbers increase along each buffer.]
TCP Sliding Window
• Sending side
  - LastByteAcked ≤ LastByteSent
  - LastByteSent ≤ LastByteWritten
• Receiving side
  - LastByteRead < NextByteExpected
  - NextByteExpected ≤ LastByteRcvd + 1
TCP Flow Control
• LastByteRcvd − LastByteRead ≤ MaxRcvBuffer
• AdvertisedWindow = MaxRcvBuffer − ((NextByteExpected − 1) − LastByteRead)
• LastByteSent − LastByteAcked ≤ AdvertisedWindow
• EffectiveWindow = AdvertisedWindow − (LastByteSent − LastByteAcked)
• LastByteWritten − LastByteAcked ≤ MaxSendBuffer
• If the sending process tries to write y bytes to TCP, but (LastByteWritten − LastByteAcked) + y > MaxSendBuffer, then TCP blocks the sending process and does not allow it to generate more data.
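The window formulas above are plain arithmetic on byte counters, so a short sketch can make them concrete. The function names and the sample byte positions are invented for the example; the formulas are the ones on the slide.

```python
def advertised_window(max_rcv_buffer: int, next_byte_expected: int, last_byte_read: int) -> int:
    """Receiver side: free space left in the receive buffer."""
    return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)


def effective_window(adv_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Sender side: how many more bytes may be sent right now."""
    return adv_window - (last_byte_sent - last_byte_acked)


def write_would_block(last_byte_written: int, last_byte_acked: int, y: int, max_send_buffer: int) -> bool:
    """Sender side: would writing y more bytes overflow the send buffer?"""
    return (last_byte_written - last_byte_acked) + y > max_send_buffer


# Receiver has 4096 bytes of buffer, has received bytes 1..1000 in order,
# and the application has read nothing yet.
adv = advertised_window(4096, next_byte_expected=1001, last_byte_read=0)
# Sender has sent up to byte 3000, of which bytes up to 1000 are ACKed.
eff = effective_window(adv, last_byte_sent=3000, last_byte_acked=1000)
print(adv, eff)
```

Here 1000 buffered bytes leave 3096 bytes advertised, and 2000 bytes in flight leave room for 1096 more.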
Protecting against Wraparound
• SequenceNum: 32 bits long
• AdvertisedWindow: 16 bits long
  - TCP satisfies the requirement of the sliding window algorithm that the sequence number space be twice as big as the window size:
    2^32 >> 2 × 2^16
Protecting against Wraparound
• Relevance of the 32-bit sequence number space
  - The sequence number used on a given connection might wrap around
  - A byte with sequence number x could be sent at one time, and then at a later time a second byte with the same sequence number x could be sent
  - Packets cannot survive in the Internet for longer than the MSL (maximum segment lifetime)
  - MSL is set to 120 sec (recommended in RFC 793)
  - Make sure that the sequence number does not wrap around within a 120-second period of time
  - Whether it does depends on how fast data can be transmitted over the Internet
Protecting against Wraparound
[Table: time until 32-bit sequence number space wraps around.]
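The wraparound time follows from the fact that sequence numbers count bytes: a connection running flat out at a given bandwidth consumes 2^32 byte numbers in 2^32 × 8 / bandwidth seconds. The sketch below computes this for a few link speeds chosen for illustration; they are not taken from the original table, and the function name `wrap_time_seconds` is made up.

```python
MSL_SECONDS = 120                     # maximum segment lifetime


def wrap_time_seconds(bandwidth_bps: float) -> float:
    """Seconds until the 32-bit byte sequence space wraps at full link speed."""
    return (2 ** 32) * 8 / bandwidth_bps


# Illustrative link speeds in bits per second (assumed, not from the slides).
for name, bps in [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    t = wrap_time_seconds(bps)
    verdict = "safe" if t > MSL_SECONDS else "wraps within one MSL"
    print(f"{name:>9}: {t:10.1f} s  ({verdict})")
```

At 1 Gbps the space wraps in roughly 34 seconds, well inside the 120-second MSL, which is why the 32-bit field becomes a concern on fast links.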
Keeping the Pipe Full
• The 16-bit AdvertisedWindow field must be big enough to allow the sender to keep the pipe full
• A 16-bit field translates to a maximum advertised window of 64 KB
• Clearly the receiver is free not to open the window as large as the AdvertisedWindow field allows
• If the receiver has enough buffer space
  - the window needs to be opened far enough to allow a full delay × bandwidth product's worth of data
  - the required sizes below assume an RTT of 100 ms
Keeping the Pipe Full
[Table: required window size for 100-ms RTT.]
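The required window is the delay × bandwidth product expressed in bytes. As a rough sketch, the snippet below compares it with what a 16-bit field can advertise; the link speeds are illustrative assumptions, not the rows of the original table, and the function name `required_window_bytes` is invented.

```python
MAX_ADVERTISED_WINDOW = 2 ** 16 - 1   # largest value a 16-bit field can carry, in bytes


def required_window_bytes(bandwidth_bps: float, rtt_s: float = 0.1) -> float:
    """Bytes that must be in flight to keep the pipe full for one RTT."""
    return bandwidth_bps * rtt_s / 8


# Illustrative link speeds in bits per second (assumed, not from the slides).
for name, bps in [("1.5 Mbps", 1.5e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    need = required_window_bytes(bps)
    fits = need <= MAX_ADVERTISED_WINDOW
    print(f"{name:>9}: {need:12.0f} bytes  ({'fits' if fits else 'exceeds 16-bit window'})")
```

With a 100 ms RTT, anything much faster than about 5 Mbps already needs more than the 64 KB a plain 16-bit window can express.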
Connection Management
before exchanging data, sender/receiver "handshake":
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters

client and server each hold (application above, network below):
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client

client:  Socket clientSocket = new Socket("hostname","port number");
server:  Socket connectionSocket = welcomeSocket.accept();
TCP 3-way handshake
client state: LISTEN; server state: LISTEN
1. client: choose init seq num, x; send TCP SYN msg
     client -> server: SYNbit=1, Seq=x
     client state: SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN
     server -> client: SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1
     server state: SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
     client -> server: ACKbit=1, ACKnum=y+1
     client state: ESTAB
4. server: received ACK(y) indicates client is live
     server state: ESTAB
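The number bookkeeping in the handshake can be written out as data. This is a toy sketch: segments are plain dictionaries, the function name `three_way_handshake` is made up, and the modulo reflects that TCP sequence numbers are 32 bits wide. Only the fields shown on the slide are modelled.

```python
SEQ_SPACE = 2 ** 32                   # sequence numbers are 32 bits wide


def three_way_handshake(x: int, y: int):
    """Return the SYN, SYNACK and ACK segments for initial seq numbers x and y."""
    syn = {"SYN": 1, "ACK": 0, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "acknum": (x + 1) % SEQ_SPACE}
    ack = {"SYN": 0, "ACK": 1, "acknum": (y + 1) % SEQ_SPACE}
    return [syn, synack, ack]


for segment in three_way_handshake(x=41, y=78):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, so after the exchange both ends agree on where the two byte streams start.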
TCP: closing a connection
• client, server each close their side of connection
  - send TCP segment with FIN bit = 1
• respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
• simultaneous FIN exchanges can be handled
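TCP: closing a connection
client state: ESTAB; server state: ESTAB
1. client: clientSocket.close()
     client -> server: FINbit=1, seq=x
     client state: FIN_WAIT_1 (can no longer send but can receive data)
2. server -> client: ACKbit=1; ACKnum=x+1
     server state: CLOSE_WAIT (can still send data)
     client state: FIN_WAIT_2 (wait for server close)
3. server -> client: FINbit=1, seq=y
     server state: LAST_ACK (can no longer send data)
4. client -> server: ACKbit=1; ACKnum=y+1
     client state: TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
     server state: CLOSED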
TCP State Transition Diagram
[Figure: TCP state transition diagram, extremely simplified.]
Principles of Congestion Control
Principles of congestion control
congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• output link capacity: R
• no retransmission
[Figure: Host A and Host B each send original data at rate λ_in through a router with unlimited shared output link buffers; throughput is λ_out.]
[Plots: λ_out versus λ_in, and delay versus λ_in, each for λ_in up to R/2.]
• maximum per-connection throughput: R/2
• large delays as arrival rate, λ_in, approaches capacity
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of timed-out packet
  - application-layer input = application-layer output: λ_in = λ_out
  - transport-layer input includes retransmissions: λ'_in >= λ_in
[Figure: Host A and Host B send through a router with finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data; λ_out: output.]
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
• sender sends only when router buffers available
[Plot: λ_out versus λ_in, both up to R/2.]
[Figure: sender keeps a copy of each packet and sends when there is free buffer space in the router's finite shared output link buffers. λ_in: original data; λ'_in: original data, plus retransmitted data.]
Causes/costs of congestion: scenario 2
idealization: known loss
• packets can be lost, dropped at router due to full buffers
• sender only resends if packet known to be lost
[Figure: sender keeps a copy of each packet; a packet arriving when the router has no buffer space is dropped. λ_in: original data; λ'_in: original data, plus retransmitted data.]
Causes/costs of congestion: scenario 2
idealization: known loss
• packets can be lost, dropped at router due to full buffers
• sender only resends if packet known to be lost
• when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[Plot: λ_out versus λ_in, both up to R/2.]
[Figure: the retransmitted copy is sent once there is free buffer space at the router. λ_in: original data; λ'_in: original data, plus retransmitted data.]