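rdt2.0: error scenario
Sender FSM:
• Wait for call from above: on rdt_send(data), do sndpkt = make_pkt(data, checksum); udt_send(sndpkt); then go to "Wait for ACK or NAK"
• Wait for ACK or NAK: on rdt_rcv(rcvpkt) && isNAK(rcvpkt), do udt_send(sndpkt) and stay
• Wait for ACK or NAK: on rdt_rcv(rcvpkt) && isACK(rcvpkt), do Λ (no action) and return to "Wait for call from above"
Receiver FSM:
• Wait for call from below: on rdt_rcv(rcvpkt) && corrupt(rcvpkt), do udt_send(NAK)
• Wait for call from below: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), do extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
Transport Layer 3-21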
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate
What to do?
• sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
• retransmit, but this might cause retransmission of correctly received pkt!
Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate pkt
stop and wait: Sender sends one packet, then waits for receiver response
rdt2.1: sender, handles garbled ACK/NAKs
• Wait for call 0 from above: on rdt_send(data), do sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 0"
• Wait for ACK or NAK 0: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)), do udt_send(sndpkt) and stay
• Wait for ACK or NAK 0: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), do Λ; go to "Wait for call 1 from above"
• Wait for call 1 from above: on rdt_send(data), do sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to "Wait for ACK or NAK 1"
• Wait for ACK or NAK 1: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)), do udt_send(sndpkt) and stay
• Wait for ACK or NAK 1: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), do Λ; go to "Wait for call 0 from above"
rdt2.1: receiver, handles garbled ACK/NAKs
• Wait for 0 from below: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt), do extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 1 from below"
• Wait for 0 from below: on rdt_rcv(rcvpkt) && corrupt(rcvpkt), do sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
• Wait for 0 from below: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) (duplicate), do sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
• Wait for 1 from below: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt), do extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to "Wait for 0 from below"
• Wait for 1 from below: on rdt_rcv(rcvpkt) && corrupt(rcvpkt), do sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
• Wait for 1 from below: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) (duplicate), do sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  • state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
• must check if received packet is duplicate
  • state indicates whether 0 or 1 is expected pkt seq #
• note: receiver can not know if its last ACK/NAK received OK at sender
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  • receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
rdt2.2: sender, receiver fragments
Sender FSM fragment:
• Wait for call 0 from above: on rdt_send(data), do sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to "Wait for ACK 0"
• Wait for ACK 0: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)), do udt_send(sndpkt) and stay
• Wait for ACK 0: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0), do Λ
Receiver FSM fragment:
• Wait for 0 from below: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)), do udt_send(sndpkt) and stay
• Transition into "Wait for 0 from below": on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt), do extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
• checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Q: how to deal with loss?
• sender waits until certain data or ACK lost, then retransmits
• drawbacks?
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  • retransmission will be duplicate, but use of seq. #’s already handles this
  • receiver must specify seq # of pkt being ACKed
• requires countdown timer
rdt3.0 sender
• Wait for call 0 from above: on rdt_rcv(rcvpkt), do Λ
• Wait for call 0 from above: on rdt_send(data), do sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK0"
• Wait for ACK0: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)), do Λ
• Wait for ACK0: on timeout, do udt_send(sndpkt); start_timer
• Wait for ACK0: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0), do stop_timer; go to "Wait for call 1 from above"
• Wait for call 1 from above: on rdt_rcv(rcvpkt), do Λ
• Wait for call 1 from above: on rdt_send(data), do sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK1"
• Wait for ACK1: on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)), do Λ
• Wait for ACK1: on timeout, do udt_send(sndpkt); start_timer
• Wait for ACK1: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1), do stop_timer; go to "Wait for call 0 from above"
rdt3.0 in action
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:
  $T_{transmit} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8 \text{ kb/pkt}}{10^9 \text{ b/sec}} = 8 \text{ microsec}$
  $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
  • $U_{sender}$: utilization, the fraction of time sender busy sending
  • 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
  • network protocol limits use of physical resources!
rdt3.0: stop-and-wait operation
Timeline (sender to receiver):
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• first packet bit arrives at receiver
• last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
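The stop-and-wait arithmetic can be checked with a short Python sketch. The function name and constants below are illustrative and not from the slides; the 30 ms RTT assumes twice the 15 ms one-way propagation delay, and 1KB is taken as 8000 bits, as in the example.

```python
def stop_and_wait_utilization(packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    transmit_s = packet_bits / rate_bps
    return transmit_s / (rtt_s + transmit_s)


L_BITS = 8000      # 1KB packet, counted as 8 kb as on the slide
R_BPS = 1e9        # 1 Gbps link
RTT_S = 0.030      # assumed: 2 x 15 ms propagation delay

u_sender = stop_and_wait_utilization(L_BITS, R_BPS, RTT_S)
throughput_bytes_per_s = (L_BITS / 8) / (RTT_S + L_BITS / R_BPS)

print(f"U_sender   = {u_sender:.5f}")                          # 0.00027
print(f"throughput = {throughput_bytes_per_s / 1000:.1f} kB/s")  # 33.3 kB/s
```

The result is dominated by the RTT: the 8 microsecond transmission time is tiny next to the 30 ms wait for the ACK.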
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
• range of sequence numbers must be increased
• buffering at sender and/or receiver
• Two generic forms of pipelined protocols: Go-Back-N, selective repeat
Pipelining: increased utilization
Timeline (sender to receiver):
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• first packet bit arrives at receiver
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Increase utilization by a factor of 3!
$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$
Utilization = N(L/R)/(RTT+L/R) if N·L/R < RTT+L/R and the sender pauses after it transmits a window of packets until it receives first ACK
Utilization = 1 if N·L/R > RTT+L/R and the sender does not pause
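The two cases above collapse into a single capped expression. The sketch below is illustrative only: the function name is made up here, and the link parameters are reused from the earlier 1 Gbps example.

```python
def pipelined_utilization(n_packets, packet_bits, rate_bps, rtt_s):
    """Sender utilization with a window of n_packets, capped at 1."""
    transmit_s = packet_bits / rate_bps
    return min(1.0, n_packets * transmit_s / (rtt_s + transmit_s))


# Same link as the stop-and-wait example: 8000-bit packets, 1 Gbps, 30 ms RTT.
for n in (1, 3, 1000, 4000):
    print(n, round(pipelined_utilization(n, 8000, 1e9, 0.030), 5))
```

With N = 3 this reproduces the 0.0008 figure; once N·L/R exceeds RTT + L/R the sender never has to pause and utilization saturates at 1.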
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  • may receive duplicate ACKs (see receiver)
• timer for the entire window
• timeout(n): retransmit pkt n and all higher seq # pkts in window
GBN: sender extended FSM
Initially (Λ): base=1; nextseqnum=1
Single state: Wait
• rdt_send(data):
    if (nextseqnum < base+N) {
      sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
      udt_send(sndpkt[nextseqnum])
      if (base == nextseqnum) start_timer
      nextseqnum++
    }
    else refuse_data(data)
• timeout:
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): Λ
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum) stop_timer
    else start_timer
GBN: receiver extended FSM
Initially (Λ): expectedseqnum=1; sndpkt = make_pkt(0,ACK,chksum)
Single state: Wait
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum):
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++
• default: udt_send(sndpkt)
ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
• may generate duplicate ACKs
• need only remember expectedseqnum
• out-of-order pkt:
  • discard (don’t buffer) -> no receiver buffering!
  • Re-ACK pkt with highest in-order seq #
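The receiver behaviour described above (deliver only the expected packet, otherwise re-ACK the highest in-order seq #) can be sketched in Python. The class and method names are hypothetical, and corruption is modelled as a boolean flag rather than a real checksum.

```python
class GbnReceiver:
    """Minimal Go-Back-N receiver: no buffering, cumulative ACKs only."""

    def __init__(self):
        self.expected_seqnum = 1   # matches the FSM's initial value
        self.delivered = []        # data passed to the upper layer

    def on_packet(self, seqnum, data, corrupt=False):
        """Process one arriving packet and return the ACK number to send."""
        if not corrupt and seqnum == self.expected_seqnum:
            self.delivered.append(data)
            self.expected_seqnum += 1
        # Default action: (re-)ACK the highest in-order seq # received so far.
        return self.expected_seqnum - 1


rcvr = GbnReceiver()
acks = [rcvr.on_packet(1, "a"),   # in order
        rcvr.on_packet(3, "c"),   # out of order: discarded, duplicate ACK
        rcvr.on_packet(2, "b"),   # in order
        rcvr.on_packet(3, "c")]   # retransmission now in order
print(acks, rcvr.delivered)
```

Because the only state is `expected_seqnum`, the receiver needs no buffer; the cost is that the sender must resend everything after a lost packet.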
GBN in action
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  • buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  • sender timer for each unACKed pkt
• sender window
  • N consecutive seq #’s
  • again limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender:
• data from above: if next available seq # in window, send pkt
• timeout(n): resend pkt n, restart timer
• ACK(n) in [sendbase,sendbase+N-1]:
  • mark pkt n as received
  • if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
• pkt n in [rcvbase, rcvbase+N-1]:
  • send ACK(n)
  • out-of-order: buffer
  • in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
• pkt n in [rcvbase-N,rcvbase-1]: ACK(n)
• otherwise: ignore
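The receiver rules above can be sketched as follows. This is a simplified illustration: the names are hypothetical, and sequence numbers are plain unbounded integers, so wraparound (the subject of the dilemma that follows) is deliberately left out.

```python
class SrReceiver:
    """Minimal selective-repeat receiver with unbounded sequence numbers."""

    def __init__(self, window_size):
        self.n = window_size
        self.rcv_base = 0
        self.buffer = {}       # out-of-order packets awaiting delivery
        self.delivered = []    # data passed up, in order

    def on_packet(self, seqnum, data):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seqnum < self.rcv_base + self.n:
            self.buffer[seqnum] = data
            # Deliver any in-order run and advance the window.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seqnum
        if self.rcv_base - self.n <= seqnum < self.rcv_base:
            return seqnum      # already delivered: re-ACK so the sender can advance
        return None            # otherwise: ignore


rcvr = SrReceiver(window_size=4)
results = [rcvr.on_packet(0, "a"), rcvr.on_packet(2, "c"),
           rcvr.on_packet(1, "b"), rcvr.on_packet(0, "a"),
           rcvr.on_packet(9, "z")]
print(results, rcvr.delivered, rcvr.rcv_base)
```

Re-ACKing packets in [rcvbase-N, rcvbase-1] matters: if the earlier ACK was lost, the sender's window would otherwise never move.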
Selective repeat in action
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
Sequence Number vs. Window Size
Suppose we use k bits to represent SN
Question: What’s the minimum number of bits k necessary for a window size of N?
Go-Back-N
Q: For a given expectedSN, what’s the largest possible value for snd_base?
A: If all the last N ACKs sent by the receiver are received, snd_base = expectedSN
Sender’s window: [expectedSN, expectedSN+N-1]; the receiver is waiting for expectedSN.
Sequence Number vs. Window Size
Suppose we use k bits to represent SN
Question: What’s the minimum number of bits k necessary for a window size of N?
Go-Back-N
Q: For a given expectedSN, what’s the smallest possible value for snd_base?
A: If all the last N ACKs sent by the receiver are not received, snd_base = expectedSN-N
Sender’s window: [expectedSN-N, expectedSN-1]; the receiver is waiting for expectedSN.
Sequence Number vs. Window Size
Go-Back-N
All SNs in the interval [expectedSN-N, expectedSN+N-1] (an interval of size 2N) can be received by the receiver. Since the receiver accepts only the packet with SN=expectedSN, there should be no other packet within this interval with SN=expectedSN. The farthest packet from expectedSN in this interval is N positions below it, so sequence numbers must not repeat within any N+1 consecutive values. Therefore, $2^k \geq N+1$.
Sequence Number vs. Window Size
Suppose we use k bits to represent SN
Question: What’s the minimum number of bits k necessary for a window size of N?
Selective Repeat
Q: For a given rcv_base, what’s the largest possible value for snd_base?
A: If all the last N ACKs sent by the receiver are received, snd_base = rcv_base (same as Go-Back-N)
Sender’s window: [rcv_base, rcv_base+N-1]; receiver’s window: [rcv_base, rcv_base+N-1].
Sequence Number vs. Window Size
Suppose we use k bits to represent SN
Question: What’s the minimum number of bits k necessary for a window size of N?
Selective Repeat
Q: For a given rcv_base, what’s the smallest possible value for snd_base?
A: If all the last N ACKs sent by the receiver are not received, snd_base = rcv_base-N (same as Go-Back-N)
Sender’s window: [rcv_base-N, rcv_base-1]; receiver’s window: [rcv_base, rcv_base+N-1].
Sequence Number vs. Window Size
Selective Repeat
All SNs in the interval [rcv_base-N, rcv_base+N-1] (an interval of size 2N) can be received by the receiver. Since the receiver should be able to distinguish between all packets in this interval and take corresponding action, there should be no two packets within this interval having the same SN. Therefore, $2^k \geq 2N$.
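The two bounds (2^k ≥ N+1 for Go-Back-N, 2^k ≥ 2N for Selective Repeat) translate directly into a minimum bit count. The helper below is a small illustrative sketch; its name and the "gbn"/"sr" labels are choices made here, not part of the slides.

```python
def min_seq_bits(window_size, protocol):
    """Smallest k with 2**k >= N+1 (Go-Back-N) or 2**k >= 2N (Selective Repeat)."""
    if protocol == "gbn":
        required = window_size + 1
    elif protocol == "sr":
        required = 2 * window_size
    else:
        raise ValueError("protocol must be 'gbn' or 'sr'")
    # (required - 1).bit_length() equals ceil(log2(required)) for required >= 1.
    return max(1, (required - 1).bit_length())


for n in (1, 2, 3, 4, 8):
    print(n, min_seq_bits(n, "gbn"), min_seq_bits(n, "sr"))
```

For window size 3 this gives 2 bits for Go-Back-N but 3 bits for Selective Repeat, which is why the earlier example with seq #’s 0 to 3 and window size 3 fails under Selective Repeat.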
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point:
  • one sender, one receiver
• reliable, in-order byte stream:
  • no “message boundaries”
• pipelined:
  • TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  • bi-directional data flow in same connection
  • MSS: maximum segment size
• connection-oriented:
  • handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  • sender will not overwhelm receiver
(figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer; application reads data through the socket door)
TCP segment structure
Header layout (each row is 32 bits wide):
• source port # | dest port #
• sequence number (counting by bytes of data, not segments!)
• acknowledgement number (counting by bytes of data, not segments!)
• head len | not used | flags U A P R S F | Receive window (# bytes rcvr willing to accept)
• checksum (Internet checksum, as in UDP) | Urg data pnter
• Options (variable length)
• application data (variable length)
Flags:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
TCP seq. #’s and ACKs
Seq. #’s:
• byte stream “number” of first byte in segment’s data
ACKs:
• seq # of next byte expected from other side
• cumulative ACK
Q: how does the receiver handle out-of-order segments?
• A: TCP spec doesn’t say; it is up to the implementation
• Widely used implementations of TCP buffer out-of-order segments
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  • but RTT varies
• too short: premature timeout
  • unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  • ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  • average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation:
(figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT plotted as RTT in milliseconds versus time in seconds)
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  • large variation in SampleRTT around EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
Then set timeout interval:
  TimeoutInterval = EstimatedRTT + 4*DevRTT
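The EstimatedRTT, DevRTT and TimeoutInterval formulas can be combined into a small estimator. This is a sketch under stated assumptions, not how any particular TCP stack does it: the class name is hypothetical, the estimate is seeded with the first sample and a zero deviation, and EstimatedRTT is updated before DevRTT (the slides do not fix either choice).

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, following the slide formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumption: seed with the first sample and zero deviation.
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumption: EstimatedRTT is updated first; DevRTT then uses the new value.
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # values in milliseconds
for sample in (120.0, 90.0, 110.0):
    print(f"sample={sample:.0f}  timeout={est.update(sample):.2f}")
```

A single 120 ms sample after a 100 ms seed moves EstimatedRTT only to 102.5 ms, while the 4*DevRTT margin widens the timeout to cover the jitter.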
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer; however it just retransmits the first segment in the window
• Retransmissions are triggered by:
  • timeout events
  • duplicate acks
• Initially consider simplified TCP sender:
  • ignore duplicate acks
  • ignore flow control, congestion control
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout (first segment in the window)
• restart timer
Ack rcvd:
• If acknowledges previously unacked segments
  • update what is known to be acked
  • start timer if there are outstanding segments
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
TCP: retransmission scenarios
Lost ACK scenario (Host A to Host B):
• A sends Seq=92, 8 bytes data
• B replies ACK=100, which is lost (X)
• Seq=92 timeout at A: A resends Seq=92, 8 bytes data
• B replies ACK=100 again; SendBase = 100
Premature timeout scenario (Host A to Host B):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100 and ACK=120
• Seq=92 timeout fires at A before the ACKs arrive: A resends Seq=92, 8 bytes data
• the ACKs arrive: SendBase = 100, then SendBase = 120
• B answers the duplicate with ACK=120; SendBase stays at 120
TCP retransmission scenarios (more)
Cumulative ACK scenario (Host A to Host B):
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100, which is lost (X), then ACK=120
• ACK=120 arrives at A before the timeout: SendBase = 120, nothing is retransmitted
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected -> Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
Fast Retransmit
• Time-out period often relatively long:
  • long delay before resending lost packet
• Detect lost segments via duplicate ACKs.
  • Sender often sends many segments back-to-back
  • If segment is lost, there will likely be many duplicate ACKs.
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
  • fast retransmit: resend segment before timer expires
Fast Retransmit
• Resend a segment after 3 duplicate ACKs, since a duplicate ACK means that an out-of-sequence segment was received
• duplicate ACKs can also be due to packet reordering!
• if window is small, don’t get duplicate ACKs!
(figure: Host A sends seq # x1 through x5; x2 is lost (X); Host B sends ACK x1 for x1 and again for x3, x4 and x5; on the triple duplicate ACKs, A resends seq x2 before the timeout)
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
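The ACK-handling branch above can be sketched in Python as follows. The class name, the returned action strings and the `has_unacked` flag are illustrative stand-ins; a real sender would act on its timer and retransmission queue instead of returning labels.

```python
class FastRetransmitSender:
    """ACK handling with a per-ACK-value duplicate counter."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.dup_counts = {}       # ACK value y -> duplicate ACKs seen for y
        self.retransmitted = []    # sequence numbers resent by fast retransmit

    def on_ack(self, y, has_unacked=True):
        """Return a label describing the action taken for ACK value y."""
        if y > self.send_base:
            self.send_base = y
            self.dup_counts.clear()            # new data acked: counters start over
            return "start_timer" if has_unacked else "no_timer"
        # Duplicate ACK for an already ACKed segment.
        self.dup_counts[y] = self.dup_counts.get(y, 0) + 1
        if self.dup_counts[y] == 3:
            self.retransmitted.append(y)       # resend segment with seq # y
            return "fast_retransmit"
        return "wait"


sender = FastRetransmitSender(initial_seq=92)
actions = [sender.on_ack(100)] + [sender.on_ack(100) for _ in range(3)]
print(actions, sender.retransmitted)
```

The first ACK=100 advances SendBase; the next three are duplicates, and the third one triggers the resend of the segment starting at byte 100.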
TCP Flow Control
flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app’s drain rate
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWin = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWin in segments
• Sender limits unACKed data to RcvWin
  • guarantees receive buffer doesn’t overflow
Sliding Window Flow Control Example
Receiver buffer size: 4K, initially empty
• Sender sends 2K of data (2K, SeqNo=0); buffer holds 2K; receiver replies AckNo=2048 RcvWin=2048
• Sender sends 2K of data (2K, SeqNo=2048); buffer holds 4K; receiver replies AckNo=4096 RcvWin=0
• Sender blocked
• Buffer drains to 3K; receiver sends AckNo=4096 RcvWin=1024
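The RcvWin bookkeeping in this example can be reproduced with a few lines of Python. The function names are hypothetical, and the byte counters are chosen to match the 4K-buffer walk-through above.

```python
def rcv_win(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room: RcvWin = RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """Sender rule: unACKed data plus the new bytes must fit in the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


BUF = 4096
print(rcv_win(BUF, 2048, 0))      # 2048: first 2K received, nothing read yet
print(rcv_win(BUF, 4096, 0))      # 0: buffer full, sender must stop
print(rcv_win(BUF, 4096, 1024))   # 1024: application has read 1K
print(can_send(1, 4096, 4096, 0)) # False: sender blocked while RcvWin=0
```

The window reopens only when the application reads from the buffer, which is what makes this a speed-matching mechanism.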
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queueing in router buffers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
(figure: Host A and Host B each send λ_in: original data, through unlimited shared output link buffers; delivered rate is λ_out)
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet
(figure: Host A and Host B send through finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; delivered rate is λ_out)
Causes/costs of congestion: scenario 2
• always: λ_in = λ_out (goodput)
• “perfect” retransmission only when loss: λ'_in > λ_out
• retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
(plots a, b, c: λ_out versus λ_in, both axes up to R/2; λ_out peaks at R/2 in a, R/3 in b and R/4 in c)
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
(figure: Host A and Host B send over multihop paths through finite shared output link buffers; λ_in: original data; λ'_in: original data, plus retransmitted data; delivered rate is λ_out)
Causes/costs of congestion: scenario 3
(figure: λ_out between Host A and Host B)
another “cost” of congestion:
• when packet dropped, any “upstream” transmission capacity used for that packet was wasted!
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent-LastByteAcked ≤ CongWin
• CongWin is dynamic, function of perceived network congestion
How does sender perceive congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
two modes of operation:
• Slow Start (SS)
• Congestion avoidance (CA) or Additive Increase Multiplicative Decrease (AIMD)
TCP congestion control: bandwidth probing
• “probing for bandwidth”: increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
  • continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in network)
(figure: sending rate versus time; ACKs being received, so increase rate; X marks a loss, so decrease rate; TCP’s “sawtooth” behavior)
• Q: how fast to increase/decrease?
  • details to follow
TCP Congestion Control: details
• sender limits rate by limiting number of unACKed bytes “in pipeline”: LastByteSent-LastByteAcked ≤ cwnd
  • cwnd: differs from rwnd (how, why?)
  • sender limited by min(cwnd,rwnd)
• roughly, rate = cwnd/RTT bytes/sec
• cwnd is dynamic, function of perceived network congestion
(figure: sender transmits cwnd bytes, then waits one RTT for the ACK(s))
TCP Congestion Control: more details
segment loss event: reducing cwnd
• timeout: no response from receiver
  • cut cwnd to 1
• 3 duplicate ACKs: at least some segments getting through (recall fast retransmit)
  • cut cwnd in half, less aggressively than on timeout
ACK received: increase cwnd
• Two modes of operation:
  • slowstart phase: increase exponentially fast (despite name) at connection start, or following timeout
  • congestion avoidance: increase linearly
TCP Slow Start Phase
• when connection begins, cwnd = 1 MSS
  • example: MSS = 500 bytes & RTT = 200 msec
  • initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  • desirable to quickly ramp up to respectable rate
• increase rate exponentially until first loss event or when threshold reached
  • double cwnd every RTT
  • done by incrementing cwnd by 1 for every ACK received
(figure: Host A sends one segment, then two segments, then four segments to Host B in successive RTTs)
Slow Start Example
• The congestion window size grows very rapidly
  • For every ACK, we increase CongWin by 1 irrespective of the number of segments ACK’ed
  • double CongWin every RTT
  • initial rate is slow but ramps up exponentially fast
• TCP slows down the increase of CongWin when CongWin > ssthresh
(figure:
  cwnd = 1: send segment 1
  ACK for segment 1 -> cwnd = 2: send segments 2, 3
  ACK for segment 2 -> cwnd = 3; ACK for segment 3 -> cwnd = 4: send segments 4, 5, 6, 7
  ACKs for segments 4, 5, 6, 7 -> cwnd = 5, 6, 7, 8)
TCP Congestion Avoidance Phase
• when cwnd > ssthresh grow cwnd linearly
  • increase cwnd by 1 MSS per RTT
  • approach possible congestion slower than in slowstart
  • implementation: cwnd = cwnd + MSS * (MSS/cwnd) for each ACK received (cwnd in bytes)
AIMD
• ACKs: increase cwnd by 1 MSS per RTT: additive increase
• loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease
AIMD: Additive Increase Multiplicative Decrease
Congestion Avoidance
• Congestion avoidance phase is started if CongWin has reached the slow-start threshold value
• If CongWin >= ssthresh then each time an ACK is received, increment CongWin as follows:
  • CongWin = CongWin + 1/CongWin (CongWin in segments)
  • In actual TCP implementation CongWin is in bytes: CongWin = CongWin + MSS * (MSS/CongWin)
• So CongWin is increased by one only if all CongWin segments have been acknowledged.
Example Slow Start / Congestion Avoidance
Assume that ssthresh = 8
(figure: cwnd (in segments) versus roundtrip times, with ssthresh marked at 8; cwnd steps through 1, 2, 3, 4, 5, 6, 7, 8 as ACKs arrive during slow start, then 9 and 10 in congestion avoidance)
Slow Start / Congestion Avoidance
• A typical plot of CongWin for a TCP connection (MSS = 1500 bytes) with TCP Tahoe:
(figure: CongWin over time, with the slow start (SS) and congestion avoidance (CA) phases and the ssthresh level marked)
Responses to Congestion
• TCP assumes there is congestion if it detects a packet loss
• A TCP sender can detect lost packets via loss events:
  • Timeout of a retransmission timer
  • Receipt of 3 duplicate ACKs (fast retransmit)
• TCP interprets a Timeout as a binary congestion signal. When a timeout occurs, the sender performs:
  • ssthresh is set to half the current size of the congestion window: ssthresh = CongWin / 2
  • CongWin is reset to one: CongWin = 1
  • and slow-start is entered
Fast Recovery (differentiation between two loss events)
• After 3 dup ACKs (fast retransmit):
  • ssthresh = CongWin/2
  • CongWin = CongWin/2
  • window then grows linearly
• But after timeout event:
  • CongWin = 1 MSS;
  • window then grows exponentially
  • to the threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is “more alarming”
TCP Congestion Control
Initially:
  CongWin = 1;
  ssthresh = advertised window size;
New Ack received:
  if (CongWin < ssthresh)   /* Slow Start */
    CongWin = CongWin + 1;
  else                      /* Congestion Avoidance */
    CongWin = CongWin + 1/CongWin;
Timeout:
  ssthresh = CongWin/2;
  CongWin = 1;
Fast Retransmission:
  ssthresh = CongWin/2;
  CongWin = CongWin/2;
Slow Start (exponential increase phase) is continued until CongWin reaches half of the level where the loss event occurred last time. CongWin is increased slowly after (linear increase in Congestion Avoidance phase).
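The update rules above can be exercised with a small round-by-round simulation. This is a sketch with CongWin counted in segments; the function names are hypothetical, and the simulation assumes each round delivers one ACK per whole segment that was in the window at the start of the round.

```python
def on_new_ack(cwnd, ssthresh):
    """Return the new CongWin (in segments) after one new ACK."""
    if cwnd < ssthresh:
        return cwnd + 1          # slow start: +1 per ACK
    return cwnd + 1 / cwnd       # congestion avoidance: about +1 per RTT


def on_timeout(cwnd):
    """Return (CongWin, ssthresh) after a timeout."""
    return 1, cwnd / 2


def on_fast_retransmission(cwnd):
    """Return (CongWin, ssthresh) after 3 duplicate ACKs."""
    return cwnd / 2, cwnd / 2


def simulate_rounds(rounds, cwnd=1, ssthresh=8):
    """Assumption: one ACK per whole segment in the window at round start."""
    history = [cwnd]
    for _ in range(rounds):
        for _ in range(int(cwnd)):
            cwnd = on_new_ack(cwnd, ssthresh)
        history.append(cwnd)
    return history


print([round(w, 2) for w in simulate_rounds(5)])
```

With ssthresh = 8 the window doubles for three rounds (1, 2, 4, 8) and then gains a little under one segment per round, since the 1/CongWin increment shrinks slightly as CongWin grows within the round.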
Popular “flavors” of TCP
(figure: cwnd window size (in segments) versus transmission round for TCP Tahoe and TCP Reno, with the ssthresh levels marked)
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
• The actual sender window size is determined based on the congestion and flow control algorithms: SenderWin = min(RcvWin, CongWin)
TCP Congestion Control Summary
• Event: ACK receipt for previously unacked data
  State: Slow Start (SS)
  TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance”
  Commentary: Resulting in a doubling of CongWin every RTT
• Event: ACK receipt for previously unacked data
  State: Congestion Avoidance (CA)
  TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT
• Event: Loss event detected by triple duplicate ACK
  State: SS or CA
  TCP Sender Action: Threshold = CongWin/2, CongWin = Threshold, set state to “Congestion Avoidance”
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
• Event: Timeout
  State: SS or CA
  TCP Sender Action: Threshold = CongWin/2, CongWin = 1 MSS, set state to “Slow Start”
  Commentary: Enter slow start
• Event: Duplicate ACK
  State: SS or CA
  TCP Sender Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed
TCP throughput
• Q: what’s average throughput of TCP as function of window size, RTT?
  • ignoring slow start
• let W be window size when loss occurs.
• when window is W, throughput is W/RTT
• just after loss, window drops to W/2, throughput to W/(2·RTT).
• average throughput: 0.75 W/RTT
TCP Futures: TCP over “long, fat pipes”
• example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• requires window size W = 83,333 in-flight segments
• throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• so $L = 2 \cdot 10^{-10}$. Wow
• new versions of TCP for high-speed
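Both numbers in this example follow from the formulas by plugging in the values. The sketch below uses illustrative function names and treats the loss-rate formula as given on the slide, solving it for L.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed: throughput * RTT / MSS (MSS in bits)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {w:,.0f} segments")   # about 83,333
print(f"L = {loss:.2e}")          # about 2e-10
```

A loss rate of roughly one segment in five billion is far below what real paths deliver, which is the motivation for the high-speed TCP variants mentioned above.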
TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
(figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R)
Why is TCP fair?
Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
(figure: Connection 2 throughput versus Connection 1 throughput, both axes up to R, with the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”)
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  • do not want rate throttled by congestion control
• Instead use UDP:
  • pump audio/video at constant rate, tolerate packet loss
• Research area: TCP friendly
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts.
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  • new app asks for 1 TCP, gets rate R/10
  • new app asks for 11 TCPs, gets about R/2 (11 of 20 connections)!
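The parallel-connection example is simple per-connection sharing, which a few lines of Python make explicit. The helper name is hypothetical, and the sketch assumes every connection gets an equal share of the link.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R an app gets when each connection has an equal share."""
    return Fraction(new_connections, existing_connections + new_connections)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20
```

Fairness is per connection, not per application, so an app can claim more of the link just by opening more connections.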
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  • seq. #s
  • buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", portNumber);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
• specifies initial seq #
• no data
Step 2: server host receives SYN, replies with SYNACK segment
• server allocates buffers
• specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
(figure: client sends FIN; server replies ACK, then sends FIN; client replies ACK and enters timed wait; connection closed)