CSCI x760 - Computer Networks, Spring 2016

Instructor: Prof. Roberto Perdisci (perdisci@cs.uga.edu). These slides are adapted from the textbook slides by J.F. Kurose and K.W. Ross. Chapter 3: Transport Layer

source: computer-networks-webdesign.com


  1. rdt1.0: reliable transfer over a reliable channel. The underlying channel is perfectly reliable: no bit errors and no loss of packets. There are separate FSMs for sender and receiver: the sender sends data into the underlying channel, and the receiver reads data from the underlying channel.
     Sender FSM (single state, "wait for call from above"): on rdt_send(data), do packet = make_pkt(data); udt_send(packet).
     Receiver FSM (single state, "wait for call from below"): on rdt_rcv(packet), do extract(packet,data); deliver_data(data).

  2. rdt2.0: channel with bit errors. The underlying channel may flip bits in a packet; a checksum detects bit errors. The question is how to recover from errors:
     - acknowledgements (ACKs): receiver explicitly tells sender that pkt was received OK
     - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
     - sender retransmits pkt on receipt of NAK
     New mechanisms in rdt2.0 (beyond rdt1.0): error detection, and receiver feedback in the form of control msgs (ACK, NAK) sent rcvr->sender.
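
The slide does not say which checksum rdt2.0 uses. As a minimal sketch, assuming the 16-bit ones' complement Internet checksum used by UDP and TCP, error detection could look like this (the function names `internet_checksum` and `corrupt` are chosen here for illustration):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # wrap the carry around
    return ~total & 0xFFFF


def corrupt(data: bytes, checksum: int) -> bool:
    """Receiver-side check: recompute the checksum and compare."""
    return internet_checksum(data) != checksum


sent_checksum = internet_checksum(b"hello")
print(corrupt(b"hello", sent_checksum))  # unchanged payload
print(corrupt(b"hellp", sent_checksum))  # payload altered in transit
```

A checksum this simple detects many bit errors but not all of them, which is one reason it is only a sketch of the error-detection step.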

  3. rdt2.0: FSM specification.
     Sender, state "wait for call from above": on rdt_send(data), do sndpkt = make_pkt(data, checksum); udt_send(sndpkt); move to "wait for ACK or NAK".
     Sender, state "wait for ACK or NAK": on rdt_rcv(rcvpkt) && isNAK(rcvpkt), do udt_send(sndpkt) and stay; on rdt_rcv(rcvpkt) && isACK(rcvpkt), do nothing (Λ) and return to "wait for call from above".
     Receiver, state "wait for call from below": on rdt_rcv(rcvpkt) && corrupt(rcvpkt), do udt_send(NAK); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), do extract(rcvpkt,data); deliver_data(data); udt_send(ACK).

  4. rdt2.0: operation with no errors. Path through the rdt2.0 FSM: the sender executes sndpkt = make_pkt(data, checksum); udt_send(sndpkt) and waits for ACK or NAK; the receiver sees rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), so it runs extract(rcvpkt,data); deliver_data(data); udt_send(ACK); the sender sees isACK(rcvpkt), does nothing (Λ), and returns to "wait for call from above".

  5. rdt2.0: error scenario. Path through the rdt2.0 FSM: the sender executes sndpkt = make_pkt(data, checksum); udt_send(sndpkt); the receiver sees rdt_rcv(rcvpkt) && corrupt(rcvpkt) and runs udt_send(NAK); the sender sees isNAK(rcvpkt), runs udt_send(sndpkt) again, and stays in "wait for ACK or NAK"; once a copy arrives uncorrupted, the receiver delivers the data and sends ACK.

  6. rdt2.0 has a fatal flaw! What happens if the ACK/NAK is corrupted? The sender doesn't know what happened at the receiver, and it can't just retransmit: that could create a duplicate.
     Handling duplicates: sender retransmits current pkt if ACK/NAK garbled; sender adds sequence number to each pkt; receiver discards (doesn't deliver up) duplicate pkt.
     Stop and wait: sender sends one packet, then waits for receiver response.

  7. rdt2.1: handles garbled ACK/NAKs (sender FSM).
     - "Wait for call 0 from above": on rdt_send(data), do sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); move to "Wait for ACK or NAK 0".
     - "Wait for ACK or NAK 0": on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)), do udt_send(sndpkt) and stay; on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), do nothing (Λ) and move to "Wait for call 1 from above".
     - "Wait for call 1 from above": on rdt_send(data), do sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); move to "Wait for ACK or NAK 1".
     - "Wait for ACK or NAK 1": on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)), do udt_send(sndpkt) and stay; on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), do nothing (Λ) and move to "Wait for call 0 from above".

  8. rdt2.1: handles garbled ACK/NAKs (receiver FSM).
     - "Wait for 0 from below": on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt), do extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); move to "Wait for 1 from below". On rdt_rcv(rcvpkt) && corrupt(rcvpkt), do sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt). On rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) (a duplicate), do sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt).
     - "Wait for 1 from below": on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt), do extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); move to "Wait for 0 from below". On rdt_rcv(rcvpkt) && corrupt(rcvpkt), do sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt). On rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) (a duplicate), do sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt).

  9. rdt2.1: discussion.
     Sender: seq # added to pkt; two seq. #'s (0,1) will suffice. Why? Must check if received ACK/NAK is corrupted. Twice as many states: state must "remember" whether "current" pkt has 0 or 1 seq. #.
     Receiver: must check if received packet is a duplicate; state indicates whether 0 or 1 is the expected pkt seq #. Note: receiver cannot know if its last ACK/NAK was received OK at sender.

  10. rdt2.2: a NAK-free protocol. Same functionality as rdt2.1, using ACKs only: instead of NAK, receiver sends ACK for last pkt received OK, and must explicitly include the seq # of the pkt being ACKed. A duplicate ACK at the sender results in the same action as a NAK: retransmit current pkt.

  11. rdt2.2: sender, receiver fragments.
     Sender FSM fragment: in "Wait for call 0 from above", on rdt_send(data), do sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); move to "Wait for ACK 0". In "Wait for ACK 0", on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)), do udt_send(sndpkt); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0), do nothing (Λ) and move on.
     Receiver FSM fragment: on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt), do extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt); move to "Wait for 0 from below". In "Wait for 0 from below", on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)), do udt_send(sndpkt), which re-sends the last ACK.

  12. rdt3.0: channels with errors and loss. New assumption: the underlying channel can also lose packets (data or ACKs). Checksum, seq. #, ACKs, retransmissions will be of help, but not enough.
     Approach: sender waits a "reasonable" amount of time for ACK, and retransmits if no ACK is received in this time. If pkt (or ACK) is just delayed (not lost), the retransmission will be a duplicate, but the use of seq. #'s already handles this; receiver must specify seq # of pkt being ACKed. Requires a countdown timer.

  13. rdt3.0 sender (FSM).
     - "Wait for call 0 from above": on rdt_send(data), do sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; move to "Wait for ACK0". On rdt_rcv(rcvpkt), do nothing (Λ).
     - "Wait for ACK0": on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)), do nothing (Λ); on timeout, do udt_send(sndpkt); start_timer; on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0), do stop_timer and move to "Wait for call 1 from above".
     - "Wait for call 1 from above": on rdt_send(data), do sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; move to "Wait for ACK1". On rdt_rcv(rcvpkt), do nothing (Λ).
     - "Wait for ACK1": on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)), do nothing (Λ); on timeout, do udt_send(sndpkt); start_timer; on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1), do stop_timer and move to "Wait for call 0 from above".
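
The four-state sender FSM can be collapsed into one object that tracks the current sequence bit and whether it is waiting for an ACK. The following is a rough sketch only: the class name `Rdt30Sender`, the tuple packet format, and the boolean `timer_running` flag (standing in for a real countdown timer) are assumptions made for illustration, and the caller is expected to invoke `timeout()` itself.

```python
class Rdt30Sender:
    """Alternating-bit, stop-and-wait sender modelled on the rdt3.0 FSM."""

    def __init__(self, udt_send):
        self.udt_send = udt_send      # callable that puts a packet on the channel
        self.seq = 0                  # sequence bit of the current packet
        self.waiting_for_ack = False  # False: "wait for call", True: "wait for ACK"
        self.sndpkt = None
        self.timer_running = False    # stand-in for start_timer / stop_timer

    def rdt_send(self, data):
        if self.waiting_for_ack:
            return False              # stop-and-wait: one packet in flight
        self.sndpkt = (self.seq, data)
        self.udt_send(self.sndpkt)
        self.timer_running = True     # start_timer
        self.waiting_for_ack = True
        return True

    def timeout(self):
        if self.waiting_for_ack:
            self.udt_send(self.sndpkt)  # retransmit the current packet
            self.timer_running = True   # start_timer

    def rdt_rcv(self, acknum, corrupt=False):
        if not self.waiting_for_ack:
            return                    # Λ: nothing outstanding
        if corrupt or acknum != self.seq:
            return                    # Λ: ignore, wait for the timeout
        self.timer_running = False    # stop_timer
        self.waiting_for_ack = False
        self.seq ^= 1                 # alternate between 0 and 1
```

Note that a corrupted or wrong-numbered ACK triggers no retransmission here; as in the FSM, only the timeout does.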

  14. rdt3.0 in action (figure)

  15. rdt3.0 in action (figure)

  16. rdt3.0: stop-and-wait operation. Timeline between sender and receiver: first packet bit transmitted at t = 0; last packet bit transmitted at t = L / R; first packet bit arrives at the receiver; last packet bit arrives, receiver sends ACK; ACK arrives at the sender, which sends the next packet at t = RTT + L / R.
     Link utilization: fraction of time the sender is busy sending traffic.

  17. Performance of rdt3.0. rdt3.0 works, but performance is very bad. Example: 1 Gbps link, 15 ms prop. delay (so RTT = 30 ms), 8000 bit packet:
     $d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bps}} = 8\ \text{microseconds}$
     U_sender: utilization = fraction of time sender is busy sending (times in msec):
     $U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
     - ~1 kB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
     - network protocol limits use of physical resources!
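
The arithmetic on this slide can be checked with a few lines of Python. This is only a worked example; the function name `sender_utilization` and its `n_pipelined` parameter (the number of packets sent per round trip, 1 for stop-and-wait) are introduced here for illustration.

```python
def sender_utilization(L_bits, R_bps, rtt_s, n_pipelined=1):
    """Fraction of time the sender is busy: n * (L/R) / (RTT + L/R)."""
    d_trans = L_bits / R_bps
    return n_pipelined * d_trans / (rtt_s + d_trans)


L, R, RTT = 8000, 1e9, 0.030            # 8000-bit packet, 1 Gbps, 30 ms RTT

u = sender_utilization(L, R, RTT)
throughput_bytes_per_s = (L / 8) / (RTT + L / R)

print(f"d_trans    = {L / R * 1e6:.0f} microseconds")
print(f"U_sender   = {u:.5f}")
print(f"throughput = {throughput_bytes_per_s / 1000:.1f} kB/sec")
```

Calling `sender_utilization(L, R, RTT, n_pipelined=3)` gives the three-packet pipelined case, roughly three times the stop-and-wait value.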

  18. Pipelined protocols. Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts. The range of sequence numbers must be increased, and buffering is needed at sender and/or receiver. Two generic forms of pipelined protocols: Go-Back-N, selective repeat.

  19. Pipelining: increased utilization. Timeline with three packets in flight: first packet bit transmitted at t = 0; last bit transmitted at t = L / R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; first ACK arrives, send next packet at t = RTT + L / R.
     Increase utilization by a factor of 3! (times in msec):
     $U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$

  20. Pipelining Protocols.
     Go-Back-N overview: sender has up to N unACKed packets in pipeline; receiver only sends cumulative ACKs, and doesn't ACK a pkt if there's a gap; sender has a timer for the oldest unACKed packet; if the timer expires, retransmit all unACKed packets.
     Selective Repeat overview: sender has up to N unACKed pkts in pipeline; receiver ACKs individual pkts; sender maintains a timer for each unACKed pkt; if a timer expires, retransmit only that unACKed pkt.

  21. Go-Back-N sender. k-bit seq # in pkt header (not limited to 0/1); "sliding window" of up to N consecutive unACKed pkts allowed.
     - ACK(n): ACKs all pkts up to and including seq # n ("cumulative ACK"); sender may receive duplicate ACKs (see receiver)
     - only one timer for all in-flight pkts (started at the oldest non-ACKed packet)
     - timeout(n): retransmit pkt n and all higher seq # pkts in window

  22. GBN: sender extended FSM (single state, "Wait").
     - Initially (Λ): base=1; nextseqnum=1.
     - On rdt_send(data): if (nextseqnum < base+N) { sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum); udt_send(sndpkt[nextseqnum]); if (base == nextseqnum) start_timer; nextseqnum++ } else refuse_data(data).
     - On timeout: start_timer; udt_send(sndpkt[base]); udt_send(sndpkt[base+1]); …; udt_send(sndpkt[nextseqnum-1]).
     - On rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): base = getacknum(rcvpkt)+1; if (base == nextseqnum) stop_timer else start_timer.
     - On rdt_rcv(rcvpkt) && corrupt(rcvpkt): do nothing (Λ).
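
The extended FSM translates fairly directly into code. The sketch below is illustrative rather than a reference implementation: the class name `GBNSender`, the tuple packet format, and the `timer_running` flag in place of a real timer are assumptions, and ACKs are assumed to arrive uncorrupted (the corrupt case is a no-op in the FSM).

```python
class GBNSender:
    """Go-Back-N sender with window size N, following the extended FSM."""

    def __init__(self, N, udt_send):
        self.N = N
        self.udt_send = udt_send
        self.base = 1                # oldest unACKed sequence number
        self.nextseqnum = 1          # next sequence number to use
        self.sndpkt = {}             # seq # -> packet kept for retransmission
        self.timer_running = False

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False             # refuse_data: window is full
        pkt = (self.nextseqnum, data)
        self.sndpkt[self.nextseqnum] = pkt
        self.udt_send(pkt)
        if self.base == self.nextseqnum:
            self.timer_running = True   # start_timer for the oldest packet
        self.nextseqnum += 1
        return True

    def timeout(self):
        self.timer_running = True       # start_timer
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[seq])  # go back N: resend all unACKed

    def rdt_rcv_ack(self, acknum):
        # Cumulative ACK. The max() guard against old ACKs is an addition
        # made here; the FSM simply sets base = getacknum(rcvpkt)+1.
        self.base = max(self.base, acknum + 1)
        for seq in [s for s in self.sndpkt if s < self.base]:
            del self.sndpkt[seq]
        # stop_timer if nothing is outstanding, otherwise (re)start it
        self.timer_running = self.base != self.nextseqnum
```

With N = 3, a fourth `rdt_send` is refused until an ACK slides the window forward, and a timeout resends every packet from `base` to `nextseqnum-1`.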

  23. GBN: receiver extended FSM (single state, "Wait").
     - Initially (Λ): expectedseqnum=1; sndpkt = make_pkt(expectedseqnum,ACK,chksum).
     - On rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(expectedseqnum,ACK,chksum); udt_send(sndpkt); expectedseqnum++.
     - default: udt_send(sndpkt).
     ACK-only: always send ACK for correctly-received pkt with highest in-order seq #. This may generate duplicate ACKs; the receiver need only remember expectedseqnum. Out-of-order pkt: discard (don't buffer) -> no receiver buffering! Re-ACK pkt with highest in-order seq #.

  24. GBN in action (figure)

  25. Selective Repeat. Receiver individually acknowledges all correctly received pkts, and buffers pkts, as needed, for eventual in-order delivery to upper layer. Sender only resends pkts for which ACK was not received, with a sender timer for each unACKed pkt. Sender window: N consecutive seq #'s; again limits seq #s of sent, unACKed pkts.

  26. Selective repeat: sender, receiver windows (figure)

  27. Selective repeat.
     Sender:
     - data from above: if next available seq # in window, send pkt
     - timeout(n): resend pkt n, restart timer
     - ACK(n) in [sendbase,sendbase+N]: mark pkt n as received; if n is the smallest unACKed pkt, advance window base to next unACKed seq #
     Receiver:
     - pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
     - pkt n in [rcvbase-N,rcvbase-1]: ACK(n)
     - otherwise: ignore

  28. Selective repeat in action (figure)

  29. Selective repeat: dilemma. Example: seq #'s 0, 1, 2, 3; window size = 3. The receiver sees no difference in the two scenarios, and incorrectly passes duplicate data as new in (a). Q: what relationship between seq # size and window size? A: window size <= ½ seq # size.
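
One way to see where the ½ bound comes from is to compare the sender's old window with the receiver's window after it has advanced by a full window: any sequence number in both sets is ambiguous. The helper names below are hypothetical, and the model only covers the worst case where every ACK of the first window is lost.

```python
def sr_window_is_safe(window_size, seq_space):
    """The slide's rule: window size <= half the sequence number space."""
    return 2 * window_size <= seq_space


def ambiguous_seq_numbers(window_size, seq_space):
    """Seq #s that are both retransmittable by the sender (old window)
    and acceptable as new by the receiver (window advanced by N)."""
    old_window = {i % seq_space for i in range(0, window_size)}
    new_window = {i % seq_space for i in range(window_size, 2 * window_size)}
    return sorted(old_window & new_window)


print(ambiguous_seq_numbers(3, 4))  # the slide's example: seq #s 0..3, window 3
print(ambiguous_seq_numbers(2, 4))  # window 2 with the same seq # space
```

For the slide's example the overlap is non-empty, so a retransmitted packet 0 can be mistaken for new data; with window size 2 the two windows are disjoint.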

  30. Chapter 3 outline
     - 3.1 Transport-layer services
     - 3.2 Multiplexing and demultiplexing
     - 3.3 Connectionless transport: UDP
     - 3.4 Principles of reliable data transfer
     - 3.5 Connection-oriented transport: TCP (segment structure, reliable data transfer, flow control, connection management)
     - 3.6 Principles of congestion control
     - 3.7 TCP congestion control

  31. TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
     - point-to-point: one sender, one receiver
     - reliable, in-order byte stream: no "message boundaries"
     - pipelined: TCP congestion and flow control set window size
     - send & receive buffers: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data
     - full duplex data: bi-directional data flow in same connection; MSS: maximum segment size
     - connection-oriented: handshaking (exchange of control msgs) init's sender, receiver state before data exchange
     - flow controlled: sender will not overwhelm receiver

  32. TCP segment structure (each row 32 bits wide):
     - source port #, dest port #
     - sequence number
     - acknowledgement number
     - head len, not used, flag bits U A P R S F, receive window
     - checksum, Urg data pointer
     - Options (variable length)
     - application data (variable length)
     Notes: sequence and acknowledgement numbers count by bytes of data (not segments!). Receive window: # bytes rcvr willing to accept. Checksum: Internet checksum (as in UDP). URG: urgent data (generally not used). ACK: ACK # valid. PSH: push data now (generally not used). RST, SYN, FIN: connection estab (setup, teardown commands).

  33. TCP seq. #'s and ACKs.
     Seq. #'s: byte stream "number" of first byte in segment's data. ACKs: seq # of next byte expected from other side; cumulative ACK.
     Q: how does the receiver handle out-of-order segments? A: TCP spec doesn't say; up to implementer.
     Simple telnet scenario: the user at Host A types 'C', and A sends Seq=42, ACK=79, data = 'C'. Host B ACKs receipt of 'C' and echoes back 'C' with Seq=79, ACK=43, data = 'C'. Host A ACKs receipt of the echoed 'C' with Seq=43, ACK=80.

  34. TCP: retransmission scenarios.
     Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); the Seq=92 timeout fires and A resends Seq=92, 8 bytes data; B ACKs again with ACK=100, giving SendBase = 100.
     Premature timeout: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; the Seq=92 timeout fires before the ACKs arrive, so A resends Seq=92, 8 bytes data; the ACKs then move SendBase to 100 and then to 120, and SendBase stays at 120 after the duplicate is ACKed.

  35. TCP Round Trip Time and Timeout.
     Q: how to set TCP timeout value? Longer than RTT, but RTT varies. Too short: premature timeout, unnecessary retransmissions. Too long: slow reaction to segment loss.
     Q: how to estimate RTT? SampleRTT: measured time from segment transmission until ACK receipt, ignoring retransmissions. SampleRTT will vary, and we want the estimated RTT "smoother": average several recent measurements, not just the current SampleRTT.

  36. TCP Round Trip Time and Timeout.
     $EstimatedRTT_{i+1} = (1 - \alpha) \cdot EstimatedRTT_{i} + \alpha \cdot SampleRTT_{i+1}$
     - Exponential weighted moving average
     - influence of past sample decreases exponentially fast
     - typical value: α = 0.125

  37. Example RTT estimation (figure): RTT from gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and EstimatedRTT (milliseconds) against time (seconds).

  38. TCP Round Trip Time and Timeout: setting the timeout. Use EstimatedRTT plus a "safety margin"; a large variation in EstimatedRTT -> larger safety margin. First estimate how much SampleRTT deviates from EstimatedRTT, as an EWMA of |SampleRTT - EstimatedRTT|:
     $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$ (typically, β = 0.25)
     Then set the timeout interval:
     $TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
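
The three formulas (EstimatedRTT, DevRTT, TimeoutInterval) fit in a small class. This is a sketch under stated assumptions: the class name `RttEstimator` is made up, DevRTT starts at 0, and DevRTT is updated before EstimatedRTT so that the deviation is measured against the previous estimate; the slides do not specify either the initial value or the update order.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, using the slide's formulas."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0   # assumption: the slides give no initial value

    def update(self, sample_rtt):
        # Assumption: deviation is taken against the old estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)                        # one unusually slow sample
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

A single slow sample moves the estimate only an eighth of the way toward it, while the deviation term widens the safety margin immediately.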

  40. TCP reliable data transfer. TCP creates rdt service on top of IP's unreliable service, using pipelined segments, cumulative ACKs, and a single retransmission timer. Retransmissions are triggered by timeout events and duplicate ACKs. Initially consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control, congestion control.

  41. TCP sender events:
     - data rcvd from app: create segment with seq #, where seq # is the byte-stream number of the first data byte in the segment; start timer if not already running (think of the timer as for the oldest unACKed segment); expiration interval: TimeOutInterval
     - timeout: retransmit segment that caused timeout; restart timer
     - ACK rcvd: if it acknowledges previously unACKed segments, update what is known to be ACKed, and start timer if there are outstanding segments

  42. TCP sender (simplified):
       NextSeqNum = InitialSeqNum
       SendBase = InitialSeqNum
       loop (forever) {
         switch(event)
         event: data received from application above
           create TCP segment with sequence number NextSeqNum
           if (timer currently not running)
             start timer
           pass segment to IP
           NextSeqNum = NextSeqNum + length(data)
         event: timer timeout
           retransmit not-yet-acknowledged segment with smallest sequence number
           start timer
         event: ACK received, with ACK field value of y
           if (y > SendBase) {
             SendBase = y
             if (there are currently not-yet-acknowledged segments)
               start timer
           }
       } /* end of loop forever */
     Comment: SendBase-1 is the last cumulatively ACKed byte. Example: SendBase-1 = 71 and y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed.

  44. TCP retransmission scenarios (more). Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so SendBase = 120.

  45. TCP ACK generation [RFC 1122, RFC 2581]. Event at receiver -> TCP receiver action:
     - Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
     - Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
     - Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq. # of next expected byte
     - Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

  46. Fast Retransmit. The time-out period is often relatively long: long delay before resending a lost packet. Instead, detect lost segments via duplicate ACKs: the sender often sends many segments back-to-back, and if a segment is lost, there will likely be many duplicate ACKs for that segment. If the sender receives 3 ACKs for the same data, it assumes that the segment after the ACKed data was lost. Fast retransmit: resend segment before timer expires.

  47. Fast retransmit (figure): Host A sends seq # x1, x2, x3, x4, x5, and x2 is lost (X). Host B sends ACK x1, then repeats ACK x1 three more times as the later segments arrive. On the triple duplicate ACKs, A resends seq x2 before its timeout expires.

  48. Fast retransmit algorithm:
       event: ACK received, with ACK field value of y
         if (y > SendBase) {
           SendBase = y
           if (there are currently not-yet-acknowledged segments)
             start timer
         }
         else {   /* a duplicate ACK for already ACKed segment */
           increment count of dup ACKs received for y
           if (count of dup ACKs received for y = 3) {
             resend segment with sequence number y   /* fast retransmit */
           }
         }
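
The ACK-handling branch above can be written as a small runnable sketch. The class name `FastRetransmitSender`, the `retransmit` callback, and the `outstanding` argument (whether unACKed segments remain) are assumptions for illustration; resetting the duplicate counter on a new ACK is also an addition, since the pseudocode keeps a count per value of y.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit, after the slide's pseudocode."""

    def __init__(self, send_base, retransmit):
        self.send_base = send_base
        self.retransmit = retransmit   # callable taking a sequence number
        self.dup_acks = 0              # dup ACKs seen for the current send_base
        self.timer_running = True

    def on_ack(self, y, outstanding=True):
        if y > self.send_base:
            self.send_base = y               # new data is ACKed
            self.dup_acks = 0
            self.timer_running = outstanding  # start timer if segments remain
        else:
            self.dup_acks += 1               # duplicate ACK
            if self.dup_acks == 3:
                self.retransmit(y)           # fast retransmit, before timeout


resent = []
sender = FastRetransmitSender(send_base=92, retransmit=resent.append)
sender.on_ack(100)          # first ACK for byte 100: new data ACKed
for _ in range(3):
    sender.on_ack(100)      # three duplicates of that ACK
print(resent)
```

Only the third duplicate triggers a resend; further duplicates for the same y are counted but do not retransmit again in this sketch.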

  50. TCP Flow Control. The receive side of a TCP connection has a receive buffer: IP datagrams arrive into it, it holds TCP data (in buffer) plus (currently) unused buffer space, and the application process reads from it. The app process may be slow at reading from the buffer. Flow control: sender won't overflow receiver's buffer by transmitting too much, too fast. It is a speed-matching service: matching the send rate to the receiving application's drain rate.

  51. TCP Flow control: how it works (suppose TCP receiver discards out-of-order segments).
     - unused buffer space = rwnd = RcvBuffer - [LastByteRcvd - LastByteRead]
     - receiver: advertises unused buffer space by including rwnd value in segment header
     - sender: limits # of unACKed bytes to rwnd; this guarantees receiver's buffer doesn't overflow
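
As a small worked example of the rwnd formula and the sender-side limit, consider the following sketch. The function names and the byte counts are made up for illustration; only the two formulas come from the slide.

```python
def rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Unused receive-buffer space advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, rwnd_value):
    """Sender keeps the number of unACKed bytes at or below rwnd."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd_value


window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                               # 4096 - 2000 = 2096
print(can_send(5000, 4000, 1000, window))   # 1000 in flight + 1000 new
print(can_send(5000, 4000, 1200, window))   # 1000 in flight + 1200 new
```

The second call is allowed because 2000 bytes fit in the advertised 2096; the third would put 2200 bytes in flight and is refused.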

  53. TCP Connection Management. Recall: TCP sender, receiver establish "connection" before exchanging data segments. They initialize TCP variables: seq. #s; buffers, flow control info (e.g. RcvWindow).
     - client: connection initiator. Socket clientSocket = new Socket("hostname","port number");
     - server: contacted by client. Socket connectionSocket = welcomeSocket.accept();
     Three way handshake:
     - Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data
     - Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq. #
     - Step 3: client receives SYNACK, replies with ACK segment, which may contain data

  54. TCP Connection Management (cont.). Starting a connection: client opens a Socket, which starts the three-way handshake.
     - Step 1: client end system sends TCP SYN control segment to server (SYN, Seq# = X)
     - Step 2: server receives SYN, replies with SYN, ACK (Seq# = Y, Ack# = X+1)
     - Step 3: client replies with ACK (Seq# = X+1, Ack# = Y+1); connection opened
     NO DATA SENT UNTIL HANDSHAKE IS COMPLETED
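
The sequence and acknowledgement numbers in the three steps follow mechanically from the two initial sequence numbers X and Y. The sketch below only models those numbers, not real sockets; the function name `three_way_handshake` and the dictionary layout are assumptions, and the wrap-around modulo 2**32 reflects the 32-bit sequence number field.

```python
SEQ_MOD = 2 ** 32   # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Segments exchanged for client ISN x and server ISN y."""
    return [
        {"from": "client", "flags": {"SYN"}, "seq": x, "ack": None},
        {"from": "server", "flags": {"SYN", "ACK"},
         "seq": y, "ack": (x + 1) % SEQ_MOD},
        {"from": "client", "flags": {"ACK"},
         "seq": (x + 1) % SEQ_MOD, "ack": (y + 1) % SEQ_MOD},
    ]


for segment in three_way_handshake(x=41, y=78):
    print(segment["from"], sorted(segment["flags"]),
          "Seq# =", segment["seq"], "Ack# =", segment["ack"])
```

The SYN consumes one sequence number on each side, which is why both acknowledgement numbers are the peer's initial value plus one.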

  55. TCP Connection Management (cont.). Closing a connection: client closes socket: clientSocket.close();
     - Step 1: client end system sends TCP FIN control segment to server
     - Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
     (Figure: client sends FIN; server replies with ACK, then sends its own FIN; client replies with ACK and enters timed wait before closing.)

  56. TCP Connection Management (cont.).
     - Step 3: client receives FIN, replies with ACK. Enters "timed wait": will respond with ACK to received FINs
     - Step 4: server receives ACK. Connection closed.
     Note: with small modification, can handle simultaneous FINs.

  57. TCP Connection Management (cont.): TCP client lifecycle and TCP server lifecycle (figures)

  58. (Figure.) Source: TCP/IP Illustrated, Volume 1

  59. TCP SYN and Security.
     - Port scanning: send SYN packets to many ports; enumerate open ports; "fingerprint" listening applications to find vulnerabilities
     - SYN flood: DDoS attack (attack on availability); exhausts resources; can be prevented using "SYN cookies": Serv_Syn_Seq# = H(C_Seq, SrcPort, SrcIP, DstPort, DstIP, Salt)
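
The cookie formula can be illustrated with a keyed hash. This is a simplified sketch, not how any particular operating system computes SYN cookies: the choice of HMAC-SHA256 for H, the field encoding, and the function names are all assumptions, and real implementations also fold in details such as a time counter and the MSS.

```python
import hashlib
import hmac

SEQ_MOD = 2 ** 32


def syn_cookie(c_seq, src_port, src_ip, dst_port, dst_ip, salt):
    """Server initial seq # derived from the connection identifiers."""
    msg = f"{c_seq}|{src_port}|{src_ip}|{dst_port}|{dst_ip}".encode()
    digest = hmac.new(salt, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")   # truncate to 32 bits


def ack_matches_cookie(ack_num, c_seq, src_port, src_ip, dst_port, dst_ip, salt):
    """On the final ACK, recompute the cookie instead of looking up state."""
    cookie = syn_cookie(c_seq, src_port, src_ip, dst_port, dst_ip, salt)
    return ack_num == (cookie + 1) % SEQ_MOD


salt = b"server-secret"   # hypothetical secret known only to the server
cookie = syn_cookie(1000, 5555, "10.0.0.1", 80, "10.0.0.2", salt)
print(ack_matches_cookie((cookie + 1) % SEQ_MOD,
                         1000, 5555, "10.0.0.1", 80, "10.0.0.2", salt))
```

Because the server can recompute the value from the ACK's own fields, it does not need to allocate buffers when the SYN arrives, which is what blunts the flood.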

  61. Principles of Congestion Control. Congestion, informally: "too many sources sending too much data too fast for network to handle". Different from flow control! Manifestations: lost packets (buffer overflow at routers) and long delays (queueing in router buffers). A top-10 problem!

  62. Approaches towards congestion control. Two broad approaches:
     - end-end congestion control: no explicit feedback from network; congestion inferred from end-system observed loss, delay; approach taken by TCP
     - network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) or an explicit rate the sender should send at

  64. TCP congestion control. Goal: TCP sender should transmit as fast as possible, but without congesting the network. Q: how to find a rate just below congestion level? Decentralized: each TCP sender sets its own rate, based on implicit feedback:
     - ACK: segment received (a good thing!), network not congested, so increase sending rate
     - lost segment: assume loss due to congested network, so decrease sending rate

  65. TCP congestion control: bandwidth probing. "Probing for bandwidth": increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate. Continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in network). Plotted as sending rate against time, this gives TCP's "sawtooth" behavior: ACKs being received, so increase rate; loss (X), so decrease rate. Q: how fast to increase/decrease? Details to follow.

  66. TCP Congestion Control: details. Sender limits rate by limiting number of unACKed bytes "in pipeline": LastByteSent - LastByteAcked ≤ cwnd. cwnd differs from rwnd (how, why?); sender is limited by min(cwnd, rwnd). For simplicity we assume cwnd << rwnd. Roughly:
     $rate = \frac{cwnd}{RTT}$ bytes/sec
     cwnd is dynamic, a function of perceived network congestion.

  67. TCP Congestion Control: more details.
     Segment loss event: reducing cwnd.
     - timeout: no response from receiver; cut cwnd to 1 MSS
     - 3 duplicate ACKs: at least some segments getting through (recall fast retransmit); cut cwnd in half, less aggressively than on timeout
     ACK received: increase cwnd.
     - slowstart phase: increase exponentially fast (despite name) at connection start, or following timeout
     - congestion avoidance: increase linearly

  68. TCP Slow Start. When connection begins, cwnd = 1 MSS. Example: MSS = 500 bytes & RTT = 200 msec, so initial rate = 20 kbps. Available bandwidth may be >> MSS/RTT, so it is desirable to quickly ramp up to a respectable rate. Increase rate exponentially until first loss event or when threshold reached: double cwnd every RTT, done by incrementing cwnd by 1 MSS for every ACK received. (Figure: Host A sends one segment in the first RTT, then two segments, then four segments.)
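
Both the initial-rate figure and the doubling behaviour can be reproduced with a short sketch. The function names are hypothetical, and the model is idealised: every segment sent in a round is assumed to be ACKed within that round, with no loss, and the ssthresh check happens only between rounds.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """One MSS per RTT, expressed in bits per second."""
    return mss_bytes * 8 / rtt_s


def slow_start_cwnd_per_round(mss, ssthresh, rounds):
    """cwnd (bytes) at the start of each RTT while in slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        if cwnd >= ssthresh:
            break                      # would switch to congestion avoidance
        acks_this_round = cwnd // mss  # one ACK per segment sent this round
        for _ack in range(acks_this_round):
            cwnd += mss                # +1 MSS per ACK doubles cwnd per RTT
        history.append(cwnd)
    return history


print(initial_rate_bps(500, 0.200))                       # bits per second
print(slow_start_cwnd_per_round(500, 64 * 1024, rounds=3))
```

With MSS = 500 bytes the window goes 500, 1000, 2000, 4000 bytes, matching the one, two, four segments in the figure.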

  69. Transitioning into/out of slowstart. ssthresh: cwnd threshold maintained by TCP. On loss event: set ssthresh to cwnd/2, to remember (half of) TCP rate when congestion last occurred. When cwnd >= ssthresh: transition from slowstart to congestion avoidance phase, which increases the rate more cautiously.
     FSM fragment:
     - initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
     - slow start, new ACK: cwnd = cwnd+MSS; dupACKcount = 0; transmit new segment(s), as allowed
     - slow start, duplicate ACK: dupACKcount++
     - slow start, timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
     - slow start, cwnd > ssthresh: Λ; move to congestion avoidance
     - congestion avoidance, timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; return to slow start

  70. TCP: congestion avoidance. When cwnd > ssthresh, grow cwnd linearly: increase cwnd by 1 MSS per RTT, approaching possible congestion slower than in slowstart. Implementation: cwnd = cwnd + MSS·(MSS/cwnd) for each ACK received (cwnd in bytes).
     AIMD (Additive Increase Multiplicative Decrease):
     - ACKs: increase cwnd by 1 MSS per RTT: additive increase
     - loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease

  71. TCP congestion control FSM: overview. Fast Recovery introduced in 1997 (RFC 2001).
     - slow start -> congestion avoidance: cwnd > ssthresh
     - slow start or congestion avoidance -> fast recovery: loss: 3dupACK
     - any state -> slow start: loss: timeout
     - fast recovery -> congestion avoidance: new ACK

  72. TCP congestion control FSM: details.
     Initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.
     Slow start:
     - new ACK: cwnd = cwnd+MSS; dupACKcount = 0; transmit new segment(s), as allowed
     - duplicate ACK: dupACKcount++
     - cwnd > ssthresh: Λ; go to congestion avoidance
     - timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
     - dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
     Congestion avoidance:
     - new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
     - duplicate ACK: dupACKcount++
     - timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
     - dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
     Fast recovery:
     - duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
     - new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
     - timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
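
The three-state FSM can be sketched as a class with one method per event. This is an illustration, not a faithful TCP implementation: the class name `RenoSketch` is made up, cwnd is kept in bytes as a float, the return value of each loss handler merely signals "retransmit missing segment", and the slow start exit uses cwnd >= ssthresh as in the text of slide 69 (the diagram writes cwnd > ssthresh).

```python
class RenoSketch:
    """Slow start, congestion avoidance and fast recovery transitions."""

    def __init__(self, mss, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)           # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)
        self.dup_ack_count = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += self.mss
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += self.mss * (self.mss / self.cwnd)
        else:                            # fast recovery: deflate the window
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        """Returns True when the missing segment should be retransmitted."""
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return False
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            return True
        return False

    def on_timeout(self):
        """Any state: fall back to slow start and retransmit."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_ack_count = 0
        self.state = "slow_start"
        return True
```

Feeding the object a sequence of events and printing `state` and `cwnd` after each one reproduces the sawtooth pattern described earlier in the slides.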

  73. Popular "flavors" of TCP (figure): cwnd window size (in segments) against transmission round for TCP Tahoe and TCP Reno, with ssthresh marked and a triple duplicate ACKs event. TCP Reno incorporates Fast Recovery.

  74. Some more details on the slow start phase. Things have changed since 1989 or 1997:
     - 1989: RFC 1122, "Requirements for Internet Hosts -- Communication Layers"
     - 1997: RFC 2001, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms"
     - October 2002: RFC 3390, "Increasing TCP's Initial Window"
     From RFC 3390: The upper bound for the initial window is given more precisely in (1): min (4*MSS, max (2*MSS, 4380 bytes)) (1). Note: Sending a 1500 byte packet indicates a maximum segment size (MSS) of 1460 bytes (assuming no IP or TCP options). Therefore, limiting the initial window's MSS to 4380 bytes allows the sender to transmit three segments initially in the common case when using 1500 byte packets.
     Note: some applications cheat on the slow start! http://blog.benstrong.com/2010/11/google-and-microsoft-cheat-on-slow.html
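
Formula (1) quoted from RFC 3390 is easy to evaluate for a few MSS values. The function name below is chosen for illustration; only the formula itself comes from the quoted text.

```python
def initial_window_bytes(mss):
    """Upper bound on the initial window: min(4*MSS, max(2*MSS, 4380 bytes))."""
    return min(4 * mss, max(2 * mss, 4380))


for mss in (536, 1460, 4000):
    iw = initial_window_bytes(mss)
    print(f"MSS={mss}: initial window <= {iw} bytes ({iw // mss} segments)")
```

For MSS = 1460 the bound is 4380 bytes, i.e. the three segments mentioned in the note; smaller segments allow four, and large ones only two.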

  75. Other cases of "cheating". Optimizations in Google Chrome: the goal is to make pages load as fast as possible, reducing loading time to "instantaneous". http://www.igvita.com/posa/high-performance-networking-in-google-chrome/