End-to-End Protocols
EE4272 Spring 2004

• Limitations of Underlying Best-effort Network (e.g., Internet)
    - drops messages
    - re-orders messages
    - delivers duplicate copies of a given message
    - limits messages to some finite size
    - delivers messages after an arbitrarily long delay
• Common Properties of the End-to-End Protocols
    - guarantee message delivery
    - deliver messages in the same order they are sent
    - deliver at most one copy of each message
    - support arbitrarily large messages
    - support synchronization
    - allow the receiver to flow control the sender
    - support multiple application processes on each host

End-to-End Protocols (Outline)

• UDP (User Datagram Protocol)
• TCP (Transmission Control Protocol)
    - Connection Establishment/Termination
    - Sliding Window Revisited
    - Flow Control
    - Adaptive Retransmission

Simple Process Demultiplexer (UDP)

• Unreliable and unordered datagram service
• Adds multiplexing
• No flow control
• Endpoints (target processes) identified by ports
    - servers have well-known ports
    - see /etc/services on Unix
    - a port is unique only on a single host
    - implementation differs across operating systems
• Header format
    - Optional checksum (IPv4)
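The port-based demultiplexing described above can be sketched in a few lines. This is only an illustration: the 8-byte header layout (SrcPort, DstPort, Length, Checksum, each 16 bits) is the standard UDP header, while `PortDemux`, `parse_udp`, and the per-port queues are hypothetical names, and the optional checksum is read but not verified.

```python
import struct
from collections import defaultdict, deque

# Standard UDP header: SrcPort, DstPort, Length, Checksum (16 bits each, network order).
UDP_HEADER = struct.Struct("!HHHH")


def parse_udp(datagram: bytes):
    """Split a raw UDP datagram into header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    # Length covers header plus data, so it bounds the payload slice.
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, checksum, payload


class PortDemux:
    """Hypothetical demultiplexer: one message queue per destination port."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def deliver(self, datagram: bytes) -> int:
        src_port, dst_port, _checksum, payload = parse_udp(datagram)
        # No reliability, ordering, or flow control: just queue by port.
        self.queues[dst_port].append((src_port, payload))
        return dst_port


demux = PortDemux()
datagram = UDP_HEADER.pack(5000, 53, UDP_HEADER.size + 5, 0) + b"hello"
port = demux.deliver(datagram)
```

A process bound to port 53 would then read from `demux.queues[53]`; a real stack would also check the checksum when it is non-zero.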
End-to-End Issues: Data Link vs. Transport

• Core of TCP: sliding window algorithm
• Potentially connects many different hosts
    - need explicit connection establishment and termination
• Potentially different RTT
    - need adaptive timeout mechanism
• Potentially long delay in network
    - need to be prepared for arrival of very old packets (MSL: 120 s)
• Potentially different capacity at destination
    - need to accommodate different amounts of buffering
• Potentially different network capacity
    - need to be prepared for network congestion (Chapter 6)
• X.25 uses sliding window on a hop-by-hop basis

TCP Overview

• Reliable, connection-oriented
• Byte-stream
    - app writes bytes
    - TCP sends segments
    - app reads bytes
• TCP frees the application from the worry of missing or reordered data
• Full duplex
• Flow control: keep sender from overrunning receiver; an end-to-end issue ("how much")
• Congestion control: keep sender from overrunning network; concerned with how hosts and networks interact ("how fast")

Segment Format

• Although TCP provides a "byte stream" service to application processes, TCP itself does not transmit individual bytes over the Internet; TCP peers exchange packets (segments)
• Trigger mechanisms for sending a segment:
    - the sender has collected MSS bytes (the largest segment that does not cause local IP to fragment)
    - the TCP sender receives a "push" operation (e.g., telnet)

Segment Format (cont)

• Each connection identified (TCP demux key) with 4-tuple:
    - (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
    - Acknowledgment, SequenceNum, AdvertisedWindow
• Flags
    - SYN, FIN, RESET, PUSH, URG, ACK
• Checksum
    - pseudo header + TCP header + data
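To make the segment fields concrete, here is a minimal sketch that unpacks the fixed 20-byte TCP header and builds the 4-tuple demux key. The field layout and flag bit positions follow the standard TCP header; the names `Segment`, `parse_tcp_header`, and `demux_key` are hypothetical, options are ignored, and the checksum is read but not verified.

```python
import struct
from typing import NamedTuple

# Fixed TCP header: SrcPort, DstPort, SequenceNum, Acknowledgment,
# (HdrLen + flags), AdvertisedWindow, Checksum, UrgPtr.
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")

# Flag bits in the low 6 bits of the fifth field, named as on the slide.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RESET": 0x04,
             "PUSH": 0x08, "ACK": 0x10, "URG": 0x20}


class Segment(NamedTuple):
    src_port: int
    dst_port: int
    sequence_num: int
    acknowledgment: int
    header_len: int          # in bytes
    flags: frozenset
    advertised_window: int
    checksum: int


def parse_tcp_header(segment: bytes) -> Segment:
    """Decode the fixed part of a TCP header (options are not parsed)."""
    src, dst, seq, ack, off_flags, window, checksum, _urg = \
        TCP_FIXED_HEADER.unpack_from(segment)
    header_len = (off_flags >> 12) * 4   # HdrLen is counted in 32-bit words
    flags = frozenset(name for name, bit in FLAG_BITS.items() if off_flags & bit)
    return Segment(src, dst, seq, ack, header_len, flags, window, checksum)


def demux_key(src_ip: str, dst_ip: str, seg: Segment):
    """4-tuple identifying the connection; IP addresses come from the IP header."""
    return (seg.src_port, src_ip, seg.dst_port, dst_ip)


raw = TCP_FIXED_HEADER.pack(1234, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
syn = parse_tcp_header(raw)
key = demux_key("10.0.0.1", "10.0.0.2", syn)
```

The addresses in the usage lines are made-up example values; in a real stack they are taken from the enclosing IP datagram, which is also where the pseudo header for the checksum comes from.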
Connection Establishment and Termination

• Three-way handshake algorithm (note: a timer is scheduled for each of the first 2 segments)
    1. client: active open, send SYN
    2. server: receive SYN, send SYN, ACK
    3. client: receive SYN, ACK, send ACK
• Q.: why exchange starting Seq. # instead of starting from a default number such as 0? (two incarnations)
• Note: TCP connection setup is asymmetric, while connection teardown is symmetric; once a connection is set up, it is a bidirectional connection

[Figure: TCP state-transition diagram. States: CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Heavy line: normal path for a client; dashed line: normal path for a server; light line: unusual events. Passive close: CLOSE_WAIT, LAST_ACK. Active close: FIN_WAIT_1, FIN_WAIT_2 or CLOSING, then TIME_WAIT (2MSL period).]

Sliding Window

• Basic idea:
    - Allow sender to transmit multiple frames before receiving an ACK, thereby keeping the pipe full. There is an upper limit (called the window) on the number of outstanding (un-ACKed) frames allowed.
    - Size of window sets the amount of data that can be sent w/o waiting for an ACK from the receiver

[Figure: timeline of frames 1, 2, 3, ..., N flowing from Sender to Receiver over Time.]

Sliding Window: Sender

• Assign sequence number to each frame (SeqNum)
• Maintain three state variables:
    - send window size (SWS): upper bound on the # of outstanding (un-ACKed) frames
    - sequence # of last acknowledgment received (LAR)
    - sequence # of last frame sent (LFS)
• Maintain invariant: LFS - LAR <= SWS at all times
• Advance/update LAR when an ACK arrives, to allow a new frame to be sent
• Buffer up to SWS frames for retransmission if needed

[Figure: sender window spanning LAR to LFS, width <= SWS.]
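The sender-side bookkeeping can be sketched as follows. This is a minimal illustration under stated assumptions, not a full protocol: sequence numbers grow without wrapping, ACKs are cumulative, timers and retransmission are left out, and the class name `SlidingWindowSender` and its methods are hypothetical.

```python
class SlidingWindowSender:
    """Tracks SWS, LAR, and LFS and enforces LFS - LAR <= SWS."""

    def __init__(self, sws: int):
        self.sws = sws      # send window size
        self.lar = 0        # sequence # of last acknowledgment received
        self.lfs = 0        # sequence # of last frame sent
        self.unacked = {}   # buffered frames, kept for possible retransmission

    def can_send(self) -> bool:
        # Sending one more frame must not break LFS - LAR <= SWS.
        return self.lfs - self.lar < self.sws

    def send(self, payload):
        """Assign the next SeqNum and buffer the frame; None if the window is full."""
        if not self.can_send():
            return None
        self.lfs += 1
        self.unacked[self.lfs] = payload
        return self.lfs

    def ack(self, seq_num: int) -> None:
        """Cumulative ACK (an assumption): everything up to seq_num is acknowledged."""
        if self.lar < seq_num <= self.lfs:
            for seq in range(self.lar + 1, seq_num + 1):
                self.unacked.pop(seq, None)
            self.lar = seq_num   # advancing LAR opens the window


sender = SlidingWindowSender(sws=3)
sent = [sender.send(f"frame-{i}") for i in range(4)]   # fourth attempt is blocked
sender.ack(2)
next_seq = sender.send("frame-3")
```

With SWS = 3, the fourth `send` returns `None` until the ACK for frame 2 advances LAR; at most SWS frames are ever held in `unacked`.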
Sliding Window: Receiver

• Maintain three state variables
    - receive window size (RWS): upper bound on the # of out-of-order frames (Why?)
    - sequence # of largest acceptable frame (LAF)
    - sequence # of last frame received (LFR)
• Maintain invariant: LAF - LFR <= RWS
• Frame SeqNum arrives:
    - if LFR < SeqNum <= LAF: accept
    - if SeqNum <= LFR or SeqNum > LAF: discard
• Mechanism for sending cumulative ACKs
    - LFR = SeqNumToAck (the largest seq # not yet acknowledged such that all frames with seq # <= SeqNumToAck have been received)
    - LAF = LFR + RWS

[Figure: receiver window spanning LFR to LAF, width <= RWS.]

TCP: Sliding Window Revisited

• ARQ: Stop-and-wait -> Sliding Window (delay x bandwidth)
• Purposes: 1) reliable delivery; 2) in-order delivery; 3) flow control
• Pointers (sequence numbers)
    - Sending side
        LastByteAcked <= LastByteSent
        LastByteSent <= LastByteWritten
        buffer bytes between LastByteAcked and LastByteWritten
    - Receiving side
        LastByteRead < NextByteExpected
        NextByteExpected <= LastByteRcvd + 1 (?!)
        buffer bytes between LastByteRead and LastByteRcvd

[Figure: sending application and receiving application above their TCP buffers. Sender pointers: LastByteAcked, LastByteSent, LastByteWritten. Receiver pointers: LastByteRead, NextByteExpected, LastByteRcvd.]

Protection Against Wrap Around

• TCP: 16 bits for AdvertisedWindow, 32 bits for SeqNum
• Sequence # space must be at least twice the window size
• SeqNum must not wrap around within MSL (120 seconds)
• 32-bit SequenceNum:

    Bandwidth             Time Until Wrap Around
    T1 (1.5 Mbps)         6.4 hours
    Ethernet (10 Mbps)    57 minutes
    T3 (45 Mbps)          13 minutes
    FDDI (100 Mbps)       6 minutes
    STS-3 (155 Mbps)      4 minutes
    STS-12 (622 Mbps)     55 seconds
    STS-24 (1.2 Gbps)     28 seconds

Flow Control

• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
    - LastByteRcvd - LastByteRead <= MaxRcvBuffer
    - AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
• Sending side
    - LastByteSent - LastByteAcked <= AdvertisedWindow
    - EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
    - LastByteWritten - LastByteAcked <= MaxSendBuffer
    - block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application is trying to write
• Always send ACK (SWS + NextByteExpected) in response to an arriving data segment
• Persist when AdvertisedWindow = 0 (1-byte probe segment)

[Figure: the same sender and receiver buffer diagram as in "TCP: Sliding Window Revisited".]
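The window formulas and the wrap-around table can be checked with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, the wrap-around time assumes the sender transmits continuously at the full link rate with one sequence number per byte, and clamping the effective window at zero is an added assumption, not part of the slide formula.

```python
def advertised_window(max_rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: free buffer space left to advertise."""
    return max_rcv_buffer - (last_byte_rcvd - last_byte_read)


def effective_window(adv_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Sender side: how much more data may be sent right now."""
    window = adv_window - (last_byte_sent - last_byte_acked)
    return max(0, window)   # clamp is an assumption for illustration


def seconds_until_wraparound(bandwidth_bps: float, seq_bits: int = 32) -> float:
    """Time to consume the whole sequence space at the given bit rate."""
    sequence_space_bytes = 2 ** seq_bits
    return sequence_space_bytes * 8 / bandwidth_bps


adv = advertised_window(max_rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
eff = effective_window(adv, last_byte_sent=5000, last_byte_acked=4000)

t1_hours = seconds_until_wraparound(1.5e6) / 3600
ethernet_minutes = seconds_until_wraparound(10e6) / 60
sts12_seconds = seconds_until_wraparound(622e6)
```

The last three values reproduce the T1, Ethernet, and STS-12 rows of the table (about 6.4 hours, 57 minutes, and 55 seconds), which shows that at STS-12 speeds and above the 32-bit sequence space can wrap within the 120-second MSL.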
Keeping the Pipe Full

• Delay x Bandwidth product
• 16-bit AdvertisedWindow (assume RTT of 100 ms)

    Bandwidth             Delay x Bandwidth Product
    T1 (1.5 Mbps)         18 KB
    Ethernet (10 Mbps)    122 KB
    T3 (45 Mbps)          549 KB
    FDDI (100 Mbps)       1.2 MB
    STS-3 (155 Mbps)      1.8 MB
    STS-12 (622 Mbps)     7.4 MB
    STS-24 (1.2 Gbps)     14.8 MB

• TCP Dynamics
• TCP Extension
• Advertised window
• Adaptive Timeout

Adaptive Retransmission (Original Algorithm)

• In the Internet, the "range" and "variation" of RTT can be large
• TCP retransmits if an ACK is not received within … time (as a function of RTT)
• Measure SampleRTT for each segment/ACK pair
• Compute weighted average of RTT
    - EstRTT = α x EstRTT + β x SampleRTT
    - where α + β = 1
    - α between 0.8 and 0.9
    - β between 0.1 and 0.2
• Set timeout based on EstRTT
    - TimeOut = 2 x EstRTT

Karn/Partridge Algorithm

• Problem: an ACK indicates the receipt (not the transmission) of data
• Do not sample RTT when retransmitting
• Double timeout after each retransmission (exponential backoff): intuition?

Jacobson/Karels Algorithm

• Karn/Partridge algorithm does not take the variance into account
• New calculations for average RTT
    - Diff = SampleRTT - EstRTT
    - EstRTT = EstRTT + (δ x Diff)
    - Dev = Dev + δ x (|Diff| - Dev)
    - where δ is a factor between 0 and 1
• Consider variance when setting timeout value
    - TimeOut = µ x EstRTT + φ x Dev
    - where µ = 1 and φ = 4
• Notes
    - algorithm only as good as granularity of clock (500 ms on Unix)
    - accurate timeout mechanism important to congestion control (later)
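The two timeout refinements combine naturally in one small estimator. The sketch below applies the Jacobson/Karels update (Diff, EstRTT, Dev, TimeOut with µ = 1 and φ = 4) and the Karn/Partridge rules (skip samples from retransmitted segments, double the timeout on each retransmission). Several details are assumptions for illustration and are not given on the slides: δ = 0.125, seeding EstRTT with the first sample and Dev with 0, resetting the backoff once a clean sample arrives, and the class and method names themselves.

```python
class RttEstimator:
    """Hypothetical adaptive-timeout helper; times are in milliseconds."""

    def __init__(self, first_sample: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0):
        self.est_rtt = first_sample   # assumption: seed with the first SampleRTT
        self.dev = 0.0                # assumption: start with zero deviation
        self.delta = delta            # assumption: any value in (0, 1) fits the slide
        self.mu = mu
        self.phi = phi
        self.backoff = 1              # multiplier for exponential backoff

    def on_ack(self, sample_rtt: float, retransmitted: bool = False) -> None:
        if retransmitted:
            # Karn/Partridge: the ACK is ambiguous, so take no RTT sample.
            return
        diff = sample_rtt - self.est_rtt
        self.est_rtt += self.delta * diff
        self.dev += self.delta * (abs(diff) - self.dev)
        self.backoff = 1              # assumption: a clean sample ends the backoff

    def on_timeout(self) -> None:
        # Karn/Partridge: double the timeout after each retransmission.
        self.backoff *= 2

    @property
    def timeout(self) -> float:
        return (self.mu * self.est_rtt + self.phi * self.dev) * self.backoff


est = RttEstimator(first_sample=100.0)
est.on_ack(200.0)                        # Diff = 100, EstRTT = 112.5, Dev = 12.5
timeout_after_sample = est.timeout       # 112.5 + 4 x 12.5
est.on_timeout()
timeout_after_backoff = est.timeout
est.on_ack(999.0, retransmitted=True)    # ignored by Karn/Partridge
```

Because Dev grows when samples swing widely, the timeout stretches under variable delay and tightens toward EstRTT when the path is stable, which the fixed factor of 2 in the original algorithm cannot do.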