
Analysis of Techniques to Improve Protocol Processing Latency
David Mosberger, Patrick Bridges, Larry L. Peterson, and Sean O’Malley
The University of Arizona
{davidm,bridges,llp}@cs.arizona.edu
e-mail: sean@netapp.com
www: http://www.cs.arizona.edu/scout
SIGCOMM ’96

Latency: Where does it come from?
• Speed of light
• Data touching overheads?
  – No: messages (data) are small.
• Execution overheads?
  – Too much code.
  – Badly structured code.

Test Environment
• Protocol stacks
  – TCP/IP
  – RPC
• Hardware platform
  – 175MHz Alpha
  – 100MB/s memory
  – TURBOchannel bus
  – 10Mbps Ethernet
[Figure: the two protocol stacks, top to bottom. TCP/IP: TCPTEST, TCP, IP, VNET, ETH, LANCE. RPC: XRPCTEST, MSELECT, VCHAN, CHAN, BID, BLAST, IP, VNET, ETH, LANCE.]

Starting Point
• Data cache footprint
  – padding
  – stack switching
  – info duplication
• Tiny functions
• Machine idiosyncrasies
  – byte load/store
  – integer division
[Figure: bar charts comparing Orig and Opt. Cycle count: Orig 18941, Opt 15688. Instruction count: Orig 5821, Opt 4750.]

How fast is TCP/IP?
[Figure: bar chart of instruction count (axis 0 to 1500) for BSD/386, DUX/Alpha, and xk/Alpha, each split into TCP input (tcp_input), IP input (ipintr), and other.]

Latency Bottlenecks
Suspects
• Frequent branching
• Instruction-cache gaps
• Cache collisions
• Layering overheads
Not a suspect: instruction/data translation buffers.

Techniques
• Outlining attacks:
  – frequent branching
  – i-cache gaps
• Cloning attacks:
  – cache collisions
• Path-inlining attacks:
  – layering overheads

Outlining
• Exception-handling code
  – lots of it (up to 50%)
  – dilutes instruction-cache
  – causes taken branches
• Remove from fast path
  – annotate if-statements with branch probability
  – move unlikely code to end of function

Outlining Example

Source:
    :
    if (bad_case @ 0) {
        panic("bad day");
    }
    printf("good day");
    :

Without outlining:
        :
        load r0, (bad_case)
        jump_if_0 r0, good_day
        load_addr a0, "bad day"
        call panic
    good_day:
        load_addr a0, "good day"
        call printf
        :

With outlining:
        :
        load r0, (bad_case)
        jump_if_not_0 r0, bad_day
        load_addr a0, "good day"
        call printf
    continue:
        :
        return
    bad_day:
        load_addr a0, "bad day"
        call panic
        jump continue
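The outlining example above can be approximated in present-day C. The `@ 0` annotation on the slide belongs to the authors' own tools; the sketch below instead assumes GCC or Clang, whose `__builtin_expect` builtin and `cold`/`noinline` attributes serve a similar purpose. The names `UNLIKELY`, `COLD`, `report_bad_header`, and `header_length`, and the 4-byte header layout, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang. With other compilers the hints become no-ops. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD __attribute__((cold, noinline))
#else
#define UNLIKELY(x) (x)
#define COLD
#endif

static int bad_headers; /* counts how often the slow path ran */

/* Rarely executed error handler, kept out of line so the compiler
 * is free to place it away from the fast path. */
static COLD int report_bad_header(void)
{
    bad_headers++;
    return -1;
}

/* Hypothetical 4-byte header: byte 0 is a version, byte 1 a length. */
static int header_length(const uint8_t *hdr, size_t n)
{
    assert(hdr != NULL);
    if (UNLIKELY(n < 4 || hdr[0] != 1))  /* the "bad case" */
        return report_bad_header();
    return hdr[1];                        /* fast path falls through */
}
```

Whether the compiler actually moves the cold code to the end of the function, as on the slide, depends on the compiler and optimization level; the hints only make that layout possible.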

Cloning
• Make copy of functions on fast path
  – relocate to avoid conflict misses
  – specialize for a particular use (partial evaluation)
• Alternative layout algorithms
  – micro-positioning
  – bipartite layout
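To see why relocating a clone can avoid conflict misses, consider how a direct-mapped cache picks a line for an address. This is a minimal sketch, assuming an 8 KB direct-mapped instruction cache with 32-byte lines; the geometry is illustrative and not taken from the slides, and the function names and addresses are hypothetical.

```c
#include <stdint.h>

/* Assumed cache geometry, for illustration only. */
enum { LINE_SIZE = 32, CACHE_SIZE = 8192, NUM_LINES = CACHE_SIZE / LINE_SIZE };

/* Index of the cache line that an address maps to in a direct-mapped cache. */
static unsigned cache_line(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_LINES);
}

/* Two addresses conflict when they lie in different memory blocks
 * but compete for the same cache line. */
static int conflicts(uintptr_t a, uintptr_t b)
{
    return a / LINE_SIZE != b / LINE_SIZE && cache_line(a) == cache_line(b);
}
```

Under these assumptions, code at 0x10000 and 0x12000 (8 KB apart) evicts each other, while a clone placed at 0x12020 maps to the next line and no longer collides.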

Outlining & Cloning Summary
[Figure: memory layout of function A and function B in three panels: Standard Layout, After Outlining, After Cloning. Legend: frequently executed instructions, infrequently executed instructions. After Cloning, the frequently executed code is copied and relocated as clone A and clone B.]

Path-Inlining
Collapse deeply-nested functions
• Assume fast path is known
• Compile entire function as single unit
Advantages
• Removes call-overheads
• Increases context for optimizer
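A minimal sketch of the idea in C, assuming a hypothetical three-layer stack in which each layer strips its header and passes the remaining length upward through a function pointer. The header sizes are the usual minimum Ethernet, IPv4, and TCP header lengths; all names are hypothetical, and this is not the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { ETH_HDR = 14, IP_HDR = 20, TCP_HDR = 20 };

/* Layered version: every layer is reached through a function pointer,
 * so the compiler sees each layer in isolation. */
typedef size_t (*demux_fn)(size_t len);

static size_t tcp_demux(size_t len) { return len - TCP_HDR; }
static demux_fn ip_up = tcp_demux;
static size_t ip_demux(size_t len) { return ip_up(len - IP_HDR); }
static demux_fn eth_up = ip_demux;
static size_t eth_demux(size_t len) { return eth_up(len - ETH_HDR); }

/* Path-inlined version: the fast path ETH -> IP -> TCP is assumed known,
 * so the three layers are collapsed into a single function. */
static size_t eth_ip_tcp_demux(size_t len)
{
    assert(len >= (size_t)(ETH_HDR + IP_HDR + TCP_HDR));
    return len - (ETH_HDR + IP_HDR + TCP_HDR);
}
```

Both versions return the payload length, but the collapsed function has no per-layer calls and lets the compiler fold the three subtractions into one constant.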

End-to-End Latency
Roundtrip time in µs:
[Figure: bar charts for TCP and RPC, each with BAD, STD, and OPT configurations (axis 0 to 500). Bar labels as extracted: 498.8, 457.1, 399.2, 365.5, 351, 310.8.]

Processing Latency
Processing-time per roundtrip in µs:
[Figure: bar charts for TCP and RPC, each with BAD, STD, and OPT configurations (axis 0 to 300). Bar labels as extracted: 288.8, 247.1, 189.2, 155.5, 141, 100.8.]

Memory System Performance
[Figure: horizontal bar charts of mCPI and iCPI (axis 0 to 7) for TCP and RPC, with rows BAD, DUX, STD, and OPT. Bar labels as extracted: BAD 1.61, 4.58, 1.69, 4.66; DUX 2, 2.3; STD 1.72, 1.58, 1.78, 1.69; OPT 1.57, 1.17, 1.67, 0.81.]

Outlining Effectiveness
• TCP
[Figure: pie charts of used vs. unused. No Outlining: 79% used, 21% unused. With Outlining: 85% used, 15% unused.]
• RPC
  – Essentially identical performance.

Conclusions
• Instruction cache bandwidth major bottleneck
• Cache collisions not particularly bad
• Processor/Memory gap still growing; now:
  – 300MHz processor
  – 100Mbps Ethernet
  – 80MB/s memory system
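The processor/memory gap can be made concrete as the number of CPU cycles that elapse per byte of memory traffic. The sketch below uses only figures from the slides (175MHz and 100MB/s for the test platform, 300MHz and 80MB/s for the newer system) and assumes 1 MB = 10^6 bytes; the function name is hypothetical.

```c
/* CPU cycles that elapse while the memory system delivers one byte.
 * Assumes 1 MB = 10^6 bytes, so MHz divided by MB/s is cycles per byte. */
static double cycles_per_byte(double cpu_mhz, double mem_mb_per_s)
{
    return cpu_mhz / mem_mb_per_s;
}
```

With these inputs the ratio rises from 1.75 cycles per byte on the test platform to 3.75 on the newer system.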

Conclusions
• Outlining
  – Readily applicable
  – Relatively convenient
• Cloning and path-inlining
  – Require “path” notion: see Scout OS
  – Need better (automatic) tools

Dynamics
[Figure: the TCP stack (TCPTEST, TCP, IP, VNET, ETH, LANCE) and the RPC stack (XRPCTEST, MSELECT, VCHAN, CHAN, BID, BLAST, IP, VNET, ETH, LANCE), annotated with xCall(), semWait(), semSignal(), processFrame(), and the points a, b, c.]
