Advanced Network Performance Monitoring and Troubleshooting


  1. Advanced Network Performance Monitoring and Troubleshooting. Richard Carlson, March 5, 2009, rcarlson@internet2.edu

  2. Basic Premise • Application performance should meet your expectations! • If it doesn’t, you should complain! • But you need to complain effectively!

  3. Why is it hard to Find/Fix Problems? • Network infrastructure is complex • Network infrastructure is shared • Network infrastructure consists of multiple components

  4. Example 1: SCP file transfer. Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every half hour. Bob and Carol are 2,000 miles apart. How long should each transfer take? • 5 minutes? • 1 minute? • 5 seconds?

  5. What should we expect? Assumptions: • 100 Mbps Fast Ethernet is the slowest link • 50 msec round trip time Bob & Carol calculate: • 50 MB * 8 = 400 Mbits • 400 Mb / 100 Mb/sec = 4 seconds
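The estimate on this slide can be reproduced with a minimal sketch. The function name `ideal_transfer_seconds` is illustrative and not part of the presentation; the calculation ignores protocol overhead and TCP slow start, as the slide does.

```python
def ideal_transfer_seconds(size_megabytes: float, link_mbps: float) -> float:
    """Best-case transfer time: payload bits divided by the bottleneck link rate."""
    size_megabits = size_megabytes * 8  # 8 bits per byte
    return size_megabits / link_mbps


# Slide values: 50 MB file, 100 Mbps Fast Ethernet as the slowest link.
print(ideal_transfer_seconds(50, 100))  # 4.0
```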

  6. Initial SCP Test Results

  7. Initial Test Results: This is unacceptable! First, look for a network infrastructure problem. • Use the NDT tester to examine both hosts

  8. Initial NDT testing shows Duplex Mismatch at one end

  9. NDT Found Duplex Mismatch: Investigation finds that the switch port is configured for 100 Mbps Full-Duplex operation. • Network administrator corrects the configuration and asks for a re-test

  10. Duplex Mismatch Corrected

  11. SCP results after Duplex Mismatch Corrected

  12. Intermediate Results: Time dropped from 18 minutes to 40 seconds. But our calculations said it should take 4 seconds! • 400 Mb / 40 sec = 10 Mbps • Why are we limited to 10 Mbps? • Are you satisfied with 1/10th of the possible performance?
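As a rough check of these figures, the sketch below converts the two measured transfer times into achieved throughput. The helper name `achieved_mbps` is an assumption made for illustration; the 50 MB size, the 18-minute and 40-second times, and the 100 Mbps link speed come from the slides.

```python
def achieved_mbps(size_megabytes: float, seconds: float) -> float:
    """Average throughput in Mbps for a transfer of the given size and duration."""
    return size_megabytes * 8 / seconds


before_fix = achieved_mbps(50, 18 * 60)  # with the duplex mismatch
after_fix = achieved_mbps(50, 40)        # mismatch corrected

# Fraction of the 100 Mbps bottleneck link actually used after the fix.
print(f"{before_fix:.2f} Mbps, {after_fix:.1f} Mbps, {after_fix / 100:.0%} of link")
# 0.37 Mbps, 10.0 Mbps, 10% of link
```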

  13. Default TCP window settings

  14. Calculating the Window Size: Remember, Bob found the round-trip time was 50 msec. Calculate the throughput limit imposed by the default window: • 85.3 KB * 8 b/B = 698,777 b • 698,777 b / 0.050 s = 13.98 Mbps. Calculate the new window size: • (100 Mb/s * 0.050 s) / 8 b/B = 610.3 KB • Use 1 MB as a minimum
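Both calculations follow from the relation throughput = window / RTT. The sketch below is one way to express them; the function names are hypothetical, and 1 KB is taken as 1024 bytes, which is what the slide's numbers imply.

```python
def window_limited_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Maximum throughput when at most one window of data is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def required_window_bytes(link_mbps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_mbps * 1e6 * rtt_seconds / 8


rtt = 0.050                  # 50 msec round-trip time
default_window = 85.3 * 1024  # 85.3 KB default window, in bytes

print(f"{window_limited_mbps(default_window, rtt):.2f} Mbps")    # 13.98 Mbps
print(f"{required_window_bytes(100, rtt) / 1024:.2f} KB")        # 610.35 KB
```

The second result is why the slide rounds up to 1 MB: any window smaller than the bandwidth-delay product keeps the 100 Mbps link from being filled.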

  15. Resetting Window Value

  16. With TCP windows tuned

  17. Steps so far: • Found and fixed Duplex Mismatch (network infrastructure problem) • Found and fixed TCP window values (host configuration problem) • Are we done yet?

  18. SCP results with tuned windows

  19. Intermediate Results: SCP still runs slower than expected. • Hint: SCP uses internal buffers • Patch available from PSC

  20. SCP Results with tuned SCP

  21. Final Results • Fixed infrastructure problem • Fixed host configuration problem • Fixed application configuration problem • Achieved target time of 4 seconds to transfer a 50 MB file over 2,000 miles

  22. Example 2: PNNL Throughput Problem. 950+ Mbps from remote sites to PNNL. Measured speeds show a problem when PNNL sends: 966 Mbps, 930 Mbps, 328 Mbps.

  23. PNNL Throughput Problem: 950+ Mbps from remote sites to PNNL. When PNNL sends: 966 Mbps at 6 msec, 930 Mbps at 23 msec, 328 Mbps at 76 msec. Interesting: RTT increases by roughly a factor of 3 and speed decreases by about the same factor.
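The "factor of 3" observation can be checked against the slide's own numbers. This is a minimal sketch; it assumes the comparison is between the 23 msec and 76 msec paths, which the slide does not state explicitly.

```python
# (round-trip time in msec, throughput in Mbps) as listed on the slide
mid_path = (23, 930)
long_path = (76, 328)

rtt_ratio = long_path[0] / mid_path[0]    # how much the RTT grows
speed_ratio = mid_path[1] / long_path[1]  # how much the throughput shrinks

print(f"RTT x{rtt_ratio:.2f}, speed /{speed_ratio:.2f}")
# RTT x3.30, speed /2.84
```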

  24. PNNL Throughput Problem: 950+ Mbps from remote sites to PNNL. When PNNL sends: 966 Mbps, 6 msec, 0.0094% loss, 6.04% ooo; 930 Mbps, 23 msec, 0.0045% loss, 5.5% ooo; 328 Mbps, 76 msec, 0.0049% loss, 5.15% ooo. Finally: look at the loss rate and the packet reordering (ooo) rate. The problem exists in the Seattle-PNNL metro net.

  25. Advanced user tools • Existing NDT tool: allows users to test a network path for a limited number of common problems • Existing NPAD tool: allows users to test local network infrastructure while simulating a long path

  26. Network Diagnostic Tool (NDT) • Measure performance to the user’s desktop • Identify real problems for real users: is the network infrastructure the problem, or are host tuning issues the problem? • Make the tool simple to use and understand • Make the tool useful for users and network administrators

  27. NDT sample Results

  28. Finding a Server • What? You don’t have one running at your site? • Install the Internet2 Network Performance Toolkit Knoppix Disk

  29. NPAD/pathdiag • A new tool from researchers at the Pittsburgh Supercomputing Center • Finds problems that affect long network paths • Uses a Web100-enhanced Linux-based server • Web-based Java client

  30. Long Path Problem [Figure: network diagram with hosts H1, H2, H3, Switches 1-4, and routers R1-R9, with one point marked X; RTT is 1 msec for H1 to H2 and 70 msec for H1 to H3]

  31. NPAD Server main page

  32. NPAD Sample results

  33. Finding a Server • What? You don’t have one running at your site? • Install the Internet2 Network Performance Toolkit Knoppix Disk

  34. Sample BWCTL results

  35. OWping Results

  36. NPToolkit Knoppix Disk

  37. Conclusions • OSG VDT will contain client tools • Network operators (campus, regional, national) are standing up servers • OSG site admins need to stand up a server ‘near’ their cluster
