DNS Performance and the Effectiveness of Caching



  1. DNS Performance and the Effectiveness of Caching Jaeyeon Jung, Emil Sit, Hari Balakrishnan, Robert Morris Presenter: Gigis Petros

  2. Introduction • Two factors contribute to the scalability of DNS • Hierarchical design around delegated name spaces • Aggressive use of caching • Caching reduces the load on the root servers • Successful caching also limits client-perceived delays and wide-area bandwidth usage

  3. Motivation • What performance, in terms of latency and failures, do DNS clients perceive? • How does varying the TTL and degree of cache sharing impact caching effectiveness?

  4. DNS overview • Mapping human-readable host names to IP addresses • Reverse mapping & mail-routing information • Mappings in the DNS name space are called resource records • A record: a name's IP address • NS record: name of a DNS server • Caching in DNS • Time To Live: expiration time set by the originator of a name • Negative caching • Caches work well because DNS changes slowly

  5. Terminology • A lookup refers to the entire process of translating a domain name • A query refers to a DNS request packet sent to a DNS server • A response refers to a packet sent by a DNS server in reply to a query packet • An answer is a response from a DNS server that terminates the lookup (successfully or unsuccessfully)

  6. DNS Lookup Sequence

  7. Questions • What is the ratio of TCP connections to DNS A record lookups? • What is the number of DNS queries per lookup? • DNS errors • What percentage of lookups never get an answer? • Performance of the retransmission protocol • What is the effect of varying TTLs and degrees of cache sharing on cache hit rate?

  8. Key Findings • The ratio of TCP connections to DNS lookups suggests that the hit rate of DNS caches inside MIT is between 70% and 80% • DNS queries per lookup • Unanswered lookups produce an average of 10 query packets • Answered lookups produce an average of 1.3 query packets • 23% of all client lookups in the most recent MIT trace fail to elicit any answer • 13% of lookups result in an answer that indicates a failure; most of these are NXDOMAIN • The percentage of TCP connections made to names with low TTL values increased from 12% to 25% in 2000 • Setting all A-record TTLs to a value as small as 10 minutes is not likely to degrade the scalability of DNS
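The first finding rests on simple arithmetic, which can be sketched as follows. This is a minimal illustration, assuming that without caching every TCP connection would need roughly one wide-area A lookup; the function name and the counts are hypothetical and not taken from the traces.

```python
def estimated_hit_rate(tcp_connections: int, a_lookups: int) -> float:
    """Estimate the DNS cache hit rate from two trace counts.

    Assumption (not from the slides): each TCP connection would trigger
    one A lookup if nothing were cached, so the lookups actually seen on
    the wide-area link are the cache misses.
    """
    if tcp_connections <= 0:
        raise ValueError("tcp_connections must be positive")
    return 1.0 - a_lookups / tcp_connections


# Hypothetical counts: 4 connections per observed lookup.
print(estimated_hit_rate(1000, 250))
```

Under this assumption, a hit rate between 70% and 80% corresponds to roughly 3.3 to 5 TCP connections per observed A lookup.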

  9. MIT Dataset • MIT's Laboratory for Computer Science (LCS) and Artificial Intelligence Laboratory (AI) to the rest of the Internet • 24 internal subnetworks sharing the border router • Data collected in January and December 2000 • 500 users, 1200 hosts

  10. KAIST Dataset • Korea Advanced Institute of Science and Technology (KAIST) to the rest of the Internet • Collected May 2001 • 1000 users, 5000 hosts • Only international TCP traffic

  11. Methodology • Collection methodology • Wide-area DNS queries/responses • Outgoing TCP connections: SYN/FIN/RST • Anonymized internal addresses • Analysis methodology • Sliding window of 60 seconds • A referral occurs when a server does not know the answer to a query, but does know where the answer can be found

  12. Data Summary • Categorize lookups 1. Negative answer: the lookup gets a response with a non-zero response code 2. Zero answer: the response is authoritative and indicates no error, but has no ANSWER, AUTHORITY or ADDITIONAL records 3. Answered with success: the lookup terminates with a response that has a NOERROR code and one or more ANSWER records • All other lookups are considered unanswered
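The categories above can be expressed as a small classifier. This is a sketch under assumptions: the `Response` type, its field names, and the idea of classifying on the last response seen for a lookup are illustrative choices, not the authors' analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical summary of the last DNS response seen for a lookup."""
    rcode: int            # 0 means NOERROR
    authoritative: bool   # AA flag
    answer: int = 0       # number of records in the ANSWER section
    authority: int = 0    # number of records in the AUTHORITY section
    additional: int = 0   # number of records in the ADDITIONAL section


def classify_lookup(final: Optional[Response]) -> str:
    """Map a lookup to one of the four categories from the slide."""
    if final is None:
        return "unanswered"              # no response at all
    if final.rcode != 0:
        return "negative answer"         # e.g. NXDOMAIN or SERVFAIL
    if final.answer >= 1:
        return "answered with success"   # NOERROR with ANSWER records
    if final.authoritative and final.authority == 0 and final.additional == 0:
        return "zero answer"             # authoritative, no error, empty
    return "unanswered"                  # e.g. ended on a referral


print(classify_lookup(Response(rcode=3, authoritative=True)))
```

A lookup whose last response is a referral (NOERROR, no ANSWER records, but AUTHORITY records present) falls through to "unanswered", matching the catch-all rule on the slide.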

  13. Data Summary

  14. Latency • Figure: latency distribution vs. number of referrals for the mit-dec00 trace • Latency is affected by the number of referrals

  15. Latency • Figure: distribution of latencies for lookups that do and do not involve querying root servers • Cached NS records reduce the load on root servers

  16. Retransmissions • A querying name server retransmits a query if it does not get a response from the destination within a timeout period • Categories of unanswered lookups • Zero referrals (nothing received) • Non-zero referrals (did not lead to an answer) • Loops (misconfigured information)

  17. Retransmissions • Figure: cumulative distribution of the number of retransmissions for answered (topmost curve) and unanswered lookups • 99.9% of answered lookups have <= 2 retransmissions

  18. Retransmissions • Each lookup that elicited zero referrals generated about five times as many wide-area query packets • We conclude that many DNS name servers are too persistent in their retry strategies • Results show that it is better for them to give up after 2 or 3 retransmissions and let the client program decide • Loops generated on average about 10 query packets
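The recommendation to give up after 2 or 3 retransmissions can be sketched as a bounded retry loop. This is only an illustration: the `send` callback, the initial timeout, and the doubling backoff are assumptions introduced here, not values from the paper or from any particular name server.

```python
from typing import Callable, Optional, Tuple


def query_with_retries(
    send: Callable[[float], Optional[bytes]],
    max_retransmissions: int = 2,
    initial_timeout: float = 1.0,   # seconds; assumed value
    backoff: float = 2.0,           # assumed multiplier per retry
) -> Tuple[Optional[bytes], int]:
    """Send a query, retransmitting at most `max_retransmissions` times.

    `send(timeout)` is a hypothetical callback that transmits one query
    packet and returns the reply, or None if the timeout expired.
    Returns (reply or None, number of query packets sent).
    """
    timeout = initial_timeout
    packets = 0
    for _ in range(1 + max_retransmissions):
        packets += 1
        reply = send(timeout)
        if reply is not None:
            return reply, packets
        timeout *= backoff
    # Give up and let the client program decide what to do next.
    return None, packets


# A server that never replies costs 3 packets instead of about 10.
print(query_with_retries(lambda timeout: None))
```

With the default of 2 retransmissions, an unresponsive server costs at most 3 query packets per lookup.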

  19. Retransmissions • Query packets generated by lookups that obtained no answer • 59% (mit-jan00) • 63% (mit-dec00) • Name servers are using inappropriate settings for the number of retries or excessive timeout values • No retransmissions within 60 seconds: • 12% (mit-jan00), 19% (mit-dec00)

  20. Failures • Negative answers are mostly NXDOMAIN or SERVFAIL errors • NXDOMAIN: the requested name does not exist • SERVFAIL: the server is supposed to be authoritative but does not have a valid copy or is out of memory • The largest cause of these error responses is inverse lookups for IP addresses • Non-existent top-level domains: loopback, index.htm • The large number of distinct names makes negative caching ineffective

  21. Interactions with Root Servers • 15 - 27% of the lookups sent to root name servers resulted in negative responses • Most of these appear to be mistyped names • The percentages are of the total number of lookups in the trace

  22. Effectiveness of DNS Caching • How useful is it to share DNS caches among many client machines? • Locality of references among clients • What is the likely impact of choice of TTL on caching effectiveness? • Locality of references in time • Quantify two important statistics 1. Distribution of name popularity 2. The distribution of TTL values

  23. Name Popularity • The top 10% of names account for more than 68% of total lookups • A long tail: 9.0% are unique names
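A statistic such as "the top 10% account for more than 68% of total lookups" can be computed from a list of looked-up names roughly as follows. This is a minimal sketch; the function name and the toy input are hypothetical, and the paper's exact counting rules may differ.

```python
from collections import Counter
from typing import Iterable


def top_share(names: Iterable[str], fraction: float = 0.10) -> float:
    """Share of all lookups that go to the most popular `fraction` of names."""
    counts = Counter(names)
    if not counts:
        raise ValueError("no lookups given")
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))  # at least one name
    return sum(ranked[:k]) / sum(ranked)


# Toy trace: one popular name plus nine names looked up once each.
trace = ["a"] * 11 + list("bcdefghij")
print(top_share(trace))
```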

  24. TTL Distribution • The fraction of accesses to short TTLs has doubled • Increased deployment of DNS-based server selection

  25. Trace-driven Simulation • Two databases: • "name database" maps every IP address in an A answer to its domain name • "TTL database" maps each domain name to the highest TTL of any A record for that domain • Algorithm • Randomly divide TCP clients into groups of size s • For each new TCP connection, determine the group g and look for the name n in the cache of group g • If n exists and the cached TTL has not expired, record a hit. Otherwise record a miss
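The algorithm above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' simulator: it assumes the name database has already been applied (each connection carries a domain name rather than an IP address), that connections are sorted by timestamp, and that a miss inserts the name into the group cache with the TTL from the TTL database. All identifiers are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple


def simulate_hit_rate(
    connections: Iterable[Tuple[float, str, str]],  # (timestamp, client, name)
    ttl_db: Dict[str, float],                       # name -> TTL in seconds
    clients: List[str],
    group_size: int,
    seed: int = 0,
) -> float:
    """Return the cache hit rate when clients share caches in groups."""
    # Randomly divide the clients into groups of size `group_size`.
    rng = random.Random(seed)
    shuffled = list(clients)
    rng.shuffle(shuffled)
    group_of = {c: i // group_size for i, c in enumerate(shuffled)}

    expiry: Dict[Tuple[int, str], float] = {}  # (group, name) -> expiry time
    hits = misses = 0
    for ts, client, name in connections:
        key = (group_of[client], name)
        if key in expiry and ts < expiry[key]:
            hits += 1                            # cached and not expired
        else:
            misses += 1                          # absent or expired: refetch
            expiry[key] = ts + ttl_db[name]
    total = hits + misses
    return hits / total if total else 0.0


conns = [(0, "c1", "a"), (5, "c2", "a"), (20, "c1", "a")]
print(simulate_hit_rate(conns, {"a": 10}, ["c1", "c2"], group_size=2))
```

Running the same trace with different `group_size` values, or with every entry in `ttl_db` replaced by a fixed TTL, gives the kind of comparison shown on the following slides.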

  26. Effect of Sharing on Hit rates • Most of the benefit of sharing is obtained with as few as 10 or 20 clients per cache

  27. Impact of TTL on Hit rates • Most of the benefit of caching is achieved with TTLs less than about 1000 seconds • 5-minute TTLs would increase DNS traffic by a factor of 1.5 • NS record caching is critical

  28. Conclusions 1. About a quarter of all DNS lookups never get an answer, which corresponds to over 50% of DNS packets in the wide-area Internet 2. The DNS retransmission protocol appears to be overly persistent, but in 10%-12% of cases, no retransmissions occur 3. Setting all A-record TTLs to a value as small as 10 minutes is not likely to degrade the scalability of DNS in any noticeable way 4. The cacheability of NS records enhances scalability by reducing load on the root and top-level name servers 5. Little benefit is obtained from sharing a forwarding DNS cache among more than 10 - 20 clients
