11/10/08

P561: Network Systems
Week 7: Finding content; Multicast
Tom Anderson
Ratul Mahajan
TA: Colin Dixon

Today
Finding content and services
  • Infrastructure hosted (DNS)
  • Peer-to-peer hosted (Napster, Gnutella, DHTs)
Multicast: one to many content dissemination
  • Infrastructure (IP Multicast)
  • Peer-to-peer (End-system Multicast, Scribe)

Names and addresses
[Figure: envelope with a 33¢ stamp. Name: Ratul Mahajan. Address: Microsoft Research, Redmond]
Names: identifiers for objects/services (high level)
Addresses: locators for objects/services (low level)
Resolution: name → address
But addresses are really lower-level names
  − e.g., NAT translation from a virtual IP address to physical IP, and IP address to MAC address

Naming in systems
Ubiquitous
  − Files in filesystem, processes in OS, pages on the Web
Decouple identifier for object/service from location
  − Hostnames provide a level of indirection for IP addresses
Naming greatly impacts system capabilities and performance
  − Ethernet addresses are a flat 48 bits
    • flat → any address anywhere but large forwarding tables
  − IP addresses are hierarchical 32/128 bits
    • hierarchy → smaller routing tables but constrained locations

Key considerations
For the namespace
  • Structure
For the resolution mechanism
  • Scalability
  • Efficiency
  • Expressiveness
  • Robustness

Internet hostnames
Human-readable identifiers for end-systems
Based on an administrative hierarchy
  − E.g., june.cs.washington.edu, www.yahoo.com
  − You cannot name your computer foo.yahoo.com
In contrast, (public) IP addresses are a fixed-length binary encoding based on network position
  − 128.95.1.4 is june’s IP address, 209.131.36.158 is one of www.yahoo.com’s IP addresses
  − Yahoo cannot pick any address it wishes
Original hostname system
When the Internet was really young …
Flat namespace
  − Simple (host, address) pairs
Centralized management
  − Updates via a single master file called HOSTS.TXT
  − Manually coordinated by the Network Information Center (NIC)
Resolution process
  − Look up hostname in the HOSTS.TXT file
  − Works even today: (c:/WINDOWS/system32/drivers)/etc/hosts

Problems with the original system
Coordination
  − Between all users to avoid conflicts
  − E.g., everyone likes a computer named Mars
Inconsistencies
  − Between updated and old versions of file
Reliability
  − Single point of failure
Performance
  − Competition for centralized resources

Domain Name System (DNS)
Developed by Mockapetris and Dunlap, mid-80’s
Namespace is hierarchical
  − Allows much better scaling of data structures
  − e.g., root → edu → washington → cs → june
Namespace is distributed
  − Decentralized administration and access
  − e.g., june managed by cs.washington.edu
Resolution is by query/response
  − With replicated servers for redundancy
  − With heavy use of caching for performance

DNS Hierarchy
[Figure: DNS name tree. Labels, top to bottom: root; com, edu, mil, org, au; uw, yahoo; ee, cs, www; june]
  • “dot” is the root of the hierarchy
  • Top levels now controlled by ICANN
  • Lower level control is delegated
  • Usage governed by conventions
  • FQDN = Fully Qualified Domain Name

Name space delegation
Each organization controls its own name space (“zone” = subtree of global tree)
  − each organization has its own nameservers
    • replicated for availability
  − nameservers translate names within their organization
    • client lookup proceeds step-by-step
  − example: washington.edu
    • contains IP addresses for all its hosts (www.washington.edu)
    • contains pointer to its subdomains (cs.washington.edu)

DNS resolution
Reactive
Queries can be recursive or iterative
Uses UDP (port 53)
[Figure: client, local name server, root name server, Princeton name server, CS name server. Numbered messages:]
  1. Client → local name server: cicada.cs.princeton.edu
  2. Local name server → root name server: cicada.cs.princeton.edu
  3. Root name server → local name server: princeton.edu, 128.196.128.233
  4. Local name server → Princeton name server: cicada.cs.princeton.edu
  5. Princeton name server → local name server: cs.princeton.edu, 192.12.69.5
  6. Local name server → CS name server: cicada.cs.princeton.edu
  7. CS name server → local name server: cicada.cs.princeton.edu, 192.12.69.60
  8. Local name server → client: 192.12.69.60
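The step-by-step lookup on the "DNS resolution" slide can be sketched as a toy, in-memory simulation. This is only an illustration: no packets are sent, the `SERVERS` table and the `query` and `resolve` helpers are hypothetical names, and the only data used are the names and addresses shown in the slide's figure.

```python
# Toy model of iterative resolution; each "server" is just a dict entry.
# Keys are server identities ("root" or the server's IP address from the slide).
SERVERS = {
    "root": {"hosts": {}, "delegations": {"princeton.edu": "128.196.128.233"}},
    "128.196.128.233": {"hosts": {}, "delegations": {"cs.princeton.edu": "192.12.69.5"}},
    "192.12.69.5": {"hosts": {"cicada.cs.princeton.edu": "192.12.69.60"}, "delegations": {}},
}

def query(server, name):
    """Ask one server: returns ('answer', ip), ('referral', zone, ns_ip) or ('nxdomain',)."""
    data = SERVERS[server]
    if name in data["hosts"]:
        return ("answer", data["hosts"][name])
    for zone, ns_ip in data["delegations"].items():
        if name == zone or name.endswith("." + zone):
            return ("referral", zone, ns_ip)
    return ("nxdomain",)

def resolve(name, start="root"):
    """Play the local name server: follow referrals until an answer or a failure."""
    server, trace = start, []
    while True:
        reply = query(server, name)
        trace.append((server, reply))
        if reply[0] == "answer":
            return reply[1], trace
        if reply[0] == "referral":
            server = reply[2]  # next hop is the delegated zone's nameserver
            continue
        return None, trace

address, trace = resolve("cicada.cs.princeton.edu")
print(address)          # 192.12.69.60
for server, reply in trace:
    print(server, reply)
```

The three entries in `trace` correspond to the root, Princeton, and CS exchanges in the figure; the client-to-local-server messages (steps 1 and 8) are the call to `resolve` and its return value.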
Hierarchy of nameservers
[Figure: tree of nameservers. Root name server at the top; Princeton name server … Cisco name server below it; CS name server … EE name server below Princeton]

DNS performance: caching
DNS query results are cached at local proxy
  − quick response for repeated translations
  − lookups are the rare case
  − vastly reduces load at the servers
  − what if something new lands on slashdot?
[Figure: client, local name server, CS name server. Numbered messages:]
  1. Client → local name server: cicada.cs.princeton.edu
  2. (if cicada is cached) Local name server → client: 192.12.69.60
  2. (if cs is cached) Local name server → CS name server: cicada.cs.princeton.edu
  3. CS name server → local name server: cicada.cs.princeton.edu, 192.12.69.60
  4. (if cs is cached) Local name server → client: 192.12.69.60

DNS cache consistency
How do we keep cached copies up to date?
  − DNS entries are modified from time to time
    • to change name → IP address mappings
    • to add/delete names
Cache entries invalidated periodically
  − each DNS entry has time-to-live (TTL) field: how long the local proxy can keep a copy
  − if entry accessed after the timeout, get a fresh copy from the server
  − how do you pick the TTL?
  − how long after a change are all the copies updated?

DNS cache effectiveness
[Figure: traffic seen on UW’s access link in 1999]

Negative caching in DNS
Pro: traffic reduction
  • Misspellings, old or non-existent names
  • “Helpful” client features
Con: what if the host appears?
Status:
  • Optional in original design
  • Mandatory since 1998

DNS traffic in the wide-area
  Study            % of DNS packets
  Danzig, 1990     14%
  Danzig, 1992     8%
  Frazer, 1995     5%
  Thomson, 1997    3%
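The TTL rule from "DNS cache consistency" and the negative caching idea can be sketched together. This is a minimal sketch, not how any particular resolver is implemented: `TTLCache`, the `MISS` sentinel, and the injectable `clock` parameter are all assumptions made for illustration, and a negative entry is modeled simply as a cached `None`.

```python
import time

MISS = object()  # sentinel: nothing usable in the cache, ask the server

class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example can control time
        self._entries = {}       # name -> (address or None, expiry time)

    def put(self, name, address, ttl):
        """Cache an answer for ttl seconds; address=None records a negative answer."""
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return MISS
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # timed out: caller must fetch a fresh copy
            return MISS
        return address

now = [0.0]                                  # fake clock for the demonstration
cache = TTLCache(clock=lambda: now[0])
cache.put("june.cs.washington.edu", "128.95.1.4", ttl=60)
cache.put("no-such-host.example", None, ttl=10)   # negative entry, shorter TTL

print(cache.get("june.cs.washington.edu"))   # 128.95.1.4
now[0] = 30.0
print(cache.get("no-such-host.example") is MISS)  # True: negative entry expired
```

The sketch also makes the slide's open questions concrete: a stale answer can be served for up to one full TTL after the mapping changes, and a negative entry hides a newly created host until it expires.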
DNS bootstrapping
Need to know IP addresses of root servers before we can make any queries
Addresses for 13 root servers ([a-m].root-servers.net) handled via initial configuration
  • Cannot have more than 13 root server IP addresses

DNS root servers
[Figure: root server locations. 123 servers as of Dec 2006]

DNS availability
What happens if DNS service is not working?
DNS servers are replicated
  − name service available if at least one replica is working
  − queries load balanced between replicas
[Figure: three replicated name servers. Message 2: cicada.cs.princeton.edu. Message 3: princeton.edu, 128.196.128.233]

Building on the DNS
Email: ratul@microsoft.com
  − DNS record for ratul in the domain microsoft.com, specifying where to deliver the email
Uniform Resource Locator (URL) names for Web pages
  − e.g., www.cs.washington.edu/homes/ratul
  − Use domain name to identify a Web server
  − Use “/” separated string for file name (or script) on the server

DNS evolution
Static host to IP mapping
  − What about mobility (Mobile IP) and dynamic address assignment (DHCP)?
  − Dynamic DNS
Location-insensitive queries
  • Many servers are geographically replicated
  • E.g., Yahoo.com doesn’t refer to a single machine or even a single location; want closest server
  • Next week
Security (DNSSec)
Internationalization

DNS properties (summary)
  Nature of the namespace      Hierarchical; flat at each level
  Scalability of resolution    High
  Efficiency of resolution     Moderate
  Expressiveness of queries    Exact matches
  Robustness to failures       Moderate
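The "Building on the DNS" slide says that an email address and a URL each embed a DNS name. A small sketch of pulling that name out, using Python's standard `urllib.parse`; the helper names are hypothetical, and the `http://` scheme is an assumption added because the slide's example URL omits it.

```python
from urllib.parse import urlsplit

def dns_name_for_url(url):
    """Split a URL into the DNS name of the Web server and the path on that server."""
    parts = urlsplit(url)
    return parts.hostname, parts.path

def dns_name_for_email(address):
    """Split an email address into the user and the domain that DNS is asked about."""
    user, _, domain = address.rpartition("@")
    return user, domain

print(dns_name_for_url("http://www.cs.washington.edu/homes/ratul"))
print(dns_name_for_email("ratul@microsoft.com"))
```

Only the host or domain part is ever resolved through the DNS; the path and the user name are interpreted by the Web server and the mail system respectively.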
Peer-to-peer content sharing
Want to share content among large number of users; each serves a subset of files
  − need to locate which user has which files
Question: Would DNS be a good solution for this?

Napster (directory-based)
Centralized directory of all users offering each file
Users register their files
Users make requests to Napster central
Napster returns list of users hosting requested file
Direct user-to-user communication to download files

Napster illustration
[Figure: users exchanging messages with the Napster directory and with each other:]
  1. I have “Foo Fighters”
  2. Does anyone have “Foo Fighters”?
  3. Bob has it
  4. Share “Foo Fighters”?
  5. There you go

Napster vs. DNS
                               Napster              DNS
  Nature of the namespace      Multi-dimensional    Hierarchical; flat at each level
  Scalability                  Moderate             High
  Efficiency of resolution     High                 Moderate
  Expressiveness of queries    High                 Exact matches
  Robustness to failures       Low                  Moderate

Gnutella (crawl-based)
Can we locate files without a centralized directory?
  − for legal and privacy reasons
Gnutella
  − organize users into ad hoc graph
  − flood query to all users, in breadth first search
    • use hop count to control depth
  − if found, server replies back through path of servers
  − client makes direct connection to server to get file

Gnutella illustration
[Figure: query flooding through the Gnutella overlay graph]
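The Gnutella search described above (breadth-first flooding, bounded by a hop count) can be sketched over a small in-memory overlay. This is an illustration only: the graph, the peer names, and `flood_query` are made up for the example, and the real protocol's message formats and reply routing are not modeled.

```python
from collections import deque

def flood_query(graph, files, origin, wanted, max_hops):
    """Breadth-first flood from origin; returns (peers holding the file, peers contacted)."""
    visited = {origin}
    frontier = deque([(origin, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        if peer != origin and wanted in files.get(peer, ()):
            hits.append(peer)
        if hops == max_hops:
            continue  # hop count exhausted: this peer does not forward further
        for neighbor in graph[peer]:
            if neighbor not in visited:  # peers drop queries they have already seen
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits, len(visited) - 1

# Hypothetical overlay: A-B, A-C, B-D, C-D, D-E; only E holds the file.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
files = {"E": {"Foo Fighters"}}

print(flood_query(graph, files, "A", "Foo Fighters", max_hops=2))  # ([], 3)
print(flood_query(graph, files, "A", "Foo Fighters", max_hops=3))  # (['E'], 4)
```

With a hop limit of 2 the query never reaches E, so the file is reported as not found even though it exists; raising the limit finds it at the cost of contacting more peers.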
Gnutella vs. DNS
                               Gnutella             DNS
  Nature of the namespace      Multi-dimensional    Hierarchical; flat at each level
  Scalability                  Low                  High
  Efficiency of resolution     Low                  Moderate
  Expressiveness of queries    High                 Exact matches
  Robustness to failures       Moderate             Moderate
Content is not indexed in Gnutella
Trade-off between exhaustiveness and efficiency

Distributed hash tables (DHTs)
Can we locate files without an exhaustive search?
  − want to scale to thousands of servers
DHTs (Pastry, Chord, etc.)
  − Map servers and objects into a coordinate space
  − Objects/info stored based on its key
  − Organize servers into a predefined topology (e.g., a ring or a k-dimensional hypercube)
  − Route over this topology to find objects
We’ll talk about Pastry (with some slides stolen from Peter Druschel)

Pastry: Object insertion/lookup
[Figure: circular id space from 0 to 2^128 - 1, with Route(X) delivering a message to the node nearest key X]
Msg with key X is routed to live node with nodeId closest to X
Problem: complete routing table not feasible

Pastry: Id space
[Figure: circular id space from 0 to 2^128 - 1, with nodeIds and an objId marked on the ring]
128 bit circular id space
nodeIds (uniform random)
objIds (uniform random)
Invariant: node with numerically closest nodeId maintains object

Pastry: Routing table (# 65a1fcx)
[Figure: routing table of one node, showing Row 0, Row 1, Row 2, Row 3]

Pastry: Routing
Tradeoff
  O(log N) routing table size
  O(log N) message forwarding steps
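The Pastry invariant (the node with the numerically closest nodeId owns the key) and the idea of resolving one more digit of the key per hop can be sketched as follows. This is a simplified illustration, not Pastry's actual algorithm: ids are shortened to 4 hex digits instead of 128 bits, every node is assumed to know every other node (standing in for the per-node routing table and leaf set), and all function names and node ids are invented for the example.

```python
ID_DIGITS = 4                  # toy id length in hex digits (Pastry uses 128-bit ids)
ID_SPACE = 16 ** ID_DIGITS     # ids live on a circle of this size

def circular_distance(a, b):
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def closest_node(nodes, key):
    """The invariant: the numerically closest nodeId is responsible for key."""
    return min(nodes, key=lambda n: (circular_distance(n, key), n))

def shared_prefix_len(a, b):
    ha, hb = format(a, f"0{ID_DIGITS}x"), format(b, f"0{ID_DIGITS}x")
    n = 0
    while n < ID_DIGITS and ha[n] == hb[n]:
        n += 1
    return n

def route(nodes, start, key):
    """Forward to a node sharing a longer id prefix with key; finish at the closest node."""
    path, current = [start], start
    target = closest_node(nodes, key)
    while current != target:
        p = shared_prefix_len(current, key)
        better = [n for n in nodes if shared_prefix_len(n, key) > p]
        if better:
            # stand-in for a routing-table lookup: smallest available improvement
            current = min(better, key=lambda n: shared_prefix_len(n, key))
        else:
            current = target   # stand-in for the final leaf-set hop
        path.append(current)
    return path

nodes = [0x1234, 0x65A1, 0x65F0, 0x6A00, 0xD13D]
print([format(n, "04x") for n in route(nodes, 0xD13D, 0x65A3)])
# ['d13d', '6a00', '65f0', '65a1']
print(format(closest_node(nodes, 0xFFFF), "04x"))   # 1234 (wraps around the circle)
```

Each hop matches at least one more digit of the key, so the number of hops is bounded by the number of digits plus one; with uniformly random ids that is the intuition behind the O(log N) forwarding steps on the last slide.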