
1945: Vannevar Bush, "As We May Think" (PowerPoint presentation)
15-441, Fall 2019: Profs. Peter Steenkiste & Justine Sherry


  1. (10/12/2019)
     1945: Vannevar Bush
     15-441 Fall 2019, Profs. Peter Steenkiste & Justine Sherry
     (Thanks to Scott Shenker, Sylvia Ratnasamy, Peter Steenkiste, and Srini Seshan for slides.)
     • "As We May Think", The Atlantic Monthly, July 1945
     • Describes the idea of a distributed hypertext system
     • A "memex" that mimics the "web of trails" in our minds

     Dec 9, 1968: "The Mother of All Demos"
     • First demonstration of a Memex-inspired system
     • Working prototype with hypertext, linking, use of a mouse…
     • https://www.youtube.com/watch?v=74c8LntW7fo

     Many other iterations before we got to the World Wide Web:
     • MINITEL in France. https://en.wikipedia.org/wiki/Minitel
     • Project Xanadu. https://en.wikipedia.org/wiki/Project_Xanadu
     • (Note that you don't need to know any of this history for exams; this is just for the curious…)

  2. (10/12/2019)
     1989: Tim Berners-Lee
     • 1989: Tim Berners-Lee (CERN) writes an internal proposal to develop a distributed hypertext system
       • Connects "a web of notes with links"
       • Intended to help CERN physicists in large projects share and manage information
     • 1990: TBL writes a graphical browser for NeXT machines
     • 1992-1994: NCSA/Mosaic/Netscape browser releases

     Lots of Traffic! What is an Exabyte?
     • 1 exabyte = 1,000,000,000,000,000,000 bytes; as binary storage, 2^60 bytes = 1,099,511,627,776 MByte
     • Prefixes (10^x / 2^x): Kilo 10^3 / 2^10, Mega 10^6 / 2^20, Giga 10^9 / 2^30, Tera 10^12 / 2^40, Peta 10^15 / 2^50 (a few years ago), Exa 10^18 / 2^60 (today), Zetta 10^21 / 2^70 (in a few years), Yotta 10^24 / 2^80

     Hyper Text Transfer Protocol (HTTP)
     • Client-server architecture
     • Server is "always on" and "well known"
     • Clients initiate contact to the server
     • Synchronous request/reply protocol
     • Runs over TCP, port 80
     • Stateless
     • ASCII format
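The decimal/binary prefix table on the slide can be sanity-checked in a few lines of Python (an illustrative sketch, not part of the original slides):

```python
# Decimal (SI) vs. binary interpretations of the storage prefixes.
# 1 exabyte (SI) = 10**18 bytes; the binary counterpart is 2**60 bytes.
SI = {"kilo": 10**3, "mega": 10**6, "giga": 10**9, "tera": 10**12,
      "peta": 10**15, "exa": 10**18, "zetta": 10**21, "yotta": 10**24}
BINARY = {"kilo": 2**10, "mega": 2**20, "giga": 2**30, "tera": 2**40,
          "peta": 2**50, "exa": 2**60, "zetta": 2**70, "yotta": 2**80}

# The slide's two numbers for an exabyte:
print(SI["exa"])                        # 1000000000000000000 bytes
print(BINARY["exa"] // BINARY["mega"])  # 1099511627776 MByte (binary)
```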

  3. (10/12/2019)
     Steps in HTTP 1.0 Request/Response
     • Client establishes a TCP connection to the server
     • Client sends request
     • Server sends response
     • Close connection

     Client-to-Server Communication: HTTP Request Message
     • Request line: method, resource, and protocol version
     • Request headers: provide information or modify the request
     • Body: optional data (e.g., to "POST" data to the server)

         GET /somedir/page.html HTTP/1.1      (request line)
         Host: www.someschool.edu             (header lines)
         User-agent: Mozilla/4.0
         Connection: close
         Accept-language: fr
         (blank line: carriage return + line feed indicates end of message)

     Server-to-Client Communication: HTTP Response Message
     • Status line: protocol version, status code, status phrase
     • Response headers: provide information
     • Body: optional data

         HTTP/1.1 200 OK                      (status line: protocol, status code, status phrase)
         Connection: close                    (header lines)
         Date: Thu, 06 Aug 2006 12:00:15 GMT
         Server: Apache/1.3.0 (Unix)
         Last-Modified: Mon, 22 Jun 2006 ...
         Content-Length: 6821
         Content-Type: text/html
         (blank line)
         data data data data data data ...    (e.g., the requested HTML file)

     HTTP is Stateless
     • Each request-response pair is treated independently; servers are not required to retain state
     • Good: improves scalability on the server side
       • Failure handling is easier
       • Can handle a higher rate of requests
       • Order of requests doesn't matter
     • Bad: some applications need persistent state
       • Need to uniquely identify the user or store temporary info
       • e.g., shopping cart, user profiles, usage tracking, …
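Because HTTP is an ASCII protocol, the message formats above can be built and parsed by hand. A minimal sketch (the helper names and the canned response are illustrative, not from the slides): CRLF separates lines, and a blank line ends the header section.

```python
# Build an HTTP/1.1 request and parse a response, mirroring the slide's
# message formats. "\r\n" separates lines; "\r\n\r\n" ends the headers.
def build_request(host, path):
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")

def parse_response(raw):
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    version, code, phrase = lines[0].split(" ", 2)           # status line
    headers = dict(line.split(": ", 1) for line in lines[1:])  # header lines
    return int(code), headers, body

# A canned response in the slide's format:
code, headers, body = parse_response(
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 20\r\n"
    "\r\n"
    "<html>hello</html>\r\n")
print(code, headers["Content-Type"])  # 200 text/html
```

Sending `build_request(...)` over a TCP socket to port 80 would reproduce the request/response exchange described on the slide.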

  4. (10/12/2019)
     How to Maintain State in a Stateless Protocol: Cookies
     • Client-side state maintenance
       • Client stores a small amount of state on behalf of the server
       • Client sends that state in future requests to the server
           Request -> Response (Set-Cookie: XYZ) -> Request (Cookie: XYZ)
     • Can provide authentication

     Performance Goals
     • User
       • Fast downloads (not identical to low-latency communication!)
       • High availability
     • Content provider
       • Happy users (hence, the above)
       • Cost-effective delivery infrastructure
     • Network (secondary)
       • Avoid overload
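The Set-Cookie/Cookie exchange above can be simulated with plain dictionaries standing in for HTTP headers (a toy sketch; `server_handle` and the session layout are illustrative, not a real API):

```python
import itertools

# Toy simulation of the cookie exchange: the server keeps per-user state
# and recognizes returning users via the cookie value the client echoes.
_ids = itertools.count(1)
sessions = {}          # server-side: cookie value -> per-user state

def server_handle(request_headers):
    cookie = request_headers.get("Cookie")
    if cookie not in sessions:                 # first visit: issue a cookie
        cookie = f"XYZ{next(_ids)}"
        sessions[cookie] = {"visits": 0}
        response_headers = {"Set-Cookie": cookie}
    else:                                      # returning user
        response_headers = {}
    sessions[cookie]["visits"] += 1
    return response_headers, sessions[cookie]["visits"]

resp, v1 = server_handle({})                              # no cookie yet
resp2, v2 = server_handle({"Cookie": resp["Set-Cookie"]}) # client echoes it
print(v1, v2)  # 1 2
```

The server itself stays stateless at the protocol level; all continuity comes from the client resending the opaque cookie value.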

  5. (10/12/2019)
     Solutions?
     • Improve HTTP to compensate for TCP's weak spots
     • User
       • Fast downloads (not identical to low-latency communication!)
       • High availability
     • Content provider: Caching and Replication
       • Happy users (hence, the above)
       • Cost-effective delivery infrastructure
     • Network (secondary): exploit economies of scale (web hosting, CDNs, datacenters)
       • Avoid overload

     HTTP Performance
     • Most web pages have multiple objects (e.g., an HTML file and a bunch of embedded images)
     • How do you retrieve those objects (naively)? One item at a time, i.e., one "GET" per TCP connection
       • Really limits the state on the server
       • Solution used in HTTP 0.9 and 1.0: a new TCP connection per (small) object!
       • Lots of handshakes; congestion-control state lost across connections

     Typical Workload (Web Pages)
     • Multiple (typically small) objects per page
     • File sizes are heavy-tailed: Pareto distribution for the tail, lognormal for the body of the distribution
     • Number of embedded objects is also Pareto: Pr(X > x) = (x/x_m)^-k
     • Lots of small objects versus TCP: 3-way handshake, lots of slow starts, extra connection state
     • This plays havoc with performance. Why? Solutions?
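The Pareto tail probability Pr(X > x) = (x/x_m)^-k from the workload slide is easy to evaluate; a small k means a heavy tail, i.e., a few pages embed very many objects. The parameter values below are illustrative, not measurements from the slides:

```python
# Complementary CDF of a Pareto(x_m, k) distribution, as on the slide.
def pareto_tail(x, x_m, k):
    """Pr(X > x) for a Pareto variable with minimum x_m and shape k (x >= x_m)."""
    return (x / x_m) ** -k

# With x_m = 1 embedded object and shape k = 1.5 (illustrative values):
print(pareto_tail(1, 1, 1.5))    # 1.0  (every page has at least 1 object)
print(pareto_tail(100, 1, 1.5))  # ~0.001 (rare pages with 100+ objects)
```

The slow power-law decay (versus the exponential decay of a light-tailed distribution) is why "lots of small objects plus a few huge ones" is the typical web workload.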

  6. (10/12/2019)
     Optimizing HTTP for Real Web Pages: Persistent Connections
     • Maintain the TCP connection across multiple requests, including transfers subsequent to the current page
     • Client or server can tear down the connection
     • Performance advantages:
       • Avoid the overhead of connection set-up and tear-down
       • Allow TCP to learn a more accurate RTT estimate
       • Allow the TCP congestion window to increase, i.e., leverage previously discovered bandwidth
     • Drawback? Head-of-line blocking: a "slow object" delays all later transfers
     • Default in HTTP/1.1

     Pipelined Requests & Responses
     • Batch requests and responses to reduce the number of packets
     • Multiple requests can be contained in one TCP segment
     • Head-of-line blocking issue remains: a delay in transfer 2 blocks retrieval of all later requests, including "fast" objects

     Concurrent Requests & Responses (parallel TCP sessions)
     • Use multiple connections in parallel
     • Speeds up retrieval by ~m
     • Does not necessarily maintain the order of responses
     • Partially deals with head-of-line blocking

     Scorecard: Getting n Small Objects (time dominated by latency)
     • One-at-a-time: ~2n RTT
     • m concurrent: ~2[n/m] RTT
     • Persistent: ~(n+1) RTT
     • Pipelined: ~2 RTT
     • Pipelined/persistent: ~2 RTT the first time, ~RTT later. Why?
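The small-object scorecard is back-of-the-envelope arithmetic, and can be written out directly (the function name and strategy labels are my own; the RTT formulas are the slide's):

```python
import math

# RTT counts from the "n small objects" scorecard (latency-dominated case).
# n = number of objects, m = number of parallel TCP connections.
def rtts(n, strategy, m=1):
    if strategy == "one-at-a-time":         # connect (1 RTT) + fetch (1 RTT) per object
        return 2 * n
    if strategy == "concurrent":            # m connections, each handling n/m objects
        return 2 * math.ceil(n / m)
    if strategy == "persistent":            # one connect, then n sequential fetches
        return n + 1
    if strategy == "pipelined":             # connect, then all requests batched
        return 2
    raise ValueError(f"unknown strategy: {strategy}")

for s in ("one-at-a-time", "concurrent", "persistent", "pipelined"):
    print(s, rtts(10, s, m=4))
# one-at-a-time 20, concurrent 6, persistent 11, pipelined 2
```

The pipelined/persistent case drops to ~1 RTT on later pages because the connection (and its handshake) is already in place; only the batched request/response round trip remains.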

  7. (10/12/2019)
     Scorecard: Getting n Large Objects (time dominated by bandwidth)
     • One-at-a-time: ~nF/B
     • m concurrent: ~[n/m] F/B
       • Assuming the link is shared with a large population of users, and each TCP connection gets the same bandwidth
     • Pipelined and/or persistent: ~nF/B
     • The only thing that helps is getting more bandwidth…

     Classic Solution: Caching
     • Why does caching help performance?
       • Exploits locality of reference
       • Reduces average response time and load on the network
     • How well does caching work? Very well, up to a limit
       • Large overlap in content, but many unique requests
       • Trend: increase in dynamic content (e.g., customizing of web pages) reduces the benefits of caching; some exceptions, e.g., video

     Caching: Where?
     • Baseline: many clients transfer the same information
       • Generates unnecessary server and network load
       • Clients experience unnecessary latency
     • An ideal cache is: shared by many clients; very close to the client; cheap (no additional infrastructure needed)
     • Everywhere! Clients, forward proxies, reverse proxies, content distribution networks
     (Figure: servers, Tier-1 ISPs, ISP-1/ISP-2, and clients in a network hierarchy, with caches at each level)

     Caching: Clients
     • Clients keep a local cache of recently accessed objects
     • Clients often have a small number of web pages they access frequently
       • Leads to reuse of logos, old content, JavaScripts, …
     • But caching closer to the server can lead to higher hit rates!
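A client-side cache of the kind described above is often approximated as an LRU cache, which directly exploits locality of reference. A minimal sketch (the class, the eviction policy, and the URLs are illustrative; the slides do not prescribe a replacement policy):

```python
from collections import OrderedDict

# Client-side object cache with least-recently-used (LRU) eviction.
# Repeated objects (logos, scripts) become hits; misses call fetch(),
# which stands in for an HTTP GET to the origin server.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()           # url -> cached object
        self.hits = self.misses = 0

    def get(self, url, fetch):
        if url in self.store:
            self.store.move_to_end(url)      # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            self.store[url] = fetch(url)
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used
        return self.store[url]

cache = LRUCache(capacity=2)
for url in ["/logo.png", "/app.js", "/logo.png", "/news.html", "/logo.png"]:
    cache.get(url, fetch=lambda u: f"<contents of {u}>")
print(cache.hits, cache.misses)  # 2 3
```

Even this tiny trace shows the slide's point: the frequently reused object (`/logo.png`) stays cached while one-off content churns through, and a cache shared by many clients would raise the hit rate further.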
