(World Wide) Web
• a way to connect computers that provide information (servers) with computers that ask for it (clients like you and me)
  – uses the Internet, but it's not the same as the Internet
• URL (uniform resource locator, e.g., http://www.amazon.com)
  – a way to specify what information to find, and where
• HTTP (hypertext transfer protocol)
  – a way to request specific information from a server and get it back
• HTML (hypertext markup language)
  – a language for describing information for display
• browser (Firefox, Safari, Internet Explorer, Opera, Chrome, …)
  – a program for making requests, and displaying results
• embellishments
  – pictures, sounds, movies, ...
  – loadable software
• the set of everything this provides

Web history
• 1989: Tim Berners-Lee at CERN
  – a way to make physics literature and research results accessible on the Internet
• 1991: first software distributions
• Feb 1993: Mosaic browser
  – Marc Andreessen at NCSA (Univ of Illinois)
• Mar 1994: Netscape
  – first commercial browser
• technical evolution managed by World Wide Web Consortium
  – non-profit organization at MIT, Berners-Lee is director
  – official definition of HTML and other web specifications
  – see www.w3.org

HTTP: Hypertext transfer protocol
• What happens when you click on a URL?
• client opens TCP/IP connection to host, sends request
    GET /filename HTTP/1.0
• server returns
  – header info
  – HTML
• since server returns the text, it can be created as needed
  – can contain encoded material of many different types (MIME)
• URL format
    service://hostname/filename?other_stuff
• filename?other_stuff part can encode
  – data values from client (forms)
  – request to run a program on server (cgi-bin)
  – anything else
    e.g. http://www.google.com/search?q=mime&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

Some details of the HTTP protocol
• Request:
  – request line: method object protocol, e.g. GET url
  – headers: many options, most optional
  – empty line
  – message body (optional)
• Example methods
  – GET: retrieval
  – POST: submitting data to be processed (in body)
• Mandatory header
  – Host: the host named in the URL, i.e., the one the request is sent to

HTTP protocol: continuing the details
Example from Wikipedia entry for HTTP:
• Request:
    GET /index.html HTTP/1.1
    Host: www.example.com
• Response:
    HTTP/1.1 200 OK
    Date: Mon, 23 May 2005 22:38:34 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    Etag: "3f80f-1b6-3e1cb03b"
    Accept-Ranges: bytes
    Content-Length: 438
    Connection: close
    Content-Type: text/html; charset=UTF-8

    text of page
• What a sample of the response fields mean:
  – HTTP/1.1 200 OK: protocol and status
  – Server: software information
  – Etag: determines whether the cached version and the current version are identical
  – Content-Type: Internet media type
  – body (after the empty line): text of requested object
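The URL format above can be taken apart with Python's standard urllib.parse module. This is a sketch for illustration, not part of the original notes: `describe_url` and `build_query` are hypothetical helper names, and the dictionary keys simply reuse the slide's labels (service, hostname, filename, other_stuff). It splits the Google search URL from the slide and decodes its query string into field names and values, which is roughly what a server-side program does with data sent by a form; `build_query` shows the opposite direction, encoding form fields for the part after `?`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def describe_url(url):
    """Split service://hostname/filename?other_stuff into its labelled parts."""
    parts = urlsplit(url)
    return {
        "service": parts.scheme,               # e.g. "http"
        "hostname": parts.netloc,              # e.g. "www.google.com"
        "filename": parts.path,                # e.g. "/search"
        "other_stuff": parse_qs(parts.query),  # query decoded to {name: [values]}
    }


def build_query(fields):
    """Encode form fields the way a browser appends them after '?' (spaces become '+')."""
    return urlencode(fields)


if __name__ == "__main__":
    url = ("http://www.google.com/search?q=mime&ie=utf-8&oe=utf-8&aq=t"
           "&rls=org.mozilla:en-US:official&client=firefox-a")
    for key, value in describe_url(url).items():
        print(f"{key}: {value}")
    print(build_query({"q": "mime types", "ie": "utf-8"}))  # q=mime+types&ie=utf-8
```

Each query field maps to a list because a form may send the same name more than once (for example, several checked boxes).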
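To make the request and response layout concrete, here is a minimal sketch that builds the text of an HTTP/1.1 GET request and splits a response into status line, headers, and body at the empty line. The function names are hypothetical. The sample response reuses the header fields of the Wikipedia example, except that the body is a short placeholder and Content-Length is computed from it rather than copied. Real programs would normally use a library (Python's standard http.client, for instance) instead of hand-parsing like this.

```python
def build_get_request(host, path):
    """Text of an HTTP/1.1 GET request: request line, headers, then an empty line."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method object protocol
        f"Host: {host}",         # the header HTTP/1.1 makes mandatory
        "Connection: close",     # ask the server to close the connection after replying
        "",                      # empty line ends the headers; this GET has no body
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into (protocol, status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # an empty line separates headers from body
    status_line, *header_lines = head.split("\r\n")
    protocol, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return protocol, int(code), reason, headers, body


SAMPLE_BODY = "<html><body>text of page</body></html>"  # placeholder body
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 23 May 2005 22:38:34 GMT\r\n"
    "Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)\r\n"
    "Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT\r\n"
    'Etag: "3f80f-1b6-3e1cb03b"\r\n'
    "Accept-Ranges: bytes\r\n"
    f"Content-Length: {len(SAMPLE_BODY.encode('utf-8'))}\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    + SAMPLE_BODY
)

if __name__ == "__main__":
    print(build_get_request("www.example.com", "/index.html"), end="")
    protocol, code, reason, headers, body = parse_response(SAMPLE_RESPONSE)
    print(protocol, code, reason)
    print(headers["content-type"], headers["etag"])
```

The sketch does not check Content-Length against the body or handle chunked transfer encoding; it only shows where each part of the message sits.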
Embellishments
• original design of HTTP just returns text to be displayed
• now includes pictures, sound, video, ...
  – need helpers or plug-ins to display non-text content
    e.g., GIF, JPEG graphics; sound; movies
• forms filled in by user
  – need a program on the server to interpret the information (cgi-bin)
• HTTP is stateless
  – server doesn't remember anything from one request to the next
  – need a way to remember information on the client: cookies
• active content: download code to run on the client
  – Javascript and other interpreters
  – Java applets
  – plug-ins
  – ActiveX

Forms and CGI programs
• "common gateway interface"
  – standard way to request the server to run a program
  – using information provided by the client via a form
• if the target file on the server is an executable program
  – e.g., in the /cgi-bin directory
• or if it has the right kind of name
  – e.g., something.cgi
• run it on the server to produce HTML to send back to the client
  – using the contents of the form as input
  – output depends on client request: created on the fly, not just a file
• CGI programs can be written in any programming language
  – often Perl, PHP, Java

Example CGI program in Perl (mailform.cgi modified)
    #!/usr/local/bin/perl -w
    # Echo back the submitted form fields, plus the client's host name and IP address
    use strict;
    use CGI;
    my $query = CGI->new;
    print $query->header;                  # HTTP header lines, then the empty line
    print $query->start_html(-title=>'Form results');
    print "<h1> Form results </h1>\n";
    my $urcomp = $query->remote_host();    # client host name (its IP address if no name is available)
    my $urIP = $query->remote_addr();      # client IP address
    print "<P> Your computer is $urcomp\n";
    print "<P> Your IP address is $urIP\n";
    print "<P>\n";
    foreach my $name ($query->param) {              # every field name in the form
        print "<br> ", $query->escapeHTML($name), ":";
        foreach my $value ($query->param($name)) {  # a field may have several values
            print " ", $query->escapeHTML($value);  # escape user input before echoing it
        }
        print "\n";
    }
    print $query->end_html;

Web pages: Information passed and actions initiated
• HTTP requests identify host and address:
  – my $urcomp = $query->remote_host();
  – my $urIP = $query->remote_addr();
• Initiate actions with Javascript
  – onmouseover etc.
• Links with "extra"
  – Google ads

Cookies
• HTTP is stateless: doesn't remember anything from one request to the next
• cookies are intended to deal with the stateless nature of HTTP
  – remember preferences, manage a "shopping cart", etc.
• cookie: one line of text sent by the server to be stored on the client
  – stored in the browser while it is running (transient)
  – stored in the client file system when the browser terminates (persistent)
• when the client reconnects to the same domain, the browser sends the cookie back to the server
  – sent back verbatim; nothing added
  – sent back only to the same domain that sent it originally
  – contains no information that didn't originate with the server
• in principle, pretty benign
• but heavily used to monitor browsing habits, for commercial purposes

Cookie crumbs
• get a page from xyz.com
  – it contains <img src=http://doubleclick.com/advt.gif>
  – this causes the image to be fetched from DoubleClick.com
  – which now knows your IP address and what page you were looking at
• DoubleClick sends back a suitable advertisement
  – with a cookie that identifies "you" at DoubleClick
• next time you get any page that contains a doubleclick.com image
  – the last DoubleClick cookie is sent back to DoubleClick
  – the set of sites and images that you are viewing is used to
    - update the record of where you have been and what you have looked at
    - send back targeted advertising (and a new cookie)
• this does not necessarily identify you personally so far
• but if you ever provide personal identification, it can be (and will be) attached
• defenses:
  – turn off all cookies; turn off "third-party" cookies
  – don't reveal information
  – clean up cookies regularly
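The forms/CGI slide and the cookie slide fit together: a CGI program receives the form data and the browser's Cookie header through environment variables, and what it prints is header lines, an empty line, then HTML. Below is a sketch of that in Python rather than Perl, assuming the standard CGI variables REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH and HTTP_COOKIE; the function `handle_request` and the "visits" cookie are hypothetical. It avoids Python's old cgi module, which was removed in Python 3.13, and uses urllib.parse and http.cookies instead.

```python
import html
import os
import sys
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def handle_request(environ, body=""):
    """Build a CGI response (header lines, empty line, HTML) for one request.

    Hypothetical handler: echoes the form fields and counts visits with a
    'visits' cookie, since HTTP itself remembers nothing between requests.
    """
    # GET forms arrive in QUERY_STRING; POST forms arrive in the request body
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        fields = parse_qs(body)
    else:
        fields = parse_qs(environ.get("QUERY_STRING", ""))

    # the browser returns earlier cookies in the Cookie header, exposed as HTTP_COOKIE
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    visits = 1
    if "visits" in cookies and cookies["visits"].value.isdigit():
        visits = int(cookies["visits"].value) + 1

    reply = SimpleCookie()
    reply["visits"] = str(visits)  # the browser should store this and send it back next time

    rows = "".join(
        f"<br> {html.escape(name)}: {html.escape(' '.join(values))}\n"  # escape user input
        for name, values in fields.items()
    )
    page = ("<html><body><h1>Form results</h1>\n"
            f"<p>Visit number {visits}\n{rows}</body></html>\n")
    headers = [
        "Content-Type: text/html; charset=UTF-8",
        reply.output(header="Set-Cookie:"),  # e.g. "Set-Cookie: visits=2"
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + page


if __name__ == "__main__":
    # Run by a web server as a CGI program: request data comes from the environment and stdin
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.read(length) if length else ""
    sys.stdout.write(handle_request(dict(os.environ), body))
```

Taking the environment as a parameter, instead of reading os.environ inside the handler, keeps the function easy to try out without a web server.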
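The tracking scenario in "Cookie crumbs" can be simulated with a toy browser that keeps one cookie per domain. This is a sketch under simplifying assumptions: domains are compared exactly (real browsers apply domain-matching rules), the Referer header is passed as a plain argument, and the classes, the paths, and the second site news.example are hypothetical. Setting `block_third_party=True` models the "turn off third-party cookies" defense.

```python
from collections import defaultdict
from itertools import count


class Site:
    """An ordinary site: serves pages, sets no cookies, may embed third-party images."""

    def __init__(self, domain, embedded=()):
        self.domain = domain
        self.embedded = list(embedded)  # servers whose <img src=...> appear on its pages

    def handle(self, cookie, referer):
        return None  # no Set-Cookie header


class AdServer:
    """A hypothetical third-party image server that tags each browser with an ID cookie."""

    def __init__(self, domain):
        self.domain = domain
        self.ids = count(1)
        self.history = defaultdict(list)  # cookie ID -> pages the browser was viewing

    def handle(self, cookie, referer):
        if cookie is None:
            cookie = f"id={next(self.ids)}"  # first contact: assign a new identity
        self.history[cookie].append(referer)  # the Referer reveals the embedding page
        return cookie  # sent back as Set-Cookie so the ID persists


class Browser:
    """Keeps one cookie per domain and sends it back only to the domain that set it."""

    def __init__(self, block_third_party=False):
        self.jar = {}  # domain -> cookie text, stored verbatim
        self.block_third_party = block_third_party

    def fetch(self, server, referer=None, allow_cookies=True):
        cookie = self.jar.get(server.domain) if allow_cookies else None
        set_cookie = server.handle(cookie, referer)
        if allow_cookies and set_cookie is not None:
            self.jar[server.domain] = set_cookie

    def visit(self, site, path="/"):
        page_url = f"http://{site.domain}{path}"
        self.fetch(site)  # the page itself
        for server in site.embedded:  # then each embedded image
            third_party = server.domain != site.domain
            self.fetch(server, referer=page_url,
                       allow_cookies=not (self.block_third_party and third_party))


if __name__ == "__main__":
    ads = AdServer("doubleclick.com")
    xyz = Site("xyz.com", embedded=[ads])
    news = Site("news.example", embedded=[ads])  # hypothetical second site
    browser = Browser()
    browser.visit(xyz, "/shoes")
    browser.visit(news, "/politics")
    print(dict(ads.history))
    # {'id=1': ['http://xyz.com/shoes', 'http://news.example/politics']}
```

Neither xyz.com nor news.example ever sees the DoubleClick cookie, yet the ad server links both visits to one ID. With third-party cookies blocked, each image request arrives without a cookie, so the two visits get unrelated IDs.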