Internet, intranet and Web
Lecture II
World Wide Web: standards, protocols, documents
Marco Solieri
marco.solieri@lipn.univ-paris13.fr
Info et Réseaux en Apprentissage, Sup Galilée, Paris 13
November 6th, 2014
Outline
1 World Wide Web
2 Uniform Resource Identifier
3 HyperText Transfer Protocol
4 HTTP extensions
5 Web architectures
6 Web markup
7 XHTML 1 and HTML 4 markup

M. Solieri (AIR 2 – Sup Galilée – Paris 13) IWEB: World Wide Web 11/6/2014
Section 1: World Wide Web
Hypermedia
Ted Nelson, 1960s:
• removal of the predetermined sequence of a text: hypertext
• (never-completed) implementation of the needed technologies: Xanadu project
Definition (Hypertext): A text with accessible references (hyperlinks) to other text.
Tim Berners-Lee, 1980s:
• simplification of the hypertext concept
• design of the needed technologies: WorldWideWeb
Definition (World Wide Web): A system of hypermedia accessed via the Internet and realized with a client-server architecture.
Web client-server architecture
Client (browser)
• user interface for read-only access to hypermedia:
  • visualize text and image,
  • play sound and video;
• retrieval of documents from the server;
• extendible via software components:
  • plugins: locally stored
  • scripts: remotely downloaded (JavaScript programming language)
Server
• transfer of hypermedia to the client
• access to hypermedia, either:
  • local (e.g. file) or remote (e.g. record in a DB)
  • static or dynamic (generated by software)
Basics of Web
Resource identifier (URI)
• identification for hypermedia and anything else,
• target for hyperlinks.
Communication protocol (HTTP)
• client-server stateless communication
• access to resources on the Internet
Document language (XHTML)
• realize hypertext and hypermedia
• hyperlinks support
Section 2: Uniform Resource Identifier
Features: Resources
Definition (Resource): An object available on the World Wide Web.
What a resource could be:
• a file stored in a filesystem (e.g. a JPEG photo)
• a record of data (e.g. from a DB)
• a file output of an application (e.g. a PDF document)
• a concrete object (e.g. a person or a book)
• an abstract concept (e.g. a grammar of a language)
• ...
Features: Uniformity (or universality)
Common syntax for resources that could be located on:
• the Web, accessible via HTTP
• the Internet, accessible via the appropriate protocol (e.g. FTP)
Simple:
• protocol independence,
• self-contained (includes any information needed),
• cost-effective in storage and communication (string format).
Definition: Types
Name (URN): unique, permanent and non-repudiable tag
Locator (URL): information for effective access
Definition: Syntax
Definition (URI syntax):
scheme : [// authority] path [? query] [# fragment]
where
Scheme: arbitrary string; the protocol name in the case of a URL (registered with IANA)
Authority: hierarchical name of the authority responsible for a subspace of names; its form is [userinfo @] host [: port], where host is a domain name or an IP address
Path: hierarchical name of the resource, with / as separator
Query: specification of the resource; typically the separator is &, the form is parameter=value, and a space is written as +
Fragment: secondary resource, internal or relative to the primary one
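As a minimal sketch of the syntax above, Python's standard urllib.parse module can split a URI into its five components. The URI used here is made up for illustration (user, host, port and parameters are assumptions, not part of the lecture).

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URI, chosen only so that every component of the syntax is present.
uri = "http://john@www.example.com:8080/docs/index.html?lang=en&q=web+standards#intro"

parts = urlsplit(uri)

print(parts.scheme)    # scheme
print(parts.netloc)    # authority: [userinfo @] host [: port]
print(parts.username, parts.hostname, parts.port)
print(parts.path)      # hierarchical path, separated by /
print(parts.query)     # raw query string
print(parts.fragment)  # secondary resource

# Split the query on & and =, decoding + as a space.
params = parse_qs(parts.query)
print(params)
```

Note that parse_qs maps each parameter to a list of values, since the same parameter may appear more than once in a query.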
Definition: Examples
Example (URIs):
• http://www.ietf.org/rfc/rfc2396.txt
• ftp://ftp.is.co.za/rfc/rfc1808.txt
• cid:foo4%25foo1@bar.net
• mailto:John.Doe@example.com
• news:comp.infosystems.www.servers.unix
• file:///home/john/Documents/file.tex
• urn:isbn:0-486-27557-4
Definition: Special characters
Reserved characters: ; / ? : @ & = + $ ,
Escaped characters (with %NN, where NN is the hexadecimal code):
• control characters, i.e. ASCII < 32
• non-ASCII characters, i.e. Latin-1 > 127
• unwise characters: { } | \ ^ [ ] `
• delimiters: < > # % "
• reserved characters used with a different meaning
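The %NN escaping can be tried with quote and unquote from Python's urllib.parse. This is only a sketch: the input strings are invented, except for the cid example, which comes from the list of URI examples in this lecture. Note that quote encodes non-ASCII characters as UTF-8 by default, so the Latin-1 behaviour has to be requested explicitly.

```python
from urllib.parse import quote, unquote

# Space and "unwise" characters are escaped; "/" is kept by default (safe="/").
escaped_path = quote("/docs/file name{1}.txt")
print(escaped_path)

# Reserved characters used as plain data must be escaped too: pass safe="".
escaped_value = quote("a&b=c", safe="")
print(escaped_value)

# Non-ASCII character, encoded here as a single Latin-1 byte.
escaped_latin1 = quote("é", encoding="latin-1")
print(escaped_latin1)

# Decoding: %25 is the escaped form of the % delimiter itself.
decoded = unquote("foo4%25foo1@bar.net")
print(decoded)
```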
Dynamics: Operations on URIs
Resolution
• generation of the corresponding absolute URL
• input, either:
  • a URI reference (i.e. a relative URI)
  • a URI which is not a URL
• output: a URL
Dereferencing
• retrieval of the corresponding resource
• input: a URL
• output: a resource
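Resolution of a relative URI reference can be sketched with urljoin from Python's urllib.parse. The base URL is one of the lecture's URI examples, while the relative references are invented for illustration. The sketch covers only the relative-reference case, not the resolution of a URI that is not a URL (e.g. a URN).

```python
from urllib.parse import urljoin

# Base URL: the document in which the relative references are found.
base = "http://www.ietf.org/rfc/rfc2396.txt"

# Hypothetical relative references and the absolute URLs they resolve to.
sibling = urljoin(base, "rfc1808.txt")    # same directory as the base
parent = urljoin(base, "../index.html")   # one level up
fragment = urljoin(base, "#section-3")    # secondary resource of the base

for url in (sibling, parent, fragment):
    print(url)
```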
Section 3: HyperText Transfer Protocol
Connections: HTTP features
Client-server architecture: the client opens the connection and requests a service; the server replies and closes the connection.
Data independence: support for the transfer of HTML documents and of any other format, via content negotiation.
Statelessness: each HTTP request must contain all the information needed for the response.
Caching: support for the implementation of various caching policies and tools.
Authentication: specifications for various techniques of user authentication.
Connections: Roles in an HTTP communication
Necessary roles:
User agent: the client which initiates the HTTP request (e.g. a browser or a bot)
Origin server: the server that owns the resource
Optional extra roles:
Proxy: an application acting as both a server and a client, and controlling the communication
• transparently (e.g. caching), or
• not transparently (e.g. verification, filtering, enriching).
Gateway: an application acting as if it were the origin server (e.g. load balancing or load layering)
Connections: Connection and persistence
Definition (HTTP 1.0 connection): A request by the client and a reply by the server.
Cons of having a distinct TCP connection for each HTTP request:
• network overhead,
• computation overhead,
• time overhead.
HTTP 1.1 (IETF RFCs 2616, 2617) introduces connection persistence.
Definition (HTTP 1.1 connection): A sequence alternating a request and a reply.
Definition (HTTP 1.1 pipelining connection): A sequence of requests, sent without waiting for each reply, and of replies returned in the same order as the requests.
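Connection persistence can be observed with a small local experiment, sketched here with Python's standard library only. Everything named below (EchoPortHandler, fetch_on_one_connection, the paths) is hypothetical. The throwaway server replies with the client's TCP port, so two replies carrying the same port show that both requests travelled on the same TCP connection.

```python
import http.client
import socketserver
import threading
from http.server import BaseHTTPRequestHandler


class EchoPortHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the TCP port the client connected from."""

    protocol_version = "HTTP/1.1"  # lets the server keep the connection open

    def do_GET(self):
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_on_one_connection(paths):
    """Send one GET per path over a single HTTP 1.1 connection to a local server."""
    server = socketserver.TCPServer(("127.0.0.1", 0), EchoPortHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        results = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read in full before the connection is reused.
            results.append((response.status, response.read().decode("ascii")))
        return results
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_on_one_connection(["/a", "/b"]))
```

The requests here still alternate strictly with the replies; pipelining is not exercised by this sketch.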
Requests: Request
Definition (HTTP Request): A MIME-like message with the following syntax:
Method URI Version CRLF [Header CRLF]* CRLF [Body]
where
Method: the type of action requested
Version: one of HTTP/1.0 and HTTP/1.1
Header: parameters of the transmission, of the entity and of the request
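The request syntax can be made concrete by assembling the message by hand. The helper below, build_request, is a hypothetical name introduced only for this sketch. It performs no validation and is not how a real client library should be written.

```python
def build_request(method, uri, headers=None, body=b"", version="HTTP/1.1"):
    """Assemble: Method URI Version CRLF [Header CRLF]* CRLF [Body]."""
    lines = [f"{method} {uri} {version}"]            # request line
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")             # one header per line
    head = "\r\n".join(lines) + "\r\n\r\n"           # empty line ends the headers
    return head.encode("ascii") + body               # optional body follows


message = build_request("GET", "/rfc/rfc2396.txt", {"Host": "www.ietf.org"})
print(message.decode("ascii"))
```

With HTTP/1.1 the Host header is mandatory, which is why it appears even in this smallest example.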
Requests: Request methods
Main methods:
GET: Retrieve a representation of a resource (since HTTP 0.9). It can be:
• conditional, when specifying a criterion (e.g. If-Match, If-Modified-Since);
• partial, when specifying a portion of the resource.
HEAD: Retrieve the server's information about a resource. Asks for a reply message without body: headers only.
POST: Attach information to a resource. Used for data submission (e.g. from a form).
PUT: Insert a resource. Creates a new resource or replaces the old one.
DELETE: Remove a resource and any related information.
Note: PUT and DELETE offer no access control (see WebDAV).
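The methods above can be illustrated by preparing (not sending) requests with Python's urllib.request.Request. The URL, the date, the byte range and the payloads are all assumptions made up for this sketch. Nothing is transmitted over the network.

```python
from urllib.request import Request

# Hypothetical resource; no request is actually sent in this sketch.
url = "http://www.example.com/docs/report.txt"

plain_get = Request(url)
# Conditional GET: transfer the resource only if it changed after the given date.
conditional_get = Request(url, headers={"If-Modified-Since": "Thu, 06 Nov 2014 00:00:00 GMT"})
# Partial GET: ask only for the first 100 bytes of the resource.
partial_get = Request(url, headers={"Range": "bytes=0-99"})
head = Request(url, method="HEAD")
post = Request(url, data=b"name=John&lang=en", method="POST")
put = Request(url, data=b"new content of the resource", method="PUT")
delete = Request(url, method="DELETE")

for req in (plain_get, conditional_get, partial_get, head, post, put, delete):
    print(req.get_method(), req.full_url, dict(req.header_items()))
```

Request stores header names in capitalized form (e.g. If-modified-since). Since HTTP header names are case-insensitive, this does not change their meaning.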