Distributed Systems Principles and Paradigms Chapter 12 (version October 15, 2007 ) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20. Tel: (020) 598 7784 E-mail:steen@cs.vu.nl, URL: www.cs.vu.nl/ ∼ steen/ 01 Introduction 02 Architectures 03 Processes 04 Communication 05 Naming 06 Synchronization 07 Consistency and Replication 08 Fault Tolerance 09 Security 10 Distributed Object-Based Systems 11 Distributed File Systems 12 Distributed Web-Based Systems 13 Distributed Coordination-Based Systems 00 – 1 /
Distributed Web-Based Systems Essence: The WWW is a huge client-server system with millions of servers; each server hosting thousands of hyperlinked documents: 2. Server fetches Client machine Server machine document from local file Browser Web server OS 3. Response 1. Get document request (HTTP) • Documents are generally represented in text (plain text, HTML, XML) • Alternative types: images, audio, video, but also applications (PDF , PS) • Documents may contain scripts that are executed by the client-side software 12 – 1 Distributed Web-Based Systems/12.1 Architecture
Multi-tiered Architectures Observation: Already very soon, Web sites were or- ganized into three tiers: 3. Start process to fetch document 1. Get request 4. Database interaction HTTP� CGI� request� program handler 6. Return result 5. HTML document� created Database server Web server CGI process 12 – 2 Distributed Web-Based Systems/12.1 Architecture
Web Services Observation: At a certain point, people started rec- ognizing that it is was more than just user ↔ site in- teraction: sites could offer services to other sites ⇒ standardization is then badly needed. Client machine Server machine Look up� a service Client� Server� Publish service application application Stub Stub SOAP Communication� Communication� subsystem subsystem Generate stub� Generate stub� from WSDL� from WSDL� description description Service description (WSDL) Service description (WSDL) Service description (WSDL) Directory service (UDDI) 12 – 3 Distributed Web-Based Systems/12.1 Architecture
Clients: Web browsers Observation: browsers form the Web’s most impor- tant client-side sofware. They used to be simple, but that is long ago. Display back end User interface Browser engine Rendering engine Client-side� Network� HTML/XML� script� interpreter comm. parser 12 – 4 Distributed Web-Based Systems/12.2 Processes
Apache Web Server Observation: More than 70% of all Web sites are based on Apache. The server is internally organized more or less according to the steps needed to process an HTTP request: Module Module Module Function ... ... ... Link between� function and hook Hook Hook Hook Hook Apache core Functions called per hook Request Response 12 – 5 Distributed Web-Based Systems/12.2 Processes
Server Clusters (1/2) Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients: Web Web Web Web server server server server LAN Front end handles Front all incoming requests end and outgoing responses Request Response Problem: The front end may easily get overloaded, so that special measures need to be taken. Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account. Content-aware distribution: Front end reads the con- tent of the HTTP request and then selects the best server. 12 – 6 Distributed Web-Based Systems/12.2 Processes
Server Clusters (2/2) Question: Why can content-aware distribution be so much better? 6. Server responses Web 5. Forward server 3. Hand of other f TCP connection messages Distributor Other messages Dis- Client Switch 4. Inform patcher switch Setup request Distributor 1. Pass setup request 2. Dispatcher selects to a distributor server Web server 12 – 7 Distributed Web-Based Systems/12.2 Processes
Communication (1/2) Essence: Communication in the Web is generally based on HTTP; a relatively simple client-server transfer pro- tocol having the following request messages: Operation Description Head Request to return the header of a document Get Request to return a document to the client Put Request to store a document Post Provide data that are to be added to a docu- ment (collection) Delete Request to delete a document 12 – 8 Distributed Web-Based Systems/12.3 Communication
Communication (2/2) C/S Header Contents Accept C The type of documents the client can handle Accept-Charset C The character sets are acceptable for the client Accept- C The document encodings the client can handle Encoding Accept- C The natural language the client can handle Language Authorization C A list of the client’s credentials WWW- S Security challenge the client should respond to Authenticate Date C+S Date and time the message was sent ETag S The tags associated with the returned document Expires S The time for how long the response remains valid From C The client’s e-mail address Host C The TCP address of the document’s server If-Match C The tags the document should have If-None-Match C The tags the document should not have If-Modified- C Tells the server to return a document only if it has Since been modified since the specified time If-Unmodified- C Tells the server to return a document only if it has Since not been modified since the specified time Last-Modified S The time the returned document was last modified Location S A document reference to which the client should redirect its request Referer C Refers to client’s most recently requested document Upgrade C+S The application protocol sender wants to switch to Warning C+S Information about status of the data in the message 12 – 9 Distributed Web-Based Systems/12.3 Communication
SOAP Simple Object Access Protocol: Based on XML, this is the standard protocol for communication be- tween Web services. • SOAP is bound to an underlying protocol (i.e., it is not independent from its carrier) • Conversational exchange style: Send a docu- ment one way, get a filled-in response back. • RPC-style exchange: Used to invoke a Web ser- vice. 12 – 10 Distributed Web-Based Systems/12.3 Communication
A Note on XML Observation: XML has the advantage of allowing self- describing documents. Full stop (i.e., it introduces performance problems and is not meant to be read by human beings) env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"> <env:Header> <n:alertcontrol xmlns:n="http://example.org/alertcontrol"> <n:priority>1</n:priority> <n:expires>2001-06-22T14:00:00-05:00</n:expires> </n:alertcontrol> </env:Header> <env:Body> <m:alert xmlns:m="http://example.org/alert"> <m:msg>Pick up Mary at school at 2pm</m:msg> </m:alert> </env:Body> </env:Envelope> 12 – 11 Distributed Web-Based Systems/12.3 Communication
Naming: URL URL: Uniform Resource Locator tells how and where to access a resource. Scheme Host name Pathname http www.cs.vu.nl /home/steen/mbox :// (a) Scheme Host name Port Pathname http :// www.cs.vu.nl : 80 /home/steen/mbox (b) Scheme Host name Port Pathname http :// 130.37.24.11 : 80 /home/steen/mbox (c) Examples: http HTTP http://www.cs.vu.nl:80/globe mailto Mail mailto:steen@cs.vu.nl ftp FTP ftp://ftp.cs.vu.nl/pub/minix/README file Local file file:/edu/book/work/chp/11/11 data Inline data data:text/plain;charset=iso-8859-7, %e1%e2%e3 telnet Remote login telnet://flits.cs.vu.nl tel Telephone tel:+31201234567 modem Modem modem:+31201234567;type=v32 12 – 12 Distributed Web-Based Systems/12.4
Synchronization: WebDAV Problem: There is a growing need for collaborative auditing of Web documents, but bare-bones HTTP can’t help here. Solution: Web Distributed Authoring and Versioning. • Supports exclusive and shared write locks, which operate on entire documents • A lock is passed by means of a lock token; the server registers the client(s) holding the lock • Clients modify the document locally and post it back to the server along with the lock token Note: There is no specific support for crashed clients holding a lock. 12 – 13 Distributed Web-Based Systems/12.5 Synchronization
Web Proxy Caching Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency pro- tocols: • Always verify validity by contacting server • Age-based consistency: T expire = α · ( T cached − T last modi f ied ) + T cached • Cooperative caching, by which you first check your neighbors on a cache miss: Web server 3. Forward request to Web server 1. Look in local cache Web 2. Ask neighboring proxy caches Web proxy proxy Cache Cache Client Client Client Client Client Client Web proxy HTTP Get request Cache Client Client Client 12 – 14 Distributed Web-Based Systems/12.6 Consistency and Replication
Recommend
More recommend