CMPT-401: Course Information
- Instructor: Qiaosheng Shi
- Lectures: Tues. 17:30-20:20
- Office hours:
- Course website: http://www.cs.sfu.ca/CC/401/qshi1/
- TA:
- Textbook: G. Coulouris, J. Dollimore, T. Kindberg, “Distributed Systems: Concepts and Design”, 4th edition.
- Grading policy
    • 4 assignments: 20% (the 3rd assignment includes a 15-20 minute presentation, 0-3 bonus points)
    • Project: 35% (0-5 bonus points)
    • Midterm: 15%
    • Final: 30%

CMPT-401: Course Information
- Late Policy
- Teaching plan
    • Lectures 1-2: Chapters 1-3
    • Lectures 3-4: Chapters 4-5
    • Lectures 5-6: Chapter 7
    • Lecture 7 (June 20): Midterm
    • Lectures 8-9: Chapters 9, 11, 13
    • Lectures 10-11 (July 11): Homework 3 (15-minute presentation)
    • Lecture 12: Chapter 19, … …
    • Lecture 13 (Aug. 1): Review & project presentation
    • Lecture 14 (Aug. 8): Final exam

Today’s topics
- Chapter 1: Characterization of Distributed Systems
    • Definition of distributed systems
    • Examples
    • Resource sharing and the Web
    • Challenges
- Chapter 2: System Models
    • Architectural models
    • Fundamental models

Distributed computing
- Huge waste of computing resources
- Huge computation requirements
    • The film Toy Story: 117 Sun SPARCstations rendered 114,000 frames (77 minutes of film). One computer would have needed 43 years of non-stop computing to do this!
[Figure panels: Sharing computing resources; Web services; Sharing information; Restaurant]

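As a rough, idealized check of the Toy Story figures, assume the work divides perfectly among the 117 machines with no coordination or scheduling overhead (an assumption; real render farms also lose time to re-renders and bookkeeping):

```latex
T_{117} \approx \frac{T_1}{117} = \frac{43 \times 365\ \text{days}}{117} = \frac{15\,695\ \text{days}}{117} \approx 134\ \text{days},
\qquad
\frac{114\,000\ \text{frames}}{117\ \text{machines}} \approx 974\ \text{frames per machine}
```

Under this assumption, sharing the computation turns decades of work into a few months.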
Definition of a Distributed System
- Motivation
    • Sharing resources: hardware, files, databases, data objects, etc.
- A distributed system is:
    • a collection of independent computers that appears to its users as a single coherent system;
    • a system of networked computers that coordinate their activity only by message passing.
- Resources in a distributed system are
    • encapsulated within computers,
    • accessed through communication,
    • managed by a server program,
    • either local or remote.

Definition of a Distributed System
- Distributed computing: “a science which solves a large problem by giving small parts of the problem to many computers to solve and then combining the solutions for the parts into a solution for the problem.”
- Distributed application: “an application in which the processing and the data are divided among two or more machines.”

Characteristics of distributed systems
- Concurrency
- No global clock
- Distributed transactions
- Independent failures
    • Network failures
    • Computer failures

Resource sharing and the Web
- Main motivation of distributed systems: resource sharing
    • Hardware: processors, printers, disks
    • Software-defined entities: files, databases, …
    • Functionality: search engines
- Service: manages a collection of related resources and presents their functionality to users and applications
    • Resources are accessed via a well-defined set of operations.
- Server: a running program that accepts requests from programs running on other computers to perform a service
- Clients: the requesting processes

Client/server model
[Figure: clients send invocations to servers and receive results; the key distinguishes processes from computers]
- A client "invokes an operation" on a server: a "remote invocation"
- Clients are active and servers are passive
- Examples: the World Wide Web (3W), email, and networked printers
- Caching technique vs. buffering

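The invocation/result exchange can be sketched in-process, with Python threads standing in for separate computers and queues standing in for the network (a simplification: a real system sends messages over sockets). The server is passive and only reacts to request messages; the client is active, invokes an operation, and keeps a simple cache. The `read` operation, the key/value store, and the class and function names are hypothetical.

```python
import queue
import threading
from typing import Dict


def start_server(request_queue: queue.Queue) -> threading.Thread:
    """Start a passive server thread that only reacts to incoming requests."""
    store = {"greeting": "hello", "answer": "42"}  # hypothetical resources

    def serve() -> None:
        while True:
            msg = request_queue.get()       # wait for a client's invocation
            if msg is None:                 # sentinel value: shut down
                break
            operation, key, reply_to = msg
            if operation == "read":
                result = store.get(key, "not found")
            else:
                result = "unknown operation"
            reply_to.put(result)            # send the result back

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread


class Client:
    """An active client that invokes operations and caches their results."""

    def __init__(self, request_queue: queue.Queue) -> None:
        self.request_queue = request_queue
        self.cache: Dict[str, str] = {}
        self.remote_calls = 0

    def read(self, key: str) -> str:
        if key in self.cache:               # cache hit: no remote invocation
            return self.cache[key]
        reply_to: queue.Queue = queue.Queue()
        self.request_queue.put(("read", key, reply_to))  # invocation
        result = reply_to.get(timeout=5)                 # wait for the result
        self.remote_calls += 1
        self.cache[key] = result
        return result


request_queue: queue.Queue = queue.Queue()
server = start_server(request_queue)
client = Client(request_queue)
first = client.read("greeting")     # remote invocation
second = client.read("greeting")    # answered from the client's cache
request_queue.put(None)             # stop the server
server.join(timeout=5)
```

Unlike a buffer, which only holds data in transit, the cache lets the second read skip the remote invocation entirely; the cost is that a cached value can become stale if the server's copy changes.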
The WWW (World Wide Web)
- An evolving system for publishing and accessing resources and services across the Internet
- Accessed through web browsers; a hypertext linking mechanism connects documents to related documents
- Open system
    • Many web browsers (IE, Netscape), many platforms (cell phones, desktops), many types of services
    • Open with respect to the types of resource that can be published on it: PDF files, JPEG images, MPEG-1 video, MPEG-2 video, etc.
    • Browsers support new content-representation formats via plug-ins

The WWW (World Wide Web)
- Standard technological components:
    • HTML (HyperText Markup Language): specifies the content and layout of web pages
    • URL (Uniform Resource Locator): identifies the location of a resource, helping browsers locate the site where it is stored
    • HTTP (HyperText Transfer Protocol): defines how clients (e.g., browsers) interact with web servers

HTML (HyperText Markup Language)
- The publishing language of the Web (3W)
- HTML 4.01 [Dec. 1999]
- A format based on SGML
    • SGML (Standard Generalized Markup Language): a system for organizing and tagging elements of a document [ISO 8879, 1986]
    • HTML is one way of defining and interpreting tags according to SGML rules, e.g., <p>, </p>
- Editors: plain text editors (e.g., Notepad), WYSIWYG editors (e.g., FrontPage)
- XML (eXtensible Markup Language)
    • A simplified version of SGML
    • More flexible and adaptable than HTML
- One topic for a short presentation: comparison of SGML, HTML, and XML, with a focus on XML

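To illustrate what "extensible" means, here is a small XML document whose tag names (`course`, `lecture`, `topic`) are invented for this sketch, parsed with Python's standard `xml.etree.ElementTree`. XML parsers also insist that documents be well-formed (every tag closed and properly nested), which is stricter than how browsers typically treat HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: XML lets authors define their own tags.
doc = """
<course code="CMPT-401">
  <lecture number="1"><topic>Characterization of distributed systems</topic></lecture>
  <lecture number="2"><topic>System models</topic></lecture>
</course>
"""

root = ET.fromstring(doc.strip())
topics = [lec.findtext("topic") for lec in root.findall("lecture")]
numbers = [int(lec.get("number")) for lec in root.findall("lecture")]


def is_well_formed(text: str) -> bool:
    """XML parsers reject documents whose tags are unclosed or badly nested."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```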
An example: HTML
    <title>My first HTML document</title>
    ...
    <h1>An important heading</h1>
    <h2>A slightly less important heading</h2>
    <p>A paragraph</p>
    This is a really <em>interesting</em> topic!
    <ul>
      <li>the first list item</li>
      <li>the second list item</li>
    </ul>
    <ol>
      <li>the first list item</li>
      <li>the second list item</li>
    </ol>

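A browser first has to interpret these tags as document structure. As a rough sketch of that step, Python's standard `html.parser.HTMLParser` can walk a shortened version of the page and report the start tags and list-item text it finds; the `TagCollector` class is an illustrative name, not part of any library.

```python
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Records start tags in document order and collects list-item text."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []
        self.list_items: List[str] = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        self._in_li = tag == "li"

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li and data.strip():
            self.list_items.append(data.strip())


page = """<title>My first HTML document</title>
<h1>An important heading</h1>
<p>A paragraph</p>
This is a really <em>interesting</em> topic!
<ul><li>the first list item</li><li>the second list item</li></ul>
<ol><li>the first list item</li><li>the second list item</li></ol>"""

parser = TagCollector()
parser.feed(page)
parser.close()
```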
URL (Uniform Resource Locators)
- Resource location identifier
- General form: scheme:scheme-specific-identifier
- HTTP URLs: http://servername[:port][/pathname][?query][#fragment]

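The parts of an HTTP URL can be pulled apart with Python's standard `urllib.parse.urlsplit`. The first URL is the course website; the second is an invented example (example.com is reserved for documentation) with every optional part filled in. The default-port table in `effective_port` is an assumption covering only `http` and `https`.

```python
from urllib.parse import urlsplit

# Course website URL from the course information.
site = urlsplit("http://www.cs.sfu.ca/CC/401/qshi1/")

# Invented example using every optional part of the HTTP URL form.
full = urlsplit("http://www.example.com:8080/courses/index.html?term=summer#grading")

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumption: only these two schemes


def effective_port(url: str) -> int:
    """Port a browser would connect to: the explicit :port, else the scheme default."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
```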
HTTP (HyperText Transfer Protocol)
- A request-reply protocol
    • Replies carry a status, e.g., ‘404 Not Found’
- Content types can be specified in requests
- One resource per request
- Access control
- Dynamic pages
    • Interacting rather than retrieving: the result of a request depends on the user’s input
    • Generated by a CGI (Common Gateway Interface) program
- Downloaded code (mobile code)
    • Runs inside the browser on the user’s computer
    • Provides better-quality interaction
    • Examples: JavaScript, applets

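To make the request-reply pattern concrete, here is a toy, in-memory sketch of an HTTP exchange: the client side formats a GET request naming exactly one resource (with an `Accept` header stating the content type it wants), and a hypothetical `handle_request` function plays the server, replying `200 OK` or `404 Not Found`. No sockets are involved, and a real web server handles far more of the protocol.

```python
# Toy in-memory sketch of HTTP's request-reply exchange (no sockets).
# RESOURCES and handle_request() are hypothetical stand-ins for a web server.

RESOURCES = {"/index.html": ("text/html", "<p>Welcome</p>")}


def build_request(path: str, host: str) -> str:
    """Client side: one GET request names exactly one resource."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: text/html\r\n"        # content type the client will accept
        "\r\n"
    )


def handle_request(raw: str) -> str:
    """Server side: parse the request line and build a reply message."""
    method, path, version = raw.split("\r\n", 1)[0].split(" ")
    if method != "GET":
        status, ctype, body = "405 Method Not Allowed", "text/plain", ""
    elif path in RESOURCES:
        ctype, body = RESOURCES[path]
        status = "200 OK"
    else:
        status, ctype, body = "404 Not Found", "text/plain", "no such resource"
    return (
        f"{version} {status}\r\n"
        f"Content-Type: {ctype}\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        + body
    )


def status_of(reply: str) -> str:
    """Return the status code and phrase, e.g. '404 Not Found'."""
    return reply.split("\r\n", 1)[0].split(" ", 1)[1]


ok = handle_request(build_request("/index.html", "www.example.com"))
missing = handle_request(build_request("/missing.html", "www.example.com"))
```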
Challenges (or desired properties)
- Openness
- Scalability
- Transparency
- Concurrency
- Heterogeneity of
    • networks, computer hardware, operating systems
    • programming languages
    • implementations by different developers
    • Solution: middleware
- Security
- Failure handling

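One concrete form of hardware heterogeneity is byte order: the same 32-bit integer is laid out differently on big-endian and little-endian machines. A short sketch with Python's standard `struct` module shows why middleware marshals data into an agreed external representation (here, network byte order, written `!`); the value 401 is arbitrary.

```python
import struct

value = 401  # 0x00000191

big_endian = struct.pack(">I", value)     # most significant byte first
little_endian = struct.pack("<I", value)  # least significant byte first (e.g. x86)

# Agreed external representation: both sides use network byte order ("!").
on_the_wire = struct.pack("!I", value)
received = struct.unpack("!I", on_the_wire)[0]

# What goes wrong if a receiver ignores the convention and reads the
# big-endian bytes as little-endian:
misread = struct.unpack("<I", big_endian)[0]
```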
Security
- Confidentiality: protection against disclosure to unauthorized individuals
- Integrity: protection against alteration or corruption
- Availability: protection against interference with the means to access the resources
- Techniques: encryption, access control

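Integrity protection can be illustrated with a message authentication code, sketched here with Python's standard `hmac` and `hashlib` modules. The shared key is a hypothetical placeholder, and how the two parties obtain it (key distribution) is outside this sketch. An HMAC detects alteration but does not provide confidentiality; that would additionally require encryption.

```python
import hashlib
import hmac

# Hypothetical placeholder; real keys must be distributed securely.
SHARED_KEY = b"hypothetical-shared-key"


def make_tag(message: bytes) -> bytes:
    """Sender: compute an HMAC-SHA256 tag over the message."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def is_intact(message: bytes, tag: bytes) -> bool:
    """Receiver: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(message), tag)


message = b"transfer 100 to account 42"
tag = make_tag(message)
```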
Failure handling
- Failures in a distributed system are harder to handle than local ones, since they involve a network, other computers, and other processes
- Techniques: detecting, masking, tolerating, and recovering from failures
- Redundancy
    • There should always be at least two different routes between any two routers in the Internet
    • Data replication
    • However, replication raises a new challenge: keeping replicas of rapidly changing data up to date without excessive loss of performance

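Masking a failure through redundancy can be sketched as failover across replicas: the caller detects a failure (here, an exception) and retries the request, moving on to another replica if needed. The replicas are simulated as plain functions, and `ReplicaUnavailable` and `call_with_failover` are hypothetical names, not a library API.

```python
from typing import Callable, Optional, Sequence


class ReplicaUnavailable(Exception):
    """Raised by a simulated replica that has crashed or timed out."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str,
                       attempts_per_replica: int = 2) -> str:
    """Mask individual failures by retrying, then moving on to the next replica."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        for _ in range(attempts_per_replica):
            try:
                return replica(request)
            except ReplicaUnavailable as err:   # failure detected ...
                last_error = err                # ... and masked by trying again
    raise RuntimeError("all replicas failed") from last_error


def crashed_replica(request: str) -> str:
    raise ReplicaUnavailable("replica A is down")


def healthy_replica(request: str) -> str:
    return f"replica B handled {request}"


result = call_with_failover([crashed_replica, healthy_replica], "read x")
```

Retrying in this way is only safe when the request is idempotent (repeating it has the same effect as doing it once); otherwise a request that actually succeeded before a timeout could be executed twice.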
Architectural models
- Layered structure of distributed system software
- The main architectural models
[Figure: layered software structure; lowest layer labeled Platform]
- The platform does not provide a view of a single coherent system. How is transparency achieved?

Solution: Middleware
- Masks heterogeneity
- Provides a convenient programming model to application programmers
- Offers a collection of services and provides interfaces to them
    • Application programming interfaces (APIs)
- Middleware models describe distribution and communication
    • A simple model: Remote Procedure Calls (RPCs)
    • Object-oriented middleware products: Java RMI, CORBA
    • Web services (a simple and effective model of distributed documents: HTML)

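The idea behind RPC middleware, making a remote call look like a local one, can be sketched with a client stub that marshals the call into a message and a server skeleton that unmarshals it and dispatches to the named procedure. In this sketch JSON is an assumed wire format and the "network" is just a function call; `ClientStub`, `server_skeleton`, and the exported `add` procedure are hypothetical names. Real middleware such as Java RMI or CORBA also handles naming, connections, and failures.

```python
import json
from typing import Any, Callable, Dict


# ---- server side ------------------------------------------------------
def add(a: int, b: int) -> int:
    """A hypothetical procedure exported by the server."""
    return a + b


EXPORTED: Dict[str, Callable[..., Any]] = {"add": add}


def server_skeleton(request_bytes: bytes) -> bytes:
    """Unmarshal the request, call the named procedure, marshal the reply."""
    request = json.loads(request_bytes)
    procedure = EXPORTED.get(request["proc"])
    if procedure is None:
        reply = {"id": request["id"], "error": "no such procedure"}
    else:
        reply = {"id": request["id"], "result": procedure(*request["args"])}
    return json.dumps(reply).encode()


# ---- client side ------------------------------------------------------
class ClientStub:
    """Makes a remote call look like a local one to the application."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self.transport = transport  # stands in for the network
        self.next_id = 0

    def call(self, proc: str, *args: Any) -> Any:
        self.next_id += 1
        request = {"id": self.next_id, "proc": proc, "args": list(args)}
        reply = json.loads(self.transport(json.dumps(request).encode()))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


stub = ClientStub(transport=server_skeleton)
total = stub.call("add", 2, 3)
```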
Limitations of middleware
- Some functions require knowledge that only the applications have
    • Providing such a function entirely within the system is then impossible
- The “end-to-end argument” [Saltzer, Reed, Clark 1984]
    • An argument against implementing such functions at low levels of the system
    • An incomplete version of the function, provided at a low level, may sometimes be useful (e.g., as a performance enhancement)

End-to-end caretaking
- Careful file transfer: move a file from host A to host B without damage
- Steps of the transfer:
    • The file transfer application on A reads the file from disk
    • The application asks the communication system to transmit the file
    • The communication network moves the file from A to B
    • The communication system on B receives the packets and delivers them to the file transfer application on B
    • The file transfer application on B writes the file to disk

End-to-end caretaking
- Threats to the transfer:
    • Incorrect data may be read from the disk
    • The software may make a mistake in buffering and copying the data
    • The communication system may drop, corrupt, or duplicate packets
    • Either A or B may crash
- How can these threats be handled?
    • Reinforce each step along the way (duplicate copies, timeout and retry, redundancy, crash recovery, etc.)? This is uneconomical if all threats have low probability.
    • Alternative: “end-to-end check and retry”. As an additional step, B reads the file back into memory, calculates a checksum, and sends it to A; if the checksums don’t match, the transfer is retried from the beginning.

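The end-to-end check and retry can be sketched as follows. The whole A-to-B path (disk read, buffering, network, disk write) is simulated by a hypothetical `FlakyPipeline` that damages the file on its first few runs, and SHA-256 from Python's `hashlib` stands in for the checksum. Only the final comparison of the two checksums decides whether the transfer succeeded, whichever intermediate step caused the damage.

```python
import hashlib
from typing import Tuple


class FlakyPipeline:
    """Hypothetical stand-in for the whole A-to-B path (disk read, buffering,
    network, disk write); it damages one byte on each of its first `bad_runs` copies."""

    def __init__(self, bad_runs: int) -> None:
        self.bad_runs = bad_runs

    def copy(self, data: bytes) -> bytes:
        if self.bad_runs > 0 and data:
            self.bad_runs -= 1
            damaged = bytearray(data)
            damaged[len(damaged) // 2] ^= 0xFF   # flip the bits of one byte
            return bytes(damaged)
        return data


def careful_transfer(data: bytes, pipeline: FlakyPipeline,
                     max_attempts: int = 5) -> Tuple[bytes, int]:
    """End-to-end check and retry: compare checksums computed at both ends."""
    checksum_at_a = hashlib.sha256(data).hexdigest()
    for attempt in range(1, max_attempts + 1):
        file_on_b = pipeline.copy(data)
        # B reads the file back, computes a checksum, and sends it to A.
        checksum_from_b = hashlib.sha256(file_on_b).hexdigest()
        if checksum_from_b == checksum_at_a:
            return file_on_b, attempt
        # Mismatch: retry the whole transfer from the beginning.
    raise RuntimeError(f"transfer failed after {max_attempts} attempts")


original = b"contents of the file on host A\n" * 100
copy_on_b, attempts = careful_transfer(original, FlakyPipeline(bad_runs=2))
```

Because the check covers the file as stored on B, it catches damage from any step; reinforcing individual steps then matters only as a performance optimization that reduces how often the whole transfer must be repeated.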