2/19/2016

Distributed Computing Systems
Overview of Distributed Systems

The Rise of Distributed Systems
• Computer hardware prices falling, power increasing
  – If cars did same, Rolls Royce would cost 1 dollar and get 1 billion miles per gallon (with 200 page manual to open door)
• Network connectivity increasing
  – Everyone is connected with “fat” pipes, even when moving
• It is easy to connect hardware together
  – Layered abstractions have worked very well
• Definition: a distributed system is “A collection of independent computers that appears to its users as a single coherent system”
  Andrew Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, c2002.

Why Distributed Systems?
A. Big data continues to grow:
  - Big data: large pools of data captured, communicated, aggregated, stored, and analyzed
  - In mid-2010, information universe 1.2 zettabytes
  - 2020 predictions 44x more at 35 zettabytes
  - Google processes 20 petabytes of data per day
B. Applications are becoming data-intensive.
  - E.g., data-intensive app: astronomical data parsing

Why Distributed Systems?
C. Individual computers have limited resources compared to scale of current problems & application domains:
1. Caches and Memory:
  - L1 Cache: 16KB-64KB, 2-4 cycles
  - L2 Cache: 512KB-8MB, 6-15 cycles
  - L3 Cache: 4MB-32MB, 30-50 cycles
  - Main Memory: 2GB-16GB, 300+ cycles
  - Hard Drive: 1-5 TB, 3 billion+ cycles

Ying Lu, UNL, CSCE990 Advanced Distributed Systems Seminar http://cse.unl.edu/~ylu/csce990/notes/Introduction.ppt

Why Distributed Systems?
2. Processor:
• Number of transistors integrated on single die has continued to grow at Moore’s pace
• Chip Multiprocessors (CMPs) are now available
[Figure: a single processor chip (P, L1, L2 cache, memory) next to a CMP (several P/L1 pairs joined by an interconnect to a shared L2 cache and memory)]

Why Distributed Systems?
3. Processor (continued):
• CPU speed grows at rate of 55% annually, but mem speed grew only 7%
[Figure: processor-memory speed gap (P vs. M)]
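The processor-memory gap above compounds year over year. The following is a minimal sketch of that arithmetic, assuming the 55% and 7% annual rates quoted on the slide hold constant and that both speeds start from the same baseline; the function name `speed_gap` is hypothetical.

```python
def speed_gap(years, cpu_growth=0.55, mem_growth=0.07):
    """Relative CPU-to-memory speed ratio after `years`, starting from 1.0.

    Assumes constant annual growth rates (a simplification of the slide's figures).
    """
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years


# Print how the gap widens over a decade under these assumed rates.
for y in (1, 5, 10):
    print(f"after {y:2d} years the CPU is {speed_gap(y):.1f}x further ahead of memory")
```

Under these assumptions the gap grows by roughly 45% per year, which is why adding cores alone does not help if data cannot reach them fast enough.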
Why Distributed Systems?
• Even if 100s or 1000s of cores are placed on CMP, challenge to deliver stored data to cores fast enough for processing
[Figure: a CMP loading a data set of 4 TBs over 4 100MB/s IO channels; 10000 seconds (or 3 hours) to load data]

Why Distributed Systems?
Distributed systems to the rescue!
[Figure: a data set of 4 TBs split across 100 machines; only 3 minutes to load data]

But this brings new requirements
• A way to express problem as parallel processes and execute them on different machines (Programming Models and Concurrency).
• A way for processes on different machines to exchange information (Communication).
• A way for processes to cooperate with one another and agree on shared values (Synchronization).
• A way to enhance reliability and improve performance (Consistency and Replication).
• A way to recover from partial failures (Fault Tolerance).
• A way to protect communication and ensure that process gets only those access rights it is entitled to (Security).
• A way to extend interfaces so as to mimic behavior of another system, reduce diversity of platforms, and provide high degree of portability and flexibility (Virtualization).

Depiction of a Distributed System
Examples:
  - The Web
  - Processor pool
  - Shared memory pool
  - Airline reservation
  - Network game
  - The Cloud
• Distributed system organized as middleware. Note middleware layer extends over multiple machines.
• Users can interact with system in consistent way, regardless of where interaction takes place (e.g., RPC, memcached, …)
• Note: Middleware may be “part” of application in practice

Outline
• Overview (done)
• Goals (next)
• Software
• Client Server
• The Cloud

Goal - Transparency

Transparency | Description
Access | Hide differences in data representation and how a resource is accessed
Location | Hide where a resource is located
Migration | Hide that a resource may move to another location
Relocation | Hide that a resource may be moved to another location while in use
Replication | Hide that a resource may be copied
Concurrency | Hide that a resource may be shared by several competitive users
Failure | Hide the failure and recovery of a resource
Persistence | Hide whether a (software) resource is in memory or on disk

(Different forms of transparency in distributed system)
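The data-loading figures in the "Why Distributed Systems?" slides can be checked with a short calculation. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), perfectly parallel IO channels, and that each of the 100 machines has the same 4 channels as the single machine; `load_seconds` is a hypothetical helper. The idealized cluster result comes out lower than the roughly 3 minutes quoted on the slide, which presumably reflects overheads or a per-machine configuration that the slide text does not state.

```python
TB = 10**12  # bytes, decimal units (assumption)
MB = 10**6   # bytes, decimal units (assumption)


def load_seconds(dataset_bytes, channels, bytes_per_sec_per_channel, machines=1):
    """Idealized time to read a data set when all channels on all machines run in parallel."""
    aggregate_bandwidth = machines * channels * bytes_per_sec_per_channel
    return dataset_bytes / aggregate_bandwidth


# One machine with 4 channels of 100 MB/s each, as in the slide.
single = load_seconds(4 * TB, channels=4, bytes_per_sec_per_channel=100 * MB)

# The same data set split evenly across 100 such machines (idealized, no overhead).
cluster = load_seconds(4 * TB, channels=4, bytes_per_sec_per_channel=100 * MB, machines=100)

print(f"single machine: {single:.0f} s (about {single / 3600:.1f} hours)")
print(f"100 machines:   {cluster:.0f} s (about {cluster / 60:.1f} minutes)")
```

The single-machine case reproduces the slide's 10000 seconds; the point of the comparison is the order-of-magnitude drop, not the exact cluster figure.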
Goal - Scalability
• As systems grow, centralized solutions are limited
  – Consider LAN name resolution (ARP) vs. WAN

Concept | Example
Centralized services | A single server for all users
Centralized data | A single on-line telephone book
Centralized algorithms | Doing routing based on complete information

• Ideally, collect information in distributed fashion and distribute in distributed fashion
• But sometimes, hard to avoid (e.g., consider money in bank)
• Challenges: geography, ownership domains, time synchronization
• Scaling techniques? Hiding latency, distribution, replication (next)

Scaling Technique: Hiding Communication Latency
• Especially important for interactive applications
• If possible, do asynchronous communication – continue working so user does not notice delay
  - Not always possible when client has nothing to do
• Instead, can hide latencies

Scaling Technique: Distribution
• Spread information/processing to more than one location
[Figure: DNS hierarchy: root DNS servers; org, edu, and com DNS servers; pbs.org, poly.edu, umass.edu, yahoo.com, and amazon.com DNS servers]
Client wants IP for www.amazon.com (approximation):
1. Client queries root server to find .com DNS server
2. Client queries .com DNS server to get amazon.com DNS server
3. Client queries amazon.com DNS server to get IP address for www.amazon.com

Scaling Technique: Replication
• Copy of information to increase availability and decrease centralized load
  – Example: File caching is replication decision made by client
  – Example: CDNs (e.g., Akamai) for Web
  – Example: P2P networks (e.g., BitTorrent) distribute copies uniformly or in proportion to use
• Issue: Consistency of replicated information
  – Example: Web browser cache or NFS cache – how to tell it is out of date?

Software Concepts

System | Description | Main Goal
DOS | Tightly-coupled operating system for multi-processors and homogeneous multicomputers | Hide and manage hardware resources
NOS | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) | Offer local services to remote clients
Middleware | Additional layer atop NOS implementing general-purpose services | Provide distribution transparency

• DOS (Distributed Operating Systems)
• NOS (Network Operating Systems)
• Middleware (Next slides)

Outline
• Overview (done)
• Goals (done)
• Software (next)
• Client Server
• The Cloud
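The three-step DNS lookup from the Distribution slide can be sketched as a walk down a small in-memory hierarchy. This is a toy model, not a real resolver: the server names, the dictionaries `ROOT`, `TLD_SERVERS`, and `AUTHORITATIVE`, and the address (taken from the 192.0.2.0/24 documentation range) are all made up for illustration, and no network traffic is involved.

```python
# Hypothetical zone data standing in for real DNS servers.
ROOT = {"com": "com-server", "org": "org-server", "edu": "edu-server"}

TLD_SERVERS = {
    "com-server": {"amazon.com": "amazon-ns", "yahoo.com": "yahoo-ns"},
    "org-server": {"pbs.org": "pbs-ns"},
    "edu-server": {"poly.edu": "poly-ns", "umass.edu": "umass-ns"},
}

AUTHORITATIVE = {
    "amazon-ns": {"www.amazon.com": "192.0.2.10"},  # documentation address, not real
}


def resolve(hostname):
    """Follow the three steps from the slide: root, then TLD, then authoritative server."""
    labels = hostname.split(".")
    tld = labels[-1]                 # e.g. "com"
    domain = ".".join(labels[-2:])   # e.g. "amazon.com"

    tld_server = ROOT[tld]                         # step 1: ask root for the .com server
    auth_server = TLD_SERVERS[tld_server][domain]  # step 2: ask .com for amazon.com's server
    return AUTHORITATIVE[auth_server][hostname]    # step 3: ask amazon.com's server for the IP


print(resolve("www.amazon.com"))
```

Each level only knows about the level directly beneath it, which is what lets the name space be spread over many servers instead of one central table.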
Distributed Operating Systems
• Typically, all hosts are homogeneous
• But no longer have shared memory
  – Can try to provide distributed shared memory
• But tough to get acceptable performance, especially for large requests
  – Provide message passing

Network Operating System (1 of 3)
• OSes can be different (Windows or Linux)
• Typical services: rlogin, rcp
  – Fairly primitive way to share files

Network Operating System (2 of 3)
• Can have one computer provide files transparently for others (NFS)

Network Operating System (3 of 3)
• Different clients may mount the servers in different places
• Inconsistencies in view make NOSes harder for users than DOSes
  – But easier to scale by adding computers

Positioning Middleware
• Network OS not transparent. Distributed OS not independent of computers.
  – Middleware can help
• Often middleware built in-house to help use networked operating systems (distributed transactions, better communication, RPC)
  – Unfortunately, many different standards

Outline
• Overview (done)
• Goals (done)
• Software (done)
• Client Server (next)
• The Cloud
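The point that different clients may mount the same server in different places can be illustrated with two mount tables. This is a simplified sketch, not how NFS is implemented: the server name, export path, mount points, and the helper `resolve_mount` are all hypothetical.

```python
def resolve_mount(mounts, local_path):
    """Map a local path to (server, remote_path) using the longest matching mount point.

    `mounts` maps a local mount point to a (server, exported_directory) pair.
    Returns None when the path is not under any mount point.
    """
    best = None
    for mount_point in mounts:
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return None
    server, export = mounts[best]
    return server, export + local_path[len(best):]


# Two clients mount the same exported directory at different local paths.
client_a = {"/remote/work": ("fileserver", "/export/work")}
client_b = {"/mnt/projects": ("fileserver", "/export/work")}

print(resolve_mount(client_a, "/remote/work/report.txt"))
print(resolve_mount(client_b, "/mnt/projects/report.txt"))
```

Both calls refer to the same file on the server, yet the two users see it under different names, which is the kind of inconsistent view the slide describes.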
Clients and Servers
• Thus far, have not talked about organization of processes
  – Again, many choices but most widely used is client-server
• If can do so without connection (local), quite simple
  – If underlying connection is unreliable, not trivial
  – Resend. What if receive twice?
• Use TCP for reliable connection (most Internet apps)
  – Not always needed for high-speed LAN connection
  – Not always appropriate for interactive applications (e.g., games)

Client-Server Implementation Levels
• Example of Internet search engine
  – UI on client
  – Data level is server, keeps consistency
  – Processing can be on client or server

Multitiered Architectures
• Thin client (a) to Fat client (e)
  (a) is simple echo terminal, (b) has GUI at client
  (c) has user side processing (e.g., check Web form for consistency)
  (d) and (e) popular for NOS environments (e.g., server has files only)

Multitiered Architectures: 3 tiers
• Server(s) may act as client(s), sometimes
  – Example: transaction monitor across multiple databases
• Also known as vertical distribution

Alternate Architectures: Horizontal
• Rather than vertical, distribute servers across nodes
  – Example: Web server “farm” for load balancing
  – Clients, too (peer-to-peer systems)
  – Most effective for read-heavy systems (cache consistency)

Outline
• Overview (done)
• Goals (done)
• Software (done)
• Client Server (done)
• The Cloud (next)
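The "Resend. What if receive twice?" question from the Clients and Servers slide is commonly handled by tagging each request with an identifier so the server can recognize a retransmission. The following is one possible sketch, not something the slides prescribe: the `Server` class, the `deposit` operation, and the simulated reply loss are all hypothetical, and everything runs in one process with no real network.

```python
class Server:
    """Toy server that applies each request at most once, keyed by request id."""

    def __init__(self):
        self.balance = 0
        self._replies = {}  # request id -> reply already sent for that request

    def deposit(self, request_id, amount):
        # A resent request is answered from the cache instead of being re-executed.
        if request_id in self._replies:
            return self._replies[request_id]
        self.balance += amount
        self._replies[request_id] = self.balance
        return self.balance


def send_with_retry(server, request_id, amount, replies_lost=1):
    """Resend the same request until a reply gets through.

    The first `replies_lost` replies are treated as lost in transit (simulated).
    Returns the reply and the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        reply = server.deposit(request_id, amount)  # the request itself always arrives
        if attempts > replies_lost:
            return reply, attempts


server = Server()
reply, attempts = send_with_retry(server, "req-1", 50)
print(f"reply={reply} after {attempts} attempts, server balance={server.balance}")
```

Without the reply cache, the retransmitted deposit would be applied twice; with it, the operation behaves as if it had been delivered exactly once from the client's point of view.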