Advanced Architectures: Goals of Distributed Computing



Advanced Architectures
  15A. Distributed Computing
  15B. Multi-Processor Systems
  15C. Tightly Coupled Distributed Systems
  15D. Loosely Coupled Distributed Systems
  15E. Cloud Models
  15F. Distributed Middle-Ware

Goals of Distributed Computing
  • better services
    – scalability
      • apps too big to run on a single computer
      • grow system capacity to meet growing demand
    – improved reliability and availability
    – improved ease of use, reduced CapEx/OpEx
  • new services
    – applications that span multiple system boundaries
    – global resource domains, services (vs. systems)
    – complete location transparency

Major Classes of Distributed Systems
  • Symmetric Multi-Processors (SMP)
    – multiple CPUs, sharing memory and I/O devices
  • Single-System Image (SSI) & Cluster Computing
    – a group of computers, acting like a single computer
  • loosely coupled, horizontally scalable systems
    – coordinated, but relatively independent systems
  • application level distributed computing
    – peer-to-peer, application level protocols
    – distributed middle-ware platforms

Evaluating Distributed Systems
  • Performance
    – overhead, scalability, availability
  • Functionality
    – adequacy and abstraction for target applications
  • Transparency
    – compatibility with previous platforms
    – scope and degree of location independence
  • Degree of Coupling
    – on how many things do distinct systems agree
    – how is that agreement achieved

SMP systems and goals
  • Characterization:
    – multiple CPUs sharing memory and devices
  • Motivations:
    – price performance (lower price per MIP)
    – scalability (economical way to build huge systems)
    – perfect application transparency (see the thread sketch after this page)
  • Example:
    – multi-core Intel CPUs
    – multi-socket mother boards

Symmetric Multi-Processors
  [Figure: four CPUs, each with its own cache, attached by shared memory and device busses to main memory, an interrupt controller, and several device controllers]
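The "perfect application transparency" point above can be seen at user level: ordinary threads, which the kernel may schedule on different CPUs, all share one memory image and need no special programming model. The following is only a minimal sketch, assuming a POSIX system with a C11 compiler (build with something like cc -pthread smp_demo.c); it is not from the slides.

    /* Minimal illustration of SMP shared memory: several threads, which the
     * kernel may run on different CPUs, increment one shared counter.
     * An atomic counter keeps the example correct without an explicit lock. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITERS   1000000

    static atomic_long counter;            /* lives in memory shared by all CPUs */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < NITERS; i++)
            atomic_fetch_add(&counter, 1); /* atomic read-modify-write on shared memory */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);

        /* every thread saw the same counter, no matter which CPU it ran on */
        printf("final count = %ld\n", atomic_load(&counter));
        return 0;
    }

The count comes out the same regardless of how the threads are spread across CPUs, which is exactly the transparency the slide claims for SMP.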

SMP Price/Performance
  • a computer is much more than a CPU
    – mother-board, disks, controllers, power supplies, case
    – CPU might cost 10-15% of the cost of the computer
  • adding CPUs to a computer is very cost-effective
    – a second CPU yields cost of 1.1x, performance 1.9x
    – a third CPU yields cost of 1.2x, performance 2.7x
  • same argument also applies at the chip level
    – making a machine twice as fast is ever more difficult
    – adding more cores to the chip gets ever easier
  • massive multi-processors are the obvious direction

SMP Operating System Design
  • one processor boots with power on
    – it controls the starting of all other processors
  • same OS code runs in all processors
    – one physical copy in memory, shared by all CPUs
  • each CPU has its own registers, cache, MMU
    – they must cooperatively share memory and devices
  • ALL kernel operations must be Multi-Thread-Safe
    – protected by appropriate locks/semaphores
    – very fine grained locking to avoid contention

SMP Parallelism
  • scheduling and load sharing
    – each CPU can be running a different process
    – just take the next ready process off the run-queue
    – processes run in parallel
    – most processes don't interact (other than in the kernel)
  • serialization
    – mutual exclusion achieved by locks in shared memory
    – locks can be maintained with atomic instructions (see the spin-lock sketch after this page)
    – spin locks acceptable for VERY short critical sections
    – if a process blocks, that CPU finds the next ready process

The Challenge of SMP Performance
  • scalability depends on memory contention
    – memory bandwidth is limited, can't handle all CPUs
    – most references satisfied from per-core cache
    – if too many requests go to memory, CPUs slow down
  • scalability depends on lock contention
    – waiting for spin-locks wastes time
    – context switches waiting for kernel locks waste time
  • contention wastes cycles, reduces throughput
    – 2 CPUs might deliver only 1.9x performance
    – 3 CPUs might deliver only 2.7x performance

Managing Memory Contention
  • fast n-way memory is very expensive
    – without it, memory contention taxes performance
    – cost/complexity limits how many CPUs we can add
  • Non-Uniform Memory Architectures (NUMA)
    – each CPU has its own memory
      • each CPU has a fast path to its own memory
    – connected by a Scalable Coherent Interconnect
      • a very fast, very local network between memories
      • accessing memory over the SCI may be 3-20x slower
    – these interconnects can be highly scalable

Non-Uniform Memory Architecture
  [Figure: two NUMA nodes (CPU n and CPU n+1), each with its own cache, local memory, PCI bridge and bus, device controllers, and a CC-NUMA interface, joined by a Scalable Coherent Interconnect (e.g. Intel QuickPath Interconnect)]
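The serialization points above ("locks can be maintained with atomic instructions", "spin locks acceptable for VERY short critical sections") can be sketched in user space with C11 atomics. This is a minimal test-and-set lock under that assumption, not any particular kernel's API; the names spinlock_t, spin_lock, and take_next_ready are purely illustrative.

    /* A tiny test-and-set spin lock built from C11 atomics.  Real kernels use
     * more elaborate variants (ticket locks, MCS locks, back-off), but the
     * principle is the same: atomically set a flag, spin until it was clear. */
    #include <stdatomic.h>
    #include <stdio.h>

    typedef struct {
        atomic_flag held;              /* clear = free, set = held */
    } spinlock_t;

    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

    static inline void spin_lock(spinlock_t *l)
    {
        /* atomic test-and-set returns the previous value; spin while it was set */
        while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            ;   /* busy-wait: acceptable only for VERY short critical sections */
    }

    static inline void spin_unlock(spinlock_t *l)
    {
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }

    /* Example use: protecting a shared run-queue head, the way an SMP
     * scheduler "just takes the next ready process off the run-queue". */
    struct pcb { struct pcb *next; };

    static spinlock_t runq_lock = SPINLOCK_INIT;
    static struct pcb *runq_head;      /* shared ready queue (illustrative) */

    static struct pcb *take_next_ready(void)
    {
        spin_lock(&runq_lock);
        struct pcb *p = runq_head;     /* very short critical section */
        if (p)
            runq_head = p->next;
        spin_unlock(&runq_lock);
        return p;
    }

    int main(void)
    {
        struct pcb b = { NULL }, a = { &b };
        runq_head = &a;
        printf("%p\n", (void *)take_next_ready());   /* &a */
        printf("%p\n", (void *)take_next_ready());   /* &b */
        return 0;
    }

Real SMP kernels add interrupt masking and contention-avoidance refinements, but the atomic test-and-set loop is the core of fine-grained kernel locking.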

OS design for NUMA systems
  • it is all about local memory hit rates
    – every outside reference costs us 3-20x performance
    – we need a 75-95% hit rate just to break even
  • How can the OS ensure high hit-rates?
    – replicate shared code pages in each CPU's memory
    – assign processes to CPUs, allocate all memory there
    – migrate processes to achieve load balancing
    – spread kernel resources among all the CPUs
    – attempt to preferentially allocate local resources (see the allocation sketch at the end of this page)
    – migrate resource ownership to the CPU that is using it

Single System Image (SSI) Clusters
  • Characterization:
    – a group of seemingly independent computers collaborating to provide SMP-like transparency
  • Motivation:
    – higher reliability, availability than SMP/NUMA
    – more scalable than SMP/NUMA
    – excellent application transparency
  • Examples:
    – Locus, MicroSoft Wolf-Pack, OpenSSI
    – Oracle Parallel Server

OS design for SSI clustering
  • all nodes agree on the state of all OS resources
    – file systems, processes, devices, locks, IPC ports
    – any process can operate on any object, transparently
  • they achieve this by exchanging messages
    – advising one another of all changes to resources
      • each OS's internal state mirrors the global state (see the toy sketch at the end of this page)
    – requesting execution of node-specific requests
      • node-specific requests are forwarded to the owning node
  • implementation is large, complex, difficult
  • the exchange of messages can be very expensive

Modern Clustered Architecture
  [Figure: four SMP systems on replicated ethernet switches handling request replication and geographic fail-over; pairs of systems share optional dual-ported RAID at the primary site, with synchronous FC replication to RAID at a back-up site]
  Active systems service independent requests in parallel. They cooperate to maintain shared global locks, and are prepared to take over a partner's work in case of failure. State replication to a back-up site is handled by external mechanisms.

SSI Clustered Performance
  • consensus protocols are expensive
    – they converge slowly and scale poorly
  • systems have a great many resources
    – resource change notifications are expensive
  • location transparency encouraged non-locality
    – remote resource use is much more expensive
  • a greatly complicated operating system
    – distributed objects are more complex to manage
    – complex optimizations to reduce the added overheads
    – new modes of failure w/complex recovery procedures
  • Bottom Line: Deutsch was right!

Lessons Learned
  • clever implementation can minimize overhead
    – 10-20% overall is not uncommon, can be much worse
  • complete transparency
    – even very complex applications "just work"
    – they do not have to be made "network aware"
  • good robustness
    – when one node fails, others notice and take over
    – often, applications won't even notice the failure
  • nice for application developers and customers
    – but they are complex, and not particularly scalable
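The NUMA policies listed under "OS design for NUMA systems" above (put a process's memory on the node it runs on, prefer local resources) have a user-level counterpart on Linux in libnuma. The sketch below is only an illustration of local-first allocation, assuming a Linux system with libnuma installed (build with cc numa_demo.c -lnuma); it is not the kernel's internal mechanism.

    /* Preferentially allocating local memory on a NUMA system (Linux/libnuma). */
    #define _GNU_SOURCE
    #include <numa.h>        /* numa_available, numa_alloc_local, numa_free */
    #include <sched.h>       /* sched_getcpu (GNU extension) */
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        int cpu  = sched_getcpu();           /* CPU we are currently running on */
        int node = numa_node_of_cpu(cpu);    /* memory node local to that CPU   */
        printf("running on CPU %d, local NUMA node %d (highest node: %d)\n",
               cpu, node, numa_max_node());

        /* Allocate from the local node, so later accesses avoid the 3-20x
         * penalty of crossing the interconnect. */
        size_t len = 64 * 1024 * 1024;
        void *buf = numa_alloc_local(len);
        if (!buf) {
            perror("numa_alloc_local");
            return 1;
        }

        /* ... use buf from threads kept on or near this node ... */

        numa_free(buf, len);
        return 0;
    }

A thread that keeps its working set on the node reported by numa_node_of_cpu() gets the high local hit rate the slide says is needed just to break even.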

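To make "each OS's internal state mirrors the global state" concrete, here is a deliberately toy, single-process simulation; it is an illustration under simplifying assumptions, not how Locus or OpenSSI actually work. Every node applies every resource-change message, so all views stay identical, and the message cost grows with nodes x changes, which is one reason the exchange is so expensive.

    /* Toy simulation of SSI state mirroring: every node applies every
     * resource-change message, so each node's table mirrors the global state.
     * Real SSI systems need consensus, ownership, and recovery machinery. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define NODES     4
    #define RESOURCES 8

    struct node {
        int resource_owner[RESOURCES];   /* this node's view: who owns what */
    };

    static struct node cluster[NODES];

    /* "Send" a resource-change message to every node (including the sender). */
    static void broadcast_update(int resource, int new_owner)
    {
        for (int n = 0; n < NODES; n++)
            cluster[n].resource_owner[resource] = new_owner;
        /* cost: one message per node per change -- the expense the slides warn about */
    }

    static bool views_agree(void)
    {
        for (int n = 1; n < NODES; n++)
            if (memcmp(cluster[n].resource_owner, cluster[0].resource_owner,
                       sizeof cluster[0].resource_owner) != 0)
                return false;
        return true;
    }

    int main(void)
    {
        for (int n = 0; n < NODES; n++)
            memset(cluster[n].resource_owner, -1,
                   sizeof cluster[n].resource_owner);   /* -1 = unowned */

        broadcast_update(3, 1);   /* node 1 takes ownership of resource 3 */
        broadcast_update(5, 2);   /* node 2 takes ownership of resource 5 */

        printf("all views agree: %s\n", views_agree() ? "yes" : "no");
        return 0;
    }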