CSE 513 Introduction to Operating Systems
Class 9 - Distributed and Multiprocessor Operating Systems
Jonathan Walpole
Dept. of Comp. Sci. and Eng.
Oregon Health and Science University
Why use parallel or distributed systems?
- Speed - reduce time to answer
- Scale - increase size of problem
- Reliability - increase resilience to errors
- Communication - span geographical distance
Overview
- Multiprocessor systems
- Multi-computer systems
- Distributed systems
Multiprocessor, multi-computer and distributed architectures
- shared memory multiprocessor
- message passing multi-computer (cluster)
- wide area distributed system
Multiprocessor Systems
Multiprocessor systems
- Definition: a computer system in which two or more CPUs share full access to a common RAM
- Hardware implements shared memory among CPUs
- Architecture determines whether access times to different memory regions are the same
  - UMA - uniform memory access
  - NUMA - non-uniform memory access
Bus-based UMA and NUMA architectures
- The bus becomes the bottleneck as the number of CPUs increases
Crossbar switch-based UMA architecture
- Interconnect cost increases as the square of the number of CPUs
Multiprocessors with 2x2 switches
Omega switching network from 2x2 switches
- Interconnect suffers contention, but costs less
NUMA multiprocessors
- Single address space visible to all CPUs
- Access to remote memory via LOAD and STORE commands
- Access to remote memory slower than to local memory
- Compilers and OS need to be careful about data placement (a libnuma sketch follows)
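To make the placement point concrete, here is a minimal sketch of explicit NUMA data placement using Linux's libnuma (link with -lnuma); the array size and node number are arbitrary illustration, not anything from the slides:

/* Allocate data on a specific NUMA node and run on that same node,
 * so every access is local rather than remote. */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    size_t n = 1 << 20;
    /* Place the array on node 0 and pin execution to node 0. */
    long *data = numa_alloc_onnode(n * sizeof(long), 0);
    if (!data) return 1;
    numa_run_on_node(0);

    long sum = 0;
    for (size_t i = 0; i < n; i++) { data[i] = (long)i; sum += data[i]; }
    printf("sum = %ld\n", sum);

    numa_free(data, n * sizeof(long));
    return 0;
}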
Directory-based NUMA multiprocessors
(a) 256-node directory-based multiprocessor
(b) Fields of a 32-bit memory address
(c) Directory at node 36
Operating systems for multiprocessors
- OS structuring approaches
  - Private OS per CPU
  - Master-slave architecture
  - Symmetric multiprocessing architecture
- New problems
  - multiprocessor synchronization
  - multiprocessor scheduling
The private OS approach
- Implications of the private OS approach
  - shared I/O devices
  - static memory allocation
  - no data sharing
  - no parallel applications
The master-slave approach
- OS only runs on the master CPU
  - Single kernel lock protects OS data structures
  - Slaves trap system calls and place the process on a scheduling queue for the master
- Parallel applications supported
  - Memory shared among all CPUs
- Single CPU for all OS calls becomes a bottleneck
Symmetric multiprocessing (SMP)
- OS runs on all CPUs
  - Multiple CPUs can be executing the OS simultaneously
  - Access to OS data structures requires synchronization
  - Fine-grain critical sections lead to more locks and more parallelism ... and more potential for deadlock
Multiprocessor synchronization
- Why is it different from single-processor synchronization?
  - Disabling interrupts does not prevent memory accesses, since it only affects "this" CPU
  - Multiple copies of the same data exist in the caches of different CPUs
  - Atomic lock instructions do CPU-to-CPU communication
  - Spinning to wait for a lock is not always a bad idea
Synchronization problems in SMPs
- The TSL (test-and-set-lock) instruction is non-trivial on SMPs: the read-modify-write cycle must be made atomic across all CPUs, not just the local one (a TSL-style spinlock sketch follows)
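A minimal sketch of a TSL-style spinlock using C11 atomics; atomic_flag_test_and_set is the portable analogue of the TSL instruction, and the lock_acquire/lock_release names are illustrative:

#include <stdatomic.h>

/* Initialize with: spinlock_t l = { ATOMIC_FLAG_INIT }; */
typedef struct { atomic_flag held; } spinlock_t;

void lock_acquire(spinlock_t *l) {
    /* Atomically read the old value and set the flag (TSL).
     * Spin while another CPU already holds the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy wait */
}

void lock_release(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}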
Avoiding cache thrashing during spinning
- Multiple lock variables are used to avoid cache thrashing: each waiting CPU spins on its own cached copy instead of hammering the shared lock word with atomic operations
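A simpler relative of the slide's per-CPU lock-variable scheme is test-and-test-and-set, sketched below: spin on an ordinary cached read and only attempt the atomic TSL when the lock looks free, so the bus stays quiet while waiting.

#include <stdatomic.h>

typedef struct { atomic_int held; } ttas_lock_t;

void ttas_acquire(ttas_lock_t *l) {
    for (;;) {
        /* Local spin: these reads hit this CPU's cache and generate
         * no bus traffic until the holder releases the lock. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* Lock looks free; now try the atomic exchange (TSL). */
        if (!atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
            return;
    }
}

void ttas_release(ttas_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}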
Spinning versus switching
- In some cases the CPU "must" wait
  - the scheduling critical section may be held
- In other cases spinning may be more efficient than blocking
  - spinning wastes CPU cycles
  - switching uses up CPU cycles also
  - if critical sections are short, spinning may be better than blocking
  - static analysis of critical section duration can determine whether to spin or block
  - dynamic analysis can improve performance (see the spin-then-block sketch below)
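A sketch of the spin-then-block idea, assuming POSIX threads: try the lock a bounded number of times (profitable when critical sections are short), then give up and block in the kernel. SPIN_LIMIT is an arbitrary tuning constant; an adaptive scheme would set it from observed hold times.

#include <pthread.h>

#define SPIN_LIMIT 1000

void hybrid_acquire(pthread_mutex_t *m) {
    /* Spin phase: cheap if the holder releases soon. */
    for (int i = 0; i < SPIN_LIMIT; i++)
        if (pthread_mutex_trylock(m) == 0)
            return;               /* acquired while spinning */
    /* Block phase: stop burning cycles and let the kernel switch. */
    pthread_mutex_lock(m);
}

void hybrid_release(pthread_mutex_t *m) {
    pthread_mutex_unlock(m);
}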
Multiprocessor scheduling
- Two-dimensional scheduling decision
  - time (which process to run next)
  - space (which processor to run it on)
- Time sharing approach
  - single scheduling queue shared across all CPUs
- Space sharing approach
  - partition machine into sub-clusters
Time sharing
- Single data structure used for scheduling
- Problem - scheduling frequency influences inter-thread communication time
Interplay between scheduling and IPC
- Problem with communication between two threads
  - both belong to process A
  - both running out of phase
Space sharing
- Groups of cooperating threads can communicate at the same time
  - fast inter-thread communication time
Gang scheduling
- Problem with pure space sharing
  - some partitions are idle while others are overloaded
- Can we combine time sharing and space sharing and avoid introducing scheduling delay into IPC?
- Solution: gang scheduling
  - groups of related threads scheduled as a unit (gang)
  - all members of a gang run simultaneously on different timeshared CPUs
  - all gang members start and end time slices together
Gang scheduling
Multi-computer Systems
Multi-computers
- Also known as
  - cluster computers
  - clusters of workstations (COWs)
- Definition: tightly-coupled CPUs that do not share memory
Multi-computer interconnection topologies
(a) single switch, (b) ring, (c) grid, (d) double torus, (e) cube, (f) hypercube
Store & forward packet switching
Network interfaces in a multi-computer
- Network co-processors may off-load communication processing from the main CPU
OS issues for multi-computers
- Message passing performance
- Programming model
  - synchronous vs asynchronous message passing
  - distributed virtual memory
- Load balancing and coordinated scheduling
Optimizing message passing performance
- Parallel application performance is dominated by communication costs
  - interrupt handling, context switching, message copying ...
- Solution - get the OS out of the loop
  - map the interface board into all processes that need it
  - active messages - give the interrupt handler the address of a user buffer
  - sacrifice protection for performance?
CPU / network card coordination
- How to maximize independence between the CPU and the network card while sending/receiving messages?
- Use send & receive rings and bit-maps
  - one side always sets bits, the other always clears them (see the ring sketch below)
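A sketch of a send ring with per-slot ownership bits, assuming a memory-mapped NIC: the CPU sets a slot's "full" bit after filling it, and the card clears it after transmitting. Because each side only flips the bit in one direction, no lock is needed. The names and sizes are illustrative.

#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define RING_SLOTS 64
#define SLOT_BYTES 1536

struct slot {
    atomic_bool full;          /* set by the CPU, cleared by the NIC */
    unsigned len;
    char data[SLOT_BYTES];
};

struct send_ring {
    struct slot slots[RING_SLOTS];
    unsigned head;             /* next slot the CPU will fill */
};

/* CPU side: returns false if the ring is full (card hasn't caught up). */
bool ring_send(struct send_ring *r, const void *msg, unsigned len) {
    struct slot *s = &r->slots[r->head];
    if (atomic_load_explicit(&s->full, memory_order_acquire))
        return false;                      /* card still owns this slot */
    memcpy(s->data, msg, len);
    s->len = len;
    /* Publish the slot: the card may transmit it from here on. */
    atomic_store_explicit(&s->full, true, memory_order_release);
    r->head = (r->head + 1) % RING_SLOTS;
    return true;
}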
Blocking vs non-blocking send calls
- Minimum services provided: send and receive commands
- These can be blocking (synchronous) or non-blocking (asynchronous) calls
(a) Blocking send call (b) Non-blocking send call
Blocking vs non-blocking calls
- Advantages of non-blocking calls
  - ability to overlap computation and communication improves performance
- Advantages of blocking calls
  - simpler programming model
(see the MPI sketch below)
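One concrete instantiation of both styles, using MPI as the message-passing API (MPI is my choice here, not something the slides prescribe): MPI_Send blocks until the buffer can be reused, while MPI_Isend returns immediately so computation can overlap the transfer.

#include <mpi.h>

void exchange(int peer, double *buf, int n, double *work, int m) {
    /* Blocking: simple, but the caller waits until the send completes. */
    MPI_Send(buf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);

    /* Non-blocking: start the send, compute, then wait for completion. */
    MPI_Request req;
    MPI_Isend(buf, n, MPI_DOUBLE, peer, 1, MPI_COMM_WORLD, &req);
    for (int i = 0; i < m; i++)
        work[i] *= 2.0;                 /* overlapped computation */
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* buf is reusable after this */
}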
Remote procedure call (RPC)
- Goal
  - support execution of remote procedures
  - make remote procedure execution indistinguishable from local procedure execution
  - allow distributed programming without changing the programming model
Remote procedure call (RPC)
- Steps in making a remote procedure call
  - client and server stubs are proxies
RPC implementation issues
- Cannot pass pointers
  - call by reference becomes copy-restore (at best)
- Weakly typed languages
  - client stub cannot determine the size of reference parameters
  - not always possible to determine parameter types
- Cannot use global variables
  - may get moved (replicated) to a remote machine
- Basic problem - local procedure call relies on shared memory (a marshalling sketch follows)
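A sketch of why pointers are the problem: the server shares no memory with the client, so a client stub must flatten (marshal) the pointed-to data into the message by value. The wire format, net_send/net_recv, and the stub itself are all hypothetical.

#include <stdint.h>
#include <string.h>

extern void   net_send(int server, const void *msg, size_t len); /* assumed */
extern size_t net_recv(int server, void *msg, size_t cap);       /* assumed */

/* Client stub for: int32_t sum_array(int32_t *a, int32_t n);
 * The pointer 'a' is meaningless on the server, so the stub copies
 * the array it points to into the request (copy-in); copy-restore
 * would also copy modified data back. Assumes n <= 256 for brevity. */
int32_t sum_array_stub(int server, const int32_t *a, int32_t n) {
    uint8_t msg[4 + 4 * 256];
    memcpy(msg, &n, 4);                   /* marshal the count */
    memcpy(msg + 4, a, 4 * (size_t)n);    /* marshal the pointed-to data */
    net_send(server, msg, 4 + 4 * (size_t)n);

    int32_t result;
    net_recv(server, &result, sizeof result);  /* unmarshal the reply */
    return result;
}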
Distributed shared memory (DSM)
- Goal
  - use software to create the illusion of shared memory on top of message-passing hardware
  - leverage virtual memory hardware to page fault on non-resident pages
  - service page faults from remote memories instead of from the local disk (see the fault-handler sketch below)
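A sketch of the classic user-level version of this trick on Linux/POSIX: protect the DSM region with mprotect(), catch the resulting SIGSEGV, fetch the page from its remote home, then unprotect it so the faulting access can be retried. fetch_remote_page() stands in for a hypothetical RPC to the page's home node.

#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

extern void fetch_remote_page(void *page_addr);  /* assumed RPC */

static long page_size;

static void dsm_fault(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    /* Round the faulting address down to a page boundary. */
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    fetch_remote_page(page);           /* fill from remote memory */
    /* Unprotect; returning from the handler retries the access. */
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
}

void dsm_init(void *region, size_t len) {
    page_size = sysconf(_SC_PAGESIZE);
    struct sigaction sa = {0};
    sa.sa_sigaction = dsm_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
    /* Start with the whole region non-resident: any touch faults. */
    mprotect(region, len, PROT_NONE);
}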
Distributed shared memory (DSM)
- DSM can be implemented at the hardware, OS, or middleware layer
Page replication in DSM systems
(a) Pages distributed on 4 machines
(b) CPU 0 reads page 10
(c) CPU 1 reads page 10
Consistency and false sharing in DSM
Strong memory consistency
[Figure: reads and writes (R1-R2, W1-W4) issued by processors P1-P4, merged into a single total order]
- A total order enforces sequential consistency
  - intuitively simple for programmers, but very costly to implement
  - not even implemented in non-distributed machines! (the litmus test below shows a non-SC outcome on an ordinary multiprocessor)
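To back up that last point, here is a sketch of the classic store-buffering litmus test: under any sequentially consistent total order, at least one thread must see the other's write, so r1 == r2 == 0 is impossible; with relaxed atomics (and, for plain stores and loads, even on x86 hardware) it can and does happen.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int x, y;
int r1, r2;

void *t1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

void *t2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void) {
    for (int i = 0; i < 100000; i++) {
        atomic_store(&x, 0); atomic_store(&y, 0);
        pthread_t a, b;
        pthread_create(&a, NULL, t1, NULL);
        pthread_create(&b, NULL, t2, NULL);
        pthread_join(a, NULL); pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)   /* forbidden under sequential consistency */
            printf("iteration %d: r1 == r2 == 0 (not SC)\n", i);
    }
    return 0;
}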
Scheduling in multi-computer systems
- Each computer has its own OS
  - local scheduling applies
- Which computer should we allocate a task to initially?
  - decision can be based on load (load balancing)
  - load balancing can be static or dynamic
Graph-theoretic load balancing approach
- Two ways of allocating 9 processes to 3 nodes
- Total network traffic is the sum of the arc weights cut by node boundaries
- The second partitioning is better (a sketch of computing the cut weight follows)
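A sketch of scoring a partitioning in this model: sum the weights of the communication arcs whose endpoints land on different nodes. The toy graph and assignments below are illustrative, not the slide's 9-process example.

#include <stdio.h>

struct arc { int from, to, weight; };   /* inter-process traffic */

int cut_weight(const struct arc *arcs, int n_arcs, const int *node_of) {
    int total = 0;
    for (int i = 0; i < n_arcs; i++)
        if (node_of[arcs[i].from] != node_of[arcs[i].to])
            total += arcs[i].weight;    /* arc crosses a node boundary */
    return total;
}

int main(void) {
    struct arc arcs[] = { {0,1,3}, {1,2,2}, {0,2,1}, {2,3,4} };
    int assign_a[] = {0, 0, 1, 1};      /* process -> node */
    int assign_b[] = {0, 0, 0, 1};
    printf("cut A = %d, cut B = %d\n",
           cut_weight(arcs, 4, assign_a),
           cut_weight(arcs, 4, assign_b));
    return 0;                           /* the smaller cut is better */
}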