CO538: Concurrency Design and Practice
Bonus Lecture: Other Concurrency Models
Dr. Fred Barnes, S113, Computing Laboratory (ext. 4278), frmb@kent.ac.uk
(ack to Matt Groening for Nibbler)

So Far ...
- We're all fairly familiar with the occam-π approach to concurrency: isolated processes with synchronous channel communication.
- ... and, though you may not know it, with the CSP process algebra (you get this for free!) [Hoare, 1985].
- The ‘π’ bit comes from elements of the π-calculus [Milner, 1999].
- This is not the only model, however! In brief, others include:
  - shared-memory models, with threads-and-locks implementations.
  - transactional-memory based approaches.
  - message-passing approaches (including occam-π/CSP).
  - data-parallel models and their implementations.
  - hybrid approaches taking different elements from the above.
A Brief History of Parallel Computing
- Concurrency and parallelism have mostly been used as a mechanism for improving performance:
  - in networks of workstations (e.g. the Pi-Cluster, "Grid", cloud computing).
  - in supercomputers such as Crays or IBM's BlueGene, and, not entirely unrelated, the transputer.
  - in more recent multiprocessor computers (the last 20 years or so).
  - and most recently in multicore processors (including things like GPUs and Sony/Toshiba/IBM's "Cell" processor).
  - in future massively multicore (8, 16, 80, 400, ...) multiprocessor platforms.

Technology
- Processor technology has, for the time being, reached its limits in terms of raw clock speed (e.g. 4 GHz):
  - and we're at 50-60 nanometre processes in CPU manufacturing.
- The maximum clock speed is limited by heat dissipation:
  - power consumption of the CPU is roughly proportional to (Nt × Tp × V²); thus, to pack in more transistors (Nt) for more raw capability (e.g. more cores, more cache), we want to make the manufacturing process small (Tp) and run at low voltages (V).
  - maximum clock speed is also limited by voltage: too low and there isn't enough ‘juice’ to make the transistors switch reliably.
  - there are also limits on how fast we can drive external memory buses.
- There have been assorted interesting attempts to push clock speeds ever higher (an AMD CPU was run at 7.12 GHz not long ago; research has seen 500 GHz+), generally by overclocking and overvolting:
  - a lot of fun efforts towards extreme cooling (with liquid nitrogen and suchlike).
Into the Multicore Era
- As far as raw processing capability goes, multicore CPUs are increasingly popular:
  - a divide-and-conquer approach to getting more processing cores (transistors) on a single chip; the potential benefits (more cores) outweigh the losses (slower cores).
- In the past, parallel programming was primarily of interest to scientists (including us!) across a range of disciplines:
  - some problems are trivially parallel, e.g. the Mandelbrot set, render-farms, ...
  - others much less so, e.g. N-body problems, complex-systems simulations, ...
  - most ‘regular’ programming was (and still is, to some extent) entirely sequential.
- We can no longer rely on faster clock speeds as a way of extracting more performance; we need to cater for multicore CPUs:
  - this leads to the problem of how to parallelise sequential code (as an incremental change).
  - essentially what led to threads-and-locks.

Philosophy
- Parallel computing has generally been used to attain performance:
  - we are now forced to consider it when programming, so we have to deal with it one way or another.
- A commonly held view is that concurrency is hard:
  - and, as such, should be avoided wherever possible.
- On the other hand, the view taken by ourselves (and others!), and presented in CO538, is that concurrency is easy!
  - both views are correct; it depends on how you manage that concurrency ...
  - more specifically, concurrency need not be hard.
- The view we try to get across is that concurrency can be used as a fundamental design methodology, not just as a mechanism for extracting performance on modern systems (indeed, concurrency should simplify design and implementation!).
Why Threads?
- Threads are what the operating system typically provides as its concurrency abstraction:
  - essentially multiple processes that share the same address space.
  - a fairly flexible abstraction that can be scheduled in a variety of ways.
- Threads interact by sharing data in the heap and through OS-provided mechanisms:
  - semaphores, mutexes, pipes, condition variables, ...
- Scheduled by the OS on:
  - uniprocessor machines (trivial scheduling).
  - multiprocessor machines (simplistic approach).
  - multiprocessor machines (gang scheduling).

[Diagram: a 4 GB virtual address space holding text, data, heap and per-thread stacks (T0-T2), alongside possible schedules of threads T0-T2 on processors Proc0-Proc2.]

Thread Hazards
- Uncontrolled access by threads to data in the heap is likely to result in race hazards (a sketch of the problem follows below):
  - more so in languages that permit aliasing of references (pointers).
- Bits of code that modify shared state (data) must do so carefully:
  - by ensuring the mutual exclusion of other threads.
  - by using lock-free and/or wait-free algorithms.
  - by using OS or language mechanisms that incorporate the above.
- Worth noting that such locking is only necessary when multiple threads are running concurrently or can preempt each other in unpredictable ways:
  - unfortunately, OSs provide little in the way of support for controlling thread scheduling (POSIX threads are what you tend to get; gang scheduling is rare).
  - OSs have always had to deal with concurrency arising from interrupt handling (more in CO527 next term!).
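To make the race hazard concrete, here is a minimal Java sketch (my own illustration, not from the slides; the class and method names are hypothetical): two threads increment a shared counter without synchronisation, so read-modify-write updates can interleave and be lost. Marking the increment synchronized restores mutual exclusion.

    // Illustrative sketch (not from the slides): a lost-update race on a shared counter.
    class Counter {
        private int value = 0;

        // Unsafe: 'value++' is a read-modify-write, so concurrent calls can interleave.
        void incrementUnsafe() { value++; }

        // Safe: 'synchronized' ensures mutual exclusion via this object's monitor.
        synchronized void incrementSafe() { value++; }

        int get() { return value; }
    }

    public class RaceDemo {
        public static void main(String[] args) throws InterruptedException {
            Counter c = new Counter();
            Runnable work = () -> {
                for (int i = 0; i < 100000; i++) {
                    c.incrementUnsafe();     // swap for incrementSafe() to fix the race
                }
            };
            Thread t1 = new Thread(work), t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join(); t2.join();
            // Typically prints less than 200000 with incrementUnsafe().
            System.out.println(c.get());
        }
    }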
Traditional Locking Methods
- Semaphores: essentially a non-negative integer value with a wait set; two operations:
  - wait: if the semaphore value is zero, the process is added to the wait set; else decrement the value and continue.
  - notify: if the wait set is non-empty, wake up one process; else increment the value.
  - the ‘notify’ operation never blocks; the ‘wait’ operation can block, however.
- Mutex: a mechanism for mutual exclusion; can be implemented as a semaphore initialised to 1:
  - lock: wait on the semaphore; unlock: notify the semaphore.
- Spin-lock: a ‘fast’ mutex that does not involve the OS (which would otherwise be needed to put a thread to sleep or wake one up):
  - attempts to lock an already-held lock spin (burning 100% of a CPU core) until the holding thread unlocks.

(A sketch of a semaphore built on a Java monitor follows after the next slide.)

Monitors in Java
- Every object in Java has a monitor associated with it:
  - the natural use is to "wait for a change in the state of an object until notified".
  - often used with condition variables.
- Monitors contain a mutually exclusive lock, which must be held before any action can be performed on the monitor:
  - the operations are wait, notify and notify-all.
- Monitors require some degree of concurrency in order to work correctly:
  - when a thread calls ‘wait()’, it will be put to sleep.
  - only another thread calling ‘notify()’ or ‘notifyAll()’ can wake it up.
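Tying these two slides together, here is a hedged sketch (my own, not from the lecture; SimpleSemaphore and its method names are hypothetical) of a counting semaphore built on a Java monitor. Note one small difference from the slide's description: the slide's notify only increments when the wait set is empty, whereas the monitor version below always increments and lets the woken thread do the decrement, which is behaviourally equivalent.

    // Illustrative sketch (not from the slides): a counting semaphore built on a
    // Java monitor. In practice you would use java.util.concurrent.Semaphore.
    class SimpleSemaphore {
        private int value;                       // non-negative count

        SimpleSemaphore(int initial) { value = initial; }

        // 'wait' operation: block while the value is zero, then decrement.
        synchronized void semWait() throws InterruptedException {
            while (value == 0) {                 // loop guards against spurious wakeup
                wait();                          // joins this monitor's wait-set
            }
            value--;
        }

        // 'notify' operation: increment and wake one waiter; never blocks.
        synchronized void semNotify() {
            value++;
            notify();
        }
    }

    // A mutex is then just a semaphore initialised to 1:
    //   lock   -> semWait()
    //   unlock -> semNotify()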
Java Monitor Operations

[Diagram: threads (T1-T3) queueing for the monitor associated with object ‘x’; threads are let into the synchronized block one at a time, the wait-set holds suspended threads, notify wakes one thread, notifyAll wakes all, and waking threads must re-acquire the mutex before eventually being let back in (maybe).]

    synchronized (x) {      // threads let in one at a time
        ...
        x.wait ();          // suspends this thread into x's wait-set
        ...
        x.notify ();        // wakes one thread from the wait-set
        ...
        x.notifyAll ();     // wakes all threads in the wait-set
        ...
    }

- Java threads suffer from spurious wakeup when in the wait-set:
  - caused by an underlying problem (feature) of the POSIX threads mechanism in some OSs ...

Monitors in Java
- Code that calls ‘wait’ must use a try-catch block to catch something called InterruptedException, or throw it upwards (see the sketch below).
- The wait-set is a set, i.e. unordered; it is not guaranteed to be fair. Entry to synchronized blocks is unordered too.
- Threads can hold multiple mutex locks (nested ‘synchronized’ blocks); when a thread ‘wait’s on one of the monitors, the associated lock is temporarily released.
- A thread may safely acquire the same mutex lock multiple times (the locks are re-entrant), but there is still potential for deadlock from cyclic ordering of locks.
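Here is a minimal sketch (my own, not from the slides; OnePlaceBuffer is a hypothetical name) of the idiomatic pattern these bullets describe: always call wait inside a condition loop, which copes with both spurious wakeups and the unordered wait-set, and let InterruptedException propagate upwards.

    // Illustrative sketch (not from the slides): a one-place buffer using
    // wait/notifyAll. The while-loops re-test the condition on every wakeup.
    class OnePlaceBuffer<T> {
        private T slot = null;

        synchronized void put(T item) throws InterruptedException {
            while (slot != null) {      // guard: wait until the slot is empty
                wait();                 // releases the lock while suspended
            }
            slot = item;
            notifyAll();                // wake any waiting takers (and putters)
        }

        synchronized T take() throws InterruptedException {
            while (slot == null) {      // guard: wait until the slot is full
                wait();
            }
            T item = slot;
            slot = null;
            notifyAll();                // wake any waiting putters
            return item;
        }
    }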
Java Concurrency Abstractions
- The low-level mechanisms of Java (threads and monitors) are not easy to work with:
  - enter higher-level primitives ("java.util.concurrent").
- This provides a range of concurrency classes that can be used to implement synchronous or asynchronous communication, barriers (multi-party synchronisation) and other useful application-level features (a small example follows below):
  - someone has to implement such things, of course, and that someone needs a comprehensive understanding of how it all works at the low level: computer scientists ..!
- At the end of the day, there is still the potential for race hazards between threads on shared data:
  - a big frustration is that such bugs are incredibly hard to pin down (and the subject of some interesting research).

Other Concurrency Models
- Two other concurrency models worth considering are:
  - that used by Polyphonic C#, also called Cω.
  - that used by concurrent Haskell with software transactional memory:
    - gaining popularity, as it is more comprehensible to J. Random Programmer than locking strategies or wait-free/lock-free algorithms.
- Both of these are somewhat different from the occam-π model.
- Discovering which of the various concurrency abstractions are most useful, and which are most easily understood, is left largely to the reader ...
- We've mostly established that the threads-and-locks model doesn't work:
  - it may be easy to understand, but it is hard to apply correctly and certainly does not scale well (at all).
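As one example of what java.util.concurrent provides, here is a sketch (my own, not from the slides; ChannelDemo is a hypothetical name) of synchronous communication using SynchronousQueue: put blocks until another thread takes, so the two threads rendezvous much like an occam-π channel.

    // Illustrative sketch (not from the slides): SynchronousQueue as an
    // occam-like synchronous "channel" between two threads.
    import java.util.concurrent.SynchronousQueue;

    public class ChannelDemo {
        public static void main(String[] args) throws InterruptedException {
            SynchronousQueue<Integer> chan = new SynchronousQueue<>();

            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 5; i++) {
                        chan.put(i);             // blocks until the consumer takes
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            for (int i = 0; i < 5; i++) {
                System.out.println("received " + chan.take());  // blocks until put
            }
            producer.join();
        }
    }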