CPL 2016, week 1: Java threads and inter-thread visibility
Oleg Batrashev, Institute of Computer Science, Tartu, Estonia
February 10, 2016
Course information ◮ 1 lecture and 1 lab each week ◮ moving the lecture/lab time? (Monday 10-16) ◮ Languages (poll) ◮ Java 40% ◮ Erlang 20% ◮ Clojure 40% ◮ The total score consists of ◮ weekly homework, 20% in total ◮ two small projects, 20% each ◮ a written exam, 40% ◮ I create lecture notes where I add comments to the slides
Agenda Java threads Creating threads Measuring time Inter-thread visibility Example: infinite loop Hardware: invalid read/write order Example: read/write access ordering Hardware: cache subsystem Java memory model Example: faulty publication Double-Checked Locking Object safe publication Conclusions
Extend Thread class 1. Extend your class from the Thread class and override its run() method, 2. Create an object of your class, 3. Start your thread by calling the start() method on the object. This may be the simplest method, but it is not very flexible: your class can no longer extend any other class. Using the Runnable interface, as shown next, is the preferred way. ◮ How many of you know this?
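The three steps above can be sketched as a small runnable program. The class names Counter and ExtendThreadDemo, and the summing work done in run(), are hypothetical and only make the result observable.

```java
// Minimal sketch of the "extend Thread" approach.
// Counter and ExtendThreadDemo are hypothetical names, not from the slides.
class Counter extends Thread {
    private long sum; // written by the new thread, read by the caller after join()

    @Override
    public void run() {
        // the new thread starts execution here
        for (int k = 1; k <= 1000; k++) {
            sum += k;
        }
    }

    long getSum() {
        return sum;
    }
}

class ExtendThreadDemo {
    static long runCounter() {
        Counter c = new Counter(); // 2. create an object of your class
        c.start();                 // 3. start() executes run() in a new thread
        try {
            c.join();              // wait until run() has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.getSum();
    }

    public static void main(String[] args) {
        System.out.println("Sum " + runCounter()); // prints "Sum 500500"
    }
}
```

Note that calling c.run() directly would execute the loop in the current thread; only start() creates a new thread.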
Implement Runnable interface Create a class:
class Reader implements Runnable {
    @Override
    public void run() {
        // new thread starts execution here
    }
}
Create a Thread object and pass it an object of the new class:
Thread t = new Thread(new Reader());
t.start();
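A runnable sketch of the Runnable approach follows. The body of Reader.run(), the thread name "reader-1" and the RunnableDemo class are assumptions added only so the program produces visible output.

```java
// Sketch of the Runnable approach; Reader's body and the thread name
// "reader-1" are hypothetical, chosen only to make the example observable.
class Reader implements Runnable {
    private String result; // read by the caller only after join()

    @Override
    public void run() {
        // the new thread starts execution here
        result = "read by " + Thread.currentThread().getName();
    }

    String getResult() {
        return result;
    }
}

class RunnableDemo {
    static String runReader() {
        Reader task = new Reader();
        Thread t = new Thread(task, "reader-1"); // the Runnable plus an optional thread name
        t.start();
        try {
            t.join(); // join() also makes task.result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return task.getResult();
    }

    public static void main(String[] args) {
        System.out.println(runReader()); // prints "read by reader-1"
    }
}
```

Because Reader only implements an interface, it remains free to extend another class; since Java 8 a lambda such as new Thread(() -> ...) can also replace a small Runnable class.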
System.nanoTime() ◮ measures wall-clock (elapsed) time ◮ as opposed to user time or CPU time ◮ values are comparable only within the same JVM, and only their differences are meaningful ◮ much finer precision than System.currentTimeMillis() Example:
long mainStartTime = System.nanoTime();
while (i < N) i++;
t.join();
System.out.println("Time difference " + (readerStartTime - mainStartTime));
◮ join() waits for the other thread to finish, which makes its writes, e.g. readerStartTime, visible to the current thread
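A self-contained sketch of the measurement above, assuming the reader thread does nothing except record its start time (the class name NanoTimeDemo is hypothetical). The result is the delay between main taking its timestamp and the new thread actually running.

```java
// Sketch: measuring elapsed (wall-clock) time with System.nanoTime().
// NanoTimeDemo is a hypothetical name; the reader only records its start time.
class NanoTimeDemo {
    private static long readerStartTime; // written by the reader, read by main after join()

    static long startDelayNanos() {
        long mainStartTime = System.nanoTime();
        Thread t = new Thread(() -> readerStartTime = System.nanoTime());
        t.start();
        try {
            t.join(); // join() makes readerStartTime visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return readerStartTime - mainStartTime; // only the difference is meaningful
    }

    public static void main(String[] args) {
        System.out.println("Thread start delay " + startDelayNanos() / 1000 + " us");
    }
}
```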
CountDownLatch CountDownLatch may be used to synchronize threads [3, sec 5.5.1]:
latch = new CountDownLatch(1);
Thread t = new Thread(new Reader());
t.start();
latch.await();
◮ the await() method blocks until the latch count reaches zero ◮ the countDown() method decrements the count
public void run() {
    latch.countDown();
    try {
        latch.await();
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
    readerStartTime = System.nanoTime();
}
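The latch pattern above can be extended into a two-party "start gate". This is a variation, not the slide's exact code: the sketch assumes a count of 2, with both threads calling countDown() and then await(), so neither proceeds until both are ready. LatchDemo and awaitOrFail are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

// Sketch of a two-party "start gate". Assumption: the latch count is 2 and both
// threads call countDown() then await(), so neither proceeds until both are ready.
// (The slide uses a count of 1, where only main waits for the reader.)
class LatchDemo {
    // Returns {mainStartTime, readerStartTime, latch count seen by the reader}.
    static long[] startTogether() {
        CountDownLatch latch = new CountDownLatch(2);
        long[] reader = new long[2]; // filled by the reader, read after join()
        Thread t = new Thread(() -> {
            latch.countDown();            // reader is ready
            awaitOrFail(latch);           // wait until main is ready too
            reader[0] = System.nanoTime();
            reader[1] = latch.getCount(); // always 0 once await() has returned
        });
        t.start();
        latch.countDown();                // main is ready
        awaitOrFail(latch);
        long mainStartTime = System.nanoTime();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return new long[] {mainStartTime, reader[0], reader[1]};
    }

    private static void awaitOrFail(CountDownLatch latch) {
        try {
            latch.await(); // blocks until the count reaches zero
        } catch (InterruptedException e) {
            throw new RuntimeException(e); // same handling as on the slide
        }
    }

    public static void main(String[] args) {
        long[] r = startTogether();
        System.out.println("Start time difference " + (r[1] - r[0]) + " ns");
    }
}
```

A latch cannot be reset once it reaches zero, which is why the sketch creates a fresh one per call.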
Visibility problems Single-threaded programs ◮ always see the last written value of a variable: x=5 ... y=2*x Multi-threaded programs ◮ may or may not see values written by other threads, ◮ moreover, the order in which those changes become visible in the current thread is uncertain. There are several reasons why this can happen: 1. values held in CPU registers 2. CPU instructions executing out-of-order 3. compiler re-ordering optimizations
Example code Variable i is visible to both the Main and Reader threads:
private static final int N = 50000000;
private static /* volatile */ int i = 0;
The Main thread writes the variable in a loop:
Thread t = new Thread(new Reader());
t.start();
while (i < N) i++;
System.out.println("Main finished");
The Reader thread reads it in a loop and exits:
int cnt = 0;
while (i < N) {
    cnt++;
}
System.out.println("Count " + cnt);
◮ without volatile, the Reader thread may never observe i == N and loop forever
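The example above can be assembled into one runnable class with the volatile fix applied. Assumptions: N is reduced from 50000000, the class name Visibility1 and the method runOnce are hypothetical, and the reader is a daemon thread so that a hang (after removing volatile) cannot keep the JVM alive.

```java
// Sketch of the infinite-loop example as one runnable class, with the volatile
// fix applied. Assumptions: N is reduced from 50000000 and the reader is a
// daemon thread, so a hang (after removing volatile) cannot block JVM exit.
class Visibility1 {
    private static final int N = 5_000_000;
    private static volatile int i = 0; // remove volatile to experiment with the hang

    static class Reader implements Runnable {
        @Override
        public void run() {
            int cnt = 0;
            while (i < N) { // volatile: i must be re-read on every iteration
                cnt++;
            }
            System.out.println("Count " + cnt);
        }
    }

    // Returns true if the reader observed i == N and exited within the timeout.
    static boolean runOnce(long timeoutMillis) {
        i = 0;
        Thread t = new Thread(new Reader());
        t.setDaemon(true);
        t.start();
        while (i < N) i++; // single writer, so the non-atomic i++ is fine here
        System.out.println("Main finished");
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Reader exited: " + runOnce(10_000));
    }
}
```

Without volatile the JIT compiler is allowed to read i once and keep it in a register, which is one way the endless loop arises.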
Memory hierarchy The hierarchy consists of the following layers: ◮ Registers inside the core (CPU) – contain the values of instructions being executed; ◮ Dedicated cache (Level 1) – each core has its own; ◮ Shared cache (e.g. Level 2) – common for all cores; ◮ Main memory (RAM) – just your memory. The following are typical sizes and access delays:
Layer               Size              Access delay
Core registers      several KBytes    < 1 ns
Level 1 cache       30-100 KBytes     1-2 ns
Level 2 cache       several MBytes    5-15 ns
Main memory (RAM)   several GBytes    100-300 ns
Memory read/write ordering When running machine code, a CPU (core) may execute it out-of-order, and thus read/write memory in a different order. The latter is mostly eliminated by load/store buffers: 1. all reads from cache are program-ordered; 2. all writes to cache are program-ordered; 3. reads may be re-ordered before earlier writes to different memory locations; 4. there are exceptions, as described in [1] (vol. 3, sec. 8.2 "Memory ordering"); 5. a memory fence instruction [5] must be used to prevent reads from moving before writes. It is complex and platform dependent! The programmer needs something simpler to grasp!
[Figure: CPU with registers, store buffer and load buffer, connected to cache and memory]
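Point 3 above is what the classic "store buffering" litmus test probes. The sketch below is not from the slides and the names StoreBuffering and countBothZero are hypothetical: each thread writes one variable and then reads the other. Without volatile, the outcome r1 == 0 && r2 == 0 is allowed; with a and b volatile, the Java Memory Model forbids it, so the count must stay 0.

```java
// Sketch of the classic "store buffering" litmus test (not from the slides):
// each thread writes one variable and then reads the other. Without volatile,
// r1 == 0 && r2 == 0 is allowed (each read moved before the earlier write, e.g.
// via the store buffer); with a and b volatile, the JMM forbids that outcome.
class StoreBuffering {
    private static volatile int a, b;
    private static int r1, r2; // read by main only after join()

    static int countBothZero(int rounds) {
        int bothZero = 0;
        for (int k = 0; k < rounds; k++) {
            a = 0;
            b = 0;
            Thread t1 = new Thread(() -> { a = 1; r1 = b; });
            Thread t2 = new Thread(() -> { b = 1; r2 = a; });
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("r1 == r2 == 0 seen " + countBothZero(10_000) + " times");
    }
}
```

Starting a thread is slow compared to two memory accesses, so the two threads rarely overlap; a zero count without volatile therefore proves nothing. Dedicated harnesses such as OpenJDK's jcstress are built for this kind of test.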
Example code
private static volatile boolean isRunning = true;
private static /* volatile only in case b */ int x = 0;
private static /* volatile only in case c */ int y = 0;
The Writer thread writes increasing values to x and then to y:
for (int i = 0; i < N; i++) {
    x = i;
    y = i;
}
isRunning = false;
The Reader thread reads x and y in the opposite order:
int xl, yl, cnt = 0;
do {
    yl = y;
    xl = x;
    if (xl < yl) cnt++;
} while (isRunning);
cnt counts how many times x < y was observed.
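Variant c of the example can be sketched as one runnable class. Assumptions: the class name Visibility2c and the method countXLessThanY are hypothetical, and the main thread plays the Writer role instead of a separate Writer thread. Under the Java Memory Model the returned count must be 0 for this variant; removing volatile from y (variants a and b) may produce non-zero counts on some machines.

```java
// Sketch of the x/y ordering test, variant c (y volatile). Assumption: the
// calling thread acts as the Writer. Under the JMM the count must be 0 here.
class Visibility2c {
    private static volatile boolean isRunning;
    private static int x;
    private static volatile int y;

    static int countXLessThanY(int n) {
        x = 0;
        y = 0;
        isRunning = true;
        int[] cnt = new int[1]; // filled by the reader, read after join()
        Thread reader = new Thread(() -> {
            int xl, yl, c = 0;
            do {
                yl = y; // volatile read first ...
                xl = x; // ... then the plain read
                if (xl < yl) c++;
            } while (isRunning);
            cnt[0] = c;
        });
        reader.start();
        for (int i = 0; i < n; i++) {
            x = i; // plain write first ...
            y = i; // ... then the volatile write
        }
        isRunning = false;
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cnt[0];
    }

    public static void main(String[] args) {
        System.out.println("x < y count " + countXLessThanY(50_000_000));
    }
}
```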
Example analysis and results If operations are not re-ordered: ◮ the Writer thread always writes x first, so in memory x ≥ y ◮ the Reader thread may get the variables from different iterations of the Writer thread ◮ y is read first, so its value comes from the same or an earlier iteration than x, thus x ≥ y It must therefore never be that x < y. Test results (x < y count, several runs):
Test           Machine1             Machine2
Visibility2a   984, 7487, 1179      2781, 37, 182
Visibility2b   286, 21307, 1015     80975, 255, 80330
Visibility2c   0, 0, 0              0, 0, 0
For tests a and b, the most probable explanation is: ◮ compiler re-ordering optimizations changed the read and/or write order. Reasoning about the behavior of a program with two threads is difficult, unless there are certain guarantees, e.g. the Java Memory Model.
Ordering semantics of volatile In a Java program: ◮ writes to a volatile variable may not be re-ordered with earlier writes (possibly to different variables) ◮ reads from a volatile variable may not be re-ordered with later reads (possibly from different variables) In our example: x = i; y = i; ◮ if y is volatile, its write may not be re-ordered before the write to x; ◮ if x is volatile, no guarantees are given, i.e. the write to y may be done before the write to x.
Common misconceptions Consider a typical cache subsystem: ◮ several levels with 64-byte granularity (cache line); ◮ when a line is not found in Level 1, it is copied from Level 2; ◮ when it is not found in any cache, it is copied from main memory; ◮ when a line is evicted from Level 1, it is written back to Level 2. It is common to think that: 1. because the Level 1 cache is dedicated to a core, two cores may each have their own version of a line; 2. to make all changed values visible to other threads, the core's Level 1 cache must be flushed to at least the Level 2 cache. In fact, on modern systems both are false. [4]
Cache coherency Modern cache subsystems maintain a coherence protocol, typically: ◮ only one core may hold a line in write mode, exclusively; ◮ many cores may share a line in read-only mode; ◮ when one core requests a line in write mode, the others must invalidate their copies of that line. With such protocols only one version of a line exists in the subsystem, either 1. shared (cloned) among cores in read-only mode, 2. or exclusive to a single core in writeable mode. This is a simplified description; actual cache coherency protocols, like MESIF or MOESI, are more complex.
Happens-before in single thread In a single thread, actions execute as if in program order: ◮ statement A happens-before B if the program code contains them in this order ◮ "as if" allows re-ordering as long as the result is the same as with program (sequential) ordering ◮ in the extreme, the first assignment in x=5; x=y; may be discarded ◮ notice, this affects what values other threads may see Program ordering gives absolutely no guarantees about: ◮ the relations between actions in different threads, ◮ i.e. which changes are seen in other threads, and when. It is unknown whether and when the effects of A will be visible in the second thread, even if the effects of B are visible.
[Figure: actions A, B, C and D in two threads]
Happens-before between threads On modern cache subsystems we know which particular write to a volatile variable precedes a given read of that variable. The JMM states that: ◮ a write to a volatile variable happens-before every subsequent read of that variable; ◮ the happens-before relation is transitive: A → B and B → C implies A → C; ◮ thus, any statements before a write to a volatile happen-before the statements after a read of that volatile; ◮ this includes reading/writing other variables ◮ again, the compiler and CPU may optimize, but the results must be as if executed this way. The Java compiler and JVM must provide this behavior for our program. This frees us from thinking about the low-level visibility details of the hardware.
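The transitivity argument above is what makes publishing data through a volatile flag safe. A minimal sketch, assuming hypothetical names (VolatilePublication, payload, ready): the plain write to payload happens-before the volatile write to ready by program order, which happens-before the reader's read that sees ready == true, so the reader must see payload == 42.

```java
// Sketch: publishing data through a volatile flag (names are hypothetical).
// payload = 42 -> ready = true      (program order)
// ready = true -> read sees true    (volatile write/read)
// read of ready -> read of payload  (program order), so the reader sees 42.
class VolatilePublication {
    private static int payload;            // plain, non-volatile field
    private static volatile boolean ready;

    static int publishAndRead() {
        payload = 0;
        ready = false;
        int[] seen = new int[1]; // filled by the reader, read after join()
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
            seen[0] = payload; // guaranteed to see 42, not 0
        });
        reader.start();
        payload = 42; // A: ordinary write
        ready = true; // B: volatile write, A happens-before B by program order
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw payload " + publishAndRead()); // prints "Reader saw payload 42"
    }
}
```

If ready were not volatile, the reader could spin forever or see payload == 0, just as in the earlier infinite-loop example.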
JMM for Visibility2c example Recall the ordering example where we make variable y volatile. ◮ Assume y is read in the Reader thread just after the Writer thread writes the value 5; ◮ because y is volatile, there is a happens-before relation between this write and this read: x=5 → y=5 (program order), y=5 → yl=y (volatile), yl=y → xl=x (program order) ◮ consequently, x=5 happens-before xl=x, i.e. the effect of that write must be visible to the Reader thread ◮ however, xl may also be 6 or 7, because no happens-before relations are defined for the subsequent writes ◮ they may or may not be visible to the Reader thread
Recommendations
More recommendations