QuickCheck John Hughes
DEMO
Registry tests on multiple nodes
[Chart: tests per minute (0 to 40,000) against number of nodes (1 to 24). Series: four-core/8HT Skylake i7; dual-core/4HT Ivy Bridge i7.]
Architecture
[Diagram: one Master with four Workers.]
• Master: count tests, print dots, start and stop
• Workers: generate and run tests
How can the Master count the tests?
[Diagram: one Master with four Workers.]
How can the Master stop the workers?
[Diagram: one Master with four Workers.]
Benchmark
[Diagram: a Bouncer exchanging messages with a Mirror.]
1.2 million/second
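The slide does not show the benchmark code. A minimal sketch of what such a bouncer/mirror measurement could look like follows. The module name `bounce`, the message shapes `{ping, From}` and `pong`, and the function names are all assumptions made for illustration, not the code behind the figure above.

```erlang
-module(bounce).
-export([run/1]).

%% Mirror: reflect every ping straight back to its sender.
%% (Hypothetical message protocol: {ping, From} / pong / stop.)
mirror() ->
    receive
        {ping, From} ->
            From ! pong,
            mirror();
        stop ->
            ok
    end.

%% Bouncer: perform N synchronous round trips against the mirror.
bounce(_Mirror, 0) ->
    ok;
bounce(Mirror, N) ->
    Mirror ! {ping, self()},
    receive pong -> ok end,
    bounce(Mirror, N - 1).

%% Run N round trips on the local node and return round trips per second.
run(N) ->
    Mirror = spawn(fun mirror/0),
    {Micros, ok} = timer:tc(fun() -> bounce(Mirror, N) end),
    Mirror ! stop,
    N * 1000000 / max(Micros, 1).
```

Calling `bounce:run(1000000).` in an Erlang shell returns a rate for that machine; the number will vary with hardware and scheduler settings.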
Multiple bouncers
[Diagram: three Bouncer/Mirror pairs.]
Message round-trips per second (four-core/8-thread i7)
[Chart: round trips per second (0 to 4,500,000) against P, the number of bouncers (1 to 10). Series: P bouncers, P mirrors.]
Multiple bouncers, one mirror
[Diagram: three Bouncers sharing a single Mirror.]
Message round-trips per second (four-core/8-thread i7)
[Chart: round trips per second (0 to 4,500,000) against P, the number of bouncers (1 to 10). Series: P bouncers, P mirrors; P bouncers, 1 mirror.]
Batch permissions?
[Diagram: one Master with four Workers.]
BUT this may delay termination!
Alternative Architecture
[Diagram: one Master above four Managers, each paired with its own Worker. A Worker asks "Another?" and its Manager answers "Yes".]
• Every worker communicates with its own manager: scalable!
• Stopping can be slightly delayed
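The manager-per-worker idea can be sketched in plain Erlang processes. This is an illustrative model under stated assumptions, not QuickCheck's implementation: the module name `managed`, the `{another, Worker}` / `true` / `false` / `stop` / `{done, Worker, Count}` messages, and the placeholder `run_test/0` are all hypothetical.

```erlang
-module(managed).
-export([run/2]).

%% Manager: answers its own worker's "another?" requests without
%% involving the master. Once the master has sent stop, the next
%% request is answered with false and the manager terminates.
manager(Continue) ->
    receive
        stop ->
            manager(false);
        {another, Worker} ->
            Worker ! Continue,
            case Continue of
                true  -> manager(true);
                false -> ok
            end
    end.

%% Worker: asks its manager before every test, reports its count at the end.
worker(Manager, Master, Count) ->
    Manager ! {another, self()},
    receive
        true ->
            run_test(),
            worker(Manager, Master, Count + 1);
        false ->
            Master ! {done, self(), Count}
    end.

%% Placeholder for generating and running one test (hypothetical).
run_test() ->
    L = lists:seq(1, 20),
    L = lists:reverse(lists:reverse(L)),
    ok.

%% Master: start NWorkers manager/worker pairs, let them run for Millis
%% milliseconds, tell the managers to stop, and sum the workers' counts.
run(NWorkers, Millis) ->
    Master = self(),
    Pairs = [begin
                 Mgr = spawn(fun() -> manager(true) end),
                 Wkr = spawn(fun() -> worker(Mgr, Master, 0) end),
                 {Mgr, Wkr}
             end || _ <- lists:seq(1, NWorkers)],
    timer:sleep(Millis),
    lists:foreach(fun({Mgr, _}) -> Mgr ! stop end, Pairs),
    lists:sum([receive {done, Wkr, C} -> C end || {_, Wkr} <- Pairs]).
```

In this sketch the master only sends one `stop` per manager, so a worker that is in the middle of a test finishes it before it sees `false`. That is the slight stopping delay noted in the second bullet.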
> eqc:quickcheck(examples:prop_reverse()).
....................................................................................................
OK, passed 100 tests
true
> eqc:quickcheck(examples:prop_reverse()).
....................................................................................................(x10).(x1).........
OK, passed 119 tests
true
> eqc:quickcheck(eqc:in_parallel(examples:prop_reverse())).
....................................................................................................(x10)..(x1)......
OK, passed 126 tests
true
> eqc:quickcheck(eqc:on_nodes(examples:prop_reverse())).
....................................................................................................(x10)........................................................(x1).........
OK, passed 669 tests
But how bad is it to run a few extra tests?
What about node placement?
[Diagram: one Master above four Managers, each paired with its own Worker. A Worker asks "Another?" and its Manager answers "Yes".]
P bouncers, 1 mirror, different nodes (four-core/8HT i7)
[Chart: vertical axis 0 to 160,000; horizontal axis 1 to 10.]
P bouncers, 1 mirror (four-core/8HT i7)
[Chart: vertical axis 0 to 1,600,000; horizontal axis 1 to 10. Series: different nodes; same node.]
10-30x slower from a different node!
P bouncers, 1 mirror
Bouncers on dual-core laptop, mirror on quad-core
[Chart: vertical axis 0 to 30,000; horizontal axis 0 to 35.]
Bounces per second (single bouncer)
[Bar chart: vertical axis 0 to 1,400,000. Bars: Same LAN, Same Host, Same Node.]
Bounces per second (single bouncer), log scale
[Bar chart: vertical axis 0 to 7 (log scale). Bars: Same LAN, Same Host, Same Node. Annotations: 30x, 50x.]
What about success messages?
[Diagram: one Master above four Managers, each with its own Worker. A Worker sends "Success" to its Manager.]
Two-way vs one-way
[Diagram: a Bouncer and a Mirror.]
Two-way: 1.2 million/second
One-way: 5.4 million/second
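For comparison with a synchronous round trip, a one-way measurement could be sketched as below. The sender never waits for a reply per message; it only waits once, at the end, for the receiver to confirm that everything has been consumed. The module name `oneway` and the `msg` / `done` / `finished` messages are assumptions for illustration; the slides do not say how the one-way figure was measured.

```erlang
-module(oneway).
-export([run/1]).

%% Sink: consume messages without replying; acknowledge only the final one.
sink(Parent) ->
    receive
        msg ->
            sink(Parent);
        done ->
            Parent ! finished
    end.

%% Send N messages back to back, with no waiting in between.
send(_Sink, 0) ->
    ok;
send(Sink, N) ->
    Sink ! msg,
    send(Sink, N - 1).

%% Returns messages per second for N one-way sends on the local node.
%% The single final acknowledgement ensures all N messages were received
%% before the clock stops.
run(N) ->
    Parent = self(),
    Sink = spawn(fun() -> sink(Parent) end),
    {Micros, ok} = timer:tc(fun() ->
                                    send(Sink, N),
                                    Sink ! done,
                                    receive finished -> ok end
                            end),
    N * 1000000 / max(Micros, 1).
```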
Bounces/Messages per second, log scale
[Bar chart: vertical axis 0 to 8 (log scale). Bars: Same LAN, Same Host, Same Node. Series: two way; one way. Annotations: 4x, 10x, 100x.]
One more optimization…
[Diagram: one Master above four Managers, each with its own Worker sending "Success".]
There are a LOT of success messages. Count them!
Send the total every 100ms
• Tripled the speed of quickcheck(true)!
• Stopping can be even more delayed
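The counting optimization can be sketched as follows: each manager absorbs its worker's success messages locally and forwards only a running total to the master on a 100 ms timer. This is a model under assumptions, not QuickCheck's code: the module name `batched`, the `success` / `flush` / `stop` / `{total, N}` / `{final, N}` messages, and the fixed number of tests per worker are all hypothetical.

```erlang
-module(batched).
-export([run/2]).

-define(INTERVAL_MS, 100).

%% Manager: count success messages locally; on each timer tick forward the
%% count accumulated since the last tick; on stop forward the remainder.
manager(Master, Count) ->
    receive
        success ->
            manager(Master, Count + 1);
        flush ->
            Master ! {total, Count},
            erlang:send_after(?INTERVAL_MS, self(), flush),
            manager(Master, 0);
        stop ->
            Master ! {final, Count}
    end.

%% Worker: report N successful tests, then tell the manager it is finished.
worker(Mgr, 0) ->
    Mgr ! stop;
worker(Mgr, N) ->
    Mgr ! success,
    worker(Mgr, N - 1).

%% Master: add up batched totals until every manager has sent its final one.
collect(0, Sum) ->
    Sum;
collect(Open, Sum) ->
    receive
        {total, C} -> collect(Open, Sum + C);
        {final, C} -> collect(Open - 1, Sum + C)
    end.

%% Start NWorkers manager/worker pairs and return the total success count.
run(NWorkers, TestsPerWorker) ->
    Master = self(),
    lists:foreach(
      fun(_) ->
              Mgr = spawn(fun() ->
                                  erlang:send_after(?INTERVAL_MS, self(), flush),
                                  manager(Master, 0)
                          end),
              spawn(fun() -> worker(Mgr, TestsPerWorker) end)
      end,
      lists:seq(1, NWorkers)),
    collect(NWorkers, 0).
```

The master now receives a handful of messages per manager instead of one per test. The cost is that its view of the count can lag by up to one interval, which is why stopping can be delayed further.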
Lessons
• There is at least an order of magnitude difference between communication costs
  • Within a node
  • Between nodes
  • Between hosts
• Latency is affected much more than bandwidth
• This affects design for performance
  • Favours asynchronous over synchronous communication between nodes
• Optimising performance may require changes to observable behaviour
• … and we didn't even consider fault tolerance