Debugging Highly-Parallel Programs
João M. Lourenço, José C. Cunha and Vitor Duarte
CITI / Universidade Nova de Lisboa
joao.lourenco@fct.unl.pt
Why do programs have errors?
[Figure: Problem → Devise a computational solution → Write a computer program → Problem solved!]
What about parallel programs?
[Figure: Problem → Devise a computational solution → Write a computer program → Problem solved!, now with the added label "Interleaving errors"]
All men are equal! What about errors?
[Figure: classes of errors, ordered from harder to easier]
• Non fail-stop Byzantine errors (harder)
• Performance
  – Yields a correct result, although it takes longer than acceptable
• Interleaving
  – Unwanted side effects caused by non-reentrant code and shared data
• Synchronization
  – Ordering failures and deadlocks
• Ordering
  – Violations of precedence or mutual exclusion relations
• Sequential errors (easier)
[Figure: Parallel program and Multicore system; Expected behavior vs. Observed behavior?]
Parallel computations
[Figure: a Parallel program (What?) running on a Multicore system (How?) generates Parallel computations]
Program histories
• Local history: $h_i$
  – sequence of events generated by executing the program $p_i$
  – $h_i = e_i^0, e_i^1, \ldots, e_i^f$
  – the $k$-th event in $h_i$ ($e_i^k$) produces the local state $s_i^k$
• Global history: $H$
  – union of the local histories of the $N$ processes
  – $H = h_1 \cup h_2 \cup \ldots \cup h_N$
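As an illustration only, the two definitions above can be modelled in a few lines of Python. The `Event` class, its `label` field and the helper names are hypothetical, introduced just for this sketch; they are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    process: int  # index i of the process p_i that generated the event
    index: int    # position k of the event within the local history h_i
    label: str    # free-form description (hypothetical field)


def local_history(process, labels):
    """Build h_i: the ordered sequence of events of one process."""
    return [Event(process, k, label) for k, label in enumerate(labels)]


def global_history(local_histories):
    """Build H: the unordered union of all local histories."""
    return set().union(*map(set, local_histories))


h1 = local_history(1, ["start", "send m", "stop"])
h2 = local_history(2, ["start", "recv m"])
H = global_history([h1, h2])
```

Note that `H` is a plain set: the global history alone carries no ordering between events of different processes.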
Parallel Computation
• A parallel computation is a partially ordered set (poset) defined as $C \triangleq (H, \rightarrow)$
  – $H$ = Global history
  – $\rightarrow$ = Lamport's happens-before relation
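The slides do not say how the happens-before relation is tracked at run time. One common technique, used here purely as an assumption, is to stamp every event with a vector clock. In the sketch below the `Process` class and the function names are hypothetical.

```python
class Process:
    """Hypothetical process that stamps each event with a vector clock."""

    def __init__(self, pid, n_processes):
        self.pid = pid
        self.clock = [0] * n_processes

    def internal(self):
        # Every local event advances the process's own component.
        self.clock[self.pid] += 1
        return tuple(self.clock)

    def send(self):
        # A send is an event too; its timestamp travels with the message.
        return self.internal()

    def receive(self, message_clock):
        # Merge the sender's knowledge, then count the receive event itself.
        self.clock = [max(a, b) for a, b in zip(self.clock, message_clock)]
        return self.internal()


def happens_before(a, b):
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happens_before(a, b) and not happens_before(b, a)


p0, p1 = Process(0, 2), Process(1, 2)
e1 = p0.internal()   # local event on p0
m = p0.send()        # p0 sends a message
e2 = p1.internal()   # local event on p1, unrelated to p0 so far
r = p1.receive(m)    # p1 receives the message from p0
```

Here `e1` and `m` both precede `r`, while `e1` and `e2` are concurrent. This is exactly what makes the relation a partial order rather than a total one.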
Cut of a parallel computation
• A cut of a parallel computation is a subset $C$ of its global history $H$ that contains an initial prefix of each of its local histories: $C = \{h_1^x, h_2^y, \ldots, h_N^z\}$
[Figure: space-time diagram with events $e_1^1 \ldots e_1^4$ of $P_1$, $e_2^1 \ldots e_2^5$ of $P_2$ and $e_3^1 \ldots e_3^3$ of $P_3$]
Frontier of a cut & Global state
• The frontier of a cut is the set of the last states/events in a cut: $F = \{s_1^x, s_2^y, \ldots, s_N^z\}$
• The frontier of a cut defines a global state
[Figure: space-time diagram with events $e_1^1 \ldots e_1^4$ of $P_1$, $e_2^1 \ldots e_2^5$ of $P_2$ and $e_3^1 \ldots e_3^3$ of $P_3$]
Consistent cut
• A cut is consistent if, for every event in its frontier, all of its past events are also included in the cut
[Figure: space-time diagram of $P_1$, $P_2$ and $P_3$ showing an inconsistent cut $F_1$ and a consistent cut $F_2$]
Consistent cut
• A global state is consistent if it corresponds to the frontier of a consistent cut
[Figure: space-time diagram of $P_1$, $P_2$ and $P_3$ showing an inconsistent global state $F_1$ and a consistent global state $F_2$]
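The consistency condition can be checked mechanically. The sketch below assumes that processes interact only through messages, that a cut is given as the number of events taken from each local history, and that each message is recorded as a (send event, receive event) pair with 1-based event indices. The message list is made up for the example and does not reproduce the figure.

```python
def is_consistent(cut, messages):
    """Check whether a cut is consistent.

    cut[i]   -- number of events of process i included in the cut
    messages -- list of ((sender, send_index), (receiver, recv_index)),
                with 1-based event indices (assumed encoding)
    """
    for (sender, send_index), (receiver, recv_index) in messages:
        received = recv_index <= cut[receiver]
        sent = send_index <= cut[sender]
        if received and not sent:
            # The cut contains an effect (receive) without its cause (send).
            return False
    return True


# Hypothetical computation with three processes (0, 1, 2) and two messages.
messages = [
    ((0, 2), (1, 3)),  # 2nd event of process 0 is received as 3rd event of process 1
    ((1, 4), (2, 2)),  # 4th event of process 1 is received as 2nd event of process 2
]

inconsistent = is_consistent((1, 3, 1), messages)  # receive included, send missing
consistent = is_consistent((2, 3, 1), messages)    # both send and receive included
in_transit = is_consistent((2, 4, 1), messages)    # sent but not yet received: fine
```

Because a cut is already a prefix of every local history, checking the message pairs is enough: any longer causal chain is made of local steps and message steps.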
Runs
[Figure: lattice of global states of a two-process computation ($P_1$ with events $e_1^1 \ldots e_1^6$, $P_2$ with events $e_2^1 \ldots e_2^5$); states are labeled from 00 to 65, with the annotations "7 states", "6 states" and "30 states"]
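To make the lattice concrete, the sketch below enumerates the consistent global states of a small computation and counts its consistent runs, that is, the paths from the initial to the final state that only step through consistent states. The encoding of cuts and messages and all function names are assumptions made for this example, and the numbers it produces are not meant to reproduce the figure.

```python
from functools import lru_cache
from itertools import product


def is_consistent(cut, messages):
    """A cut is consistent if every received message was also sent within it."""
    return all(send_index <= cut[sender] or recv_index > cut[receiver]
               for (sender, send_index), (receiver, recv_index) in messages)


def consistent_states(lengths, messages):
    """All consistent cuts; lengths[i] is the number of events of process i."""
    all_cuts = product(*(range(n + 1) for n in lengths))
    return [cut for cut in all_cuts if is_consistent(cut, messages)]


def count_runs(lengths, messages):
    """Count the paths through the lattice of consistent global states."""
    final = tuple(lengths)

    @lru_cache(maxsize=None)
    def paths(cut):
        if cut == final:
            return 1
        total = 0
        for i in range(len(cut)):
            if cut[i] < lengths[i]:
                # Advance process i by one event.
                nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                if is_consistent(nxt, messages):
                    total += paths(nxt)
        return total

    return paths(tuple(0 for _ in lengths))


# Two processes with two events each; the first event of process 0 is a send
# that is received as the first event of process 1 (hypothetical example).
msgs = [((0, 1), (1, 1))]
states = consistent_states((2, 2), msgs)
runs = count_runs((2, 2), msgs)
```

Without any message the same computation has 9 global states and 6 runs; the single message removes 2 states and leaves 3 runs. Even a little communication prunes the lattice, but the number of runs still grows very quickly with the number of events.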
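Observing a parallel program
[Figure: concept map. The processes ($P_1$, $P_2$, …, $P_N$) generate internal & interaction events / states, which form the local histories; their union is the global history. An arbitrary total order of the global history is a run, and a permutation gives an observation. The causal precedence constraints lead to the parallel computation, the consistent runs and the consistent observations. A subset of the global history is a cut, which leads to the consistent cut and to the frontier of a consistent cut.]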
Observing a parallel program: developer perspective
[Figure: the concept map of the previous slide, annotated with the developer perspective: a run corresponds to a program execution, and the frontier of a consistent cut corresponds to a program state.]
Observing and debugging
• Interactive debugging
  – State-based debugging of remote processes
  – Observation of program states
• Trace, replay and debugging (to obtain reproducible behavior)
  – Deterministic re-execution
  – Repeatable observations
• Combined testing, steering and debugging (to analyze alternative paths)
  – Systematic state exploration
  – Alternative observations
• Global predicate detection (to evaluate correctness properties)
  – Global program properties
  – Observation of consistency
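As a rough illustration of the last item, global predicate detection, the sketch below searches the consistent global states of a recorded computation for one that satisfies a predicate. The name `possibly` is borrowed from the literature on predicate detection; the encoding of states and messages is an assumption made for this example, and the exhaustive search is only practical for tiny traces.

```python
from itertools import product


def is_consistent(cut, messages):
    """A cut is consistent if every received message was also sent within it."""
    return all(send_index <= cut[sender] or recv_index > cut[receiver]
               for (sender, send_index), (receiver, recv_index) in messages)


def possibly(predicate, states, messages):
    """Return a consistent cut whose global state satisfies `predicate`, or None.

    states[i][k] -- local state of process i after its first k events
    messages     -- list of ((sender, send_index), (receiver, recv_index)),
                    with 1-based event indices (assumed encoding)
    """
    lengths = [len(local) - 1 for local in states]
    for cut in product(*(range(n + 1) for n in lengths)):
        if not is_consistent(cut, messages):
            continue
        global_state = tuple(local[k] for local, k in zip(states, cut))
        if predicate(global_state):
            return cut
    return None


# Hypothetical trace: the value of one variable per process after each event.
states = [[0, 1, 2],   # process 0: initial value, then after events 1 and 2
          [0, 5]]      # process 1: initial value, then after event 1
# Event 1 of process 1 sends a message received as event 2 of process 0.
messages = [((1, 1), (0, 2))]

bad = lambda s: s[0] == 2 and s[1] == 0
witness_without_message = possibly(bad, states, [])
witness_with_message = possibly(bad, states, messages)
```

Without the message, the state (2, 0) is reachable and the search returns the cut (2, 0). With the message, the only cut producing that state is inconsistent, so the search returns `None`.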
The scaling challenge
• How to deal with hundreds (or thousands) of threads?
  – Collect, store and gather observations / logs
    • What level of detail is needed?
    • Logs may be huge
  – Combine the logs
  – Reason about global observations
    • Visualize large amounts of information
    • Evaluate global predicates on the program state
    • Evaluate global predicates on the program run
  – Map observation points to the original program
    • Dealing with code generators
    • Supporting high-level abstractions, DSLs
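For the log-combining step, one possible approach (an assumption, not something the slides prescribe) is to stamp each log record with a Lamport timestamp and merge the per-thread logs lazily. The record layout `(lamport_time, thread_id, message)` and the sample logs are hypothetical.

```python
import heapq


def merge_logs(logs):
    """Lazily merge per-thread logs into a single ordered stream.

    Each log must already be sorted by (lamport_time, thread_id); ties between
    threads are broken by thread id, giving one total order that is compatible
    with the happens-before relation.
    """
    return heapq.merge(*logs)


log_a = [(1, 0, "lock m"), (2, 0, "send req"), (5, 0, "unlock m")]
log_b = [(1, 1, "start"), (3, 1, "recv req"), (4, 1, "send ack")]

merged = list(merge_logs([log_a, log_b]))
```

Because `heapq.merge` returns an iterator, the merged stream can be consumed record by record without loading every log into memory, which matters when the logs are huge.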
The End… Happy debugging!