model checking contest results for 2016



  1. Model Checking Contest results for 2016
     Fabrice Kordon, LIP6, Univ. P. & M. Curie, France
     Hubert Garavel, Inria/LIG, France
     Lom Messan Hillah, LIP6 & Univ. Paris Ouest Nanterre, France
     Francis Hulin-Hubard, LSV, CNRS/ENS de Cachan, France
     Emmanuel Paviot-Adet, LIP6 & Univ. Paris Descartes, France
     Loïg Jézequel, IRCCyN, Univ. Nantes, France
     César Rodríguez, LIPN, Univ. Paris 13, France

  2. Objectives

     Promoting model checking tools
     Compare and debug
     ‣ Oracle handled by the developers themselves
     Enhance reproducibility of results
     ‣ BenchKit + dedicated environment using virtualization (easier replay)
     ‣ Submissions available online
     Encourage tools and tool support
     ‣ Observatory for the community
     ‣ Provide reusable and fair comparison charts and data

     Creating a common database of benchmarks
     Models from various origins (more to tell later)
     ‣ PNML is a good format for this
     Competing tools not only dedicated to Petri nets
     ‣ Tools coming from other communities

     F. Kordon - Université P. & M. Curie - CC2016

  3. Model Checking Contest: who does what?
     Managing models
     ‣ Lom Hillah (UPOND)
     ‣ Hubert Garavel (Inria)
     Managing execution + analysis
     ‣ Francis Hulin-Hubard (CNRS)
     ‣ Fabrice Kordon (UPMC)
     Managing formulas
     ‣ Loïg Jezequel (U. Nantes)
     ‣ César Rodríguez (UP13)
     ‣ Emmanuel Paviot-Adet (UP5)

  4. Tools Submitted this Year
     ‣ ITS-Tools: Univ. P. & M. Curie, F
     ‣ LoLA: Univ. Rostock, D
     ‣ LTSMin: Univ. Twente, NL
     ‣ MARCIE: Univ. Cottbus, D
     ‣ PeCan (new): Univ. HoChiMinh, VN
     ‣ pnmc: Steery.io, F
     ‣ PNXDD: Univ. P. & M. Curie, F
     ‣ Smart (new): Iowa State Univ, USA
     ‣ tapaal: Univ. Aalborg, DK, 3 variants (PAR, SEQ, EXP)
     ‣ ydd-pt (new): Univ. Geneva, CH

  5. Tools Submitted this Year
     ‣ ITS-Tools: Univ. P. & M. Curie, F
     ‣ LoLA: Univ. Rostock, D
     ‣ LTSMin: Univ. Twente, NL
     ‣ MARCIE: Univ. Cottbus, D
     ‣ PeCan (new): Univ. HoChiMinh, VN
     ‣ pnmc: Steery.io, F
     ‣ PNXDD: Univ. P. & M. Curie, F
     ‣ Smart (new): Iowa State Univ, USA
     ‣ tapaal: Univ. Aalborg, DK, 3 variants (PAR, SEQ, EXP)
     ‣ ydd-pt (new): Univ. Geneva, CH

     Not present this year
     ‣ Cunf, GreatSPN, StraTAGem

     All VMs will be published
     ‣ Reproducibility can be achieved

  6. Techniques Reported by Tools

     | Tools | Parallelism | Techniques |
     |---|---|---|
     | Marcie | / | SEQUENTIAL_PROCESSING DECISION_DIAGRAMS UNFOLDING_TO_PT |
     | PeCan | / | EXPLICIT |
     | pnmc | / | DECISION_DIAGRAMS USE_NUPN |
     | PNXDD | / | DECISION_DIAGRAMS TOPOLOGICAL |
     | Smart | / | DECISION_DIAGRAMS |
     | tapaal(EXP) | / | EXPLICIT STRUCTURAL_REDUCTION STATE_COMPRESSION STATE_EQUATIONS |
     | tapaal(SEQ) | / | EXPLICIT STRUCTURAL_REDUCTION STATE_EQUATIONS |
     | ydd-pt | / | DECISION_DIAGRAMS |
     | ITS-Tools | MC | DECISION_DIAGRAMS SAT_SMT INITIAL_STATE TOPOLOGICAL USE_NUPN |
     | LoLA | MC | PARALLEL_PROCESSING EXPLICIT SAT_SMT STATE_COMPRESSION STUBBORN_SETS TOPOLOGICAL |
     | LTSMin | PAR | DECISION_DIAGRAMS EXPLICIT STATIC_VARIABLE_REORDERING USE_NUPN |
     | tapaal(PAR) | PAR | EXPLICIT COMPRESSION STRUCTURAL_REDUCTION STATE_EQUATIONS |

  7. Processing Capacity

     | | bluewhale03 | Ebro | Quadhexa-2 | Small (cluster) | Total |
     |---|---|---|---|---|---|
     | Cores | 40 @ 2.8GHz | 64 @ 2.7GHz | 24 @ 2.66GHz | 11x24 @ 2.4GHz | - |
     | Memory (GB) | 512 | 1024 | 128 | 11x64 | - |
     | Used cores (1 per VM) for sequential tools | 31, 31 VMs in parallel | 63, 63 VMs in parallel | 7, 7 VMs in parallel | 11x3, 5x3 VMs in parallel | - |
     | Used cores (4 per VM) for parallel tools | 36, 9 VMs in parallel | 60, 15 VMs in parallel | 20, 5 VMs in parallel | 11x3, 5x3 VMs in parallel | - |
     | Number of runs | 13 374 | 36 936 | 15 768 | 62 604 | 128 682 |
     | Total CPU required | 156d, 17h, 44m, 59s | 485d, 19h, 27m, 43s | 203d, 0h, 25m, 47s | 636d, 9h, 11m, 36s | 1481d, 22h, 50m, 5s |

     Total CPU: about 4 years and 20 days
     Time spent to complete benchmarks: about 22 days and 1 hour
     Boot time of VMs + management (overhead): 22 d, 8h (included in total CPU)
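
The "Total" column of the processing-capacity table can be cross-checked by summing the per-machine figures. The following is a small sketch that does so; the figures are copied from the slide, while the names `cpu_per_machine`, `runs_per_machine`, and `format_duration` are illustrative and not part of the contest tooling.

```python
from datetime import timedelta

# Per-machine CPU times, copied from the "Total CPU required" row.
cpu_per_machine = {
    "bluewhale03": timedelta(days=156, hours=17, minutes=44, seconds=59),
    "Ebro": timedelta(days=485, hours=19, minutes=27, seconds=43),
    "Quadhexa-2": timedelta(days=203, hours=0, minutes=25, seconds=47),
    "Small (cluster)": timedelta(days=636, hours=9, minutes=11, seconds=36),
}

# Per-machine run counts, copied from the "Number of runs" row.
runs_per_machine = {
    "bluewhale03": 13_374,
    "Ebro": 36_936,
    "Quadhexa-2": 15_768,
    "Small (cluster)": 62_604,
}


def format_duration(duration: timedelta) -> str:
    """Render a timedelta in the slide's 'Nd, Nh, Nm, Ns' style."""
    hours, remainder = divmod(duration.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{duration.days}d, {hours}h, {minutes}m, {seconds}s"


total_cpu = sum(cpu_per_machine.values(), timedelta())
total_runs = sum(runs_per_machine.values())

print(format_duration(total_cpu))  # 1481d, 22h, 50m, 5s
print(total_runs)                  # 128682
```

Both sums match the totals reported on the slide.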

  8. Processing Capacity

     | | bluewhale03 | Ebro | Quadhexa-2 | Small (cluster) | Total |
     |---|---|---|---|---|---|
     | Cores | 40 @ 2.8GHz | 64 @ 2.7GHz | 24 @ 2.66GHz | 11x24 @ 2.4GHz | - |
     | Memory (GB) | 512 | 1024 | 128 | 11x64 | - |
     | Used cores (1 per VM) for sequential tools | 31, 31 VMs in parallel | 63, 63 VMs in parallel | 7, 7 VMs in parallel | 11x3, 5x3 VMs in parallel | - |
     | Used cores (4 per VM) for parallel tools | 36, 9 VMs in parallel | 60, 15 VMs in parallel | 20, 5 VMs in parallel | 11x3, 5x3 VMs in parallel | - |
     | Number of runs | 13 374 | 36 936 | 15 768 | 62 604 | 128 682 |
     | Total CPU required | 156d, 17h, 44m, 59s | 485d, 19h, 27m, 43s | 203d, 0h, 25m, 47s | 636d, 9h, 11m, 36s | 1481d, 22h, 50m, 5s |

     Total CPU: about 4 years and 20 days
     Time spent to complete benchmarks: about 22 days and 1 hour
     Boot time of VMs + management (overhead): 22 d, 8h (included in total CPU)

     Less CPU than in 2015
     ‣ 128 682 runs instead of 169 078, but more completed runs

     Thank you very much
     ‣ Université de Genève
     ‣ Rostock University
     ‣ Université Paris Ouest
     ‣ Université P. & M. Curie

  9. Categories of Models
     «Known» models
     Those from past years
     ‣ Test the tool as used by its developers
     «Stripped» models
     «known» (original archive) and set as «surprise» ones
     ‣ Test the tool as used by «non experts» of the tool
     «Surprise» models
     New models proposed by the community this year
     ‣ Test the tool as used by «non experts» of the tool
     ‣ New situations for the tool

  10. Categories of Models
      «Known» models
      Those from past years
      ‣ Test the tool as used by its developers
      «Stripped» models
      «known» (original archive) and set as «surprise» ones
      ‣ Test the tool as used by «non experts» of the tool
      «Surprise» models
      New models proposed by the community this year
      ‣ Test the tool as used by «non experts» of the tool
      ‣ New situations for the tool

      Coefficients (after pool)
      ‣ «known» = x1
      ‣ «stripped» = x3
      ‣ «surprise» = x5

      Execution consistency
      On the same machine
      ‣ «known» / «stripped»
      ‣ colored + associated P/T
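
To illustrate how the category coefficients could enter a score, here is a minimal sketch. Only the three multipliers (x1, x3, x5) come from the slide. The slide does not say how raw points are pooled, so the plain weighted sum, the function name `weighted_score`, and the example point counts are all assumptions made for illustration.

```python
# Multipliers taken from the slide: known = x1, stripped = x3, surprise = x5.
COEFFICIENTS = {"known": 1, "stripped": 3, "surprise": 5}


def weighted_score(points_by_category: dict) -> int:
    """Assumed combination rule: sum of raw points times the category coefficient."""
    return sum(COEFFICIENTS[category] * points
               for category, points in points_by_category.items())


# Hypothetical raw points for one tool (not contest data).
example_points = {"known": 10, "stripped": 4, "surprise": 2}

print(weighted_score(example_points))  # 10*1 + 4*3 + 2*5 = 32
```

Under this reading, a point earned on a «surprise» model counts five times as much as one earned on a «known» model.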

  11. 11 New Models for 2016
      ‣ PaceMaker: B. Barbot
      ‣ AirplaneLD: F. Kordon
      ‣ CloudDeployment: G. Salaün
      ‣ DNAwalker: B. Barbot and M. Kwiatkowska
      ‣ DES: W. Serwe and H. Garavel
      ‣ DLCshifumi: H. Evrard and F. Lang
      ‣ TriangularGrid: T. Shmeleva
      ‣ GPPP: M. Heiner
      ‣ HypertorusGrid: D. Zaitsev
      ‣ AutoFlight: F. Jebali and E. Jenn
      ‣ TCPcondis

  12. 11 New Models for 2016
      ‣ PaceMaker: B. Barbot
      ‣ AirplaneLD: F. Kordon
      ‣ CloudDeployment: G. Salaün
      ‣ DNAwalker: B. Barbot and M. Kwiatkowska
      ‣ DES: W. Serwe and H. Garavel
      ‣ DLCshifumi: H. Evrard and F. Lang
      ‣ TriangularGrid: T. Shmeleva
      ‣ GPPP: M. Heiner
      ‣ HypertorusGrid: D. Zaitsev
      ‣ AutoFlight: F. Jebali and E. Jenn
      ‣ TCPcondis

      With scaling parameters
      ‣ 139 models in fact

      Already from past years
      ‣ 525 instances of models

      Thanks!!! We really need various models

  13. Examinations
      StateSpace
      UpperBound
      Reachability
      ‣ ReachabilityDeadlock
      ‣ ReachabilityCardinality ➝ atomic propositions refer to tokens
      ‣ ReachabilityFireability ➝ atomic propositions refer to firing
      CTL
      ‣ CTLCardinality ➝ atomic propositions refer to tokens
      ‣ CTLFireability ➝ atomic propositions refer to firing
      LTL
      ‣ LTLCardinality ➝ atomic propositions refer to tokens
      ‣ LTLFireability ➝ atomic propositions refer to firing

  14. The Submission Protocol
      May 1st, delivery of disk images
      Qualification phase
      ‣ Completed by mid May
      ‣ ~37 500 test runs
      May 17, starting to operate tools
      ‣ 128 682 runs distributed over 4 different machines across Europe
      VM with 4 cores / 16 GB
      ‣ ITS-Tools, LTSMin, TAPAAL(PAR), LoLA
      VM with 1 core / 16 GB
      ‣ Marcie, PeCan, pnmc, PNXDD, Tapaal (SEQ, EXP), ydd-pt
      Time confinement, 1h
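
The one-hour time confinement can be pictured with a small sketch that runs a command under a wall-clock limit. This is not how BenchKit is implemented: the contest runs each tool inside a virtual machine, whereas this example only wraps a local process. The function name `run_confined` and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def run_confined(command, limit_seconds=3600):
    """Run a command under a wall-clock limit (default: the contest's 1h).

    Returns ("finished", stdout) if the command ends in time,
    or ("timeout", "") if it is killed for exceeding the limit.
    """
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=limit_seconds
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    return "finished", completed.stdout


# Stand-in commands; a real run would launch a model checker instead.
quick = [sys.executable, "-c", "print('done')"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]

print(run_confined(quick, limit_seconds=30))
print(run_confined(slow, limit_seconds=0.5))
```

A run that hits the limit simply yields no answer for that examination.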

  15. The Analysis Protocol
      Mid June, consolidation + analysis of outcomes
      31 GByte of logs and CSV files
      ‣ Post analysis = ~18 KLOC Ada + ~800 LOC bash
      Analysis protocol
      Pass 1, computing results for the majority in a «line»
      ‣ All tools for an examination for a model instance
      Pass 2, evaluating tool reliability
      ‣ Only considering values with a large majority
      Pass 3, reconstructing the results using tool reliability
      ‣ Helps to decide when there are only 2 different answers
      ‣ A result must have a confidence of 0.93 or more (0.9 in 2015)
      ‣ Some results are tagged «insecure»
      Pass 4, computing scores
      ‣ «insecure» results not considered when counting points
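
The four passes can be sketched as follows. This is an illustrative reading of the slide, not the contest's Ada implementation. Only the 0.93 threshold comes from the slide. The `LARGE_MAJORITY` cut-off, the reliability formula (share of agreeing answers), the confidence formula (reliability-weighted share), the one-point-per-answer scoring, and the sample data are all assumptions.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.93  # from the slide (0.9 in 2015)
LARGE_MAJORITY = 0.75        # assumed cut-off; the slide gives no value


def majority(answers):
    """Pass 1: most frequent answer on a «line» and its share of the votes."""
    value, votes = Counter(answers.values()).most_common(1)[0]
    return value, votes / len(answers)


def tool_reliability(lines):
    """Pass 2: per tool, share of answers agreeing with large majorities."""
    agree, total = Counter(), Counter()
    for answers in lines:
        value, share = majority(answers)
        if share < LARGE_MAJORITY:
            continue  # only lines with a large majority are considered
        for tool, answer in answers.items():
            total[tool] += 1
            if answer == value:
                agree[tool] += 1
    return {tool: agree[tool] / total[tool] for tool in total}


def consolidate(answers, reliability):
    """Pass 3: weight each answer by tool reliability; flag insecure results."""
    weight = Counter()
    for tool, answer in answers.items():
        weight[answer] += reliability.get(tool, 0.0)
    value, best = weight.most_common(1)[0]
    total = sum(weight.values())
    confidence = best / total if total else 0.0
    return value, confidence, confidence >= CONFIDENCE_THRESHOLD


def scores(lines, reliability):
    """Pass 4: one point per answer matching a secure consolidated result."""
    points = Counter()
    for answers in lines:
        value, _, secure = consolidate(answers, reliability)
        if not secure:
            continue  # «insecure» results earn no points
        for tool, answer in answers.items():
            if answer == value:
                points[tool] += 1
    return dict(points)


# Hypothetical lines: answers of tools A-D for one examination of one instance.
lines = [
    {"A": True, "B": True, "C": True, "D": True},
    {"A": 42, "B": 42, "C": 42, "D": 41},
    {"A": False, "B": False, "C": False, "D": False},
    {"A": True, "D": False},  # only two different answers
]

reliability = tool_reliability(lines)
print(reliability)
print([consolidate(answers, reliability) for answers in lines])
print(scores(lines, reliability))
```

In this toy data set, tool D disagrees once with a large majority, which lowers its reliability, so the lines where it is the lone dissenter fall below the threshold and are treated as insecure.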
