Understanding Random SAT: Beyond the Clauses-to-Variables Ratio
Eugene Nudelman, Stanford University
Joint work with Kevin Leyton-Brown and Holger Hoos (University of British Columbia), Alex Devkar and Yoav Shoham (Stanford University)
Introduction
• SAT is one of the most studied problems in CS
• Lots known about its worst-case complexity
– But often, particular instances of NP-hard problems like SAT are easy in practice
• "Drosophila" for average-case and empirical (typical-case) complexity studies
• (Uniformly) random SAT provides a way to bridge analytical and empirical work
CP 2004
Previously…
• Easy-hard-less hard transitions discovered in the behaviour of DPLL-type solvers [Selman, Mitchell, Levesque]
– Strongly correlated with phase transition in solvability
– Spawned a new enthusiasm for using empirical methods to study algorithm performance
[Figure: 4 * Pr(SAT) - 2 and log(Kcnfs runtime) plotted against c/v, for c/v from 3.3 to 5.3]
• Follow-up included study of:
– Islands of tractability [Kolaitis et al.]
– SLS search space topologies [Frank et al., Hoos]
– Backbones [Monasson et al., Walsh and Slaney]
– Backdoors [Williams et al.]
– Random restarts [Gomes et al.]
– Restart policies [Horvitz et al., Ruan et al.]
– …
Empirical Hardness Models
• We proposed building regression models as a disciplined way of predicting and studying algorithms' behaviour [Leyton-Brown, Nudelman, Shoham, CP-02]
• Applications of this machine learning approach:
1) Predict running time
– Useful to know how long an algorithm will run
2) Gain theoretical understanding
– Which variables are important to the hardness model?
3) Build algorithm portfolios
– Can select the right algorithm on a per-instance basis
4) Tune distributions for hardness
– Can generate harder benchmarks by rejecting easy instances
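Applications 3 and 4 both reduce to querying a runtime model per instance. The sketch below illustrates this under stated assumptions: `select_solver`, `reject_easy`, the feature dictionary and the toy models are hypothetical names introduced here for illustration, not the authors' code, and a real model would be a regression fitted on measured runtimes.

```python
def select_solver(features, models):
    """Pick the solver with the smallest predicted runtime.

    models maps a solver name to a callable(features) -> predicted runtime.
    """
    return min(models, key=lambda name: models[name](features))


def reject_easy(instances, featurize, model, threshold):
    """Keep only instances whose predicted runtime is at least threshold."""
    return [inst for inst in instances if model(featurize(inst)) >= threshold]


# Toy, made-up models standing in for fitted regression models.
toy_models = {
    "solver_a": lambda f: 2.0 * f["x"],  # runtime grows with feature x
    "solver_b": lambda f: 5.0,           # constant predicted runtime
}

print(select_solver({"x": 1.0}, toy_models))
print(reject_easy([1, 2, 3, 4], lambda i: {"x": i}, toy_models["solver_a"], 5.0))
```

Selecting by predicted rather than measured runtime means the portfolio is only as good as the model, which is why prediction accuracy is reported first.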
Outline
• Features
• Experimental Results
– Variable Size Data
– Fixed Size Data
Features: Local Search Probing
[Figure: best number of unsatisfied clauses (axis 0 to 1200) vs. step number (1 to 18) for a local search run, annotated with "Short Plateau" and "Long Plateau"]
Features highlighted on the plot:
• Best Solution (mean, CV)
• Number of Steps to Optimal (mean, median, CV, 10%, 90%)
• Ave. Improvement to Best Per Step (mean, CV)
• First LM Ratio (mean, CV)
• BestCV (CV of Local Minima) (mean, CV)
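To make the probing idea concrete, here is a minimal sketch that runs several short greedy local-search probes and summarizes the best number of unsatisfied clauses and the step at which it was reached. It is a simplified GSAT-style stand-in written for illustration, not the SAPS or GSAT implementations behind the features on the slides; the function names, the number of runs and the step limit are assumptions.

```python
import random
import statistics


def count_unsat(clauses, assign):
    """Number of clauses with no true literal under assign (dict var -> bool)."""
    return sum(
        not any(assign[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def probe(clauses, n_vars, max_steps, rng):
    """One greedy run; returns (best #unsat, step at which it was first reached)."""
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best, best_step = count_unsat(clauses, assign), 0
    for step in range(1, max_steps + 1):
        if best == 0:
            break
        # Score every single-variable flip and take the best one.
        # Ties go to the lowest variable index (real GSAT breaks ties randomly).
        scores = []
        for v in range(1, n_vars + 1):
            assign[v] = not assign[v]
            scores.append((count_unsat(clauses, assign), v))
            assign[v] = not assign[v]
        score, v = min(scores)
        assign[v] = not assign[v]
        if score < best:
            best, best_step = score, step
    return best, best_step


def probing_features(clauses, n_vars, runs=10, max_steps=50, seed=0):
    """Mean and coefficient of variation (CV) over several probes."""
    rng = random.Random(seed)
    results = [probe(clauses, n_vars, max_steps, rng) for _ in range(runs)]
    bests = [b for b, _ in results]
    steps = [s for _, s in results]

    def cv(xs):
        m = statistics.mean(xs)
        return statistics.pstdev(xs) / m if m else 0.0

    return {
        "best_sol_mean": statistics.mean(bests),
        "best_sol_cv": cv(bests),
        "steps_to_best_mean": statistics.mean(steps),
        "steps_to_best_median": statistics.median(steps),
        "steps_to_best_cv": cv(steps),
    }
```

Each flip is rescored from scratch here for clarity; a practical probe would keep incremental clause counts.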
Features: DPLL, LP
• DPLL search space size estimate
– Random probing with unit propagation
– Compute mean depth till contradiction
– Estimate log(#nodes)
• Cumulative number of unit propagations at different depths (DPLL with Satz heuristic)
• LP relaxation
– Objective value
– Stats of integer slacks
– #vars set to an integer
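The search space size estimate can be sketched as follows: repeatedly assign random values to random unassigned variables, unit-propagate after each decision, and record how many decisions were made before a contradiction. This sketch assumes that the mean probe depth is used directly as the estimate of log2(#nodes), on the grounds that a full binary tree of depth d has on the order of 2^d nodes; the slides do not give the exact estimator, and all function names here are hypothetical.

```python
import random
import statistics


def unit_propagate(clauses, assign):
    """Extend assign (dict var -> bool) in place; return False on contradiction."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assign:
                    if assign[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return False  # every literal in this clause is false
            if len(unassigned) == 1:
                lit = unassigned[0]
                assign[abs(lit)] = lit > 0  # forced (unit) assignment
                changed = True
    return True


def probe_depth(clauses, n_vars, rng):
    """Number of random decisions made before a contradiction (or a full assignment)."""
    assign = {}
    if not unit_propagate(clauses, assign):
        return 0
    depth = 0
    while len(assign) < n_vars:
        free = [v for v in range(1, n_vars + 1) if v not in assign]
        assign[rng.choice(free)] = rng.random() < 0.5
        depth += 1
        if not unit_propagate(clauses, assign):
            break
    return depth


def estimate_log2_nodes(clauses, n_vars, probes=100, seed=0):
    """Mean probe depth, used here as a rough estimate of log2(#nodes)."""
    rng = random.Random(seed)
    return statistics.mean(probe_depth(clauses, n_vars, rng) for _ in range(probes))
```

Probes that reach a full assignment without a contradiction are counted at their final depth, which is one of several reasonable conventions.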
Other Features
• Problem Size:
– v (#vars) and c (#clauses): used for normalizing many other features
– Powers of c/v, v/c, |c/v - 4.26|
• Graphs:
– Variable-Clause (VCG, bipartite)
– Variable (VG, edge whenever two variables occur in the same clause)
– Clause (CG, edge iff two clauses share a variable with opposite sign)
• Balance
– #pos vs. #neg literals
– unary, binary, ternary clauses
• Proximity to Horn formula
[Figure: example graph with Var and Clause nodes]
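A few of the features above can be computed directly from the clause list. The sketch below assumes clauses are lists of non-zero integers in the DIMACS convention (positive for a variable, negative for its negation). Summarizing the graphs by edge counts and a mean degree, and measuring proximity to Horn as the fraction of Horn clauses, are illustrative choices made here, not necessarily the statistics used in the study.

```python
from itertools import combinations
import statistics


def syntactic_features(clauses, n_vars):
    c = len(clauses)
    ratio = c / n_vars
    literals = [lit for clause in clauses for lit in clause]

    # Balance: share of positive literals over all literal occurrences.
    pos_fraction = sum(lit > 0 for lit in literals) / len(literals)

    # A Horn clause has at most one positive literal.
    horn_fraction = sum(sum(lit > 0 for lit in cl) <= 1 for cl in clauses) / c

    var_degree = [0] * (n_vars + 1)  # VCG degree of each variable node
    vg_edges = set()                 # VG: variables sharing a clause
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        for v in variables:
            var_degree[v] += 1
        vg_edges.update(combinations(variables, 2))

    # CG: two clauses are adjacent iff one contains the negation of a
    # literal in the other. Quadratic in the number of clauses.
    clause_sets = [set(cl) for cl in clauses]
    cg_edges = sum(
        any(-lit in b for lit in a) for a, b in combinations(clause_sets, 2)
    )

    return {
        "v": n_vars,
        "c": c,
        "c_over_v": ratio,
        "dist_to_4.26": abs(ratio - 4.26),
        "pos_fraction": pos_fraction,
        "horn_fraction": horn_fraction,
        "vcg_var_degree_mean": statistics.mean(var_degree[1:]),
        "vg_edges": len(vg_edges),
        "cg_edges": cg_edges,
    }
```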
Outline
• Features
• Experimental Results
– Variable Size Data
– Fixed Size Data
Experimental Setup
• Uniform random 3-SAT, 400 vars
• Datasets (20000 instances each)
– Variable-ratio dataset (1 CPU-month)
  • c/v uniform in [3.26, 5.26] (∴ c ∈ [1304, 2104])
– Fixed-ratio dataset (4 CPU-months)
  • c/v = 4.26 (∴ v = 400, c = 1704)
• Solvers
– Kcnfs [Dubois and Dequen]
– OKsolver [Kullmann]
– Satz [Chu Min Li]
• Quadratic regression with logistic response function
• Training : test : validation split
– 70 : 15 : 15
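For reference, instances of the two datasets could be drawn roughly as below. This is a sketch of the standard fixed-clause-length model (three distinct variables per clause, each negated with probability 1/2), assumed here to be what "uniform random 3-SAT" refers to; the function name, the seed and the rounding of c are illustrative choices.

```python
import random


def random_3sat(n_vars, ratio, rng):
    """Uniform random 3-SAT with round(ratio * n_vars) clauses."""
    n_clauses = round(ratio * n_vars)
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)  # 3 distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


rng = random.Random(0)

# Fixed-ratio instance: c/v = 4.26 with v = 400, so c = 1704.
fixed = random_3sat(400, 4.26, rng)

# Variable-ratio instance: c/v drawn uniformly from [3.26, 5.26].
variable = random_3sat(400, rng.uniform(3.26, 5.26), rng)

print(len(fixed), len(variable))
```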
Kcnfs Data
[Figure: 4 * Pr(SAT) - 2 and log(Kcnfs runtime) plotted against c/v, for c/v from 3.3 to 5.3]
Kcnfs Data
[Figure: Kcnfs runtime in seconds (log scale, 0.01 to 1000) vs. clauses-to-variables ratio (3.26 to 5.26)]
Variable Ratio Prediction (Kcnfs)
[Figure: predicted runtime vs. actual runtime, both in CPU sec on log scales from 0.01 to 1000]
Variable Ratio - UNSAT
[Figure: predicted runtime vs. actual runtime, both in CPU sec on log scales from 0.01 to 1000]
Variable Ratio - SAT
[Figure: predicted runtime vs. actual runtime, both in CPU sec on log scales from 0.01 to 1000]
Kcnfs vs. Satz (UNSAT)
[Figure: Satz time vs. Kcnfs time, both in CPU sec on log scales from 0.01 to 1000]
Kcnfs vs. Satz (SAT)
[Figure: Satz time vs. Kcnfs time, both in CPU sec on log scales from 0.01 to 1000]
Feature Importance – Variable Ratio
• Subset selection can be used to identify features sufficient for approximating full model performance
• Other (correlated) sets could potentially achieve similar performance

Variable                          Cost of Omission
|c/v - 4.26|                      100
|c/v - 4.26|^2                    69
(v/c)^2 × SapsBestCVMean          53
|c/v - 4.26| × SapsBestCVMean     33
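The slides do not define "cost of omission". The sketch below assumes it is the increase in RMSE when one variable is dropped and the model is refit, rescaled so that the most important variable scores 100. It uses a plain linear least-squares fit scored on the training data to stay self-contained; the function names are hypothetical, and the study itself may have scored on held-out data.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rmse(X, y):
    """Least-squares fit of y on the columns of X plus an intercept; training RMSE."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    w = solve(XtX, Xty)  # normal equations; assumes columns are not collinear
    resid = [yi - sum(wi * ri for wi, ri in zip(w, r)) for r, yi in zip(rows, y)]
    return math.sqrt(sum(e * e for e in resid) / len(y))


def cost_of_omission(X, y, names):
    """RMSE increase from dropping each column, scaled so the largest is 100."""
    full = fit_rmse(X, y)
    raw = {}
    for j, name in enumerate(names):
        reduced = [[x for k, x in enumerate(r) if k != j] for r in X]
        raw[name] = fit_rmse(reduced, y) - full
    top = max(raw.values())
    return {n: (100.0 * d / top if top > 0 else 0.0) for n, d in raw.items()}
```

Because correlated variables can stand in for one another, a low score under this definition does not mean a variable is uninformative, which matches the caveat on the slide.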
Fixed Ratio Data
[Figure: runtime in seconds (log scale, 0.01 to 1000) vs. clauses-to-variables ratio (axis 3.26 to 5.26)]
Fixed Ratio Prediction (Kcnfs)
[Figure: predicted runtime vs. actual runtime, both in CPU sec on log scales from 0.01 to 1000]
Feature Importance – Fixed Ratio

Variable                                  Cost of Omission
SapsBestSolMean^2                         100
SapsBestSolMean × MeanDPLLDepth           74
GsatBestSolCV × MeanDPLLDepth             21
VCGClauseMean × GsatFirstLMRatioMean      9
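Squares and pairwise products such as those in the table are the kind of terms a quadratic regression works with. A minimal sketch of such a basis expansion follows; the function name and the term naming scheme are assumptions, and the feature values are made up purely for illustration.

```python
from itertools import combinations_with_replacement


def quadratic_terms(features):
    """Expand named features into linear terms, squares and pairwise products."""
    terms = dict(features)  # keep the linear terms
    for a, b in combinations_with_replacement(sorted(features), 2):
        name = f"{a}^2" if a == b else f"{a} × {b}"
        terms[name] = features[a] * features[b]
    return terms


# Made-up values for two feature names that appear in the table.
example = quadratic_terms({"SapsBestSolMean": 2.0, "MeanDPLLDepth": 3.0})
for name, value in example.items():
    print(name, value)
```

With n raw features the expansion has n + n(n + 1)/2 terms, which is one reason subset selection is useful for finding a small model.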