Automated Testing and Debugging of SAT and QBF Solvers - JKU

2 downloads 0 Views 135KB Size Report
Jul 13, 2010 - MiraXTv3, MXC, PicoSAT, RSat, Sat7, SAT4J, Spear, Tinisat. Robert Brummayer, Florian Lonsing and Armin Biere. Automated Testing and ...
Introduction Fuzz Testing Delta Debugging

Automated Testing and Debugging of SAT and QBF Solvers Robert Brummayer, Florian Lonsing and Armin Biere JKU Linz, Austria 13th International Conference on Theory and Applications of Satisfiability Testing July 13, 2010 Edinburgh, Scotland, UK

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Outline Introduction Motivation Fuzz Testing Introduction Techniques for SAT Techniques for QBF Delta Debugging Introduction Techniques for SAT Techniques for QBF

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Motivation

Motivation (1/2) I

SAT and QBF solvers used as core decision engines I

I

Verification, automatic test generation, scheduling, model checking, SMT solvers, . . .

Main requirements I

Correctness I I

I

Robustness I

I

I

Satisfiability status Model resp. unsatisfiability proof No crashes when run on syntactically valid inputs

Speed

Clients heavily depend on these criteria I I I

Incorrect solver may lead to incorrect overall result Crashing solver may crash the client as well Solver should compute the result as fast as possible I

High user expectations although problem is NP-complete

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Motivation

Motivation (2/2) I

Demand for fast solvers led to various improvements I

Improved algorithms I I

I

Low-level implementation and optimization details I I

I I

Error prone engineering Developer secrets

How do we make sure that our implementations are correct? Traditional testing approaches I

Unit testing / regression testing

I

Testing solver against benchmark suite

I

I

I

Clause learning, rapid restarts, literal watching, · · · Well understood from a theoretical point of view

Tedious task of generating test cases manually Limited set of highly specific benchmarks

Complement traditional testing with fuzz testing

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Fuzz Testing I

Fuzz Testing (Miller et al.) I

I

Grammar-based black-box fuzz testing I I

I

I

Random instances must be syntactically valid Ideally, we want high diversity to cover many corner cases

Repeatedly “attack” solver with random instances I I

I

Treat solver as a black-box However, use domain knowledge to implement fuzzer

Fuzzer: random instance generator I

I

automated negative testing technique

Check satisfiability result Check model resp. unsatisfiability proof

High throughput as a success factor I I

Random instances should not be “too” hard to solve However, trivial instances are unlikely to reveal critical defects

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Random 3SAT I

Random 3SAT fuzzer 3SATGen

I

Pick number of variables m

I

Pick clause variable ratio r Generate m · r random ternary clauses

I

I I I

I I

Pick each variable uniformly and negate it with probability 1/2 Avoid generating trivial clauses Avoid picking the same literal multiple times within a clause

Trivial to implement However, random 3SAT instances lack structure I

Generated instances are unlikely to reveal interesting defects

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Random Layered CNF I

Use domain knowledge I I

Often, SAT solvers use input structure Layered CNF fuzzer CNFuzz

I

Pick a number of layers l of maximum width w

I

Each i th layer introduces f fresh variables

I

Each layer has a separate ratio r from which the number of clauses is computed

I

Clauses are at least ternary I I

Probability of larger clauses decreases exponentially Probability of picking variables from previous layers decreases exponentially

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Random CNF by Circuit Translation I

Use domain knowledge I

I

Industrial SAT solvers are optimized for CNFs generated by circuit translation , e.g. Tseitin translation Fuzzer that builds circuit and translates it to CNF: FuzzSAT

I

Generate v boolean variables and insert them into set n

I

Pick op ∈ {AND, OR, XOR, IFF}, pick o1 and o2 from n, negate both with probability 1/2, generate new operator node and insert it into n

I

Repeat until each input variable is referenced at least t times

I

Combine roots to one boolean root

I

Translate root to CNF by Tseitin translation

I

Finally, add random clauses of varying size to increase diversity and difficulty

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Experiments for SAT Solvers from SAT-COMP’07 solver Blogic Blogic-fixed March ks MiraXTv3 PicoSAT RSat

err 0 0 24 26 0 56

3SATGen inc mod 0 0 0 0 2 0 7 0 0 0 0 0

err 1 0 5 91 0 27

CNFuzz inc mod 3 1 1 1 0 0 13 0 0 0 0 0

I

err = error, e.g. segfault

I

inc = incorrect satisfiability status

I

mod = invalid model Tested solvers

I

I

err 1 0 2 286 0 3

FuzzSAT inc mod 0 1 0 0 2 0 2 0 2 0 0 0

Barcelogic, Barcelogic-fixed, CMUSAT, March ks, MiniSat, MiraXTv3, MXC, PicoSAT, RSat, Sat7, SAT4J, Spear, Tinisat.

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Experiments for SAT Solvers from SAT-COMP’09 solver ManySat March hi MiniSat-9z

3SATGen err mod 2 0 0 0 2 0

I

err = error, e.g. segfault

I

mod = invalid model

I

Tested solvers I

CNFuzz err mod 56 0 0 0 58 0

FuzzSAT err mod 836 0 0 24 852 0

CirCUs, Clasp, Cumr p, Glucose, LySATi, ManySAT, Marchi hi, MiniSat, MiniSat-9z, MXC, PicoSAT, PrecoSAT, RSat, SApperloT, SAT4J, Varsat-industrial

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Random QBF model

I

Fuzzer that uses theoretical model by Chen and Interian to generate random QBFs: BlocksQBF

I

Model bears similarities to random SAT I

study threshold behavior

I

All clauses are for-all reduced by construction

I

All clauses have the same length

I

Each clause is free of complementary and duplicate literals

I

Duplicate clauses are discarded

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

QBFuzz I I I

Observation: exact model limits diversity Configurable fuzzer QBFuzz First, a quantifier prefix is generated I

Number of variables is picked for each block

I

Clauses are of varying length

I

Literals are selected from any block I

Complementary and duplicate literals are discarded

I

For-all reduction

I

Discard duplicate clauses

I

Eliminate unused variables

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Fuzz Testing Experiments

solver Quantor QuBE 6.0 QuBE 6.5 sKizzo SQBF yQuaffle I

error 0 0 0 0 0 0

BlocksQBF incorrect 0 684 0 0 0 0

error 1 5 4 2 35 94

QBFuzz incorrect 0 7 0 29 0 0

Tested solvers I

DepQBF, MiniQBF-090608, QMRES, Quantor-3.0, QuBE6.0, QuBE6.5, QuBE6.6, Semprop-010604, sKizzo-0.8.2, SQBF-1.0, Squolem-1.03, yQuaffle-021006

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Delta Debugging I

Delta Debugging (Zeller et al.) I

I

Technique to automatically shrink failure-inducing instances

Delta debugger (DD) runs solver on failure-inducing instance φ to obtain golden exit code

I

DD tries to simplify the failure-inducing instance greedily

I

DD calls solver with a simplified instance φ0 I I

I

Exit code = golden exit code, success , continue simplifying φ0 Exit code = 6 golden exit code, failure , try other simplification

Use wrapper script instead of calling solver directly I

Script determines if the observable behavior is equal I

For example, grep for a specific error message

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

SAT Delta Debugging I

Fuzz testing and Delta Debugging has been shown to be effective for SMT I

I

I I

Delta debugger cnfdd handles instance as set of clauses Uses an even greedier version of DDMIN (Zeller et al.) I

I

Delta debugging for SMT uses the explicit structure of the instance (hierarchical delta debugging) However, SAT instances are flat , i.e. structure is implicit

Greedier version reduces the number of calls to SAT solver

Delta debugger executes 2 alternating phases I I

With increasing granularity, try to remove sets of clauses Try to remove individual literals I

I

Rather costly, but necessary for sufficient overall reduction

Delta Debugging is run until fix-point is reached I

Or some threshold is reached, .e.g. time limit

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

Multi-threaded SAT Delta Debugging I

Delta Debugging algorithm offers parallelism

I

Multi-threaded SAT delta debugger mtcnfdd

I

Simplification attempts can be done in parallel Clause removal phase

I

I I

Divide set of clauses into g sets, where g is current granularity Try to remove individual sets in parallel I

I

Merge successful simplifications after each round I

I

Independent parallel calls to SAT solver Start with smallest reduced formula

Literal removal phase I I

Clauses are split among threads However, successful literal removals are merged immediately

Robert Brummayer, Florian Lonsing and Armin Biere

Automated Testing and Debugging of SAT and QBF Solvers

Introduction Fuzz Testing Delta Debugging

Introduction Techniques for SAT Techniques for QBF

SAT Delta Debugging Experiments solver Blogic Blogic-fix March hi March ks MiniSat-9z PicoSAT RSat

files 7 2 24 35 912 2 86

c 4 2 1 3 1 1 2

time 39 41 638 4

Suggest Documents