Expressing and Verifying Probabilistic Assertions
Adrian Sampson, Pavel Panchekha, Todd Mytkowicz, Kathryn S. McKinley, Dan Grossman, Luis Ceze
University of Washington and Microsoft Research
PLDI 2014
Probabilistic assertions express correctness properties in modern software. Our verifier checks them efficiently and accurately.
assert file != NULL
  test, check, verify
assert e
  e (here, file != NULL) must hold on every execution
Traditional assertions are insufficient for programs with probabilistic behavior.
≈ Approximate Computing
  this approximate image is close to its precise version
  k-means clustering is likely to converge even on unreliable hardware
Mobile and Sensing
  sensor error does not render the app's conclusions useless
Obfuscation for Data Privacy
  obfuscated data is still useful in aggregate
Assertions are insufficient for private-data obfuscation
  true_avg = average(salaries)
  private_avg = average(obfuscate(salaries))
  assert true_avg - private_avg <= 10,000
obfuscate(salaries): probability distribution
Assertion
  assert e
Probabilistic assertion
  passert e, p, c
  e must hold with probability p at confidence c
  test? check? verify?
How to verify a probabilistic assertion naively
probabilistic program:
  float obfuscated(float n) {
    return n + gaussian(0.0, 1000.0);
  }
  float average_salary(float* salaries) {
    total = 0.0;
    for (int i = 0; i < COUNT; ++i)
      total += obfuscated(salaries[i]);
    avg = total / len(salaries);
    p_avg = ...;
    passert e, p, c
  }
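The naive approach can be sketched as plain stress testing: run the whole probabilistic program many times and count how often the condition holds. The sketch is Python rather than the slide's C. The salary data, the 10,000 bound (borrowed from the obfuscation slide), the use of an absolute difference, and all helper names are assumptions for illustration, and the raw frequency it returns carries no confidence guarantee.

```python
import random

def obfuscated(n, rng):
    # Add Gaussian noise, mirroring the slide's obfuscated().
    return n + rng.gauss(0.0, 1000.0)

def condition_holds(salaries, rng, bound=10_000.0):
    # One full run of the program: obfuscate, average, evaluate the condition e.
    # Assumption: e compares the absolute difference of the two averages.
    true_avg = sum(salaries) / len(salaries)
    private_avg = sum(obfuscated(s, rng) for s in salaries) / len(salaries)
    return abs(true_avg - private_avg) <= bound

def naive_estimate(salaries, runs, seed=0):
    # Fraction of complete program runs in which the condition held.
    rng = random.Random(seed)
    hits = sum(condition_holds(salaries, rng) for _ in range(runs))
    return hits / runs

salaries = [24_000.0, 31_000.0, 58_000.0, 42_500.0]  # hypothetical data
estimate = naive_estimate(salaries, runs=1_000)
```

Every sample here re-executes the entire program, including the parts that involve no randomness at all.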
How to verify a probabilistic assertion with statistical reasoning
queries & inference for statistical models
  Church
  Infer.NET
  [Sankaranarayanan+ PLDI 2013]
  [Hur+ PLDI 2014]
  ⋮
passert for probabilistic software
  ?
How to verify a probabilistic assertion efficiently and accurately
  distribution extraction via symbolic execution → statistical optimizations → verification
  probabilistic program (average_salary, as above) → Bayesian network IR → ✓
  implementation for LLVM & Clang
Distribution extraction: random draws are symbolic
  symbolic heap before:  a ↦ 4.2
  statement:             b = a + gaussian(0.0, 1.0)
  symbolic heap after:   a ↦ 4.2, b ↦ 4.2 + G(0,1)
Concrete vs. symbolic semantics
  concrete execution: program + input → concrete execution (nondeterministic) → outputs
  symbolic execution: program + input → symbolic execution (deterministic) → sampling (nondeterministic) → outputs
Example program:
  input: a = 4.2
  b = gaussian(0.0, 1.0)
  c = a + b
  d = c + b
  if b > 0.5
    e = 2.0
  else
    e = 4.0
  passert e <= 3.0, 0.9, 0.9
Symbolic heap and expression dag after extraction:
  a ↦ 4.2
  b ↦ G(0,1)
  c ↦ 4.2 + b
  d ↦ c + b (the same G(0,1) node as b)
  e ↦ if b > 0.5 then 2.0 else 4.0
  passert condition ↦ e ≤ 3.0
Once extraction is done, the variable names are dropped and only the dag remains.
The input can also be a distribution: input: a = unif(2.0, 9.0) in place of input: a = 4.2.
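For this small example the probability in question can be worked out by hand, which makes a useful sanity check on any verifier. Since e is 2.0 exactly when b > 0.5 and 4.0 otherwise, and b is a standard Gaussian with CDF Φ (a symbol introduced here, not on the slide):

```latex
\Pr[e \le 3.0] = \Pr[b > 0.5] = 1 - \Phi(0.5) \approx 1 - 0.6915 = 0.3085 < 0.9
```

So this passert should be reported as failing. The values of a, c, and d do not enter the condition, so the result is the same for either choice of input.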
concrete input: salary = $24,000 ≈ testing
input distribution: salary = uniform(…) ≈ static analysis
More in the paper: arrays & pointers, loops, external code, probabilistic path pruning
Distribution extraction produces an expression dag (a Bayesian network)
  nodes: random variables
  edges: dependence
  directed & acyclic
  random draws only at leaves
  sample in a single pass
[Figure: the example dag, with leaves 4.2, G(0,1), and 0.5 and operators +, +, and >.]
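A minimal Python sketch of such a dag, assuming a hypothetical `Node` class and helpers (`const`, `gaussian`, `op`, `sample`) that are not taken from the authors' implementation. It builds the a/b/c/d/e example and samples it in one pass, drawing each random leaf once so that every use of b sees the same value. For brevity the comparison and the if/then/else are folded into one node.

```python
import random

class Node:
    """A dag node: a function plus the nodes it depends on."""
    def __init__(self, fn, inputs=(), is_draw=False):
        self.fn = fn
        self.inputs = tuple(inputs)
        self.is_draw = is_draw  # True only for random-draw leaves

def const(value):
    return Node(lambda: value)

def gaussian(mu, sigma):
    return Node(lambda rng: rng.gauss(mu, sigma), is_draw=True)

def op(fn, *inputs):
    return Node(fn, inputs)

def sample(roots, rng):
    """Evaluate the requested nodes in one pass; each node is computed once."""
    values = {}
    def visit(node):
        if node not in values:
            if node.is_draw:
                values[node] = node.fn(rng)
            else:
                values[node] = node.fn(*(visit(i) for i in node.inputs))
        return values[node]
    return [visit(r) for r in roots]

# The example program, expressed as a dag.
a = const(4.2)
b = gaussian(0.0, 1.0)
c = op(lambda x, y: x + y, a, b)
d = op(lambda x, y: x + y, c, b)                  # shares the node b with c
e = op(lambda x: 2.0 if x > 0.5 else 4.0, b)
ok = op(lambda x: x <= 3.0, e)                    # the passert condition

rng = random.Random(0)
runs = 20_000
frequency = sum(sample([ok], rng)[0] for _ in range(runs)) / runs
```

Memoizing values per pass is what preserves the dependence between c, d, and e: all three read the same draw of b.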
statistical property → passert verifier optimization
Bayesian-network IR enables new optimizations
Sum of Gaussians (G′ + G″ becomes a single G), for independent X and Y:
  $X \sim G(\mu_X, \sigma_X^2),\ Y \sim G(\mu_Y, \sigma_Y^2),\ Z = X + Y \Rightarrow Z \sim G(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$
Scaled uniform (U × c becomes U′), for c > 0:
  $X \sim U(a, b),\ Y = cX \Rightarrow Y \sim U(ca, cb)$
Uniform compared with a constant (U ≤ c becomes a Bernoulli B):
  $X \sim U(a, b),\ Y = (X \le c),\ a \le c \le b \Rightarrow Y \sim B\left(\frac{c - a}{b - a}\right)$
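The three rewrite rules can be sketched as small Python functions over distribution descriptions. The `Gaussian`, `Uniform`, and `Bernoulli` record types and the function names are hypothetical, chosen for this illustration only. The Gaussian rule assumes its two operands are independent, and the scaling rule assumes a positive constant.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gaussian:
    mu: float
    var: float   # variance, i.e. sigma squared

@dataclass(frozen=True)
class Uniform:
    lo: float
    hi: float

@dataclass(frozen=True)
class Bernoulli:
    p: float

def add_gaussians(x, y):
    # Z = X + Y for independent Gaussians: means add, variances add.
    return Gaussian(x.mu + y.mu, x.var + y.var)

def scale_uniform(x, c):
    # Y = cX for X ~ U(a, b); restricted to c > 0 so the endpoints keep their order.
    if c <= 0:
        raise ValueError("this sketch only handles c > 0")
    return Uniform(c * x.lo, c * x.hi)

def uniform_le(x, c):
    # Y = (X <= c) for X ~ U(a, b) with a <= c <= b.
    if not x.lo <= c <= x.hi:
        raise ValueError("c must lie within [a, b]")
    return Bernoulli((c - x.lo) / (x.hi - x.lo))

z = add_gaussians(Gaussian(1.0, 4.0), Gaussian(2.0, 9.0))
u = scale_uniform(Uniform(2.0, 9.0), 2.0)
y = uniform_le(Uniform(2.0, 9.0), 3.75)
```

Each function replaces a small subgraph of the dag with a single leaf, so fewer nodes remain to be sampled.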
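Central Limit Theorem collapses large sums
A sum of many independent draws from the same distribution D is replaced by one Gaussian node G (an approximation that improves as n grows):
  $X_1, X_2, \ldots, X_n \sim D,\ Y = \sum_i X_i \Rightarrow Y \sim G(n\mu_D, n\sigma_D^2)$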
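A quick Python check of the collapse, assuming D is uniform on [0, 1] and n = 100 (both chosen here for illustration). The function name `clt_collapse` is hypothetical. The sketch computes the Gaussian parameters the rule predicts and compares them with the mean and variance of sums that were actually sampled.

```python
import random
import statistics

def clt_collapse(n, mean, variance):
    """Gaussian (mean, variance) approximating a sum of n i.i.d. draws."""
    return n * mean, n * variance

a, b, n = 0.0, 1.0, 100
mu_d = (a + b) / 2.0            # mean of U(a, b)
var_d = (b - a) ** 2 / 12.0     # variance of U(a, b)
approx_mean, approx_var = clt_collapse(n, mu_d, var_d)

# Empirical comparison: sample the sum directly, without collapsing it.
rng = random.Random(0)
sums = [sum(rng.uniform(a, b) for _ in range(n)) for _ in range(5_000)]
empirical_mean = statistics.mean(sums)
empirical_var = statistics.variance(sums)
```

The collapsed form needs one Gaussian draw per sample where the original dag needs n draws from D.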
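Verification via direct evaluation
[Figure: a sum of draws from D, compared with a constant (≤ c), reduces to a single Bernoulli node B, which is checked directly (✓).]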
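One way to picture direct evaluation, as a Python sketch under stated assumptions: the sum is collapsed to a Gaussian via the Central Limit Theorem, the comparison against a constant is turned into a Bernoulli parameter using the Gaussian CDF, and that parameter is compared with the required probability p. The Gaussian-CDF step and the names `gaussian_cdf`, `direct_check`, and `threshold` are additions for this illustration. `threshold` plays the role of the constant c in the figure, renamed to avoid confusion with the confidence c of a passert.

```python
import math

def gaussian_cdf(x, mean, variance):
    # Standard closed form: Phi((x - mean) / sigma) expressed with erf.
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))

def direct_check(n, mean, variance, threshold, p):
    """Check P(sum of n i.i.d. draws <= threshold) >= p without sampling."""
    # CLT collapse: the sum is approximated by G(n * mean, n * variance).
    q = gaussian_cdf(threshold, n * mean, n * variance)
    # The comparison node is now Bernoulli(q); compare its parameter with p.
    return q, q >= p

# Hypothetical instance: 100 draws from U(0, 1), asserted <= 55 with p = 0.9.
q, holds = direct_check(n=100, mean=0.5, variance=1.0 / 12.0,
                        threshold=55.0, p=0.9)
```

In this instance q comes out at about 0.96, so the check succeeds. No samples are drawn, so the only error is the CLT approximation itself.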
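Verification via hypothesis testing
[Figure: an expression dag with distribution leaves D and G(0,1), constants, and the operators +, ÷, and >, checked against p and c.]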
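When the dag cannot be evaluated directly, the condition has to be sampled and the decision made statistically. The Python sketch below swaps in a simple fixed-sample test based on Hoeffding's inequality. It is not necessarily the test the authors use, and the accuracy parameter `epsilon`, the three-way verdict, and all function names are assumptions introduced for illustration.

```python
import math
import random

def sample_count(confidence, epsilon):
    # Hoeffding: P(|p_hat - q| >= epsilon) <= 2 * exp(-2 * n * epsilon**2).
    # Solve for the smallest n that pushes this below alpha = 1 - confidence.
    alpha = 1.0 - confidence
    return math.ceil(math.log(2.0 / alpha) / (2.0 * epsilon ** 2))

def check_passert(draw_condition, p, confidence, epsilon=0.01, seed=0):
    """Sample the condition and compare the estimate with p, within epsilon."""
    rng = random.Random(seed)
    n = sample_count(confidence, epsilon)
    hits = sum(1 for _ in range(n) if draw_condition(rng))
    p_hat = hits / n
    if p_hat - epsilon >= p:
        return "holds"
    if p_hat + epsilon < p:
        return "fails"
    return "inconclusive"

def example_condition(rng):
    # The condition e <= 3.0 from the a/b/c/d/e example program.
    b = rng.gauss(0.0, 1.0)
    e = 2.0 if b > 0.5 else 4.0
    return e <= 3.0

result = check_passert(example_condition, p=0.9, confidence=0.9)
```

A fixed sample size keeps the sketch short. A sequential test that stops as soon as the evidence is decisive would usually need fewer samples.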
Probabilistic assertions for C and C++
[Figure: compilation pipeline .c → LLVM IR → LLVM IR → native code, labeled "strawman stress-tester".]