Proving Expected Sensitivity of Probabilistic Programs
Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, Pierre-Yves Strub
Program Sensitivity
Similar inputs → similar outputs
◮ Given: distances d_in on inputs, d_out on outputs
◮ Want: for all inputs in₁, in₂,
  d_out(P(in₁), P(in₂)) ≤ d_in(in₁, in₂)
If P is sensitive and Q is sensitive, then Q ∘ P is sensitive
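The composition claim is a one-line chain of the two sensitivity bounds. A sketch, writing d_mid for the intermediate distance on P's outputs / Q's inputs (d_mid is not named on the slide):

```latex
d_{\mathrm{out}}\bigl(Q(P(\mathit{in}_1)),\, Q(P(\mathit{in}_2))\bigr)
  \;\le\; d_{\mathrm{mid}}\bigl(P(\mathit{in}_1),\, P(\mathit{in}_2)\bigr)  % sensitivity of Q
  \;\le\; d_{\mathrm{in}}(\mathit{in}_1, \mathit{in}_2)                     % sensitivity of P
```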
Probabilistic Program Sensitivity?
Similar inputs → similar output distributions
◮ Given: distances d_in on inputs, d_out on output distributions
◮ Want: for all inputs in₁, in₂,
  d_out(P(in₁), P(in₂)) ≤ d_in(in₁, in₂)
What distance d_out should we take?
Our contributions
• Coupling-based definition of probabilistic sensitivity
• Relational program logic EpRHL
• Formalized examples: stability and convergence
What is a good definition of probabilistic sensitivity?
One possible definition: output distributions close
For two distributions µ₁, µ₂ over a set A:
  d_out(µ₁, µ₂) ≜ k · max_{E ⊆ A} |µ₁(E) − µ₂(E)|
k-Uniform sensitivity
◮ Larger k → closer output distributions
◮ Strong guarantee: probabilities close for all sets of outputs
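For finite distributions, max_{E ⊆ A} |µ₁(E) − µ₂(E)| is the total variation distance, computable as half the ℓ1 distance between the two probability mass functions. A minimal sketch (the dict encoding and the helper name tv_distance are illustrative, not from the paper):

```python
def tv_distance(mu1, mu2):
    """Total variation distance: max over events E of |mu1(E) - mu2(E)|.

    For finite distributions (dicts mapping outcomes to probabilities),
    this equals half the L1 distance between the mass functions.
    """
    support = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(x, 0.0) - mu2.get(x, 0.0)) for x in support)

# Example: two slightly different biased coins.
coin1 = {"heads": 0.50, "tails": 0.50}
coin2 = {"heads": 0.55, "tails": 0.45}
print(tv_distance(coin1, coin2))  # 0.05
```

k-uniform sensitivity of a program P then asks that k · tv_distance(P(in₁), P(in₂)) ≤ d_in(in₁, in₂) for all pairs of inputs.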
Application: probabilistic convergence/mixing
Probabilistic program forgets initial state
◮ Given: probabilistic loop, two different input states
◮ Want: state distributions converge to same distribution
Consequence of k-uniform sensitivity
◮ As the number of iterations T increases, prove k-uniform sensitivity for larger and larger k(T)
◮ Relation between k and T describes speed of convergence
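As an illustration of the kind of statement involved (a toy chain, not one of the paper's examples): a loop that resets the state to 0 with probability 1/2 and otherwise increments it. Running two copies on shared coin flips couples them as soon as one reset happens, so after T iterations the disagreement probability, and hence the total variation distance between the two state distributions, is at most 2^(-T). A small simulation sketch:

```python
import random

def coupled_step(s1, s2):
    """One loop iteration on both states, driven by a shared coin flip."""
    if random.random() < 0.5:      # shared reset: both states become 0 and stay equal
        return 0, 0
    return s1 + 1, s2 + 1          # shared increment: the gap is preserved

def coupled_run(s1, s2, iterations):
    for _ in range(iterations):
        s1, s2 = coupled_step(s1, s2)
    return s1, s2

# Estimate Pr[states still differ after T steps] from two far-apart starts.
T, trials = 10, 100_000
disagree = 0
for _ in range(trials):
    a, b = coupled_run(0, 100, T)
    disagree += (a != b)
print(disagree / trials)  # close to 2**-T = 1/1024
```

Because the two runs share all randomness, they form a coupling, and the probability that they disagree bounds max_E |µ₁(E) − µ₂(E)| for the two output distributions.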
Another possible definition: average outputs close
For two distributions µ₁, µ₂ over real numbers:
  d_out(µ₁, µ₂) ≜ k · |E[µ₁] − E[µ₂]|
k-Mean sensitivity
◮ Larger k → closer averages
◮ Weaker guarantee than uniform sensitivity
Application: algorithmic stability
Machine learning algorithm A
◮ Input: set S of training examples
◮ Output: list of numeric parameters (randomized)
Danger: overfitting
◮ Output parameters depend too much on training set S
◮ Low error on training set, high error on new examples
Application: algorithmic stability
One way to prevent overfitting
◮ L maps S to the average error of the randomized learning algorithm A
◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much
L should be mean sensitive
Wanted: a general definition that is ...
• Expressive
• Easy to reason about
Ingredient #1: Probabilistic coupling
A coupling models two distributions with one distribution
Given two distributions µ₁, µ₂ ∈ Distr(A), a joint distribution µ ∈ Distr(A × A) is a coupling if
  π₁(µ) = µ₁  and  π₂(µ) = µ₂
Typical pattern
Prove a property about two (output) distributions by constructing a coupling with certain properties
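A minimal concrete check of the definition for finite distributions encoded as dicts (the helper names and example distributions are illustrative, not from the paper):

```python
def marginals(joint):
    """Project a joint distribution over pairs onto its two components."""
    left, right = {}, {}
    for (a1, a2), p in joint.items():
        left[a1] = left.get(a1, 0.0) + p
        right[a2] = right.get(a2, 0.0) + p
    return left, right

def is_coupling(joint, mu1, mu2, tol=1e-9):
    """Check that pi1(joint) = mu1 and pi2(joint) = mu2."""
    pi1, pi2 = marginals(joint)
    ok1 = all(abs(pi1.get(x, 0.0) - mu1.get(x, 0.0)) < tol for x in set(pi1) | set(mu1))
    ok2 = all(abs(pi2.get(x, 0.0) - mu2.get(x, 0.0)) < tol for x in set(pi2) | set(mu2))
    return ok1 and ok2

# Two different couplings of a fair coin with itself.
fair = {0: 0.5, 1: 0.5}
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
identity = {(0, 0): 0.5, (1, 1): 0.5}
print(is_coupling(independent, fair, fair), is_coupling(identity, fair, fair))  # True True
```

The same pair of marginals can have many couplings; the freedom to pick a convenient one is what makes the proof pattern above useful.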
Ingredient #2: Lift distance on outputs
Given:
◮ Two distributions µ₁, µ₂ ∈ Distr(A)
◮ Ground distance d : A × A → R⁺
Define distance on distributions:
  d#(µ₁, µ₂) ≜ min_{µ ∈ C(µ₁, µ₂)} E_µ[d]
where C(µ₁, µ₂) is the set of all couplings of µ₁ and µ₂.
Typical pattern
Bound the distance d# between two (output) distributions by constructing a coupling with small average distance d
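For finite distributions, the minimum over all couplings is a small linear program (the Kantorovich / optimal-transport formulation). A brute-force sketch using scipy; the function name lifted_distance and the array encoding are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def lifted_distance(mu1, mu2, d):
    """d#(mu1, mu2) = min over couplings mu of E_mu[d], via linear programming.

    mu1, mu2: 1-D arrays of probabilities over supports of size n and m.
    d: (n, m) array of ground distances d(a_i, b_j).
    """
    n, m = len(mu1), len(mu2)
    cost = np.asarray(d, dtype=float).reshape(n * m)

    # Equality constraints: row sums equal mu1, column sums equal mu2.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([mu1, mu2])

    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# With the discrete distance (0 if equal, 1 otherwise), d# is the total
# variation distance; for these two coins it is approximately 0.1.
print(lifted_distance(np.array([0.5, 0.5]), np.array([0.4, 0.6]),
                      np.array([[0.0, 1.0], [1.0, 0.0]])))
```

Exhibiting any one coupling gives an upper bound on d#, which is exactly the proof pattern on this slide.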
Putting it together: Expected sensitivity
Given:
◮ A function f : A → Distr(B) (think: probabilistic program)
◮ Distances d_in and d_out on A and B
We say f is (d_in, d_out)-expected sensitive if:
  d_out#(f(a₁), f(a₂)) ≤ d_in(a₁, a₂)
for all inputs a₁, a₂ ∈ A.
Benefits: Expressive
If d_out(b₁, b₂) > k for all distinct b₁, b₂:
  (d_in, d_out)-expected sensitive ⟹ k-uniform sensitive
If outputs are real-valued and d_out(b₁, b₂) = k · |b₁ − b₂|:
  (d_in, d_out)-expected sensitive ⟹ k-mean sensitive
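A sketch of why the first implication holds (the standard coupling argument, not spelled out on the slide): for any event E ⊆ B and any coupling µ of f(a₁), f(a₂), the marginals can only differ on E where the two coupled samples disagree, and disagreement costs at least k under d_out. Taking µ to be the coupling witnessing expected sensitivity:

```latex
k \cdot \max_{E \subseteq B} \lvert f(a_1)(E) - f(a_2)(E) \rvert
  \;\le\; k \cdot \Pr_{\mu}[\, b_1 \neq b_2 \,]                  % coupling inequality
  \;\le\; \mathbb{E}_{\mu}[\, d_{\mathrm{out}}(b_1, b_2) \,]     % d_out > k on distinct outputs
  \;\le\; d_{\mathrm{in}}(a_1, a_2)
```

The second implication is similar: k · |E[f(a₁)] − E[f(a₂)]| = k · |E_µ[b₁ − b₂]| ≤ E_µ[k · |b₁ − b₂|] = E_µ[d_out] ≤ d_in(a₁, a₂).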
Benefits: Easy to reason about
f : A → Distr(B) is (d_A, d_B)-expected sensitive
g : B → Distr(C) is (d_B, d_C)-expected sensitive
  ⟹ the (Kleisli) composition g ∘ f : A → Distr(C) is (d_A, d_C)-expected sensitive
Abstract away distributions
◮ Work in terms of distances on ground sets
◮ No need to work with complex distances over distributions
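The composition here is Kleisli composition: run f, then run g on its result and flatten the resulting distribution over distributions. A minimal sketch for finite distributions-as-dicts (the helper kleisli and the toy functions are illustrative, not the paper's notation):

```python
def kleisli(g, f):
    """Compose f : A -> Distr(B) and g : B -> Distr(C) into A -> Distr(C)."""
    def composed(a):
        out = {}
        for b, p in f(a).items():       # sample an intermediate result from f
            for c, q in g(b).items():   # run g on it and accumulate the mass
                out[c] = out.get(c, 0.0) + p * q
        return out
    return composed

# Toy noise-adding steps.
def f(a):
    return {a - 1: 0.5, a + 1: 0.5}

def g(b):
    return {b: 0.5, b + 1: 0.5}

print(kleisli(g, f)(0))  # {-1: 0.25, 0: 0.25, 1: 0.25, 2: 0.25}
```

The point of the slide is that expected sensitivity composes through this operation while all reasoning stays at the level of distances on the ground sets A, B, C.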
How to verify this property? The program logic EpRHL
A relational program logic EpRHL
The pWhile imperative language
  c ::= x ← e | x ←$ d | if e then c else c | while e do c | skip | c; c
Judgments ⊢ {P; d_in} c₁ ∼ c₂ {Q; d_out}
◮ Tagged program variables: x⟨1⟩, x⟨2⟩
◮ P and Q: boolean predicates over tagged variables
◮ d_in and d_out: real-valued expressions over tagged variables
EpRHL judgments model expected sensitivity
A judgment ⊢ {P; d_in} c₁ ∼ c₂ {Q; d_out} is valid if: for all input memories (m₁, m₂) satisfying pre-condition P, there exists a coupling of the output distributions ⟦c₁⟧ m₁ and ⟦c₂⟧ m₂ with
◮ support satisfying post-condition Q
◮ E[d_out] ≤ d_in(m₁, m₂)
One proof rule: Sequential composition

  ⊢ {P; d_A} c₁ ∼ c₂ {Q; d_B}    ⊢ {Q; d_B} c₁′ ∼ c₂′ {R; d_C}
  ─────────────────────────────────────────────────────────────
            ⊢ {P; d_A} c₁; c₁′ ∼ c₂; c₂′ {R; d_C}