Proving Expected Sensitivity of Probabilistic Programs


  1. Proving Expected Sensitivity of Probabilistic Programs
     Gilles Barthe, Thomas Espitau, Benjamin Grégoire, Justin Hsu, Pierre-Yves Strub

  3. Program Sensitivity
     Similar inputs → similar outputs
     ◮ Given: distances $d_{in}$ on inputs, $d_{out}$ on outputs
     ◮ Want: for all inputs $in_1, in_2$: $d_{out}(P(in_1), P(in_2)) \le d_{in}(in_1, in_2)$
     If P is sensitive and Q is sensitive, then Q ∘ P is sensitive.
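The composition claim can be checked numerically for deterministic programs. The following is a minimal sketch, assuming real-valued inputs and the absolute difference as both input and output distance; the names `dist`, `is_sensitive`, `P`, `Q` and `compose` are hypothetical and not taken from the slides. It tests the inequality on a finite grid only, so it illustrates the property rather than proving it.

```python
from itertools import product

def dist(x, y):
    """Absolute difference, used as both d_in and d_out (an assumption)."""
    return abs(x - y)

def is_sensitive(prog, inputs, d_in=dist, d_out=dist, tol=1e-12):
    """Check d_out(prog(a), prog(b)) <= d_in(a, b) on every pair of sample inputs."""
    return all(d_out(prog(a), prog(b)) <= d_in(a, b) + tol
               for a, b in product(inputs, repeat=2))

def P(x):
    return 0.5 * x + 1.0   # 1/2-Lipschitz, hence sensitive

def Q(x):
    return abs(x)          # 1-Lipschitz, hence sensitive

def compose(q, p):
    """Return the program that runs p first and then q."""
    return lambda x: q(p(x))

samples = [i / 4 for i in range(-20, 21)]
print(is_sensitive(P, samples), is_sensitive(Q, samples))
print(is_sensitive(compose(Q, P), samples))
```

A program such as `lambda x: 2 * x` fails the same check, because it doubles distances.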

  5. Probabilistic Program Sensitivity?
     Similar inputs → similar output distributions
     ◮ Given: distances $d_{in}$ on inputs, $d_{out}$ on output distributions
     ◮ Want: for all inputs $in_1, in_2$: $d_{out}(P(in_1), P(in_2)) \le d_{in}(in_1, in_2)$
     What distance $d_{out}$ should we take?

  6. Our contributions
     • Coupling-based definition of probabilistic sensitivity
     • Relational program logic EpRHL
     • Formalized examples: stability and convergence

  7. What is a good definition of probabilistic sensitivity?

  9. One possible definition: output distributions close
     For two distributions $\mu_1, \mu_2$ over a set $A$:
     $d_{out}(\mu_1, \mu_2) \triangleq k \cdot \max_{E \subseteq A} |\mu_1(E) - \mu_2(E)|$
     k-Uniform sensitivity
     ◮ Larger k → closer output distributions
     ◮ Strong guarantee: probabilities close for all sets of outputs
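The maximum over events in this definition is the total variation distance. A small sketch, assuming finite distributions stored as Python dicts from outcomes to probabilities (the function names and the example distributions are hypothetical), computes it by brute force over all events and compares it with the standard half-L1 formula:

```python
from itertools import chain, combinations

def tv_by_events(mu1, mu2):
    """max over all events E of |mu1(E) - mu2(E)| (exponential; tiny supports only)."""
    support = sorted(set(mu1) | set(mu2))
    events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(mu1.get(x, 0.0) for x in E) - sum(mu2.get(x, 0.0) for x in E))
               for E in events)

def tv_by_l1(mu1, mu2):
    """Equivalent closed form: half the L1 distance between the probability vectors."""
    support = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(x, 0.0) - mu2.get(x, 0.0)) for x in support)

def uniform_distance(mu1, mu2, k):
    """The distance d_out from the slide: k times the total variation distance."""
    return k * tv_by_l1(mu1, mu2)

mu1 = {"a": 0.5, "b": 0.3, "c": 0.2}
mu2 = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_by_events(mu1, mu2), tv_by_l1(mu1, mu2))
```

For a fixed bound on `uniform_distance`, a larger `k` forces the total variation distance itself to be smaller, which is why larger k means closer output distributions.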

  11. Application: probabilistic convergence/mixing
      Probabilistic program forgets initial state
      ◮ Given: probabilistic loop, two different input states
      ◮ Want: state distributions converge to same distribution
      Consequence of k-uniform sensitivity
      ◮ As the number of iterations T increases, prove k-uniform sensitivity for larger and larger k(T)
      ◮ Relation between k and T describes speed of convergence
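As an illustration of how k can grow with T, here is a sketch on a hypothetical two-state Markov chain (not an example from the slides). It assumes the input distance is 1 for distinct initial states, so the largest admissible k after T iterations is the reciprocal of the total variation distance between the two state distributions.

```python
def step(mu, a, b):
    """One loop iteration: from state 0 move to 1 w.p. a; from state 1 move to 0 w.p. b."""
    p0, p1 = mu
    return (p0 * (1 - a) + p1 * b, p0 * a + p1 * (1 - b))

def tv_after(T, a, b):
    """Total variation distance between the T-step distributions started from 0 and from 1."""
    mu, nu = (1.0, 0.0), (0.0, 1.0)
    for _ in range(T):
        mu, nu = step(mu, a, b), step(nu, a, b)
    return abs(mu[0] - nu[0])  # on a two-point space, TV is the gap in one coordinate

def largest_k(T, a, b):
    """Largest k with k * TV <= 1, assuming d_in = 1 (undefined once TV reaches 0)."""
    return 1.0 / tv_after(T, a, b)

for T in (1, 2, 3, 10):
    print(T, tv_after(T, 0.3, 0.2), largest_k(T, 0.3, 0.2))
```

In this chain the gap shrinks by a factor of |1 − a − b| per iteration, so k(T) grows geometrically in T.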

  13. Another possible definition: average outputs close
      For two distributions $\mu_1, \mu_2$ over real numbers:
      $d_{out}(\mu_1, \mu_2) \triangleq k \cdot |\mathbb{E}[\mu_1] - \mathbb{E}[\mu_2]|$
      k-Mean sensitivity
      ◮ Larger k → closer averages
      ◮ Weaker guarantee than uniform sensitivity

  14. Application: algorithmic stability
      Machine learning algorithm A
      ◮ Input: set S of training examples
      ◮ Output: list of numeric parameters (randomized)
      Danger: overfitting
      ◮ Output parameters depend too much on training set S
      ◮ Low error on training set, high error on new examples

  16. Application: algorithmic stability
      One way to prevent overfitting
      ◮ L maps S to the average error of the randomized learning algorithm A
      ◮ If |L(S) − L(S′)| is small for all training sets S, S′ differing in a single example, then A does not overfit too much
      L should be mean sensitive

  17. Wanted: a general definition that is ...
      • Expressive
      • Easy to reason about

  19. Ingredient #1: Probabilistic coupling
      A coupling models two distributions with one distribution
      Given two distributions $\mu_1, \mu_2 \in \mathrm{Distr}(A)$, a joint distribution $\mu \in \mathrm{Distr}(A \times A)$ is a coupling if
      $\pi_1(\mu) = \mu_1$ and $\pi_2(\mu) = \mu_2$
      Typical pattern
      Prove property about two (output) distributions by constructing a coupling with certain properties
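The marginal conditions can be checked directly for finite distributions. This is a minimal sketch, assuming distributions are dicts from outcomes to probabilities and joint distributions are dicts keyed by pairs; `marginals`, `is_coupling` and the example couplings are hypothetical names chosen for illustration.

```python
def marginals(joint):
    """Project a joint distribution over pairs onto its first and second components."""
    m1, m2 = {}, {}
    for (x, y), p in joint.items():
        m1[x] = m1.get(x, 0.0) + p
        m2[y] = m2.get(y, 0.0) + p
    return m1, m2

def same_distribution(m, mu, tol=1e-9):
    """Compare two finite distributions pointwise, treating missing keys as probability 0."""
    return all(abs(m.get(x, 0.0) - mu.get(x, 0.0)) <= tol for x in set(m) | set(mu))

def is_coupling(joint, mu1, mu2):
    """True when the first marginal of joint is mu1 and the second marginal is mu2."""
    m1, m2 = marginals(joint)
    return same_distribution(m1, mu1) and same_distribution(m2, mu2)

fair = {0: 0.5, 1: 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # two unrelated coin flips
identity = {(0, 0): 0.5, (1, 1): 0.5}                         # both sides see the same flip
not_a_coupling = {(0, 0): 1.0}                                # marginals are not fair coins

print(is_coupling(independent, fair, fair), is_coupling(identity, fair, fair))
print(is_coupling(not_a_coupling, fair, fair))
```

Both `independent` and `identity` are couplings of two fair coins; they differ only in how the two sides are correlated, which is the freedom a coupling proof exploits.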

  23. Ingredient #2: Lift distance on outputs
      Given:
      ◮ Two distributions $\mu_1, \mu_2 \in \mathrm{Distr}(A)$
      ◮ Ground distance $d : A \times A \to \mathbb{R}^+$
      Define distance on distributions:
      $d^{\#}(\mu_1, \mu_2) \triangleq \min_{\mu \in C(\mu_1, \mu_2)} \mathbb{E}_{\mu}[d]$
      where $C(\mu_1, \mu_2)$ is the set of all couplings
      Typical pattern
      Bound distance $d^{\#}$ between two (output) distributions by constructing a coupling with small average distance d
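The typical pattern can be illustrated on two Bernoulli distributions: any coupling gives an upper bound on the lifted distance, and a well-chosen coupling gives a small one. The sketch below is an assumed example, not taken from the slides; the function names are hypothetical, and the ground distance is |x − y| on {0, 1}.

```python
def shared_uniform_coupling(p, q):
    """Couple Bernoulli(p) and Bernoulli(q) by thresholding one shared uniform draw."""
    joint = {(1, 1): min(p, q), (0, 0): 1.0 - max(p, q)}
    if p < q:
        joint[(0, 1)] = q - p   # draw falls between the two thresholds
    elif q < p:
        joint[(1, 0)] = p - q
    return joint

def independent_coupling(p, q):
    """Couple the two distributions by sampling them independently."""
    return {(x, y): (p if x else 1 - p) * (q if y else 1 - q)
            for x in (0, 1) for y in (0, 1)}

def expected_distance(joint, d):
    """E_mu[d] for a finite coupling mu; an upper bound on the lifted distance."""
    return sum(prob * d(x, y) for (x, y), prob in joint.items())

d = lambda x, y: abs(x - y)
print(expected_distance(shared_uniform_coupling(0.3, 0.5), d))
print(expected_distance(independent_coupling(0.3, 0.5), d))
```

The shared-draw coupling has expected distance |p − q|, while the independent coupling is larger. Since E[|x − y|] ≥ |E[x] − E[y]| under any coupling, |p − q| is in fact the minimum for this ground distance.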

  25. Putting it together: Expected sensitivity
      Given:
      ◮ A function $f : A \to \mathrm{Distr}(B)$ (think: probabilistic program)
      ◮ Distances $d_{in}$ and $d_{out}$ on A and B
      We say f is $(d_{in}, d_{out})$-expected sensitive if:
      $d^{\#}_{out}(f(a_1), f(a_2)) \le d_{in}(a_1, a_2)$
      for all inputs $a_1, a_2 \in A$.

  27. Benefits: Expressive
      If $d_{out}(b_1, b_2) > k$ for all distinct $b_1, b_2$:
      $(d_{in}, d_{out})$-expected sensitive ⟹ k-uniform sensitive
      If outputs are real-valued and $d_{out}(b_1, b_2) = k \cdot |b_1 - b_2|$:
      $(d_{in}, d_{out})$-expected sensitive ⟹ k-mean sensitive
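Both implications follow from short calculations. The derivation below is a sketch, writing $\mu_i = f(a_i)$ and letting $\mu$ range over couplings of $(\mu_1, \mu_2)$; the indicator notation $\mathbf{1}[\cdot]$ is introduced here and does not appear on the slides.

```latex
% Mean sensitivity, with d_out(b_1, b_2) = k |b_1 - b_2|:
\begin{align*}
k \cdot |\mathbb{E}[\mu_1] - \mathbb{E}[\mu_2]|
  &= k \cdot |\mathbb{E}_{\mu}[b_1 - b_2]|
   \le \mathbb{E}_{\mu}[k \cdot |b_1 - b_2|]
   = \mathbb{E}_{\mu}[d_{out}] .
\end{align*}
% Uniform sensitivity, with d_out(b_1, b_2) > k for distinct b_1, b_2,
% so that d_out(b_1, b_2) >= k * 1[b_1 != b_2]. For every event E:
\begin{align*}
k \cdot |\mu_1(E) - \mu_2(E)|
  &= k \cdot |\Pr_{\mu}[b_1 \in E] - \Pr_{\mu}[b_2 \in E]|
   \le k \cdot \Pr_{\mu}[b_1 \neq b_2]
   \le \mathbb{E}_{\mu}[d_{out}] .
\end{align*}
% Both bounds hold for every coupling, hence for the minimizing one:
\begin{align*}
\text{left-hand side} \;\le\; d^{\#}_{out}(\mu_1, \mu_2) \;\le\; d_{in}(a_1, a_2) .
\end{align*}
```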

  32. Benefits: Easy to reason about
      If $f : A \to \mathrm{Distr}(B)$ is $(d_A, d_B)$-expected sensitive
      and $g : B \to \mathrm{Distr}(C)$ is $(d_B, d_C)$-expected sensitive,
      then $\tilde{g} \circ f : A \to \mathrm{Distr}(C)$ is $(d_A, d_C)$-expected sensitive.
      Abstract away distributions
      ◮ Work in terms of distances on ground sets
      ◮ No need to work with complex distances over distributions
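The composition $\tilde{g} \circ f$ runs g on each possible outcome of f and averages the results. A minimal sketch for finite distributions, assuming dicts from outcomes to probabilities; `bind`, `compose`, `f` and `g` are hypothetical names and example kernels, not part of the slides.

```python
def bind(mu, g):
    """Extend g : B -> Distr(C) to distributions: run g on each outcome of mu and average."""
    out = {}
    for b, p in mu.items():
        for c, q in g(b).items():
            out[c] = out.get(c, 0.0) + p * q
    return out

def compose(g, f):
    """The probabilistic composition of f : A -> Distr(B) followed by g : B -> Distr(C)."""
    return lambda a: bind(f(a), g)

def f(a):
    """Example kernel: step up or down by one with equal probability."""
    return {a + 1: 0.5, a - 1: 0.5}

def g(b):
    """Example kernel: keep the value or reset to 0 with equal probability."""
    out = {0: 0.5}
    out[b] = out.get(b, 0.0) + 0.5
    return out

h = compose(g, f)
print(h(0))
```

The composed kernel `h` is again a function from inputs to output distributions, so the same sensitivity definition applies to it without any new machinery.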

  33. How to verify this property? The program logic EpRHL

  36. A relational program logic EpRHL
      The pWhile imperative language
      $c ::= x \leftarrow e \mid x \xleftarrow{\$} d \mid \textbf{if } e \textbf{ then } c \textbf{ else } c \mid \textbf{while } e \textbf{ do } c \mid \textbf{skip} \mid c ; c$
      Judgments
      $\vdash \{P; d_{in}\}\ c_1 \sim c_2\ \{Q; d_{out}\}$
      ◮ Tagged program variables: $x\langle 1\rangle$, $x\langle 2\rangle$
      ◮ P and Q: boolean predicates over tagged variables
      ◮ $d_{in}$ and $d_{out}$: real-valued expressions over tagged variables
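To make the grammar concrete, the sketch below encodes the six command forms as Python dataclasses and runs them with a sampling interpreter. This is an illustrative assumption about how one might represent pWhile, not the authors' formalization: expressions and distributions are modelled as Python callables over a memory dict, and all class and function names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Skip:                 # skip
    pass

@dataclass
class Assign:               # x <- e
    var: str
    expr: object            # callable: memory -> value

@dataclass
class Sample:               # x <-$ d
    var: str
    dist: object            # callable: (memory, rng) -> sampled value

@dataclass
class If:                   # if e then c else c
    cond: object
    then: object
    orelse: object

@dataclass
class While:                # while e do c
    cond: object
    body: object

@dataclass
class Seq:                  # c ; c
    first: object
    second: object

def run(c, mem, rng):
    """Execute command c on memory mem, drawing randomness from rng; returns a new memory."""
    if isinstance(c, Skip):
        return mem
    if isinstance(c, Assign):
        return {**mem, c.var: c.expr(mem)}
    if isinstance(c, Sample):
        return {**mem, c.var: c.dist(mem, rng)}
    if isinstance(c, If):
        return run(c.then if c.cond(mem) else c.orelse, mem, rng)
    if isinstance(c, While):
        while c.cond(mem):
            mem = run(c.body, mem, rng)
        return mem
    if isinstance(c, Seq):
        return run(c.second, run(c.first, mem, rng), rng)
    raise TypeError(f"unknown command: {c!r}")

# i <- 0; while i < T do (b <-$ uniform{-1, 1}; x <- x + b; i <- i + 1)
walk = Seq(
    Assign("i", lambda m: 0),
    While(lambda m: m["i"] < m["T"],
          Seq(Sample("b", lambda m, rng: rng.choice([-1, 1])),
              Seq(Assign("x", lambda m: m["x"] + m["b"]),
                  Assign("i", lambda m: m["i"] + 1)))))

out = run(walk, {"x": 0, "T": 10}, random.Random(0))
print(out["x"], out["i"])
```

A relational judgment would relate two such runs, with variables of the first and second execution playing the roles of $x\langle 1\rangle$ and $x\langle 2\rangle$.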

  37. EpRHL judgments model expected sensitivity
      A judgment $\vdash \{P; d_{in}\}\ c_1 \sim c_2\ \{Q; d_{out}\}$ is valid if:
      for all input memories $(m_1, m_2)$ satisfying pre-condition P, there exists a coupling of outputs $(\llbracket c_1 \rrbracket m_1, \llbracket c_2 \rrbracket m_2)$ with
      ◮ support satisfying post-condition Q
      ◮ $\mathbb{E}[d_{out}] \le d_{in}(m_1, m_2)$

  41. One proof rule: Sequential composition
      $\dfrac{\vdash \{P; d_A\}\ c_1 \sim c_2\ \{Q; d_B\} \qquad \vdash \{Q; d_B\}\ c_1' \sim c_2'\ \{R; d_C\}}{\vdash \{P; d_A\}\ c_1; c_1' \sim c_2; c_2'\ \{R; d_C\}}$

