
University of California, Berkeley
Bayesian Bias Mitigation for Crowdsourcing
Fabian L. Wauthier, UC Berkeley, with Michael I. Jordan
9th of May, 2012
Fabian L. Wauthier: Bayesian Bias Mitigation for Crowdsourcing, 1


Contribution I: Bayesian Preference Model
Bayesian Preference Model
Labelers express accumulated, shared preferences.
◮ Parameter γ_b models the effect of preference b = 1, ..., K.
◮ An m × K binary matrix Z models parameter sharing.
◮ If z_{l,b} = 1, labeler l expresses preference b.
◮ Parameter β_l accumulates preferences: β_l = Σ_b z_{l,b} γ_b
◮ Likelihood: p(Y | X, Z, γ) = Π_l Π_{i : y_{i,l} ≠ 0} p(y_{i,l} | β_l^⊤ x_i)
◮ Similar preferences ⇒ similar labelling.
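The model above can be sketched generatively. A minimal sketch in NumPy, where the sizes m, K, n, d are illustrative assumptions and the probit link Φ (used later on the synthetic-data slide) stands in for p(y_{i,l} | β_l^⊤ x_i):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
m, K, n, d = 5, 3, 100, 4            # labelers, preferences, tasks, features (illustrative)

gamma = rng.normal(size=(K, d))      # gamma_b: effect of preference b
Z = rng.integers(0, 2, size=(m, K))  # z_{l,b} = 1 iff labeler l expresses preference b
beta = Z @ gamma                     # beta_l = sum_b z_{l,b} gamma_b
X = rng.normal(size=(n, d))          # task features

# Probit link: P(y_{i,l} = +1) = Phi(x_i^T beta_l)
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
P_pos = Phi(X @ beta.T)              # n x m matrix of positive-label probabilities
Y = np.where(rng.random((n, m)) < P_pos, 1, -1)
```

Because labelers with equal rows of Z share the same β_l exactly, they produce statistically identical labels, which is the "similar preferences ⇒ similar labelling" point.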

Contribution I: Bayesian Preference Model
Priors
◮ Prior on γ_b: p(γ_b) = N(0, σ²I) for each b.
◮ Prior on Z: fix the size of Z to m × K and draw
    π_b | α ∼ Beta(α/K, 1),  b = 1, ..., K    (1)
    z_{l,b} | π_b ∼ Bern(π_b),  l = 1, ..., m    (2)
◮ As K → ∞, the distribution over Z converges to the Indian Buffet Process (IBP).
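Sampling Z from the finite Beta-Bernoulli prior in equations (1)-(2) is a two-line operation; a minimal sketch, where the values of m, K, and α are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, K, alpha = 30, 50, 2.0                    # illustrative sizes and concentration

pi = rng.beta(alpha / K, 1.0, size=K)        # (1): pi_b | alpha ~ Beta(alpha/K, 1)
Z = (rng.random((m, K)) < pi).astype(int)    # (2): z_{l,b} | pi_b ~ Bern(pi_b)

# In the K -> infinity limit the expected number of preferences per labeler
# stays finite: E[#active] = K * (alpha/K) / (alpha/K + 1) -> alpha,
# which is the Indian Buffet Process behaviour.
expected_active = K * (alpha / K) / (alpha / K + 1.0)
```

The shrinking Beta(α/K, 1) prior is what keeps Z sparse as K grows: most columns have π_b near zero, so only a handful of preferences are active per labeler.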

Contribution I: Bayesian Preference Model
Complete model
    p(Y, Z, γ | X) = p(Y | X, Z, γ) p(γ | Z) p(Z)
◮ Recall bias: different labellers can have different β's. Example: disagreement over whether the guitar is behind or next to the couch.
◮ Want to predict labeller l's labels.
◮ Labeller l could be in the crowd, or the gold standard.
◮ Required inference: p(β_l | X, Y), or equivalently p(z_{l,b}, γ_b, b = 1, ..., K | X, Y).
◮ The model is complex; exact inference is intractable.
◮ Possible alternatives: Gibbs sampling, variational inference, slice sampling, etc.

BBMC Results
Overview
◮ Contribution I: Bayesian Preference Model
◮ BBMC Results
◮ Contribution II: Approximate Active Learning
◮ Active Learning Results
◮ Conclusion

BBMC Results
Results: Synthetic Data
◮ X is a 2000 × 4 Gaussian matrix.
◮ Z is a 30 × 2 uniform binary matrix (m = 30, K = 2).
◮ γ_b Gaussian, b = 1, 2; β_l = Σ_b z_{l,b} γ_b.
◮ Observation probability ε = 0.1:
    y_{i,l} = 0 w.p. (1 − ε);  +1 w.p. ε Φ(x_i^⊤ β_l);  −1 otherwise.
◮ Inference: want to recover β_1 (say).
◮ Requires p(z_{1,b}, γ_b, b = 1, ..., K | X, Y).
◮ For inference set K = 10.
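The synthetic setup above can be reproduced directly. A sketch following the slide's quantities (2000 × 4 Gaussian X, 30 × 2 uniform binary Z, ε = 0.1), with the random seed an arbitrary choice:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, d, m, K, eps = 2000, 4, 30, 2, 0.1

X = rng.normal(size=(n, d))              # 2000 x 4 Gaussian matrix
Z = rng.integers(0, 2, size=(m, K))      # 30 x 2 uniform binary matrix
gamma = rng.normal(size=(K, d))          # gamma_b Gaussian, b = 1, 2
beta = Z @ gamma                         # beta_l = sum_b z_{l,b} gamma_b

Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
P_pos = Phi(X @ beta.T)

# y_{i,l} = 0 w.p. (1 - eps); +1 w.p. eps * Phi(x_i^T beta_l); -1 otherwise.
observed = rng.random((n, m)) < eps
signs = np.where(rng.random((n, m)) < P_pos, 1, -1)
Y = np.where(observed, signs, 0)         # 0 marks an unobserved label
```

Roughly 10% of the 2000 × 30 label matrix ends up observed, matching the ε = 0.1 sparsity of crowdsourced label matrices the slide is simulating.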

BBMC Results
◮ Latent Z mostly correct after 1000 Gibbs steps.
◮ Gibbs sequence for γ_{1,1}. [Trace plot: samples of γ_{1,1} over iterations 1000-2000, fluctuating roughly between 0.55 and 0.75.]
◮ True β_1 and posterior mean β̂_1 after 1000 burn-in iterations:
    β_1 = (0.6915, 0.0754, −0.6815, 0.6988)^⊤,  β̂_1 = (0.6514, 0.0535, −0.6473, 0.6957)^⊤    (3)

BBMC Results
Results: Crowdsourced data
◮ Task: Is the triangle to the left of or above the rectangle?
◮ Labelled on Amazon Mechanical Turk: 523 tasks, 3 labels per task, 76 labellers.
◮ Want to predict the gold standard: compare centroid positions.
◮ All 26 labellers with over 20 labels have error above 0.16.
◮ The researcher also labels, and gives 60 gold standard labels.

BBMC Results
◮ Averaged log likelihood and error rate on the test set.
◮ Our model: BBMC. No active learning in this comparison.

Algorithm   Final Loglik       Final Error
GOLD        −3716 ± 1695       0.0547 ± 0.0102
CONS        −421.1 ± 2.6       0.0935 ± 0.0031
BBMC        −219.1 ± 3.1       0.0309 ± 0.0033

Contribution II: Approximate Active Learning
Overview
◮ Contribution I: Bayesian Preference Model
◮ BBMC Results
◮ Contribution II: Approximate Active Learning
◮ Active Learning Results
◮ Conclusion

Contribution II: Approximate Active Learning
Active Learning
◮ Want to predict labeller l's labels. Need β_l.
◮ Not all labellers are useful to infer β_l.
◮ If l and l′ share parameters, we can learn about β_l from l′:
    β_l = Σ_b z_{l,b} γ_b,  β_{l′} = Σ_b z_{l′,b} γ_b    (4)
◮ Active learning: repeatedly select training data that helps in learning β_l.
◮ Goal: cheaper training data, faster learning.

Contribution II: Approximate Active Learning
Approximate inference and Active Learning
◮ Suppose we start with training data Y.
◮ Query the task-labeler pair (i, l) that maximizes the expected utility of adding it:
    (i, l) = argmax_{(i′,l′)} E_{y_{i′,l′}} [ U( p(β | y_{i′,l′}, X, Y) ) ]
◮ Example utilities: U(·) = −Entropy(·);  U_μ(·) = ‖Mean(·) − μ‖₂²
◮ To score each (i′, l′), we need the posterior p(β | y_{i′,l′}, X, Y).
◮ Gibbs sampling ⇒ separate Gibbs samplers to score each (i′, l′).
◮ We are already running one Gibbs sampler for basic inference.
◮ Problem: can we avoid running the extra scoring chains?
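The scoring rule can be illustrated on a toy conjugate model, not the paper's probit preference model: with a Gaussian posterior over β, U = −Entropy reduces (up to constants) to −½ log det of the posterior covariance, and a Gaussian covariance update does not depend on the realized label, so the expectation over y_{i′,l′} is trivial. A sketch, where the dimensions and the noise variance s2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_cand, s2 = 4, 20, 0.5                 # feature dim, candidates, noise var (illustrative)

Sigma = np.eye(d)                          # current Gaussian posterior covariance of beta
Xc = rng.normal(size=(n_cand, d))          # candidate task features x_{i'}

def post_logdet(Sigma, x, s2):
    """log det of the posterior covariance after observing a label at x
    (rank-one Sherman-Morrison update; independent of the label's value)."""
    Sx = Sigma @ x
    Sigma_new = Sigma - np.outer(Sx, Sx) / (s2 + x @ Sx)
    return np.linalg.slogdet(Sigma_new)[1]

# U(.) = -Entropy(.): score each candidate by the negative post-update log det.
scores = np.array([-post_logdet(Sigma, x, s2) for x in Xc])
best = int(np.argmax(scores))              # the candidate the active learner would query
```

In the slides' non-conjugate model no such closed form exists, which is exactly why each candidate (i′, l′) would naively need its own Gibbs sampler.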

Contribution II: Approximate Active Learning
◮ The Gibbs sampler for p(β | X, Y) is a Markov chain for inference.
◮ The sampler for p(β | y_{i′,l′}, X, Y) is a perturbed chain for scoring.
◮ Naïve scoring:
  • Run the perturbed chain; sample from its stationary distribution.
  • Compute U( p(β | y_{i′,l′}, X, Y) ).
◮ Our method:
  • Get approximate samples of p(β | y_{i′,l′}, X, Y) by transforming samples of p(β | X, Y).
  • Approximate U( p(β | y_{i′,l′}, X, Y) ) from these.
[Diagram: inference chain β_{t−1} → β_t → β_{t+1} → β_{t+2} shown alongside the perturbed chain β̂_t → β̂_{t+1} → β̂_{t+2} → β̂_{t+3}.]

Contribution II: Approximate Active Learning
Approximate Scoring for Active Learning
◮ Suppose a chain p(β_t | β_{t−1}) and a perturbed chain p̂(β̂_t | β̂_{t−1}).
◮ Their stationary distributions are p_∞(β) and p̂_∞(β̂).
◮ Let β_s ∼ p_∞(β), s = 1, ..., S, and approximate
    p̂_∞(β̂) ≈ ∫ p̂(β̂ | β) p_∞(β) dβ ≈ (1/S) Σ_{s=1}^S p̂(β̂ | β_s).
◮ If p_∞(β) = p̂_∞(β), the first approximation is exact.
◮ Specialize to active learning:
  • Unperturbed chain = Gibbs sampler for p(β | X, Y).
  • Perturbed chain = Gibbs sampler for p(β | y_{i′,l′}, X, Y).
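The two-stage approximation can be checked on a small discrete chain, where both stationary distributions are computable exactly. A sketch, where the state count, sample size, and the 0.9/0.1 mixing used to build the perturbed kernel are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n_states, S = 5, 20000

def rand_kernel(rng, n):
    # Random row-stochastic transition matrix with all entries bounded away from 0.
    M = rng.random((n, n)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

def stationary(P):
    # Left eigenvector of P for eigenvalue 1, normalized to a distribution.
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()

P = rand_kernel(rng, n_states)                      # unperturbed kernel p(beta_t | beta_{t-1})
Phat = 0.9 * P + 0.1 * rand_kernel(rng, n_states)   # mildly perturbed kernel

p_inf = stationary(P)
beta_s = rng.choice(n_states, size=S, p=p_inf)      # beta_s ~ p_inf
phat_approx = Phat[beta_s].mean(axis=0)             # (1/S) sum_s phat(. | beta_s)

# If p_inf equalled Phat's stationary distribution the first approximation
# would be exact; here Phat is only a mild perturbation, so it is just close.
phat_true = stationary(Phat)
```

The approximation is one step of the perturbed kernel applied to p_∞, so its quality degrades gracefully with the size of the perturbation, which is exactly the regime of adding a single label y_{i′,l′}.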

Contribution II: Approximate Active Learning
Special Case: Discrete Random Walks
◮ Suppose W is n × n, positive, symmetric. Let P = D^{−1} W.
◮ The stationary distribution is the left eigenvector of P. Decompose
    A = D^{−1/2} W D^{−1/2}    (5)
      = V Λ V^⊤,  λ_1 ≤ λ_2 ≤ ... ≤ λ_n = 1    (6)
    p_∞ ∝ D^{1/2} v_n    (7)
◮ Perturb the matrix: Ŵ = W + dW ≥ 0, with dW 1 = 0.
◮ Then P̂ = D^{−1} Ŵ = P + D^{−1} dW = P + dP.

Contribution II: Approximate Active Learning
Special Case: Discrete Random Walks
◮ Matrix perturbation theory gives the first-order approximation
    p̃_∞ ≈ p_∞ + D^{1/2} ( Σ_{k≠n} v_k v_k^⊤ / (1 − λ_k) ) D^{−1/2} dP^⊤ p_∞    (8)
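Equations (5)-(8) can be verified numerically. A sketch on a small random symmetric W, where the perturbation dW is an arbitrary zero-row-sum matrix (so D and the row sums of Ŵ are unchanged) and the first-order estimate is compared against the exact perturbed stationary distribution; the size n and perturbation scale 0.01 are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6

B = rng.random((n, n)) + 0.1
W = B + B.T                                   # n x n, positive, symmetric
deg = W.sum(axis=1)                           # diagonal of D
P = W / deg[:, None]                          # P = D^{-1} W

# (5)-(7): spectral pieces of A = D^{-1/2} W D^{-1/2}; p_inf ∝ D^{1/2} v_n.
Dh, Dmh = np.sqrt(deg), 1.0 / np.sqrt(deg)
A = Dmh[:, None] * W * Dmh[None, :]
lams, V = np.linalg.eigh(A)                   # ascending, so lams[-1] = 1
p_inf = Dh * V[:, -1]
p_inf = p_inf / p_inf.sum()

# Perturbation dW with zero row sums (dW 1 = 0), so D is unchanged.
E = 0.01 * rng.random((n, n))
dW = E - np.diag(E.sum(axis=1))
dP = dW / deg[:, None]                        # dP = D^{-1} dW
Phat = P + dP

# (8): first-order correction to the stationary distribution.
Core = sum(np.outer(V[:, k], V[:, k]) / (1.0 - lams[k]) for k in range(n - 1))
p_approx = p_inf + Dh * (Core @ (Dmh * (dP.T @ p_inf)))

# Exact perturbed stationary distribution, for comparison.
evals, evecs = np.linalg.eig(Phat.T)
p_exact = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
p_exact = p_exact / p_exact.sum()
```

Because the correction lives in the span of {v_k : k ≠ n}, it has no component along D^{1/2}1 and p̃_∞ still sums to one; the residual error is second order in the perturbation dP.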
