Strategic Classification with Crowdsourcing


  1. Strategic Classification with Crowdsourcing
  Yang Liu (joint work with Yiling Chen), yangl@seas.harvard.edu, Harvard University, Nov. 2016.

  2. (Non-strategic) Classification
  A target classifier $f^* : \mathbb{R}^d \to \{-1, +1\}$ generates labels $y_i = f^*(x_i)$.
  • Observing a set of training data, learn $\tilde{f} = \operatorname{argmin}_{f \in \mathcal{F}} \sum_{i=1}^{n} l(f(x_i), y_i)$ (a minimal sketch follows below).
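For concreteness, here is a minimal empirical-risk-minimization sketch of this non-strategic setting. The choice of hypothesis class (a linear classifier) and loss (logistic, via scikit-learn) is purely illustrative; the talk leaves $\mathcal{F}$ and $l$ abstract.

```python
# Non-strategic ERM sketch: clean labels y_i = f*(x_i); pick f from a hypothesis
# class F by minimizing the empirical loss. A logistic-loss linear classifier is
# used here only as one concrete choice of F and l.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # features x_i in R^d
w_true = rng.normal(size=5)
y = np.where(X @ w_true > 0, 1, -1)      # y_i = f*(x_i), noise-free labels

clf = LogisticRegression().fit(X, y)     # f_tilde = argmin_{f in F} sum_i l(f(x_i), y_i)
print("training accuracy:", clf.score(X, y))
```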

  3. Strategic classification
  When data comes from strategic data sources...
  • Outsource $x_i$ to get a label $\tilde{y}_i$: crowdsourcing, surveys, human reports, etc.
  Such training data carries noise:
  • Intrinsic: due to limited worker expertise.
  • Strategic: due to lack of incentives.

  4. Goal to achieve
  The learner wants to learn a good, unbiased classifier.
  • Workers' observations come from a flipping error model with rates $p_+, p_-$.
  • Workers are effort sensitive.
  • Elicit high-quality data from workers (for better performance).
  Information elicitation without verification:
  • Peer prediction: SCORE$(\tilde{y}_i, \tilde{y}_j)$ [DG13, RF15, SAFP16, KS16, ...].
  • Exerting effort to produce high-quality data is usually a good equilibrium.
  (A sketch of one such peer-prediction score follows below.)
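As an illustration of SCORE$(\tilde{y}_i, \tilde{y}_j)$, below is a hedged sketch in the spirit of the Dasgupta-Ghosh (DG13) style of peer prediction for binary labels: agreement on a shared "bonus" task is rewarded and agreement on two independently drawn "penalty" tasks is subtracted, which removes the value of reporting a constant, uninformative label. The cited mechanisms differ in their exact scoring rules; the function name and signature below are illustrative, not from the talk.

```python
# Hedged sketch of a DG13-style peer-prediction score for binary labels.
import random

def peer_prediction_score(reports_i, reports_j, rng=None):
    """reports_i, reports_j: dicts mapping task id -> label in {-1, +1}."""
    rng = rng or random.Random(0)
    shared = sorted(set(reports_i) & set(reports_j))
    if not shared:
        return 0.0
    bonus = rng.choice(shared)                  # a task both workers labeled
    penalty_i = rng.choice(sorted(reports_i))   # independent tasks used for the
    penalty_j = rng.choice(sorted(reports_j))   # agreement-by-chance penalty term
    bonus_term = float(reports_i[bonus] == reports_j[bonus])
    penalty_term = float(reports_i[penalty_i] == reports_j[penalty_j])
    return bonus_term - penalty_term            # positive in expectation only for informative, correlated reports
```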

  5. Our method
  Joint learning and information elicitation:
  • Replace SCORE$(\tilde{y}_i, \tilde{y}_j)$ with SCORE$(\tilde{y}_i, \text{Machine})$, where "Machine" is a machine prediction.
  • How do we obtain a good machine answer?

  6. Classification with flipping errors [Natarajan et al. 13]
  • Suppose workers report truthfully; how do we de-bias? Use the surrogate loss
    $\tilde{l}(t, y) := \frac{(1 - p_{-y})\, l(t, y) - p_y\, l(t, -y)}{1 - p_+ - p_-}, \quad p_+ + p_- < 1.$
  • Why does it work? It is unbiased in expectation: $\mathbb{E}_{\tilde{y}}[\tilde{l}(t, \tilde{y})] = l(t, y)$ for all $t$.
  • Find $\tilde{f}^*_{\tilde{l}}$ by minimizing the empirical risk w.r.t. $\tilde{l}(t, y)$:
    $\tilde{f}^*_{\tilde{l}} = \operatorname{argmin}_f \hat{R}_{\tilde{l}}(f) := \frac{1}{N} \sum_{j=1}^{N} \tilde{l}(f(x_j), \tilde{y}_j).$
  (A minimal implementation sketch follows below.)
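Below is a minimal sketch of this surrogate loss. The formula is the one on the slide; the base loss (logistic) and the function names are choices made here for illustration.

```python
# Sketch of the unbiased surrogate loss of Natarajan et al. (2013):
#   l_tilde(t, y) = [(1 - p_{-y}) * l(t, y) - p_y * l(t, -y)] / (1 - p_+ - p_-),
# which satisfies E_{y_tilde}[l_tilde(t, y_tilde)] = l(t, y). Logistic loss is
# used as the base loss l purely for illustration.
import numpy as np

def logistic_loss(t, y):
    return np.log1p(np.exp(-y * t))

def unbiased_loss(t, y, p_plus, p_minus):
    """t: real-valued score f(x); y: observed noisy label in {-1, +1}."""
    assert p_plus + p_minus < 1.0
    p_y = p_plus if y == 1 else p_minus       # p_y: flip rate for the class matching y
    p_neg_y = p_minus if y == 1 else p_plus   # p_{-y}: flip rate for the opposite class
    return ((1 - p_neg_y) * logistic_loss(t, y)
            - p_y * logistic_loss(t, -y)) / (1 - p_plus - p_minus)

def empirical_risk(scores, noisy_labels, p_plus, p_minus):
    """R_hat_{l_tilde}(f): average surrogate loss over the N noisy-labeled points."""
    return float(np.mean([unbiased_loss(t, y, p_plus, p_minus)
                          for t, y in zip(scores, noisy_labels)]))
```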

  7. Our mechanism
  For each worker $i$:
  • Estimate flipping errors $\tilde{p}_{i,+}, \tilde{p}_{i,-}$ based on $\{x_j, \tilde{y}_j\}_{j \neq i}$.
  • Train $\tilde{f}^*_{\tilde{l}, -i}$ using [Natarajan et al. 13] with data from $j \neq i$.
  • Score worker $i$ by whether the report $\tilde{y}_i$ agrees with the machine prediction $\tilde{f}^*_{\tilde{l}, -i}(x_i)$.
  (A sketch of this per-worker loop follows below.)
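A hedged sketch of this per-worker loop is below. The helpers `estimate_flip_rates_from` and `train_debiased_classifier` are placeholders for the error-rate estimation and the Natarajan-style training sketched around this talk, not an API from the paper; the reward here is plain agreement with the machine prediction, which is one simple choice of SCORE.

```python
# Sketch of the mechanism for worker i: estimate flip rates and train the
# de-biased classifier on data that excludes worker i, then score worker i's
# reports by agreement with that "machine" prediction.

def score_worker(i, features, reports, estimate_flip_rates_from, train_debiased_classifier):
    """features: dict task -> x; reports: dict worker -> {task: label in {-1, +1}}."""
    others = {w: labels for w, labels in reports.items() if w != i}

    p_plus, p_minus = estimate_flip_rates_from(others)                      # tilde p_{i,+}, tilde p_{i,-}
    machine = train_debiased_classifier(features, others, p_plus, p_minus)  # f*_{l_tilde, -i}

    # SCORE(y_tilde_i, Machine): 1 if the report matches the machine label, else 0.
    return {task: float(machine(features[task]) == label)
            for task, label in reports[i].items()}
```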

  8. How to estimate error rates
  How do we estimate the flipping errors without ground truth? Solve the two moment equations
  $P_+ \big[ p_{i,+}^2 + (1 - p_{i,+})^2 \big] + P_- \big[ p_{i,-}^2 + (1 - p_{i,-})^2 \big] = \Pr(\text{matching}),$
  $P_+ \, p_{i,+} + P_- (1 - p_{i,-}) = \text{fraction of } -1 \text{ labels observed}.$
  • Lemma: There is a unique solution pair $\tilde{p}_{i,+}, \tilde{p}_{i,-}$ with $\tilde{p}_{i,+} + \tilde{p}_{i,-} < 1$, i.e., the worker is Bayesian informative: $\Pr(y_i = s \mid \tilde{y}_i = s) > \text{Prior}(s)$ for $s \in \{+, -\}$.
  (A numerical sketch of solving this system follows below.)
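Here is a hedged numerical sketch of recovering $(p_{i,+}, p_{i,-})$ from these two equations. The class priors `P_plus`, `P_minus`, the matching probability `match_prob`, and the fraction of $-1$ labels `frac_neg` would all be estimated from the collected reports; per the lemma, the root with $p_{i,+} + p_{i,-} < 1$ is the one kept.

```python
# Solve the two moment equations numerically and keep the Bayesian-informative root.
import numpy as np
from scipy.optimize import brentq

def estimate_flip_rates(P_plus, P_minus, match_prob, frac_neg, grid=2000):
    def p_plus_of(b):
        # Linear equation: P_+ p_+ + P_- (1 - p_-) = frac_neg, solved for p_+.
        return (frac_neg - P_minus * (1.0 - b)) / P_plus

    def residual(b):
        # Quadratic (matching) equation, with p_+ eliminated via the linear one.
        a = p_plus_of(b)
        return (P_plus * (a**2 + (1 - a)**2)
                + P_minus * (b**2 + (1 - b)**2) - match_prob)

    bs = np.linspace(0.0, 1.0, grid)
    for lo, hi in zip(bs[:-1], bs[1:]):
        if residual(lo) * residual(hi) < 0:          # sign change brackets a root
            b = brentq(residual, lo, hi)
            a = p_plus_of(b)
            if 0.0 <= a <= 1.0 and a + b < 1.0:      # keep the informative solution
                return a, b
    return None

# Example: with P_+ = P_- = 0.5 and true rates (0.2, 0.2), the observables are
# frac_neg = 0.5 and match_prob = 0.68; the call below recovers (0.2, 0.2)
# and discards the uninformative mirror root (0.8, 0.8).
print(estimate_flip_rates(0.5, 0.5, match_prob=0.68, frac_neg=0.5))
```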

  9. Results
  Effort exertion is a Bayesian Nash equilibrium (BNE). Benefits of this approach:
  • Less redundant assignment: not all tasks are re-assigned ⇒ budget efficient.
  • Better incentives: reporting symmetric uninformative signals or permuted signals is not an equilibrium.
  • More learning flavor: no requirement of knowing workers' data distribution.
  • Better privacy preservation, etc.
