Computational social processes
Lirong Xia
Fall 2016
Example: Crowdsourcing
[figure: Turker 1, Turker 2, …, Turker n each submit a ranking over the alternatives {a, b, c}, e.g. a ≻ b ≻ c and c ≻ b ≻ a]
The Condorcet Jury theorem [Condorcet 1785]
• Given
– two alternatives {O, M}
– 0.5 < p < 1, with Pr(O | O) = Pr(M | M) = p
• Suppose
– each agent's preference is generated i.i.d., such that
– w/p p, it is the same as the ground truth
– w/p 1 − p, it is different from the ground truth
• Then, as n → ∞, the majority of agents' preferences converges in probability to the ground truth
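A minimal simulation sketch of the theorem (the value p = 0.7 and the trial count are illustrative choices, not from the slides):

```python
import random

def majority_accuracy(p, n, trials=10000):
    """Estimate Pr(majority vote equals the ground truth) when each of
    n i.i.d. agents is correct with probability p."""
    correct = 0
    for _ in range(trials):
        votes_for_truth = sum(random.random() < p for _ in range(n))
        if votes_for_truth > n / 2:
            correct += 1
    return correct / trials

# Accuracy of the majority grows toward 1 as n increases (p = 0.7 here).
for n in [1, 11, 101, 1001]:
    print(n, majority_accuracy(0.7, n))
```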
Today's schedule
• Parametric ranking models
– Distance-based models
• Mallows
• Condorcet
– Random utility models
• Plackett-Luce
• Decision making
– MLE
– Bayesian
Parametric ranking models
• A statistical model has three parts
– A parameter space: Θ
– A sample space: S = Rankings(A)^n
• A = the set of alternatives, n = #voters
• assuming votes are i.i.d.
– A set of probability distributions over S: {Pr_θ(s) for each s ∈ S and θ ∈ Θ}
Example
• Condorcet's model for two alternatives
• Parameter space Θ = {O, M}
• Sample space S = {O, M}^n
• Probability distributions, i.i.d.: Pr(O | O) = Pr(M | M) = p > 0.5
Mallows' model [Mallows 1957]
• Fixed dispersion χ < 1
• Parameter space
– all full rankings over candidates
• Sample space
– i.i.d. generated full rankings
• Probabilities: Pr_W(V) ∝ χ^Kendall(V, W)
Example: Mallows for Eric, Kyle, and Stan
• Probabilities, with normalization a = 1 + 2χ + 2χ² + χ³:
– Eric ≻ Kyle ≻ Stan (the truth): 1/a
– Eric ≻ Stan ≻ Kyle and Kyle ≻ Eric ≻ Stan (Kendall distance 1): χ/a each
– Kyle ≻ Stan ≻ Eric and Stan ≻ Eric ≻ Kyle (Kendall distance 2): χ²/a each
– Stan ≻ Kyle ≻ Eric (Kendall distance 3): χ³/a
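A sketch that computes these Mallows probabilities by brute force over all six rankings (χ = 0.5 is an arbitrary illustrative value):

```python
from itertools import permutations

def kendall(v, w):
    """Kendall tau distance: number of pairs ranked oppositely by v and w."""
    pos_w = {a: i for i, a in enumerate(w)}
    pairs = [(a, b) for i, a in enumerate(v) for b in v[i + 1:]]
    # Each pair (a, b) is ordered a before b in v; count disagreements in w.
    return sum(pos_w[a] > pos_w[b] for a, b in pairs)

def mallows(truth, chi):
    """Pr_W(V) ∝ chi^Kendall(V, W), normalized over all full rankings."""
    rankings = list(permutations(truth))
    weights = {v: chi ** kendall(v, truth) for v in rankings}
    a = sum(weights.values())  # = 1 + 2χ + 2χ² + χ³ for m = 3
    return {v: w / a for v, w in weights.items()}

for v, pr in mallows(("Eric", "Kyle", "Stan"), chi=0.5).items():
    print(" ≻ ".join(v), round(pr, 4))
```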
Condorcet's model [Condorcet-1785, Young-1988, ES UAI-14, APX NIPS-14]
• Fixed dispersion χ < 1
• Parameter space
– all binary relations over candidates
• Sample space
– i.i.d. generated binary relations
• Probabilities: Pr_W(V) ∝ χ^Kendall(V, W)
Random utility model (RUM) [Thurstone 27]
• Continuous parameters: Θ = (θ_1, …, θ_m)
– m: number of alternatives
– Each alternative is modeled by a utility distribution μ_i
– θ_i: a vector that parameterizes μ_i
• An agent's latent utility U_i for alternative c_i is generated independently according to μ_i(U_i)
• Agents rank alternatives according to their perceived utilities
– Pr(c_2 ≻ c_1 ≻ c_3 | θ_1, θ_2, θ_3) = Pr_{U_i ∼ μ_i}(U_2 > U_1 > U_3)
[figure: the utility distributions parameterized by θ_1, θ_2, θ_3, with realized utilities U_1, U_2, U_3]
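A sketch of sampling one vote from a RUM, here with normal utility distributions (the alternative names, means, and unit variance are illustrative assumptions):

```python
import random

def sample_ranking(means, sd=1.0):
    """Draw U_i ~ N(mean_i, sd) independently for each alternative and
    rank alternatives by realized utility, highest first."""
    utilities = {c: random.gauss(mu, sd) for c, mu in means.items()}
    return sorted(utilities, key=utilities.get, reverse=True)

# θ_i here is just the mean of alternative c_i's utility distribution.
print(sample_ranking({"c1": 1.0, "c2": 0.8, "c3": 0.2}))
```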
Generating a preference profile
• Pr(Data | θ_1, θ_2, θ_3) = ∏_{V ∈ Data} Pr(V | θ_1, θ_2, θ_3)
[figure: the parameters (θ_1, θ_2, θ_3) generate the votes of agents 1 through n, e.g. P_1 = c_2 ≻ c_1 ≻ c_3, …, P_n = c_1 ≻ c_2 ≻ c_3]
Plackett-Luce model
• μ_i's are Gumbel distributions
– A.k.a. the Plackett-Luce (P-L) model [BM 60, Yellott 77]
• Alternative parameterization λ_1, …, λ_m:
Pr(c_1 ≻ c_2 ≻ … ≻ c_m | λ_1, …, λ_m) = λ_1/(λ_1 + … + λ_m) × λ_2/(λ_2 + … + λ_m) × … × λ_{m−1}/(λ_{m−1} + λ_m)
– c_1 is the top choice in {c_1, …, c_m}; c_2 is the top choice in {c_2, …, c_m}; …; c_{m−1} is preferred to c_m
• Pros:
– Computationally tractable
• Analytical solution to the likelihood function
• The only RUM that was known to be tractable
• Widely applied in economics [McFadden 74], learning to rank [Liu 11], and analyzing elections [GM 06,07,08,09]
• Cons: may not be the best model
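A direct sketch of this sequential-choice likelihood (the λ values below are illustrative, chosen to match the example on the next slide):

```python
def pl_prob(ranking, lam):
    """Pr(ranking | λ) = ∏_i λ_(i) / (λ_(i) + λ_(i+1) + ... + λ_(m))."""
    prob = 1.0
    remaining = sum(lam[c] for c in ranking)  # λ-mass of still-unranked alternatives
    for c in ranking[:-1]:                    # the last position is forced
        prob *= lam[c] / remaining
        remaining -= lam[c]
    return prob

lam = {"c1": 4, "c2": 5, "c3": 1}             # illustrative parameters
print(pl_prob(["c2", "c1", "c3"], lam))       # 5/10 × 4/5 = 0.4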
Example
• Truth (P-L parameters): Eric: 4, Kyle: 5, Stan: 1
• Probabilities of the six rankings:
– Eric ≻ Kyle ≻ Stan: 4/10 × 5/6
– Eric ≻ Stan ≻ Kyle: 4/10 × 1/6
– Kyle ≻ Eric ≻ Stan: 5/10 × 4/5
– Kyle ≻ Stan ≻ Eric: 5/10 × 1/5
– Stan ≻ Eric ≻ Kyle: 1/10 × 4/9
– Stan ≻ Kyle ≻ Eric: 1/10 × 5/9
RUM with normal distributions
• μ_i's are normal distributions
– Thurstone's Case V [Thurstone 27]
• Pros:
– Intuitive
– Flexible
• Cons: believed to be computationally intractable
– No analytical solution for the likelihood function Pr(P | Θ) is known:
Pr(c_1 ≻ … ≻ c_m | Θ) = ∫_{−∞}^{∞} μ_m(U_m) ∫_{U_m}^{∞} μ_{m−1}(U_{m−1}) ⋯ ∫_{U_2}^{∞} μ_1(U_1) dU_1 ⋯ dU_{m−1} dU_m
(U_m ranges from −∞ to ∞; U_{m−1} from U_m to ∞; …; U_1 from U_2 to ∞)
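Since this integral has no known closed form, a simple Monte Carlo approximation works as a sketch (this is a naive estimator, not the methods of the cited papers; names and means are illustrative):

```python
import random

def mc_ranking_prob(means, ranking, sd=1.0, samples=100_000):
    """Estimate Pr(ranking | Θ) for a normal RUM by sampling utilities
    and counting how often they realize the given order."""
    hits = 0
    for _ in range(samples):
        u = {c: random.gauss(mu, sd) for c, mu in means.items()}
        if all(u[ranking[i]] > u[ranking[i + 1]] for i in range(len(ranking) - 1)):
            hits += 1
    return hits / samples

print(mc_ranking_prob({"c1": 1.0, "c2": 0.5, "c3": 0.0}, ["c1", "c2", "c3"]))
```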
Model selection
• Compare RUMs with normal distributions and P-L for
– log-likelihood: log Pr(D | Θ)
– predictive log-likelihood: E log Pr(D_test | Θ)
– Akaike information criterion (AIC): 2k − 2 log Pr(D | Θ)
– Bayesian information criterion (BIC): k log n − 2 log Pr(D | Θ)
• Tested on an election dataset
– 9 alternatives, 50 randomly chosen voters
• Value(Normal) − Value(P-L), mean (std):
– LL: 44.8 (15.8)   Pred. LL: 87.4 (30.5)   AIC: −79.6 (31.6)   BIC: −50.5 (31.6)
– (Entries marked red on the original slide: statistically significant with 95% confidence)
Project: model fitness for election data
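The two criteria are direct formulas once the maximized log-likelihood is known; a minimal sketch (k = number of free parameters, n = number of observations, and the numbers below are illustrative, not the election-data values above):

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 log Pr(D | Θ)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k log n - 2 log Pr(D | Θ)."""
    return k * math.log(n) - 2 * log_likelihood

print(aic(-120.0, k=9), bic(-120.0, k=9, n=50))  # lower is better for both
```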
Decision making
Maximum likelihood estimators (MLE)
[figure: model M_r — a “ground truth” θ generates the votes V_1, V_2, …, V_n]
• For any profile P = (V_1, …, V_n):
– The likelihood of θ is L(θ, P) = Pr_θ(P) = ∏_{V ∈ P} Pr_θ(V)
– The MLE mechanism: MLE(P) = argmax_θ L(θ, P)
– Decision space = parameter space
Bayesian approach
• Given a profile P = (V_1, …, V_n) and a prior distribution ρ over Θ
• Step 1: calculate the posterior probability over Θ using Bayes' rule
– Pr(θ | P) ∝ ρ(θ) Pr_θ(P)
• Step 2: make a decision based on the posterior distribution
– Maximum a posteriori (MAP) estimation: MAP(P) = argmax_θ Pr(θ | P)
– Technically equivalent to MLE when ρ is uniform
Example
• Θ = {O, M}
• S = {O, M}^n
• Probability distributions: Pr(O | O) = Pr(M | M) = 0.6
• Data P = {10 @ O + 8 @ M}
• MLE
– L(O) = Pr_O(O)^10 Pr_O(M)^8 = 0.6^10 × 0.4^8
– L(M) = Pr_M(O)^10 Pr_M(M)^8 = 0.4^10 × 0.6^8
– L(O) > L(M), O wins
• MAP: prior O: 0.2, M: 0.8
– Pr(O | P) ∝ 0.2 L(O) = 0.2 × 0.6^10 × 0.4^8
– Pr(M | P) ∝ 0.8 L(M) = 0.8 × 0.4^10 × 0.6^8
– Pr(M | P) > Pr(O | P), M wins
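The example's numbers, checked directly (a minimal sketch; O and M stand for the two alternatives of the slide):

```python
L_O = 0.6**10 * 0.4**8    # likelihood of ground truth O given 10 O's, 8 M's
L_M = 0.4**10 * 0.6**8    # likelihood of ground truth M
print(L_O > L_M)          # True: MLE picks O (ratio is (1.5)^2 = 2.25)

post_O = 0.2 * L_O        # unnormalized posterior with prior O: 0.2
post_M = 0.8 * L_M        # unnormalized posterior with prior M: 0.8
print(post_M > post_O)    # True: MAP picks M (the prior flips the decision)
```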
Decision making under uncertainty (credit: Panos Ipeirotis & Roy Radner)
• You have a biased coin: head w/p p
– You observe 10 heads, 4 tails
– Do you think the next two tosses will be two heads in a row?
• MLE-based approach
– there is an unknown but fixed ground truth
– p = 10/14 = 0.714
– Pr(2 heads | p = 0.714) = (0.714)² = 0.51 > 0.5
– Yes!
• Bayesian
– the ground truth is captured by a belief distribution
– Compute Pr(p | Data), assuming a uniform prior
– Compute Pr(2 heads | Data) = 0.485 < 0.5
– No!
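Both answers check out numerically: under a uniform prior the posterior over p is Beta(11, 5), so Pr(2 heads | Data) = E[p²] = (11·12)/(16·17). A minimal sketch:

```python
# MLE-based: plug in the point estimate p = 10/14.
p_mle = 10 / 14
print(p_mle**2)                                # ≈ 0.510 > 0.5 → "Yes"

# Bayesian: posterior is Beta(a=11, b=5) under a uniform prior, and
# Pr(2 heads | Data) = E[p^2] = a(a+1) / ((a+b)(a+b+1)).
a, b = 11, 5
print(a * (a + 1) / ((a + b) * (a + b + 1)))   # ≈ 0.485 < 0.5 → "No"
```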
Statistical decision theory
• Given
– statistical model: Θ, S, Pr_θ(s)
– decision space: D
– loss function: L(θ, d) ∈ ℝ
• Make a good decision based on data
– decision function f: data ⟶ D
– Bayesian expected loss: EL_B(data, d) = E_{θ|data} L(θ, d)
– Frequentist expected loss: EL_F(θ, f) = E_{data|θ} L(θ, f(data))
• Evaluated w.r.t. the objective ground truth
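A toy instance of the Bayesian expected loss for the two-alternative model under 0/1 loss (the posterior values are illustrative; minimizing the expected loss here reduces to the MAP decision):

```python
# Θ = D = {O, M}, with 0/1 loss L(θ, d) = 0 if d == θ else 1.
# The Bayesian expected loss of deciding d is the posterior mass of θ ≠ d.
def bayes_decision(posterior):
    losses = {d: sum(pr for theta, pr in posterior.items() if theta != d)
              for d in posterior}
    return min(losses, key=losses.get), losses

print(bayes_decision({"O": 0.36, "M": 0.64}))  # decide M, expected loss 0.36
```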
Top 250 movies (IMDb)
• A “complex voter weighting system”
– Claimed to be accurate
• A “true Bayesian estimate”
– Claimed to be fair
Different Voice
• Q: “This is unfair!”
– “That film / show has received awards, great reviews, commendations and deserves a much higher vote!”
• IMDB: “… only votes cast by IMDb users are counted. We do not delete or alter individual votes”
(IMDb Votes/Ratings Top Frequently Asked Questions, http://www.imdb.com/help/show_leaf?votestopfaq)
Fairness of Bayesian estimators
• Theorem (strict Condorcet): No Bayesian estimator satisfies the strict Condorcet criterion
• Theorem (neutrality): Neutral Bayesian estimators = Bayesian estimators of “neutral” models