
Randomness in Competitions



  1. Randomness in Competitions
     Eli Ben-Naim, Complex Systems Group & Center for Nonlinear Studies, Los Alamos National Laboratory
     Sidney Redner and Federico Vazquez (Los Alamos & Boston University), Nicholas Hengartner (Los Alamos), Micha Ben-Naim (Los Alamos Middle School)
     Talk, papers available from: http://cnls.lanl.gov/~ebn

  2. Plan
     1. Modeling competitions
     2. Tournaments (post season, trees)
     3. Leagues (regular season, complete graphs)
     4. Championships (new algorithm, regular graphs)
     5. Modeling social dynamics

  3. Motivation
     • Evolution: species compete, fitter wins
     • Society: people compete for social status
     • Economics: companies compete for market share
     • Arts, science, politics: awards, prizes, elections
     Competition is everywhere

  4. Why sports?
     • Sports competition results are:
       - Accurate
       - Widely available
       - Complete
     Sports as a laboratory for understanding competition

  5. Theme
     • Competitions are not perfectly predictable
     • Outcome of a single competition is stochastic
     • Winner of a series of competitions (league, tournament) is also subject to randomness
     Randomness is inherent

  6. I. Modeling competitions

  7. What is the most competitive sport?
     Soccer, baseball, hockey, basketball, football
     Can competitiveness be quantified?
     How can competitiveness be quantified?

  8. Parity of a sports league
     Example: Major League Baseball, American League, 2005 season-end standings
     • Teams ranked by win-loss record
     • Win percentage: $x = \dfrac{\text{number of wins}}{\text{number of games}}$
     • Standard deviation in win percentage: $\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$
     • Cumulative distribution: $F(x)$ = fraction of teams with winning percentage less than $x$
     • In baseball: $0.400 < x < 0.600$ and $\sigma = 0.08$
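
As a rough illustration of the parity measure on this slide, the sketch below computes win percentages, σ, and F(x) from a list of (wins, games) records. The standings are made-up numbers, not the 2005 American League data, and the function names are illustrative only.

```python
from statistics import pstdev

def win_percentages(records):
    """records: iterable of (wins, games) pairs, one pair per team."""
    return [wins / games for wins, games in records]

def parity(records):
    """sigma = sqrt(<x^2> - <x>^2): population standard deviation of x."""
    return pstdev(win_percentages(records))

def cumulative_fraction(records, x):
    """F(x): fraction of teams whose win percentage is strictly below x."""
    xs = win_percentages(records)
    return sum(1 for v in xs if v < x) / len(xs)

# Hypothetical five-team standings (assumption, for illustration only).
standings = [(99, 162), (95, 162), (81, 162), (67, 162), (63, 162)]
print(f"sigma  = {parity(standings):.3f}")
print(f"F(0.5) = {cumulative_fraction(standings, 0.5):.2f}")
```

The population form of the standard deviation is used because the slide defines σ through the averages ⟨x²⟩ and ⟨x⟩ over all teams in the league.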

  9. Data
     • 300,000 regular season games (all games ever played)
     • 5 major sports leagues in the United States & England

     sport       league  full name                        years      games
     soccer      FA      Football Association             1888-2005  43,350
     baseball    MLB     Major League Baseball            1901-2005  163,720
     hockey      NHL     National Hockey League           1917-2005  39,563
     basketball  NBA     National Basketball Association  1946-2005  43,254
     football    NFL     National Football League         1922-2004  11,770

     Sources: http://www.shrpsports.com/ and http://www.the-english-football-archive.com/

  10. Standard deviation in winning percentage
     [Figure: bar chart of σ by league (NFL, NBA, NHL, FA, MLB), data vs. theory, with values 0.210, 0.150, 0.120, 0.102, 0.084 shown; cumulative distribution F(x) versus x for NBA, NFL, NHL, MLB.]
     • Baseball most competitive?
     • Football least competitive?
     Distribution of winning percentage clearly distinguishes sports
     Fort and Quirk, 1995

  11. “Everything should be made as simple as possible but not simpler” Freeman Dyson

  12. “Simple Physics”

  13. The competition model
     • Two randomly selected teams play
     • Outcome of game depends on team record
       - Weaker team wins with probability $q < 1/2$ ($q = 1/2$: random; $q = 0$: deterministic)
       - Stronger team wins with probability $p > 1/2$, with $p + q = 1$
       - For $i > j$: $(i, j) \to (i+1, j)$ with probability $p$, and $(i, j) \to (i, j+1)$ with probability $1 - p$
       - When two equal teams play, winner picked randomly
     • Initially, all teams are equal (0 wins, 0 losses)
     • Teams play once per unit time, so $\langle x \rangle = 1/2$
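
The rules on this slide can be sketched as a small Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the authors' code: teams are paired at random once per round (so every team plays exactly once per unit time), the team count is even, and "stronger" means "more wins so far". The name `simulate_league` is hypothetical.

```python
import random
from statistics import pstdev

def simulate_league(n_teams, n_rounds, q, seed=0):
    """Return the list of win counts after n_rounds rounds of random pairings.

    Assumption: n_teams is even, and each round pairs all teams at random,
    so each team plays once per unit time.
    """
    if n_teams % 2:
        raise ValueError("n_teams must be even")
    rng = random.Random(seed)
    wins = [0] * n_teams
    for _ in range(n_rounds):
        order = list(range(n_teams))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            if wins[a] == wins[b]:
                # Equal records: winner picked at random.
                winner = a if rng.random() < 0.5 else b
            else:
                strong, weak = (a, b) if wins[a] > wins[b] else (b, a)
                # Weaker team wins with probability q, stronger with 1 - q.
                winner = weak if rng.random() < q else strong
            wins[winner] += 1
    return wins

rounds = 100
wins = simulate_league(n_teams=200, n_rounds=rounds, q=0.25, seed=1)
x = [w / rounds for w in wins]
print(f"mean x = {sum(x) / len(x):.3f}, sigma = {pstdev(x):.3f}")
```

Because every game produces exactly one win, the mean win percentage is 1/2 regardless of q; only the spread of the distribution depends on q.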

  14. Rate equation approach
     • Probability distribution functions:
       $g_k$ = fraction of teams with $k$ wins
       $G_k = \sum_{j=0}^{k-1} g_j$ = fraction of teams with fewer than $k$ wins
       $H_k = 1 - G_{k+1} = \sum_{j=k+1}^{\infty} g_j$
     • Evolution of the probability distribution:
       $$\frac{dg_k}{dt} = (1-q)\,(g_{k-1}G_{k-1} - g_k G_k) + q\,(g_{k-1}H_{k-1} - g_k H_k) + \frac{1}{2}\left(g_{k-1}^2 - g_k^2\right)$$
       The three terms correspond to: better team wins, worse team wins, equal teams play.
     • Closed equations for the cumulative distribution:
       $$\frac{dG_k}{dt} = q\,(G_{k-1} - G_k) + (1/2 - q)\left(G_{k-1}^2 - G_k^2\right)$$
     • Boundary conditions: $G_0 = 0$, $G_\infty = 1$
     • Initial conditions: $G_k(t=0) = 1$
     Nonlinear difference-differential equations
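
One way to explore the closed equations for G_k is a plain forward-Euler integration. The sketch below assumes a truncation at `k_max` wins (chosen well above the elapsed time so that the last entry stays at 1) and a fixed step `dt`; both are choices of this illustration rather than anything stated in the talk.

```python
def integrate_rate_equations(q, t_max, k_max, dt=0.01):
    """Forward-Euler integration of
        dG_k/dt = q (G_{k-1} - G_k) + (1/2 - q) (G_{k-1}^2 - G_k^2)
    with G_0 = 0 and G_k(0) = 1 for k >= 1. Returns [G_0, ..., G_{k_max}].
    """
    G = [0.0] + [1.0] * k_max
    for _ in range(round(t_max / dt)):
        new = G[:]  # G_0 stays fixed at 0 (boundary condition)
        for k in range(1, k_max + 1):
            new[k] = G[k] + dt * (
                q * (G[k - 1] - G[k])
                + (0.5 - q) * (G[k - 1] ** 2 - G[k] ** 2)
            )
        G = new
    return G

t = 20.0
G = integrate_rate_equations(q=0.25, t_max=t, k_max=80)
# Mean number of wins: sum_k k g_k = sum_{k>=1} (1 - G_k).
mean_wins = sum(1.0 - g for g in G[1:])
print(f"mean wins / t = {mean_wins / t:.4f}")
```

Summing the equations over k shows that the mean number of wins grows at rate 1/2, consistent with ⟨x⟩ = 1/2, which makes the printed ratio a convenient sanity check on the integration.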

  15. An exact solution
     • Stronger always wins ($q = 0$): $\dfrac{dG_k}{dt} = -G_k\,(G_k - G_{k-1})$
     • Transformation into a ratio: $G_k = \dfrac{P_k}{P_{k+1}}$
     • Nonlinear equations reduce to linear recursion: $\dfrac{dP_k}{dt} = P_{k-1}$
     • Exact solution:
       $$G_k = \frac{1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{k!}t^k}{1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{(k+1)!}t^{k+1}}$$

  16. Long-time asymptotics
     • Long-time limit: $G_k \to \dfrac{k+1}{t}$
     • Scaling form: $G_k \to F\!\left(\dfrac{k}{t}\right)$
     • Scaling function: $F(x) = x$
     [Figure: F(x) versus x at t = 10, t = 20, t = 100, compared with the scaling theory.]
     Seek similarity solutions
     Use winning percentage as scaling variable
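
The exact solution and its long-time limit can be checked numerically. In this sketch, `P(k, t)` is the truncated exponential series, the linear recursion dP_k/dt = P_{k-1} is tested with a central finite difference, and the ratio P_k/P_{k+1} is compared with (k+1)/t at large t. The function names and the chosen values of k and t are illustrative.

```python
from math import factorial

def P(k, t):
    """P_k(t) = 1 + t + t^2/2! + ... + t^k/k!"""
    return sum(t**j / factorial(j) for j in range(k + 1))

def G_exact(k, t):
    """Exact solution G_k = P_k / P_{k+1}."""
    return P(k, t) / P(k + 1, t)

# Check the linear recursion dP_k/dt = P_{k-1} by central differences.
k, t, h = 3, 2.0, 1e-5
dP_dt = (P(k, t + h) - P(k, t - h)) / (2 * h)
print(f"dP_k/dt = {dP_dt:.6f}, P_(k-1) = {P(k - 1, t):.6f}")

# Check the long-time limit G_k -> (k + 1) / t.
t_long = 1e4
print(f"G_k * t / (k + 1) = {G_exact(k, t_long) * t_long / (k + 1):.6f}")
```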

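  17. Scaling analysis
     • Rate equation:
       $$\frac{dG_k}{dt} = q\,(G_{k-1} - G_k) + (1/2 - q)\left(G_{k-1}^2 - G_k^2\right)$$
     • Treat number of wins as continuous: $G_{k+1} - G_k \to \dfrac{\partial G}{\partial k}$
       $$\frac{\partial G}{\partial t} + \left[q + (1 - 2q)\,G\right]\frac{\partial G}{\partial k} = 0$$
       This has the form of the inviscid Burgers equation, $\dfrac{\partial v}{\partial t} + v\,\dfrac{\partial v}{\partial x} = 0$.
     • Stationary distribution of winning percentage: $G_k(t) \to F(x)$ with $x = \dfrac{k}{t}$
     • Scaling equation:
       $$\left[(x - q) - (1 - 2q)\,F(x)\right]\frac{dF}{dx} = 0$$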

  18. Scaling solution
     • Stationary distribution of winning percentage:
       $$F(x) = \begin{cases} 0 & 0 < x < q \\ \dfrac{x - q}{1 - 2q} & q < x < 1 - q \\ 1 & 1 - q < x \end{cases}$$
     • Distribution of winning percentage is uniform:
       $$f(x) = F'(x) = \begin{cases} 0 & 0 < x < q \\ \dfrac{1}{1 - 2q} & q < x < 1 - q \\ 0 & 1 - q < x \end{cases}$$
     • Standard deviation in winning percentage: $\sigma = \dfrac{1/2 - q}{\sqrt{3}}$
       ($q = 1/2$: perfect parity; $q = 0$: maximum disparity)
     [Figure: sketches of F(x) and f(x) versus x, with q and 1 − q marked on the x axis.]
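
The scaling solution is simple enough to write down directly. The sketch below encodes F(x) and the closed-form σ, and cross-checks σ against a midpoint-rule estimate of the standard deviation of a uniform density on [q, 1 − q]. Function names are illustrative, and the restriction 0 ≤ q < 1/2 is imposed here to avoid the degenerate case q = 1/2.

```python
from math import sqrt

def scaling_cdf(x, q):
    """Piecewise-linear scaling function F(x) for upset probability q."""
    if not 0 <= q < 0.5:
        raise ValueError("requires 0 <= q < 1/2")
    if x <= q:
        return 0.0
    if x >= 1 - q:
        return 1.0
    return (x - q) / (1 - 2 * q)

def scaling_sigma(q):
    """Closed form: sigma = (1/2 - q) / sqrt(3)."""
    return (0.5 - q) / sqrt(3)

def sigma_by_quadrature(q, n=10_000):
    """Midpoint-rule std of the uniform density on [q, 1 - q] (mean is 1/2)."""
    width = 1 - 2 * q
    second_moment = sum((q + width * (i + 0.5) / n) ** 2 for i in range(n)) / n
    return sqrt(second_moment - 0.25)

q = 0.25
print(f"F(0.5) = {scaling_cdf(0.5, q)}")
print(f"sigma closed form = {scaling_sigma(q):.6f}")
print(f"sigma quadrature  = {sigma_by_quadrature(q):.6f}")
```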

  19. Approach to scaling
     Numerical integration of the rate equations, $q = 1/4$
     [Figure: F(x) versus x at t = 100 and t = 500, compared with theory; σ versus t for t from 0 to 1000, with MLB and NFL marked and the labels $t^{-1/2}$ and $\frac{1}{4\sqrt{3}}$ shown.]

     League  games
     MLB     160
     FA      40
     NHL     80
     NBA     80
     NFL     16

     • Winning percentage distribution approaches scaling solution
     • Correction to scaling is very large for realistic number of games
     • Large variance may be due to small number of games:
       $\sigma(t) = \dfrac{1/2 - q}{\sqrt{3}} + f(t)$, where the correction $f(t)$ is large
     Variance inadequate to characterize competitiveness!

  20. The distribution of win percentage
     [Figure: F(x) versus x for NFL, NBA, NHL, MLB.]
     • Treat q as a fitting parameter, with time = number of games
     • This allows $q_{\text{model}}$ to be estimated for the different leagues

  21. The upset frequency
     • Upset frequency as a measure of predictability: $q = \dfrac{\text{number of upsets}}{\text{number of games}}$
     • Addresses the variability in the number of games
     • Measured directly from game-by-game results
       - Ties: count as 1/2 of an upset (small effect)
       - Ignore games between teams with equal records
       - Ignore games involving teams with no record
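
The counting rules above can be sketched as a single pass over chronologically ordered results. This is an illustration under assumptions, not the authors' procedure: each game is a hypothetical `(team_a, team_b, winner)` tuple with `winner=None` for a tie, a team's record is its win fraction so far, and a tie adds half a win to each team's record.

```python
from collections import defaultdict

def upset_frequency(games):
    """Fraction of counted games won by the team with the worse record.

    games: iterable of (team_a, team_b, winner), in chronological order;
    winner is one of the two teams, or None for a tie.
    Returns None if no game qualifies for counting.
    """
    wins = defaultdict(float)
    played = defaultdict(int)
    upsets, counted = 0.0, 0
    for a, b, winner in games:
        # Skip games where either team has no record yet.
        if played[a] and played[b]:
            xa, xb = wins[a] / played[a], wins[b] / played[b]
            if xa != xb:  # skip games between teams with equal records
                counted += 1
                if winner is None:
                    upsets += 0.5  # a tie counts as half an upset
                elif winner == (a if xa < xb else b):
                    upsets += 1.0
        # Update records after the game (assumption: tie = half a win each).
        played[a] += 1
        played[b] += 1
        if winner is None:
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            wins[winner] += 1.0
    return upsets / counted if counted else None

# Hypothetical results for four teams.
results = [("A", "B", "A"), ("C", "D", "C"), ("A", "D", "D"), ("B", "C", "C")]
print(upset_frequency(results))
```

In the example, the first two games are skipped because the teams have no record, the third is an upset (winless D beats unbeaten A), and the fourth is won by the favorite.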

  22. The upset frequency

     League  q_model  q
     FA      0.459    0.452
     MLB     0.413    0.441
     NHL     0.383    0.414
     NBA     0.316    0.365
     NFL     0.309    0.364

     [Figure: q versus year, 1900 to 2000, for FA, MLB, NHL, NBA, NFL.]
     q differentiates the different sport leagues!
     Soccer, baseball most competitive
     Basketball, football least competitive

  23. Evolution with time
     [Figure: q versus year and σ versus year, 1900 to 2000, for FA, MLB, NHL, NBA, NFL.]
     • Parity, predictability mirror each other: $\sigma = \dfrac{1/2 - q}{\sqrt{3}}$
     • Football, baseball: increasing competitiveness
     • Soccer: decreasing competitiveness (past 60 years)
     S.J. Gould, Full House: The Spread of Excellence from Plato to Darwin, 1996

  24. I. Discussion
     • Model limitation: it does not incorporate
       - Game location: home field advantage
       - Game score
       - Upset frequency dependent on relative team strength
       - Unbalanced schedule
     • Model advantages:
       - Simple, involves only 1 parameter
       - Enables quantitative analysis

  25. I. Conclusions
     • Parity characterized by variance in winning percentage
       - Parity measure requires standings data
       - Parity measure depends on season length
     • Predictability characterized by upset frequency
       - Predictability measure requires game results data
       - Predictability measure independent of season length
     • Two-team competition model allows quantitative modeling of sports competitions

  26. II. Tournaments (post-season, trees)

  27. Single-elimination tournaments: binary tree structure

  28. The competition model
     • Two teams play, loser is eliminated: $N \to N/2 \to N/4 \to \cdots \to 1$
     • Teams have inherent strength (or fitness) $x$
       [Figure: teams $x_1, \ldots, x_5$ marked on the $x$ axis, which runs from strong to weak.]
     • Outcome of game depends on team strength; for $x_1 < x_2$:
       $(x_1, x_2) \to x_1$ with probability $1 - q$
       $(x_1, x_2) \to x_2$ with probability $q$
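
A single-elimination tournament under these rules can be sketched as follows. The sketch assumes the number of teams is a power of two, that adjacent entries of the input list are paired in each round, and that a smaller x means a stronger team, as on the slide. Drawing fitness uniformly from [0, 1] in the usage example is an assumption of this illustration.

```python
import random

def play_tournament(strengths, q, rng):
    """Return the fitness of the tournament winner.

    strengths: list of fitness values, length a power of two; smaller is stronger.
    q: probability that the weaker team (larger x) wins a game.
    """
    n = len(strengths)
    if n < 1 or n & (n - 1):
        raise ValueError("number of teams must be a power of two")
    teams = list(strengths)
    while len(teams) > 1:
        survivors = []
        for a, b in zip(teams[::2], teams[1::2]):
            strong, weak = (a, b) if a < b else (b, a)
            survivors.append(weak if rng.random() < q else strong)
        teams = survivors  # N -> N/2 after each round
    return teams[0]

rng = random.Random(0)
field = [rng.random() for _ in range(16)]  # assumed uniform fitness
print(f"best team: {min(field):.3f}, winner: {play_tournament(field, 0.25, rng):.3f}")
```

With q = 0 the strongest team always wins; any q > 0 leaves room for a weaker team to take the title.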

  29. Recursive approach
     • Number of teams: $N = 2^k = 1, 2, 4, 8, \ldots$
     • $G_N(x)$ = cumulative probability that a team with fitness less than $x$ wins an $N$-team tournament
     • Closed equations for the cumulative distribution:
       $$G_{2N}(x) = 2p\,G_N(x) + (1 - 2p)\,[G_N(x)]^2$$
     Nonlinear recursion equation
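
The recursion can be iterated directly, one doubling of the field at a time. The starting point G_1(x) = x used below corresponds to fitness drawn uniformly from [0, 1], which is an assumption of this sketch; p = 1 − q is the probability that the stronger team wins, and the function name is illustrative.

```python
def winner_cdf(x, n_teams, p):
    """G_N(x): probability that the winner of an N-team tournament has fitness < x.

    Iterates G_{2N} = 2p G_N + (1 - 2p) G_N^2 starting from G_1(x) = x
    (assumption: fitness uniform on [0, 1]). n_teams must be a power of two.
    """
    if n_teams < 1 or n_teams & (n_teams - 1):
        raise ValueError("n_teams must be a power of two")
    g, n = x, 1
    while n < n_teams:
        g = 2 * p * g + (1 - 2 * p) * g * g
        n *= 2
    return g

for n in (1, 2, 4, 8, 16):
    print(n, round(winner_cdf(0.2, n, p=0.75), 4))
```

A quick consistency check: for p = 1 the stronger team always wins, and the recursion reproduces the distribution of the minimum of N uniform variables, 1 − (1 − x)^N.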

  30. Scaling properties
     1. Scale of winner: $x_* \sim N^{-\ln 2p / \ln 2}$
        Large tournaments produce strong winners
     2. Scaling function: $G_N(x) \to \Psi(x / x_*)$
     3. Algebraic tail: $1 - \Psi(z) \sim z^{\ln 2q / \ln 2p}$
        High probability for an upset
     [Figure: $G_N(x)$ versus x for N = 1, 2, 4, 8, 16.]
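
The scale of the winner can be probed numerically by tracking the median winner fitness as the field doubles. If $G_N(x) \to \Psi(x/x_*)$ with $x_* \sim N^{-\ln 2p/\ln 2}$, the median should shrink by a factor of about 1/(2p) per doubling. The sketch below assumes G_1(x) = x (uniform fitness) and finds the median by bisection; the helper names are illustrative.

```python
def winner_cdf(x, rounds, p):
    """G_N(x) for N = 2**rounds, from G_{2N} = 2p G_N + (1 - 2p) G_N^2, G_1(x) = x."""
    g = x
    for _ in range(rounds):
        g = 2 * p * g + (1 - 2 * p) * g * g
    return g

def median_winner(rounds, p):
    """Solve G_N(x) = 1/2 by bisection; G_N is increasing in x on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if winner_cdf(mid, rounds, p) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.75
for k in (5, 10, 20):
    ratio = median_winner(k + 1, p) / median_winner(k, p)
    print(f"N = 2^{k}: median ratio per doubling = {ratio:.4f}, 1/(2p) = {1 / (2 * p):.4f}")
```

The median is used here only as a convenient proxy for the characteristic scale; any fixed quantile of G_N would be expected to scale the same way.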
