On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016 A. Groll & A. Mayr & T. Kneib & G. Schauberger Department of Statistics, Georg-August-University Göttingen MathSport International


  1. On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016

     A. Groll∗ & A. Mayr & T. Kneib & G. Schauberger
     ∗Department of Statistics, Georg-August-University Göttingen
     MathSport International 2017 Conference, Padua, June 28th

     Groll et al. (MathSport 2017) A Sparse Model for the EURO 2016 1 / 22

  2. Who will celebrate?

  3. Who will cry?

  4. Theoretical Background

  6. Aims

     The main aims are to
     ● find an explicit model for the exact numbers of goals
     ● include covariates
     ● adjust for possible correlations between the numbers of goals of both competing teams.

     ⇒ Different approaches for
     ● EURO 2012 (Groll and Abedieh, 2013)
     ● World Cup 2014 (Groll, Schauberger and Tutz, 2015)
     ● EURO 2016

  7. Univariate Model for International Soccer Tournaments

     $$y_{ijk} \mid x_{ik}, x_{jk} \sim Po(\lambda_{ijk}), \qquad i, j \in \{1, \ldots, n\},\ i \neq j$$
     $$\log(\lambda_{ijk}) = \beta_0 + \xi_{ik} - \delta_{jk}$$

     ● $n$: number of teams
     ● $y_{ijk}$: number of goals scored by team $i$ against opponent $j$ at tournament $k$
     ● $x_{ik}, x_{jk}$: covariate vectors of team $i$ and opponent $j$, varying over tournaments

     e.g. EURO 2012 (Groll and Abedieh, 2013):
     $$\xi_{ik} = x_{ik}^T \boldsymbol{\beta}_{\xi} + b_i, \qquad \delta_{jk} = x_{jk}^T \boldsymbol{\beta}_{\delta} + b_j$$

  9. Univariate Model for International Soccer Tournaments

     $$y_{ijk} \mid x_{ik}, x_{jk} \sim Po(\lambda_{ijk}), \qquad i, j \in \{1, \ldots, n\},\ i \neq j$$
     $$\log(\lambda_{ijk}) = \beta_0 + \xi_{ik} - \delta_{jk}$$

     ● $n$: number of teams
     ● $y_{ijk}$: number of goals scored by team $i$ against opponent $j$ at tournament $k$
     ● $x_{ik}, x_{jk}$: covariate vectors of team $i$ and opponent $j$, varying over tournaments

     e.g. World Cup 2014 (Groll, Schauberger and Tutz, 2015):
     $$\xi_{ik} = x_{ik}^T \boldsymbol{\beta} + att_i, \qquad \delta_{jk} = x_{jk}^T \boldsymbol{\beta} + def_j$$
     $$\Rightarrow \log(\lambda_{ijk}) = \beta_0 + (x_{ik} - x_{jk})^T \boldsymbol{\beta} + att_i - def_j$$
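
A minimal Python sketch of how the World Cup 2014 predictor could be turned into goal rates and match outcome probabilities. All numeric values, the function names and the truncation at 15 goals are assumptions made for illustration; they are not estimates or code from the talk. The outcome probabilities treat the two scores as independent given the covariates, which is the univariate-model assumption.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def goal_rate(beta0, beta, x_team, x_opp, att_team=0.0, def_opp=0.0):
    """lambda_ijk = exp(beta0 + (x_ik - x_jk)^T beta + att_i - def_j)."""
    diff_effect = sum(b * (xt - xo) for b, xt, xo in zip(beta, x_team, x_opp))
    return math.exp(beta0 + diff_effect + att_team - def_opp)

def outcome_probs(lam_i, lam_j, max_goals=15):
    """Win/draw/loss probabilities for team i, with both scores treated as
    independent Poisson counts and the sums truncated at max_goals."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_i) * poisson_pmf(b, lam_j)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss

# Hypothetical coefficients and covariates (illustration only).
beta0 = 0.2
beta = [0.4, -0.1]
x_i = [1.0, 0.5]   # covariates of team i
x_j = [0.2, 0.8]   # covariates of opponent j

lam_ij = goal_rate(beta0, beta, x_i, x_j)  # rate for goals of i against j
lam_ji = goal_rate(beta0, beta, x_j, x_i)  # rate for goals of j against i
win, draw, loss = outcome_probs(lam_ij, lam_ji)
print(lam_ij, lam_ji, win, draw, loss)
```

Because only the covariate difference enters the predictor, swapping the two teams flips its sign, so without team-specific effects the two rates multiply to exp(2 β0).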

  10. Correlation between Scores of Both Teams

      ● Dixon and Coles (1997) compared the marginal distributions of the scores with the joint distribution ⇒ correlation!
        [Figure; source: Dixon and Coles (1997)]
      ⇒ Introduction of an additional dependence parameter

      ● But: they did not compare conditional distributions!
      ⇒ Their linear predictors are not independent!
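
One possible reading of this argument, sketched as a small simulation: if the two scores are independent given the teams' strengths, but the two linear predictors depend on the same strength difference, then pooling over matches still produces correlated scores. The strength distribution, the intercept 0.2 and the sample size are hypothetical choices; this is not the analysis of Dixon and Coles or of the authors.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def correlation(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(2016)
goals_1, goals_2 = [], []
for _ in range(5000):
    d = rng.uniform(-1.0, 1.0)  # hypothetical strength difference of a match
    # Given d, the two scores are drawn independently of each other.
    goals_1.append(sample_poisson(math.exp(0.2 + d), rng))
    goals_2.append(sample_poisson(math.exp(0.2 - d), rng))

pooled_corr = correlation(goals_1, goals_2)
print(pooled_corr)
```

In this setup the pooled correlation is negative although no dependence parameter is present, which is why a comparison of marginal and joint distributions alone cannot separate the two effects.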

  13. The Bivariate Poisson Distribution

      $$X_k \overset{ind.}{\sim} Po(\lambda_k), \qquad k = 1, 2, 3, \quad \lambda_k > 0$$

      ⇒ $Y_1 = X_1 + X_3$ and $Y_2 = X_2 + X_3$ follow a joint bivariate Poisson distribution
      $$(Y_1, Y_2) \sim Po_2(\lambda_1, \lambda_2, \lambda_3)$$

      Probability function:
      $$P_{Y_1,Y_2}(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = \exp(-(\lambda_1 + \lambda_2 + \lambda_3)) \, \frac{\lambda_1^{y_1}}{y_1!} \, \frac{\lambda_2^{y_2}}{y_2!} \sum_{k=0}^{\min(y_1, y_2)} \binom{y_1}{k} \binom{y_2}{k} \, k! \left( \frac{\lambda_3}{\lambda_1 \lambda_2} \right)^k$$

      ● $E(Y_1) = \lambda_1 + \lambda_3$
      ● $E(Y_2) = \lambda_2 + \lambda_3$
      ● $cov(Y_1, Y_2) = \lambda_3$
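
The probability function and the three moment identities can be checked numerically. The sketch below is a direct transcription of the formula into Python; the function names, the parameter values and the truncation of the sums at 30 goals are assumptions made for illustration.

```python
import math

def bivariate_poisson_pmf(y1, y2, lam1, lam2, lam3):
    """P(Y1 = y1, Y2 = y2) for (Y1, Y2) ~ Po2(lam1, lam2, lam3).

    Requires lam1 > 0 and lam2 > 0; lam3 = 0 gives two independent Poissons.
    """
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** y1 / math.factorial(y1)
            * lam2 ** y2 / math.factorial(y2))
    ratio = lam3 / (lam1 * lam2)
    series = sum(math.comb(y1, k) * math.comb(y2, k) * math.factorial(k) * ratio ** k
                 for k in range(min(y1, y2) + 1))
    return base * series

def moments(lam1, lam2, lam3, max_goals=30):
    """Total mass, E(Y1), E(Y2) and cov(Y1, Y2) on a truncated grid."""
    total = mean1 = mean2 = cross = 0.0
    for y1 in range(max_goals + 1):
        for y2 in range(max_goals + 1):
            p = bivariate_poisson_pmf(y1, y2, lam1, lam2, lam3)
            total += p
            mean1 += y1 * p
            mean2 += y2 * p
            cross += y1 * y2 * p
    return total, mean1, mean2, cross - mean1 * mean2

dependent = moments(1.0, 1.0, 1.0)    # E(Y1) = E(Y2) = 2, cov = 1
independent = moments(2.0, 2.0, 0.0)  # E(Y1) = E(Y2) = 2, cov = 0
print(dependent)
print(independent)
```

Both parameter sets share the same marginal means, so they differ only in the covariance contributed by λ3.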

  14. The Bivariate Poisson Distribution

      [Figure: probability functions of two bivariate Poisson distributions, one panel with $\lambda_1 = 2$, $\lambda_2 = 2$, $\lambda_3 = 0$ and one panel with $\lambda_1 = 1$, $\lambda_2 = 1$, $\lambda_3 = 1$.]

  16. Re-parametrization of the Bivariate Poisson Distribution

      Replace $\lambda_1 = \gamma_1 \gamma_2$ and $\lambda_2 = \gamma_1 / \gamma_2$:
      $$P_{Y_1,Y_2}(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = \exp\!\big(-(\gamma_1 (\gamma_2 + \gamma_2^{-1}) + \lambda_3)\big) \, \frac{(\gamma_1 \gamma_2)^{y_1}}{y_1!} \, \frac{(\gamma_1 / \gamma_2)^{y_2}}{y_2!} \sum_{k=0}^{\min(y_1, y_2)} \binom{y_1}{k} \binom{y_2}{k} \, k! \left( \frac{\lambda_3}{\gamma_1^2} \right)^k$$

      ● $\gamma_1 = \exp(\beta_0)$
      ● $\gamma_2 = \exp(\tilde{x}^T \boldsymbol{\beta})$
      ● $\lambda_3 = \exp(\alpha_0 + |\tilde{x}|^T \boldsymbol{\alpha})$

      with $\tilde{x} = x_1 - x_2$.

      ⇒ $\lambda_1 = \exp(\beta_0 + \tilde{x}^T \boldsymbol{\beta})$
      ⇒ $\lambda_2 = \exp(\beta_0 - \tilde{x}^T \boldsymbol{\beta})$
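
A short Python sketch of the mapping from covariates to the three rates under this re-parametrization. The coefficient and covariate values and the function name are hypothetical. It illustrates two consequences of the construction: λ1 λ2 = γ1², and swapping the two teams swaps λ1 and λ2 while leaving λ3 unchanged, because λ3 depends only on the absolute covariate differences.

```python
import math

def bivariate_rates(beta0, beta, alpha0, alpha, x1, x2):
    """Return (lam1, lam2, lam3) from covariate vectors x1 and x2."""
    x_tilde = [a - b for a, b in zip(x1, x2)]           # x~ = x1 - x2
    gamma1 = math.exp(beta0)
    gamma2 = math.exp(sum(b * d for b, d in zip(beta, x_tilde)))
    lam1 = gamma1 * gamma2                              # exp(beta0 + x~^T beta)
    lam2 = gamma1 / gamma2                              # exp(beta0 - x~^T beta)
    lam3 = math.exp(alpha0 + sum(a * abs(d) for a, d in zip(alpha, x_tilde)))
    return lam1, lam2, lam3

# Hypothetical coefficients and covariates (illustration only).
beta0, beta = 0.2, [0.4, -0.1]
alpha0, alpha = -1.0, [0.3, 0.2]
x1, x2 = [1.0, 0.5], [0.2, 0.8]

rates = bivariate_rates(beta0, beta, alpha0, alpha, x1, x2)
swapped = bivariate_rates(beta0, beta, alpha0, alpha, x2, x1)
print(rates)
print(swapped)
```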

  18. Bivariate Poisson Model for Football Results

      $$(y_{ik}, y_{jk}) \mid x_{ik}, x_{jk} \sim Po_2(\gamma_1, \gamma_{ijk2}, \lambda_{ijk3})$$

      ● $\gamma_1 = \exp(\beta_0)$
      ● $\gamma_{ijk2} = \exp((x_{ik} - x_{jk})^T \boldsymbol{\beta})$
      ● $\lambda_{ijk3} = \exp(\alpha_0 + |x_{ik} - x_{jk}|^T \boldsymbol{\alpha})$

      ⇒ Framework of the so-called Generalized Additive Model for Location, Scale and Shape (GAMLSS; Rigby and Stasinopoulos, 2005)

  21. Boosting for GAMLSS

      ● R package gamboostLSS (Hofner, Mayr and Schmid, 2015)
      ● Allows for variable selection within the GAMLSS framework
      ● Provides a large number of pre-specified distributions
        – Negative binomial distribution
        – Zero-inflated Poisson distribution
        – ...
      ● Mostly restricted to univariate responses; first approach for the bivariate normal distribution from Andreas Mayr
      ● Users can specify new distributions (also bivariate) by providing
        – loss/risk function → neg. log-likelihood
        – neg. gradient of loss function → score function
        – possibly suitable offsets for linear predictors

      ⇒ We implemented the bivariate Poisson distribution
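
To make the three ingredients (loss, negative gradient, offset) concrete, here is a simplified Python sketch of component-wise gradient boosting for a univariate Poisson log-linear model. It is not the gamboostLSS API and not the authors' bivariate implementation: it has a single distribution parameter, simple linear base-learners, and simulated data with hypothetical coefficients. In each iteration only the best-fitting covariate is updated, which is where the variable selection comes from.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def boost_poisson(X, y, mstop=100, nu=0.1):
    """Component-wise boosting; column 0 of X must be the constant 1.

    Loss: negative Poisson log-likelihood exp(eta) - y * eta (up to a constant).
    Negative gradient: u = y - exp(eta).  Offset: log of the mean response.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    coef[0] = math.log(sum(y) / n)            # offset for the linear predictor
    eta = [coef[0]] * n
    for _ in range(mstop):
        u = [yi - math.exp(e) for yi, e in zip(y, eta)]
        best_j, best_b, best_rss = 0, 0.0, float("inf")
        for j in range(p):
            # Least-squares fit of the negative gradient on column j alone.
            sxx = sum(row[j] ** 2 for row in X)
            b = sum(row[j] * ui for row, ui in zip(X, u)) / sxx
            rss = sum((ui - b * row[j]) ** 2 for row, ui in zip(X, u))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        # Update only the selected component, shrunken by the step length nu.
        coef[best_j] += nu * best_b
        eta = [e + nu * best_b * row[best_j] for e, row in zip(eta, X)]
    return coef

# Simulated data: only the first covariate (column 1) has a true effect.
rng = random.Random(1)
X, y = [], []
for _ in range(300):
    row = [1.0] + [rng.uniform(-1.0, 1.0) for _ in range(3)]
    X.append(row)
    y.append(sample_poisson(math.exp(0.2 + 0.8 * row[1]), rng))

coef = boost_poisson(X, y)
print(coef)
```

Early stopping (a small `mstop`) keeps coefficients of rarely selected covariates at or near zero; in practice the stopping iteration would be tuned, for example by cross-validation.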

  22. Application to the UEFA European Championship 2016

  24. Covariates

      ● Economic factors:
        – GDP per capita
        – population
      ● Sportive factors:
        – home advantage
        – ODDSET odds
        – market value
        – FIFA rank
        – UEFA points
