Workshop 4: Statistical modelling intro

Murray Logan

March 10, 2019

Table of contents

1 Introduction

1. Introduction

1.1. Statistical modelling

What is a statistical model?

1.2. Statistical modelling

What is a statistical model?

[Figure: two scatterplots of y against x (x from 0 to 6, y from 0 to 12). The mathematical model is the exact line y = 2 + 1.5x; the statistical model adds an error term, y = 2 + 1.5x + ε, so the points scatter around the line.]
1.3. Statistical modelling

[Figure: scatterplot of y against x with the fitted line y = 2 + 1.5x + ε.]

What is a statistical model?

• a stochastic mathematical expression
• a low-dimensional summary
• relates one or more dependent random variables to one or more independent variables

1.4. Statistical modelling

A random variable is one whose values depend on a set of random events and are described by a probability distribution.

1.5. Statistical modelling

What is a statistical model?

• embodies a data generation process along with the distributional assumptions underlying this generation
• incorporates error (uncertainty): response = model + error

1.6. Statistical modelling

What is the purpose of statistical modelling?
• describe relationships / effects
• estimate effects
• predict outcomes

1.7. Statistical models

How do we estimate model parameters, e.g. for Y ∼ β₀ + β₁X? What criterion do we use to assess best fit?

• Depends on how we assume Y is distributed

1.8. Statistical models

If we assume Y is drawn from a normal (Gaussian) distribution. . .

• Ordinary Least Squares (OLS)

1.9. Estimation

• parameters
  – location (mean)
  – spread (variance)
• uncertainty
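Before turning to estimation, the "response = model + error" idea from slide 1.5 can be sketched directly. This is a minimal illustration only: the coefficients come from the slides' example line y = 2 + 1.5x, while the Gaussian noise with σ = 1 and the fixed seed are assumptions made here for reproducibility.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Coefficients from the slides' example line; sigma is an assumed value
beta0, beta1, sigma = 2.0, 1.5, 1.0
x = [0, 1, 2, 3, 4, 5, 6]

# Mathematical model: the deterministic part only
y_det = [beta0 + beta1 * xi for xi in x]

# Statistical model: response = model + error
y = [yd + random.gauss(0, sigma) for yd in y_det]

print(y_det)  # exact line: [2.0, 3.5, 5.0, 6.5, 8.0, 9.5, 11.0]
print(y)      # the same line, plus random scatter
```

The two printed lists are the two panels of the slide's figure: the second differs from the first only by the random error term.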
1.10. Estimation

1.10.1. Least squares

[Figure: sum of squares plotted against candidate parameter estimates (6 to 14); the minimum occurs at µ = 10.]

1.11. Estimation

1.11.1. Least squares estimates

• Minimize the sum of the squared residuals
• Solve the simultaneous equations

Data:

Y      X
3.0    0
2.5    1
6.0    2
5.5    3
9.0    4
8.6    5
12.0   6

$3.0 = \beta_0 \times 1 + \beta_1 \times 0 + \varepsilon_1$
$2.5 = \beta_0 \times 1 + \beta_1 \times 1 + \varepsilon_2$
$6.0 = \beta_0 \times 1 + \beta_1 \times 2 + \varepsilon_3$
$5.5 = \beta_0 \times 1 + \beta_1 \times 3 + \varepsilon_4$
$9.0 = \beta_0 \times 1 + \beta_1 \times 4 + \varepsilon_5$
$8.6 = \beta_0 \times 1 + \beta_1 \times 5 + \varepsilon_6$
$12.0 = \beta_0 \times 1 + \beta_1 \times 6 + \varepsilon_7$

1.12. Estimation

1.12.1. Least squares estimates

• Minimize the sum of the squared residuals
• Solve the simultaneous equations

Provided the data (and residuals) are Gaussian.
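A minimal sketch of least squares for the table's data, using the closed-form normal-equation solution rather than literally solving the seven simultaneous equations:

```python
# Data from the slide's table
Y = [3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0]
X = [0, 1, 2, 3, 4, 5, 6]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Minimizing the sum of squared residuals gives
# beta1 = Sxy / Sxx and beta0 = y_bar - beta1 * x_bar
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
Sxx = sum((xi - x_bar) ** 2 for xi in X)

beta1 = Sxy / Sxx
beta0 = y_bar - beta1 * x_bar
print(beta0, beta1)  # close to the slides' example line y = 2 + 1.5x
```

The estimates come out near β₀ ≈ 2.14 and β₁ ≈ 1.51, consistent with the line drawn through the figures.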
1.13. Gaussian distribution

[Figure: Gaussian probability density functions for µ = 25, σ² = 5; µ = 25, σ² = 2; and µ = 10, σ² = 2, over x from 0 to 40.]

$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

1.14. Linear model assumptions

• Normality
• Homogeneity of variance
• Linearity
• Independence
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$

$V = \mathrm{cov} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$

• Linearity: the systematic part, β₀ + β₁xᵢ
• Normality: εᵢ ∼ N(0, σ²)
• Homogeneity of variance: the same σ² on every diagonal element of V
• Zero covariance (= independence): all off-diagonal elements of V are zero

1.15. Linear model assumptions

What do we do if the data do not satisfy the assumptions?
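The diagonal structure of V encodes two of the four assumptions. A minimal sketch (the values of n and σ² here are arbitrary, chosen only for illustration):

```python
# Variance-covariance matrix of iid Gaussian errors: V = sigma^2 * I.
# Constant sigma^2 down the diagonal is homogeneity of variance;
# zeros off the diagonal are zero covariance, i.e. independence.
n, sigma2 = 4, 2.5
V = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]

for row in V:
    print(row)
```

Relaxing either property means leaving this matrix: unequal diagonal entries give heteroscedasticity, non-zero off-diagonal entries give correlated errors.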
1.16. Scale transformations

[Figure: histograms of leaf length (cm) on a linear scale and on a log₁₀ scale; the log scale makes the right-skewed distribution far more symmetric.]

1.17. Linear model

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

• model embodies data generation processes
• pertains to:
  • effects (linear predictor)
  • distribution

1.18. Data types

Type           Example            Distribution        Range
Measurements   length, weight     Gaussian            real, −∞ < x < ∞
                                  logNormal           real, 0 < x < ∞
                                  Gamma               real, 0 < x < ∞
Counts         abundance          Poisson             discrete, 0 ≤ x < ∞
                                  Negative binomial   discrete, 0 ≤ x < ∞
Binary         presence/absence   Binomial            discrete, x = 0, 1
Proportions    ratio              Binomial            discrete, 0 ≤ x ≤ n
Percentages    percent cover      Binomial            real, 0 ≤ x ≤ 1
                                  Beta                real, 0 ≤ x ≤ 1
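Returning to the scale transformation in 1.16: its effect can be sketched numerically. The data here are simulated from a lognormal distribution (an assumption standing in for real leaf-length measurements), and sample skewness is used to quantify the asymmetry before and after the log₁₀ transform.

```python
import math
import random

random.seed(1)
# Simulated zero-bound, right-skewed "leaf lengths" (lognormal; parameters arbitrary)
leaf = [math.exp(random.gauss(0.5, 0.6)) for _ in range(500)]

def skewness(xs):
    # Standardized third moment: positive for a long right tail
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / s) ** 3 for x in xs) / n

skew_linear = skewness(leaf)
skew_log = skewness([math.log10(x) for x in leaf])
print(skew_linear)  # strongly positive on the linear scale
print(skew_log)     # near zero on the log10 scale
```

As in the slide's histograms, the transformed values are much closer to symmetric, which is why a log scale can rescue Gaussian-based models for this kind of data.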
What about density?

1.19. Gamma

Zero-bound variables with large variance.

[Figure: Gamma probability density functions for µ = 15 with σ² = 15 (a = 15, s = 1), σ² = 30 (a = 7.5, s = 2), and σ² = 60 (a = 3.75, s = 4), over x from 0 to 40.]

$f(x \mid a, s) = \frac{1}{s^a \Gamma(a)}\, x^{a-1} e^{-x/s}$

a = shape, s = scale; µ = as, σ² = as²

1.20. Poisson distribution

Count data
[Figure: Poisson probability mass functions for λ = 25, λ = 15 and λ = 3, over x from 0 to 40.]

$f(x \mid \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}$

$\mu = \sigma^2 = \lambda$, so the dispersion $\theta = \sigma^2/\mu = 1$

1.21. Negative Binomial

Count data
[Figure: negative binomial probability mass functions, two panels over x from 0 to 40. Left: µ = 15 with ω = 7.5 (θ = 0.133; σ² = 3µ), ω = 3 (θ = 0.333; σ² = 6µ) and ω = 1.667 (θ = 0.6; σ² = 10µ). Right: ω = ∞ (θ = 0, the Poisson limit) for µ = 25, µ = 15 and µ = 3.]

$f(x \mid \mu, \omega) = \frac{\Gamma(x + \omega)}{\Gamma(\omega)\, x!} \times \left(\frac{\mu}{\mu + \omega}\right)^{x} \left(\frac{\omega}{\mu + \omega}\right)^{\omega}$

$\theta\,(\text{dispersion}) = 1/\omega, \qquad \omega = \frac{\mu^2}{\sigma^2 - \mu}, \qquad \theta = 0 \text{ when } \omega = \infty$

1.22. Binomial distribution

Proportions or presence/absence

$f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}$

$\mu = np, \qquad \sigma^2 = np(1-p)$

For presence/absence, n = 1.

1.23. Beta

Continuous between 0 and 1
[Figure: Beta probability density functions for µ = 0.5, σ² = 0.023 (a = 5, b = 5); µ = 0.167, σ² = 0.019 (a = 1, b = 5); µ = 0.833, σ² = 0.019 (a = 5, b = 1); and µ = 0.5, σ² = 0.125 (a = 0.5, b = 0.5), over x from 0 to 1.]

$f(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1} (1-x)^{b-1}$

$\mu = \frac{a}{a+b}, \qquad \sigma^2 = \frac{ab}{(a+b)^2 (a+b+1)}$

• must consider zero-one inflation

1.24. Generalized linear models

$Y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon$

$\underbrace{g(\mu)}_{\text{link function}} = \underbrace{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}_{\text{systematic}}$

• Random component: Y ∼ Dist(µ, ...)
• Systematic component: the linear predictor, on the scale [−∞, ∞]
• Link function g(): g(µ) = β₀ + β₁x₁ + ... + βₚxₚ

1.25. Generalized linear models

The linear model is just a special case:

• Random component: Y ∼ N(µ, σ²)
• Systematic component: the linear predictor, on the scale [−∞, ∞]
• Link function: the identity, I(µ) = β₀ + β₁x₁ + ... + βₚxₚ
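Before the table of GLM families, the moment relations quoted on slides 1.19–1.23 can be checked numerically. This sketch implements the Poisson and negative binomial mass functions exactly as written above (the negative binomial in log space, since Γ(x + ω) overflows for large x) and recovers their means and variances by summation; the Gamma and Beta moments follow directly from their closed-form parameter relations.

```python
import math

def dpois(x, lam):
    # Poisson: f(x | lambda) = e^-lambda * lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

def dnbinom(x, mu, omega):
    # Negative binomial, evaluated via lgamma to avoid overflow
    lp = (math.lgamma(x + omega) - math.lgamma(omega) - math.lgamma(x + 1)
          + x * math.log(mu / (mu + omega))
          + omega * math.log(omega / (mu + omega)))
    return math.exp(lp)

def moments(pmf, upper):
    # Mean and variance of a discrete distribution by direct summation
    probs = [pmf(x) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

# Poisson: mu = sigma^2 = lambda
print(moments(lambda x: dpois(x, 3), 60))           # approximately (3, 3)

# Negative binomial: sigma^2 = mu + mu^2/omega;
# mu = 15, omega = 7.5 gives sigma^2 = 45 = 3*mu, matching the figure legend
print(moments(lambda x: dnbinom(x, 15, 7.5), 400))  # approximately (15, 45)

# Gamma: mu = a*s, sigma^2 = a*s^2 (the a = 7.5, s = 2 case from the figure)
a, s = 7.5, 2
print(a * s, a * s ** 2)  # 15.0 30.0

# Beta: mu = a/(a+b), sigma^2 = ab/((a+b)^2 (a+b+1)) (the a = b = 5 case)
a, b = 5, 5
print(a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1)))
```

Each result reproduces a (µ, σ²) pair quoted in the corresponding figure legend, confirming the parameterizations are consistent.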
1.26. Generalized linear models

Response variable         Probability distribution   Canonical link function          Model name
Continuous measurements   Gaussian                   identity: µ                      Linear regression
                          Gamma                      inverse: 1/µ                     Gamma regression
Counts                    Poisson                    log: log(µ)                      Poisson regression / log-linear model
                          Negative binomial          log: log(µ)                      Negative binomial regression
                          Quasi-Poisson              log: log(µ)                      Poisson regression
Binary, proportions       Binomial                   logit: log(π/(1−π))              Logistic regression
                                                     probit: Φ⁻¹(π)                   Probit regression
                                                     complementary log-log:           Complementary log-log regression
                                                       log(−log(1−π))
                          Quasi-binomial             logit: log(π/(1−π))              Logistic regression
Percentages               Beta                       logit: log(π/(1−π))              Beta regression

(Φ is the standard normal cumulative distribution function.)

1.27. OLS

[Figure: sum of squares plotted against candidate parameter estimates (6 to 14); the minimum occurs at µ = 10.]
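The logit link from the table above maps a probability in (0, 1) onto the whole real line, where the linear predictor lives. A minimal sketch of the link and its inverse:

```python
import math

def logit(p):
    # Link: probability -> linear predictor scale (-inf, inf)
    return math.log(p / (1 - p))

def inv_logit(eta):
    # Inverse link: linear predictor -> probability in (0, 1)
    return 1 / (1 + math.exp(-eta))

for p in [0.1, 0.5, 0.9]:
    print(p, logit(p), inv_logit(logit(p)))  # round-trips back to p
```

This is why a logistic regression can use an unbounded linear predictor yet always return fitted probabilities between 0 and 1.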
1.28. Maximum Likelihood

$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$

Maximum likelihood estimates:

$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

1.29. Maximum Likelihood

[Figure: log-likelihood plotted against candidate parameter estimates (6 to 14); the maximum occurs at µ = 10.]
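The closed-form estimates above can be verified numerically: a sketch that computes the Gaussian log-likelihood from the formula on this slide and checks that the closed-form µ̂ and σ̂² do maximize it. The sample reuses the Y values from the least-squares table earlier, purely as illustrative data.

```python
import math

def loglik(xs, mu, sigma2):
    # Gaussian log-likelihood, exactly as on the slide
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

xs = [3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0]

# Closed-form maximum likelihood estimates
mu_hat = sum(xs) / len(xs)
s2_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

best = loglik(xs, mu_hat, s2_hat)
print(mu_hat, s2_hat, best)

# Nudging either estimate away from its MLE lowers the log-likelihood
print(best > loglik(xs, mu_hat + 0.1, s2_hat))
print(best > loglik(xs, mu_hat, s2_hat + 0.1))
```

This is the numeric analogue of the 1.29 figure: the log-likelihood profile peaks exactly at the closed-form estimates.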