Advanced Mathematical Methods
Part II – Statistics
GLM – Analysis of Variance Designs
Mel Slater
http://www.cs.ucl.ac.uk/staff/m.slater/Teaching/Statistics/

Outline
■ Factorial Design
■ One Way Analysis of Variance
■ Two Way Analysis of Variance
■ Two Way Analysis of Variance with replications
■ Mixed Models – Analysis of Covariance

Factorial Design
■ In regression analysis we assume that the ‘independent variables’ are numerical variables.
■ Often this is not the case
  • for example, in a virtual reality experiment the response may be affected by the ‘status’ of the person (undergraduate student, PhD student, staff, administrator, etc)
■ Such ‘qualitative’ variables are called ‘factors’.
■ An experiment designed so that the independent variables are ‘factors’ is called a Factorial Design.
■ The corresponding models are often called Analysis of Variance (ANOVA) models.

One Way ANOVA
■ This is the simplest design
■ There is one factor, and the factor has k levels.
  • Gender, k = 2 (M, F)
  • Education, k = 5 (None, ‘A level’, BSc/BA, Masters, PhD)
  • Anxiety level, k = 3 (low, moderate, high)
  • etc

One Way ANOVA Model
■ The way to express this situation is
  • \(E(y_{ij}) = \mu + \alpha_i\)
    – i = 1,2,…,k (number of levels of the factor)
    – j = 1,2,…,n (number of observations at each level)
  • \(\mu\) is the ‘grand mean’
  • \(\alpha_i\) is the effect of being at the ith level
  • \(y_{ij}\) ~ independent normally distributed r.v.s with constant variance \(\sigma^2\).

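1 Way ANOVA Analysis
■ We can therefore use the GLM to find the LS estimates
  \[ \hat{\mu} = \bar{y}_{..} \]
  \[ \hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..} \quad \text{(mean of ith level – grand mean)} \]
■ We can also construct an analysis of variance table
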
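A minimal sketch of these estimates and of the quantities that go into the analysis of variance table, in plain Python. It assumes a balanced design and the sum-to-zero constraint on the effects. The data values and the function name `one_way_anova` are hypothetical, chosen only for illustration; they are not the paranoia data used later.

```python
def one_way_anova(groups):
    """groups: list of k lists, each holding the observations at one level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total       # estimate of mu
    level_means = [sum(g) / len(g) for g in groups]
    effects = [m - grand_mean for m in level_means]          # estimates of alpha_i

    # Between-groups and within-groups (error) sums of squares
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, level_means))
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, level_means) for y in g)

    df_groups, df_error = k - 1, n_total - k
    f_stat = (ss_groups / df_groups) / (ss_error / df_error)
    return {"mu": grand_mean, "alpha": effects,
            "SS_groups": ss_groups, "SS_error": ss_error,
            "df": (df_groups, df_error), "F": f_stat}


# Hypothetical data: one factor with k = 3 levels, n = 4 observations per level
data = [[4.0, 5.0, 6.0, 5.0],
        [7.0, 8.0, 6.0, 7.0],
        [5.0, 4.0, 6.0, 5.0]]
result = one_way_anova(data)
print(result)
```

The p-value is left out because it needs the F distribution, which the Python standard library does not provide.
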
1-Way ANOVA and GLM
■ This can easily be reformulated as a special case of the general linear model, with:
  \[ \beta = (\mu,\ \alpha_1, \ldots, \alpha_1,\ \alpha_2, \ldots, \alpha_2,\ \ldots,\ \alpha_k, \ldots, \alpha_k)^T \]
■ X is a matrix that consists entirely of 0s and 1s
■ (What is it?)
■ Note that the X matrix is not of full rank and therefore \(X^T X\) is singular.
■ An additional constraint must be put on the \(\alpha_i\)
  • Their sum = 0, OR
  • \(\alpha_1 = 0\) (GLIM convention)

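One possible answer to the question about X can be sketched in plain Python. This assumes the common parameterisation with one entry per level, \(\beta = (\mu, \alpha_1, \ldots, \alpha_k)^T\), which is an assumption made here for illustration; the function name `design_matrix` and the sizes k = 3, n = 2 are hypothetical.

```python
def design_matrix(k, n):
    """0/1 design matrix for one factor with k levels and n observations per level.

    Column 0 is the intercept (mu); columns 1..k are level indicators (alpha_i).
    Rows are ordered level by level.
    """
    rows = []
    for i in range(k):
        indicator = [1 if j == i else 0 for j in range(k)]
        rows.extend([[1] + indicator for _ in range(n)])
    return rows


k, n = 3, 2
X = design_matrix(k, n)
for row in X:
    print(row)

# The intercept column equals the sum of the k indicator columns,
# so X v = 0 for the non-zero vector v = (1, -1, ..., -1).
v = [1] + [-1] * k
Xv = [sum(x * c for x, c in zip(row, v)) for row in X]
print(Xv)
```

Because X v = 0 for a non-zero v, it follows that \(X^T X v = 0\) as well, which is why \(X^T X\) cannot be inverted until a constraint on the \(\alpha_i\) removes the redundancy.
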
Example
■ This example refers to the ‘paranoia’ data.
■ We will use 1-Way ANOVA to look at the influence of a factor on the response variable ‘vrtotal’
  • Gender
■ vrtotal is the ‘total’ paranoia experienced by subjects in the VR.

MATLAB
■ Make the variables by extracting the relevant columns of the spreadsheet (see answers to Exercises 3)
■ vrtotal = s.data(:,14);
■ sex = s.data(:,3);

Influence of Gender
■ H0: no difference between mean paranoia of males and females
■ [P,anovatab] = anova1(vrtotal,sex,'on')
  • vrtotal is the response
  • ‘sex’ is the gender factor
  • ‘on’ means that we want a graphical display
  • P is the p-value of the F test of H0
  • anovatab is the corresponding Analysis of Variance table that is output

Influence of Gender
Source    SS       Df   MS       F        Prob>F
Groups    0.0417    1   0.0417   0.0024   0.9615
Error     384.9    22   17.50
Total     384.96   23
In this case we clearly do not reject the null hypothesis. The data give no evidence that gender influences the response.

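The columns of the table are linked by MS = SS/Df and F = MS(Groups)/MS(Error). As a check on the tabulated values (rounded as printed):

```latex
\[
MS_{\text{Groups}} = \frac{0.0417}{1} = 0.0417, \qquad
MS_{\text{Error}} = \frac{384.9}{22} \approx 17.50,
\]
\[
F = \frac{MS_{\text{Groups}}}{MS_{\text{Error}}} = \frac{0.0417}{17.50} \approx 0.0024 .
\]
```

An F ratio this far below 1 means the variation between the two gender means is much smaller than the variation within each gender, which is consistent with the large Prob>F value.
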
Using GLIM
■ $factor sex 2 !a factor with 2 levels
■ $fit $ !the deviance gives the total SS and d.f.
■ $fit sex !the deviance gives the residual SS and d.f.
■ The rest can be computed from these two
  • Fitted SS = total SS – residual SS
  • etc

GLIM Note on Factor Levels
■ Factor levels must start from 1, not 0
■ Hence in GLIM for this data we would have to do:
■ $cal sex = sex+1 (since sex is coded as 0,1)

Two Way ANOVA
■ Of course just examining one factor is restrictive
■ There may be several factors
■ We will consider 2 factors influencing the response

2 Way ANOVA Model
■ The way to express this situation is
  • \(E(y_{ij}) = \mu + \alpha_i + \beta_j\)
    – i = 1,2,…,m (number of rows – levels of factor 1)
    – j = 1,2,…,n (number of columns – levels of factor 2)
  • \(\mu\) is the ‘grand mean’
  • \(\alpha_i\) is the effect of being at the ith level of factor 1
  • \(\beta_j\) is the effect of being at the jth level of factor 2
  • \(y_{ij}\) ~ independent normally distributed r.v.s with constant variance \(\sigma^2\).

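2-Way ANOVA and GLM
■ Again, this is a special case of GLM
■ LS estimators are:
  \[ \hat{\mu} = \bar{y}_{..} \]
  \[ \hat{\alpha}_i = \bar{y}_{i.} - \bar{y}_{..} \quad \text{(mean of ith row – grand mean)} \]
  \[ \hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..} \quad \text{(mean of jth column – grand mean)} \]
■ A similar ANOVA table can be constructed
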
Example – Fear of Public Speaking in Virtual Reality
■ Two Factors
  • Immersion – 2 levels: Desktop or Head Mounted Display (HMD)
  • Group – 3 levels: had a neutral, positive or negative virtual audience
■ Objective – to see how anxiety varies with Group, Immersion, and prior tendency to fear of public speaking (FOPS)
■ Response variables – various measures of anxiety and comfort.

2-Way ANOVA Example
■ Response variable ‘interested’
  • This is the participant’s own assessment of how interested the virtual audience was: “How interested was the audience in what you had to say?”, scored on a 1-7 scale with 1 = not at all, 7 = very much
  • We take the average of the interest scores over the people in each cell of the factorial table…

Factorial Table for ‘Interested’
Average of interested
                   Group
Immersion      Neutral (1)   Positive (2)   Negative (3)   Grand Total
1 Desktop          2.7           1.0            6.0            3.2
2 HMD              4.2           1.2            6.7            4.0
Grand Total        3.4           1.1            6.3            3.6
Each entry is the average ‘interested’ score for the 6 people in that cell

ANOVA Using MATLAB
■ y = [2.7 1.0 6.0; 4.2 1.2 6.7];
■ anova2(y,1,'on')
  • y is the response
  • 1 is the number of observations (replications) in each cell
  • ‘on’ means a graphical display is output

2-Way ANOVA Table
Source    SS        Df   MS        F         Prob>F
Columns   27.6633    2   13.8317   64.3333   0.0153
Rows       0.9600    1    0.9600    4.4651   0.1689
Error      0.4300    2    0.2150
Total     29.0533    5
H0a: All row means equal (not rejected at 5% level)
H0b: All column means equal (rejected at 5% level)

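The sums of squares and F ratios in this table can be reproduced by hand from the 2 x 3 table of cell averages. The following is a sketch in plain Python, not a description of what MATLAB does internally; the function name `two_way_anova` is hypothetical. The Prob>F column is omitted because the Python standard library has no F distribution.

```python
def two_way_anova(y):
    """Two-way ANOVA without replications; y is an m x n table (list of rows)."""
    m, n = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (m * n)
    row_means = [sum(row) / n for row in y]
    col_means = [sum(y[i][j] for i in range(m)) / m for j in range(n)]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_rows = n * sum((r - grand) ** 2 for r in row_means)
    ss_cols = m * sum((c - grand) ** 2 for c in col_means)
    ss_error = ss_total - ss_rows - ss_cols   # what the additive model leaves over

    df_rows, df_cols = m - 1, n - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "SS": {"Columns": ss_cols, "Rows": ss_rows,
               "Error": ss_error, "Total": ss_total},
        "Df": {"Columns": df_cols, "Rows": df_rows,
               "Error": df_error, "Total": m * n - 1},
        "F": {"Columns": (ss_cols / df_cols) / ms_error,
              "Rows": (ss_rows / df_rows) / ms_error},
    }


# Cell averages of 'interested': rows = Immersion, columns = Group
y = [[2.7, 1.0, 6.0],
     [4.2, 1.2, 6.7]]
table = two_way_anova(y)
for source in ("Columns", "Rows", "Error", "Total"):
    print(source, round(table["SS"][source], 4), table["Df"][source])
print({name: round(f, 4) for name, f in table["F"].items()})
```
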
Using GLIM
$units 6
$data x
$read
2.7 1.0 6.0 4.2 1.2 6.7
$cal row = %gl(2,3) $
$c this generates 2 levels with 3 copies at each level
$cal col = %gl(3,1)
$c this calculates the column factor levels
$factor row 2 col 3
$fit
$c the deviance and df are for the total SS
$fit row+col
$c the deviance and df are for the residual SS
$fit -row
$c the change in deviance and df are the row SS
$fit row + col
$c the full model again
$fit -col
$c the change in deviance and df are the col SS
$c from these the complete table can be constructed.

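To see which observation ends up in which cell, the level pattern generated by %gl can be mimicked in plain Python. This is a sketch based only on the behaviour described in the comments above (levels 1 to k in blocks of n, cycling until every unit has a level); the function `gl` is a hypothetical stand-in, not a GLIM interface.

```python
def gl(k, n, units):
    """Levels 1..k in blocks of length n, repeated until `units` values exist."""
    return [(u // n) % k + 1 for u in range(units)]


x = [2.7, 1.0, 6.0, 4.2, 1.2, 6.7]   # the six values read in above
row = gl(2, 3, len(x))               # 3 copies of level 1, then 3 of level 2
col = gl(3, 1, len(x))               # levels 1, 2, 3 repeated
for value, r, c in zip(x, row, col):
    print(f"row {r}, col {c}: {value}")
```

Under this reading the data are taken row by row: the first three values belong to row 1 (Desktop) and the last three to row 2 (HMD).
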
Example FOPS Continued
■ Averaging the response variable within each cell throws data away.
■ Instead we can deal with the actual replications
■ This enhances the theoretical model, since we can then include an ‘interaction’ term between row and column effects.
■ This interaction term models the non-additivity, i.e., it allows for the possibility that the combined effect of the row and column factors differs from the sum of their separate effects.

2-Way ANOVA with p replications per cell
■ The way to express this situation is
  • \(E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}\)
    – i = 1,2,…,m (number of rows – levels of factor 1)
    – j = 1,2,…,n (number of columns – levels of factor 2)
    – k = 1,2,…,p (number of replications in each cell)
  • \(\mu\) is the ‘grand mean’
  • \(\alpha_i\) is the effect of being at the ith level of factor 1
  • \(\beta_j\) is the effect of being at the jth level of factor 2
  • \(\gamma_{ij}\) is the interaction effect in the (i,j)th cell
  • \(y_{ijk}\) ~ independent normally distributed r.v.s with constant variance \(\sigma^2\).

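By analogy with the model without replications, the LS estimators can be written in terms of cell, row, column and grand means. The following assumes the usual sum-to-zero constraints on the \(\alpha_i\), \(\beta_j\) and \(\gamma_{ij}\) and a balanced design with p observations in every cell; these constraints are an assumption made here, not something stated on the slide.

```latex
\[
\hat{\mu} = \bar{y}_{...}, \qquad
\hat{\alpha}_i = \bar{y}_{i..} - \bar{y}_{...}, \qquad
\hat{\beta}_j = \bar{y}_{.j.} - \bar{y}_{...},
\]
\[
\hat{\gamma}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...} .
\]
% Degrees of freedom: rows + columns + interaction + error = total
\[
(m-1) + (n-1) + (m-1)(n-1) + mn(p-1) = mnp - 1 .
\]
```

With m = 2, n = 3 and p = 6 the degrees of freedom are 1 + 2 + 2 + 30 = 35.
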
Using MATLAB
■ The data must be put in the form of the m*n table, with each cell occupying p consecutive rows, one per replication.
■ There will therefore be (mp) rows and n columns.
■ [p,table] = anova2(y,6,'on');
■ will produce the table …

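The stacked layout and the resulting decomposition can be sketched in plain Python. This assumes the layout described above (rows i*p to i*p+p-1 hold the p replications of row level i, so the FOPS data would form 12 rows and 3 columns). The data below are hypothetical, not the FOPS scores, and the function name `two_way_anova_reps` is made up for the example.

```python
def two_way_anova_reps(y, p):
    """Two-way ANOVA with p replications per cell; y is an (m*p) x n table."""
    n = len(y[0])
    m = len(y) // p
    total_n = m * n * p
    grand = sum(sum(r) for r in y) / total_n

    # cell[i][j] is the mean of the p replications in cell (i, j)
    cell = [[sum(y[i * p + k][j] for k in range(p)) / p for j in range(n)]
            for i in range(m)]
    row_means = [sum(cell[i]) / n for i in range(m)]
    col_means = [sum(cell[i][j] for i in range(m)) / m for j in range(n)]

    ss = {
        "Columns": m * p * sum((c - grand) ** 2 for c in col_means),
        "Rows": n * p * sum((r - grand) ** 2 for r in row_means),
        "Interaction": p * sum(
            (cell[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i in range(m) for j in range(n)),
        "Error": sum((y[i * p + k][j] - cell[i][j]) ** 2
                     for i in range(m) for j in range(n) for k in range(p)),
        "Total": sum((v - grand) ** 2 for r in y for v in r),
    }
    df = {"Columns": n - 1, "Rows": m - 1, "Interaction": (m - 1) * (n - 1),
          "Error": m * n * (p - 1), "Total": total_n - 1}
    ms_error = ss["Error"] / df["Error"]
    f = {name: (ss[name] / df[name]) / ms_error
         for name in ("Columns", "Rows", "Interaction")}
    return ss, df, f


# Hypothetical data: m = 2 row levels, n = 2 column levels, p = 2 replications
y = [[1.0, 2.0],   # row level 1, replication 1
     [3.0, 2.0],   # row level 1, replication 2
     [5.0, 9.0],   # row level 2, replication 1
     [7.0, 7.0]]   # row level 2, replication 2
ss, df, f = two_way_anova_reps(y, p=2)
print(ss)
print(df)
print(f)
```

The four component sums of squares add up to the total, which is a convenient check on any table of this kind.
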
ANOVA Table
Source        SS         Df   MS        F         Prob>F
Columns       166.0556    2   83.0278   72.5485   0.0000
Rows            5.4444    1    5.4444    4.7573   0.0371
Interaction     2.7222    2    1.3611    1.1893   0.3184
Error          34.3333   30    1.1444
Total         208.5556   35
The hypothesis that all column means (groups) are equal would be rejected.
The hypothesis that all row means (immersion) are equal would be rejected at the 5% level.
There is, however, no significant interaction effect.

Using GLIM
■ We can read the data file for GLIM directly without having to organise the data into rows and columns
■ We read in the variables immersion, group and interested
■ $factor immersion 2 group 3$
■ $c declares the factors

GLIM
$echo
$input 10 132
$echo
File name? fops3006.txt
$units 36
$data ID Immersion Group w Age sex Ethnic Language Occupation Games PRCS FNE SAD Comfortable pleased Audience People Computer aware impression friendly interested again selfrating somatic MPRCS
$read
Data goes here