
R Regression Methods: Interrogate R Output Objects

Paul E. Johnson, Center for Research Methods and Data Analysis, University of Kansas, 2012


  1. Regression Methods 1 / 72: R Regression Methods, Interrogate R Output Objects. Paul E. Johnson, Center for Research Methods and Data Analysis, University of Kansas, 2012

  2. Outline: 1. Methods; 2. Interrogate Models

  3. Methods: Things To Do "To" a Regression Object

         bush1 <- glm(pres04 ~ partyid + sex + owngun, data = dat,
                      family = binomial(link = logit))

     Variables in the model:

         pres04    Kerry, Bush
         partyid   Factor with 7 levels, SD → SR
         sex       Male, Female
         owngun    Yes, No
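
The survey data frame `dat` is not included with the slides, so the call above cannot be run as shown. The following is a minimal sketch that fits a model of the same form to simulated data. The data frame `toy`, its values, and the object name `toy_fit` are assumptions made purely for illustration, and the fitted coefficients carry no meaning.

```r
# Simulated stand-in for the survey data; every value below is made up.
set.seed(1)
n <- 500
toy <- data.frame(
  partyid = factor(sample(c("Dem.", "Independent", "Repub."), n, replace = TRUE)),
  sex     = factor(sample(c("Male", "Female"), n, replace = TRUE),
                   levels = c("Male", "Female")),
  owngun  = factor(sample(c("NO", "YES"), n, replace = TRUE))
)
# Two-level factor response: glm() treats the first level ("Kerry") as failure.
toy$pres04 <- factor(sample(c("Kerry", "Bush"), n, replace = TRUE),
                     levels = c("Kerry", "Bush"))

# Same formula and family as the slide, applied to the toy data.
toy_fit <- glm(pres04 ~ partyid + sex + owngun, data = toy,
               family = binomial(link = logit))
class(toy_fit)
names(coef(toy_fit))
```

Each factor contributes one coefficient per non-reference level, which is why the slide's seven-level `partyid` produces six `partyid...` coefficients.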

  4. Methods: Just for the Record, The Data Preparation Steps Were . . .

         # Keep only the first two vote choices; all other responses become NA
         preslev <- levels(dat$pres04)
         dat$pres04[dat$pres04 %in% preslev[3:10]] <- NA
         dat$pres04 <- factor(dat$pres04)
         levels(dat$pres04) <- c("Kerry", "Bush")
         # Drop the 8th party ID level, then relabel the remaining seven
         plev <- levels(dat$partyid)
         dat$partyid[dat$partyid %in% plev[8]] <- NA
         dat$partyid <- factor(dat$partyid)
         levels(dat$partyid) <- c("Strong Dem.", "Dem.", "Ind. Near Dem.",
                                  "Independent", "Ind. Near Repub.", "Repub.",
                                  "Strong Repub.")
         dat$owngun[dat$owngun == "REFUSED"] <- NA
         levels(dat$sex) <- c("Male", "Female")
         # Make "NO" the reference category for gun ownership
         dat$owngun <- relevel(dat$owngun, ref = "NO")
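
The recoding pattern above (set unwanted levels to NA, rebuild the factor, relabel, choose a reference level) can be tried on a small example. This is a sketch on made-up vectors; `vote` and `gun` are hypothetical names, and unlike the slide it calls `factor()` on the gun variable before `relevel()` so that the empty "REFUSED" level is dropped.

```r
# Toy vote factor; the extra levels stand in for the responses the slide discards.
vote <- factor(c("KERRY", "BUSH", "NADER", "BUSH", "REFUSED"),
               levels = c("KERRY", "BUSH", "NADER", "REFUSED"))
lev <- levels(vote)
vote[vote %in% lev[3:4]] <- NA       # unwanted responses become missing
vote <- factor(vote)                 # rebuild the factor, dropping unused levels
levels(vote) <- c("Kerry", "Bush")   # relabel the remaining levels, in order
table(vote, useNA = "ifany")

# Toy gun-ownership factor; default level order is alphabetical.
gun <- factor(c("YES", "NO", "REFUSED", "YES"))
gun[gun == "REFUSED"] <- NA
gun <- relevel(factor(gun), ref = "NO")  # "NO" becomes the reference category
levels(gun)
```

Assigning to `levels()` renames levels by position, so the new labels must be listed in the same order as the existing levels.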

  5. Methods: First, Find Out What You Got

         attributes(bush1)

         $names
          [1] "coefficients"      "residuals"
          [3] "fitted.values"     "effects"
          [5] "R"                 "rank"
          [7] "qr"                "family"
          [9] "linear.predictors" "deviance"
         [11] "aic"               "null.deviance"
         [13] "iter"              "weights"
         [15] "prior.weights"     "df.residual"
         [17] "df.null"           "y"
         [19] "converged"         "boundary"
         [21] "model"             "na.action"
         [23] "call"              "formula"
         [25] "terms"             "data"
         [27] "offset"            "control"
         [29] "method"            "contrasts"
         [31] "xlevels"

         $class
         [1] "glm" "lm"

  6. Methods: Understanding attributes

     If you see $, it means you have an S3 object. That means you can just "take" values out of the object with the dollar sign operator, using commands like

         bush1$coefficients

                     (Intercept)             partyidDem.
                          -3.571                   1.910
           partyidInd. Near Dem.      partyidIndependent
                           1.456                   3.464
         partyidInd. Near Repub.           partyidRepub.
                           5.468                   6.031
            partyidStrong Repub.               sexFemale
                           7.191                   0.049
                       owngunYES
                           0.642

  7. Methods: R Core Team Warns against $ Access

     A usage like this works:

         bush1$coefficients

     But it might not work in the future, if the internal contents of the glm object were to change. We should instead use the "extractor method":

         coefficients(bush1)

     Challenge: finding and remembering the extractor functions. This is especially difficult because some VERY important extractor functions don't show up using the usual methods of searching for them (AIC, coefficients).
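
To see that the two access styles currently return the same thing, here is a small sketch. It assumes the built-in `mtcars` data as a stand-in for `dat` (which is not available here); `fit` is a hypothetical object name.

```r
# Logistic regression on built-in data, standing in for the slide's bush1 model.
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = logit))

fit$coefficients                        # direct access to the list component
coef(fit)                               # extractor; coefficients(fit) is an alias
identical(coef(fit), fit$coefficients)  # same values today

# AIC() is another extractor; for a glm it agrees with the stored aic component.
isTRUE(all.equal(AIC(fit), fit$aic))
```

The extractor calls keep working even if the internal layout of the fitted object changes, which is the point of the warning above.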

  8. Methods: Double-Check the glm Object's Class

     Ask the object what class it is from:

         class(bush1)

         [1] "glm" "lm"

  9. Methods: Ask R What Methods are declared to apply to a "glm" Object

         methods(class = "glm")

          [1] add1.glm*           anova.glm
          [3] confint.glm*        cooks.distance.glm*
          [5] deviance.glm*       drop1.glm*
          [7] effects.glm*        extractAIC.glm*
          [9] family.glm*         formula.glm*
         [11] influence.glm*      logLik.glm*
         [13] model.frame.glm     nobs.glm*
         [15] predict.glm         print.glm
         [17] residuals.glm       rstandard.glm
         [19] rstudent.glm        summary.glm
         [21] vcov.glm*           weights.glm*

            Non-visible functions are asterisked
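
The listing above is from 2012; current versions of R print the result of `methods(class = ...)` differently and may list a different set of methods. The sketch below shows one way to recover the visible/non-visible distinction programmatically. It relies on the `info` attribute that recent R versions attach to the result of `methods()`, which is an assumption about the installed R version; `m` and `info` are hypothetical names.

```r
m <- methods(class = "glm")

# Recent R versions attach a data frame describing each method:
# row names are the method names, with columns such as visible and generic.
info <- attr(m, "info")
head(info)

# Methods that are not visible (the entries the older printout asterisked).
rownames(info)[!info$visible]
```

The exact rows depend on the R version and on which packages are loaded, so no expected output is shown.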

  10. Methods: Check methods for "lm" class

          methods(class = "lm")

           [1] add1.lm*           alias.lm*
           [3] anova.lm           case.names.lm*
           [5] confint.lm*        cooks.distance.lm*
           [7] deviance.lm*       dfbeta.lm*
           [9] dfbetas.lm*        drop1.lm*
          [11] dummy.coef.lm*     effects.lm*
          [13] extractAIC.lm*     family.lm*
          [15] formula.lm*        hatvalues.lm
          [17] influence.lm*      kappa.lm
          [19] labels.lm*         logLik.lm*
          [21] model.frame.lm     model.matrix.lm
          [23] nobs.lm*           plot.lm
          [25] predict.lm         print.lm
          [27] proj.lm*           qr.lm*
          [29] residuals.lm       rstandard.lm
          [31] rstudent.lm        simulate.lm*
          [33] summary.lm         variable.names.lm*
          [35] vcov.lm*

             Non-visible functions are asterisked
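
The "lm" methods matter for a glm object because its class vector is `c("glm", "lm")`: when a generic has no "glm" method, S3 dispatch moves on to the "lm" method. A minimal sketch of how to list those fall-through generics, assuming the built-in `mtcars` data and the `info` attribute returned by `methods()` in recent R versions (the names `fit`, `glm_generics`, and `lm_generics` are hypothetical):

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = logit))

class(fit)            # "glm" first, then "lm"
inherits(fit, "lm")   # TRUE: lm methods are eligible for this object

# Generic names that have a method registered for each class.
glm_generics <- attr(methods(class = "glm"), "info")$generic
lm_generics  <- attr(methods(class = "lm"),  "info")$generic

# Generics with an lm method but no glm method: for a glm object,
# calls to these dispatch to the lm version.
setdiff(lm_generics, glm_generics)
```

The resulting list varies by R version, so it is best treated as a way to look, not as a fixed inventory.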

  11. Methods: Looking Into the Class Hierarchy

      Functions are always located inside packages. R ships with several packages that are automatically searched for methods. Read the source code for some of your favorite functions:

          lm
          predict.lm
          glm
          predict.glm

      For a function in a loaded package, typing the function's name (without telling R what package it lives in) will show its contents.

  12. Methods: Functions, Methods and Hidden Methods

      Methods are ALSO FOUND if we ask for them explicitly with their namespace (and two colons):

          stats::lm
          stats::predict.lm
          stats::glm
          stats::predict.glm

      The result should be identical to the previous code.

      Hidden methods: functions that are not "exported" by the package writer remain hidden. They are used by the package author, who does not want to create confusion by having users access them directly. You can see the code for hidden methods if you use three colons:

          stats:::confint.lm
          stats:::weights.glm
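
The difference between two and three colons can be checked directly. The following sketch uses only base R and the functions named above; the variable names `same_fun`, `home`, and `hidden` are hypothetical.

```r
# With stats attached and nothing masking lm, both names reach the same function.
same_fun <- identical(lm, stats::lm)
same_fun

# The environment of an exported function names the package it lives in.
home <- environmentName(environment(stats::predict.glm))
home

# Three colons reach a function that the package does not export.
hidden <- stats:::confint.lm
is.function(hidden)

# getS3method() looks up the registered method for a generic and class
# without requiring the ::: operator.
identical(hidden, utils::getS3method("confint", "lm"))
```

In scripts meant to last, `getS3method()` or simply calling the generic (`confint(fit)`) is usually preferable to `:::`, since unexported functions can change without notice.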

  13. Interrogate Models: The First Method Used is usually summary() I

          summary(bush1)

          Call:
          glm(formula = pres04 ~ partyid + sex + owngun,
              family = binomial(link = logit), data = dat)

          Deviance Residuals:
             Min      1Q  Median      3Q     Max
          -2.941  -0.488   0.163   0.390   2.683

          Coefficients:
                                  Estimate Std. Error z value
          (Intercept)              -3.5712     0.3934   -9.08
          partyidDem.               1.9103     0.3972    4.81
          partyidInd. Near Dem.     1.4559     0.4348    3.35
          partyidIndependent        3.4642     0.4105    8.44
          partyidInd. Near Repub.   5.4677     0.5073   10.78
          partyidRepub.             6.0307     0.4502   13.39
          partyidStrong Repub.      7.1908     0.6213   11.57
          sexFemale                 0.0488     0.1928    0.25
          owngunYES                 0.6424     0.1937    3.32
                                  Pr(>|z|)
          (Intercept)              < 2e-16 ***

  14. Interrogate Models: The First Method Used is usually summary() II

          partyidDem.              1.5e-06 ***
          partyidInd. Near Dem.    0.00081 ***
          partyidIndependent       < 2e-16 ***
          partyidInd. Near Repub.  < 2e-16 ***
          partyidRepub.            < 2e-16 ***
          partyidStrong Repub.     < 2e-16 ***
          sexFemale                0.80006
          owngunYES                0.00091 ***
          ---
          Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          (Dispersion parameter for binomial family taken to be 1)

              Null deviance: 1721.9  on 1242  degrees of freedom
          Residual deviance:  764.0  on 1234  degrees of freedom
            (3267 observations deleted due to missingness)
          AIC: 782

          Number of Fisher Scoring iterations: 6

  15. Interrogate Models: Summary Object

      Create a summary object:

          sb1 <- summary(bush1)
          attributes(sb1)

          $names
           [1] "call"           "terms"          "family"
           [4] "deviance"       "aic"            "contrasts"
           [7] "df.residual"    "null.deviance"  "df.null"
          [10] "iter"           "na.action"      "deviance.resid"
          [13] "coefficients"   "aliased"        "dispersion"
          [16] "df"             "cov.unscaled"   "cov.scaled"

          $class
          [1] "summary.glm"

      My deviance is

          sb1$deviance

          [1] 764
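
A summary object can be stored and queried like the model itself. The sketch below assumes the built-in `mtcars` data in place of `dat`; `fit` and `sfit` are hypothetical names. It also shows that the deviance stored in the summary matches what the `deviance()` extractor reports for the model.

```r
fit  <- glm(am ~ wt, data = mtcars, family = binomial(link = logit))
sfit <- summary(fit)

class(sfit)          # a separate class, with its own print method
names(sfit)          # components stored in the summary object

sfit$deviance        # component pulled out with $
deviance(fit)        # extractor applied to the model object
isTRUE(all.equal(sfit$deviance, deviance(fit)))
```

Storing the summary once avoids recomputing standard errors each time a piece of the table is needed.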

  16. Interrogate Models: The coef Enigma I

      coef() is the same as coefficients().

      Note the Bizarre Truth:

      1. The "coef" function returns something different when it is applied to a model object

          coef(bush1)

                      (Intercept)             partyidDem.
                           -3.571                   1.910
            partyidInd. Near Dem.      partyidIndependent
                            1.456                   3.464
          partyidInd. Near Repub.           partyidRepub.
                            5.468                   6.031
             partyidStrong Repub.               sexFemale
                            7.191                   0.049
                        owngunYES
                            0.642

      than when it is applied to a summary object:

          coef(sb1)
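
The contrast between the two `coef()` results can be inspected on any fitted glm. This is a minimal sketch assuming the built-in `mtcars` data rather than the survey data; `fit`, `b`, and `tab` are hypothetical names.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = logit))

b   <- coef(fit)           # model object: a named numeric vector
tab <- coef(summary(fit))  # summary object: a matrix, one row per coefficient

is.null(dim(b))            # TRUE: a plain vector
dim(tab)                   # rows = coefficients, columns = 4 statistics
colnames(tab)

# The first column of the summary table repeats the model's coefficients.
isTRUE(all.equal(tab[, "Estimate"], b))

# Individual columns can be pulled out by name, for example the p-values.
tab[, "Pr(>|z|)"]
```

For a binomial family the test-statistic columns are labelled "z value" and "Pr(>|z|)"; families with an estimated dispersion use "t value" and "Pr(>|t|)" instead, so code that indexes by column name should account for the family.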
