R Module Day 2: Statistics, Drew Allen

R Module Day 2: Statistics
Drew Allen

Topics Covered
• Statistical Distributions
• Summary Statistics
• T tests
• Regression (simple linear, multiple linear)
• Analysis of Variance

Statistical Distributions

Some Basic Definitions
• Random Variable – a variable whose value is not known with certainty
• Random Variate – a particular outcome of a random variable
• Probability – denotes the relative frequency of occurrence of a particular value
• Probability distribution yields the probability of
– each value of a random variable (discrete distribution)
– the value of a random variable falling within a particular interval (continuous distribution)

Probability density (i.e. height) at a
dnorm(a, mean=0, sd=1)

Probabilities from -∞ to a, i.e. P(t<a)
pnorm(a, mean=0, sd=1, lower.tail=TRUE)

Probabilities from a to ∞, i.e. P(t>a)
pnorm(a, mean=0, sd=1, lower.tail=FALSE)

Quantiles: the value a for which the probability from -∞ to a equals p
qnorm(0.4, mean=-2, sd=sqrt(0.5))
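The density, probability, and quantile functions above fit together. The following is a minimal sketch of how they relate, assuming a standard normal distribution and an arbitrary point a = 1.3 chosen only for illustration (it does not come from the slides).

```r
# Illustrative value only; not taken from the slides
a <- 1.3

lower <- pnorm(a, mean = 0, sd = 1, lower.tail = TRUE)   # P(t < a)
upper <- pnorm(a, mean = 0, sd = 1, lower.tail = FALSE)  # P(t > a)

# The two tail areas together cover the whole distribution
total <- lower + upper

# qnorm() inverts pnorm(): given a probability, it returns the point a
a_back <- qnorm(lower, mean = 0, sd = 1)

# dnorm() is the height of the density curve at a, not a probability
height <- dnorm(a, mean = 0, sd = 1)

print(c(lower = lower, upper = upper, total = total,
        a_back = a_back, height = height))
```

The same pairing of p- and q-functions holds for the other built-in distributions.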

Samples from a distribution
rnorm(1000, mean=12, sd=6)
[Figure: histogram of rnorm(10000, mean = 12, sd = 6); y-axis: Frequency]

Functions have required and optional arguments
• Works (no required arguments)
– q()
• Doesn't work (the required argument n is missing):
– rnorm()
• Does work (caution: R assigns default values to some arguments for you!)
– rnorm(100)
• Does work (all arguments specified by user)
– rnorm(100,mean=1,sd=4)
– rnorm(mean=1,sd=4,n=100)
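A short sketch of the argument rules above. The seed value is an arbitrary assumption, used only so that the calls draw the same random numbers and can be compared.

```r
# Reset the seed before each call so the random draws are comparable
set.seed(1)
by_position <- rnorm(100, mean = 1, sd = 4)

set.seed(1)
by_name <- rnorm(mean = 1, sd = 4, n = 100)

# Named arguments can be given in any order
print(identical(by_position, by_name))

# Omitted optional arguments fall back to their defaults (mean = 0, sd = 1)
set.seed(1)
defaults_implicit <- rnorm(100)

set.seed(1)
defaults_explicit <- rnorm(100, mean = 0, sd = 1)

print(identical(defaults_implicit, defaults_explicit))

# args() shows a function's arguments and their default values
print(args(rnorm))
```

Checking args() or the help page (?rnorm) before relying on defaults is a reasonable habit.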

Exercise 1: Using R as a Statistics Table
• Generate a sample of 1000 variates from a normal distribution of mean 10 and standard deviation 5 using rnorm
• For this sample, calculate what fraction of the points take values < 5 (hint: use length)
• Using pnorm, calculate the theoretically predicted fraction of points that should take values < 5
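One possible approach to Exercise 1, offered as a sketch rather than the official solution. The seed is an arbitrary assumption so that the run is repeatable; the sample fraction will vary slightly with a different seed.

```r
set.seed(42)  # arbitrary seed, only to make the run repeatable

# 1000 variates from a normal distribution with mean 10 and sd 5
x <- rnorm(1000, mean = 10, sd = 5)

# Fraction of the sample below 5: count the matching points, divide by the total
frac_sample <- length(x[x < 5]) / length(x)

# Theoretical fraction from the normal cumulative distribution function
frac_theory <- pnorm(5, mean = 10, sd = 5)

print(c(sample = frac_sample, theory = frac_theory))
```

Since 5 lies one standard deviation below the mean, the theoretical value is the same as pnorm(-1) for a standard normal.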

Built-in Probability Distributions: for a full list, type ?Distributions

Continuous distributions
• Normal
• t
• Chi-squared
• F
• Exponential
• Uniform
• Beta
• Cauchy
• Logistic
• Lognormal
• Gamma
• Weibull

Discrete distributions
• Binomial
• Poisson
• Geometric
• Hypergeometric
• Negative binomial

Other Distributions Use Similar Syntax

NORMAL DISTRIBUTION
• dnorm(x, mean = 0, sd = 1, log = FALSE)
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• rnorm(n, mean = 0, sd = 1)

UNIFORM DISTRIBUTION
• dunif(x, min=0, max=1, log = FALSE)
• punif(q, min=0, max=1, lower.tail = TRUE, log.p = FALSE)
• qunif(p, min=0, max=1, lower.tail = TRUE, log.p = FALSE)
• runif(n, min=0, max=1)

Exercise 2: Using R as a Statistics Table
• What is the probability that a variate picked at random from a gamma distribution with a shape of 3 and scale of 1 is < 0.68? [use pgamma]
• What is the probability that a variate selected at random from an exponential distribution with rate of 1 lies between 0.1 and 10? [use pexp]
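One possible approach to Exercise 2, again a sketch rather than the official solution. The probability of falling in an interval is the difference between the cumulative probabilities at its two endpoints.

```r
# P(X < 0.68) for a gamma distribution with shape 3 and scale 1
p_gamma <- pgamma(0.68, shape = 3, scale = 1)

# P(0.1 < X < 10) for an exponential distribution with rate 1:
# cumulative probability up to 10 minus cumulative probability up to 0.1
p_exp <- pexp(10, rate = 1) - pexp(0.1, rate = 1)

print(c(gamma = p_gamma, exponential = p_exp))
```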

Statistical distributions provide a means to perform simulations
# using R for simulation of a 1D random walker
steps <- rnorm(n=10000, mean=0, sd=1)
distance.from.origin <- cumsum(steps)
plot(distance.from.origin, type='l')

Summary Statistics

Some Functions for Calculating Summary Statistics
• Minimum: min()
• Maximum: max()
• Range (Minimum and Maximum): range()
• Mean: mean()
• Median: median()
• Quantiles: quantile()
• Interquartile range: IQR()
• Variance: var()
• Standard Deviation: sd()
• Summary: summary()
• Stem & Leaf Plot: stem()
• Boxplot: boxplot()
• QQ Plot: qqnorm(), qqline()
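A brief sketch of these functions applied to a single sample. The sample itself (100 normal variates with mean 12 and standard deviation 6) and the seed are assumptions made only for illustration.

```r
set.seed(7)  # arbitrary seed for a repeatable sample
x <- rnorm(100, mean = 12, sd = 6)

print(min(x))
print(max(x))
print(range(x))      # minimum and maximum together
print(mean(x))
print(median(x))
print(quantile(x))   # 0%, 25%, 50%, 75%, 100% by default
print(IQR(x))        # 75% quantile minus 25% quantile
print(var(x))
print(sd(x))         # square root of the variance
print(summary(x))    # min, quartiles, mean, max in one call

stem(x)              # text-based stem-and-leaf plot
```

boxplot(x), qqnorm(x), and qqline(x) draw to a graphics device and are left out of this sketch.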

Functions for Calculating Summary Statistics
> x <- rnorm(100)
> boxplot(x)
[Figure: boxplot of x annotated with the median, the 25% and 75% quantiles, the IQR, and whiskers extending up to 1.5 × IQR beyond the box]
IQR = 75% quantile - 25% quantile = interquartile range
Points beyond the 1.5 × IQR whiskers, above or below, are considered outliers.
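The 1.5 × IQR rule can also be applied by hand. This sketch assumes a sample with one deliberately extreme value (6) appended so that there is something to flag. Note that boxplot() computes the box from hinges, which can differ slightly from quantile() for small samples, so the two approaches may occasionally disagree on borderline points.

```r
set.seed(3)              # arbitrary seed
x <- c(rnorm(100), 6)    # one deliberately extreme value appended

# Box edges and interquartile range
q <- quantile(x, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Fences 1.5 x IQR beyond the box
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[2]) + 1.5 * iqr

# Points outside the fences
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)

# The points boxplot() itself would draw as outliers
print(boxplot.stats(x)$out)
```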

QQ Plot
• Many statistical methods make some assumption about the distribution of the data (e.g. Normal)
• The quantile-quantile plot provides a way to visually verify such assumptions
• The QQ-plot shows the theoretical quantiles versus the empirical quantiles. If the assumed (theoretical) distribution is indeed the correct one, we should observe a straight line.

QQ Plot
x <- rnorm(100)
qqnorm(x)
qqline(x)
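To see what a departure from the straight line looks like, it can help to compare a normal sample with a clearly non-normal one. This sketch assumes an exponential sample as the non-normal case and uses the correlation between theoretical and empirical quantiles as a rough numeric stand-in for "how straight" the plot is; that correlation is an illustration device here, not a formal test.

```r
set.seed(11)  # arbitrary seed
normal_sample <- rnorm(1000)
skewed_sample <- rexp(1000, rate = 1)   # strongly right-skewed

# plot.it = FALSE returns the plotted coordinates without drawing;
# drop it (and add qqline()) to see the plots themselves
qq_normal <- qqnorm(normal_sample, plot.it = FALSE)
qq_skewed <- qqnorm(skewed_sample, plot.it = FALSE)

# Correlation close to 1 means the points lie close to a straight line
r_normal <- cor(qq_normal$x, qq_normal$y)
r_skewed <- cor(qq_skewed$x, qq_skewed$y)

print(c(normal = r_normal, skewed = r_skewed))
```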

Functions for Calculating Summary Statistics
• Two functions are extremely useful for calculating summary statistics for subsets of data:
– apply() (applies a function on a column-by-column or row-by-row basis)
– tapply() (groups data in one column based on values in another column and applies a function to each group)
• Example Script:
– summary_statistics.R
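A minimal sketch of both functions. It does not reproduce summary_statistics.R; the matrix and the small data frame (with hypothetical column names height and treatment) are made up purely for illustration.

```r
# A 2 x 3 matrix filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# apply(): second argument 2 = by column, 1 = by row
col_means <- apply(m, 2, mean)
row_sums  <- apply(m, 1, sum)
print(col_means)
print(row_sums)

# Hypothetical data: one measurement column, one grouping column
plants <- data.frame(
  height    = c(10, 12, 20, 22, 24),
  treatment = c("a", "a", "b", "b", "b")
)

# tapply(): split height by treatment, then take the mean of each group
group_means <- tapply(plants$height, plants$treatment, mean)
print(group_means)
```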

T test

What does Student's t distribution have to do with Guinness Stout?

T distribution
• The t distribution was introduced by William Gosset, a chemist working for Guinness brewery in Ireland
• He published his work under the pen name "Student" because Guinness regarded the fact that they were using statistics to help with brewing to be a trade secret

T test Example: Darwin's Plant Growth Data
• Data are from Darwin's study of cross- and self-fertilization.
• Pairs of seedlings of the same age, one produced by cross-fertilization and the other by self-fertilization, were grown together so that the members of each pair were reared under nearly identical conditions.
• The data are the final heights of each plant after a fixed period of time, in inches.
• Darwin consulted the famous 19th century statistician Francis Galton about the analysis of these data

• Please download the following files:
– binary.csv
– gala.txt
– darwin.txt

Exercise 3: Darwin's Plant Growth Data
• Import darwin.txt
• Conduct a paired T test using the function t.test()
– Type ?t.test for some help
• Answer the following questions:
– What is the mean difference, m, between the treatments?
– What is the standard deviation, s, of the paired differences?
– According to the t test, is the difference significant at the P = 0.05 level for the two-tailed test?
– According to the non-parametric analogue of the t test (Mann-Whitney U), is the difference significant at the P = 0.05 level for the two-tailed test? [Use wilcox.test]
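A sketch of the workflow for this exercise, using simulated stand-in data because the layout of darwin.txt is not shown here. The vector names cross and self, the sample size, and every number below are assumptions for illustration only; they are not Darwin's measurements. With real data the vectors would come from the imported file instead. Note that wilcox.test() with paired = TRUE runs the Wilcoxon signed-rank test, while without it the function runs the rank-sum (Mann-Whitney) test for unpaired samples.

```r
set.seed(2024)  # arbitrary seed
n <- 15         # hypothetical number of pairs

# Simulated stand-in heights (inches); NOT Darwin's data
self  <- rnorm(n, mean = 18, sd = 2)
cross <- self + rnorm(n, mean = 2, sd = 3)

# Paired differences, their mean (m) and standard deviation (s)
d <- cross - self
m <- mean(d)
s <- sd(d)

# Paired, two-tailed t test (two.sided is the default alternative)
tt <- t.test(cross, self, paired = TRUE)

# The paired t statistic is the mean difference over its standard error
t_by_hand <- m / (s / sqrt(n))

# Non-parametric counterpart for paired data
wt <- wilcox.test(cross, self, paired = TRUE)

print(c(m = m, s = s, t = unname(tt$statistic),
        p_t = tt$p.value, p_wilcoxon = wt$p.value))
```

A difference is called significant at the P = 0.05 level when the reported p-value is below 0.05.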
