Activity of zebrafish and melatonin
CASE STUDIES IN STATISTICAL THINKING
Justin Bois, Lecturer, Caltech
Case studies in statistical thinking
Hone and extend your statistical thinking skills.
Work with real data sets.
Review of Statistical Thinking I and II.
Warming up with zebrafish
Movie courtesy of David Prober, Caltech
Nomenclature
Mutant: has the mutation on both chromosomes.
Wild type: does not have the mutation.
Activity of fish, day and night
Data courtesy of Avni Gandhi, Grigorios Oikonomou, and David Prober, Caltech
Active bouts: a metric for wakefulness
Active bout: a period of time where a fish is consistently active.
Active bout length: number of consecutive minutes with activity.
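To make the definition concrete, here is a minimal sketch of how active bout lengths could be extracted from a per-minute activity record. It assumes one activity value per minute and treats any value greater than zero as "active"; that threshold and the function name `active_bout_lengths` are hypothetical, not taken from the course.

```python
import numpy as np

def active_bout_lengths(activity):
    """Return lengths (in minutes) of runs of consecutive active minutes.

    Assumption: a minute counts as active if its activity value is > 0.
    """
    active = np.asarray(activity) > 0
    # Pad with False on both ends so every run has a clear start and end
    padded = np.concatenate(([False], active, [False]))
    # +1 marks the first minute of a run, -1 marks one past its last minute
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts

activity = [0, 3, 5, 0, 0, 1, 0, 2, 2, 2]
print(active_bout_lengths(activity))  # [2 1 3]
```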
Probability distributions and stories
Probability distribution: a mathematical description of outcomes.
A probability distribution has a story: a description of a process that produces outcomes following that distribution.
Distributions from Statistical Thinking I: Uniform, Binomial, Poisson, Normal, Exponential
The Exponential distribution
Poisson process: the timing of the next event is completely independent of when the previous event happened.
Story of the Exponential distribution: the waiting time between arrivals of a Poisson process is Exponentially distributed.
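The story can be checked by simulation. This sketch assumes a homogeneous Poisson process with an arbitrary rate of 0.5 events per unit time (a made-up value): the number of events in a window is Poisson, the event times are uniform given that number, and the gaps between sorted event times should then be Exponential with mean 1/rate.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

rate = 0.5            # assumed mean number of events per unit time
total_time = 10_000.0

# Homogeneous Poisson process on [0, total_time]: the event count is
# Poisson(rate * total_time); given the count, times are uniform.
n_events = rng.poisson(rate * total_time)
event_times = np.sort(rng.uniform(0.0, total_time, size=n_events))

# Waiting times between consecutive arrivals
waiting_times = np.diff(event_times)

# For an Exponential distribution, mean and standard deviation both equal 1/rate
print(np.mean(waiting_times), np.std(waiting_times), 1 / rate)
```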
The Exponential CDF

    import matplotlib.pyplot as plt

    # ecdf() is assumed to be defined already (e.g., as in Statistical Thinking I)
    x, y = ecdf(nuclear_incident_times)
    _ = plt.plot(x, y, marker='.', linestyle='none')

Data source: Wheatley, Sovacool, and Sornette, Nuclear Events Database
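A minimal sketch of an `ecdf` function and a comparison against the theoretical Exponential CDF, 1 - exp(-x/τ), with τ set to the sample mean. The course's own `ecdf` may differ in details, and the data here are synthetic (Exponential with an assumed scale of 87), not the Nuclear Events Database.

```python
import numpy as np

def ecdf(data):
    """Return x (sorted data) and y (fraction of points <= each x)."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def exponential_cdf(x, tau):
    """Theoretical CDF of the Exponential distribution with mean tau."""
    return 1.0 - np.exp(-x / tau)

# Synthetic stand-in for nuclear_incident_times (assumed, not the real data)
rng = np.random.default_rng(seed=42)
times = rng.exponential(scale=87.0, size=1000)

x, y = ecdf(times)
tau = np.mean(times)
theory = exponential_cdf(x, tau)

# Largest vertical gap between the ECDF and the theoretical CDF
max_gap = np.max(np.abs(y - theory))
print(max_gap)

# To overlay the two curves with matplotlib (not run here):
# plt.plot(x, y, marker='.', linestyle='none'); plt.plot(x, theory)
```

A small maximum gap suggests the Exponential model describes the data reasonably well.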
    import dc_stat_think as dcst
    dcst.pearson_r?

    Signature: dcst.pearson_r(data_1, data_2)
    Docstring:
    Compute the Pearson correlation coefficient between two samples.

    Parameters
    ----------
    data_1 : array_like
        One-dimensional array of data.
    data_2 : array_like
        One-dimensional array of data.

    Returns
    -------
    output : float
        The Pearson correlation coefficient between `data_1`
        and `data_2`.
    File:      usr/local/lib/python3.5/site-packages/dc_stat_think-0.1.4-py3.6.egg/dc_stat_think/dc_stat_think.py
    Type:      function
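The docstring states what `pearson_r` returns but not how. An equivalent computation can be sketched with NumPy's `np.corrcoef`, assuming the course function computes the standard Pearson coefficient (its actual implementation may differ).

```python
import numpy as np

def pearson_r(data_1, data_2):
    """Pearson correlation coefficient via NumPy's correlation matrix."""
    corr_mat = np.corrcoef(data_1, data_2)
    # The off-diagonal entry is the correlation between the two samples
    return corr_mat[0, 1]

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson_r(x, 2 * x + 1))  # approximately 1.0: perfect positive linear relation
print(pearson_r(x, -x))         # approximately -1.0: perfect negative linear relation
```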
Using the dc_stat_think module

    % pip install dc_stat_think

    import dc_stat_think as dcst
    x, y = dcst.ecdf(nuclear_incident_times)
Bootstrap confidence intervals
EDA is the first step
"Exploratory data analysis can never be the whole story, but nothing else can serve as a foundation stone, as the first step." -- John Tukey
Active bout length ECDFs
Optimal parameter value
Optimal parameter value: the value of the parameter of a probability distribution that best describes the data.
Optimal parameter for the Exponential distribution: computed from the mean of the data.
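One way to see why the mean is the right choice, assuming "best describes the data" is taken to mean maximum likelihood and the Exponential distribution is parametrized by its mean τ (the same convention as the `scale` argument of NumPy's exponential sampler):

```latex
f(x;\tau) = \frac{1}{\tau}\, e^{-x/\tau}, \qquad x \ge 0

\ell(\tau) = \sum_{i=1}^{n} \ln f(x_i;\tau) = -n \ln \tau - \frac{1}{\tau} \sum_{i=1}^{n} x_i

\frac{d\ell}{d\tau} = -\frac{n}{\tau} + \frac{1}{\tau^{2}} \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
```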
    np.mean(nuclear_incident_times)

    87.140350877192986
Bootstrap sample: a resampled array of the data

    import numpy as np

    # Resample nuclear_incident_times with replacement
    bs_sample = np.random.choice(
        nuclear_incident_times, replace=True, size=len(nuclear_incident_times)
    )
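The same resampling can be sketched with NumPy's newer `Generator` API. The five data values below are hypothetical. A bootstrap sample has the same length as the data and contains only values from the data, some possibly repeated and others missing.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical small data set standing in for nuclear_incident_times
data = np.array([12.0, 45.0, 87.0, 150.0, 230.0])

# Bootstrap sample: draw len(data) values from data, with replacement
bs_sample = rng.choice(data, size=len(data), replace=True)
print(bs_sample)
```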
Bootstrap replicates
Bootstrap replicate: a statistic computed from a bootstrap sample.
dcst.draw_bs_reps(): function to draw bootstrap replicates from a data set

    import numpy as np
    import dc_stat_think as dcst

    # Draw 10000 replicates of the mean from nuclear_incident_times
    bs_reps = dcst.draw_bs_reps(nuclear_incident_times, np.mean, size=10000)
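A minimal sketch of what a function like `draw_bs_reps` does, written to match the call shown above (data, statistic, `size`). The extra `rng` parameter is an assumption added for reproducibility; this is not the dc_stat_think source, and the data are synthetic. The last step takes the 2.5th and 97.5th percentiles of the replicates as a 95% confidence interval.

```python
import numpy as np

def draw_bs_reps(data, func, size=1, rng=None):
    """Draw `size` bootstrap replicates of the statistic `func` from `data`."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    reps = np.empty(size)
    for i in range(size):
        # Resample with replacement, then compute the statistic on the sample
        bs_sample = rng.choice(data, size=len(data), replace=True)
        reps[i] = func(bs_sample)
    return reps

# Synthetic stand-in data (assumed, not the real incident times)
rng = np.random.default_rng(seed=7)
times = rng.exponential(scale=87.0, size=100)

bs_reps = draw_bs_reps(times, np.mean, size=10000, rng=rng)
conf_int = np.percentile(bs_reps, [2.5, 97.5])
print(conf_int)
```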
The bootstrap confidence interval
If we repeated measurements over and over again, p% of the observed values of the statistic (for example, the mean) would lie within the p% confidence interval.
For a 95% confidence interval, take the 2.5th and 97.5th percentiles of the bootstrap replicates:

    np.percentile(bs_reps, [2.5, 97.5])

    array([ 73.31505848, 102.39181287])
Hypothesis tests
Effects of mutation on activity
Genotype definitions
Wild type: no mutations.
Heterozygote: mutation on one of two chromosomes.
Mutant: mutation on both chromosomes.
Hypothesis test
Assessment of how reasonable the observed data are, assuming a hypothesis is true.
Test statistic
A single number that can be computed from observed data and from data you simulate under the null hypothesis. It serves as a basis of comparison.
p-value
The probability of obtaining a value of your test statistic that is at least as extreme as what was observed, under the assumption that the null hypothesis is true.
Requires clear specification of:
1. A null hypothesis that can be simulated.
2. A test statistic that can be calculated from observed and simulated data.
3. A definition of "at least as extreme as".
Pipeline for hypothesis testing
1. Clearly state the null hypothesis.
2. Define your test statistic.
3. Generate many sets of simulated data assuming the null hypothesis is true.
4. Compute the test statistic for each simulated data set.
5. The p-value is the fraction of your simulated data sets for which the test statistic is at least as extreme as for the real data.
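The pipeline can be sketched as one generic function that takes the null simulator, the test statistic, and the "at least as extreme" rule as arguments. The function name `pipeline_p_value` and the example null hypothesis (waiting times Exponential with mean 100) are hypothetical illustrations, and the observed data are synthetic.

```python
import numpy as np

def pipeline_p_value(observed_data, simulate_null, statistic, is_extreme,
                     n_reps=10000, rng=None):
    """Simulation-based p-value following the pipeline steps.

    simulate_null(rng)   -> one simulated data set under the null hypothesis
    statistic(data)      -> the test statistic (a single number)
    is_extreme(sim, obs) -> True where a simulated statistic is at least
                            as extreme as the observed one
    """
    rng = np.random.default_rng() if rng is None else rng
    obs_stat = statistic(observed_data)
    # Generate many simulated data sets and compute the statistic for each
    sim_stats = np.array([statistic(simulate_null(rng)) for _ in range(n_reps)])
    # Fraction of simulated statistics at least as extreme as the observed one
    return np.mean(is_extreme(sim_stats, obs_stat))

# Hypothetical example: null hypothesis that waiting times are Exponential
# with mean tau_0 = 100; "at least as extreme" means a sample mean <= observed
rng = np.random.default_rng(seed=11)
observed = rng.exponential(scale=87.0, size=50)   # synthetic data
tau_0 = 100.0

p = pipeline_p_value(
    observed,
    simulate_null=lambda r: r.exponential(scale=tau_0, size=len(observed)),
    statistic=np.mean,
    is_extreme=lambda sim, obs: sim <= obs,
    n_reps=10000,
    rng=rng,
)
print(p)
```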
Specifying the test
Null hypothesis: the active bout lengths of wild type and heterozygotic fish are identically distributed.
Test statistic: difference in mean active bout length between heterozygotes and wild type.
At least as extreme as: test statistic is greater than or equal to what was observed.
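Permutation test
For each replicate: scramble the labels of the data points, then compute the test statistic.

    import numpy as np
    import dc_stat_think as dcst

    # Observed test statistic: difference of means of data_a and data_b
    diff_means_obs = dcst.diff_of_means(data_a, data_b)

    # Draw 10000 permutation replicates of the difference of means
    perm_reps = dcst.draw_perm_reps(
        data_a, data_b, dcst.diff_of_means, size=10000
    )

The p-value is the fraction of replicates at least as extreme as what was observed.

    p_val = np.sum(perm_reps >= diff_means_obs) / len(perm_reps)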
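Minimal sketches of `diff_of_means` and `draw_perm_reps`, written to match the calls above; they are not the dc_stat_think source, and the optional `rng` argument is an added assumption. Permuting the pooled data and splitting it back into groups of the original sizes is what "scrambling the labels" means. The bout lengths are synthetic, not the zebrafish data.

```python
import numpy as np

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays (data_1 minus data_2)."""
    return np.mean(data_1) - np.mean(data_2)

def draw_perm_reps(data_1, data_2, func, size=1, rng=None):
    """Draw `size` permutation replicates of func(perm_1, perm_2)."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.concatenate((data_1, data_2))
    n_1 = len(data_1)
    reps = np.empty(size)
    for i in range(size):
        # Scramble the labels by permuting the pooled data, then re-split
        perm = rng.permutation(data)
        reps[i] = func(perm[:n_1], perm[n_1:])
    return reps

# Synthetic bout lengths (assumed values, not the zebrafish data)
rng = np.random.default_rng(seed=5)
het = rng.exponential(scale=4.0, size=200)
wt = rng.exponential(scale=3.5, size=200)

diff_means_obs = diff_of_means(het, wt)
perm_reps = draw_perm_reps(het, wt, diff_of_means, size=10000, rng=rng)
p_val = np.sum(perm_reps >= diff_means_obs) / len(perm_reps)
print(diff_means_obs, p_val)
```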
Linear regressions and pairs bootstrap
Bacterial growth
Images courtesy of Jin Park and Michael Elowitz, Caltech
    import matplotlib.pyplot as plt

    # Plot bacterial area on a logarithmic y-axis versus time
    _ = plt.semilogy(t, bac_area, marker='.', linestyle='none')
    _ = plt.xlabel('time (hr)')
    _ = plt.ylabel('area (sq. µm)')
    plt.show()
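If growth is exponential, A(t) = A0 exp(r t), then ln A = ln A0 + r t, so the points fall on a straight line on a semilog plot. The sketch below assumes exponential growth and uses synthetic data (assumed growth rate 0.25 per hour with multiplicative noise) in place of `t` and `bac_area`. It fits a line to (t, ln area) with `np.polyfit`, then estimates the uncertainty of the growth rate with a pairs bootstrap, which resamples (t, area) pairs together.

```python
import numpy as np

rng = np.random.default_rng(seed=9)

# Synthetic stand-in for t (hr) and bac_area (sq. µm): exponential growth
# with rate 0.25 per hour plus multiplicative noise (assumed values)
t = np.linspace(0.0, 14.0, 60)
bac_area = 5.0 * np.exp(0.25 * t) * rng.lognormal(0.0, 0.05, size=t.size)

# Linear regression of log(area) on time: the slope is the growth rate
slope, intercept = np.polyfit(t, np.log(bac_area), 1)

# Pairs bootstrap: resample (t, area) pairs together and refit each time
n_reps = 1000
bs_slopes = np.empty(n_reps)
inds = np.arange(len(t))
for i in range(n_reps):
    bs_inds = rng.choice(inds, size=len(inds), replace=True)
    bs_slopes[i], _ = np.polyfit(t[bs_inds], np.log(bac_area[bs_inds]), 1)

conf_int = np.percentile(bs_slopes, [2.5, 97.5])
print(slope, conf_int)
```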