
Lecture 3: Multivariate smoothing & model selection



  1. Lecture 3: Multivariate smoothing & model selection

  2. The story so far...
     - How GAMs work
     - How to include detection info
     - Simple spatial-only models

  3. Life isn't that simple
     - Which environmental covariates?
     - Which response distribution?
     - Which response?
     - How to select between possible models?

  4. Adding covariates

  5. Model formulation
     - Pure spatial, pure environmental, mixed?
     - Prior knowledge of biology/ecology of species
     - What are drivers of distribution?
     - What data is available?

  6. Sperm whale covariates

  7. Tobler's first law of geography
     "Everything is related to everything else, but near things are more related than distant things"
     Tobler (1970)

  8. Implications of Tobler's law

  9. Adding smooths
     - Already know that + is our friend
     - Can build a big model...

           dsm_all <- dsm(count~s(x, y) +
                                s(Depth) +
                                s(DistToCAS) +
                                s(SST) +
                                s(EKE) +
                                s(NPP),
                          ddf.obj=df_hr,
                          segment.data=segs, observation.data=obs,
                          family=tw())
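The additive structure on this slide can be tried without the survey data. The following is a minimal sketch, assuming the mgcv package (which provides `s()` and `tw()`); it calls `gam()` directly rather than `dsm()`, so there is no detection function. The data frame `segs_sim`, its `area` column, and the true mean `mu` are all made up for illustration.

```r
library(mgcv)  # provides gam(), s(), tw() and rTweedie()

set.seed(1)
n <- 300

# Simulated stand-in for the segment data (all values are made up)
segs_sim <- data.frame(
  x     = runif(n),
  y     = runif(n),
  Depth = runif(n, 100, 4000),
  area  = runif(n, 5, 15)   # hypothetical effort per segment
)
segs_sim$off.set <- log(segs_sim$area)

# True mean: effort times a log-linear function of x and Depth
mu <- segs_sim$area *
  exp(-1 + sin(2 * pi * segs_sim$x) - segs_sim$Depth / 2000)
segs_sim$count <- rTweedie(mu, p = 1.3, phi = 1)

# Smooths are added with "+", as in the dsm() call above
m <- gam(count ~ s(x, y) + s(Depth) + offset(off.set),
         family = tw(), data = segs_sim, method = "REML")
summary(m)
```

Each extra covariate is one more `s(...)` term joined with `+`; the offset enters on the log scale because the link is log.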

  10. Each s() has its own options
      - s(..., k=...) to adjust basis size
      - s(..., bs="...") for basis type
      - lots more options (we'll see a few here)
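As a rough illustration of the `k` and `bs` arguments, here is a sketch assuming mgcv and its built-in simulator `gamSim()`; the simulated variables `y` and `x2` are stand-ins and have nothing to do with the sperm whale data.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 400, verbose = FALSE)  # simulated example data

# Default thin plate basis (k = 10 for a one-dimensional smooth)
m_default <- gam(y ~ s(x2), data = dat, method = "REML")

# Larger basis: allows a wigglier fit, the penalty still controls smoothness
m_bigk <- gam(y ~ s(x2, k = 30), data = dat, method = "REML")

# Thin plate basis with shrinkage ("ts"): the term can be penalised to zero
m_ts <- gam(y ~ s(x2, bs = "ts"), data = dat, method = "REML")

# Number of coefficients (intercept plus k - 1 per smooth)
sapply(list(default = m_default, k30 = m_bigk, ts = m_ts),
       function(m) length(coef(m)))

# Effective degrees of freedom actually used by each fit
sapply(list(default = m_default, k30 = m_bigk, ts = m_ts),
       function(m) sum(m$edf))
```

`k` sets an upper limit on wiggliness; the EDF reported after fitting is usually well below it.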

  11. Now we have a huge model, what do we do?

  12. Term selection
      Two popular approaches (using p-values):
      - Stepwise selection - path dependence
      - All possible subsets - computationally expensive (fishing?)

  13. p-values
      - Test for zero effect of a smooth
      - They are approximate for GAMs (but useful)
      - Reported in summary
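The approximate p-values can also be pulled out of the summary object rather than read off the printout. This is a sketch assuming mgcv, again with `gamSim()` data as a stand-in; in that simulation `x3` has no true effect, so its smooth would be expected to have a large p-value.

```r
library(mgcv)

set.seed(3)
dat <- gamSim(1, n = 400, verbose = FALSE)  # x3 has no true effect

m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
         data = dat, method = "REML")

# Table of smooth terms: edf, Ref.df, test statistic, approximate p-value
s_tab <- summary(m)$s.table
s_tab

# Just the p-values, one per smooth
s_tab[, "p-value"]
```

Since a `dsm()` fit prints the same style of summary, the same extraction should carry over, though that is an assumption here rather than something the slides show.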

  14. summary(dsm_all)

          ##
          ## Family: Tweedie(p=1.25)
          ## Link function: log
          ##
          ## Formula:
          ## count ~ s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) +
          ##     s(NPP) + offset(off.set)
          ##
          ## Parametric coefficients:
          ##             Estimate Std. Error t value Pr(>|t|)
          ## (Intercept) -20.6368     0.2751     -75   <2e-16 ***
          ## ---
          ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
          ##
          ## Approximate significance of smooth terms:
          ##                edf Ref.df     F  p-value
          ## s(x,y)       5.225  7.153 1.233   0.2920
          ## s(Depth)     3.568  4.439 6.641 1.82e-05 ***
          ## s(DistToCAS) 1.000  1.000 1.504   0.2204
          ## s(SST)       5.927  6.986 2.068   0.0407 *
          ## s(EKE)       1.763  2.225 2.579   0.0693 .
          ## s(NPP)       2.393  3.068 0.856   0.4678
          ## ---
          ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
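To make the reading of this table concrete, the sketch below re-enters the printed `edf` and p-values by hand and lists the smooths that fail a 0.05 threshold. The threshold `alpha` and the data frame `smooth_tab` are illustrative choices, not part of the original analysis.

```r
# Values copied from the summary(dsm_all) output above
smooth_tab <- data.frame(
  term    = c("s(x,y)", "s(Depth)", "s(DistToCAS)",
              "s(SST)", "s(EKE)", "s(NPP)"),
  edf     = c(5.225, 3.568, 1.000, 5.927, 1.763, 2.393),
  p_value = c(0.2920, 1.82e-05, 0.2204, 0.0407, 0.0693, 0.4678),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # illustrative significance level

# Smooths whose approximate p-value exceeds the threshold
candidates <- smooth_tab$term[smooth_tab$p_value > alpha]
candidates
```

At this threshold only `s(Depth)` and `s(SST)` are retained. Under stepwise selection the flagged terms would be dropped one at a time with a refit after each, which is where the path dependence mentioned earlier comes in.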
