
Modeling defoliation of Pinus Radiata trees using hyperspectral remote sensing data

Patrick Schratz¹, Jannes Muenchow¹, Eugenia Iturritxa², Alexander Brenning¹. LIFE Healthy Forest Workshop, Vitoria-Gasteiz (Spain), 19 Sep 2018.


  1. Modeling defoliation of Pinus Radiata trees using hyperspectral remote sensing data
     - Patrick Schratz¹, Jannes Muenchow¹, Eugenia Iturritxa², Alexander Brenning¹
     - LIFE Healthy Forest Workshop, Vitoria-Gasteiz (Spain), 19 Sep 2018
     - ¹ Department of Geography, GIScience group, University of Jena
     - ² NEIKER, Vitoria-Gasteiz, Spain
     - Contact: https://pjs-web.de, @pjs_228, @pat-s, patrick.schratz@uni-jena.de
     - Slides: https://bit.ly/2Nls9Do

  2. Contribution to project goals (Action B1)
     - Deliverables
       - Remotely-sensed forest health map (80%): maps for all plots, Basque Country missing
       - Maps of forest disease potential (50%): Diplodia & Fusarium, Armillaria & Heterobasidion missing
     - Milestones
       - Data for spatial data analysis compiled
       - Developed model of forest disease potential (xgboost)
       - Final selection of algorithm for remotely-sensed forest health mapping (xgboost)

  3. Introduction: Study aims
     - Model defoliation (as a proxy for tree health) using remote sensing data (a high-dimensional modeling problem)
     - Find the most important variables and predict defoliation for the whole Basque Country
     - Find the best-performing algorithm

  4. Data

  5. Data  Hyperspectral data Airborne data collected end of September 2016 Characteristic Value Geometric resolution 1 m Radiometric resolution 12 bit Spectral resolution 126 bands (404.08 nm - 996.31 nm) Geometric, radiometric and atmospheric corrected Survey data In-situ data from Laukiz 1 , Laukiz 2 , Luiando , Oiartzun (total 1750 observations) Surveyed in September 2016 Variables like defoliation (in three height levels), number of cankers , age , diameter , etc. 5 / 28

  6. Data

  7. Methods

  8. Methods  Variable retrival Extract as much information from the hyperspecctral data as possible Lehnert, Meyer, and Bendix (2018) 90 vegetation indices (using the hsdar R package (Lehnert, Meyer, and Bendix, 2018)) 7xxx NRIs (Normalized Ratio Indices) What are NRIs? b j i b i b j where and are the respective band numbers. The most famous NRI is the (Normalized Difference Vegetation Index). N I 122 valid bands (4 were corrupt): (125 * 126) / 2 = 7875 - corrupt bands and bands with division by zero = 7471 valid NRI variables . 8 / 28
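
The NRI construction described on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the hsdar implementation used in the talk; the function names, the pair ordering (i < j) and the toy reflectance values are assumptions made for the example.

```python
from itertools import combinations


def nri(b_i, b_j):
    """Normalized ratio index (b_i - b_j) / (b_i + b_j); None if undefined."""
    denom = b_i + b_j
    if denom == 0:
        return None  # division by zero: the slide drops such variables
    return (b_i - b_j) / denom


def all_nris(spectrum):
    """Return {(i, j): NRI} for every unordered band pair (1-based, i < j)."""
    out = {}
    for (i, b_i), (j, b_j) in combinations(enumerate(spectrum, start=1), 2):
        value = nri(b_i, b_j)
        if value is not None:
            out[(i, j)] = value
    return out


# Number of unordered band pairs for the 126 bands of the sensor
n_bands = 126
n_pairs = n_bands * (n_bands - 1) // 2

# Made-up four-band reflectance spectrum, for illustration only
toy_spectrum = [0.05, 0.08, 0.04, 0.45]
toy_nris = all_nris(toy_spectrum)

# NDVI-style index: near-infrared band versus red band (made-up values)
toy_ndvi = nri(0.45, 0.04)
```

Here `n_pairs` reproduces the 7875 band pairs quoted on the slide; the reduction to 7471 depends on which bands were corrupt and is not reproduced by this sketch.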

  9. Methods  Algorithm benchmarking Extreme Gradient Boosting (xgboost) (Chen and Guestrin, 2016) Support Vector Machine (SVM) (Vapnik, 1998) Ridge regression (RR) (Friedman, Hastie, and Tibshirani, 2010) Hyperparameter tuning Using SMBO (Sequential-based Model Optimization) (Bischl, Richter, Bossek, et al., 2017) Different partitioning schemes for performance estimation Plot level Tree level 9 / 28

  10. Methods  Plot level Training on 3 out of 4 plots, testing on the remaining one. Four performance estimates. Tree level Spatial partitioning using k-means clustering (Brenning, 2012). Five folds Five repetitions 10 / 28
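
The plot-level scheme amounts to a leave-one-plot-out loop. The sketch below is a hedged illustration only: the defoliation values are made up, and the "model" is a placeholder that predicts the training mean, standing in for xgboost, SVM or ridge regression. The helper names `rmse` and `leave_one_plot_out` are hypothetical.

```python
from math import sqrt
from statistics import mean


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def leave_one_plot_out(data):
    """Yield (test_plot, train_values, test_values) for each plot in turn."""
    for test_plot in data:
        train = [v for plot, values in data.items()
                 if plot != test_plot for v in values]
        yield test_plot, train, data[test_plot]


# Made-up defoliation values (in %), for illustration only
toy = {
    "Laukiz 1": [40.0, 55.0, 60.0],
    "Laukiz 2": [10.0, 20.0, 15.0],
    "Luiando": [70.0, 80.0, 75.0],
    "Oiartzun": [50.0, 45.0, 55.0],
}

scores = {}
for test_plot, train, test in leave_one_plot_out(toy):
    baseline = mean(train)  # placeholder model: predict the training mean
    scores[test_plot] = rmse(test, [baseline] * len(test))
```

With four plots, `scores` holds four RMSE values, one per held-out plot, matching the "four performance estimates" on the slide.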

  11. Methods  Variable importance Aim: Find the most important variables among the 7xxx predictors Method: Using the internal variable importance measure of the winning algorithm ( xgboost ): Gain : The relative contribution of the feature to the model Cover metric : How often a feature was selected to be the deciding feature in a tree for a specific observation Frequency : How often a feature occurs in all trees of the model 11 / 28
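
To make the three measures concrete, the sketch below aggregates them from a toy list of tree splits. It assumes each measure is a per-feature sum (or count) normalized over all splits, which is how these measures are commonly described for boosted trees; it does not call xgboost. The split records are made up, and the feature names other than `bf2_EVI` are hypothetical.

```python
from collections import defaultdict


def importance(splits):
    """Aggregate (feature, gain, cover) split records into relative scores."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # gain sum, cover sum, count
    for feature, gain, cover in splits:
        entry = totals[feature]
        entry[0] += gain
        entry[1] += cover
        entry[2] += 1
    sum_gain = sum(t[0] for t in totals.values())
    sum_cover = sum(t[1] for t in totals.values())
    n_splits = sum(t[2] for t in totals.values())
    return {
        feature: {
            "Gain": t[0] / sum_gain,
            "Cover": t[1] / sum_cover,
            "Frequency": t[2] / n_splits,
        }
        for feature, t in totals.items()
    }


# Made-up split records: (feature, gain of the split, observations covered)
toy_splits = [
    ("bf2_EVI", 30.0, 100.0),
    ("bf2_EVI", 10.0, 40.0),
    ("bf2_NRI_b10_b20", 5.0, 60.0),  # hypothetical feature name
    ("bf2_mSR", 5.0, 50.0),          # hypothetical feature name
]

scores = importance(toy_splits)
ranking = sorted(scores, key=lambda f: scores[f]["Gain"], reverse=True)
```

Ranking by `Gain` is one possible choice; the three measures can order features differently, which is part of why such internal measures are hard to compare.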

  12. Results

  13. Results  Fig. 1: Descriptive statistics of the response variable defoliation . . 13 / 28

  14. Results  Performance (CV) Tab. 1: Spatial block CV performances of RR, SVM and xgboost using RMSE as the error measure. Mean and standard deviation are shown. RR SVM xgboost 59.10 (22.71) 36.23 (15.73) 33.26 (16.61) Plot level vs tree level Tab. 2: Predictive performance of xgboost at the plot and tree level. The performances estimates for "Plot level" correspond to the fold for which the respective plot was serving as the test set (block CV). For "Tree level" a five-fold five times repeated SpCV was used. Plot/Data Plot level Tree level (SpCV) Laukiz 1 22.03 19.18 Laukiz 2 51.75 17.24 Luiando 13.20 8.30 Oiartzun 32.97 14.40 14 / 28
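
The tree-level SpCV behind Tab. 2 groups trees into spatially compact folds. The following is a rough standard-library sketch of that idea, assuming folds are defined by k-means clusters of tree coordinates and re-drawn with a different seed in each repetition. It is not the implementation used in the talk; the coordinates are random toy data and the function names are hypothetical.

```python
import random
from math import dist


def kmeans_folds(coords, k=5, seed=0, n_iter=50):
    """Assign each (x, y) point to one of k folds with Lloyd's k-means."""
    rng = random.Random(seed)
    centers = rng.sample(coords, k)  # initial centers drawn from the data
    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in coords]
        # Update step: move each center to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def repeated_spatial_cv(coords, k=5, repetitions=5):
    """Yield (repetition, fold, train_idx, test_idx) for repeated spatial CV."""
    for rep in range(repetitions):
        labels = kmeans_folds(coords, k=k, seed=rep)
        for fold in range(k):
            test_idx = [i for i, lab in enumerate(labels) if lab == fold]
            train_idx = [i for i, lab in enumerate(labels) if lab != fold]
            yield rep, fold, train_idx, test_idx


# Random toy tree coordinates (metres), for illustration only
rng = random.Random(42)
toy_coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(60)]
splits = list(repeated_spatial_cv(toy_coords))
```

Five folds times five repetitions give 25 train/test splits; because each test fold is a spatial cluster, test trees tend to be farther from the training trees than under random partitioning.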

  15. Results  Fig. 2: RMSE vs. mean point density and coefficient of variation (defoliation). 15 / 28

  16. Results  Variable importance Fig. 3: The 30 most important variables as estimated by the internal variable importance measure of the xgboost algorithm. The higher the score, the more important the feature. "bf2" notes that a buffer of 2 meter was used to extract the variable information to the tree observation. "NRI" means that a normalized ratio index with the subsequent bands was calculated. Features without "NRI" prefix are vegetation indices, e.g. "bf2_EVI". 16 / 28

  17. Results  Variable importance Acronym Name Formula Reference EVI Enhanced vegetation index 1 R 800 − R 670 2.5 × R 800 −(6× R 670 )−(7.5× R 475 )+1) R n 800 − R n GDVI Generalized DVI* 2 680 R n 800 + R n 680 D1 Derivative Index 3 D 730 D 706 mNDVI Normalized DVI* 4 R 800 − R 680 ( R 800 + R 680 −2× R 445 mSR Simple Ratio Index 4 R 800 − R 445 R 680 − R 445 1: Huete, Liu, Batchily, et al. (1997) 2: Wu, Niu, Tang, et al. (2008) 3: Zarco-Tejada, Pushnik, Dobrowski, et al. (2003) 4: Sims and Gamon (2002) * Difference Vegetation Index 17 / 28
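
Three of the indices in this table depend only on single-band reflectances and are easy to write down directly. The sketch below follows the formulas as printed on the slide; the function names, the default exponent `n=2` for GDVI and the reflectance values are assumptions for illustration. EVI and D1 are left out (D1 needs derivative spectra).

```python
def gdvi(r800, r680, n=2):
    """Generalized DVI: (R800^n - R680^n) / (R800^n + R680^n); n=2 is assumed."""
    a, b = r800 ** n, r680 ** n
    return (a - b) / (a + b)


def mndvi(r800, r680, r445):
    """mNDVI: (R800 - R680) / (R800 + R680 - 2 * R445)."""
    return (r800 - r680) / (r800 + r680 - 2 * r445)


def msr(r800, r680, r445):
    """mSR: (R800 - R445) / (R680 - R445)."""
    return (r800 - r445) / (r680 - r445)


# Made-up reflectances at 445, 680 and 800 nm, for illustration only
toy = {"r445": 0.03, "r680": 0.05, "r800": 0.45}

toy_gdvi = gdvi(toy["r800"], toy["r680"])
toy_mndvi = mndvi(toy["r800"], toy["r680"], toy["r445"])
toy_msr = msr(toy["r800"], toy["r680"], toy["r445"])
```

In practice the reflectance at a nominal wavelength such as 800 nm would be taken from the nearest sensor band; that lookup is not shown here.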

  18. Results  Spatial prediction Fig. 4: Spatially predicted defoliation (in %) from xgboost of Laukiz 1 , Laukiz 2 , Luiando and Oiartzun . 18 / 28

  19. Results  Spatial prediction Fig. 5: Histograms of predicted defoliation (in %) from xgboost of Laukiz 1 , Laukiz 2 , Luiando and Oiartzun . 19 / 28

  20. Discussion

  21. Discussion  Derivation of indices 2 m buffer may not be optimal Varying number of contributing pixels to each index value (border effects) 21 / 28

  22. Discussion  Derivation of indices 2 m buffer may not be optimal Varying number of contributing pixels to each index value (border effects) RMSE vs. plot characteristics Laukiz 2 did not follow the pattern of the other three plots Generalization not possible with only four plots 21 / 28

  23. Discussion  Derivation of indices 2 m buffer may not be optimal Varying number of contributing pixels to each index value (border effects) RMSE vs. plot characteristics Laukiz 2 did not follow the pattern of the other three plots Generalization not possible with only four plots Predictive performance xgboost showed the best performance  RR showed a suprisingly bad performance Random Forest was not used due to the high number of variables (very long runtime) 21 / 28

  24. Discussion  Predictive performance High performance variance between plots (more training data would increase performance) NRI do not seem to help much in this case, vegetation indices were most important 22 / 28

  25. Discussion  Predictive performance High performance variance between plots (more training data would increase performance) NRI do not seem to help much in this case, vegetation indices were most important Variable importance Internal model variable importance measures can always be questioned (not comparable) How to find a threshold which subset of variables to use Possible enhancments: Use two features sets: Only NRI, only vegetation indices Conduct a Principal components analysis (PCA) 22 / 28

  26. Discussion  Spatial prediction To-do: Prediction to Basque Country using Sentinel-2 data with the seven most important variables 23 / 28

  27. Appendix

  28. Appendix  App. 1: Spectral signatures (mean and standard deviation) of each plot. 25 / 28

  29. References

  30. References  Bischl, B, J. Richter, J. Bossek, et al. (2017). "mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions". In: ArXiv e-prints . arXiv: 1703.03373 [stat]. Brenning, A. (2012). "Spatial Cross-Validation and Bootstrap for the Assessment of Prediction Rules in Remote Sensing: The R Package Sperrorest". In: 2012 IEEE International Geoscience and Remote Sensing Symposium . R package version 2.1.0. IEEE. DOI: 10.1109/igarss.2012.6352393. Chen, T. and C. Guestrin (2016). "XGBoost: A Scalable Tree Boosting System". In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . KDD '16. 01130. New York, NY, USA: ACM, pp. 785-794. ISBN: 978-1-4503-4232-2. DOI: 10.1145/2939672.2939785. Friedman, J, T. Hastie and R. Tibshirani (2010). "Regularization Paths for Generalized Linear Models via Coordinate Descent". In: Journal of Statistical Software 33.1. 05097, pp. 1-22. Huete, A. R, H. Q. Liu, K. Batchily, et al. (1997). "A Comparison of Vegetation Indices over a Global Set of TM Images for EOS-MODIS". In: Remote Sensing of Environment 59.3. 01474, pp. 440-451. ISSN: 0034-4257. DOI: 10/bgtpgv. 27 / 28
