Characterizing uncertainty of the exposure point concentration based on left-censored data


  1. Characterizing uncertainty of the exposure point concentration based on left-censored data
     Niloofar Shoari, PhD candidate; Jean-Sébastien Dubé, Ph.D.
     Department of Construction Engineering
     27 April 2016, Montreal, Canada

  2. What are left-censored data?
     As concentration data: <0.7, <0.7, <0.7, <0.9, <0.9, <1.3, <1.3, 1.8, 2.1, 2.2, …, 13, 14, 14, 14, 15, 18, 19, 21, 24
     Max detection limit = 1.3
     [Figure: histogram of As concentration (mg/kg, axis from 0 to 24) against frequency (axis from 0 to 20)]
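One way to hold data like this in code is to pair each number with a flag saying whether it is a measured value or only a detection limit. The sketch below is illustrative only: the `Obs` class and `parse` helper are hypothetical names, and it uses just the values written out on the slide (the values hidden behind the "…" are not filled in, so the counts describe this listed subset, not the full data set).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    value: float    # measured concentration, or the detection limit if censored
    censored: bool  # True: the true concentration lies somewhere below `value`


def parse(token: str) -> Obs:
    """Turn a token such as '<0.7' or '2.1' into an observation."""
    token = token.strip()
    if token.startswith("<"):
        return Obs(float(token[1:]), True)
    return Obs(float(token), False)


# Only the values explicitly listed on the slide; elided values are omitted.
raw = ("<0.7, <0.7, <0.7, <0.9, <0.9, <1.3, <1.3, 1.8, 2.1, 2.2, "
       "13, 14, 14, 14, 15, 18, 19, 21, 24")
data = [parse(t) for t in raw.split(",")]

n_censored = sum(o.censored for o in data)
max_dl = max(o.value for o in data if o.censored)
print(f"{n_censored} of {len(data)} listed values are censored; max DL = {max_dl}")
```

Keeping the detection limit alongside the flag matters because the data contain several limits (0.7, 0.9, 1.3), and the estimation methods treat a "<0.7" differently from a "<1.3".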

  3. Left-censored observations are real data: each one records a concentration below the detection limit (<DL).

  4. Data uncertainty comes from a variety of sources:
     • Sampling uncertainty, e.g., inherent heterogeneity and improper collection and handling of samples (outside scope).
     • Analytical uncertainty (outside scope).
     • Data management uncertainty: uncertainty associated with data sets, where statistics come into play.

  5. Various sources of uncertainty
     [Figure: sampling, analytical, and data management uncertainty, illustrated with sampling, field sample, field subsample, and laboratory analysis; image credit: www.greenskeeperlawncare.com]

  6. Methods for estimating the exposure point concentration (EPC) based on left-censored data
     • The substitution method (e.g., DL/2)
     • The Kaplan-Meier (KM) method
       o Non-parametric: no need to assume a parametric distribution for the concentration data.
     • The maximum likelihood estimation (MLE) method
       o Parametric: assumes a parametric distribution;
       o lognormal, Weibull, and gamma.
     • Regression on order statistics (ROS)
       o Assumes a parametric distribution for the uncensored data and predicts the censored values;
       o rROS (lognormal), GROS (gamma).
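To make the first two methods on this slide concrete, here is a minimal sketch of a DL/2 substitution mean and a Kaplan-Meier mean for left-censored data. The function names and the toy data are hypothetical. The KM version works directly on the left-censored scale (equivalent to flipping the data and applying the usual right-censored estimator) and places any probability mass left below the smallest detected value at the smallest recorded value; that convention is an assumption, since the slides do not say which one was used.

```python
def substitution_mean(values, censored, fraction=0.5):
    """Mean after replacing each censored value by fraction * DL (DL/2 by default)."""
    filled = [v * fraction if c else v for v, c in zip(values, censored)]
    return sum(filled) / len(filled)


def km_mean(values, censored):
    """Kaplan-Meier mean for left-censored data.

    values[i] is the detection limit when censored[i] is True.
    """
    detects = sorted({v for v, c in zip(values, censored) if not c}, reverse=True)
    below = 1.0  # estimated P(X <= x) before the mass at x is removed
    mean = 0.0
    for x in detects:
        # Risk set: every observation (detected value or DL) that is <= x.
        at_risk = sum(1 for v in values if v <= x)
        d = sum(1 for v, c in zip(values, censored) if not c and v == x)
        mass = below * d / at_risk  # probability assigned to the value x
        mean += x * mass
        below -= mass
    # Assumed convention: leftover mass goes to the smallest recorded value.
    mean += below * min(values)
    return mean


# Toy data (hypothetical): one non-detect reported as "<3" and five detects.
values = [3.0, 1.0, 1.0, 2.0, 4.0, 6.0]
censored = [True, False, False, False, False, False]
sub = substitution_mean(values, censored)
km = km_mean(values, censored)
print(f"DL/2 mean = {sub:.4f}, KM mean = {km:.4f}")
```

In this toy case KM spreads the weight of the "<3" observation over the detected values below 3 in proportion to their frequency, whereas substitution pins it at 1.5 regardless of what the rest of the data look like.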

  7. Recommendations of previous simulation studies
     • Substitution provides biased estimates (Helsel, 2006).
     • KM performs well for <50% censoring (Antweiler and Taylor, 2008).
     • MLE performs well when the sample size is >50 (Helsel, 2012).
       o MLE (lognormal) has optimization problems with highly skewed data and small to medium sample sizes (Shoari et al., 2015).
     • rROS and GROS seem to be robust to distribution misspecification (Gilliom and Helsel, 1986; Shoari et al., 2015).
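For reference, the quantity that MLE maximizes under the lognormal assumption can be written in the standard censored-likelihood form below. The notation is introduced here and does not come from the slides: $x_i$ are detected concentrations, $\mathrm{DL}_i$ are the detection limits of the censored observations, and $\phi$ and $\Phi$ are the standard normal density and distribution function.

```latex
L(\mu, \sigma) =
  \prod_{i \,\in\, \text{detected}}
    \frac{1}{x_i \sigma}\,
    \phi\!\left(\frac{\ln x_i - \mu}{\sigma}\right)
  \;\prod_{i \,\in\, \text{censored}}
    \Phi\!\left(\frac{\ln \mathrm{DL}_i - \mu}{\sigma}\right),
\qquad
\widehat{\text{mean}} = \exp\!\left(\hat{\mu} + \tfrac{1}{2}\hat{\sigma}^{2}\right)
```

Because the estimated mean depends on $\hat{\sigma}^2$ through an exponential, an error in $\hat{\sigma}$ is magnified in the mean. This is one plausible reason the lognormal MLE is sensitive when the data are highly skewed and the sample is small.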

  8. Basics of bootstrap
     [Figure: the real world contrasted with the bootstrap world]

  9. Data-based simulation used to quantify the uncertainty in the mean estimates
     Real world: unknown mean $\mu$; Sample 1: $x = (x_1, x_2, \ldots, x_n)$
     Bootstrap world: 1000 resamples drawn from the sample,
       $x^{(1)} = (x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)})$
       $x^{(2)} = (x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)})$
       $x^{(3)} = (x_1^{(3)}, x_2^{(3)}, \ldots, x_n^{(3)})$
       ...
       $x^{(1000)} = (x_1^{(1000)}, x_2^{(1000)}, \ldots, x_n^{(1000)})$
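The resampling scheme on this slide can be sketched as a plain nonparametric bootstrap: draw n observations with replacement from the sample, recompute the mean estimate, and repeat 1000 times. Everything named below is hypothetical, and the DL/2 estimator is only a short stand-in so the block runs on its own; any of the estimators discussed in the talk could be passed in instead. The percentile interval is one common way to summarize the spread and is an assumption here, since the slides do not state how the uncertainty was summarized.

```python
import random


def dl_half_mean(sample):
    """Stand-in estimator: replace each censored value by DL/2, then average."""
    return sum(v / 2 if c else v for v, c in sample) / len(sample)


def bootstrap_means(sample, estimator, n_boot=1000, seed=0):
    """Re-estimate the mean on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        # Resample (value, censored) pairs so each flag stays with its value.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(resample))
    return reps


def percentile_interval(reps, level=0.95):
    """Simple index-based percentile interval (no interpolation)."""
    ordered = sorted(reps)
    alpha = (1 - level) / 2
    lower = ordered[int(alpha * (len(ordered) - 1))]
    upper = ordered[int((1 - alpha) * (len(ordered) - 1))]
    return lower, upper


# Toy sample (hypothetical): (value or DL, censored flag).
sample = [(3.0, True), (1.0, False), (1.0, False),
          (2.0, False), (4.0, False), (6.0, False)]
reps = bootstrap_means(sample, dl_half_mean)
lower, upper = percentile_interval(reps)
print(f"95% percentile interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

Resampling whole (value, flag) pairs means the censoring percent varies from one resample to the next, so the spread of the replicates reflects the censoring as well as the sampling variability.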

  10. Description of data
      • Concentrations in soil samples collected to characterize a brownfield site in Montreal.
      • Samples were collected between 1998 and 2009 from a total of 242 boreholes dispersed over the site.
      • Concentrations of 15 metals and 22 polycyclic aromatic hydrocarbons (PAHs).
      • The concentration data include left-censored observations.

  11. Scenario 1
      • Large sample size
      • Small censoring percent
      • Low skewness
      Contaminant: Cobalt; n = 409; censoring = 31%; CV = 0.6
      Result: comparable estimates of uncertainty.

  12. Scenario 2
      • Large sample size
      • Medium censoring percent
      • Highly skewed
      Contaminant: Benzo(a)pyrene; n = 517; censoring = 51%; CV = 5.4
      Result: still similar estimates of uncertainty.

  13. Scenario 3
      • Large sample size
      • Highly skewed
      • High censoring percent
      Contaminant: Fluorene; n = 517; censoring = 63%; CV = 5.6
      Result: inflated uncertainty of the mean estimates obtained by MLE (lognormal).

  14. Scenario 4
      A decrease in sample size leads to overestimation of uncertainty by MLE (lognormal).

  15. Some examples: mean ± uncertainty percent

      Contaminant      MLE (lognormal)  MLE (Weibull)  MLE (gamma)  KM          rROS        GROS
      Cobalt           8.23±6%          8.15±6%        8.22±7%      8.28±7%     8.32±7%     8.26±7%
      Arsenic          9.30±18%         8.05±13%       7.90±13%     7.88±24%    8.53±13%    7.20±16%
      Chromium         16.67±8%         17.04±10%      17.11±11%    16.77±10%   16.92±11%   16.98±11%
      Benzo(a)pyrene   1.08±49%         0.88±39%       1.25±48%     1.27±47%    1.26±47%    1.24±48%
      Fluorene         1.86±67%         0.93±44%       1.02±49%     1.04±48%    1.03±48%    1.01±49%
      Naphthalene      0.83±51%         0.74±63%       1.27±45%     1.29±63%    1.28±63%    1.26±64%

  16. Lessons learned
      • Left-censored concentration data contribute some of the uncertainty.
      • For large data sets, all methods give comparable uncertainty.
      • Practitioners are cautioned about using the MLE method under the lognormal assumption when:
        - concentration data are highly skewed;
        - sample size is small;
        - censoring percent is large.

  17. Our recommendation
      • Appropriate use of the MLE method depends on the sample size and on what is known about the distribution of the concentration data.
      • The rROS, GROS, and KM methods generally perform well because they are:
        - robust to data skewness;
        - robust to sample size;
        - robust to censoring percent.

  18. References
      • Antweiler, R. C.; Taylor, H. E. 2008. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics. Environmental Science & Technology 42(10): 3732-3738.
      • Gilliom, R. J.; Helsel, D. R. 1986. Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques. Water Resour. Res. 22: 135-146.
      • Helsel, D. R. 2006. Fabricating data: How substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65: 2434-2439.
      • Helsel, D. R. 2012. Statistics for censored environmental data using Minitab and R. John Wiley & Sons; Vol. 77.
      • Shoari, N.; Dubé, J.-S.; Chenouri, S. 2015. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification. Chemosphere 138: 599-608.
