What uncertainty do we get? Zhenwen Dai, 11 October 2019


  1. What uncertainty do we get? Zhenwen Dai, 11 October 2019

  2. Probabilistic Models. Many probabilistic models have been discussed. We are interested in probabilistic models because they tell us how uncertain they are about their predictions. Uncertainty has been given various names, such as epistemic uncertainty, aleatoric uncertainty, model uncertainty, and noise. What do people mean by these types of uncertainty?

  3. Uncertainty in Discriminative Models. Regression as an example: $y = f(x) + \epsilon$. A simple example, Bayesian linear regression (BLR): $y_i = w^\top \Phi(x_i) + \epsilon_i$, with two random variables: $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $w \sim \mathcal{N}(0, I)$.
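To make the notation concrete, here is a minimal NumPy sketch of BLR. The RBF feature map, the toy data, and all numbers are illustrative assumptions (the talk leaves $\Phi$ and the data unspecified); the posterior formula is the standard closed-form result for a Gaussian prior and Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.1 ** 2                       # noise variance sigma^2

def Phi(x, centers=np.linspace(0, 1, 10), width=0.1):
    """An illustrative RBF feature map; any feature map would do."""
    return np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2 / width ** 2)

# Toy data generated from the model: y_i = w^T Phi(x_i) + eps_i
x = rng.uniform(0, 1, size=30)
w_true = rng.normal(size=10)            # w ~ N(0, I)
y = Phi(x) @ w_true + rng.normal(0, np.sqrt(sigma2), size=30)

# With Gaussian prior and noise, the posterior p(w | X, y) = N(mu, Sigma)
# is available in closed form:
P = Phi(x)                              # design matrix, 30 x 10
Sigma = np.linalg.inv(P.T @ P / sigma2 + np.eye(10))
mu = Sigma @ P.T @ y / sigma2
```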

  4. Uncertainty in Discriminative Models. By uncertainty, we usually mean how wide the distribution of the predicted variable is. For BLR, it refers to $\mathrm{var}(y_*) = \mathbb{E}_{p(y_* | x_*)}[(y_* - \bar{y}_*)^2]$. If we obtain the maximum likelihood estimate (MLE) of $w$, $\hat{w}$, the predictive distribution is $p(y_* | x_*, \hat{w})$ with $y_* = \hat{w}^\top \Phi(x_*) + \epsilon_*$. If we do Bayesian inference over $w$, the predictive distribution is $p(y_* | x_*) = \int p(y_* | x_*, w)\, p(w | X, y)\, dw$.
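Continuing the sketch above, the two predictive distributions can be computed side by side; both closed forms below are the standard BLR results, and the test point is an assumed example.

```python
# Predictive distributions at an assumed test input x*.
x_star = np.array([0.5])
phi_star = Phi(x_star)[0]               # feature vector Phi(x*)

# MLE plug-in: w_hat is the maximum-likelihood (least-squares) estimate;
# p(y* | x*, w_hat) = N(w_hat^T Phi(x*), sigma^2), so the predictive width
# comes from the noise alone and ignores uncertainty about w.
w_hat = np.linalg.lstsq(P, y, rcond=None)[0]
mle_mean, mle_var = w_hat @ phi_star, sigma2

# Bayesian: p(y* | x*) = integral of p(y* | x*, w) p(w | X, y) dw
#                      = N(mu^T Phi(x*), Phi(x*)^T Sigma Phi(x*) + sigma^2)
bayes_mean = mu @ phi_star
bayes_var = phi_star @ Sigma @ phi_star + sigma2
print(mle_var, bayes_var)               # bayes_var >= mle_var
```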

  5. Epistemic and Aleatoric Uncertainty.
  - Aleatoric uncertainty is also known as statistical uncertainty, and is representative of unknowns that differ each time we run the same experiment.
  - Epistemic uncertainty is also known as systematic uncertainty, and is due to things one could in principle know but doesn't in practice. This may be because a measurement is not accurate, because the model neglects certain effects, or because particular data has been deliberately hidden.

  6. Epistemic and Aleatoric Uncertainty in BLR. Use BLR as an example: $y_i = w^\top \Phi(x_i) + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, $w \sim \mathcal{N}(0, I)$. In the usual modeling scenario, $\epsilon$ corresponds to aleatoric uncertainty, measured as $\mathrm{var}(y_*) = \mathbb{E}_{p(y_* | x_*, \hat{w})}[(y_* - \bar{y}_*)^2] = \sigma^2$. $w$ corresponds to epistemic uncertainty, measured as $\mathrm{var}(f_*) = \mathbb{E}_{p(f_* | x_*)}[(f_* - \bar{f}_*)^2]$, where $f_* = w^\top \Phi(x_*)$.
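In the Bayesian predictive variance computed in the sketch above, the two terms line up exactly with the two kinds of uncertainty; a short continuation makes the split explicit.

```python
# The Bayesian predictive variance splits into two terms:
epistemic = phi_star @ Sigma @ phi_star  # var(f*), with f* = w^T Phi(x*)
aleatoric = sigma2                       # var(eps*)
total = epistemic + aleatoric            # var(y*)
# More data shrinks Sigma, and with it the epistemic term, toward zero;
# the aleatoric term sigma^2 does not shrink.
```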

  7. Separation of Uncertainty. With a probabilistic model, what we care about is the predictive distribution $p(y_* | x_*)$. The separation of epistemic and aleatoric uncertainty seems a bit artificial. Do we really need to separate them?

  8. Probability Calibration. It is a common question in practice whether we should trust the predictive probability. What does it mean when a weather forecasting method predicts a 70% probability of rain? This is a well-understood question in frequentist statistics. [Figure: a reliability diagram of outputs against confidence, with the gap between them giving a calibration error of 30.6.]
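A calibration error like the one in the figure is commonly computed with a binned construction such as the expected calibration error; the following is a generic sketch of that construction, not the code behind the slide's figure.

```python
import numpy as np

def expected_calibration_error(p, y, n_bins=10):
    """ECE for binary predictions: bin by predicted probability, then
    compare the mean prediction with the observed frequency in each bin."""
    p, y = np.asarray(p), np.asarray(y)
    edges = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p >= lo) & (p < hi)
        if mask.any():
            gap = abs(p[mask].mean() - y[mask].mean())
            ece += mask.mean() * gap
    return ece

# A forecaster saying "70% chance of rain" is calibrated if it rains on
# about 70% of the days that get that forecast:
p = np.full(1000, 0.7)
y = (np.random.default_rng(0).uniform(size=1000) < 0.7).astype(float)
print(expected_calibration_error(p, y))   # close to 0 => well calibrated
```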

  9. Probability Calibration for Aleatoric and Epistemic Uncertainty. Calibration makes sense for aleatoric uncertainty: the noise is i.i.d., $\epsilon_1, \ldots, \epsilon_N \sim p(\epsilon)$. What about probability calibration for epistemic uncertainty? Does the uncertainty from the exact Bayesian posterior warrant calibrated probability on the output? What if the measurement only happened once? How shall we choose a prior distribution? Would the uncertainty be calibrated in this case?

  10. Uncertainty in Decision Making. Alternatively, we may assess the quality of uncertainty by the performance of downstream tasks. Which uncertainty shall we use in Bayesian optimization or experimental design?

  11. Preferential Bayesian Optimization. Many functions that we are interested in optimizing are hard to measure:
  - user experience, e.g., UI design
  - movie/music ratings
  Humans are much better at comparing two things, e.g., is this coffee better than the previous one? The goal is to search for the most preferred option via pairwise comparisons only.

  12. Preference Function. Preference function: $p(y = 1 | x, x') = \pi(x, x') = \sigma(g(x') - g(x))$. Copeland function: $S(x) = \frac{1}{\mathrm{Vol}(\mathcal{X})} \int_{\mathcal{X}} \mathbb{I}_{\pi(x, x') \geq 0.5}\, dx'$. The minimum of the Copeland function corresponds to the most preferred choice. [Figure: an objective function with its global minimum, the corresponding preference function $\pi(x, x')$, and the Copeland and soft-Copeland functions.]
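The soft-Copeland function in the figure replaces the indicator with $\pi$ itself, which is easy to estimate by Monte Carlo. Below is a small sketch under assumed ingredients: a made-up latent utility $g$ (in practice it would be inferred from pairwise comparisons, e.g. with a GP preference model) and the unit interval as the domain $\mathcal{X}$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def g(x):
    """Hypothetical latent utility; stands in for the inferred g."""
    return np.sin(3 * x) + x ** 2

def soft_copeland(x, n_samples=10_000, seed=0):
    """Monte Carlo estimate of S(x) = 1/Vol(X) * integral of pi(x, x') dx'
    over X = [0, 1]. The hard Copeland score would instead integrate the
    indicator 1[pi(x, x') >= 0.5]."""
    rng = np.random.default_rng(seed)
    xp = rng.uniform(0.0, 1.0, size=n_samples)   # x' ~ Uniform(X)
    pi = sigmoid(g(xp) - g(x))                   # p(y=1 | x, x')
    return pi.mean()

# Following the slide's convention, the most preferred point minimizes S:
grid = np.linspace(0, 1, 201)
scores = np.array([soft_copeland(x) for x in grid])
print("most preferred x ~", grid[scores.argmin()])
```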

  13. Exploration. $p(y | x, x') = \pi(x, x')^y (1 - \pi(x, x'))^{1-y}$, with $\pi(x, x') = \sigma(f(x, x'))$. $\mathbb{E}[y] = \pi(x, x')$ and $\mathrm{var}(y) = \pi(x, x')(1 - \pi(x, x'))$. [Figure: heat maps over pairs $(x, x')$ of the expectation of $y$, the variance of $y$, and the variance of $\sigma(f)$.]
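The point of the figure is that $\mathrm{var}(y)$ (a Bernoulli variance, peaked wherever $\pi \approx 0.5$) is not the same quantity as the variance of $\sigma(f)$ under the posterior over $f$. A small sketch with an assumed Gaussian posterior $f \sim \mathcal{N}(\mu, s^2)$ at one test pair (placeholder values; in the talk's setting they would come from a GP preference model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

mu, s2 = 0.5, 1.5                       # assumed posterior over latent f
rng = np.random.default_rng(0)
f_samples = rng.normal(mu, np.sqrt(s2), size=100_000)
pi_samples = sigmoid(f_samples)

pi_mean = pi_samples.mean()
var_y = pi_mean * (1 - pi_mean)   # Bernoulli variance: aleatoric, high near pi = 0.5
var_pi = pi_samples.var()         # variance of sigma(f): epistemic, shrinks with data

print(f"E[y] ~ {pi_mean:.3f}, var(y) ~ {var_y:.3f}, var(sigma(f)) ~ {var_pi:.3f}")
```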

  14. Epistemic and aleatoric uncertainty are different. Exploration should be done only with epistemic uncertainty.

  15. What about composite models? [Figure: a composite graphical model for disease stratification, combining genotype, epigenotype, and environment through layers of latent representations linked to gene expression, clinical measurements and treatment, survival analysis, and data sources such as social networks, clinical notes, music, biopsy, and X-ray images.]

  16. Disclaimer. I don't know how to categorize the uncertainty from a probabilistic generative model for unsupervised learning such as VAE, GPLVM.

  17. Separation of Uncertainty in Complex Models. We need a systematic approach to separate epistemic and aleatoric uncertainty. Let's still focus on discriminative models: $y_i = f(x_i) + \epsilon_i$.

  18. Look back at BLR: $y_i = w^\top \Phi(x_i) + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, $w \sim \mathcal{N}(0, I)$. Aleatoric uncertainty: unknowns that differ each time we run the same experiment. Epistemic uncertainty: things one could in principle know but doesn't in practice.

  19. One way to classify.
  - Aleatoric uncertainty: unknowns that differ each time we run the same experiment. Independence among data points: $y_i = f(x_i, h_i)$ with a per-point latent variable $h_i$.
  - Epistemic uncertainty: things one could in principle know but doesn't in practice. A global variable: $y_i = f(x_i, h)$.

  20. Variables Shared by a Subset of Data Points. Aleatoric uncertainty: $y_i = f(x_i, h_i)$. Epistemic uncertainty: $y_i = f(x_i, h)$. What about something in between? $y_i = f(x_i, h_{z(i)})$, with $z: \{1, \ldots, N\} \to \{1, \ldots, C\}$, as in the sketch below.
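A generative sketch of this "in between" case, with all concrete choices (the function $f$, the group assignment, the sizes) being illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, sigma = 12, 3, 0.1

x = rng.uniform(0, 1, size=N)
z = rng.integers(0, C, size=N)        # group assignment z: {1..N} -> {1..C}
h = rng.normal(0, 1, size=C)          # one latent variable per group

def f(x, h):
    """Placeholder function; the talk leaves f unspecified."""
    return np.sin(3 * x) + h

y = f(x, h[z]) + rng.normal(0, sigma, size=N)
# h[z[i]] is epistemic within group z(i): more data from that group pins
# it down. Across groups it behaves like aleatoric noise.
```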

  21. An Example: Multi-output GP. Also known as intrinsic coregionalization. Each input location corresponds to $C$ different output dimensions: $\mathbf{f} = (f_{11}, \ldots, f_{1N}, \ldots, f_{C1}, \ldots, f_{CN})^\top$, $\mathbf{f} \mid X \sim \mathcal{N}(0, B \otimes K)$, where $B \in \mathbb{R}^{C \times C}$ and $K \in \mathbb{R}^{N \times N}$.
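A minimal sketch of sampling from this prior, assuming an RBF kernel for $K$ and a random positive semi-definite $B$ (both assumptions; the model leaves them free):

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.2):
    """Squared-exponential kernel."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(0)
N, C = 50, 3

X = np.linspace(0, 1, N)
K = rbf(X, X)                          # N x N input covariance
A = rng.normal(size=(C, 2))
B = A @ A.T + 0.1 * np.eye(C)          # C x C coregionalization matrix (PSD)

cov = np.kron(B, K)                    # (CN) x (CN) covariance B (x) K
f = rng.multivariate_normal(np.zeros(C * N), cov + 1e-8 * np.eye(C * N))
F = f.reshape(C, N)                    # row c: the c-th output over all inputs
```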

  22. Latent Variable Multi-output GP. Assume $B$ is a covariance matrix computed according to a kernel function $k(\cdot, \cdot)$ over a set of variables $h_1, \ldots, h_C$, where each $h_i$ is a latent variable, $h_i \sim \mathcal{N}(0, I)$. [Figure: left, the latent variables of nine tasks plotted in the two latent dimensions $h_0$ and $h_1$; right, the corresponding task functions $f(x)$.]
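Continuing the sketch above, $B$ is now computed from a kernel over per-task latent variables rather than being a free parameter; the RBF choice for that kernel and the sizes (nine tasks, matching the figure; two latent dimensions) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
C, Q = 9, 2                            # 9 tasks, 2-d latent space
H = rng.normal(size=(C, Q))            # latent variable h_c ~ N(0, I) per task

def rbf_mat(H, lengthscale=1.0):
    """RBF kernel matrix over the latent variables h_1, ..., h_C."""
    d2 = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

B = rbf_mat(H)                         # C x C task covariance from k(h_c, h_c')
# Plug this B into f | X ~ N(0, B (x) K) as before; at inference time H is
# treated as a latent variable (e.g. with a variational posterior).
```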

  23. Latent Variable Multi-output GP. [Figure: predictions with uncertainty on training and test data for three models: independent GPs (GP-ind), LMC, and LVMOGP.]

  24. Epistemic or Aleatoric? For multi-task learning, one output corresponds to a task. The uncertainty associated with $h_i$ is the epistemic uncertainty of the task. What if only one observation can be collected for each task? It becomes aleatoric! A better way to see it may be: epistemic within the group and aleatoric for other groups.
