Machine Learning Safety with Applications to the Climate Sciences
Derek DeSantis, Phil Wolfram, Boian Alexandrov
May 11, 2020
Part I - Machine Learning Safety and why you should care
Recent Successes of Machine Learning/AI

  2. Machine Learning Safety? Challenges With Current Paradigm

  7. Examples
     • Explainable or transparent - interpretable decisions
     • Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure”
     • Human heuristics and unknown assumptions - loss functions and optimization schemes
     • Alignment - pursued actions not calibrated with the designer's (perhaps informally specified) objective
     • Data - hidden structure, low signal-to-noise ratio
     • Adversarial robustness - weakness to distribution shifts
     • ?...

  8. Part II - Applications to the Climate Sciences: developing robust, interpretable clustering

  9. Background

  10. Background Köppen-Geiger Model

  11. Figure 4: Köppen-Geiger map of North America (Peel et al.)

  16. Problem
     • Climate depends on more than temperature and precipitation.
     • Can only resolve land.
     • Does not adapt to a changing climate.
     • The cut-offs in the model are, to some extent, arbitrary.
     • No universal agreement on how many classes there should be.

  17. Background Clustering

  19. • Many different methods for clustering
     • Given $k \in \mathbb{N}$, K-means seeks to minimize the within-cluster variance:
       $$\sum_{j=1}^{k} \sum_{x_i \in U_j} \| x_i - m_j \|^2,$$
       where $m_j$ is the mean of cluster $U_j$.
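The K-means objective on this slide can be evaluated directly for any candidate partition. The following is a minimal sketch, not the authors' code: the function name `within_cluster_variance` and the toy data are assumptions made for illustration.

```python
import numpy as np

def within_cluster_variance(X, labels, k):
    """Sum over clusters of squared distances from each point to its cluster mean."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]          # the points x_i in U_j
        if len(members) == 0:
            continue                      # an empty cluster contributes nothing
        m_j = members.mean(axis=0)        # cluster mean m_j
        total += np.sum((members - m_j) ** 2)
    return float(total)

# Toy data (hypothetical): two tight pairs of points, far apart from each other.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])   # groups nearby points together
bad = np.array([0, 1, 0, 1])    # mixes the two pairs

print(within_cluster_variance(X, good, 2))
print(within_cluster_variance(X, bad, 2))
```

The partition that groups nearby points gives the smaller objective, which is the quantity K-means tries to drive down for a fixed k.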

  23. Problem
     • Dependence on algorithm of choice and hyperparameters.
       [Figure 5: Many clusterings (Cluster 1, Cluster 2, ..., Cluster n) of a dataset combined into a single consensus clustering.]
     • Clustering is ill-posed - it lacks a measure of “trust”.
     • Dependence on “hidden parameters” - scale of data.

  24. Background Proposed Solution

  28. Solution
     1. Leverage the discrete wavelet transform to classify across a multitude of scales.
     2. Use information theory to discover the most important scales to classify on.
     3. Taking these scales, combine classifications to produce a fuzzy clustering that assesses the trust at each point.
     [Diagram: Dataset → Cluster 1, Cluster 2, ..., Cluster n, each expanded into CGC 1, CGC 2, ..., CGC $L_i$ → Consensus Clustering.]

  29. Preliminary Tools

  30. Preliminary Tools Discrete Wavelet Transform and Mutual Information

  32. • The DWT splits a signal into high- and low-frequency components.
     • The low-frequency temporal signal captures climatology (seasons, years, decades), while the low-frequency spatial signal captures regional features (city, county, state).
     [Diagram: DWT of tensor, with DWT Space and DWT Time.]
     Definition: Given partitions of the data $U = \{U_j\}_{j=1}^{k}$ and $V = \{V_j\}_{j=1}^{l}$, the Mutual Information $NI(U, V)$ measures how much knowledge of one clustering reduces our uncertainty about the other.
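The definition above can be made concrete by computing mutual information from the contingency table of two labelings. This is a sketch under stated assumptions: the slide does not say how $NI$ is normalized, so dividing by the mean of the two entropies is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def mutual_information(u, v):
    """Mutual information (in nats) between two label arrays of equal length."""
    u = np.ravel(u)
    v = np.ravel(v)
    _, u_idx = np.unique(u, return_inverse=True)
    _, v_idx = np.unique(v, return_inverse=True)
    # Joint distribution: p_uv[a, b] = fraction of points in both U_a and V_b.
    p_uv = np.zeros((u_idx.max() + 1, v_idx.max() + 1))
    np.add.at(p_uv, (u_idx, v_idx), 1.0 / u.size)
    p_u = p_uv.sum(axis=1)
    p_v = p_uv.sum(axis=0)
    nz = p_uv > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p_uv[nz] * np.log(p_uv[nz] / np.outer(p_u, p_v)[nz])))

def entropy(u):
    """Shannon entropy (in nats) of a labeling."""
    _, counts = np.unique(np.ravel(u), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def normalized_mi(u, v):
    """MI divided by the mean entropy (an assumed normalization, one of several in use)."""
    h = 0.5 * (entropy(u) + entropy(v))
    return mutual_information(u, v) / h if h > 0 else 1.0
```

Two labelings that describe the same partition under different label names score 1, while two statistically independent labelings score 0.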

  33. Preliminary Tools L15 Gridded Climate Dataset - Livneh et al.

  34. • Gridded climate data set of North America.
     • Each grid cell holds monthly data from 1950-2013 and is six kilometers across.
     • Variables used: precipitation, maximum temperature, minimum temperature.

  35. Coarse-Grain Clustering (CGC)

  37. Coarse-Grain Clustering (CGC) The Algorithm

  43. [Diagram: six-step CGC pipeline; recoverable step labels: DWT (applied three times), Stack, Vectorize, Cluster, Label.]
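The step labels on this slide (DWT, Stack, Vectorize, Cluster, Label) suggest a pipeline that can be sketched roughly as follows. This is an illustration under several assumptions, not the authors' implementation: the Haar wavelet is assumed, only the low-frequency (approximation) coefficients are kept, K-means uses a simple farthest-point initialization, and `levels=0` here means no coarse-graining, which may not match the slides' indexing of $(\ell_s, \ell_t)$. All function names are hypothetical.

```python
import numpy as np

def haar_lowpass(x, axis, levels):
    """Keep only the Haar approximation coefficients along `axis`, `levels` times."""
    x = np.moveaxis(x, axis, 0)
    for _ in range(levels):
        if x.shape[0] % 2:                 # drop a trailing odd sample (a simplification)
            x = x[:-1]
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    return np.moveaxis(x, 0, axis)

def nearest(points, centers):
    """Index of the closest center for every point."""
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = nearest(points, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return nearest(points, centers)

def coarse_grain_cluster(variables, l_s, l_t, k):
    """variables: list of arrays shaped (time, rows, cols); returns a coarse label map."""
    coarse = []
    for v in variables:
        v = haar_lowpass(v, axis=0, levels=l_t)       # DWT in time
        v = haar_lowpass(v, axis=1, levels=l_s)       # DWT in space (rows)
        v = haar_lowpass(v, axis=2, levels=l_s)       # DWT in space (columns)
        coarse.append(v)
    stacked = np.concatenate(coarse, axis=0)          # Stack the variables
    n_feat, rows, cols = stacked.shape
    vectors = stacked.reshape(n_feat, rows * cols).T  # Vectorize: one row per grid cell
    labels = kmeans(vectors, k)                       # Cluster
    return labels.reshape(rows, cols)                 # Label the coarse grid

# Toy example (hypothetical data): the left half of the grid differs from the right half.
field = np.zeros((8, 4, 8))
field[:, :, 4:] = 10.0
labels = coarse_grain_cluster([field, 2 * field], l_s=1, l_t=1, k=2)
print(labels)
```

On the toy field, the coarse label map separates the left and right halves of the grid, at half the original spatial resolution.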

  44. Coarse-Grain Clustering (CGC) Results - Effect of Coarse-Graining

  45. Figure 6: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (1, 1)$

  46. Figure 7: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (2, 1)$

  47. Figure 8: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (4, 1)$

  48. Figure 9: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (1, 1)$

  49. Figure 10: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (1, 3)$

  50. Figure 11: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (1, 6)$

  51. Figure 12: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (1, 1)$

  52. Figure 13: CGC: K-means, $k = 10$, $(\ell_s, \ell_t) = (4, 6)$

  53. Mutual Information Ensemble Reduce (MIER)

  55. Mutual Information Ensemble Reduce (MIER) The Algorithm

  59. [Diagram: five-step MIER pipeline; recoverable step labels: Graph Cut, Find Representative.]
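The two labels that survive from this slide, Graph Cut and Find Representative, can be illustrated with a small sketch. Everything specific here is an assumption rather than the authors' method: the graph weights are pairwise normalized mutual information, the cut is a two-way spectral bipartition (the sign of the Fiedler vector, a standard relaxation of ratio cut), and the representative of each group is the member most similar to the rest of its group. The labelings are assumed to live on a common grid.

```python
import numpy as np

def nmi(u, v):
    """Mutual information of two labelings, divided by their mean entropy (assumed form)."""
    u, v = np.ravel(u), np.ravel(v)
    _, ui = np.unique(u, return_inverse=True)
    _, vi = np.unique(v, return_inverse=True)
    p = np.zeros((ui.max() + 1, vi.max() + 1))
    np.add.at(p, (ui, vi), 1.0 / u.size)
    pu, pv = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pu, pv)[nz]))
    h = -0.5 * (np.sum(pu * np.log(pu)) + np.sum(pv * np.log(pv)))
    return float(mi / h) if h > 0 else 1.0

def reduce_ensemble(labelings):
    """Split an ensemble of labelings into two groups and pick one representative each."""
    # Weighted graph: nodes are clusterings, edge weights are pairwise NMI.
    W = np.array([[nmi(a, b) for b in labelings] for a in labelings])
    np.fill_diagonal(W, 0.0)
    # Spectral bipartition: sign of the eigenvector for the second-smallest
    # eigenvalue of the graph Laplacian (the Fiedler vector).
    laplacian = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(laplacian)
    groups = (vecs[:, 1] >= 0).astype(int)
    reps = []
    for g in (0, 1):
        members = np.flatnonzero(groups == g)
        if members.size == 0:
            continue
        # Representative: the member with the largest total similarity to its group.
        score = W[np.ix_(members, members)].sum(axis=1)
        reps.append(int(members[score.argmax()]))
    return groups, reps
```

For example, calling `reduce_ensemble` on two near-identical "block" labelings and two near-identical "alternating" labelings places each pair in its own group and returns one index from each.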

  60. Mutual Information Ensemble Reduce (MIER) Results - Example for K-means K=10

  61. Figure 14: Results from the graph cut algorithm. The highlighted resolutions are the final ensemble. Vertical number = $\ell_s$, horizontal bar = $\ell_t$.

  62. (a) $(\ell_s, \ell_t) = (2, 1)$ (b) $(\ell_s, \ell_t) = (2, 4)$ (c) $(\ell_s, \ell_t) = (3, 5)$ (d) $(\ell_s, \ell_t) = (4, 4)$

  63. Consensus Clustering and Trust Algorithm

  65. Consensus Clustering and Trust Algorithm The Algorithm

  68. [Diagram: three-step consensus pipeline; recoverable labels: Class Labels, Signals, and classes $C_1, C_2, \dots, C_k$.]
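One simple way to turn an ensemble of class labels into a fuzzy clustering with a per-point trust value is to count votes. The sketch below is an assumption-laden illustration, not the algorithm from the slides: it assumes the labels from every ensemble member have already been aligned to a common set of classes 0..k-1, and it defines trust as the fraction of members that agree with the winning class. The function name is hypothetical.

```python
import numpy as np

def consensus_with_trust(aligned_labels, k):
    """aligned_labels: (n_members, n_points) ints in [0, k), aligned across members."""
    aligned_labels = np.asarray(aligned_labels)
    n_members, n_points = aligned_labels.shape
    # Fuzzy membership: fraction of ensemble members voting for each class.
    membership = np.zeros((n_points, k))
    for labels in aligned_labels:
        membership[np.arange(n_points), labels] += 1.0
    membership /= n_members
    consensus = membership.argmax(axis=1)   # winning class per point
    trust = membership.max(axis=1)          # agreement level per point
    return consensus, trust, membership

# Toy ensemble (hypothetical): four members, three points.
votes = [[0, 0, 1],
         [0, 1, 1],
         [0, 0, 1],
         [0, 1, 1]]
consensus, trust, membership = consensus_with_trust(votes, k=2)
print(consensus, trust)
```

Points on which every member agrees receive trust 1, while the point with a split vote receives trust 0.5, flagging it as a region where the classification is less reliable.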


