MIXTURE DENSITY NETWORKS Charles Martin - PowerPoint PPT Presentation


  1. MIXTURE DENSITY NETWORKS Charles Martin

  10. SO FAR: RNNS THAT MODEL CATEGORICAL DATA
      Remember that most RNNs (and most deep learning models) end with a softmax layer. This layer outputs a probability distribution for a set of categorical predictions, e.g. image labels, letters, words, musical notes, robot commands, or moves in chess.
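
As a minimal sketch of what such a softmax layer computes, the snippet below turns a list of raw scores into a categorical distribution. The scores and the three note labels are made-up values for illustration, not taken from the slides.

```python
import math

def softmax(logits):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three categories, e.g. three musical notes.
labels = ["C", "E", "G"]
probs = softmax([2.0, 1.0, 0.1])

# The prediction is the category with the highest probability.
prediction = labels[probs.index(max(probs))]
print(dict(zip(labels, probs)), prediction)
```

The output is a probability for each of a fixed set of categories, which is why this kind of layer suits labels, letters, or notes but not real-valued signals.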

  11. EXPRESSIVE DATA IS OFTEN CONTINUOUS

  12. SO ARE BIO-SIGNALS (Image credit: Wikimedia)

  13. CATEGORICAL VS. CONTINUOUS MODELS

  20. NORMAL (GAUSSIAN) DISTRIBUTION
      “Standard” probability distribution. Has two parameters: mean (μ) and standard deviation (σ).
      Probability Density Function:
      $$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
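
The density above can be written directly as a small function. This is only a sketch; the name `normal_pdf` and its argument order are choices made here, not something defined in the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(x | mu, sigma^2); sigma is the standard deviation (> 0)."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Density at the mean of a standard normal (mu = 0, sigma = 1).
peak = normal_pdf(0.0, 0.0, 1.0)
print(peak)
```

Note that the value is a density, not a probability: for small σ it can exceed 1.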

  24. PROBLEM: NORMAL DISTRIBUTION MIGHT NOT FIT DATA
      What if the data is complicated? It’s easy to “fit” a normal model to any data: just calculate μ and σ. But this might not fit the data well.
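
To make the problem concrete, here is a small sketch that fits a single normal to data with two clusters. The data is synthetic and its cluster locations are assumptions chosen for illustration. The check compares how much data really lies near the fitted mean with how much the fitted normal claims should lie there.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed synthetic data: two clusters, chosen only for illustration.
data = [rng.gauss(-5.0, 2.0) for _ in range(5000)] + \
       [rng.gauss(5.0, 3.0) for _ in range(5000)]

# "Fitting" a single normal: just the sample mean and standard deviation.
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

# Fraction of the data that actually lies within 1 unit of the fitted mean...
observed = sum(abs(x - mu) < 1.0 for x in data) / len(data)
# ...versus the fraction the fitted normal says should lie there.
predicted = math.erf(1.0 / (sigma * math.sqrt(2.0)))

print(f"mu={mu:.2f} sigma={sigma:.2f} observed={observed:.3f} predicted={predicted:.3f}")
```

The fitted normal places its peak between the two clusters, exactly where the data is sparse, so it predicts noticeably more mass near the mean than is actually observed.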

  30. MIXTURE OF NORMALS
      Three groups of parameters:
      means (μ): location of each component
      standard deviations (σ): width of each component
      weights (π): height of each curve
      Probability Density Function:
      $$p(x) = \sum_{i=1}^{K} \pi_i \, \mathcal{N}(x \mid \mu_i, \sigma_i^2)$$
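
The mixture density is just a weighted sum of normal densities, which the sketch below spells out. The function names are chosen here for illustration, and the check that the weights sum to 1 is an added assumption (it is what makes p(x) a valid density) rather than something stated on the slide.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a single normal component."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def mixture_pdf(x, weights, means, sigmas):
    """p(x) = sum_i pi_i * N(x | mu_i, sigma_i^2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

# With a single component (K = 1) the mixture is just a normal distribution.
single = mixture_pdf(0.3, [1.0], [0.0], [1.0])

# Hypothetical two-component example with unequal weights.
two = mixture_pdf(0.0, [0.3, 0.7], [-1.0, 2.0], [0.5, 1.5])
print(single, two)
```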

  36. THIS SOLVES OUR PROBLEM
      Returning to our modelling problem, let’s plot the PDF of an evenly weighted mixture of the two sample normal models. We set:
      K = 2
      π = [0.5, 0.5]
      μ = [−5, 5]
      σ = [2, 3]
      (bold used to indicate the vector of parameters for each component)
      In this case, I knew the right parameters, but normally you would have to estimate, or learn, these somehow…
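
A sketch of how this PDF could be evaluated with NumPy, using the parameter values from the slide. The grid range and resolution are arbitrary choices made here, and the plot itself is left out.

```python
import numpy as np

# Parameters from the slide.
pi = np.array([0.5, 0.5])
mu = np.array([-5.0, 5.0])
sigma = np.array([2.0, 3.0])

x = np.linspace(-20.0, 20.0, 4001)   # evaluation grid (an arbitrary choice)
dx = x[1] - x[0]

# Component densities, shape (len(x), K), computed via broadcasting.
components = np.exp(-((x[:, None] - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
p = components @ pi                  # weighted sum over the K components

area = p.sum() * dx                  # Riemann-sum estimate of the integral of p
left_peak = x[x < 0][np.argmax(p[x < 0])]
right_peak = x[x > 0][np.argmax(p[x > 0])]
print(area, left_peak, right_peak)
```

Passing `x` and `p` to a plotting library would draw the curve. It has one peak near each component mean, and the left peak is taller because that component is narrower.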

  43. MIXTURE DENSITY NETWORKS
      Neural networks used to model complicated real-valued data, i.e., data that might not be very “normal”.
      Usual approach: use a neuron with linear activation to make predictions. The training loss could be MSE (mean squared error).
      Problem! This is equivalent to fitting a single normal model! (See Bishop, C. (1994) for proof and more details.)
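
The equivalence can be sketched in a few lines. Here $f(x_n)$ stands for the network's prediction for input $x_n$, $y_n$ for the target, and $\sigma$ for a fixed standard deviation; this notation is introduced for illustration, and the full argument is in the Bishop reference.

```latex
% Negative log-likelihood of one target under a normal centred on the prediction
-\log \mathcal{N}\!\left(y_n \mid f(x_n), \sigma^2\right)
  = \frac{\left(y_n - f(x_n)\right)^2}{2\sigma^2} + \frac{1}{2}\log\!\left(2\pi\sigma^2\right)

% Summed over N training examples
-\sum_{n=1}^{N} \log \mathcal{N}\!\left(y_n \mid f(x_n), \sigma^2\right)
  = \frac{N}{2\sigma^2}\,\underbrace{\frac{1}{N}\sum_{n=1}^{N}\left(y_n - f(x_n)\right)^2}_{\text{MSE}}
  + \frac{N}{2}\log\!\left(2\pi\sigma^2\right)
```

With $\sigma$ held fixed, the second term is a constant and the first is a positive multiple of the MSE, so minimising MSE is the same as maximising the likelihood of a single normal centred on the prediction.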

  45. MIXTURE DENSITY NETWORKS
      Idea: output parameters of a mixture model instead!
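
One way this idea could look in code is sketched below with NumPy. Everything specific here is an assumption for illustration: the layout of the raw output vector (K weight scores, then K means, then K log standard deviations), the softmax and exponential transforms (a common choice for keeping π and σ valid), and the function names. The slides do not specify these details.

```python
import numpy as np

def mdn_params(raw, K):
    """Split a raw output vector of length 3*K into (pi, mu, sigma)."""
    logits, mu, log_sigma = raw[:K], raw[K:2 * K], raw[2 * K:]
    # Softmax keeps the weights positive and summing to 1.
    e = np.exp(logits - logits.max())
    pi = e / e.sum()
    # Exponential keeps the standard deviations positive.
    sigma = np.exp(log_sigma)
    return pi, mu, sigma

def mdn_nll(y, pi, mu, sigma):
    """Negative log-likelihood of a scalar target y under the mixture."""
    dens = np.exp(-((y - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return -np.log(np.sum(pi * dens))

# Stand-in for the output of a network's final linear layer (hypothetical values).
raw = np.zeros(6)
pi, mu, sigma = mdn_params(raw, K=2)
loss = mdn_nll(0.0, pi, mu, sigma)
print(pi, mu, sigma, loss)
```

Training would minimise this negative log-likelihood in place of MSE. A practical implementation would likely compute it in log space (log-sum-exp) to avoid underflow when a target lies far from every component.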
