  1. Introduction to Artificial Intelligence (人工智能引论), 2018, Lecture 3: Intro. on Artificial Intelligence from the perspective of probability theory. 罗智凌 (Luo Zhiling), luozhiling@zju.edu.cn, College of Computer Science, Zhejiang University, http://www.bruceluo.net

  2. OUTLINE • Strategies • Algorithm • Applications

  3. Strategies
  • Loss in the objective function:
    – 0-1 loss
    – Quadratic loss
    – Absolute loss
    – Logarithmic loss (log-likelihood loss): MLE, MAP
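A minimal Python sketch (not from the slides) of the four loss types listed above, for a single prediction against a single target; the function names and values are illustrative only.

```python
import numpy as np

def zero_one_loss(y, y_hat):
    # 0-1 loss: 0 if the prediction matches the label, 1 otherwise
    return 0.0 if y == y_hat else 1.0

def quadratic_loss(y, y_hat):
    # quadratic (squared-error) loss
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    # absolute-error loss
    return abs(y - y_hat)

def log_loss(y, p):
    # logarithmic (log-likelihood) loss for a binary label y in {0, 1}
    # and a predicted probability p = P(y = 1)
    p = np.clip(p, 1e-12, 1 - 1e-12)      # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(zero_one_loss(1, 0), quadratic_loss(1.0, 0.7),
      absolute_loss(1.0, 0.7), log_loss(1, 0.7))
```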

  4. Generative/Discriminative Model
  • Generating procedure:
    – P5 ~ b(α)
    – P30 ~ b(β)
    – θ ~ Multi(P5, P30, γ)
    – G ~ b(θ)
  • Joint probability table:
    P5  P30  G  | Prob
    Y   Y    Y  | 0.173
    Y   Y    N  | 0.075
    Y   N    Y  | 0.116
    Y   N    N  | 0.121
    N   Y    Y  | 0.075
    N   Y    N  | 0.127
    N   N    Y  | 0.179
    N   N    N  | 0.133
  • Generative model: P(G, P5, P30 | α, β, γ), built from the joint P(P5, P30, G, α, β, γ)
  • Discriminative model: P(G | P5, P30, α, β, γ)
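A minimal Python sketch of the generating procedure above. Assumptions: the slide does not spell out the form of θ ~ Multi(P5, P30, γ), so here γ is read as a small table that picks θ from the pair (P5, P30); all numeric values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 0.5, 0.5                      # illustrative Bernoulli parameters
gamma = {(1, 1): 0.7, (1, 0): 0.5,          # illustrative theta for each (P5, P30)
         (0, 1): 0.4, (0, 0): 0.6}

def sample_once():
    p5 = int(rng.random() < alpha)          # P5 ~ b(alpha)
    p30 = int(rng.random() < beta)          # P30 ~ b(beta)
    theta = gamma[(p5, p30)]                # theta chosen from (P5, P30) and gamma
    g = int(rng.random() < theta)           # G ~ b(theta)
    return p5, p30, g

samples = np.array([sample_once() for _ in range(10000)])
# Empirical joint distribution, comparable to the Prob column in the table above
rows, counts = np.unique(samples, axis=0, return_counts=True)
for row, count in zip(rows, counts):
    print(row, count / len(samples))
```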

  5. Maximum Likelihood Estimation
  • arg max P(G | P5, P30, α, β, γ)
  • P(G | P5, P30, α, β, γ) = ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ
  • → arg min −log ( ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ )

  6. Maximize A Posterior (MAP)
  • P(α, β, γ | P5, P30, G) = P(P5, P30, G, α, β, γ) / P(P5, P30, G)
    = [ ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ · Q(α) Q(β) Q(γ) ] / P(P5, P30, G)
  • The posterior is a function of α, β, γ and can be written f(α, β, γ) or f(α, β, γ; P5, P30, G).

  7. Maximize A Posterior (MAP)
  • Log-likelihood loss: −log ( f(α, β, γ) )
  • Regularization (optional): λ ( ‖α‖ + ‖β‖ + ‖γ‖ )
  • Loss function (objective function): l = −log f(α, β, γ) + λ ( ‖α‖ + ‖β‖ + ‖γ‖ ), with α*, β*, γ* = arg min l
  • Solved with existing methods and tools such as stochastic gradient descent or hill climbing (MATLAB, Python).
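A minimal Python sketch of the last bullet: handing a regularized negative log-likelihood to an off-the-shelf optimizer. The model here is a deliberately simple Bernoulli likelihood standing in for f, since the lecture's exact f is not written out; names and values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # illustrative observations
lam = 0.1                                          # regularization weight (lambda)

def loss(params):
    alpha = 1.0 / (1.0 + np.exp(-params[0]))       # keep the parameter in (0, 1)
    log_lik = np.sum(data * np.log(alpha) + (1 - data) * np.log(1 - alpha))
    return -log_lik + lam * np.abs(params[0])      # l = -log f + lambda * ||.||

result = minimize(loss, x0=np.array([0.0]), method="Nelder-Mead")
print("estimated parameter:", 1.0 / (1.0 + np.exp(-result.x[0])))
```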

  8. MLE vs MAP
  • MLE: arg max ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ
  • MAP: arg max ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ · Q(α) Q(β) Q(γ)
  • The difference is the prior on the parameters, Q(α) Q(β) Q(γ).
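A small numeric Python sketch (not from the slides) of the same contrast in the simplest case: one Bernoulli parameter with a Beta(a, b) prior, where both estimates have standard closed forms (MLE = k/n, MAP = (k + a − 1)/(n + a + b − 2)).

```python
import numpy as np

data = np.array([1, 1, 0, 1, 1, 1, 0, 1])    # illustrative coin flips
k, n = data.sum(), len(data)

mle = k / n                                   # maximizes the likelihood alone

a, b = 2.0, 2.0                               # Beta(2, 2) prior pulls the estimate toward 0.5
map_est = (k + a - 1) / (n + a + b - 2)       # maximizes likelihood * prior

print("MLE:", mle, "MAP:", map_est)
```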

  9. Understand LDA with MLE

  10. Generative vs Discriminative Models

  11. OUTLINE • Strategies • Algorithm – Gradient Descent (GD) – EM algorithm – Sampling algorithms • Applications

  12. Gradient Descent

  13. Batch/Stochastic gradient
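A minimal Python sketch (not from the slides) contrasting batch gradient descent with stochastic gradient descent on a least-squares problem y ≈ Xw; the data, step sizes, and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

epochs = 50

# Batch gradient descent: one update per pass, using the gradient over all samples.
w, lr = np.zeros(3), 0.05
for _ in range(epochs):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
print("batch GD:", w)

# Stochastic gradient descent: one update per sample, using a single-sample gradient
# (with a smaller step size for the noisier per-sample updates).
w, lr = np.zeros(3), 0.01
for _ in range(epochs):
    for i in rng.permutation(len(y)):
        grad_i = 2 * X[i] * (X[i] @ w - y[i])
        w -= lr * grad_i
print("SGD:     ", w)
```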

  14. Advanced Variants
  • Momentum SGD
  • Adagrad – large learning rates for low-frequency parameters, small ones for high-frequency parameters.
  • Adadelta – an improvement on Adagrad: replaces the global sum of squared gradients with a local (windowed) one.
  • Adam – similar to Adagrad; adds the second moment of the gradient and is more stable.
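A minimal Python sketch (not from the slides) of the Adagrad and Adam update rules on a toy quadratic loss; the hyper-parameters are the commonly used defaults and are illustrative only.

```python
import numpy as np

def grad(w):
    # gradient of the toy loss 0.5 * ||w - 1||^2
    return w - 1.0

# Adagrad: divide the step by the root of the accumulated squared gradients.
w, cache = np.zeros(2), np.zeros(2)
lr, eps = 0.5, 1e-8
for _ in range(200):
    g = grad(w)
    cache += g ** 2
    w -= lr * g / (np.sqrt(cache) + eps)
print("Adagrad:", w)

# Adam: exponential moving averages of the gradient (m) and of its square (v),
# with bias correction.
w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 201):
    g = grad(w)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print("Adam:   ", w)
```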

  15. Expectation–Maximization (EM) algorithm
  • Given a statistical model that generates a set X of observed data, a set of unobserved latent data or missing values Z, and a vector of unknown parameters θ, along with a likelihood function L(θ; X, Z) = p(X, Z | θ), the maximum likelihood estimate (MLE) of the unknown parameters is determined by maximizing the marginal likelihood of the observed data, L(θ; X) = p(X | θ) = ∫ p(X, Z | θ) dZ.
  • The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these two steps:
    – Expectation step (E step): calculate the expected value of the log-likelihood function with respect to the conditional distribution of Z given X under the current estimate of the parameters θ(t): Q(θ | θ(t)) = E_{Z | X, θ(t)} [ log L(θ; X, Z) ]
    – Maximization step (M step): find the parameters that maximize this quantity: θ(t+1) = arg max_θ Q(θ | θ(t))

  16. Sampling
  • Conjugate-distribution-based sampling:
    1. The observation is a stochastic variable x drawn from a distribution φ with parameter μ.
    2. The parameter μ has a known prior distribution f with hyper-parameter ω.
    3. The pair (φ, f) is one of the existing conjugate pairs; for example, φ is a normal distribution and the prior f on its expectation is also a normal distribution.
  • By choosing the prior to match the likelihood (conditional probability) function, the posterior keeps the same functional form as the prior.
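A standard concrete instance of such a conjugate pair (a textbook example, not taken from the slides): with a Beta(a, b) prior on a Bernoulli parameter μ and observations x1, …, xn containing k successes, the posterior is p(μ | x1, …, xn) ∝ μ^(k+a−1) (1−μ)^(n−k+b−1), i.e. μ | x1, …, xn ~ Beta(a + k, b + n − k). The posterior keeps the Beta form of the prior, with the hyper-parameters simply updated by the observed counts; this is what makes sampling the parameter straightforward.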

  17. Discrete distributions

  18. Conjugate Priors

  19. Conjugate priors

  20. Gibbs Sampling
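A minimal Python sketch (not from the slides) of Gibbs sampling for a standard bivariate normal with correlation ρ, drawing each coordinate in turn from its conditional given the other (those conditionals are N(ρ·other, 1 − ρ²)); all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_samples = 0.8, 5000

x, y = 0.0, 0.0
samples = []
for _ in range(n_samples):
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))   # draw x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))   # draw y | x
    samples.append((x, y))

samples = np.array(samples)[500:]                    # discard burn-in
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])
```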

  21. OUTLINE • About AI • Preliminaries about Bayesian methods • Generative/Discriminative Model • Applications – Markov Model – Markov Network – Neural Network

  22. Markov Rule
  • A discrete-time Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that the probability of moving to the next state depends only on the present state and not on the previous states: P(Xn+1 | Xn, ..., X1) = P(Xn+1 | Xn).
  • First-order Markov and p-order Markov.
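A minimal Python sketch (not from the slides) of a first-order, discrete-time Markov chain simulated from a transition matrix; the states and probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
states = ["sunny", "rainy"]
P = np.array([[0.8, 0.2],        # P[i, j] = probability of moving from state i to state j
              [0.4, 0.6]])

s = 0                             # start in "sunny"
chain = [states[s]]
for _ in range(10):
    s = rng.choice(2, p=P[s])     # the next state depends only on the current state
    chain.append(states[s])
print(chain)
```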

  23. Random Field • Markov Random Field • Gibbs Random Field • Conditional Random Field • Gaussian Random Field

  24. Markov Network
  • Markov Chain [figure: chain-structured graphical model over P5, P30, θ, G with parameters α, β]

  25. Hidden Markov Model
  • Markov chain rule (first-order factorization): P(X1, X2, ..., Xn) = P(X1) ∏ P(Xt | Xt−1)
  • Applications: speech recognition; gesture and handwriting recognition; fault detection.
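A minimal Python sketch (not from the slides) of the HMM forward algorithm, which computes the probability of an observation sequence under the model; the transition, emission, and initial distributions here are made up.

```python
import numpy as np

pi = np.array([0.6, 0.4])             # initial state distribution
A = np.array([[0.7, 0.3],             # A[i, j] = P(state j at t+1 | state i at t)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],             # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])

obs = [0, 0, 1, 0]                    # an illustrative observation sequence

alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * B_i(o_1)
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]     # alpha_{t+1}(j) = sum_i alpha_t(i) * A[i, j] * B_j(o)
print("P(observations) =", alpha.sum())
```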

  26. Markov Network
  • Hidden Markov Model [figure: HMM graphical model over P5, P30, θ, G]

  27. Markov Random Field
  • Applications: information coding; population simulation models.
  [figure: grid-structured graphical model over P5, P30, G]

  28. Neural Network
  • Latent (intent) variable → hidden layer
  • Automatic feature mixing (non-linear mixing)
  • Classification / regression
  [figure: network with inputs P5, P30, hidden units θ, and output G]
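A minimal Python sketch (not from the slides) of a feed-forward network with one hidden layer, showing the non-linear mixing of the inputs (P5, P30) into a prediction for G; the weights are random and no training loop is included.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # inputs (P5, P30) -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # hidden units -> output G

def forward(x):
    h = np.tanh(x @ W1 + b1)                     # non-linear hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output: P(G = 1 | x)

x = np.array([1.0, 0.0])                         # e.g. P5 = yes, P30 = no
print("P(G = 1 | P5, P30) =", forward(x)[0])
```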

  29. Mixtures of Gaussians

  30. Gaussian mixture distribution
  • Definition (the standard form is written out below)
  • Introduce a K-dimensional binary random variable z = (z1, z2, …, zK)ᵀ (latent variable)
  • If zk = 1, then x is drawn from the k-th Gaussian component
  • Equivalent formulation of the Gaussian mixture in terms of p(z) and p(x | z); the posterior p(zk = 1 | x) is called the responsibility
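For reference, the standard forms of the mixture density and of the responsibility referred to above are:
  p(x) = Σ_{k=1..K} πk N(x | μk, Σk)
  p(zk = 1) = πk and p(x | zk = 1) = N(x | μk, Σk), so that p(x) = Σ_z p(z) p(x | z)
  γ(zk) ≡ p(zk = 1 | x) = πk N(x | μk, Σk) / Σ_{j=1..K} πj N(x | μj, Σj)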

  31. Gaussian mixture distribution
  • Responsibility [figure]

  32. Gaussian mixture distribution

  33. The difficulty of estimating parameters in GMM by ML
  • The log of the likelihood function of GMM: ln p(X | π, μ, Σ) = Σ_{n=1..N} ln ( Σ_{k=1..K} πk N(xn | μk, Σk) )
  • Issue #1: singularities – a component can collapse onto a specific data point
  • Issue #2: identifiability – there are K! equivalent solutions in total
  • Issue #3: no closed-form solution – the derivatives of the log likelihood are complex.

  34. Expectation-Maximization algorithm for GMM
  • E step: evaluate the responsibilities γ(znk) with the current parameter values
  • M step: re-estimate the parameters with the current responsibilities
    – Solve for μk (the responsibilities act as weighting factors)
    – Solve for Σk
    – Solve for πk
  • Each iteration will increase the log likelihood function.
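A minimal Python sketch (not from the slides) of these EM updates for a GMM: the E step computes the responsibilities and the M step re-estimates μk, Σk, πk with them as weights. It uses scipy for the Gaussian density; the initialization, covariance regularization, and fixed iteration count are simplistic and purely illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                          # mixing coefficients
    mu = X[rng.choice(N, K, replace=False)]           # initialize the means at data points
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

    for _ in range(n_iter):
        # E step: responsibilities gamma[n, k] = p(z_k = 1 | x_n)
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], sigma[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate the parameters with the responsibilities as weights
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return pi, mu, sigma

# Usage on synthetic 2-D data with two clusters:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
               rng.normal([3.0, 3.0], 0.5, size=(100, 2))])
pi, mu, sigma = em_gmm(X, K=2)
print("mixing coefficients:", pi)
print("means:", mu)
```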

  35. Expectation-Maximization algorithm for GMM

  36. EM algorithm for GMM: experiment • The Old Faithful data set

  37. EM algorithm for GMM: experiment • The Old Faithful data set: illustration of the EM algorithm using the Old Faithful data set, as used for the illustration of the K-means algorithm.

  38. 罗智凌 (Luo Zhiling), luozhiling@zju.edu.cn, http://www.bruceluo.net
