Introduction to Artificial Intelligence, 2018 — Part III
Intro. on Artificial Intelligence from the perspective of probability theory
Luo Zhiling (罗智凌), luozhiling@zju.edu.cn
College of Computer Science, Zhejiang University
http://www.bruceluo.net
OUTLINE
• Strategies
• Algorithm
• Applications
Strategies
• Loss in the objective function
• 0-1 loss
• Quadratic loss
• Absolute loss
• Logarithmic loss (log-likelihood loss)
  – MLE
  – MAP
Generative/Discriminative Model
• Generating procedure:
  – P5 ~ b(α)
  – P30 ~ b(β)
  – θ ~ Multi(P5, P30, γ)
  – G ~ b(θ)
• Probability table:
  P5  P30  G   Prob
  Y   Y    Y   0.173
  Y   Y    N   0.075
  Y   N    Y   0.116
  Y   N    N   0.121
  N   Y    Y   0.075
  N   Y    N   0.127
  N   N    Y   0.179
  N   N    N   0.133
• [figure: graphical model with α → P5, β → P30, (P5, P30, γ) → θ, θ → G]
• Generative model: P(G, P5, P30 | α, β, γ), i.e. it models the full joint P(P5, P30, G, α, β, γ)
• Discriminative model: P(G | P5, P30, α, β, γ)
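A small sketch of the generating procedure above. The slide does not spell out the semantics of P5, P30, or G, so the parameter values and the support of the latent θ below are invented purely to show how such a probability table could arise from the generative story; b(·) is read as a Bernoulli distribution.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(7)

# Illustrative (made-up) parameters: alpha, beta are Bernoulli parameters,
# gamma gives multinomial weights over candidate theta values for each
# (P5, P30) configuration, and G ~ Bernoulli(theta).
alpha, beta = 0.5, 0.4
theta_values = np.array([0.2, 0.5, 0.8])   # support of the latent theta
gamma = {
    (1, 1): [0.1, 0.2, 0.7],
    (1, 0): [0.2, 0.6, 0.2],
    (0, 1): [0.6, 0.3, 0.1],
    (0, 0): [0.3, 0.4, 0.3],
}

def generate(n):
    rows = []
    for _ in range(n):
        p5 = int(rng.random() < alpha)                        # P5 ~ b(alpha)
        p30 = int(rng.random() < beta)                        # P30 ~ b(beta)
        theta = rng.choice(theta_values, p=gamma[(p5, p30)])  # theta ~ Multi(P5, P30, gamma)
        g = int(rng.random() < theta)                         # G ~ b(theta)
        rows.append((p5, p30, g))
    return rows

# Empirical joint frequencies of (P5, P30, G) play the role of the table on the slide.
counts = Counter(generate(100_000))
for key in sorted(counts, reverse=True):
    print(key, round(counts[key] / 100_000, 3))
```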
Maximum Likelihood Estimation
• arg max P(G, P5, P30 | α, β, γ)
• P(G, P5, P30 | α, β, γ) = ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ
• → arg min −log ( ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ )
Maximum A Posteriori (MAP)
P(α, β, γ | P5, P30, G) = P(P5, P30, G, α, β, γ) / P(P5, P30, G)
  = [ ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ · Q(α) Q(β) Q(γ) ] / P(P5, P30, G),
where Q(α), Q(β), Q(γ) are the priors on the parameters.
The posterior is a function of α, β, γ and can be written as f(α, β, γ), or f(α, β, γ; P5, P30, G).
Maximum A Posteriori (MAP)
• Log-likelihood loss: −log f(α, β, γ)
• Regularization (optional): λ (‖α‖ + ‖β‖ + ‖γ‖)
• Loss function (objective function):
  l = −log f(α, β, γ) + λ (‖α‖ + ‖β‖ + ‖γ‖)
  α*, β*, γ* = arg min l
• Solve with existing methods and tools such as stochastic gradient descent or hill climbing (MATLAB, Python); a minimal sketch follows.
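As a purely illustrative sketch of the recipe above: a single-parameter Bernoulli likelihood with an L2 penalty standing in for the prior term, minimized with a generic optimizer. The data, the penalty weight, and the sigmoid parameterization are assumptions made for the example, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up binary observations and regularization weight (lambda in the slides).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
lam = 0.1

def neg_log_posterior(w):
    # Unconstrained weight -> probability via the sigmoid, to keep 0 < p < 1.
    p = 1.0 / (1.0 + np.exp(-w[0]))
    log_lik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # Negative log-likelihood plus the regularization (prior) term.
    return -log_lik + lam * w[0] ** 2

# Any off-the-shelf solver works here; SGD or hill climbing would do as well.
result = minimize(neg_log_posterior, x0=np.array([0.0]))
print("estimated probability:", 1.0 / (1.0 + np.exp(-result.x[0])))
```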
MLE vs MAP
• MLE: arg max ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ
• MAP: arg max ∫ P(P5 | α) P(P30 | β) P(G | θ) P(θ | P5, P30, γ) dθ · Q(α) Q(β) Q(γ)
  – Q(α) Q(β) Q(γ): the prior on the parameters
Understand LDA with MLE
Generative vs Discriminative Models
OUTLINE
• Strategies
• Algorithm
  – Gradient Descent (GD)
  – EM algorithm
  – Sampling algorithms
• Applications
Gradient Descent
Batch/Stochastic Gradient
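A minimal contrast between the two flavors on a made-up least-squares problem (data and step sizes are illustrative): batch gradient descent uses the gradient over the entire data set at every step, while stochastic gradient descent updates from one randomly chosen example at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up linear-regression data: y = 2x + 1 + noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=200)
A = np.hstack([X, np.ones((200, 1))])   # add a bias column

def batch_gd(steps=500, lr=0.1):
    w = np.zeros(2)
    for _ in range(steps):
        grad = A.T @ (A @ w - y) / len(y)   # gradient over the full data set
        w -= lr * grad
    return w

def sgd(steps=5000, lr=0.05):
    w = np.zeros(2)
    for _ in range(steps):
        i = rng.integers(len(y))            # one example per update
        grad = (A[i] @ w - y[i]) * A[i]
        w -= lr * grad
    return w

print("batch GD:", batch_gd())   # both should approach [2, 1]
print("SGD:     ", sgd())
```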
Advanced Variants
• Momentum SGD
• Adagrad
  – Large learning rates for low-frequency parameters, small learning rates for high-frequency ones.
• Adadelta
  – An improvement on Adagrad: replaces the global sum of squared gradients with a local (windowed) one.
• Adam
  – Similar to Adagrad; adds the second moment of the gradients, making it more stable.
A minimal sketch of two of these update rules follows.
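The sketch below shows the momentum SGD and Adam update rules on a toy quadratic objective; the hyper-parameter values and the objective are illustrative, not from the slides.

```python
import numpy as np

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
def grad(w):
    return w

def momentum_sgd(w, steps=100, lr=0.1, beta=0.9):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)        # accumulate velocity
        w = w - lr * v                # move along the velocity
    return w

def adam(w, steps=100, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = np.zeros_like(w)              # first-moment estimate
    v = np.zeros_like(w)              # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

print(momentum_sgd(np.array([5.0, -3.0])))
print(adam(np.array([5.0, -3.0])))
```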
Expectation-Maximization algorithm
• Given a statistical model that generates a set X of observed data, a set of unobserved latent data or missing values Z, and a vector of unknown parameters θ, along with a likelihood function L(θ; X, Z) = p(X, Z | θ), the maximum likelihood estimate (MLE) of the unknown parameters is determined by maximizing the marginal likelihood of the observed data, L(θ; X) = p(X | θ) = ∫ p(X, Z | θ) dZ.
• The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these two steps:
  – Expectation step (E step): calculate the expected value of the log likelihood function, with respect to the conditional distribution of Z given X under the current estimate of the parameters θ(t):
    Q(θ | θ(t)) = E_{Z | X, θ(t)} [ log L(θ; X, Z) ]
  – Maximization step (M step): find the parameters that maximize this quantity:
    θ(t+1) = arg max_θ Q(θ | θ(t))
Sampling
• Conjugate-distribution-based sampling:
  1. The observation is a stochastic variable x drawn from a distribution φ with parameter μ.
  2. The parameter μ has a known prior distribution f with hyper-parameter ω.
  3. The pair (φ, f) belongs to one of the known conjugate families. For example, φ is a normal distribution and the prior f on its expectation (mean) is also a normal distribution.
• By choosing the prior cleverly according to the form of the conditional probability (likelihood) function, the posterior keeps the same functional form as the prior (a small worked example follows).
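A small worked example of the three conditions above, using the Beta-Bernoulli pair, one of the standard conjugate families; the hyper-parameters and observations below are made up.

```python
# Beta-Bernoulli conjugacy: a Beta(a, b) prior on the Bernoulli parameter
# stays a Beta distribution after observing data, so the posterior can be
# written down (and sampled from) without any numerical integration.
a, b = 2.0, 2.0                        # hyper-parameters of the Beta prior (illustrative)
observations = [1, 0, 1, 1, 1, 0, 1]   # made-up Bernoulli draws

heads = sum(observations)
tails = len(observations) - heads

# Posterior is Beta(a + heads, b + tails): the same family as the prior.
post_a, post_b = a + heads, b + tails
posterior_mean = post_a / (post_a + post_b)
print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
```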
Discrete distributions
Conjugate Priors
Conjugate priors
Gibbs Sampling
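A minimal Gibbs-sampling sketch. The target distribution (a correlated bivariate Gaussian) and the correlation value are chosen only because its full conditionals are easy to write down; this illustrates the idea and is not an algorithm taken from the slides.

```python
import numpy as np

# Gibbs sampling for a 2-D standard Gaussian with correlation rho: each step
# draws one coordinate from its exact conditional given the other.
rho = 0.8
n_samples = 5000
x, y = 0.0, 0.0
samples = []

rng = np.random.default_rng(0)
for _ in range(n_samples):
    # For this target, p(x | y) and p(y | x) are 1-D Gaussians.
    x = rng.normal(loc=rho * y, scale=np.sqrt(1 - rho ** 2))
    y = rng.normal(loc=rho * x, scale=np.sqrt(1 - rho ** 2))
    samples.append((x, y))

samples = np.array(samples)
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])
```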
OUTLINE
• About AI
• Preliminaries about Bayesian
• Generative/Discriminative Model
• Applications
  – Markov Model
  – Markov Network
  – Neural Network
Markov Rule
• A discrete-time Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that the probability of moving to the next state depends only on the present state and not on the previous states.
• First-order Markov and p-order Markov.
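A quick sketch of the Markov property in code: a two-state discrete-time chain where each transition depends only on the current state. The transition matrix is made up for illustration.

```python
import numpy as np

# First-order Markov chain over two states. P[i, j] = probability of moving
# from state i to state j.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

rng = np.random.default_rng(42)
state = 0
trajectory = [state]
for _ in range(1000):
    # The next state depends only on the current state (the Markov property).
    state = rng.choice(2, p=P[state])
    trajectory.append(state)

# Long-run state frequencies approximate the stationary distribution.
print("empirical state frequencies:", np.bincount(trajectory) / len(trajectory))
```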
Random Field
• Markov Random Field
• Gibbs Random Field
• Conditional Random Field
• Gaussian Random Field
Markov Network
• Markov Chain
  [figure: the example variables (α, β, P5, P30, θ, G) unrolled over time as a chain]
Hidden Markov Model
• Based on the Markov chain rule.
• Applications:
  – Speech recognition
  – Gesture and handwriting recognition
  – Fault detection
Markov Network
• Hidden Markov Model
  [figure: the example model (P5, P30, θ, G) drawn as a hidden Markov model over several time steps]
Markov Random Field
• Information coding
• Population simulation models
  [figure: grid-structured Markov random field over P5, P30, G]
Neural Network
• Latent variable → hidden layer
• Automatic feature mixing (nonlinear mixing)
• Classification / regression
  [figure: P5 and P30 as inputs, θ nodes as the hidden layer, G as the output]
Mixtures of Gaussians
Gaussian mixture distribution
• Definition: p(x) = ∑_{k=1}^{K} π_k N(x | μ_k, Σ_k)
• Introduce a K-dimensional binary random variable z = (z1, z2, …, zK)^T with a 1-of-K representation (the latent variable), where p(z_k = 1) = π_k.
• If z_k = 1, then p(x | z) = N(x | μ_k, Σ_k).
• Equivalent formulation of the Gaussian mixture: p(x) = ∑_z p(z) p(x | z) = ∑_{k=1}^{K} π_k N(x | μ_k, Σ_k)
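A small sketch of the latent-variable formulation above: ancestral sampling from a one-dimensional mixture by first drawing the 1-of-K variable z and then x given z. The weights, means, and variances below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 1-D mixture: weights pi_k, means mu_k, standard deviations.
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 3.0])
sigma = np.array([0.5, 1.0])

# Ancestral sampling: first draw the 1-of-K latent variable z, then x | z.
def sample_gmm(n):
    ks = rng.choice(len(pi), size=n, p=pi)   # z_k = 1 with probability pi_k
    return rng.normal(loc=mu[ks], scale=sigma[ks]), ks

x, ks = sample_gmm(10_000)
print("component frequencies:", np.bincount(ks) / len(ks))
print("sample mean:", x.mean())   # close to sum_k pi_k * mu_k = 1.5
```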
Gaussian mixture distribution
• Responsibility: γ(z_k) ≡ p(z_k = 1 | x) = π_k N(x | μ_k, Σ_k) / ∑_{j=1}^{K} π_j N(x | μ_j, Σ_j)
Gaussian mixture distribution
The difficulty of estimating parameters in GMM by ML
• The log of the likelihood function of GMM: ln p(X | π, μ, Σ) = ∑_{n=1}^{N} ln { ∑_{k=1}^{K} π_k N(x_n | μ_k, Σ_k) }
• Issue #1: singularities
  – A component collapses onto a specific data point.
• Issue #2: identifiability
  – There are K! equivalent solutions in total.
• Issue #3: no closed-form solution
  – The derivatives of the log likelihood are complex.
Expectation-Maximization algorithm for GMM
• E step: evaluate the responsibilities (weighting factors) using the current parameter values:
  γ(z_nk) = π_k N(x_n | μ_k, Σ_k) / ∑_{j=1}^{K} π_j N(x_n | μ_j, Σ_j)
• M step: re-estimate the parameters using the responsibilities, with N_k = ∑_{n=1}^{N} γ(z_nk):
  – Solve μ_k: μ_k = N_k^{-1} ∑_{n=1}^{N} γ(z_nk) x_n
  – Solve Σ_k: Σ_k = N_k^{-1} ∑_{n=1}^{N} γ(z_nk) (x_n − μ_k)(x_n − μ_k)^T
  – Solve π_k: π_k = N_k / N
• Each iteration will increase the log likelihood function.
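A compact sketch of the E/M updates above for a one-dimensional mixture of two Gaussians; the synthetic data, number of iterations, and initialization are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 1-D data drawn from two Gaussians, used only to exercise the loop.
data = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(3.0, 1.0, 350)])

K = 2
pi = np.full(K, 1.0 / K)              # mixing coefficients
mu = rng.choice(data, size=K)         # initial means
var = np.full(K, data.var())          # initial variances

def normal_pdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(100):
    # E step: responsibilities gamma[n, k] = p(z_k = 1 | x_n).
    dens = np.stack([pi[k] * normal_pdf(data, mu[k], var[k]) for k in range(K)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)

    # M step: re-estimate parameters from the responsibility-weighted data.
    Nk = gamma.sum(axis=0)
    mu = (gamma * data[:, None]).sum(axis=0) / Nk
    var = (gamma * (data[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(data)

print("weights:", pi, "means:", mu, "variances:", var)
```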
Expectation-Maximization algorithm for GMM
EM algorithm for GMM: experiment
• The Old Faithful data set
EM algorithm for GMM: experiment
• Illustration of the EM algorithm using the Old Faithful data set, as also used to illustrate the K-means algorithm.
Luo Zhiling (罗智凌)
luozhiling@zju.edu.cn
http://www.bruceluo.net