Intro. on Artificial Intelligence from the perspective of probability theory




  1. Introduction to Artificial Intelligence (Part 5), 2018: Intro. on Artificial Intelligence from the perspective of probability theory
     Luo Zhiling (罗智凌), luozhiling@zju.edu.cn
     College of Computer Science, Zhejiang University
     http://www.bruceluo.net

  2. Comparison of the two major branches

     Name       | Feed-Forward NN                        | (Stochastic) Recurrent NN
     -----------+----------------------------------------+--------------------------------
     Input      | Feature                                | Observation
     Output     | Ground truth                           | (Latent, visible) variables
     Learning   | Supervised learning                    | Unsupervised learning
     Model      | Discriminative model                   | Generative model
     Strategy   | Loss on ground truth (diff or entropy) | Loss on observation (energy)
     Algorithm  | Gradient descent                       | (Variational) EM, sampling
     Examples   | Perceptron, MLP, CNN                   | LSTM, Markov Field, RBM
     Hybrid     | DBN, GAN, pre-trained/two-phase learning, AutoEncoder

  3. [Diagram: landscape of models spanning the Feed-Forward NN, Recurrent NN and Stochastic Model families, with examples including Perceptron, MLP, CNN, FRCNN, LSTM, bi-LSTM, Hopfield Net, Markov Net, RBM, LDA, DBN, AutoEncoder, word2vec, GAN]

  4. OUTLINE
     • Recurrent NN
       – Long Short-Term Memory
     • Stochastic Model in Neural Network
       – Hopfield Nets
       – Restricted Boltzmann Machine
       – Sleep/Wake Model
       – Echo-State Model
     • Hybrid Model
       – Deep Belief Network
       – AutoEncoder
       – Generative Adversarial Network

  5. OUTLINE
     • Recurrent NN
       – Long Short-Term Memory
     • Stochastic Model in Neural Network
       – Hopfield Nets
       – Restricted Boltzmann Machine
       – Sleep/Wake Model
       – Echo-State Model
     • Hybrid Model
       – Deep Belief Network
       – AutoEncoder
       – Generative Adversarial Network

  6. From multilayer perceptron (MLP) to Recurrent Neural Network to LSTM
     • A Multi-Layer Perceptron (MLP) is by nature a feed-forward directed acyclic network.
     • An MLP consists of multiple layers and can map input data to output data via a set of nonlinear activation functions. An MLP is trained with a supervised learning technique called backpropagation.
     • However, an MLP cannot learn mapping functions in which there are dependencies between input data points (i.e., sequential data).
     [Diagram: Input → Mapping → Output]
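As a minimal sketch of the mapping described above (not the course's reference code), the following numpy forward pass shows a two-layer MLP; the layer sizes, tanh activation, and weight names are illustrative assumptions:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One feed-forward pass: input -> hidden (tanh) -> output.

    There is no feedback path, so each input is mapped independently of any
    previous inputs; the network has no notion of a sequence.
    """
    h = np.tanh(W1 @ x + b1)   # hidden layer with a nonlinear activation
    y = W2 @ h + b2            # output layer (linear here)
    return y

# Illustrative sizes: 4 inputs, 8 hidden units, 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)
print(mlp_forward(rng.normal(size=4), W1, b1, W2, b2))
```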

  7. From multilayer perceptron (MLP) to Recurrent Neural Network to LSTM
     • Recurrent Neural Network: an RNN has recurrent connections (connections to previous time steps of the same layer).
     • RNNs are powerful but can get extremely complicated. Computations derived from earlier input are fed back into the network, which gives an RNN a kind of memory.
     • Standard RNNs suffer from both exploding and vanishing gradients due to their iterative nature.
     [Diagram: embedding-vector sequence input (x0 … xt) → Mapping → (ht)]
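The recurrence that gives an RNN its memory can be sketched in the same numpy style; the simple tanh cell, the names `W_x`/`W_h`, and the sizes are illustrative assumptions. The repeated multiplication by `W_h` across time steps is also where the exploding/vanishing gradients come from:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """The hidden state depends on the current input and the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def rnn_forward(xs, W_x, W_h, b):
    """Run the same cell over a whole sequence x_0 ... x_t."""
    h = np.zeros(W_h.shape[0])
    for x_t in xs:                     # earlier inputs influence later states through h
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), np.zeros(5)
seq = [rng.normal(size=3) for _ in range(10)]
print(rnn_forward(seq, W_x, W_h, b))
```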

  8. Recurrent Models of Visual Attention
     Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. Recurrent Models of Visual Attention.

  9. Long Short-Term Memory (LSTM) Model
     • LSTM is an RNN devised to deal with the exploding and vanishing gradient problems of standard RNNs.
     • An LSTM hidden layer consists of a set of recurrently connected blocks, known as memory cells.
     • Each memory cell is controlled by three multiplicative units: the input, output and forget gates.
     • The input to the cell is multiplied by the activation of the input gate, the output to the net is multiplied by the output gate, and the previous cell values are multiplied by the forget gate.
     Sepp Hochreiter & Jürgen Schmidhuber. Long short-term memory. Neural Computation, Vol. 9(8), pp. 1735-1780, MIT Press, 1997.

  10. LSTM: 2 states, 3 gates, 4 layers
     • States: cell state and hidden state
     • Gates: forget, write (input) and read (output) gates
     • Layers: 3 sigmoid perceptrons and 1 tanh perceptron
     [Diagram: LSTM cell with its cell state, hidden state and input]

  11. Forget gate
     • The cell state runs through time.
     • The forget gate passes the hidden state at t-1 and the input at t through a sigmoid function.
     • Forget signal: 1 represents "completely keep this", 0 represents "completely forget this".
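In the usual notation (following Hochreiter & Schmidhuber, with the common concatenated-input formulation this slide appears to use), the forget signal is:

```latex
f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad f_t \in (0, 1)
```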

  12. Input (write) gate
     • The input gate takes the hidden state at t-1 and the input at t; a separate layer produces the content to write.
     • Write signal: 1 represents "completely write this", 0 represents "completely ignore this".
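In the same standard formulation, the write signal and the candidate content to write are:

```latex
i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)
```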

  13. Update cell state
     • The cell state at t-1 is combined with the content to write, gated by the forget and write signals, to produce the updated cell state.
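In the standard formulation, the update is an element-wise blend of the old cell state and the new content:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```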

  14. Output (read) gate
     • The output gate takes the hidden state at t-1 and the input at t, and yields the updated hidden state at t.
     • Read signal: 1 represents "completely read this", 0 represents "completely ignore this".
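In the standard formulation, the read signal gates how much of the squashed cell state is exposed as the new hidden state:

```latex
o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad
h_t = o_t \odot \tanh(C_t)
```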

  15. Language Translation

  16. Stock Prediction

  17. OUTLINE
     • Recurrent NN
       – Long Short-Term Memory
     • Stochastic Model in Neural Network
       – Hopfield Nets
       – Restricted Boltzmann Machine
       – Sleep/Wake Model
       – Echo-State Model
     • Hybrid Model
       – Deep Belief Network
       – AutoEncoder
       – Generative Adversarial Network

  18. Stochastic NN
     • Energy-based probability distribution on the (latent, visible) variables:
       p(x) = e^{-E(x)} / Z,   where Z = Σ_x e^{-E(x)}
     • Z is called the partition function.
     • Loss function (average negative log-likelihood over the data set D of size N):
       L(θ, D) = -(1/N) Σ_{x∈D} log p(x) = (1/N) Σ_{x∈D} E(x) + log Z
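A brute-force sketch of the partition function for a toy quadratic energy (the energy form, sizes, and helper names are illustrative assumptions) shows why Z is the expensive part of the loss: it sums over all 2^n states, which is only feasible for tiny n:

```python
import itertools
import numpy as np

def energy(x, W, b):
    """Toy quadratic energy E(x) = -b'x - 0.5 * x'Wx (W symmetric, zero diagonal)."""
    return -b @ x - 0.5 * x @ W @ x

def log_partition(W, b, n):
    """Brute-force log Z over all 2^n binary states; exponential in n."""
    states = itertools.product([0, 1], repeat=n)
    energies = np.array([energy(np.array(s), W, b) for s in states])
    return np.log(np.exp(-energies).sum())

def log_prob(x, W, b):
    """log p(x) = -E(x) - log Z for the energy-based distribution."""
    return -energy(x, W, b) - log_partition(W, b, len(x))

rng = np.random.default_rng(0)
n = 4
W = rng.normal(size=(n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
b = rng.normal(size=n)
print(log_prob(np.array([1, 0, 1, 1]), W, b))
```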

  19. Hopfield Nets
     • A Hopfield net is composed of binary threshold units with recurrent connections between them.

  20. The energy function
     • The global energy is the sum of many contributions. Each contribution depends on one connection weight and the binary states of two neurons:
       E = − Σ_i s_i b_i − Σ_{i<j} s_i s_j w_ij
     • This simple quadratic energy function makes it possible for each unit to compute locally how its state affects the global energy:
       Energy gap = ΔE_i = E(s_i = 0) − E(s_i = 1) = b_i + Σ_j s_j w_ij
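A direct transcription of these two formulas into code (assuming symmetric weights with zero diagonal and binary states in {0, 1}; the function names are illustrative):

```python
import numpy as np

def hopfield_energy(s, W, b):
    """Global energy E = -sum_i s_i*b_i - sum_{i<j} s_i*s_j*w_ij (W symmetric, zero diagonal)."""
    return -s @ b - 0.5 * s @ W @ s

def energy_gap(s, W, b, i):
    """Delta E_i = E(s_i=0) - E(s_i=1) = b_i + sum_j s_j*w_ij,
    computed locally from unit i's bias and the states of its neighbours."""
    return b[i] + W[i] @ s
```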

  21. Settling to an energy minimum
     • To find an energy minimum in this net, start from a random state and then update units one at a time in random order.
       – Update each unit to whichever of its two states gives the lowest global energy.
       – i.e. use binary threshold units.
     [Diagram: a five-unit net with weights −4, 3, 2, 3, 3, −1, −1; current state has −E = goodness = 3; one unit marked "?" is about to be updated]

  22. Settling to an energy minimum
     • To find an energy minimum in this net, start from a random state and then update units one at a time in random order.
       – Update each unit to whichever of its two states gives the lowest global energy.
       – i.e. use binary threshold units.
     [Diagram: the same net after one update; −E = goodness is still 3; the next unit to update is marked "?"]

  23. Settling to an energy minimum
     • To find an energy minimum in this net, start from a random state and then update units one at a time in random order.
       – Update each unit to whichever of its two states gives the lowest global energy.
       – i.e. use binary threshold units.
     [Diagram: after updating the marked unit, −E = goodness rises from 3 to 4]
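The settling procedure on these slides can be sketched as an asynchronous update loop; it reuses the hypothetical `energy_gap` helper above, and `max_sweeps` and the random initialization are illustrative choices:

```python
import numpy as np

def settle(W, b, rng, max_sweeps=100):
    """Start from a random state, then update units one at a time in random order,
    setting each unit to whichever of its two states gives the lower global energy."""
    s = rng.integers(0, 2, size=len(b))
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(b)):
            new_si = int(energy_gap(s, W, b, i) > 0)   # binary threshold rule
            if new_si != s[i]:
                s[i], changed = new_si, True
        if not changed:        # no single flip lowers the energy: a (local) minimum
            break
    return s
```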

  24. A deeper energy minimum
     • The net has two triangles in which the three units mostly support each other.
       – Each triangle mostly hates the other triangle.
     • The triangle on the left differs from the one on the right by having a weight of 2 where the other one has a weight of 3.
       – So turning on the units in the triangle on the right gives the deepest minimum.
     [Diagram: the same net with the right-hand triangle turned on; −E = goodness = 5]

  25. A neat way to make use of this type of computation
     • Hopfield (1982) proposed that memories could be energy minima of a neural net.
       – The binary threshold decision rule can then be used to "clean up" incomplete or corrupted memories.
     • The idea of memories as energy minima was proposed by I. A. Richards in 1924 in "Principles of Literary Criticism".
     • Using energy minima to represent memories gives a content-addressable memory:
       – An item can be accessed by just knowing part of its content.
       – It is robust against hardware damage.
       – It's like reconstructing a dinosaur from a few bones.
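The content-addressable behaviour can be sketched with the classic Hebbian storage rule (not stated on the slide, so this is an assumption); note that this sketch uses ±1 states rather than the 0/1 states above and a simplified synchronous recall, and the stored patterns are random illustrative data:

```python
import numpy as np

def store(patterns):
    """Hebbian storage: W = (1/n) * sum_p x_p x_p' with zero diagonal (states in {-1, +1})."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, cue, steps=10):
    """Repeatedly apply the threshold rule until the state stops changing."""
    s = cue.copy()
    for _ in range(steps):
        new_s = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(new_s, s):
            break
        s = new_s
    return s

rng = np.random.default_rng(0)
memories = rng.choice([-1, 1], size=(3, 32))   # three stored patterns
W = store(memories)
noisy = memories[0].copy()
noisy[:6] *= -1                                # corrupt part of the first memory
# Typically True for a small number of stored patterns: the full memory is
# recovered from partial/corrupted content.
print(np.array_equal(recall(W, noisy), memories[0]))
```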

  26. OUTLINE
     • Recurrent NN
       – Long Short-Term Memory
     • Stochastic Model in Neural Network
       – Hopfield Nets
       – Restricted Boltzmann Machine
       – Sleep/Wake Model
       – Echo-State Model
     • Hybrid Model
       – Deep Belief Network
       – AutoEncoder
       – Generative Adversarial Network

  27. Boltzmann Machine
     • The figure shows a Boltzmann Machine (BM): the blue nodes form the hidden layer and the white nodes form the visible (input) layer.
     • Compared with a Hopfield net, the parameters are not fixed, and the data are fed in as observations of v.
     • Compared with a recurrent neural network:
       – 1. An RNN essentially learns a function, so it has the notions of input and output layers; a BM is used to learn the "internal representation" of a set of data, so it has no notion of an output layer.
       – 2. The nodes of an RNN are connected in directed cycles, whereas the nodes of a BM are connected as an undirected complete graph.
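The "internal representation" point can be made precise with the BM's energy-based distribution: it uses the same quadratic energy as the Hopfield net, but over both visible units v and hidden units h, and the distribution over data comes from marginalizing over h (standard formulation, stated here for completeness):

```latex
E(v, h) = -\sum_i b_i s_i - \sum_{i<j} w_{ij}\, s_i s_j, \quad s = (v, h), \qquad
p(v) = \frac{1}{Z}\sum_h e^{-E(v, h)}, \quad Z = \sum_{v, h} e^{-E(v, h)}
```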
