Belief Propagation Algorithm Interest Group presentation by Eli Chertkov
Inference: Statistical inference is the determination of an underlying probability distribution from observed data.
Probabilistic Graphical Models: A single variable node y_1 with distribution P(y_1) = f_1(y_1).
Probabilistic Graphical Models: Two independent variable nodes y_1 and y_2 with joint distribution P(y_1, y_2) = f_1(y_1) f_2(y_2).
Probabilistic Graphical Models. Directed (Bayesian network): P(y_1, y_2) = P(y_2 | y_1) P(y_1). Undirected (Markov random field): P(y_1, y_2) = f_12(y_1, y_2).
Probabilistic Graphical Models. Directed (Bayesian network): P(y_1, y_2, y_3, y_4) = P(y_4 | y_3, y_2) P(y_3 | y_2, y_1) P(y_2) P(y_1). Undirected (Markov random field): P(y_1, y_2, y_3, y_4) = f_43(y_4, y_3) f_42(y_4, y_2) f_32(y_3, y_2) f_31(y_3, y_1) f_2(y_2) f_1(y_1).
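To make the undirected factorization concrete, here is a minimal sketch (not from the slides; the factor tables are arbitrary placeholders) that multiplies the pairwise and single-variable factors for four binary variables and normalizes the result into a joint distribution:

```python
import numpy as np
import itertools

rng = np.random.default_rng(0)
pair_factors = {(3, 2): rng.random((2, 2)),   # f43(y4, y3), indexed [y4, y3], etc.
                (3, 1): rng.random((2, 2)),   # f42(y4, y2)
                (2, 1): rng.random((2, 2)),   # f32(y3, y2)
                (2, 0): rng.random((2, 2))}   # f31(y3, y1)
unary_factors = {1: rng.random(2),            # f2(y2)
                 0: rng.random(2)}            # f1(y1)

joint = np.zeros((2, 2, 2, 2))
for y in itertools.product([0, 1], repeat=4):  # y = (y1, y2, y3, y4)
    p = 1.0
    for (a, b), f in pair_factors.items():
        p *= f[y[a], y[b]]
    for a, f in unary_factors.items():
        p *= f[y[a]]
    joint[y] = p
joint /= joint.sum()                           # divide by the normalization constant Z
```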
Probabilistic Graphical Models: Examples. Directed (Bayesian network): artificial neural network, hidden Markov model. Undirected (Markov random field): restricted Boltzmann machine (deep learning), Ising model. [Images: Wikipedia]
Factor Graphs. Both the directed (Bayesian network) and the undirected (Markov random field) distributions over y_1, y_2, y_3, y_4 can be represented as factor graphs. The factors f_123 = f_123(y_1, y_2, y_3) and f_234 = f_234(y_2, y_3, y_4) are chosen to match the original probability distribution: P(y_1, y_2, y_3, y_4) = f_123(y_1, y_2, y_3) f_234(y_2, y_3, y_4).
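A factor graph is a bipartite graph of variable nodes and factor nodes. As a rough sketch (an assumed representation, not code from the presentation), the two-factor graph above could be stored as:

```python
import numpy as np

variables = ["y1", "y2", "y3", "y4"]          # all binary in this example
factors = {
    "f123": {"scope": ["y1", "y2", "y3"], "table": np.ones((2, 2, 2))},  # placeholder table
    "f234": {"scope": ["y2", "y3", "y4"], "table": np.ones((2, 2, 2))},  # placeholder table
}

# Neighbor lists define the bipartite graph: edges run only between
# variable nodes and the factor nodes whose scope contains them.
neighbors_of_var = {v: [name for name, f in factors.items() if v in f["scope"]]
                    for v in variables}
```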
Belief Propagation Outline
• The goal of BP is to compute the marginal probability distribution of a random variable y_n in a graphical model: P(y_n) = ∑_{y \ y_n} P(y_1, …, y_N).
• The probability distribution of a graphical model can be represented as a factor graph, so that P(y_n) = ∑_{y \ y_n} ∏_{f ∈ ne(y_n)} f(y_n, y_{X_f}), where y_{X_f} is the subset of the variables involved in factor f.
• By interchanging the product and the sum, we can write P(y_n) = ∏_{f ∈ ne(y_n)} μ_{f→y_n}(y_n), where μ_{f→y_n}(y_n) = ∑_{y_{X_f}} f(y_n, y_{X_f}) is called a message.
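The interchange of product and sum is the key step. The following sketch (with made-up factor tables) checks it numerically for a small chain y_1 - f_12 - y_2 - f_23 - y_3: the brute-force marginal of y_2 equals the product of the two per-factor sums, i.e., the two incoming messages:

```python
import numpy as np

f12 = np.array([[1.0, 2.0], [0.5, 1.5]])   # f12[y1, y2], arbitrary values
f23 = np.array([[2.0, 0.2], [1.0, 3.0]])   # f23[y2, y3], arbitrary values

# Brute force: P(y2) ∝ sum_{y1, y3} f12(y1, y2) f23(y2, y3)
brute = np.einsum("ab,bc->b", f12, f23)

# Interchanged: P(y2) ∝ [sum_{y1} f12(y1, y2)] * [sum_{y3} f23(y2, y3)]
msg_from_f12 = f12.sum(axis=0)             # mu_{f12 -> y2}(y2)
msg_from_f23 = f23.sum(axis=1)             # mu_{f23 -> y2}(y2)
factored = msg_from_f12 * msg_from_f23

assert np.allclose(brute / brute.sum(), factored / factored.sum())
```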
Belief Propagation: Message Passing. BP is a message-passing algorithm. The idea is to pass information through the factor graph by locally updating the messages between nodes. Once the messages have converged, the marginal distribution of each variable can be evaluated efficiently: P(y_n) = ∏_{f ∈ ne(y_n)} μ_{f→y_n}(y_n). There are two types of message updates. Factor node to variable node: μ_{f→y_n}(y_n) = ∑_{{y_m ∈ ne(f) \ y_n}} f({y_m}, y_n) ∏_{y_m ∈ ne(f) \ y_n} μ_{y_m→f}(y_m). Variable node to factor node: μ_{y_n→f}(y_n) = ∏_{f' ∈ ne(y_n) \ f} μ_{f'→y_n}(y_n).
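As an illustration of the two update rules (my own sketch, not code from the slides; the factor graph and its tables are placeholders), here is a small sum-product implementation for binary variables:

```python
import numpy as np
import itertools

factors = {
    "f123": {"scope": ["y1", "y2", "y3"],
             "table": np.random.default_rng(0).random((2, 2, 2))},
    "f234": {"scope": ["y2", "y3", "y4"],
             "table": np.random.default_rng(1).random((2, 2, 2))},
}
variables = ["y1", "y2", "y3", "y4"]
ne_var = {v: [f for f, d in factors.items() if v in d["scope"]] for v in variables}

# All messages start uniform.
msg_v2f = {(v, f): np.ones(2) for v in variables for f in ne_var[v]}
msg_f2v = {(f, v): np.ones(2) for v in variables for f in ne_var[v]}

for _ in range(10):  # iterate the local updates until (hopefully) converged
    # Factor node to variable node:
    # mu_{f->y}(y) = sum over the other scope variables of
    #                f(...) * product of incoming variable-to-factor messages
    for f, d in factors.items():
        for v in d["scope"]:
            out = np.zeros(2)
            others = [u for u in d["scope"] if u != v]
            for assign in itertools.product([0, 1], repeat=len(others)):
                vals = dict(zip(others, assign))
                for yv in (0, 1):
                    vals[v] = yv
                    idx = tuple(vals[u] for u in d["scope"])
                    w = d["table"][idx]
                    for u in others:
                        w *= msg_v2f[(u, f)][vals[u]]
                    out[yv] += w
            msg_f2v[(f, v)] = out
    # Variable node to factor node:
    # mu_{y->f}(y) = product over the variable's other neighboring factors
    for v in variables:
        for f in ne_var[v]:
            out = np.ones(2)
            for f2 in ne_var[v]:
                if f2 != f:
                    out = out * msg_f2v[(f2, v)]
            msg_v2f[(v, f)] = out / out.sum()   # normalize for numerical stability

# Marginal (up to normalization): P(y) ∝ product of incoming factor messages
marginals = {v: np.prod([msg_f2v[(f, v)] for f in ne_var[v]], axis=0) for v in variables}
marginals = {v: m / m.sum() for v, m in marginals.items()}
```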
Killer app: Error-correcting codes. To prevent the degradation of a binary signal sent through a noisy channel, we encode the original signal s (K bits) into a redundant one, t (N bits); the extra N − K bits are parity-check bits. A theoretically useful encoding scheme is linear block coding, which relates the two signals by a binary (mod-2) linear transformation t = Gᵀ s. When the matrix Gᵀ is random and sparse, the encoding is called a low-density parity-check (LDPC) code. Decoding the degraded received signal r of an LDPC code, i.e., inferring the original signal s, is an NP-complete problem. Nonetheless, BP is efficient at providing an approximate solution.
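Encoding itself is just a mod-2 matrix multiplication. A minimal sketch, where the generator matrix and source word are placeholders rather than the code used later in the talk:

```python
import numpy as np

# Placeholder generator matrix G^T (7x4): 4 source bits -> 7 transmitted bits
G_T = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [1, 0, 1, 1]])

s = np.array([1, 0, 1, 1])   # example source signal (K = 4 bits)
t = G_T @ s % 2              # encoded signal (N = 7 bits, N - K = 3 parity-check bits)
```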
Linear block code visualization. [Figure from Information Theory, Inference, and Learning Algorithms (MacKay).]
Linear block code as a graphical model: In this toy example the four source bits are t_1, …, t_4 and the seven transmitted bits are u_1, …, u_7, related by u = Gᵀ t, where Gᵀ is a sparse 7×4 binary matrix (shown on the slide). The joint distribution P(t_1, t_2, t_3, t_4, u_1, …, u_7) is represented as a factor graph; t_i, u_j ∈ {0, 1} are binary random variables.
Linear block code as a graphical model: When decoding a signal, we observe the transmitted bits u_j and try to find the most likely source bits t_i. This means we want to maximize P(t_1, t_2, t_3, t_4 | u_1, …, u_7 = 0101101), where 0101101 is the observed signal. Belief propagation is an efficient way to compute the marginal probability distribution P(t_i) of each source bit t_i.
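For a toy code, the posterior marginals P(t_i | u) can be computed exactly by brute force over all 2^4 source words; these are the quantities BP approximates without enumerating every configuration. A sketch, assuming a binary symmetric channel with a made-up flip probability and the same placeholder generator matrix as above (only the observed signal 0101101 comes from the slide):

```python
import numpy as np
import itertools

G_T = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
                [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1]])  # placeholder 7x4 code
flip_prob = 0.1                                   # assumed channel bit-flip probability
u_observed = np.array([0, 1, 0, 1, 1, 0, 1])      # observed signal 0101101

marginals = np.zeros((4, 2))
for t in itertools.product([0, 1], repeat=4):
    t = np.array(t)
    u = G_T @ t % 2                               # what this source word would transmit
    n_flips = np.sum(u != u_observed)
    likelihood = flip_prob**n_flips * (1 - flip_prob)**(7 - n_flips)
    for i in range(4):
        marginals[i, t[i]] += likelihood          # uniform prior over source words
marginals /= marginals.sum(axis=1, keepdims=True) # rows give P(t_i = 0), P(t_i = 1)
```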
My toy LDPC decoding example. [Figure: the encoding matrix Gᵀ used in the toy example.]
My toy LDPC decoding example. [Figures: the encoded signal, the noisy transmitted signal, the BP marginal probabilities, and the reconstructed signal.] Note: There is a very similar message-passing algorithm, called the max-product (or min-sum, or Viterbi) algorithm, which computes the maximum-probability configuration x* = argmax_x P(x) and might be better suited for this decoding task.
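The max-product variant differs from sum-product only in the reduction used in the factor-to-variable update. A tiny sketch with placeholder numbers:

```python
import numpy as np

f12 = np.array([[1.0, 2.0], [0.5, 1.5]])   # placeholder factor table f12[y1, y2]
incoming = np.array([0.3, 0.7])            # placeholder message mu_{y1 -> f12}

sum_product_msg = (f12 * incoming[:, None]).sum(axis=0)   # used for marginals
max_product_msg = (f12 * incoming[:, None]).max(axis=0)   # used for the most likely configuration
```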
References
D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms.
J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Understanding Belief Propagation and Its Generalizations", Exploring Artificial Intelligence in the New Millennium (2003), Chap. 8, pp. 239-269.
C. M. Bishop, Pattern Recognition and Machine Learning.