

  1. COMP90051 Statistical Machine Learning
     Semester 2, 2017
     Lecturer: Trevor Cohn
     22. PGM Probabilistic Inference

  2. Probabilistic inference on PGMs
     Computing marginal and conditional distributions from the joint of a PGM, using Bayes rule and marginalisation. This deck: how to do it efficiently.
     Based on Andrew Moore's tutorial slides & Ben Rubinstein's slides.

  3. Two familiar examples
     • Naïve Bayes (frequentist/Bayesian): chooses the most likely class given the data
       $$ \Pr(Y \mid X_1, \dots, X_d) = \frac{\Pr(Y, X_1, \dots, X_d)}{\Pr(X_1, \dots, X_d)} = \frac{\Pr(Y, X_1, \dots, X_d)}{\sum_y \Pr(Y = y, X_1, \dots, X_d)} $$
       [Graph: class Y with children X_1, ..., X_d]
     • (Bayesian) Data $X \mid \theta \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$
       Given an observation $X = x$, update the posterior over $\theta$:
       $$ \Pr(\theta \mid X = x) = \frac{\Pr(\theta, x)}{\Pr(x)} = \frac{\Pr(\theta, x)}{\int \Pr(\theta, x)\, d\theta} $$
       [Graph: θ with child X]
     • Joint + Bayes rule + marginalisation → anything
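
     As a check on the second example (this working is not on the slide), the Gaussian prior and likelihood are conjugate, so the posterior can be completed by hand:

     $$ \Pr(\theta \mid X = x) \propto \Pr(x \mid \theta)\,\Pr(\theta) \propto \exp\!\left(-\tfrac{(x - \theta)^2}{2}\right)\exp\!\left(-\tfrac{\theta^2}{2}\right) \propto \exp\!\left(-\frac{(\theta - x/2)^2}{2 \cdot \tfrac{1}{2}}\right), $$

     that is, $\theta \mid X = x \sim N(x/2, 1/2)$; the marginalisation in the denominator only supplies the normalising constant.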

  4. Nuclear power plant
     [Network: HT (high core temperature) and FG (faulty gauge) are parents of HG (high gauge reading); HG and FA (faulty alarm) are parents of AS (alarm sounds).]
     • Alarm sounds; meltdown?! Query the chance of high temperature given the alarm:
       $$ \Pr(HT \mid AS = t) = \frac{\Pr(HT, AS = t)}{\Pr(AS = t)} = \frac{\sum_{FG, HG, FA} \Pr(HT, FG, HG, FA, AS = t)}{\sum_{HT, FG, HG, FA} \Pr(HT, FG, HG, FA, AS = t)} $$
     • Numerator (denominator similar): expanding out the sums, the joint is summed once over a $2^5$ table
       $$ = \sum_{FG} \sum_{HG} \sum_{FA} \Pr(HT) \Pr(FG) \Pr(HG \mid HT, FG) \Pr(FA) \Pr(AS = t \mid FA, HG) $$
       Distributing the sums as far down as possible instead sums over several smaller tables
       $$ = \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \Pr(AS = t \mid FA, HG) $$
     (A runnable sketch of this computation follows slide 5.)

  5. Nuclear power plant (cont.)
     $$ = \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \Pr(AS = t \mid FA, HG) $$
     Eliminate AS: since AS is observed, this is really a no-op
     $$ = \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \sum_{FA} \Pr(FA) \, m_{AS}(FA, HG) $$
     Eliminate FA: multiplying a 1x2 table by a 2x2 table
     $$ = \Pr(HT) \sum_{FG} \Pr(FG) \sum_{HG} \Pr(HG \mid HT, FG) \, m_{FA}(HG) $$
     Eliminate HG: multiplying a 2x2x2 table by a 2x1 table
     $$ = \Pr(HT) \sum_{FG} \Pr(FG) \, m_{HG}(HT, FG) $$
     Eliminate FG: multiplying a 1x2 table by a 2x2 table
     $$ = \Pr(HT) \, m_{FG}(HT) $$
     Multiplication of tables, followed by summing out, is actually matrix multiplication.
     [The slide also shows the graph shrinking after each elimination and a small worked table for one of the messages.]
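
     The chain above can be checked numerically. Below is a minimal NumPy sketch; the CPT values are assumptions for illustration, not taken from the slides. It computes the numerator both by summing the full $2^5$ joint and by the distributed message products, and confirms the two agree:

     import numpy as np

     # Hypothetical CPTs for the five binary nodes; index 0 = false, 1 = true.
     p_ht = np.array([0.95, 0.05])                       # Pr(HT)
     p_fg = np.array([0.90, 0.10])                       # Pr(FG)
     p_fa = np.array([0.85, 0.15])                       # Pr(FA)
     p_hg = np.array([[[0.99, 0.01], [0.10, 0.90]],      # Pr(HG | HT, FG), axes (HT, FG, HG)
                      [[0.05, 0.95], [0.50, 0.50]]])
     p_as = np.array([[[0.999, 0.001], [0.10, 0.90]],    # Pr(AS | FA, HG), axes (FA, HG, AS)
                      [[0.60, 0.40], [0.55, 0.45]]])
     AS_TRUE = 1

     # Naive: materialise the whole 2^5 joint, clamp AS = t, sum out FG, HG, FA.
     joint = np.einsum('h,f,hfg,a,ags->hfgas', p_ht, p_fg, p_hg, p_fa, p_as)
     numer_naive = joint[..., AS_TRUE].sum(axis=(1, 2, 3))   # indexed by HT

     # Distributed: each elimination step is a (batched) matrix multiplication.
     m_fa = p_fa @ p_as[:, :, AS_TRUE]   # sum_FA Pr(FA) Pr(AS=t | FA, HG)  -> m_FA(HG)
     m_hg = p_hg @ m_fa                  # sum_HG Pr(HG | HT, FG) m_FA(HG)  -> m_HG(HT, FG)
     m_fg = m_hg @ p_fg                  # sum_FG Pr(FG) m_HG(HT, FG)       -> m_FG(HT)
     numer_dist = p_ht * m_fg

     assert np.allclose(numer_naive, numer_dist)
     print(numer_dist / numer_dist.sum())                    # Pr(HT | AS = t)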

  6. Elimination algorithm   (green background = slide just for fun!)
     Eliminate(Graph G, Evidence nodes E, Query nodes Q)
     1. Choose a node ordering I such that Q appears last
     2. (initialise) Initialise an empty list active
     3. For each node X_i in G:
        a) append Pr(X_i | parents(X_i)) to active
     4. (evidence) For each node X_i in E:
        a) append the evidence indicator $\delta(X_i, x_i)$ to active
     5. For each i in I:
        a) remove the tables referencing X_i from active
        b) let N_i be the nodes other than X_i referenced by those tables
        c) (potentials) table $\phi_i(X_i, X_{N_i})$ = product of the removed tables
        d) (marginalise) table $m_i(X_{N_i}) = \sum_{X_i} \phi_i(X_i, X_{N_i})$
        e) append $m_i(X_{N_i})$ to active
     6. (normalise) Return $\Pr(X_Q \mid X_E = x_E) = \phi_Q(X_Q) / \sum_{X_Q} \phi_Q(X_Q)$
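
     A minimal Python sketch of this procedure for networks of binary variables; the Factor representation, the function names, and the toy two-node usage at the end are my own, not from the slides:

     import itertools
     from collections import namedtuple
     from functools import reduce

     # A factor is a table over named binary variables: `vars` is a tuple of variable
     # names and `table` maps each 0/1 assignment tuple (in `vars` order) to a value.
     Factor = namedtuple("Factor", ["vars", "table"])

     def product(f, g):
         # Pointwise product of two tables over the union of their variables.
         out_vars = f.vars + tuple(v for v in g.vars if v not in f.vars)
         table = {}
         for asg in itertools.product((0, 1), repeat=len(out_vars)):
             a = dict(zip(out_vars, asg))
             table[asg] = (f.table[tuple(a[v] for v in f.vars)]
                           * g.table[tuple(a[v] for v in g.vars)])
         return Factor(out_vars, table)

     def sum_out(f, var):
         # Marginalise `var` out of the table.
         out_vars = tuple(v for v in f.vars if v != var)
         table = {asg: 0.0 for asg in itertools.product((0, 1), repeat=len(out_vars))}
         for asg, val in f.table.items():
             table[tuple(x for v, x in zip(f.vars, asg) if v != var)] += val
         return Factor(out_vars, table)

     def eliminate(cpts, evidence, query, order):
         # `order` lists every node with `query` last; `evidence` maps nodes to 0/1.
         active = list(cpts)
         for var, val in evidence.items():              # evidence indicators delta(X, x)
             active.append(Factor((var,), {(0,): float(val == 0), (1,): float(val == 1)}))
         for var in order:
             if var == query:
                 continue
             used = [f for f in active if var in f.vars]    # tables referencing var
             active = [f for f in active if var not in f.vars]
             phi = reduce(product, used)                    # product of those tables
             active.append(sum_out(phi, var))               # message over var's neighbours
         phi_q = reduce(product, active)                    # only the query remains
         z = sum(phi_q.table.values())
         return {asg[0]: val / z for asg, val in phi_q.table.items()}

     # Toy usage on a two-node network A -> B with invented CPTs: query Pr(A | B = 1).
     p_a = Factor(("A",), {(0,): 0.7, (1,): 0.3})
     p_b_given_a = Factor(("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
     print(eliminate([p_a, p_b_given_a], evidence={"B": 1}, query="A", order=["B", "A"]))
     # -> {0: 0.2258..., 1: 0.7742...}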

  7. Runtime of elimination algorithm
     [Figures: the PGM after each successive elimination, alongside the "reconstructed" graph produced by the process called moralisation.]
     • Each step of elimination
       * removes a node
       * connects the node's remaining neighbours, forming a clique in the "reconstructed" graph (the cliques are exactly the r.v.'s involved in each sum)
     • Time complexity is exponential in the size of the largest clique
     • Different elimination orderings produce different cliques
       * Treewidth: the minimum over orderings of the largest clique
       * The best possible time complexity is exponential in the treewidth
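
     A short sketch of this point (the helper name and the two orderings are my own): simulate elimination on the moral graph of the power-plant example and report the largest clique a given ordering creates; the cost of exact inference is exponential in that size.

     from itertools import combinations

     def largest_clique_size(adjacency, order):
         # Simulate elimination on an undirected graph: at each step record the clique
         # formed by the node and its current neighbours, then connect those neighbours.
         adj = {v: set(ns) for v, ns in adjacency.items()}
         largest = 0
         for v in order:
             largest = max(largest, len(adj[v]) + 1)    # the r.v.'s involved in this sum
             for a, b in combinations(adj[v], 2):       # fill-in edges between neighbours
                 adj[a].add(b)
                 adj[b].add(a)
             for n in adj[v]:                           # remove the eliminated node
                 adj[n].discard(v)
             del adj[v]
         return largest

     # Moral graph of the power-plant network: co-parents HT-FG and FA-HG are connected.
     moral = {
         "HT": {"FG", "HG"}, "FG": {"HT", "HG"},
         "HG": {"HT", "FG", "FA", "AS"},
         "FA": {"HG", "AS"}, "AS": {"HG", "FA"},
     }
     print(largest_clique_size(moral, ["AS", "FA", "HG", "FG", "HT"]))  # 3: a good ordering
     print(largest_clique_size(moral, ["HG", "AS", "FA", "FG", "HT"]))  # 5: a bad ordering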

  8. Probabilistic inference by simulation
     • Exact probabilistic inference can be expensive, or even impossible
     • Can we approximate it numerically?
     • Idea: sampling methods
       * cheaply sample from the desired distribution
       * approximate the distribution by a histogram of the samples
     [Figure: bar chart of a sampled histogram over the values 1 to 10.]
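
     A toy illustration of the idea; the distribution and its weights are invented for this sketch:

     import random
     from collections import Counter

     # Draw samples from an invented discrete distribution over 1..10, then approximate
     # it by the normalised histogram of the samples, as described on the slide.
     random.seed(0)
     weights = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
     samples = random.choices(range(1, 11), weights=weights, k=10_000)
     counts = Counter(samples)
     histogram = {v: counts[v] / len(samples) for v in range(1, 11)}
     print(histogram)    # each entry approaches weights[v-1] / sum(weights) as k grows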

  9. Monte Carlo approximate probabilistic inference
     • Algorithm: sampling once from the joint
       1. Order the nodes such that parents come before children (topological order)
       2. Repeat:
          a) For each node X_i:
             i. index into Pr(X_i | parents(X_i)) with the parents' sampled values
             ii. sample X_i from this distribution
          b) Together X = (X_1, ..., X_d) is a sample from the joint
     • Algorithm: sampling from Pr(X_Q | X_E = x_E)
       1. Order the nodes such that parents come before children
       2. Initialise an empty set S; repeat:
          a) sample X from the joint
          b) if X_E = x_E then add X_Q to S
       3. Return: the histogram of S, normalising the counts by dividing by |S|
     • Sampling++: importance weighting, Gibbs sampling, Metropolis-Hastings
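
     A sketch of both algorithms (ancestral sampling plus rejection) on the power-plant network, reusing the hypothetical CPT values from the earlier NumPy sketch; all names and numbers are assumptions, not from the slides:

     import random

     # Pr(node = 1 | parents) for each node; parents listed in the order used to index.
     CPTS = {
         "HT": ((), {(): 0.05}),
         "FG": ((), {(): 0.10}),
         "FA": ((), {(): 0.15}),
         "HG": (("HT", "FG"), {(0, 0): 0.01, (0, 1): 0.90, (1, 0): 0.95, (1, 1): 0.50}),
         "AS": (("FA", "HG"), {(0, 0): 0.001, (0, 1): 0.90, (1, 0): 0.40, (1, 1): 0.45}),
     }
     ORDER = ["HT", "FG", "FA", "HG", "AS"]              # parents before children

     def sample_joint():
         x = {}
         for node in ORDER:
             parents, table = CPTS[node]
             p_true = table[tuple(x[p] for p in parents)]   # index CPT with parents' values
             x[node] = int(random.random() < p_true)
         return x

     def rejection_sample(query, evidence, n=100_000):
         kept = []
         for _ in range(n):
             x = sample_joint()
             if all(x[var] == val for var, val in evidence.items()):
                 kept.append(x[query])                   # keep X_Q only when X_E = x_E
         return sum(kept) / len(kept)                    # normalised count, binary query

     random.seed(0)
     print(rejection_sample("HT", {"AS": 1}))            # approximates Pr(HT = t | AS = t)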

  10. Alternate forms of probabilistic inference
     • The elimination algorithm produces a single marginal
     • Sum-product algorithm on trees
       * roughly 2x the cost, but supplies all marginals
       * the name: marginalisation is just a sum of products of tables
       * "identical" variant: max-product, for MAP estimation
     • In general these are message-passing algorithms
       * they generalise beyond trees (beyond our scope): junction tree algorithm, loopy belief propagation
     • Variational Bayes: approximation via optimisation
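
     For a flavour of the sum-product pattern, here is a sketch on a chain (the simplest tree) of three binary variables with invented CPTs and evidence X3 = 1; one forward and one backward sweep of messages recover every conditional marginal, and a brute-force check confirms them:

     import numpy as np

     p1  = np.array([0.6, 0.4])                    # Pr(X1)
     p21 = np.array([[0.7, 0.3], [0.2, 0.8]])      # Pr(X2 | X1), rows indexed by X1
     p32 = np.array([[0.9, 0.1], [0.5, 0.5]])      # Pr(X3 | X2), rows indexed by X2

     f2 = p1 @ p21                                 # forward:  sum_x1 Pr(x1) Pr(x2 | x1)
     b2 = p32[:, 1]                                # backward: Pr(X3 = 1 | x2)
     b1 = p21 @ b2                                 # backward: sum_x2 Pr(x2 | x1) b2(x2)

     marg1 = p1 * b1; marg1 /= marg1.sum()         # Pr(X1 | X3 = 1)
     marg2 = f2 * b2; marg2 /= marg2.sum()         # Pr(X2 | X3 = 1)

     # Brute-force check against the joint table with X3 clamped to 1.
     clamped = np.einsum('i,ij,jk->ijk', p1, p21, p32)[:, :, 1]
     assert np.allclose(marg1, clamped.sum(axis=1) / clamped.sum())
     assert np.allclose(marg2, clamped.sum(axis=0) / clamped.sum())
     print(marg1, marg2)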

  11. Summary
     • Probabilistic inference on PGMs
       * What is it and why do we care?
       * The elimination algorithm; complexity via cliques
       * Monte Carlo approaches as an alternative to exact integration
