  1. COMP90051 Statistical Machine Learning, Semester 2, 2017. Lecturer: Trevor Cohn. Deck 23: PGM Statistical Inference

  2. Statistical inference on PGMs: learning from data – fitting probability tables to observations (e.g. as a frequentist; a Bayesian would just use probabilistic inference to update the prior to a posterior)

  3. Where are we?
     • Representation of joint distributions
       * PGMs encode conditional independence
     • Independence, d-separation
     • Probabilistic inference
       * Computing other distributions from the joint
       * Elimination, sampling algorithms
     • Statistical inference
       * Learn parameters from data

  4. Have PGM, some observations, no tables…
     [Figure: the running-example Bayes net over binary variables HT, FG, HG, FA, AS for instances i = 1..n, with every conditional probability table entry unknown (“?”).]

  5. Fully-observed case is “easy”
     • Maximum-Likelihood Estimation (MLE) says:
       * If we observe all r.v.’s $X$ in a PGM independently $n$ times as $\boldsymbol{x}_j$
       * Then maximise the full joint:
         $\hat{\theta} = \arg\max_{\theta \in \Theta} \prod_{j=1}^{n} \prod_{k} p\left(X_k = x_{jk} \mid X_{\mathrm{parents}(k)} = x_{j,\mathrm{parents}(k)}\right)$
     • Decomposes easily, leads to counts-based estimates
       * Maximise the log-likelihood instead; it becomes a sum of logs:
         $\arg\max_{\theta \in \Theta} \sum_{j=1}^{n} \sum_{k} \log p\left(X_k = x_{jk} \mid X_{\mathrm{parents}(k)} = x_{j,\mathrm{parents}(k)}\right)$
       * The big maximisation of all parameters together decouples into small independent problems
     • Example is training a naïve Bayes classifier
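
To make the counts-based estimates concrete, here is a minimal Python sketch; the tiny dataset and the choice of estimating HG's table given parents HT and FG are illustrative assumptions, not data from the slides:

```python
from collections import Counter

# Hypothetical fully-observed binary data: one (HT, FG, HG) tuple per instance.
# Variable names follow the deck's running example; the values are made up.
data = [
    (False, False, True), (False, False, False), (True, False, True),
    (False, True, True), (True, True, True), (False, False, False),
]

# MLE decouples per table: p(HG = hg | HT = ht, FG = fg) is a ratio of counts.
joint = Counter((ht, fg, hg) for ht, fg, hg in data)
parents = Counter((ht, fg) for ht, fg, _ in data)

def p_hg_given(hg, ht, fg):
    """Counts-based MLE estimate of one CPT entry."""
    if parents[(ht, fg)] == 0:
        return float("nan")  # this parent configuration was never observed
    return joint[(ht, fg, hg)] / parents[(ht, fg)]

print(p_hg_given(True, False, False))  # 1/3 on this toy data
```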

  6. Example: fully-observed case
     [Figure: the running example with every table filled in by counting, e.g.]
     $p(FG = \text{false}) \approx \frac{\#\{x_j : FG_j = \text{false}\}}{n}, \qquad p(FG = \text{true}) \approx \frac{\#\{x_j : FG_j = \text{true}\}}{n}$
     $p(HG = \text{true} \mid HT = \text{false}, FG = \text{false}) \approx \frac{\#\{x_j : HG_j = \text{true},\, HT_j = \text{false},\, FG_j = \text{false}\}}{\#\{x_j : HT_j = \text{false},\, FG_j = \text{false}\}}$

  7. Presence of unobserved variables is trickier
     • But most PGMs you’ll encounter will have latent, or unobserved, variables
     • What happens to the MLE?
       * Maximise the likelihood of the observed data only
       * Marginalise the full joint to get to the desired “partial” joint:
         $\arg\max_{\theta \in \Theta} \prod_{j=1}^{n} \sum_{\mathrm{latent}(j)} \prod_{k} p\left(X_k = x_{jk} \mid X_{\mathrm{parents}(k)} = x_{j,\mathrm{parents}(k)}\right)$
       * This won’t decouple – oh no!
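
The trouble is the sum sitting inside the product: in the log-likelihood it becomes a log of a sum, which couples the parameters. A minimal sketch, assuming a hypothetical two-variable model Z → X with Z latent (not the slides' network), of that non-decoupling objective:

```python
import math

# Hypothetical model: Z ~ Bernoulli(pi), X | Z ~ Bernoulli(theta[Z]);
# only X is observed, so Z must be marginalised out.
pi, theta = 0.4, {0: 0.2, 1: 0.9}

def joint(x, z):
    """p(X = x, Z = z) under the current parameters."""
    pz = pi if z == 1 else 1 - pi
    px = theta[z] if x == 1 else 1 - theta[z]
    return pz * px

def log_marginal_likelihood(xs):
    # sum_j log( sum_z p(x_j, z) ): the inner sum over the latent z sits
    # inside the log, so the objective no longer splits into counts.
    return sum(math.log(sum(joint(x, z) for z in (0, 1))) for x in xs)

print(log_marginal_likelihood([1, 0, 1, 1]))
```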

  8. Can we reduce partially-observed to fully-observed?
     • Rough idea:
       * If we had guesses for the missing variables
       * We could employ MLE on fully-observed data
     • With a bit more thought, we could alternate between:
       * Updating missing data
       * Updating probability tables/parameters
     • This is the basis for training PGMs

  9. Example: partially-observed case
     [Figure: the running example again, but now the variables are only partially observed across the instances (columns like F,…,T with gaps) and every table entry is unknown (“?”).]

  10. Example: partially-observed case
     [Figure: as above; the table for a fully-observed variable can be filled in directly from its observed marginal (false 0.9, true 0.1); all other entries remain “?”.]

  11. Example: partially-observed case
     [Figure: as above; every remaining unknown table entry is seeded at 0.5 (“Seed”).]

  12. Example: partially-observed case
     [Figure: as above; the missing observations are completed as expectations under the current parameters (“Missing data as expectation”).]

  13. Example: partially-observed case
     [Figure: as above; with the data completed, every table is re-estimated by counting, so entries such as 0.7/0.3 and 0.6/0.4 replace the 0.5 seeds (“MLE on fully-observed”).]

  14. Example: partially-observed case
     • Seed
     • Do until “convergence”:
       * Fill missing as expectation
       * MLE on fully-observed
     [Figure: the running example annotated with this loop.]

  15. Expectation-Maximisation Algorithm
     • Seed parameters randomly
     • E-step: complete the unobserved data – not as expectations (point estimates), but as posterior distributions, computed by probabilistic inference
     • M-step: update parameters with MLE on the fully-observed data
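
As a runnable illustration of seed / E-step / M-step, here is a minimal sketch for a hypothetical two-coin mixture (not the slides' Bayes net), where the unobserved variable is which coin produced each row and its posterior has a one-line closed form:

```python
import numpy as np

# Hypothetical data: each row counts heads in `tosses` flips of one of two
# biased coins; which coin was used (Z) is never observed.
heads = np.array([5, 9, 8, 4, 7])
tosses = 10

# Seed parameters randomly
rng = np.random.default_rng(0)
pi = 0.5                               # p(Z = 1)
theta = rng.uniform(0.2, 0.8, size=2)  # theta[z] = p(heads | coin z)

for _ in range(100):
    # E-step: posterior responsibility p(Z = 1 | row, current parameters);
    # the binomial coefficient cancels in the ratio, so it is omitted.
    lik = lambda z: theta[z] ** heads * (1 - theta[z]) ** (tosses - heads)
    r = pi * lik(1) / (pi * lik(1) + (1 - pi) * lik(0))

    # M-step: MLE on the expected fully-observed data (posterior-weighted counts)
    pi = r.mean()
    theta[1] = (r * heads).sum() / (r * tosses).sum()
    theta[0] = ((1 - r) * heads).sum() / ((1 - r) * tosses).sum()

print(pi, theta)  # mixing weight and the two estimated coin biases
```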

  16. Déjà vu?
     • K-means clustering
       * Randomly assign cluster centres
       * Repeat: assign points to nearest clusters; update cluster centres
     • EM learning
       * Randomly seed parameters
       * Repeat: take expectations for missing variables; update parameters via MLE
     • Hard E-step: assign each point wholly to its nearest cluster
     • Soft E-step: assign a distribution of the point belonging to each cluster (e.g., 10% C1, 20% C2, 70% C3) – in EM, the posteriors for the missing variables given the observed data and current parameters
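
The hard/soft contrast in miniature, reusing the slide's 10%/20%/70% example distribution:

```python
import numpy as np

# A soft E-step keeps the whole posterior over clusters for one point;
# a hard E-step commits entirely to the argmax.
soft = np.array([0.1, 0.2, 0.7])   # 10% C1, 20% C2, 70% C3

hard = np.zeros_like(soft)
hard[soft.argmax()] = 1.0          # k-means-style all-or-nothing assignment

print(soft)  # [0.1 0.2 0.7] -- EM-style responsibilities
print(hard)  # [0. 0. 1.]    -- hard assignment to cluster C3
```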

  17. Summary
     • Statistical inference on PGMs
       * What is it and why do we care?
       * Straight MLE for fully-observed data
       * EM algorithm for mixed latent/observed data
