COMP90051 Statistical Machine Learning
Semester 2, 2017
Lecturer: Trevor Cohn

23. PGM Statistical Inference
Statistical inference on PGMs
Learning from data: fitting probability tables to observations (e.g. as a frequentist; a Bayesian would instead use probabilistic inference to update the prior to a posterior)
Where are we?
• Representation of joint distributions
* PGMs encode conditional independence
• Independence, d-separation
• Probabilistic inference
* Computing other distributions from the joint
* Elimination and sampling algorithms
• Statistical inference
* Learning parameters from data
Have PGM, some observations, no tables…
[Figure: the example network over nodes FG, HT, HG, FA and AS on a plate i = 1..n; every entry of its probability tables is still unknown ("?").]
Fully-observed case is "easy"
• The maximum-likelihood estimator (MLE) says:
* If we observe all r.v.'s X in a PGM independently n times, as x_1, …, x_n
* Then maximise the full joint:
  \hat{\theta} = \arg\max_{\theta \in \Theta} \prod_{i=1}^{n} \prod_{j} p\big(X_j = x_{ij} \mid X_{\mathrm{parents}(j)} = x_{i,\mathrm{parents}(j)}\big)
• Decomposes easily, leads to counts-based estimates (closed form sketched below)
* Maximise the log-likelihood instead; it becomes a sum of logs:
  \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \sum_{j} \log p\big(X_j = x_{ij} \mid X_{\mathrm{parents}(j)} = x_{i,\mathrm{parents}(j)}\big)
* The big maximisation of all parameters together decouples into small independent problems, one per probability table
• Example: training a naïve Bayes classifier
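As a small worked step (in the notation above, and a standard fact about discrete Bayesian networks), each decoupled sub-problem has a closed form: every table entry is an empirical conditional frequency.

```latex
\hat{p}\left(X_j = v \mid X_{\mathrm{parents}(j)} = \mathbf{u}\right)
  = \frac{\#\{\, i : x_{ij} = v,\; x_{i,\mathrm{parents}(j)} = \mathbf{u} \,\}}
         {\#\{\, i : x_{i,\mathrm{parents}(j)} = \mathbf{u} \,\}}
```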
Example: Fully-observed case
With all variables observed, the tables are filled in by counting (a small code sketch follows below):
* P(FG = false) = #{i : FG_i = false} / n, and P(FG = true) = #{i : FG_i = true} / n
* P(HG = true | HT = false, FG = false) = #{i : HG_i = true, HT_i = false, FG_i = false} / #{i : HT_i = false, FG_i = false}, and similarly for every other table entry
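A minimal counting sketch of these two estimates, assuming tabular data with boolean columns named after the slide's variables; the data frame below is synthetic and purely illustrative.

```python
import pandas as pd

# Synthetic fully-observed data (illustrative only); one row per instance i.
data = pd.DataFrame({
    "FG": [False, False, True,  False, True,  False],
    "HT": [False, True,  False, False, False, True],
    "HG": [True,  False, True,  False, False, False],
})
n = len(data)

# Root table: P(FG = false) = #{i : FG_i = false} / n
p_fg_false = (~data["FG"]).sum() / n

# Conditional table entry: P(HG = true | HT = false, FG = false)
parents_match = ~data["HT"] & ~data["FG"]
p_hg_true_given = (parents_match & data["HG"]).sum() / parents_match.sum()

print(f"P(FG=false) = {p_fg_false:.2f}")
print(f"P(HG=true | HT=false, FG=false) = {p_hg_true_given:.2f}")
```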
Presence of unobserved variables is trickier
• Most PGMs you'll encounter have latent (unobserved) variables
• What happens to the MLE?
* Maximise the likelihood of the observed data only
* Marginalise the full joint to get the desired "partial" joint:
  \arg\max_{\theta \in \Theta} \prod_{i=1}^{n} \sum_{\text{latent}} \prod_{j} p\big(X_j = x_{ij} \mid X_{\mathrm{parents}(j)} = x_{i,\mathrm{parents}(j)}\big)
* This won't decouple – oh no! (see the illustration below)
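To see why the decoupling is lost, take the log as before: the marginalisation over latent configurations now sits inside the logarithm, and a log of a sum of products does not split into a sum of per-table terms. A sketch in the notation above, writing z for an assignment to the latent variables of instance i:

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log \sum_{\mathbf{z}} \prod_{j}
      p\!\left(X_j = v_{ij} \mid X_{\mathrm{parents}(j)} = v_{i,\mathrm{parents}(j)}\right),
\qquad
v_{ij} =
\begin{cases}
  x_{ij} & \text{if } X_j \text{ is observed in instance } i\\
  z_{j}  & \text{if } X_j \text{ is latent}
\end{cases}
```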
Can we reduce partially-observed to fully-observed?
• Rough idea:
* If we had guesses for the missing variables
* We could employ MLE as in the fully-observed case
• With a bit more thought, we could alternate between:
* Updating the missing data
* Updating the probability tables/parameters
• This is the basis for training PGMs
Example: Partially-observed case
[Figure: the same network, but now for each instance i = 1..n only some variables are observed (as true/false) while the rest are missing; every probability-table entry is still unknown ("?").]
Example: Partially-observed case (continued)
[Figure: "Observed marginal" – the table for a fully observed variable can be estimated directly from the data, e.g. false 0.9, true 0.1; all other entries remain unknown.]
Example: Partially-observed case (continued)
[Figure: "Seed" – all remaining unknown table entries are initialised, here uniformly to 0.5; the observed marginal (0.9/0.1) is kept.]
Example: Partially-observed case (continued)
[Figure: "Missing data as expectation" – the missing values of each instance are completed with their expectations under the current (seeded) parameters; the tables are unchanged at this step.]
Example: Partially-observed case (continued)
[Figure: "MLE on fully-observed" – with the completed data, every table is re-estimated by counting; e.g. the marginal tables become 0.7/0.3, 0.9/0.1 and 0.6/0.4, and the conditional tables are updated similarly.]
Example: Partially-observed case (continued)
The procedure: seed the parameters, then do until "convergence":
* Fill in the missing data as expectations (under the current parameters)
* Run MLE on the resulting fully-observed data
Expectation-Maximisation (EM) Algorithm
• Seed the parameters randomly
• E-step: complete the unobserved data – not with point-estimate expectations, but with posterior distributions over their values, computed by probabilistic inference under the current parameters
• M-step: update the parameters with MLE on the (expected) fully-observed data
A small runnable sketch follows below.
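A minimal sketch of these two steps, assuming a tiny latent-class model: a hidden binary variable Z generating three conditionally independent binary symptoms (naïve Bayes with an unobserved class). The model, variable names and synthetic data are illustrative assumptions, not the lecture's running example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic data: hidden class Z, three observed binary symptoms X1..X3 ---
n = 2000
true_pz = 0.3                                     # P(Z = 1)
true_px = np.array([[0.1, 0.2, 0.3],              # P(X_k = 1 | Z = 0)
                    [0.8, 0.7, 0.9]])             # P(X_k = 1 | Z = 1)
z = (rng.random(n) < true_pz).astype(int)
x = (rng.random((n, 3)) < true_px[z]).astype(int)  # x is observed; z is discarded

# --- Seed parameters randomly -------------------------------------------------
pz = 0.5
px = rng.uniform(0.3, 0.7, size=(2, 3))

for _ in range(200):
    # E-step: posterior P(Z = 1 | x_i) under the current parameters
    # (probabilistic inference over the latent variable, not a point estimate).
    log_joint = np.stack(
        [np.log([1 - pz, pz][c])
         + (x * np.log(px[c]) + (1 - x) * np.log(1 - px[c])).sum(axis=1)
         for c in (0, 1)], axis=1)
    resp = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    resp = resp[:, 1] / resp.sum(axis=1)          # responsibility for Z = 1

    # M-step: MLE on the "completed" data, using expected counts for Z.
    new_pz = resp.mean()
    new_px = np.stack([
        ((1 - resp)[:, None] * x).sum(axis=0) / (1 - resp).sum(),
        (resp[:, None] * x).sum(axis=0) / resp.sum(),
    ])
    new_px = np.clip(new_px, 1e-6, 1 - 1e-6)      # guard against log(0)

    converged = abs(new_pz - pz) < 1e-8
    pz, px = new_pz, new_px
    if converged:
        break

print("P(Z=1)       ~", round(float(pz), 3))
print("P(X_k=1 | Z) ~", np.round(px, 3))
```

As with any mixture, the recovered class labels may come out swapped relative to the generating ones (label switching); EM only climbs the marginal likelihood of the observed data.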
Déjà vu? K-means vs. EM
• K-means clustering:
* Randomly seed cluster centres
* Repeat: (hard E-step) assign points to nearest cluster; update cluster centres
• EM learning:
* Randomly seed parameters
* Repeat: (E-step) compute expectations for the missing variables; update parameters via MLE
• Hard E-step: assign each point to a single cluster
• Soft E-step: assign each point a distribution over clusters (e.g., 10% C1, 20% C2, 70% C3); in EM these are the posteriors over the missing variables given the observed data and current parameters
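A tiny sketch of the hard/soft distinction, assuming 1-D points, two fixed centres, unit-variance Gaussian components and equal weights (all numbers made up for illustration):

```python
import numpy as np

x = np.array([0.1, 0.4, 1.1, 2.0, 2.2])   # points
centres = np.array([0.0, 2.0])            # current cluster centres
dist2 = (x[:, None] - centres[None, :]) ** 2

# Hard E-step (k-means): each point belongs to exactly one cluster.
hard_assign = dist2.argmin(axis=1)

# Soft E-step (EM): each point gets a posterior distribution over clusters
# (unit-variance Gaussian responsibilities with equal mixing weights).
resp = np.exp(-0.5 * dist2)
resp /= resp.sum(axis=1, keepdims=True)

print(hard_assign)          # [0 0 1 1 1]
print(np.round(resp, 2))    # the middle point is split almost evenly
```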
Summary
• Statistical inference on PGMs
* What is it and why do we care?
* Straight MLE for fully-observed data
* The EM algorithm for mixed latent/observed data