COMP90051 Statistical Machine Learning
Semester 2, 2017
Lecturer: Trevor Cohn

23. PGM Statistical Inference
Statistical inference on PGMs
Learning from data: fitting probability tables to observations (e.g. as a frequentist; a Bayesian would instead use probabilistic inference to update the prior to a posterior)
Where are we?
• Representation of joint distributions
* PGMs encode conditional independence
• Independence, d-separation
• Probabilistic inference
* Computing other distributions from the joint
* Elimination and sampling algorithms
• Statistical inference
* Learning parameters from data
Have PGM, some observations, no tables…
[Figure: the example network over nodes FG, HT, HG, FA and AS on a plate i = 1..n; every entry of its probability tables is still unknown ("?").]
Fully-observed case is "easy"
• The maximum-likelihood estimator (MLE) says:
* If we observe all r.v.'s X in a PGM independently n times, as x_1, …, x_n
* Then maximise the full joint:
  \hat{\theta} = \arg\max_{\theta \in \Theta} \prod_{i=1}^{n} \prod_{j} p\big(X_j = x_{ij} \mid X_{\mathrm{parents}(j)} = x_{i,\mathrm{parents}(j)}\big)
• Decomposes easily, leads to counts-based estimates (closed form sketched below)
* Maximise the log-likelihood instead; it becomes a sum of logs:
  \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \sum_{j} \log p\big(X_j = x_{ij} \mid X_{\mathrm{parents}(j)} = x_{i,\mathrm{parents}(j)}\big)
* The big maximisation of all parameters together decouples into small independent problems, one per probability table
• Example: training a naïve Bayes classifier
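As a small worked step (in the notation above, and a standard fact about discrete Bayesian networks), each decoupled sub-problem has a closed form: every table entry is an empirical conditional frequency.

```latex
\hat{p}\left(X_j = v \mid X_{\mathrm{parents}(j)} = \mathbf{u}\right)
  = \frac{\#\{\, i : x_{ij} = v,\; x_{i,\mathrm{parents}(j)} = \mathbf{u} \,\}}
         {\#\{\, i : x_{i,\mathrm{parents}(j)} = \mathbf{u} \,\}}
```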
Example: Fully-observed case
With all variables observed, the tables are filled in by counting (a small code sketch follows below):
* P(FG = false) = #{i : FG_i = false} / n, and P(FG = true) = #{i : FG_i = true} / n
* P(HG = true | HT = false, FG = false) = #{i : HG_i = true, HT_i = false, FG_i = false} / #{i : HT_i = false, FG_i = false}, and similarly for every other table entry
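A minimal counting sketch of these two estimates, assuming tabular data with boolean columns named after the slide's variables; the data frame below is synthetic and purely illustrative.

```python
import pandas as pd

# Synthetic fully-observed data (illustrative only); one row per instance i.
data = pd.DataFrame({
    "FG": [False, False, True,  False, True,  False],
    "HT": [False, True,  False, False, False, True],
    "HG": [True,  False, True,  False, False, False],
})
n = len(data)

# Root table: P(FG = false) = #{i : FG_i = false} / n
p_fg_false = (~data["FG"]).sum() / n

# Conditional table entry: P(HG = true | HT = false, FG = false)
parents_match = ~data["HT"] & ~data["FG"]
p_hg_true_given = (parents_match & data["HG"]).sum() / parents_match.sum()

print(f"P(FG=false) = {p_fg_false:.2f}")
print(f"P(HG=true | HT=false, FG=false) = {p_hg_true_given:.2f}")
```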
Presence of unobserved variables is trickier
• Most PGMs you'll encounter have latent (unobserved) variables
• What happens to the MLE?
* Maximise the likelihood of the observed data only
* Marginalise the full joint to get the desired "partial" joint:
  \arg\max_{\theta \in \Theta} \prod_{i=1}^{n} \sum_{\text{latent}} \prod_{j} p\big(X_j = x_{ij} \mid X_{\mathrm{parents}(j)} = x_{i,\mathrm{parents}(j)}\big)
* This won't decouple – oh no! (see the illustration below)
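To see why the decoupling is lost, take the log as before: the marginalisation over latent configurations now sits inside the logarithm, and a log of a sum of products does not split into a sum of per-table terms. A sketch in the notation above, writing z for an assignment to the latent variables of instance i:

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log \sum_{\mathbf{z}} \prod_{j}
      p\!\left(X_j = v_{ij} \mid X_{\mathrm{parents}(j)} = v_{i,\mathrm{parents}(j)}\right),
\qquad
v_{ij} =
\begin{cases}
  x_{ij} & \text{if } X_j \text{ is observed in instance } i\\
  z_{j}  & \text{if } X_j \text{ is latent}
\end{cases}
```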
Can we reduce partially-observed to fully-observed?
• Rough idea:
* If we had guesses for the missing variables
* We could employ MLE as in the fully-observed case
• With a bit more thought, we could alternate between:
* Updating the missing data
* Updating the probability tables/parameters
• This is the basis for training PGMs
Example: Partially-observed case
[Figure: the same network, but now for each instance i = 1..n only some variables are observed (as true/false) while the rest are missing; every probability-table entry is still unknown ("?").]
Example: Partially-observed case (continued)
[Figure: "Observed marginal" – the table for a fully observed variable can be estimated directly from the data, e.g. false 0.9, true 0.1; all other entries remain unknown.]
Example: Partially-observed case (continued)
[Figure: "Seed" – all remaining unknown table entries are initialised, here uniformly to 0.5; the observed marginal (0.9/0.1) is kept.]
Example: Partially-observed case (continued)
[Figure: "Missing data as expectation" – the missing values of each instance are completed with their expectations under the current (seeded) parameters; the tables are unchanged at this step.]
Example: Partially-observed case (continued)
[Figure: "MLE on fully-observed" – with the completed data, every table is re-estimated by counting; e.g. the marginal tables become 0.7/0.3, 0.9/0.1 and 0.6/0.4, and the conditional tables are updated similarly.]
Example: Partially-observed case (continued)
The procedure: seed the parameters, then do until "convergence":
* Fill in the missing data as expectations (under the current parameters)
* Run MLE on the resulting fully-observed data
Expectation-Maximisation (EM) Algorithm
• Seed the parameters randomly
• E-step: complete the unobserved data – not with point-estimate expectations, but with posterior distributions over their values, computed by probabilistic inference under the current parameters
• M-step: update the parameters with MLE on the (expected) fully-observed data
A small runnable sketch follows below.
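A minimal sketch of these two steps, assuming a tiny latent-class model: a hidden binary variable Z generating three conditionally independent binary symptoms (naïve Bayes with an unobserved class). The model, variable names and synthetic data are illustrative assumptions, not the lecture's running example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic data: hidden class Z, three observed binary symptoms X1..X3 ---
n = 2000
true_pz = 0.3                                     # P(Z = 1)
true_px = np.array([[0.1, 0.2, 0.3],              # P(X_k = 1 | Z = 0)
                    [0.8, 0.7, 0.9]])             # P(X_k = 1 | Z = 1)
z = (rng.random(n) < true_pz).astype(int)
x = (rng.random((n, 3)) < true_px[z]).astype(int)  # x is observed; z is discarded

# --- Seed parameters randomly -------------------------------------------------
pz = 0.5
px = rng.uniform(0.3, 0.7, size=(2, 3))

for _ in range(200):
    # E-step: posterior P(Z = 1 | x_i) under the current parameters
    # (probabilistic inference over the latent variable, not a point estimate).
    log_joint = np.stack(
        [np.log([1 - pz, pz][c])
         + (x * np.log(px[c]) + (1 - x) * np.log(1 - px[c])).sum(axis=1)
         for c in (0, 1)], axis=1)
    resp = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
    resp = resp[:, 1] / resp.sum(axis=1)          # responsibility for Z = 1

    # M-step: MLE on the "completed" data, using expected counts for Z.
    new_pz = resp.mean()
    new_px = np.stack([
        ((1 - resp)[:, None] * x).sum(axis=0) / (1 - resp).sum(),
        (resp[:, None] * x).sum(axis=0) / resp.sum(),
    ])
    new_px = np.clip(new_px, 1e-6, 1 - 1e-6)      # guard against log(0)

    converged = abs(new_pz - pz) < 1e-8
    pz, px = new_pz, new_px
    if converged:
        break

print("P(Z=1)       ~", round(float(pz), 3))
print("P(X_k=1 | Z) ~", np.round(px, 3))
```

As with any mixture, the recovered class labels may come out swapped relative to the generating ones (label switching); EM only climbs the marginal likelihood of the observed data.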
Déjà vu? K-means vs. EM
• K-means clustering:
* Randomly seed cluster centres
* Repeat: (hard E-step) assign points to nearest cluster; update cluster centres
• EM learning:
* Randomly seed parameters
* Repeat: (E-step) compute expectations for the missing variables; update parameters via MLE
• Hard E-step: assign each point to a single cluster
• Soft E-step: assign each point a distribution over clusters (e.g., 10% C1, 20% C2, 70% C3); in EM these are the posteriors over the missing variables given the observed data and current parameters
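A tiny sketch of the hard/soft distinction, assuming 1-D points, two fixed centres, unit-variance Gaussian components and equal weights (all numbers made up for illustration):

```python
import numpy as np

x = np.array([0.1, 0.4, 1.1, 2.0, 2.2])   # points
centres = np.array([0.0, 2.0])            # current cluster centres
dist2 = (x[:, None] - centres[None, :]) ** 2

# Hard E-step (k-means): each point belongs to exactly one cluster.
hard_assign = dist2.argmin(axis=1)

# Soft E-step (EM): each point gets a posterior distribution over clusters
# (unit-variance Gaussian responsibilities with equal mixing weights).
resp = np.exp(-0.5 * dist2)
resp /= resp.sum(axis=1, keepdims=True)

print(hard_assign)          # [0 0 1 1 1]
print(np.round(resp, 2))    # the middle point is split almost evenly
```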
Summary
• Statistical inference on PGMs
* What is it and why do we care?
* Straight MLE for fully-observed data
* The EM algorithm for mixed latent/observed data