Fully Distributed EM for Very Large Datasets
Jason Wolfe, Aria Haghighi, Dan Klein
Computer Science Division, UC Berkeley
Overview
Task: unsupervised learning via EM
[Figure: parallel Arabic-English sentence pairs, e.g. "US Hosts Middle East Peace Conference Next Week"]
Focus: models with many local parameters (each relevant to only a few data points)
[Plot: millions of parameters (up to 244) against millions of data points (0 to 3)]
Approach: fully distributed, localized EM
⋆ parameter locality → less bandwidth
[Chart labels: useful work, communication overhead]
Outline
- Running example: IBM Model 1 for word alignment
- Naive distributed EM
- Efficiently distributed EM
Word alignment for machine translation
Goal: parallel sentences → word-level translation model
Corpus of parallel sentences: "la silla" / "the chair", "la mesa" / "the table"
Parameters $\theta_{s \to t}$: probability that Spanish word s translates to English word t
[Figure: possible alignment arcs and the unobserved true alignments for both sentence pairs]
Parameter vector θ, with the values that match the true alignments:
$\theta_{la \to the} = 1.0$
$\theta_{la \to chair} = 0.0$
$\theta_{la \to table} = 0.0$
$\theta_{silla \to the} = 0.0$
$\theta_{silla \to chair} = 1.0$
$\theta_{mesa \to the} = 0.0$
$\theta_{mesa \to table} = 1.0$
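The parameter set on this slide can be derived mechanically from the corpus: there is one parameter per (source word, target word) pair that co-occurs in some sentence pair. The following is a minimal sketch of that bookkeeping, not the authors' implementation. The names `possible_arcs`, `true_pairs`, and the nested-dictionary layout of `theta` are assumptions made for illustration.

```python
from collections import defaultdict

# Toy corpus from the slide: (source sentence, target sentence) pairs.
corpus = [
    ("la silla".split(), "the chair".split()),
    ("la mesa".split(), "the table".split()),
]


def possible_arcs(corpus):
    """Map each source word to the set of target words it co-occurs with."""
    arcs = defaultdict(set)
    for source, target in corpus:
        for s in source:
            arcs[s].update(target)
    return arcs


arcs = possible_arcs(corpus)

# One parameter theta[s][t] per possible arc. Here the table is filled in
# with the values that correspond to the slide's true alignments
# (hypothetical layout: a dict of dicts keyed by source, then target word).
true_pairs = {("la", "the"), ("silla", "chair"), ("mesa", "table")}
theta = {
    s: {t: 1.0 if (s, t) in true_pairs else 0.0 for t in sorted(ts)}
    for s, ts in arcs.items()
}

for s, row in theta.items():
    print(s, row)
```

Only seven parameters exist for this corpus, because "silla" never co-occurs with "table" and "mesa" never co-occurs with "chair". This sparsity is the kind of parameter locality the overview refers to.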
IBM Model 1 for word alignment
[Figure: "a Steve no le gustan las ferias grandes" aligned word by word to "Steve does not like big ferris wheels"]
Each target word is generated by exactly one source word, chosen uniformly at random (u.a.r.).
IBM Model 1: a simple generative model
For each target position i, independently:
- choose a source index $a_i$ u.a.r.
- choose a target word $T_i \sim \theta_{S_{a_i} \to \cdot}$
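The two-step generative story above can be sketched as a short sampler. This is an illustrative sketch under stated assumptions: the target length is taken as given, the function name `sample_target` is hypothetical, and `theta` reuses the toy "la silla" values from the word alignment slide rather than anything learned.

```python
import random


def sample_target(source, theta, target_len, rng):
    """Sample an alignment and a target sentence under the generative story."""
    alignment, target = [], []
    for _ in range(target_len):
        # Step 1: choose a source index a_i uniformly at random.
        a_i = rng.randrange(len(source))
        # Step 2: draw the target word T_i from theta_{S_{a_i} -> .}
        dist = theta[source[a_i]]
        words = list(dist)
        t_i = rng.choices(words, weights=[dist[w] for w in words])[0]
        alignment.append(a_i)
        target.append(t_i)
    return alignment, target


# Toy parameters (illustrative): each source word has one certain translation.
theta = {
    "la": {"the": 1.0, "chair": 0.0, "table": 0.0},
    "silla": {"the": 0.0, "chair": 1.0},
}

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
alignment, target = sample_target(["la", "silla"], theta, 2, rng)
print(alignment, target)
```

Because the source index is drawn independently for every target position, the same source word may generate several target words and some source words may generate none.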
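EM algorithm for IBM Model 1
θ ← some initial guess, e.g. $\theta_{la \to the} = .33$, $\theta_{la \to chair} = .33$, $\theta_{la \to table} = .33$, $\theta_{silla \to the} = .5$, ...
Iterate:
E-step: estimate alignment counts η
compute posteriors $p(a_i \mid \theta)$
[Figure: for the pair "la silla" / "the chair", a target word is aligned to "la" with posterior $\frac{.33}{.33 + .5} = .4$ and to "silla" with posterior $\frac{.5}{.33 + .5} = .6$]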
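The E-step arithmetic on this slide can be reproduced with a few lines of code. The sketch below assumes a uniform initial guess over each source word's possible arcs, which gives the .33 and .5 values shown. The function names `uniform_init`, `posterior`, and `e_step` are hypothetical. Because the source index is chosen uniformly, the uniform prior cancels and each posterior is just a normalized θ value.

```python
from collections import defaultdict

# Toy corpus from the earlier slides.
corpus = [
    ("la silla".split(), "the chair".split()),
    ("la mesa".split(), "the table".split()),
]


def uniform_init(corpus):
    """Initial guess: uniform over the target words each source word co-occurs with."""
    arcs = defaultdict(set)
    for source, target in corpus:
        for s in source:
            arcs[s].update(target)
    return {s: {t: 1.0 / len(ts) for t in ts} for s, ts in arcs.items()}


def posterior(source, t, theta):
    """Posterior over source positions for a single target word t."""
    scores = [theta[s][t] for s in source]
    z = sum(scores)
    return [x / z for x in scores]


def e_step(corpus, theta):
    """Accumulate expected alignment counts eta[s][t] over the whole corpus."""
    eta = defaultdict(lambda: defaultdict(float))
    for source, target in corpus:
        for t in target:
            for s, p in zip(source, posterior(source, t, theta)):
                eta[s][t] += p
    return eta


theta = uniform_init(corpus)
post = posterior(["la", "silla"], "chair", theta)  # roughly [0.4, 0.6]
eta = e_step(corpus, theta)
print(post)
```

The count for la → the accumulates posterior mass from both sentence pairs, while the counts for "silla" and "mesa" each draw on a single pair.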