Fully Distributed EM for Very Large Datasets
Jason Wolfe, Aria Haghighi, Dan Klein
Computer Science Division, UC Berkeley

Overview
[Figure: Arabic-English parallel sentence pair, shown repeatedly: "US Hosts Middle East Peace Conference Next Week"]
Task: unsupervised learning via EM
Focus: models with many local parameters (each relevant to few data points)
[Figure: plot; axis labels as extracted: "millions of parameters", "millions of data points", "useful"]
Approach: fully distributed, localized EM
⋆ parameter locality → less bandwidth
[Figure labels: work, communication overhead]

Outline
Running example: IBM Model 1 for word alignment
Naive distributed EM
Efficiently distributed EM

Word alignment for machine translation
Goal: parallel sentences → word-level translation model
Corpus of parallel sentences: "la silla" / "the chair"; "la mesa" / "the table"
Parameters $\theta_{s \to t}$: probability that Spanish word s translates to English word t
[Figure: possible alignment arcs between the words of each sentence pair]
[Figure: unobserved true alignments]
With the true alignments, the parameter vector is

$$
\theta =
\begin{pmatrix}
\theta_{\text{la} \to \text{the}} \\
\theta_{\text{la} \to \text{chair}} \\
\theta_{\text{la} \to \text{table}} \\
\theta_{\text{silla} \to \text{the}} \\
\theta_{\text{silla} \to \text{chair}} \\
\theta_{\text{mesa} \to \text{the}} \\
\theta_{\text{mesa} \to \text{table}}
\end{pmatrix}
=
\begin{pmatrix}
1.0 \\ 0.0 \\ 0.0 \\ 0.0 \\ 1.0 \\ 0.0 \\ 1.0
\end{pmatrix}
$$
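The parameter vector above has one entry per (Spanish word, English word) pair that co-occurs in some sentence pair. As a minimal sketch, assuming a nested-dictionary layout and a uniform starting value (both are illustrative choices, and `init_params` is a hypothetical helper name), the table could be built from the toy corpus like this:

```python
from collections import defaultdict

def init_params(corpus):
    """Build theta[s][t] for co-occurring word pairs only, uniform over t.

    Assumption: uniform initialization restricted to co-occurring pairs;
    the slides only say "some initial guess".
    """
    cooc = defaultdict(set)
    for src, tgt in corpus:
        for s in src:
            cooc[s].update(tgt)  # every target word seen alongside s
    return {s: {t: 1.0 / len(ts) for t in ts} for s, ts in cooc.items()}

# Toy corpus from the slide: (Spanish sentence, English sentence) pairs.
corpus = [(["la", "silla"], ["the", "chair"]),
          (["la", "mesa"], ["the", "table"])]
theta = init_params(corpus)
num_params = sum(len(row) for row in theta.values())
```

Only seven parameters are created here, the same seven listed on the slide; pairs such as silla → table never co-occur and are not stored.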

IBM Model 1 for word alignment
Source: a Steve no le gustan las ferias grandes
Target: Steve does not like big ferris wheels
Alignment: ? ? ? ? ? ? ? (one unknown source index per target word)
Each target word is generated by exactly one source word, chosen uniformly at random (u.a.r.).
IBM Model 1: a simple generative model
For each target position i, independently:
  choose a source index $a_i$ u.a.r.
  choose a target word $T_i \sim \theta_{S_{a_i} \to \cdot}$
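The generative story above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: `sample_target` is a hypothetical helper, the target length is passed in by the caller (the slide does not say how it is chosen), and `theta` uses the nested-dictionary layout `theta[s][t]`.

```python
import random

def sample_target(source, theta, length, rng):
    """Sample (alignment, target) for one source sentence under IBM Model 1."""
    alignment, target = [], []
    for _ in range(length):
        a = rng.randrange(len(source))       # source index a_i, uniform at random
        dist = theta[source[a]]              # distribution theta_{S_{a_i} -> .}
        words = list(dist)
        t = rng.choices(words, weights=[dist[w] for w in words])[0]
        alignment.append(a)
        target.append(t)
    return alignment, target

# Parameter values taken from the word alignment slide (true alignments).
theta = {"la": {"the": 1.0, "chair": 0.0, "table": 0.0},
         "silla": {"the": 0.0, "chair": 1.0},
         "mesa": {"the": 0.0, "table": 1.0}}
source = ["la", "silla"]
alignment, target = sample_target(source, theta, 4, random.Random(0))
```

Because these parameters are deterministic, each sampled target word is fully determined by its sampled source index: "la" always yields "the" and "silla" always yields "chair".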


EM algorithm for IBM Model 1
θ ← some initial guess, e.g. $\theta_{\text{la} \to \text{the}} = .33$, $\theta_{\text{la} \to \text{chair}} = .33$, $\theta_{\text{la} \to \text{table}} = .33$, $\theta_{\text{silla} \to \text{the}} = .5$, ...
Iterate:
E-step: estimate alignment counts η
  compute posteriors $p(a_i \mid \theta)$
  Example, sentence pair "la silla" / "the chair", target word "the":
  $\frac{.33}{.33 + .5} = .4$ for la, $\frac{.5}{.33 + .5} = .6$ for silla
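The E-step arithmetic can be checked with a small sketch. It assumes the nested-dictionary layout `theta[s][t]` and a hypothetical helper name `e_step`; since source indices are chosen uniformly, the posterior that target word t came from source word s is taken to be proportional to θ for that pair, normalized over the source words of the sentence.

```python
from collections import defaultdict

def e_step(corpus, theta):
    """Accumulate expected alignment counts eta[(s, t)] over the corpus."""
    eta = defaultdict(float)
    for src, tgt in corpus:
        for t in tgt:
            # Normalizer: total probability of t over all source positions.
            z = sum(theta[s].get(t, 0.0) for s in src)
            for s in src:
                eta[(s, t)] += theta[s].get(t, 0.0) / z  # posterior p(a_i = s)
    return eta

# Initial guess as on the slide, written with exact fractions (1/3 and 1/2).
# The silla -> chair value is an assumption; the slide elides it.
theta = {"la": {"the": 1 / 3, "chair": 1 / 3, "table": 1 / 3},
         "silla": {"the": 0.5, "chair": 0.5}}
eta = e_step([(["la", "silla"], ["the", "chair"])], theta)
```

For the target word "the", this gives an expected count of about .4 for la and .6 for silla, matching the fractions on the slide.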
