Comparison of sequential and parallel algorithms for word and - PowerPoint PPT Presentation

Comparison of sequential and parallel algorithms for word and context count
Authors: Eduardo Ferreira, Francieli Zanon, Aline Villavicencio
Groups: Natural Language Processing (Processamento de linguagem natural) and Parallel and Distributed Processing (Processamento paralelo e distribuído), UFRGS

Motivation
● Parallelize one of the steps of Distributional Thesaurus creation
● Build Distributional Thesauri faster
● Distributional Thesauri are used in many NLP applications:
○ Machine Translation
○ Question Answering
● Building one requires a large amount of data

Agenda
● Distributional Thesaurus Creation
● Parallel Version
● Results

Distributional Thesaurus Creation
A thesaurus is a list of words associated with each other by a specific characteristic, for example synonymy:
word    | synonyms
abandon | leave, desert, give up, surrender, ...
abide   | tolerate, accept, endure, stand, ...

Distributional Thesaurus Creation
Initial pre-processed text → Word-context count → Association measure → Word-context association similarity → Distributional Thesaurus

Distributional Thesaurus Creation
Initial pre-processed text:
Chocolate is delicious.
We eat pizza.
Chocolate is expensive.

Distributional Thesaurus Creation
Target-context pairs extracted from the text:
Target    | Context
Chocolate | Eat
Chocolate | Delicious
Chocolate | Expensive
Chocolate | Delicious
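The slides do not say how contexts are selected, so the sketch below is only one possible reading: it assumes that the contexts of a target are the other content words of the same sentence, after dropping an illustrative stopword list. The object name `PairExtraction` and the stopword set are hypothetical, and this rule does not reproduce the exact pairs listed on the slide.

```scala
// Hypothetical sketch of (target, context) pair extraction.
// Assumption (not stated on the slides): the contexts of a target are the other
// content words of the same sentence, after removing an illustrative stopword list.
object PairExtraction {
  // Illustrative stopword list; the slides do not specify one.
  val stopwords: Set[String] = Set("is", "we", "the", "a", "an")

  // Lowercase the sentence, split on any run of non-letters, and drop stopwords.
  def contentWords(sentence: String): Seq[String] =
    sentence.toLowerCase
      .split("[^\\p{L}]+")
      .toSeq
      .filter(w => w.nonEmpty && !stopwords.contains(w))

  // Pair every content word (target) with every other content word of the sentence (context).
  def pairs(sentence: String): Seq[(String, String)] = {
    val words = contentWords(sentence).zipWithIndex
    for {
      (target, i)  <- words
      (context, j) <- words
      if i != j
    } yield (target, context)
  }
}
```

For "Chocolate is delicious." this yields the pairs (chocolate, delicious) and (delicious, chocolate); a real pipeline would use whatever context definition the pre-processing step provides.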

Distributional Thesaurus Creation
Word-context count:
Target    | Context   | Count
Chocolate | Eat       | 1
Chocolate | Delicious | 2
Chocolate | Expensive | 1
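The count step amounts to grouping identical (target, context) pairs and taking the size of each group. A minimal sequential sketch in plain Scala, assuming pairs arrive as string tuples (`SequentialCount` and `countPairs` are hypothetical names):

```scala
// Sequential word-context count: each distinct (target, context) pair
// is mapped to the number of times it occurs in the input.
object SequentialCount {
  def countPairs(pairs: Seq[(String, String)]): Map[(String, String), Int] =
    pairs
      .groupBy(identity)                                    // one group per distinct pair
      .map { case (pair, occurrences) => pair -> occurrences.size }
}
```

Applied to the four pairs on the slide, it yields Chocolate/Eat 1, Chocolate/Delicious 2 and Chocolate/Expensive 1.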

Distributional Thesaurus Creation
Association measure (word × context matrix):
          | Delicious | Eat | Expensive
Chocolate | 7         | 3   | 5
Pizza     | 3         | 9   | 4
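The slides do not name the association measure used to fill this matrix. As an assumption, the sketch below uses pointwise mutual information (PMI), a common choice for weighting word-context counts, estimated directly from the pair counts. `Association.pmi` is a hypothetical helper, and its scores are not meant to match the illustrative values in the matrix.

```scala
// Hypothetical association step. Assumption: the measure is pointwise mutual information,
// PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ) = log( n(w, c) * N / (n(w) * n(c)) ),
// where N is the total number of pairs and n(w), n(c) are marginal counts.
object Association {
  def pmi(counts: Map[(String, String), Int]): Map[(String, String), Double] = {
    val total = counts.values.sum.toDouble
    // Marginal counts: how often each target and each context occurs in any pair.
    val targetTotals  = counts.groupBy(_._1._1).map { case (w, m) => w -> m.values.sum }
    val contextTotals = counts.groupBy(_._1._2).map { case (c, m) => c -> m.values.sum }
    counts.map { case ((w, c), n) =>
      (w, c) -> math.log(n * total / (targetTotals(w).toDouble * contextTotals(c)))
    }
  }
}
```

Only observed pairs appear in the result; unseen pairs are simply absent rather than scored as negative infinity.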

Distributional Thesaurus Creation
Word-context association similarity:
word1     | word2     | similarity
chocolate | pizza     | 0.4
chocolate | delicious | 0.8
pizza     | eat       | 0.9
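The similarity measure is likewise not specified. Assuming cosine similarity between the context vectors of two words (a standard option, swapped in here for illustration), the sketch below compares two rows of a word × context matrix stored as maps from context to value; `Similarity.cosine` is a hypothetical name.

```scala
// Hypothetical similarity step. Assumption: cosine similarity between two words'
// context vectors, each stored sparsely as context -> association value.
object Similarity {
  type Vec = Map[String, Double]

  def cosine(a: Vec, b: Vec): Double = {
    // Dot product over shared contexts; toSeq keeps equal products from collapsing in a Set.
    val dot = a.keySet.intersect(b.keySet).toSeq.map(k => a(k) * b(k)).sum
    val norm = (v: Vec) => math.sqrt(v.values.map(x => x * x).sum)
    // Define similarity with an all-zero vector as 0 to avoid division by zero.
    if (norm(a) == 0.0 || norm(b) == 0.0) 0.0 else dot / (norm(a) * norm(b))
  }
}
```

For the Chocolate (7, 3, 5) and Pizza (3, 9, 4) rows of the matrix it gives 68 / √(83 · 106) ≈ 0.725, not the 0.4 on the slide, so the slide's scores are presumably illustrative or come from a different measure.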

Parallel version
● The sequential process is too slow
● The word-context count fits the MapReduce paradigm:
○ Map: the input text is divided into multiple parts
○ Reduce: the results are grouped together
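The map/reduce split of the count can be sketched with plain Scala collections, without Spark: the map phase emits a `(pair, 1)` record per extracted pair, a shuffle groups the records by key, and the reduce phase sums each group. `MapReduceCount` and its methods are hypothetical names used for illustration.

```scala
// MapReduce-style word-context count in plain Scala collections (no Spark).
object MapReduceCount {
  type Pair = (String, String)

  // Map: emit one (key, 1) record per extracted (target, context) pair.
  def mapPhase(pairs: Seq[Pair]): Seq[(Pair, Int)] = pairs.map(p => (p, 1))

  // Shuffle: gather all values emitted for the same key.
  def shuffle(records: Seq[(Pair, Int)]): Map[Pair, Seq[Int]] =
    records.groupBy(_._1).map { case (k, rs) => k -> rs.map(_._2) }

  // Reduce: sum the values of each key to obtain its count.
  def reducePhase(groups: Map[Pair, Seq[Int]]): Map[Pair, Int] =
    groups.map { case (k, vs) => k -> vs.sum }

  def count(pairs: Seq[Pair]): Map[Pair, Int] = reducePhase(shuffle(mapPhase(pairs)))
}
```

In Spark the same computation would typically be written as `pairsRdd.map(p => (p, 1)).reduceByKey(_ + _)` (with `pairsRdd` a hypothetical RDD of pairs), where `reduceByKey` covers both the shuffle and the summation and also pre-aggregates within each partition; the slides do not show the authors' actual Spark code.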

Parallel version
● Spark framework, implemented in Scala
● Tests executed on the Sagittaire cluster of Grid'5000
● Up to 40 nodes used, each with 2 cores

Parallel version
The target-context pairs are distributed across three nodes (Node 1, Node 2, Node 3), and the results are grouped into the final counts:
Target    | Context   | #
Chocolate | Eat       | 1
Chocolate | Delicious | 3
Chocolate | Expensive | 2
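To mimic the three-node layout on a single machine, the sketch below gives each "node" a share of the pairs (round-robin), counts each share concurrently in a `Future`, and merges the partial tables. Threads stand in for cluster nodes here, and `LocalNodesCount` is a hypothetical name; this illustrates the idea and is not the authors' Spark implementation.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Assumption: each "node" is simulated by a Future running on a local thread pool.
object LocalNodesCount {
  type Counts = Map[(String, String), Int]

  // Local count of one node's share of the pairs.
  private def countLocal(part: Seq[(String, String)]): Counts =
    part.groupBy(identity).map { case (pair, occ) => pair -> occ.size }

  // Merge two partial count tables by summing values per key.
  private def merge(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (pair, n)) => acc.updated(pair, acc.getOrElse(pair, 0) + n) }

  def count(pairs: Seq[(String, String)], nodes: Int): Counts = {
    require(nodes > 0, "nodes must be positive")
    // Round-robin assignment: pair i goes to node i % nodes.
    val shares = (0 until nodes).map(k => pairs.zipWithIndex.collect { case (p, i) if i % nodes == k => p })
    // Count every share concurrently, then wait for all partial results.
    val partials = Future.sequence(shares.map(share => Future(countLocal(share))))
    Await.result(partials, 30.seconds).foldLeft(Map.empty: Counts)(merge)
  }
}
```

With six pairs matching the slide's final counts, three nodes produce Chocolate/Eat 1, Chocolate/Delicious 3 and Chocolate/Expensive 2, the same result as a single node, since merging partial counts is associative and commutative.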

Results: 68 KB input
           | sequential | parallel (40 nodes)
time (s)   | 0.09       | 45.31
speedup    | -          | 0.0019
efficiency | -          | 0.000024

Results: 11 GB input
               | sequential | parallel (10 nodes) | parallel (20 nodes) | parallel (40 nodes)
time (s)       | 14029.8    | 536.74              | 289.85              | 180.87
std. deviation | -          | 1.056               | 1.46                | 3.3
speedup        | -          | 26.13               | 48.40               | 77.56
efficiency     | -          | 1.30                | 1.21                | 0.97
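The speedup and efficiency rows agree with the usual definitions if efficiency is taken per core rather than per node, i.e. with p = 2 × nodes (20, 40 and 80 cores). This reading is inferred from the reported numbers, not stated on the slides. For the 40-node run:

```latex
S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}}, \qquad E = \frac{S}{p}, \qquad p = 2 \times \text{nodes}

S_{40} = \frac{14029.8\ \mathrm{s}}{180.87\ \mathrm{s}} \approx 77.57, \qquad E_{40} = \frac{77.57}{80} \approx 0.97
```

The slides appear to truncate rather than round (77.56). Efficiencies above 1, as in the 10- and 20-node runs, indicate superlinear speedup.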


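Results: 11 GB input (speedup relative to the sequential time of 14029.8 s)
           | parallel (10 nodes) | parallel (20 nodes) | parallel (40 nodes)
time (s)   | 1466.34             | 1499.45             | 1670.47
speedup    | 9.56                | 9.35                | 8.39
efficiency | 0.47                | 0.23                | 0.10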

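Conclusions
● The goal of this work was to parallelize the word-context count.
● Spark significantly reduced the time required to obtain word-context counts (from 14029.8 s sequentially to 180.87 s with 40 nodes on the 11 GB corpus).
● The performance improvement holds for large corpora; on the 68 KB input the parallel version was slower than the sequential one.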

Future Work
● Test the parallelization with other forms of file distribution (HDFS).
● Integrate the tuple counts with the other two steps:
○ Association measure
○ Word-context similarity
