Section 4: Parallel Algorithms - Michelle Kuttel

Section 4: Parallel Algorithms
Michelle Kuttel
mkuttel@cs.uct.ac.za

The DAG, or "cost graph"
• A program execution using fork and join can be seen as a DAG (directed acyclic graph)
  – Nodes: Pieces of work
  – Edges: Source must finish before destination starts
• A fork "ends a node" and makes two outgoing edges
  – New thread
  – Continuation of current thread
• A join "ends a node" and makes a node with two incoming edges
  – Node just ended
  – Last node of thread joined on
slide from: Sophomoric Parallelism and Concurrency, Lecture 2
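The fork and join steps described on this slide can be sketched with Python threads. This is only an illustration of the structure, not code from the lecture: the function name `parallel_sum` and the `cutoff` parameter are assumptions, and because of CPython's global interpreter lock the sketch shows the shape of the DAG rather than a real speed-up.

```python
import threading


def parallel_sum(values, lo, hi, cutoff=1000):
    """Sum values[lo:hi] with a recursive fork/join structure (hypothetical helper)."""
    if hi - lo <= cutoff:
        # Small enough: this is a single node of sequential work.
        return sum(values[lo:hi])

    mid = (lo + hi) // 2
    left_result = []

    def left_task():
        left_result.append(parallel_sum(values, lo, mid, cutoff))

    worker = threading.Thread(target=left_task)
    worker.start()                                   # fork: edge to the new thread
    right = parallel_sum(values, mid, hi, cutoff)    # fork: edge to the continuation
    worker.join()                                    # join: two incoming edges meet here
    return left_result[0] + right


data = list(range(10_000))
total = parallel_sum(data, 0, len(data))
```

Each call to `start()` ends a node and creates two outgoing edges; each `join()` creates a node that waits for both the continuation and the forked thread.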

The DAG, or "cost graph"
• work: number of nodes
• span: length of the longest path
  – critical path
Checkpoint: What is the span of this DAG? What is the work?
slide from: Sophomoric Parallelism and Concurrency, Lecture 2

Checkpoint: a×b + c×d
• Write a DAG to show the work and span of this expression
  [Figure: DAG with nodes a×b, c×d and +]
• The set of instructions forms the vertices of the DAG
• The graph edges indicate dependences between instructions
• We say that an instruction x precedes an instruction y if x must complete before y can begin
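One way to check an answer to this checkpoint is to compute work and span directly from the DAG. The sketch below assumes every node costs one unit and counts the span as the number of nodes on the longest path; the dictionary layout and the name `work_and_span` are illustrative choices, not part of the slides.

```python
from functools import lru_cache


def work_and_span(dag):
    """dag maps each node to the list of nodes that depend on it.

    Assumes unit cost per node: work is the node count and span is the
    number of nodes on the longest (critical) path.
    """
    work = len(dag)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest chain of dependent nodes starting at `node`, inclusive.
        return 1 + max((longest_from(nxt) for nxt in dag[node]), default=0)

    span = max(longest_from(node) for node in dag)
    return work, span


# a*b and c*d have no dependence on each other; + needs both results.
expression_dag = {"a*b": ["+"], "c*d": ["+"], "+": []}
work, span = work_and_span(expression_dag)
```

Under these assumptions the expression has a work of 3 and a span of 2: the two multiplications can run in parallel, and the addition must wait for both.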

DAG for an embarrassingly parallel algorithm
y_i = f_i(x_i)

DAG for an embarrassingly parallel algorithm
or, indeed:
y_i = f_i(x_i)
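A minimal sketch of y_i = f_i(x_i): every task reads only its own input, so the DAG has no edges between tasks. The helper name `apply_independently` and the sample functions are assumptions made for illustration. A thread pool is used to keep the example self-contained; for CPU-bound work in CPython a process pool would usually be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_independently(functions, inputs, max_workers=4):
    """Compute y_i = f_i(x_i) for each pair, with no communication between tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(f, x) for f, x in zip(functions, inputs)]
        # Collect results in submission order so that ys[i] matches xs[i].
        return [future.result() for future in futures]


fs = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]
xs = [10, 20, 30]
ys = apply_independently(fs, xs)
```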

Embarrassingly parallel examples
Ideal computation: a computation that can be divided into a number of completely separate tasks, each of which can be executed by a single processor.
No special algorithms or techniques are required to get a workable solution, e.g.
• element-wise linear algebra:
  – addition, scalar multiplication etc.
• Image processing
  – shift, rotate, clip, scale
• Monte Carlo simulations
• encryption, compression

Image Processing
• Low-level image processing uses the individual pixel values to modify the image in some way.
• Image processing operations can be divided into:
  – point processing: output produced based on value of single pixel
    • well known Mandelbrot set
  – local operations: produce output based on a group of neighbouring pixels
  – global operations: produce output based on all the pixels of the image
• Point processing operations are embarrassingly parallel (local operations are often highly parallelizable)
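The Mandelbrot set mentioned above is a convenient illustration of point processing, because each pixel's value depends only on that pixel's own coordinates. The following is a rough sketch: the image size, the region of the complex plane, the iteration limit and the function names are all arbitrary choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def escape_time(c, max_iter=50):
    """Iterations before |z| exceeds 2 under z -> z*z + c (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render_row(y, width, height, max_iter=50):
    """Compute one image row; no pixel depends on any other pixel."""
    im = -1.5 + 3.0 * y / (height - 1)
    return [
        escape_time(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
        for x in range(width)
    ]


def render(width=30, height=15, max_iter=50, max_workers=4):
    """Farm the rows out to a pool of workers; row order is preserved by map."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda y: render_row(y, width, height, max_iter), range(height))
        )


image = render()
```

Rows are used as the unit of work here only to keep the task count small; any partition of the pixels would give the same image.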

Monte Carlo Methods
• Basis of Monte Carlo methods is the use of random selections in calculations that lead to the solution of numerical and physical problems, e.g.
  – Brownian motion
  – molecular modelling
  – forecasting the stock market
• Each calculation is independent of the others and hence amenable to embarrassingly parallel methods

Trivial Monte Carlo Integration: finding value of π
• Monte Carlo integration
  – Compute π by generating random points in a square of side 2 and counting how many of them are in the circle with radius 1 (x² + y² < 1; π = 4 × ratio).
[Figure: circle of radius 1 (area = π) inside a square of side 2 (area = 4)]
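The estimate described on this slide can be sketched in a few lines. The function name `estimate_pi`, the fixed seed and the sample count are assumptions chosen for illustration; the square is taken to be [-1, 1] × [-1, 1] so that the unit circle sits inside it.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate pi from the fraction of random points that fall inside the unit circle."""
    rng = random.Random(seed)  # private generator so the run is repeatable
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            inside += 1
    # ratio = (area of circle) / (area of square) = pi / 4
    return 4.0 * inside / samples


approx_pi = estimate_pi(200_000)
```

Because every point is tested independently, the loop can be split into batches that run on different processors, provided each batch draws from a suitable random number stream.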

Monte Carlo Integration: finding value of π
[Figure: solution visualization; panels labelled 0.001, 0.0001 and 0.00001]

Monte Carlo Integration
• Monte Carlo integration can also be used to calculate
  – the area of any shape within a known bound area
  – any area under a curve
  – any definite integral
• Widely applicable brute force solution.
  – Typically, accuracy improves with the square root of the number of repetitions: the error shrinks in proportion to 1/√N for N samples.
• Unfortunately, Monte Carlo integration is very computationally intensive, so it is used when other techniques fail.
• It also requires the maximum and minimum of any function within the region of interest.
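As a rough sketch of the "area under a curve" case, the hit-or-miss method below throws random points into a bounding box and counts those that land under the curve. It assumes a non-negative function whose maximum on [a, b] is known, which is why the bound is needed; the name `hit_or_miss_integral` and its parameters are illustrative.

```python
import random


def hit_or_miss_integral(f, a, b, f_max, samples, seed=0):
    """Estimate the integral of f over [a, b], assuming 0 <= f(x) <= f_max there."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, f_max)
        if y < f(x):
            hits += 1
    box_area = (b - a) * f_max
    return box_area * hits / samples


# The integral of x^2 over [0, 1] is 1/3; the maximum of x^2 on that interval is 1.
area = hit_or_miss_integral(lambda x: x * x, 0.0, 1.0, 1.0, 100_000)
```

Halving the error of this estimate takes roughly four times as many samples, which is why the method is computationally expensive.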

Note: Parallel Random Number Generation
• For successful Monte Carlo simulations, the random numbers must be independent of each other
• Developing random number generator algorithms and implementations that are fast, easy to use, and give good quality pseudo-random numbers is a challenging problem.
• Developing parallel implementations is even more difficult.

Requirements for a Parallel Generator
• For random number generators on parallel computers, it is vital that there are no correlations between the random number streams on different processors.
  – e.g. we don't want one processor repeating part of another processor's sequence.
  – this could occur if we just use the naive method of running an RNG on each different processor and just giving randomly chosen seeds to each processor.
• In many applications we also need to ensure that we get the same results for any number of processors.

Parallel Random Numbers
• Three general approaches to the generation of random numbers on parallel computers:
  – centralized approach
    • A sequential generator is encapsulated in a task from which other tasks request random numbers.
    • This avoids the problem of generating multiple independent random sequences, but is unlikely to provide good performance.
    • Furthermore, it makes reproducibility hard to achieve: the response to a request depends on when it arrives at the generator, and hence the result computed by a program can vary from one run to the next.
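A minimal sketch of the centralized approach, assuming a single generator guarded by a lock that worker threads call into. The class name `CentralGenerator` and the helpers are hypothetical. The lock is the bottleneck the slide warns about, and which task receives which number depends on thread scheduling.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class CentralGenerator:
    """One sequential generator shared by every task (hypothetical wrapper)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._lock = threading.Lock()

    def next(self):
        # Every request is serialized here, so the tasks contend for the lock.
        with self._lock:
            return self._rng.random()


def draw(generator, count):
    return [generator.next() for _ in range(count)]


def run_tasks(seed=1, tasks=4, per_task=500):
    generator = CentralGenerator(seed)
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        futures = [pool.submit(draw, generator, per_task) for _ in range(tasks)]
        return [future.result() for future in futures]


per_task_numbers = run_tasks()
```

Taken together, the tasks consume exactly the sequence a single sequential generator would produce, but the split between tasks can change from run to run, so a result that depends on that split is not reproducible.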

Parallel Random Numbers
  – replicated approach:
    • Multiple instances of the same generator are created (for example, one per task).
    • Each generator uses either the same seed or a unique seed, derived, for example, from a task identifier.
    • Clearly, sequences generated in this fashion are not guaranteed to be independent and, indeed, can suffer from serious correlation problems. However, the approach has the advantages of efficiency and ease of implementation and should be used when appropriate.
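The replicated approach can be sketched as one generator instance per task, seeded either identically or from the task identifier. The names `replicated_generators` and `first_values` and the base seed are assumptions for illustration. Distinct seeds give distinct streams in this example, but nothing here guarantees that the streams are statistically independent.

```python
import random


def replicated_generators(task_count, same_seed=False, base_seed=12345):
    """Create one generator per task, seeded identically or from the task id."""
    return [
        random.Random(base_seed if same_seed else base_seed + task_id)
        for task_id in range(task_count)
    ]


def first_values(generators, count=3):
    """Draw the first few numbers from each task's private stream."""
    return [[g.random() for _ in range(count)] for g in generators]


identical = first_values(replicated_generators(3, same_seed=True))
distinct = first_values(replicated_generators(3, same_seed=False))
```

With the same seed every task sees exactly the same numbers, which is the most extreme form of the correlation problem. With seeds derived from the task identifier, the streams differ and a run with a fixed number of tasks is repeatable.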

Parallel Random Numbers
  – distributed approach:
    • Responsibility for generating a single sequence is partitioned among many generators, which can then be parceled out to different tasks.
    • The generators are all derived from a single generator; hence, the analysis of the statistical properties of the distributed generator is simplified.
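The slide does not name a particular partitioning scheme. One well-known option is the leapfrog method, sketched here for a linear congruential generator: with p tasks, task k takes elements k, k+p, k+2p, ... of the single base sequence. The function names are hypothetical and the constants are commonly quoted illustrative values, not a recommendation for real simulations.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Base sequence x[n+1] = (a * x[n] + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def leapfrog_params(a, c, m, p):
    """Return (A, C) such that x[n+p] = (A * x[n] + C) mod m."""
    A, C = 1, 0
    for _ in range(p):
        # Compose one more base step x -> a*x + c onto the accumulated map.
        A, C = (a * A) % m, (a * C + c) % m
    return A, C


def leapfrog_stream(seed, a, c, m, p, k):
    """Yield elements k, k+p, k+2p, ... (0-based) of the base sequence."""
    A, C = leapfrog_params(a, c, m, p)
    x = next(islice(lcg(seed, a, c, m), k, None))  # element k of the base sequence
    while True:
        yield x
        x = (A * x + C) % m  # jump p base steps at once


# Illustrative constants only (assumed for this sketch).
A_MULT, C_INC, MOD = 1664525, 1013904223, 2**32
base = list(islice(lcg(42, A_MULT, C_INC, MOD), 40))
streams = [list(islice(leapfrog_stream(42, A_MULT, C_INC, MOD, 4, k), 10)) for k in range(4)]
```

Interleaving the four streams reproduces the base sequence exactly, so the statistical analysis of the original generator carries over to the partitioned one.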
