15-150 Fall 2020, Stephen Brookes
Lecture 17: Sequences and cost graphs
Halloween, a full moon, and a time change, all happening simultaneously
announcements • Next Tuesday (3 Nov) is ELECTION DAY • Class will be held as usual (on zoom) • Homework is NOT DUE on Tuesday to allow you time to participate in voting • NEW: the TAs will hold a Weekly Review this evening (will be recorded, too)
today parallel programming • cost semantics • Brent’s Theorem • sequences: an abstract type with efficient parallel operations
parallelism exploiting multiple processors by evaluating independent code simultaneously • low-level implementation: scheduling work onto processors • high-level planning: designing code abstractly, without baking in a schedule
our approach • design abstractly: behavioral correctness, asymptotic runtime (work, span) • reason abstractly: independently of schedule, via cost semantics and evaluation
• You design the code • The compiler schedules the work
functional benefits • No side effects, so… evaluation order doesn’t affect correctness • Can build abstract types that support efficient parallel-friendly operations • Can use work and span to predict potential for parallel speed-up • Work and span are independent of scheduling details
caveat • In practice, it’s hard to achieve speed-up • Current language implementations don’t make it easy • Problems include: • scheduling overhead • locality of data (cache problems) • runtime sensitive to scheduling choices
why bother? • It’s good to think abstractly first and figure out details later • Focus on data dependencies when you design your code • Our thesis: this approach to parallelism will prevail... (and 15-210 builds on these ideas...)
cost semantics We already introduced work and span • Work estimates the sequential evaluation time on a single processor • Span takes data dependencies into account and estimates the parallel evaluation time with unlimited processors
cost semantics • We showed how to calculate work and span for recursive functions using recurrence relations • Now we introduce cost graphs, another way to deal with work and span • Cost graphs also allow us to talk about schedules... ... and the potential for speed-up
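A quick worked recap (our example, for the common halve-and-recurse pattern such as a parallel tree sum): if a function splits its size-n input in half, recurs on both halves in parallel, and combines the results in constant time, the recurrences are

W(n) = 2 W(n/2) + c₁        S(n) = S(n/2) + c₂

which solve to W(n) = O(n) and S(n) = O(log n): linear work, logarithmic span.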
cost graphs A cost graph is a series-parallel graph • a directed graph, with source and sink • nodes represent units of work (constant time) • edges represent data dependencies • branching indicates potential parallelism
series-parallel graphs [Figure: the three ways to build a series-parallel graph: a single node; sequential composition, with G1 placed above G2 and a connecting edge; parallel composition, with G1 and G2 side by side between a shared source and sink.]
work and span of a cost graph • The work is the number of nodes • The span is the length of the longest path from source to sink • Always: span(G) ≤ work(G)
work of a cost graph, by cases • dependent code (sequential composition): work = work(G1) + work(G2) + c ... add the work • independent code (parallel composition): work = work(G1) + work(G2) + c ... add the work
span of a cost graph, by cases • dependent code (sequential composition): span = span(G1) + span(G2) + c ... add the spans • independent code (parallel composition): span = max(span(G1), span(G2)) + c ... max the spans
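These composition rules translate directly into code. Here is a minimal SML sketch (the datatype, the constructor names Then and Both, and the choice c = 1 are ours, not part of the course code):

datatype cost_graph =
    Node                                 (* a single unit of work *)
  | Then of cost_graph * cost_graph      (* sequential composition: G1 then G2 *)
  | Both of cost_graph * cost_graph      (* parallel composition: G1 alongside G2 *)

(* work: number of nodes; sequential and parallel composition both add *)
fun work Node            = 1
  | work (Then (g1, g2)) = work g1 + work g2 + 1
  | work (Both (g1, g2)) = work g1 + work g2 + 1

(* span: longest source-to-sink path; parallel composition takes the max *)
fun span Node            = 1
  | span (Then (g1, g2)) = span g1 + span g2 + 1
  | span (Both (g1, g2)) = Int.max (span g1, span g2) + 1

For example, work (Both (Node, Node)) = 3 and span (Both (Node, Node)) = 2, consistent with span(G) ≤ work(G).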
sources and sinks • Sometimes we omit them from pictures • no loss of generality • easy to put them back in • No difference, asymptotically • a single node represents an additive constant amount of work and span • Allows easier explanation of execution
example [Figure: a cost graph with 11 nodes, numbered ① to ⑪; an edge means the earlier node must be done before the later one.] Each node represents a single unit of work. work = 11 (number of nodes), span = 4 (longest path length)
using cost graphs • Every expression can be given a cost graph • Can calculate work and span using the graph • These are asymptotically the same as the work and span derived from recurrence relations Work and span provide asymptotic estimates of actual running time, under certain assumptions (basic ops take constant time): work for a single processor, span for unlimited processors
scheduling Assign units of work to processors, respecting data dependency • Work: number of nodes • Span: length of critical path [Figure: an optimal parallel schedule for the example graph, using 5 processors, in rounds (i)-(v): w = 11, s = 4 (5 rounds, or 4 steps).]
example What if there are only 2 processors? [Figure: a best schedule for 2 processors, in rounds (i)-(vi): w = 11, s = 4 (6 rounds, or 5 steps).] 2 processors cannot do the job as fast as 5 (!)
Brent’s Theorem An expression with work w and span s can be evaluated on a p-processor machine in time O(max(w/p, s)).
Optimal schedule using p processors: do (up to) p units of work each round. The total work to do is w, and at least s steps are needed. Find the smallest p such that w/p ≤ s: using more than this many processors won’t yield any speed-up.
Richard Brent is an illustrious Australian mathematician and computer scientist. He is known for Brent’s Theorem, which shows that a parallel algorithm can always be adapted to run on fewer processors with only the obvious time penalty: a beautiful example of an “obvious” but non-trivial theorem. (David Brent is the manager of the Slough branch of Wernham–Hogg. He wants to know how many computers to buy to improve office efficiency.)
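Brent’s bound suggests a tiny calculation: the smallest p with w/p ≤ s is ⌈w/s⌉. A hedged SML sketch (minProcs is our name for it):

(* smallest p such that w/p <= s, i.e. the ceiling of w/s; assumes w, s > 0 *)
fun minProcs (w, s) = (w + s - 1) div s

val p = minProcs (11, 4)    (* 3, matching the example graph coming up *)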
example min { p | w/p ≤ s } is 3 [Figure: a best schedule for 3 processors, in rounds (i)-(v): w = 11, s = 4 (5 rounds, or 4 steps).] 3 processors can do the work as fast as 5 (!)
summary • Cost graphs give us another way to talk about work and span • Brent’s Theorem tells us about the potential for parallel speed-up • check if w/p ≤ s
next • Exploiting parallelism in ML • A signature for parallel collections • Cost analysis of implementations • Cost benefits of parallel algorithm design - we revisit some list-based functions - sequence-based functions are faster
sequences

signature SEQ =
sig
  type 'a seq
  exception Range
  val tabulate  : (int -> 'a) -> int -> 'a seq
  val length    : 'a seq -> int
  val nth       : int -> 'a seq -> 'a
  val split     : 'a seq -> 'a seq * 'a seq
  val map       : ('a -> 'b) -> 'a seq -> 'b seq
  val reduce    : ('a * 'a -> 'a) -> 'a -> 'a seq -> 'a
  val mapreduce : ('a -> 'b) -> 'b -> ('b * 'b -> 'b) -> 'a seq -> 'b
end
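To make the signature concrete, here is a hypothetical client-code sketch (it assumes a structure Seq : SEQ, introduced shortly; the bindings themselves are ours):

val s   = Seq.tabulate (fn i => i * i) 5     (* ⟨0, 1, 4, 9, 16⟩ *)
val n   = Seq.length s                       (* 5 *)
val x   = Seq.nth 3 s                        (* 9 *)
val t   = Seq.map (fn v => v + 1) s          (* ⟨1, 2, 5, 10, 17⟩ *)
val sum = Seq.reduce (op +) 0 s              (* 30 *)
val sq  = Seq.mapreduce (fn v => v * v) 0 (op +) (Seq.tabulate (fn i => i) 3)
                                             (* 0*0 + 1*1 + 2*2 = 5 *)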
SEQ • We may expand the SEQ signature later… … with some extra functions • For today, let’s keep it simple • Purpose: a value of type t seq is a sequence of values of type t • with faster operations than those available for t list
implementations • Many ways to implement the signature • lists, balanced trees, arrays, ... • For each one, can give a cost analysis • There may be implementation trade-offs • lists: access is O(n), length is O(n) • arrays: access is O(1), length is O(1) • trees: access is O(log n), length is ?? Obviously, a list-based implementation of sequences isn’t going to be faster than lists! But array- or tree-based implementations can be.
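For illustration, here is a minimal list-based sketch of the signature (a plausible ListSeq written by us; it matches the extensional specs but inherits the O(n) costs of lists):

structure ListSeq : SEQ =
struct
  type 'a seq = 'a list
  exception Range
  fun tabulate f n = List.tabulate (n, f)                          (* O(n) *)
  val length = List.length                                         (* O(n) *)
  fun nth i s = List.nth (s, i) handle Subscript => raise Range    (* O(n) *)
  fun split s =                                                    (* O(n) *)
    let val k = List.length s div 2
    in (List.take (s, k), List.drop (s, k)) end
  fun map f s = List.map f s
  fun reduce g z s = List.foldr g z s       (* sequential: O(n) span as well *)
  fun mapreduce f z g s = reduce g z (map f s)
end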
Seq : SEQ • An abstract parameterized type of sequences • Think of a sequence as a parallel collection • With parallel-friendly operations • constant-time access to items • efficient map and reduce We’ll work today with an implementation Seq : SEQ based on vectors
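As a taste of why a vector-based implementation helps, here is a fragment (VecSketch is our hypothetical name, not the course’s Seq; only the constant-time core is shown):

structure VecSketch =
struct
  type 'a seq = 'a vector
  exception Range
  fun tabulate f n = Vector.tabulate (n, f)                         (* O(n) work *)
  val length = Vector.length                                        (* O(1) *)
  fun nth i s = Vector.sub (s, i) handle Subscript => raise Range   (* O(1) *)
end

Constant-time nth and length are what make parallel-friendly map and reduce possible.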
notation • We have an abstract type of sequences • We want to think about sequence values in a way that’s independent of any specific implementation • could be lists, arrays, trees, … • We need a neutral notation for sequences: ⟨v₀, ..., vₙ₋₁⟩ This is NOT program syntax!
notation • Remember that if we have structures like ListSeq : SEQ, ArraySeq : SEQ, BalancedTreeSeq : SEQ, we can use qualified names like ListSeq.empty and qualified types like int ListSeq.seq
think abstractly • We’ll mostly use the abstract notation for sequences • We’ll give abstract specifications • But we’ll discuss work/span characteristics for a specific implementation • other implementations may have different work/span
sequence values A value of type t seq is a sequence of values of type t • We use math notation like ⟨v₁, ..., vₙ⟩, ⟨v₀, ..., vₙ₋₁⟩, and ⟨⟩ for sequence values • ⟨1, 2, 4, 8⟩ is a value of type int seq
equality • Two sequence values are (extensionally) equal iff they have the same length and have equal items at all positions: ⟨v₁, ..., vₙ⟩ = ⟨u₁, ..., uₘ⟩ if and only if n = m and for all i, vᵢ = uᵢ Again, this is NOT program notation
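The definition reads off directly as code; a hedged sketch (seqEq is our helper, assuming the structure Seq : SEQ and taking an element-equality test, since type 'a need not admit SML equality):

(* extensional equality of sequences, using only length and nth *)
fun seqEq eq (s1, s2) =
  let val n = Seq.length s1
  in
    n = Seq.length s2
    andalso List.all (fn i => eq (Seq.nth i s1, Seq.nth i s2))
                     (List.tabulate (n, fn i => i))
  end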
operations For our given structure Seq : SEQ, we specify • the (extensional) behavior • the cost semantics of each operation Other implementations of SEQ are designed to have the same extensional behavior but may have different work/span profiles Learn to choose wisely!