CS 744: SPLIT ANNOTATIONS
Shivaram Venkataraman, Fall 2020
ADMINISTRIVIA
- Course project check-ins: due tomorrow on HotCRP
- In-class project presentations: Dec 8th and Dec 10th, 4-5 min per presentation slot
- Sign-up sheet on Piazza; upload slides beforehand
CONTEXT: Cloud computing workloads compose many libraries over new hardware and data models; the challenge is to maintain efficiency when composing them.
SETTING
Workload: multi-core machines, hand-tuned libraries such as Intel MKL, multiple functions and libraries composed in one program.

// inputs are double arrays with `len` elements
vdLog1p(len, d1, d1);           // d1 = log(1 + d1)
vdAdd(len, d1, tmp, d1);        // d1 = d1 + tmp
vdDiv(len, d1, vol_sqrt, d1);   // d1 = d1 / vol_sqrt

Each function optimizes within its own scope, but data movement across operators is expensive even within a single machine: arrays larger than the CPU caches stream to and from DRAM on every call (both reads and writes), whereas repeated passes are cheap only if the data fits in cache.
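The same unfused pipeline can be sketched in NumPy (a stand-in for the MKL calls above; the array names mirror the slide, and the constant inputs are only for illustration). Each statement makes a full pass over its arrays, so with arrays larger than cache every pass streams data between DRAM and the CPU:

```python
import numpy as np

n = 1_000_000
d1 = np.full(n, 0.5)
tmp = np.full(n, 1.0)
vol_sqrt = np.full(n, 2.0)

d1 = np.log1p(d1)      # pass 1 over the whole array: d1 = log(1 + d1)
d1 = d1 + tmp          # pass 2: d1 = d1 + tmp
d1 = d1 / vol_sqrt     # pass 3: d1 = d1 / vol_sqrt
```

Three separate full-array passes is exactly the data movement pattern split annotations aim to eliminate.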
COMPILER-BASED APPROACHES
What we would like: replace every library call to emit a common intermediate representation (IR), then compile all the IR together. This enables rich optimizations such as loop fusion and pipelining across existing libraries (e.g., NumPy, Pandas).
Downside: lots of code change required!
GOALS
- Provide data movement optimizations across libraries
- Require minimal or no changes to existing libraries (not intrusive)
- Leverage existing hand-tuned code (e.g., matrix multiply, FFT) for speedups
APPROACH
Build an execution graph lazily, split inputs into cache-sized pieces, and pass each split through every function in the pipeline:

d1 = price * strike
d1 = np.log2(d1) + strike

Each split flows through both operations before the next split is processed, so intermediates stay in cache.
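A minimal sketch of this split-and-pipeline idea, using the slide's two-statement example (the function name and batch size are illustrative, not from the paper): each batch runs through the whole pipeline before the next batch starts, so the intermediate `d1` stays cache-resident.

```python
import numpy as np

def pipeline_batched(price, strike, batch=4096):
    """Run the two-op pipeline over cache-sized batches."""
    out = np.empty_like(price)
    for i in range(0, len(price), batch):
        sl = slice(i, i + batch)
        d1 = price[sl] * strike[sl]          # op 1 on one split
        out[sl] = np.log2(d1) + strike[sl]   # op 2 on the same split
    return out
```

The result is identical to running each operation over the full arrays; only the order of memory traffic changes.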
SPLIT ANNOTATIONS
Given a library, provide split types for its functions' arguments: far fewer code changes than rewriting the library.

@splittable(size: SizeSplit(size), a: ArraySplit(size),
            mut out: ArraySplit(size))
void vdLog1p(long size, double *a, double *out)

- Split types: N⟨V0, ..., Vn⟩, e.g., ArraySplit⟨10, 2⟩ for a 10-element array split into 2 pieces
- Split annotation: a name and split type for each argument and return value
- The output is split in the same fashion as the input, so calls with matching split types can be pipelined, e.g., with vdScale(long size, double *a, int scalar)
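The annotation mechanism can be sketched as a decorator that attaches split-type metadata without touching the function body (`splittable` and `ArraySplit` here mirror the slide's names but are simplified stand-ins, not the real Mozart API; Python keywords cannot express the `mut` marker):

```python
class ArraySplit:
    """Split type: divide an array into contiguous pieces."""
    def __init__(self, size_param):
        self.size_param = size_param  # name of the argument giving the length

    def split(self, arr, pieces):
        k = -(-len(arr) // pieces)    # ceil division: elements per piece
        return [arr[i:i + k] for i in range(0, len(arr), k)]

def splittable(**split_types):
    """Attach split-type metadata; the runtime reads it, the body is untouched."""
    def wrap(fn):
        fn.split_types = split_types
        return fn
    return wrap

@splittable(size=None, a=ArraySplit("size"), out=ArraySplit("size"))
def vdLog1p_stub(size, a, out):
    pass  # placeholder standing in for the hand-tuned library function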
IMPLEMENTING SPLIT API
- If the parameters of two calls share the same split type, data can be pipelined between them safely
- If not, the pipeline cannot continue: results from the prior call are merged before the next call runs
- Reductions produce partial outputs that a merge function combines:

@splittable(m: MatrixSplit(m, axis), axis: _) -> ReduceSplit(axis)
vector sumReduceToVector(matrix m, int axis);

E.g., in a pipeline log → multiply → reduce, the reduce emits one partial result per split and the merge combines them.
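A sketch of the reduce case, assuming the matrix is split along the reduce axis so each piece yields a partial vector and the merge adds the partials elementwise (function names are illustrative, modeled on `sumReduceToVector` and `ReduceSplit` from the slide):

```python
import numpy as np

def sum_reduce_to_vector_split(m, axis, pieces):
    """Split m along the reduce axis; each piece produces a partial vector."""
    chunks = np.array_split(m, pieces, axis=axis)
    return [c.sum(axis=axis) for c in chunks]

def merge(partials):
    """ReduceSplit-style merge: combine partial outputs by elementwise addition."""
    return np.sum(partials, axis=0)
```

Merging elementwise is correct here because summation is associative across splits; a non-associative reduction would need a different merge function.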
MOZART DESIGN
- Capture the execution graph
- Evaluate the graph lazily, to maximize the opportunity to pipeline
PYTHON CLIENT LIBRARY
Writing annotations: function decorators over already-existing library calls (e.g., Pandas):

@sa((DataFrameSplit(), DataFrameSplit()), {}, DataFrameSplit())
def divide(series, value): ...

Capturing the graph: the decorator wraps the original Python function, registers the call in the graph, and returns a Future object (similar in spirit to Ray or PyWren), so downstream calls can be intercepted.
Evaluation points: the graph is constructed internally and evaluated lazily by overriding __getattribute__. E.g., calling print on a Future[DataFrame] forces evaluation and prints the result.
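A minimal sketch of this lazy-capture pattern (the names `sa` and `Future` mirror the slide, but this is a simplified stand-in for the real Mozart client library, and the decorator here ignores split types): wrapped calls build a graph of Futures, and any attribute access on a Future forces evaluation via `__getattribute__`.

```python
class Future:
    _own = ("fn", "args", "_value", "evaluate")  # fields we must not intercept

    def __init__(self, fn, args):
        self.fn, self.args, self._value = fn, args, None

    def evaluate(self):
        """Recursively evaluate the captured graph, caching the result."""
        if self._value is None:
            args = [a.evaluate() if isinstance(a, Future) else a
                    for a in self.args]
            self._value = self.fn(*args)
        return self._value

    def __getattribute__(self, name):
        if name in Future._own:
            return object.__getattribute__(self, name)
        # any other attribute access is an evaluation point: force and forward
        return getattr(object.__getattribute__(self, "evaluate")(), name)

    def __repr__(self):  # so print(...) also forces evaluation
        return repr(object.__getattribute__(self, "evaluate")())

def sa(fn):
    """Simplified decorator: register the call in the graph, return a Future."""
    def wrapper(*args):
        return Future(fn, args)
    return wrapper
```

Note that `print` reaches `__repr__` through the type, not instance `__getattribute__`, which is why the sketch defines both hooks.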
MOZART RUNTIME
- Turn the dataflow graph into an execution plan: a series of stages, where each stage splits its inputs, pipelines the splits through its functions, and merges the outputs
- Choosing a batch size: set the number of elements per batch from the L2 cache size, i.e., the number of elements that fit in L2 while computing
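The batch-size choice can be sketched as simple arithmetic (the 256 KiB L2 size and the function name are assumptions for illustration; a real runtime would query the CPU):

```python
L2_BYTES = 256 * 1024  # assumed L2 cache size

def batch_size(num_live_arrays, elem_bytes=8, l2_bytes=L2_BYTES):
    """Elements per batch so all live arrays in a stage fit in L2 together."""
    return l2_bytes // (num_live_arrays * elem_bytes)
```

E.g., a stage touching three double arrays gets roughly ten thousand elements per batch, so all three working sets fit in L2 simultaneously.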
SUMMARY
- Applications compose data processing libraries (iterative workloads keep adding stages to the graph)
- Data movement is the bottleneck on multi-core machines
- Key idea: split and pipeline data across functions (open question: can we also pipeline across iterations?)
- Split annotations reduce programmer effort
- Mozart: client library and runtime for lazy evaluation
DISCUSSION https://forms.gle/F2LJ21qFkBGWyypB7
How does the dataflow graph executed by Mozart compare to dataflow graphs we have seen in other systems like Spark / PyTorch?
Similarities:
- Lazy execution
- Narrow dependencies are pipelined
Differences:
- Fault tolerance is not an objective for Mozart: no checkpointing
- Functions are black boxes to Mozart, so it cannot, e.g., pick an optimal join operator
- No shuffle: merging is done by user-supplied merge functions
Why not just add more threads? Compute-intensive functions (e.g., exp) speed up with more threads, but memory-bound functions (e.g., add) do not: memory bandwidth is shared and remains the bottleneck, which limits how much speedup extra threads can deliver.
NEXT STEPS
- Next class: TPU
- Project check-ins on HotCRP!