Cuttlefish: A Lightweight Primitive for Online Tuning
by Tomer Kaftan (UW), Magdalena Balazinska (UW), Alvin Cheung (UW), Johannes Gehrke (Microsoft)
Logical Operators have multiple physical Operators… The system should automatically choose!
(Some) Prior Work on Query Optimization
Cost models (e.g., selectivity estimates), plan rewriting (e.g., join reordering)
Building a full cost-based optimizer takes significant development effort [1], and modern workloads contain diverse, sophisticated operators, not just relational operators!
[1] http://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html
“A Cuttlefish pretending to be a rock”
*Image sourced from https://www.flickr.com/photos/silkebaron/32001215104
Generate training data from: HTML data, images, etc.
[Logical plan: HTML Data → Regex → Filter → Join with Images → Generate Training Labels → Repeat { Conv → RNN → Conv → … } → Train a caption-generating model → Output Model]
*Caption-generating model portion of the logical plan inspired by: Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
Diverse, sophisticated operators, with multiple physical alternatives!
Example Operator: Convolution
Tested 3 convolution algorithms on 8000 Flickr images
Can we optimize without a full-fledged optimizer?
Prior Work: Tuning Black-box Operators
e.g., tuning in multi-core settings
Cuttlefish: A Lightweight Primitive for Online Tuning
The workload developer (or the query optimizer) inserts calls to Cuttlefish’s API to pick physical operators during execution.
[Physical plan with tuners inserted: a Tuner over the Join (Nest. Loop, Hash, …), a Tuner over each Conv (Mat. Mult, FFT, …), alongside the Regex, Filter, Repeat, RNN, and CNN operators]
The developer maps tuning rounds to the execution model of each operator, as sketched below.
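For example, a record-at-a-time operator can make each record one tuning round, while a partitioned operator can make each partition one round. A minimal per-partition sketch; Tuner, now, and computeReward are the same pseudocode helpers the deck introduces shortly, while implA, implB, and partitions are hypothetical names:

    tuner = Tuner([implA, implB])      # physical alternatives for one operator
    for partition in partitions:
        impl, token = tuner.choose()   # a round begins: pick an implementation
        start = now()
        output = impl(partition)       # run the chosen physical operator
        tuner.observe(token, computeReward(now() - start))   # round ends: report reward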
I. Problem & Motivation
II. …
III. Bandit-based Online Tuning
IV. Distributed Tuning
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

Cuttlefish tuners maximize the total reward after multiple choose-observe tuning rounds.
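To make the choose-observe contract concrete, here is a toy tuner exposing the same interface used in the next example; its epsilon-greedy policy is a simple stand-in, not Cuttlefish’s actual policy (the talk’s bandit-based tuning comes in section III):

    import random

    class EpsilonGreedyTuner:
        # Same choose/observe contract as a Cuttlefish tuner, but with a toy
        # epsilon-greedy policy instead of the bandit algorithms used in the talk.
        def __init__(self, operators, epsilon=0.1):
            self.operators = operators
            self.epsilon = epsilon
            self.totals = [0.0] * len(operators)   # cumulative reward per arm
            self.counts = [0] * len(operators)     # rounds played per arm
        def choose(self):
            if 0 in self.counts or random.random() < self.epsilon:
                arm = random.randrange(len(self.operators))          # explore
            else:
                arm = max(range(len(self.operators)),
                          key=lambda i: self.totals[i] / self.counts[i])  # exploit
            return self.operators[arm], arm   # the token records which arm ran
        def observe(self, token, reward):
            self.totals[token] += reward
            self.counts[token] += 1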
Tuning Convolution with Cuttlefish

    def loopConvolve(image, filters): …
    def fftConvolve(image, filters): …
    def mmConvolve(image, filters): …

    tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])
    for image, filters in convolutions:
        convolve, token = tuner.choose()
        start = now()
        result = convolve(image, filters)
        elapsedTime = now() - start
        reward = computeReward(elapsedTime)
        tuner.observe(token, reward)
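The slides leave computeReward elided; one natural choice (an assumption here, not from the deck) rewards throughput:

    def computeReward(elapsedTime):
        # Faster runs earn higher reward; any monotone-decreasing
        # function of latency (e.g., negative latency) also works.
        return 1.0 / elapsedTime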
III. Bandit-based Online Tuning

Multi-armed Bandit Problem
Goal: Maximize Cumulative Reward (by balancing exploration & exploitation)
[Figure: belief distributions about the expected reward of each of four arms; as rewards are observed the beliefs sharpen, and better arms are chosen more often]
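The picture above is the idea behind Thompson sampling: sample one plausible mean reward per arm from its belief distribution, play the arm with the best sample, and update that arm’s belief with the observed reward. A minimal Gaussian-belief sketch (a simplified illustration, not the exact algorithm from the paper):

    import random

    class GaussianThompsonTuner:
        # Normal belief over each arm's mean reward; the belief's spread
        # shrinks as observations accumulate (a deliberately simplified model).
        def __init__(self, operators):
            self.operators = operators
            self.n = [0] * len(operators)        # observations per arm
            self.mean = [0.0] * len(operators)   # running mean reward per arm
        def choose(self):
            # Sample one plausible mean per arm; play the best sample.
            samples = [random.gauss(self.mean[i], 1.0 / (self.n[i] + 1))
                       for i in range(len(self.operators))]
            arm = max(range(len(self.operators)), key=samples.__getitem__)
            return self.operators[arm], arm
        def observe(self, token, reward):
            self.n[token] += 1
            self.mean[token] += (reward - self.mean[token]) / self.n[token]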
Convolution experiments: three algorithms (Nested Loops, FFT, Matrix Multiply)
[Chart: relative throughput, normalized against the highest-throughput algorithm]
IV. Distributed Tuning

Challenges in Distributed Tuning
What if a new choice must be made before an earlier reward is observed?
Design options:
[Figure: Centralized Tuner — Machines 1–3 each Choose/Observe against one shared tuner]
[Figure: Independent Tuners, Centralized Store — each machine tunes independently and does Push Local / Pull Global against a Global Model Store]
Peer-to-Peer is also a possibility, but requires more communication.
[Figure: within each worker, every thread keeps its own local tuner state; the model store (on the master or a parameter server) aggregates each worker’s pushed local state, which other workers pull as non-local state]
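A minimal sketch of the push-local / pull-global pattern, assuming tuner state reduces to per-arm sufficient statistics that merge by addition (the store API names here are hypothetical, not Cuttlefish’s):

    class TunerState:
        # Per-arm sufficient statistics: reward count and reward sum.
        def __init__(self, numArms):
            self.counts = [0] * numArms
            self.sums = [0.0] * numArms

    def mergeStates(states, numArms):
        # Local + non-local states combine by summing sufficient statistics.
        merged = TunerState(numArms)
        for s in states:
            for i in range(numArms):
                merged.counts[i] += s.counts[i]
                merged.sums[i] += s.sums[i]
        return merged

    # At each communication round a worker pushes its local state and pulls
    # the others' states (hypothetical store API), then tunes against:
    #   merged = mergeStates([local] + store.pullOthers(workerId), numArms)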
Results with Distributed Approach
[Charts: relative throughput normalized against the highest-throughput algorithm; throughput normalized against an ideal oracle that always picks the fastest algorithm]
V. Contextual Tuning

The best convolution algorithm depends on the image & filter dimensions.
Contextual tuners learn a model that maps features (the context) to rewards.
Tuning Convolution with Cuttlefish (adding context)

    def loopConvolve(image, filters): …
    def fftConvolve(image, filters): …
    def mmConvolve(image, filters): …
    def getDimensions(image, filters): …

    tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])
    for image, filters in convolutions:
        context = getDimensions(image, filters)
        convolve, token = tuner.choose(context)
        start = now()
        result = convolve(image, filters)
        elapsedTime = now() - start
        reward = computeReward(elapsedTime)
        tuner.observe(token, reward)
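Internally, a contextual tuner swaps the per-arm scalar belief for a model from context features to expected reward. A minimal sketch with one online linear model per arm, trained by least-mean-squares updates (an illustration; Cuttlefish’s actual contextual model may differ):

    class LinearArmModel:
        # One linear reward model per physical operator, trained online (LMS).
        def __init__(self, numFeatures, lr=0.01):
            self.w = [0.0] * numFeatures
            self.lr = lr
        def predict(self, context):
            return sum(wi * xi for wi, xi in zip(self.w, context))
        def update(self, context, reward):
            err = reward - self.predict(context)          # prediction error
            self.w = [wi + self.lr * err * xi
                      for wi, xi in zip(self.w, context)]

    # choose(context) would predict (or, Thompson-style, sample) a reward for
    # each arm's model under the current context and pick the argmax;
    # observe(token, reward) updates only the chosen arm's model.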
Contextual Convolution Results
[Chart: throughput normalized against an ideal oracle that always picks the fastest algorithm]
VI. Handling Nonstationary Settings

Performance may drift: data properties varying throughout the workload, etc.
Example: documents may arrive grouped by website. This could correlate with performance.
Bandit problems may change over time!
Prior work on nonstationary bandit problems: change detection, etc.
Cuttlefish’s epoch-based approach:
[Figure: each agent (core or machine) groups its observations into epochs]
Use all epochs that pass a statistical similarity test.
Store only one ‘aggregated old state’ per epoch. At epoch end: if similar to old, merge into ‘old state’; otherwise, replace ‘old state’.
Identify (& merge) similar non-local states only at communication rounds, in the centralized model store.
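A minimal sketch of the end-of-epoch merge-or-replace step, using Welford-style per-epoch statistics and a Welch t-statistic as the (assumed) statistical similarity test:

    import math

    class EpochStats:
        # Per-arm reward statistics for one epoch: count, mean, and M2
        # (sum of squared deviations), maintained with Welford's update.
        def __init__(self):
            self.n, self.mean, self.m2 = 0, 0.0, 0.0
        def add(self, reward):
            self.n += 1
            d = reward - self.mean
            self.mean += d / self.n
            self.m2 += d * (reward - self.mean)
        def var(self):
            return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def similar(a, b, tThreshold=2.0):
        # Welch's t-statistic as a stand-in for the slide's similarity test.
        se = math.sqrt(a.var() / a.n + b.var() / b.n)
        return se == 0 or abs(a.mean - b.mean) / se < tThreshold

    def endEpoch(old, new):
        # Merge-or-replace: pool statistics if the epochs look similar,
        # otherwise discard the old state and start from the new epoch.
        if old.n and new.n and similar(old, new):
            merged = EpochStats()
            merged.n = old.n + new.n
            merged.mean = (old.n * old.mean + new.n * new.mean) / merged.n
            d = new.mean - old.mean
            merged.m2 = old.m2 + new.m2 + d * d * old.n * new.n / merged.n
            return merged
        return new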
[Chart: throughput normalized against an ideal oracle that always picks the fastest algorithm]

VII. Other Operators
Regex operator: matching takes … for the fastest document, but over 1000s for the slowest document.
[Chart of matching times; note: Y-axis is log-scale]
Distributed Parallel Join Operator
Choosing among join strategies otherwise requires explicit configurations (defaults to global sort-merge join).
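The join tuner follows the same choose/observe pattern as convolution; a minimal sketch, with hypothetical function names standing in for the physical join strategies:

    # Hypothetical physical strategies; the tuner picks among them per query.
    tuner = Tuner([globalSortMergeJoin, broadcastHashJoin, shuffleHashJoin])
    for left, right in joinQueries:
        join, token = tuner.choose()
        start = now()
        result = join(left, right)
        tuner.observe(token, computeReward(now() - start))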
Join Results (Query Throughput)
The Cuttlefish-tuned join is usually faster (the join-throughput graphs are even more dramatic).
But it requires exploration & provides no ‘special ordering’ benefits.
VIII. Conclusion
Cuttlefish is a lightweight primitive for online tuning, demonstrated on convolution, regex, and join operators.