@MagnusHyttsten
Meet Robin, the Guinea Pig
An Awkward Social Experiment (that I'm afraid you need to be part of...)
Super ROCKS!
[Diagram: the training loop] Input Data ("QCon"; Examples, i.e. Train & Test Data) feeds the Model (Your Brain), which produces an Output ("Super Rocks" -- or awkward silence). The Output and the Labels (Correct Answers) go into a Loss function, which drives an Optimizer that adjusts the Model.
Agenda
● Intro to Machine Learning
● Frontiers of Machine Learning
● Creating a TensorFlow Model
● Why are TPUs Great for Machine Learning Workloads
● Distributed TensorFlow Training
Ophthalmology: Algorithm 0.95 vs. Ophthalmologist (median) 0.91
Radiology: "The network performed similarly to senior orthopedic surgeons when presented with images at the same resolution as the network." www.tandfonline.com/doi/full/10.1080/17453674.2017.1344459
Pathology https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
ImageNet Alaskan Malamute Siberian Husky
http://news.stanford.edu/2017/01/25/artificial-intelligence-used-identify-skin-cancer
[Image panels: Input, Saturation, Defocus]
Data, Data, Data Compute, Compute, Compute
Data, Data, Data Compute, Compute, Compute Humans, Humans, Humans
How long did it take for a human to construct this? "Improving Inception and Image Classification in TensorFlow", research.googleblog.com/2016/08/improving-inception-and-image.html
Current: Solution = ML expertise + data + computation
Can we turn this into: Solution = data + 100X computation?
Can We Learn How To Teach Machines To Learn?
CIFAR-10
ImageNet: "Learning Transferable Architectures for Scalable Image Recognition", Barret Zoph, Vijay Vasudevan, Jonathon Shlens and Quoc Le, https://arxiv.org/abs/1707.07012
Agenda
● Intro to Machine Learning
● Frontiers of Machine Learning
● Creating a TensorFlow Model
● Why are TPUs Great for Machine Learning Workloads
● Distributed TensorFlow Training
TensorFlow Stack
● High-level APIs: Premade Estimators, Datasets, Estimator, tf.keras, tf.keras.layers
● Language frontends: Python, Java, C++
● TensorFlow Distributed Execution Engine
● Platforms: CPU, GPU, Android, iOS, ...
TensorFlow Estimator Architecture Estimator (tf.estimator) calls input_fn (Datasets, tf.data)
Premade Estimators: subclasses of Estimator (tf.estimator), which calls input_fn (Datasets, tf.data)
LinearRegressor, LinearClassifier, DNNRegressor, DNNClassifier, DNNLinearCombinedRegressor, DNNLinearCombinedClassifier, BaselineRegressor, BaselineClassifier
Premade Estimators

# Pick a premade Estimator...
estimator = DNNLinearCombinedRegressor(...)
# ...or any of: LinearRegressor(...), LinearClassifier(...),
# DNNRegressor(...), DNNClassifier(...), DNNLinearCombinedClassifier(...),
# BaselineRegressor(...), BaselineClassifier(...)

# Train, evaluate, and predict locally, feeding data via input_fn (Datasets, tf.data)
estimator.train(input_fn=..., ...)
estimator.evaluate(input_fn=..., ...)
estimator.predict(input_fn=..., ...)
Custom Models #1 - model_fn
Estimator (tf.estimator) calls model_fn, which builds the model using Keras Layers (tf.keras.layers); the Estimator also calls input_fn (Datasets, tf.data). The premade Estimators (LinearClassifier, LinearRegressor, DNNClassifier, DNNRegressor, DNNLinearCombinedClassifier, DNNLinearCombinedRegressor, BaselineClassifier, BaselineRegressor) subclass Estimator.
Custom Models: tf.estimator + tf.keras.layers

# Imports yada yada ...
import tensorflow as tf

input = tf.keras.Input(shape=...)  # e.g. an image input
l1 = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(input)
l2 = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(l1)
l3 = tf.keras.layers.Flatten()(l2)
l4 = tf.keras.layers.Dense(128, activation='relu')(l3)
l5 = tf.keras.layers.Dropout(0.2)(l4)
output = tf.keras.layers.Dense(10, activation='softmax')(l5)

model = tf.keras.Model(input, output)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
Train/Evaluate Model

# Convert a Keras model to a tf.estimator.Estimator
estimator = tf.keras.estimator.model_to_estimator(keras_model=model, ...)

# Train, evaluate, and predict locally, feeding data via input_fn (Datasets, tf.data)
estimator.train(input_fn=..., ...)
estimator.evaluate(input_fn=..., ...)
estimator.predict(input_fn=..., ...)
Summary - Use Estimators, Datasets, and Keras
● Premade Estimators (tf.estimator): when possible
● Custom Models: model_fn in Estimator & tf.keras.layers
● Datasets (tf.data) for the input pipeline
Agenda
● Intro to Machine Learning
● Frontiers of Machine Learning
● Creating a TensorFlow Model
● Why are TPUs Great for Machine Learning Workloads
● Distributed TensorFlow Training
● We may have a huge number of layers
● Each layer can have a huge number of neurons
→ There may be hundreds of millions, or even billions, of * and + ops
All the knobs are W values that we need to tune so that, given a certain input, the network generates the correct output.
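To make the op count concrete, here is a small sketch (the layer sizes are invented for illustration) that counts the multiply-add pairs in a stack of fully connected layers:

```python
def multiply_add_ops(layer_sizes):
    """Multiply-add pairs for one forward pass through a stack of fully
    connected layers: a layer with n_in inputs and n_out neurons does
    n_in * n_out multiplies (and about as many adds)."""
    return sum(n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A hypothetical network with a few wide hidden layers:
sizes = [4096, 8192, 8192, 8192, 1000]
print(multiply_add_ops(sizes))  # 175964160 -- ~176 million per example
```

Scale the widths or layer count up slightly and the total quickly reaches billions of ops per example, which is the workload the hardware below targets.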
"Matrix Multiplication is EATING (the computing resources of) THE WORLD"

h_i,j = [X_0, X_1, X_2, ...] * [W_0, W_1, W_2, ...]
h_i,j = X_0*W_0 + X_1*W_1 + X_2*W_2 + ...
Matmul
X = [1.0, 2.0, ..., 256.0]  # Let's say we have 256 input values
W = [0.1, 0.1, ..., 0.1]    # Then we need 256 weight values
h_0,0 = X * W  # 1*0.1 + 2*0.1 + ... + 256*0.1 == 3289.6
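The slide's numbers can be checked with plain Python (no TensorFlow needed):

```python
# The 256-element input and weight vectors from the slide.
X = [float(i) for i in range(1, 257)]  # [1.0, 2.0, ..., 256.0]
W = [0.1] * 256                        # [0.1, 0.1, ..., 0.1]

# h_0,0 is the dot product of X and W.
h = sum(x * w for x, w in zip(X, W))
print(round(h, 1))  # 3289.6
```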
Single-threaded Execution

X = [1.0, 2.0, ..., 256.0]  # Let's say we have 256 input values
W = [0.1, 0.1, ..., 0.1]    # Then we need 256 weight values
h_0,0 = X * W  # 1*0.1 + 2*0.1 + ... + 256*0.1 == 3289.6

One multiply-add per step, carrying the previous partial sum forward:
1*0.1 = 0.1
0.1 + 2*0.1 = 0.3
...
3238.5 + 255*0.1 = 3264.0
3264.0 + 256*0.1 = 3289.6

256 sequential steps → total time = 256 * t
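The partial sums the slides step through fall out of a plain sequential loop, which is exactly the 256 * t cost being illustrated:

```python
X = [float(i) for i in range(1, 257)]  # [1.0, 2.0, ..., 256.0]
W = [0.1] * 256                        # [0.1, 0.1, ..., 0.1]

partials = []
acc = 0.0
for x, w in zip(X, W):  # one multiply-add per step: 256 sequential steps
    acc += x * w
    partials.append(round(acc, 1))

print(partials[0], partials[1], partials[-2], partials[-1])
# 0.1 0.3 3264.0 3289.6
```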
Tensor Processing Unit (TPU) v2
Matrix Unit (MXU) Systolic Array
Computing y = Wx, with W a 3x3 weight matrix and batch-size(x) = 3.
The weights W11...W33 stay resident in the 3x3 grid of multiply-accumulate cells. The inputs X11...X33 stream in from the side, one wave per step, while partial sums flow through the array and accumulate. Once the pipeline fills, each output emerges fully accumulated, e.g.:
Y11 = W11*X11 + W12*X12 + W13*X13
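A toy software model of the weight-stationary idea (the 3x3 numbers are invented; this mirrors the accumulation order, not the TPU's actual pipelining):

```python
def systolic_matmul(W, X):
    """Compute Y = W @ X the way a weight-stationary systolic array does:
    each cell (i, k) permanently holds W[i][k]; at each step one wave of
    inputs X[k][*] passes through, and every cell adds its product into
    the partial sum flowing by."""
    n = len(W)
    Y = [[0] * n for _ in range(n)]
    for k in range(n):            # wave k of inputs enters the array
        for i in range(n):        # in hardware, all cells work in parallel
            for j in range(n):    # one partial sum per input column (batch of 3)
                Y[i][j] += W[i][k] * X[k][j]
    return Y

W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
X = [[1, 0, 2],   # column j of X is one input vector x_j
     [0, 1, 1],
     [1, 1, 0]]
print(systolic_matmul(W, X))  # [[4, 5, 4], [10, 11, 13], [16, 17, 22]]
```

The key property: each weight is loaded into its cell once and then reused across the entire batch, so after the pipeline fills, a new fully accumulated output is produced every step.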