
@MagnusHyttsten: Meet Robin - PowerPoint PPT Presentation



  1. @MagnusHyttsten

  2. Meet Robin

  3. Meet Robin, Guinea Pig

  4. An Awkward Social Experiment (that I'm afraid you need to be part of...)

  5. ROCKS!

  6. "GTC" Input Data Examples (Train & Test Data) Model <Awkward Output (Your Silence> Brain)

  7. "GTC" Labels Input Data (Correct Answers) Examples (Train & Test Data) "Rocks" Model Output Loss (Your function Brain) Optimizer

  8. "GTC" Labels Input Data (Correct Answers) Examples (Train & Test Data) "Rocks" "Rocks" Model Output Loss (Your function Brain) Optimizer

  9. Agenda: Intro to Machine Learning · Creating a TensorFlow Model · Why are GPUs Great for Machine Learning Workloads · Distributed TensorFlow Training

  10. Agenda: Intro to Machine Learning · Creating a TensorFlow Model · Why are GPUs Great for Machine Learning Workloads · Distributed TensorFlow Training

  11. Agenda: Intro to Machine Learning · Creating a TensorFlow Model · Why are GPUs Great for Machine Learning Workloads · Distributed TensorFlow Training

  12. [Diagram: TensorFlow stack] Premade Estimators / Estimator, Datasets / tf.keras / tf.keras.layers | Frontends: Python, Java, C++ | TensorFlow Distributed Execution Engine | Devices: CPU, GPU, Android, iOS, ...

  13. TensorFlow Estimator Architecture: the Estimator (tf.estimator) calls an input_fn (Datasets, tf.data).
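To make the input_fn contract concrete, here is a minimal sketch of one built on tf.data, assuming the TF 1.x-era Estimator API this deck uses; the feature name "x" and the toy tensors are illustrative only, not from the slides.

    import tensorflow as tf

    def train_input_fn(batch_size=32):
        # Toy in-memory data; a real pipeline would read files via tf.data.
        features = {"x": tf.constant([[1.0], [2.0], [3.0], [4.0]])}
        labels = tf.constant([0, 0, 1, 1])
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        # Shuffle, repeat, and batch: the standard tf.data training pipeline.
        return dataset.shuffle(buffer_size=4).repeat().batch(batch_size)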

  14. Premade Estimators: each premade Estimator subclasses Estimator (tf.estimator), which calls an input_fn (Datasets, tf.data). Premade Estimators: LinearClassifier, LinearRegressor, DNNClassifier, DNNRegressor, DNNLinearCombinedClassifier, DNNLinearCombinedRegressor, BaselineClassifier, BaselineRegressor.

  15. Premade Estimators with Datasets:

      estimator = LinearRegressor(...)    # or LinearClassifier(...),
                                          # DNNRegressor(...) / DNNClassifier(...),
                                          # DNNLinearCombinedRegressor(...) / DNNLinearCombinedClassifier(...),
                                          # BaselineRegressor(...) / BaselineClassifier(...)

      # Train locally
      estimator.train(input_fn=..., ...)
      estimator.evaluate(input_fn=..., ...)
      estimator.predict(input_fn=..., ...)
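As a minimal usage sketch of that premade-Estimator flow (TF 1.x API assumed; train_input_fn is the earlier sketch, while eval_input_fn and predict_input_fn are assumed analogues, not names from the slides):

    import tensorflow as tf

    # One numeric feature named "x", matching the input_fn sketch above.
    feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]
    estimator = tf.estimator.LinearClassifier(feature_columns=feature_columns)

    # Train locally, then evaluate and predict; each call takes an input_fn.
    estimator.train(input_fn=train_input_fn, steps=100)
    metrics = estimator.evaluate(input_fn=eval_input_fn, steps=10)
    predictions = estimator.predict(input_fn=predict_input_fn)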

  16. Custom Models #1 - model_fn: the Estimator (tf.estimator) calls a model_fn, which uses Keras Layers (tf.keras.layers), and calls an input_fn (Datasets, tf.data); the premade Estimators (LinearClassifier, LinearRegressor, DNNClassifier, DNNRegressor, DNNLinearCombinedClassifier, DNNLinearCombinedRegressor, BaselineClassifier, BaselineRegressor) subclass Estimator.
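A minimal model_fn sketch for this path, assuming the TF 1.x EstimatorSpec contract; the layer sizes, feature name, and optimizer choice are illustrative, not from the slides:

    import tensorflow as tf

    def model_fn(features, labels, mode):
        # Build the network with tf.keras.layers inside the model_fn.
        net = tf.keras.layers.Dense(128, activation="relu")(features["x"])
        logits = tf.keras.layers.Dense(2)(net)

        if mode == tf.estimator.ModeKeys.PREDICT:
            return tf.estimator.EstimatorSpec(mode, predictions={"logits": logits})

        loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
        train_op = tf.train.AdamOptimizer().minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    estimator = tf.estimator.Estimator(model_fn=model_fn)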

  17. Custom Models #2 - Keras Model: a Keras model (tf.keras) is converted with model_to_estimator into an Estimator (tf.estimator); the Estimator calls the model_fn (which uses Keras Layers, tf.keras.layers) and an input_fn (Datasets, tf.data), and the premade Estimators subclass Estimator as before.

  18. Custom Models with tf.keras and tf.keras.layers:

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

      model = Sequential()
      # input_shape is not on the slide; 28x28 grayscale images are assumed here
      model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
      model.add(MaxPooling2D(pool_size=(2, 2)))
      model.add(Flatten())
      model.add(Dense(128, activation='relu'))
      model.add(Dropout(0.2))
      model.add(Dense(10, activation='softmax'))
      model.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])

  19. Train/Evaluate Model (Estimator + Datasets):

      # Convert a Keras model to a tf.estimator.Estimator
      estimator = tf.keras.estimator.model_to_estimator(keras_model=model, ...)

      # Train locally
      estimator.train(input_fn=..., ...)
      estimator.evaluate(input_fn=..., ...)
      estimator.predict(input_fn=..., ...)

  20. Summary - Use Estimators, Datasets, and Keras
      ● Premade Estimators (tf.estimator): use them when possible
      ● Custom Models:
        a. model_fn in Estimator plus tf.keras.layers
        b. Keras Models (tf.keras): estimator = tf.keras.estimator.model_to_estimator(...)
      ● Datasets (tf.data) for the input pipeline

  21. Agenda: Intro to Machine Learning · Creating a TensorFlow Model · Why are GPUs Great for Machine Learning Workloads · Distributed TensorFlow Training

  22. Disclaimer...
      ● High-Level - we look at only parts of the power of GPUs
      ● Simple Overview - more optimal designs exist
      ● Reduced Scope - only considering fully-connected layers, etc.

  23. Strengths of V100 GPU
      ● Built for Massively Parallel Computations
      ● Hardware & software suited to managing Deep Learning workloads (Tensor Cores, mixed-precision execution, etc.)

  24. Strengths of V100 GPU
      ● Built for Massively Parallel Computations
      ● Specific hardware/software to manage Deep Learning workloads (Tensor Cores, mixed-precision execution, etc.)
      ● Tesla SXM V100: 5376 cores (FP32)

  25. Strengths of V100 GPU: What are we going to do with 5376 FP32 cores?

  26. Strengths of V100 GPU: What are we going to do with 5376 FP32 cores? "Execute things in parallel!"

  27. Strengths of V100 GPU: What are we going to do with 5376 FP32 cores? "Execute things in parallel!" Yes, but how exactly can we do that for ML workloads?

  28. Strengths of V100 GPU: What are we going to do with 5376 FP32 cores? "Execute things in parallel!" Yes, but how exactly can we do that for ML workloads? "Hey, that's your job - that's why we're here listening!"

  29. Strengths of V100 GPU: What are we going to do with 5376 FP32 cores? "Execute things in parallel!" Yes, but how exactly can we do that for ML workloads? "Hey, that's your job - that's why we're here listening!" Alright, let's talk about that then.

  30. ● We may have a huge number of layers
      ● Each layer can have a huge number of neurons
      --> There may be hundreds of millions or even billions of * and + ops.
      All the knobs are W values that we need to tune, so that given a certain input they generate the correct output.
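To give those counts some texture, a back-of-the-envelope calculation under assumed layer sizes (the widths, depth, and batch size here are purely illustrative, not from the slides):

    # One fully-connected layer, 4096 inputs x 4096 neurons:
    #   4096 * 4096 = ~16.8M weights, each needing one * and one + per example.
    # Ten such layers over a batch of 128 examples:
    print(10 * 4096 * 4096 * 128)  # 21474836480, ~21.5 billion multiply-adds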

  31. "Matrix Multiplication is EATING (the computing resources of) THE WORLD" h i_j = [X 0 , X 1 , X 2, ... ] * [W 0 , W 1 , W 2, ... ] h i_j = X 0 *W 0 + X 1 *W 1 + X 2 *W 2 + ...

  32. Matmul

      X = [1.0, 2.0, ..., 256.0]  # Let's say we have 256 input values
      W = [0.1, 0.1, ..., 0.1]    # Then we need to have 256 weight values
      h_0,0 = X * W               # 1*0.1 + 2*0.1 + ... + 256*0.1 == 3289.6
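A quick check of that arithmetic, sketched in NumPy with the slide's values:

    import numpy as np

    X = np.arange(1.0, 257.0)   # [1.0, 2.0, ..., 256.0]
    W = np.full(256, 0.1)       # [0.1, 0.1, ..., 0.1]
    print(X @ W)                # ~3289.6 (up to floating-point rounding)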

  33. Single-threaded Execution

  34. Single-threaded Execution (same X, W, and h_0,0 as above). [Diagram: X and W as side-by-side column vectors, 1..256 and 0.1..0.1, to be multiplied.]

  35. Single-threaded Execution. [Diagram: first step of the walk-through - 1*0.1 = 0.1]

  36. Single-threaded Execution. [Diagram: a "Prev" column holds the running sum, currently 0.1]

  37. Single-threaded Execution. [Diagram: next step - 0.1 + 2*0.1 = 0.3]

  38. Single-threaded Execution. [Diagram: continuing element by element - ... 3238.5 + 255*0.1 = 3264, then 3264 + 256*0.1 = 3289.6]

  39. Single-threaded Execution: one multiply-add at a time, so the full dot product costs 256 * t (256 sequential steps, each of duration t). [Diagram: the completed walk-through ending at 3289.6]
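The single-threaded walk-through, written as a plain loop: one multiply-add per iteration, hence the 256 * t cost on the slide.

    # 256 sequential multiply-add steps, each of duration t.
    acc = 0.0
    for i in range(1, 257):
        acc += i * 0.1
    print(acc)  # ~3289.6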

  40. GPU Execution

  41. GPU - #1 Multiplication Step (same X, W, and h_0,0 as above). [Diagram: X and W column vectors, to be multiplied elementwise.]

  42. GPU - #1 Multiplication Step. [Diagram: the same vectors, with a Tesla SXM V100 (5376 FP32 cores) shown alongside.]

  43. GPU - #1 Multiplication Step. [Diagram: X and W column vectors being multiplied elementwise.]
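The point of this multiplication step: all 256 products X_i * W_i are independent, so a GPU can compute them simultaneously, one per core, leaving only the final reduction to coordinate. A minimal sketch with TensorFlow ops, assuming TF 1.x-style session execution (device placement left to TensorFlow):

    import tensorflow as tf

    X = tf.range(1.0, 257.0)   # [1.0, 2.0, ..., 256.0]
    W = tf.fill([256], 0.1)    # [0.1, 0.1, ..., 0.1]

    # Step 1: elementwise multiply - 256 independent products,
    # which a GPU can execute in parallel, one per core.
    products = X * W
    # Step 2: reduce - sum the partial products into h_0,0.
    h = tf.reduce_sum(products)

    with tf.Session() as sess:
        print(sess.run(h))     # ~3289.6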
