Practical Deep Learning (micha.codes / fastforwardlabs.com)

  1. Practical Deep Learning micha.codes / fastforwardlabs.com 1 / 70

  2. deep learning can seem mysterious 2 / 70

  3. let's find a way to just build a function 3 / 70

  4. Feed Forward Layer

     # X.shape == (512,)
     # output.shape == (4,)
     # weights.shape == (512, 4)  -> 2048 parameters
     # biases.shape == (4,)
     def feed_forward(activation, X, weights, biases):
         return activation(X @ weights + biases)

     IE: f(X) = σ(X × W + b)

     4 / 70
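
     (A quick sanity check, not from the slides: calling the layer above on random data, with a plain sigmoid as an illustrative activation since softmax isn't defined yet.)

     import numpy as np

     def sigmoid(z):
         return 1.0 / (1.0 + np.exp(-z))

     X = np.random.random(512)
     weights = np.random.random((512, 4))
     biases = np.random.random(4)

     output = feed_forward(sigmoid, X, weights, biases)
     print(output.shape)  # (4,)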

  5. What's so special? 5 / 70

  6. Composable

     # Just like a Logistic Regression
     result = feed_forward(
         softmax,
         X,
         outer_weights,
         outer_biases
     )

     6 / 70

  7. Composable

     # Just like a Logistic Regression with learned features?
     result = feed_forward(
         softmax,
         feed_forward(
             tanh,
             X,
             inner_weights,
             inner_biases
         ),
         outer_weights,
         outer_biases
     )

     7 / 70

  8. nonlinear 8 / 70

  9. UNIVERSAL APPROXIMATION THEOREM neural networks can approximate arbitrary functions 9 / 70

  10. differentiable ➡ SGD

      Iteratively learn the values for the weights and biases given training data

      10 / 70
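
      (A minimal sketch of that idea, not from the slides: gradient descent on a single sigmoid unit with cross-entropy loss and toy data.)

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      X = np.random.random((256, 512))                 # toy inputs
      y = (np.random.random(256) > 0.5).astype(float)  # toy binary labels

      weights = np.zeros(512)
      bias = 0.0
      lr = 0.1

      for epoch in range(100):
          pred = sigmoid(X @ weights + bias)     # forward pass
          grad = pred - y                        # d(loss)/d(logits) for cross-entropy
          weights -= lr * (X.T @ grad) / len(X)  # update the weights
          bias -= lr * grad.mean()               # update the bias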

  11. 11 / 70

  12. 12 / 70

  13. 13 / 70

  14. Convolutional Layer

      import numpy as np
      from scipy.signal import convolve

      # X.shape == (800, 600, 3)
      # filters.shape == (8, 8, 3, 16)
      # biases.shape == (3, 16)
      # output.shape < (792, 592, 16)
      def convnet(activation, X, filters, biases):
          return activation(
              np.stack([convolve(X, f) for f in filters]) + biases
          )

      IE: f(X) = σ(X ∗ f + b)

      14 / 70

  15. 15 / 70

  16. Recurrent Layer

      import numpy as np

      # X_sequence.shape == (None, 512)
      # output.shape == (None, 4)
      # W.shape == (512, 4)
      # U.shape == (4, 4)
      # biases.shape == (4,)
      def RNN(activation, X_sequence, W, U, biases):
          output = np.zeros(biases.shape)
          for X in X_sequence:
              output = activation(X @ W + output @ U + biases)
              yield output

      IE: f_t(X_t) = σ(X_t × W + f_{t-1}(X_{t-1}) × U + b)

      16 / 70
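
      (An illustrative call, not from the slides: a 10-step sequence of 512-dimensional inputs producing a 4-dimensional output at every step.)

      import numpy as np

      X_sequence = np.random.random((10, 512))
      W = np.random.random((512, 4))
      U = np.random.random((4, 4))
      biases = np.random.random(4)

      outputs = list(RNN(np.tanh, X_sequence, W, U, biases))
      print(len(outputs), outputs[0].shape)  # 10 (4,)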

  17. GRU Layer

      import numpy as np

      def GRU(activation_in, activation_out, X_sequence, W, U, biases):
          output = np.zeros_like(biases[0])
          for X in X_sequence:
              z = activation_in(W[0] @ X + U[0] @ output + biases[0])
              r = activation_in(W[1] @ X + U[1] @ output + biases[1])
              o = activation_out(W[2] @ X + U[2] @ (r * output) + biases[2])
              output = z * output + (1 - z) * o
              yield output

      17 / 70

  18. What about theano/tensorflow/mxnet? 18 / 70

  19. what happens here?

      import numpy as np

      a = np.random.random(100) - 0.5
      a[a < 0] = 10

      19 / 70

  20. [Diagram: the snippet above as a computation graph / abstract syntax tree, with statement-sequence, while, condition, compare (op: ≠), variable (name: a, name: b), constant (value: 0), body, branch, and return nodes]

      20 / 70

  21.
      library    | widely used | auto-diff | gpu/cpu | mobile | frontend | models | multi-gpu | speed
      -----------|-------------|-----------|---------|--------|----------|--------|-----------|------
      numpy      | ✔           | ✖         | ✖       | ✖      | ✖        | ✖      | ✖         | slow
      theano     | ✔           | ✔         | ✔       | ✖      | ✖        | ✖      | ✖         | fast
      mx-net     | ✖           | ✔         | ✔       | ✔      | ✔        | ✔      | ✔         | fast
      tensorflow | ✔           | ✔         | ✔       | ✖      | ✔        | ✔      | ➖        | slow

      21 / 70

  22. Which should I use? 22 / 70

  23. keras makes Deep Learning simple (http://keras.io/) 23 / 70

  24. $ cat ~/.keras/keras.json
      {
          "image_dim_ordering": "th",
          "epsilon": 1e-07,
          "floatx": "float32",
          "backend": "theano"
      }

      or

      $ cat ~/.keras/keras.json
      {
          "image_dim_ordering": "tf",
          "epsilon": 1e-07,
          "floatx": "float32",
          "backend": "tensorflow"
      }

      24 / 70

  25. (coming soon... hopefully)

      $ cat ~/.keras/keras.json
      {
          "image_dim_ordering": "mx",
          "epsilon": 1e-07,
          "floatx": "float32",
          "backend": "mxnet"
      }

      25 / 70

  26. simple!

      from keras.models import Sequential
      from keras.layers.core import Dense

      # Same as our Logistic Regression above with:
      #   weights_outer.shape = (512, 4)
      #   biases_outer.shape = (4,)
      model_lr = Sequential()
      model_lr.add(Dense(4, activation='softmax', input_shape=[512]))
      model_lr.compile('sgd', 'categorical_crossentropy')
      model_lr.fit(X, y)

      26 / 70

  27. extendible!

      from keras.models import Sequential
      from keras.layers.core import Dense

      # Same as our "deep" Logistic Regression
      model = Sequential()
      model.add(Dense(128, activation='tanh', input_shape=[512]))
      model.add(Dense(4, activation='softmax'))
      model.compile('sgd', 'categorical_crossentropy')
      model.fit(X, y)

      27 / 70

  28. model_lr.summary()
      # __________________________________________________________________________
      # Layer (type)        Output Shape    Param #    Connected to
      # ==========================================================================
      # dense_1 (Dense)     (None, 4)       2052       dense_input_1[0][0]
      # ==========================================================================
      # Total params: 2,052
      # Trainable params: 2,052
      # Non-trainable params: 0
      # __________________________________________________________________________

      model.summary()
      # __________________________________________________________________________
      # Layer (type)        Output Shape    Param #    Connected to
      # ==========================================================================
      # dense_2 (Dense)     (None, 128)     65664      dense_input_2[0][0]
      # __________________________________________________________________________
      # dense_3 (Dense)     (None, 4)       516        dense_2[0][0]
      # ==========================================================================
      # Total params: 66,180
      # Trainable params: 66,180
      # Non-trainable params: 0
      # __________________________________________________________________________

      28 / 70

  29. let's build something 29 / 70

  30. 30 / 70

  31. 31 / 70

  32. 32 / 70

  33. fastforwardlabs.com/luhn/ 33 / 70

  34. 34 / 70

  35. 35 / 70

  36. 36 / 70

  37. 37 / 70

  38. 38 / 70

  39. 39 / 70

  40. 40 / 70

  41. 41 / 70

  42. 42 / 70

  43. 43 / 70

  44. def skipgram(words):
          for i in range(1, len(words) - 1):
              yield words[i], (words[i-1], words[i+1])

      44 / 70
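
      (An illustrative call, not from the slides: each interior word is paired with its immediate neighbours.)

      words = "the quick brown fox jumps".split()
      for center, context in skipgram(words):
          print(center, context)
      # quick ('the', 'brown')
      # brown ('quick', 'fox')
      # fox ('brown', 'jumps')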

  45. 45 / 70

  46. 46 / 70

  47. from keras.models import Model
      from keras.layers import Input, Embedding, merge, Lambda, Activation

      vector_size = 300

      word_index = Input(shape=(1,))
      word_point = Input(shape=(1,))

      syn0 = Embedding(len(vocab), vector_size)(word_index)
      syn1 = Embedding(len(vocab), vector_size)(word_point)

      merged = merge([syn0, syn1], mode='mul')
      merged_sum = Lambda(lambda x: x.sum(axis=-1))(merged)
      context = Activation('sigmoid')(merged_sum)

      model = Model(input=[word_index, word_point], output=context)
      model.compile(loss='binary_crossentropy', optimizer='adam')

      47 / 70
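
      (A hypothetical training call, not from the slides: integer word ids for (center, context) pairs from the skipgram generator, plus randomly sampled negative pairs; labels are 1 for real pairs and 0 for negatives.)

      import numpy as np

      word_ids = np.array([[5], [9], [5]])      # center word ids
      context_ids = np.array([[2], [17], [3]])  # context or negative-sample ids
      labels = np.array([1, 1, 0])

      model.fit([word_ids, context_ids], labels, nb_epoch=10)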

  48. Feed Forward vs Recurrent Network 48 / 70

  49. 49 / 70

  50. RNN Summarization Sketch for Articles

      1. Find article summaries heavy on quotes (http://thebrowser.com/)
      2. Score every sentence in the articles based on its "closeness" to a quote (one possible scoring function is sketched below)
      3. Use skip-thoughts to encode every sentence in the article
      4. Train an RNN to predict these scores given the sentence vector
      5. Evaluate the trained model on new things!

      50 / 70
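
      (A hypothetical scoring function for step 2, not from the slides: rate each sentence by its best word overlap with any quote pulled from the summary.)

      def closeness_score(sentence, quotes):
          words = set(sentence.lower().split())
          best = 0.0
          for quote in quotes:
              quote_words = set(quote.lower().split())
              overlap = len(words & quote_words) / max(len(quote_words), 1)
              best = max(best, overlap)
          return best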

  51. keras makes RNNs simple (http://keras.io/) 51 / 70

  52. Example: preprocess

      from skipthoughts import skipthoughts
      from .utils import load_data

      (articles, scores), (articles_test, scores_test) = load_data()

      articles_vectors = skipthoughts.encode(articles)
      articles_vectors_test = skipthoughts.encode(articles_test)

      52 / 70

  53. Example: model def and training

      from keras.models import Sequential
      from keras.layers.recurrent import LSTM
      from keras.layers.core import Dense
      from keras.layers.wrappers import TimeDistributed

      model = Sequential()
      model.add(LSTM(512, return_sequences=True, input_shape=(None, 4800),
                     dropout_W=0.3, dropout_U=0.3))
      model.add(TimeDistributed(Dense(1)))

      model.compile(loss='mean_absolute_error', optimizer='rmsprop')
      model.fit(articles_vectors, scores, validation_split=0.10)

      loss, acc = model.evaluate(articles_vectors_test, scores_test)
      print('Test loss / test accuracy = {:.4f} / {:.4f}'.format(loss, acc))

      model.save("models/new_model.h5")

      53 / 70

  54. [Diagram: data shapes through the pipeline. An article is split into sentences (list(text), e.g. sent1..sent6), each sentence is encoded by skip-thoughts giving (6, 4800), passed through the LSTM in keras giving (6, 512), then through the Dense layer giving (6, 1).]

      54 / 70

  55. Example Model: evaluation

      from keras.models import load_model
      from flask import Flask, request, jsonify
      import nltk
      import skipthoughts

      app = Flask(__name__)
      model = load_model("models/new_model.h5")

      @app.route('/api/evaluate', methods=['POST'])
      def evaluate():
          article = request.data
          sentences = nltk.sent_tokenize(article)
          sentence_vectors = skipthoughts.encode(sentences)
          return jsonify(scores=model.predict(sentence_vectors).tolist())

      55 / 70
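
      (An illustrative client call, not from the slides, assuming the service above is running locally on Flask's default port.)

      import requests

      article_text = open("article.txt").read()
      response = requests.post("http://localhost:5000/api/evaluate", data=article_text)
      print(response.json())  # one score per sentence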

  56. 56 / 70

  57. 57 / 70

  58. Thoughts on using this method

      The scoring function used is SUPER important
      Hope you have a GPU
      Hyper-parameters for all!
      Changing the structure of the model changes where it's applicable
      SGD means random initialization... you may need to fit multiple times

      58 / 70

  59. 59 / 70

  60. REGULARIZE!

      dropout: only parts of the NN participate in every round
      l1/l2: add a penalty term for large weights
      batch normalization: unit mean/std for each batch of data

      60 / 70
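
      (A minimal Keras 1.x sketch of all three, not from the slides, reusing the 512-dimensional input from the earlier models.)

      from keras.models import Sequential
      from keras.layers.core import Dense, Dropout
      from keras.layers.normalization import BatchNormalization
      from keras.regularizers import l2

      model = Sequential()
      model.add(Dense(128, activation='tanh', input_shape=[512],
                      W_regularizer=l2(0.01)))  # l1/l2: penalize large weights
      model.add(BatchNormalization())           # unit mean/std per batch
      model.add(Dropout(0.5))                   # drop half the activations each round
      model.add(Dense(4, activation='softmax'))
      model.compile('sgd', 'categorical_crossentropy')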

  61. VALIDATE AND TEST! lots of parameters == potential for overfitting 61 / 70

  62. deploy? 62 / 70

  63. 63 / 70
