Introduction to ML & DL
Shan-Hung Wu (shwu@cs.nthu.edu.tw)
Department of Computer Science, National Tsing Hua University, Taiwan
Outline
1. What’s Machine Learning?
2. What’s Deep Learning?
3. About this Course...
4. FAQ
A Priori vs. A Posteriori Knowledge

To solve a problem, we need an algorithm
- E.g., sorting
- A priori knowledge is enough

For some problems, however, we do not have the a priori knowledge
- E.g., to tell whether an email is spam
- The correct answer varies over time and from person to person

Machine learning algorithms use a posteriori knowledge to solve problems
- Learned from examples (as extra input)
Example Data $\mathbb{X}$ as Extra Input

Unsupervised: $\mathbb{X} = \{x^{(i)}\}_{i=1}^{N}$, where $x^{(i)} \in \mathbb{R}^D$
- E.g., $x^{(i)}$ an email

Supervised: $\mathbb{X} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, where $x^{(i)} \in \mathbb{R}^D$ and $y^{(i)} \in \mathbb{R}^K$
- E.g., $y^{(i)} \in \{0, 1\}$ a spam label
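As a concrete illustration, here is a minimal NumPy sketch of how these two kinds of datasets might be laid out in memory. The dimensions, the bag-of-words encoding, and all array contents are hypothetical stand-ins, not something the slides prescribe.

```python
import numpy as np

# Hypothetical setup: each email is encoded as a D-dimensional feature
# vector (e.g., bag-of-words counts), and we have N emails in total.
N, D = 1000, 5000
rng = np.random.default_rng(0)

# Unsupervised data: just the points X = {x^(i)}, stacked into an N x D matrix.
X_unsup = rng.random((N, D))              # stand-in for real email features

# Supervised data: each x^(i) is paired with a label y^(i) in {0, 1}
# (1 = spam, 0 = not spam), so the dataset is a pair of aligned arrays.
X_sup = rng.random((N, D))
y = rng.integers(0, 2, size=N)            # stand-in for human-provided labels
```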
General Types of Learning (1/2)

Supervised learning: learn to predict the labels of future data points
- Given training data $X \in \mathbb{R}^{N \times D}$ with labels $y \in \mathbb{R}^{N \times K}$ (e.g., one-hot label vectors such as $e^{(6)}, e^{(1)}, e^{(9)}, e^{(4)}, e^{(2)}$), predict the unknown label $y' \in \mathbb{R}^K$ of a new point $x' \in \mathbb{R}^D$

Unsupervised learning: learn patterns or latent factors in $X$
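To make the contrast concrete, here is a small sketch using scikit-learn (an assumed tool choice, not one the slides prescribe): the supervised learner is fit on $(X, y)$ and predicts $y'$ for a new $x'$, while the unsupervised learner sees only $X$ and looks for clusters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # toy data points
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy labels

# Supervised: learn from (X, y), then predict the label of a future point x'.
clf = LogisticRegression().fit(X, y)
x_new = np.array([[0.5, -0.2]])
print("predicted y':", clf.predict(x_new))

# Unsupervised: no labels at all; look for structure (here, 2 clusters) in X.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print("cluster assignments:", km.labels_[:10])
```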
General Types of Learning (2/2)

Reinforcement learning: learn from “good”/“bad” feedback of actions (instead of correct labels) to maximize the goal
- AlphaGo [1] is a hybrid of reinforcement learning and supervised learning
- Supervised learning from the game records
- Then, reinforcement learning from self-play
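The following toy tabular Q-learning loop (a standard RL algorithm, chosen here purely for illustration; the environment and all hyperparameters are made up) shows the key point: the agent only ever receives a scalar reward, never the correct action.

```python
import numpy as np

# Toy 5-state chain: the agent starts at state 0 and an episode ends at
# state 4, where arriving yields reward +1; every other step yields 0.
n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))        # action-value estimates
alpha, gamma, eps = 0.1, 0.9, 0.1          # step size, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly exploit Q, occasionally explore.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))                    # learned policy: "right" everywhere
```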
General Machine Learning Steps

1. Data collection, preprocessing (e.g., integration, cleaning, etc.), and exploration
   - Split the dataset into training and testing datasets
2. Model development
   - Assume a model $\{f(\cdot\,; w)\}$: a collection of candidate functions $f$ (representing the a posteriori knowledge) we want to discover; each $f$ is parametrized by $w$
   - Define a cost function $C(w; \mathbb{X})$ (or functional $C[f; \mathbb{X}]$) that measures how well a particular $f$ explains the training data
3. Training: employ an algorithm that finds the best (or a good-enough) function $f^*(\cdot\,; w^*)$ in the model by minimizing the cost function: $w^* = \operatorname{argmin}_w C(w; \mathbb{X})$
4. Testing: evaluate the performance of the learned $f^*$ on the testing dataset
5. Apply the model in the real world

Steps 1–4 are illustrated in the sketch below.
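This is a minimal end-to-end sketch of steps 1–4 on synthetic data, using scikit-learn as an assumed toolkit; the dataset, the model choice (logistic regression, whose log loss plays the role of $C(w; \mathbb{X})$), and the split ratio are all illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: collect/preprocess data (a synthetic stand-in here) and split it.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X @ rng.normal(size=10) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 2: model development -- the model {f(.; w)} is the set of linear
# classifiers, and the log loss serves as the cost function C(w; X).
model = LogisticRegression(max_iter=1000)

# Step 3: training -- the solver searches for w* = argmin_w C(w; X_train).
model.fit(X_tr, y_tr)

# Step 4: testing -- evaluate the learned f* on the held-out dataset.
print("test accuracy:", accuracy_score(y_te, model.predict(X_te)))
```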
Example: Spam Detection

1. Randomly split your past emails and their labels into
   - Training dataset: $\mathbb{X} = \{(x^{(i)}, y^{(i)})\}_i$
   - Testing dataset: $\mathbb{X}' = \{(x'^{(i)}, y'^{(i)})\}_i$
2. Model development
   - Model: $\{f : f(x; w) = w^\top x\}$
   - Cost function: $C(w; \mathbb{X}) = \sum_i \mathbb{1}\big(f(x^{(i)}; w) \neq y^{(i)}\big)$
3. Training: solve $w^* = \operatorname{argmin}_w \sum_i \mathbb{1}\big(f(x^{(i)}; w) \neq y^{(i)}\big)$

A sketch of this setup follows the list.
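Because the 0-1 cost above is non-differentiable and hard to minimize directly, a common workaround (used here purely for illustration; the slides do not commit to a training algorithm) is the classic perceptron update for the linear model. Labels are mapped to $\{-1, +1\}$ to keep the update rule simple; the data and dimensions are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 20
X = rng.normal(size=(N, D))           # stand-in email feature vectors x^(i)
w_true = rng.normal(size=D)
y = np.sign(X @ w_true)               # stand-in spam labels y^(i) in {-1, +1}

# Perceptron training: whenever f(x; w) = sign(w^T x) misclassifies a point,
# nudge w toward that point's correct side.
w = np.zeros(D)
for epoch in range(50):
    for i in range(N):
        if np.sign(w @ X[i]) != y[i]:
            w += y[i] * X[i]

# The 0-1 cost C(w; X) on the training data after training.
print("training errors:", int(np.sum(np.sign(X @ w) != y)))
```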