Machine Learning
Chenhao Tan, University of Colorado Boulder
Lecture 6
Slides adapted from Jordan Boyd-Graber and Chris Ketelsen
• HW1 turned in
• HW2 released
• Office hour
• Group formation signup
Overview
• Feature engineering
• Revisiting Logistic Regression
• Feed Forward Networks
• Layers for Structured Data
Outline: Feature engineering
Feature Engineering
Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended... (From Chris Harrison's WikiViz)
Brainstorming
What features are useful for sentiment analysis?
What features are useful for sentiment analysis?
• Unigrams
• Bigrams
• Normalization options
• Part-of-speech tagging
• Parse-tree related features
• Negation-related features
• Additional resources
Sarcasm detection
“Trees died for this book?” (book)
• Find high-frequency words and content words
• Replace content words with “CW”
• Extract patterns, e.g., “does not CW much about CW” [Tsur et al., 2010]
More examples: Which one will be retweeted more? [Tan et al., 2014] https://chenhaot.com/papers/wording-for-propagation.html
Outline: Revisiting Logistic Regression
Revisiting Logistic Regression
\[
P(Y = 0 \mid x, \beta) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i x_i\right]}
\]
\[
P(Y = 1 \mid x, \beta) = \frac{\exp\left[\beta_0 + \sum_i \beta_i x_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i x_i\right]}
\]
\[
\mathcal{L} = -\sum_j \log P\left(y^{(j)} \mid x^{(j)}, \beta\right)
\]
Revisiting Logistic Regression
• Transformation on x (we map class labels from {0, 1} to {1, 2}):
\[
l_i = \beta_i^T x, \quad i = 1, 2
\qquad\qquad
o_i = \frac{\exp l_i}{\sum_{c \in \{1, 2\}} \exp l_c}, \quad i = 1, 2
\]
• Objective function (using cross entropy $-\sum_i p_i \log q_i$):
\[
\mathcal{L}(Y, \hat{Y}) = -\sum_j \left[ P\big(y^{(j)} = 1\big) \log P\big(\hat{y}^{(j)} = 1 \mid x^{(j)}, \beta\big) + P\big(y^{(j)} = 0\big) \log P\big(\hat{y}^{(j)} = 0 \mid x^{(j)}, \beta\big) \right]
\]
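To make the two-class softmax formulation concrete, here is a minimal numpy sketch of the forward computation and the cross-entropy loss for a single example; the feature values, weights, and variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# illustrative setup: d = 3 features, 2 classes
x = np.array([1.0, 2.0, -1.0])
beta = np.random.randn(2, 3)         # one weight vector beta_i per class

l = beta @ x                         # l_i = beta_i^T x
o = softmax(l)                       # o_i = exp(l_i) / sum_c exp(l_c)

y = 1                                # true class in {1, 2}
loss = -np.log(o[y - 1])             # cross entropy with a one-hot target
print(o, loss)
```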
Logistic Regression as a Single-layer Neural Network
[Diagram: inputs $x_1, \dots, x_d$ feed a linear layer producing $l_1, l_2$, followed by a softmax layer producing $o_1, o_2$.]
Logistic Regression as a Single-layer Neural Network
[Diagram: the same network with the linear and softmax steps drawn as a single layer mapping $x_1, \dots, x_d$ to $o_1, o_2$.]
Outline: Feed Forward Networks
Deep Neural Networks
A two-layer example (one hidden layer).
[Diagram: inputs $x_1, \dots, x_d$ connect to a hidden layer, which connects to outputs $o_1, o_2$.]
Deep Neural Networks
More layers:
[Diagram: inputs $x_1, \dots, x_d$ pass through hidden layers 1, 2, and 3 before producing outputs $o_1, o_2$.]
Forward Propagation Algorithm
How do we make predictions based on a multi-layer neural network? Store the biases for layer $l$ in $b^l$ and the weight matrix in $W^l$.
[Diagram: a four-layer network with parameters $(W^1, b^1), (W^2, b^2), (W^3, b^3), (W^4, b^4)$ mapping $x_1, \dots, x_d$ to $o_1, o_2$.]
Forward Propagation Algorithm
Suppose your network has $L$ layers. To make a prediction for a test point $x$:
1: Initialize $a^0 = x$
2: for $l = 1$ to $L$ do
3:   $z^l = W^l a^{l-1} + b^l$
4:   $a^l = g(z^l)$
5: end for
6: The prediction $\hat{y}$ is simply $a^L$
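The loop above translates almost line for line into code. Below is a minimal numpy sketch; the layer sizes and the choice of $g$ (ReLU for hidden layers, softmax at the output) are illustrative assumptions rather than anything fixed by the slides.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, weights, biases):
    # Forward propagation: weights[l], biases[l] hold W^(l+1), b^(l+1)
    a = x                                         # a^0 = x
    L = len(weights)
    for l in range(L):
        z = weights[l] @ a + biases[l]            # z^l = W^l a^(l-1) + b^l
        a = relu(z) if l < L - 1 else softmax(z)  # a^l = g(z^l)
    return a                                      # prediction y_hat = a^L

# illustrative 2-layer network: 4 inputs -> 3 hidden units -> 2 outputs
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
bs = [np.zeros(3), np.zeros(2)]
print(forward(np.array([1.0, 0.5, -0.2, 2.0]), Ws, bs))
```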
Nonlinearity
What happens if there is no nonlinearity? Linear combinations of linear combinations are still linear combinations, so a stack of purely linear layers collapses to a single linear model.
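A quick numpy check of that claim: composing two linear layers with no nonlinearity in between is exactly one linear layer. The matrices here are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)
x = rng.standard_normal(4)

# two "layers" stacked without a nonlinearity in between
two_layer = W2 @ (W1 @ x + b1) + b2

# the equivalent single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))   # True
```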
Neural Networks in a Nutshell
• Training data $S_{\text{train}} = \{(x, y)\}$
• Network architecture (model) $\hat{y} = f_w(x)$
• Loss function (objective function) $L(y, \hat{y})$
• Learning (next lecture)
Nonlinearity Options
• Sigmoid: $f(x) = \frac{1}{1 + \exp(-x)}$
• tanh: $f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
• ReLU (rectified linear unit): $f(x) = \max(0, x)$
• Softmax: $\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$
https://keras.io/activations/
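A small numpy sketch of these four activations, written directly from the formulas above rather than from any library's implementation; the max-shift in softmax is a standard numerical-stability trick.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))    # shift by max(x) so exp does not overflow
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")
```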
Nonlinearity Options
[Figure: plots of the activation functions above.]
Loss Function Options
• $\ell_2$ loss: $\sum_i (y_i - \hat{y}_i)^2$
• $\ell_1$ loss: $\sum_i |y_i - \hat{y}_i|$
• Cross entropy: $-\sum_i y_i \log \hat{y}_i$
• Hinge loss (more on this during SVM): $\max(0, 1 - y\hat{y})$
https://keras.io/losses/
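For concreteness, here is a minimal numpy version of each loss, assuming $y$ and $\hat{y}$ are vectors for the first three (one-hot targets and predicted probabilities for cross entropy) and a $\pm 1$ label with a real-valued score for the hinge loss; the function names and the small epsilon are illustrative choices.

```python
import numpy as np

def l2_loss(y, y_hat):
    return np.sum((y - y_hat) ** 2)

def l1_loss(y, y_hat):
    return np.sum(np.abs(y - y_hat))

def cross_entropy(y, y_hat, eps=1e-12):
    # y: one-hot (or probability) targets, y_hat: predicted distribution
    return -np.sum(y * np.log(y_hat + eps))

def hinge_loss(y, y_hat):
    # y in {-1, +1}, y_hat: real-valued score
    return np.maximum(0.0, 1.0 - y * y_hat)

y_true = np.array([0.0, 1.0])
y_pred = np.array([0.2, 0.8])
print(l2_loss(y_true, y_pred), cross_entropy(y_true, y_pred), hinge_loss(1, 0.3))
```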
A Perceptron Example
$x = (x_1, x_2)$, $y = f(x_1, x_2)$
[Diagram: inputs $x_1$, $x_2$ and a bias $b$ feed a single output unit $o_1$.]
We consider a simple activation function
\[
f(z) = \begin{cases} 1 & z \ge 0 \\ 0 & z < 0 \end{cases}
\]
A Perceptron Example
Simple example: can we learn OR?

x1             0  1  0  1
x2             0  0  1  1
y = x1 ∨ x2    0  1  1  1

$w = (1, 1)$, $b = -0.5$
[Diagram: $x_1$, $x_2$, and bias $b$ feed the output unit $o_1$.]
A Perceptron Example
Simple example: can we learn AND?

x1             0  1  0  1
x2             0  0  1  1
y = x1 ∧ x2    0  0  0  1

$w = (1, 1)$, $b = -1.5$
[Diagram: $x_1$, $x_2$, and bias $b$ feed the output unit $o_1$.]
A Perceptron Example
Simple example: can we learn NAND?

x1                0  1  0  1
x2                0  0  1  1
y = ¬(x1 ∧ x2)    1  0  0  0

$w = (-1, -1)$, $b = 0.5$
[Diagram: $x_1$, $x_2$, and bias $b$ feed the output unit $o_1$.]
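A minimal numpy sketch that checks the OR, AND, and NAND weights above against the step activation $f(z) = 1$ if $z \ge 0$, else $0$; the helper function and its name are illustrative, not from the slides.

```python
import numpy as np

def perceptron(x, w, b):
    # single unit: output 1 if w.x + b >= 0, else 0
    return int(np.dot(w, x) + b >= 0)

inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
gates = {
    "OR":   (np.array([1.0, 1.0]),  -0.5),
    "AND":  (np.array([1.0, 1.0]),  -1.5),
    "NAND": (np.array([-1.0, -1.0]), 0.5),
}
for name, (w, b) in gates.items():
    print(name, [perceptron(np.array(x, dtype=float), w, b) for x in inputs])
# OR   [0, 1, 1, 1]
# AND  [0, 0, 0, 1]
# NAND [1, 0, 0, 0]
```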
A Perceptron Example
Simple example: can we learn XOR?

x1             0  1  0  1
x2             0  0  1  1
x1 XOR x2      0  1  1  0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable. How can we fix this?
A Perceptron Example
Increase the number of layers.

x1             0  1  0  1
x2             0  0  1  1
x1 XOR x2      0  1  1  0

\[
W^1 = \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}, \quad
b^1 = \begin{pmatrix} -0.5 \\ 1.5 \end{pmatrix}, \qquad
W^2 = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad
b^2 = -1.5
\]
[Diagram: $x_1$, $x_2$ feed hidden units $h_1$, $h_2$ (each with a bias), which feed the output $o_1$.]
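A small numpy check (a sketch, not part of the slides) that these weights compute XOR when combined with the step activation defined earlier: the hidden units compute OR and NAND of the inputs, and the output unit ANDs them together.

```python
import numpy as np

def step(z):
    return (z >= 0).astype(float)

W1 = np.array([[1.0, 1.0], [-1.0, -1.0]])
b1 = np.array([-0.5, 1.5])
W2 = np.array([1.0, 1.0])
b2 = -1.5

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    x = np.array([x1, x2], dtype=float)
    h = step(W1 @ x + b1)     # h1 = OR(x1, x2), h2 = NAND(x1, x2)
    o = step(W2 @ h + b2)     # o = AND(h1, h2) = XOR(x1, x2)
    print(x1, x2, int(o))
```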