Statistical Natural Language Processing
Non-linearity and MLP
Artificial Neural networks: an introduction

Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
Summer Semester 2019

Artificial neural networks
• Artificial neural networks (ANNs) are machine learning models inspired by biological neural networks
• ANNs are powerful non-linear models
• Power comes with a price: there are no guarantees of finding the global minimum of the error function
• ANNs have been used in ML, AI and cognitive science since the 1950's – with some ups and downs
• Currently they are the driving force behind the popular 'deep learning' methods

Artificial and biological neural networks
(showing a picture of a real neuron is mandatory in every ANN lecture)
[Figure: the biological neuron – dendrite, soma, axon, axon terminal. *Image source: Wikipedia]
• ANNs are inspired by biological neural networks
• Similar to biological networks, ANNs are made of many simple processing units
• Despite the similarities, there are many differences: ANNs do not mimic biological networks
• ANNs are practical statistical machine learning methods

Recap: the perceptron
[Figure: a single unit with inputs x0 = 1, x1, ..., xm, weights w0, ..., wm, and output y]
y = f(∑_j w_j x_j)   where   f(x) = +1 if wx > 0, −1 otherwise
In ANN-speak, f(·) is called an activation function.

Recap: logistic regression
[Figure: the same unit, but the output is a probability P(y)]
P(y) = f(∑_j w_j x_j)   where   f is the logistic sigmoid function, so P(y) = 1 / (1 + e^{−wx})

Linear separability
• A classification problem is said to be linearly separable if one can find a linear discriminator that separates the two classes
• A well-known counterexample is the logical XOR problem

Can a linear classifier learn the XOR problem?

  x1  x2 | x1 XOR x2
   0   0 |     0
   0   1 |     1
   1   0 |     1
   1   1 |     0

There is no line that can separate the positive and negative classes.

Non-linear basis functions
• We can use non-linear basis functions: w0 + w1 x1 + w2 x2 + w3 φ(x1, x2) is still linear in w for any choice of φ(·)
• For example, adding the product x1 x2 as an additional feature would allow a solution like x1 + x2 − 2 x1 x2
• Choosing proper basis functions like x1 x2 is called feature engineering
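To make the basis-function idea concrete, here is a minimal Python/numpy sketch (not part of the original slides; the variable names are only illustrative). Adding the product feature x1·x2 keeps the model linear in w, and the weights from the slide, (1, 1, −2), reproduce XOR exactly.

```python
import numpy as np

# XOR truth table from the slide: inputs x1, x2 and target x1 XOR x2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0], dtype=float)

# Add the non-linear basis function phi(x1, x2) = x1 * x2 as a third feature.
# The model stays linear in w, but the slide's weights (1, 1, -2) now
# compute x1 + x2 - 2*x1*x2, which is exactly XOR.
phi = (X[:, 0] * X[:, 1]).reshape(-1, 1)
X_mapped = np.hstack([X, phi])      # columns: x1, x2, x1*x2
w = np.array([1.0, 1.0, -2.0])
print(X_mapped @ w)                 # -> [0. 1. 1. 0.], identical to t
# Thresholding at 0.5 corresponds to the discriminant x1 + x2 - 2*x1*x2 - 0.5 = 0
```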
Non-linear basis functions
• The additional basis x1 x2 maps the problem into 3D
• In the new, mapped space, the points are linearly separable
• The solution, x1 + x2 − 2 x1 x2 − 0.5 = 0, is a (non-linear) discriminant that solves the problem
[Figure: the MLP solution in the 3D input space, and the corresponding solution in the original input space]

Where do non-linearities come from?
Non-linearities are abundant in nature; it is not only the XOR problem.
In a linear model, y = w0 + w1 x1 + ... + wk xk:
• The outcome is linearly related to the predictors
• The effects of the inputs are additive
This is not always the case:
• Some predictors affect the outcome in a non-linear way
  – The effect may be strong or positive only in a certain range of the variable (e.g., reaction time change by age)
  – Some effects are periodic (e.g., many measures of time)
• Some predictors interact
  – 'not bad' is not 'not' + 'bad' (e.g., for sentiment analysis)

Multi-layer perceptron
• The simplest modern ANN architecture is called the multi-layer perceptron (MLP), consisting of perceptron-like units
• The MLP is a fully connected, feed-forward network
• Unlike the perceptron, the units in an MLP use a continuous activation function
• The MLP can be trained using gradient-based methods
• The MLP can represent many interesting machine learning problems – it can be used for both regression and classification
[Figure: a fully connected feed-forward network with input, hidden and output layers]
Each unit takes a weighted sum of its input, and applies a (non-linear) activation function.

Artificial neurons
[Figure: a unit with inputs x0 = 1, x1, ..., xm, weights w0, ..., wm, a summation node and an activation f(·)]
• The unit calculates a weighted sum of the inputs: ∑_j w_j x_j = wx
• The result is a linear transformation
• Then the unit applies a non-linear activation function f(·)
• The output of the unit is y = f(wx)

Artificial neurons: an example
• A common activation function is the logistic sigmoid function
• The output of the network then becomes y = 1 / (1 + e^{−wx})

Activation functions in ANNs
• The activation functions in an MLP are typically continuous (differentiable) functions
• For hidden units, common choices are
  – Sigmoid (logistic): f(x) = 1 / (1 + e^{−x})
  – Hyperbolic tangent (tanh): f(x) = (e^{2x} − 1) / (e^{2x} + 1)
  – Rectified linear unit (relu): f(x) = max(0, x)
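The short numpy sketch below (not from the slides; the example input and weights are made up) implements the three activations listed above and a single artificial neuron computing y = f(wx):

```python
import numpy as np

# The three common hidden-unit activations listed on the slide
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)   # same as np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def unit(w, x, f):
    """One artificial neuron: weighted sum wx followed by the activation f."""
    return f(w @ x)   # y = f(wx)

x = np.array([1.0, 0.5, -0.2])    # x0 = 1 (bias), x1, x2 -- example values (assumed)
w = np.array([0.1, 2.0, -1.0])    # example weights (assumed)
for f in (sigmoid, tanh, relu):
    print(f.__name__, unit(w, x, f))
```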
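Finally, a hand-wired illustration of how an MLP built from such units can represent XOR, in the spirit of the MLP solution shown earlier. The weights below are hand-picked for illustration only (an assumption, not taken from the slides); in practice they would be learned with the gradient-based methods mentioned above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR inputs with a bias column x0 = 1
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

# Hand-picked weights (assumed, not learned): hidden unit h1 behaves roughly
# like OR, h2 roughly like AND, and the output computes "h1 AND NOT h2" = XOR.
W_hidden = np.array([[ -5.0, 10.0, 10.0],   # h1 ≈ x1 OR x2
                     [-15.0, 10.0, 10.0]])  # h2 ≈ x1 AND x2
w_out = np.array([-5.0, 10.0, -10.0])       # bias, weight for h1, weight for h2

H = sigmoid(X @ W_hidden.T)                 # hidden activations, shape (4, 2)
H = np.column_stack([np.ones(len(H)), H])   # prepend the hidden-layer bias unit
y = sigmoid(H @ w_out)                      # network output
print(np.round(y, 2))                       # ≈ [0.01 0.99 0.99 0.01], i.e. XOR
```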