CZECH TECHNICAL UNIVERSITY IN PRAGUE
Faculty of Electrical Engineering
Department of Cybernetics

Deep Learning

Petr Pošík
petr.posik@fel.cvut.cz

© 2020
Question

Based on your current knowledge and intuition, which of the following options is the best characterization of deep learning (DL) and its relation to machine learning (ML)?

A. DL is any ML process that requires a deep involvement of a human designer in extracting the right features from the raw data.

B. DL is any solution to an ML problem that uses neural networks with a few, but very large, hidden layers.

C. DL is a set of ML methods allowing us not only to solve the problem at hand, but also to gain a deep understanding of the solution process.

D. DL is any method that tries to automatically transform the raw data into a representation suitable for the solution of our problem, often at multiple levels of abstraction.
What is Deep learning?

Conventional ML techniques:
■ Limited in their ability to process natural data in their raw form.
■ Successful applications required careful engineering and human expertise to extract suitable features.

Representation learning:
■ A set of methods allowing a machine to be fed with raw data and to automatically discover the representations suitable for correct classification/regression/modeling.

Deep learning:
■ Representation-learning methods with multiple levels of representation, at increasing levels of abstraction.
■ They compose simple, but often non-linear, modules, each transforming the representation at one level into a representation at a higher, more abstract level (see the sketch below).
■ The layers learn to represent the inputs in a way that makes it easy to predict the target outputs.
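A minimal sketch in NumPy of this composition idea: each module is a simple non-linear transformation, and stacking the modules yields increasingly abstract representations. The layer sizes, random weights, and ReLU non-linearity are illustrative assumptions, not part of the slides.

```python
# Minimal sketch: deep learning as a composition of simple non-linear
# modules, each mapping one representation to a more abstract one.
import numpy as np

rng = np.random.default_rng(0)

def module(x, W, b):
    """One simple non-linear module: affine map followed by ReLU."""
    return np.maximum(0.0, W @ x + b)

x = rng.normal(size=4)                        # raw input representation
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # illustrative sizes
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)

h1 = module(x, W1, b1)   # level-1 representation
h2 = module(h1, W2, b2)  # level-2, more abstract representation
# During learning, W1, b1, W2, b2 are adjusted so that the top
# representation makes the target outputs easy to predict.
```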
A brief history of Neural Networks

■ 1940s: Model of the neuron (McCulloch, Pitts)
■ 1950s–60s: Modeling the brain using neural networks (Rosenblatt, Hebb, etc.)
■ 1969: Research stagnated after Minsky and Papert's book Perceptrons
■ 1970s: Backpropagation
■ 1986: Backpropagation popularized by Rumelhart, Hinton, Williams
■ 1990s: Convolutional neural networks (LeCun)
■ 1990s: Recurrent neural networks (Schmidhuber)
■ 2006: Revival of deep networks, unsupervised pre-training (Hinton et al.)
■ 2013–: Huge industrial interest
Terminology

■ Narrow vs. wide: refers to the number of units in a layer.
■ Shallow vs. deep: refers to the number of layers.

Making a deep architecture (each step adds one level of derived features; see the sketch after this list):
■ A classifier uses the original representation:
  [Figure: inputs x1–x4 connected directly to the output y1 (input layer → output layer)]
■ A classifier uses features which are derived from the original representation:
  [Figure: input layer → hidden layer → output layer]
■ A classifier uses features which are derived from the features derived from the original representation:
  [Figure: input layer → hidden layer 1 → hidden layer 2 → output layer]
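The three architectures above can be written down directly; here is a minimal sketch assuming PyTorch, where the hidden width of 8 and the sigmoid output are illustrative choices, not from the slides.

```python
# Minimal sketch (assumes PyTorch): the same 4-input, 1-output classifier
# at three depths, mirroring the three architectures above.
import torch.nn as nn

# No hidden layer: the classifier works on the original representation.
shallow = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())

# One hidden layer: features derived from the original representation.
# The width 8 is arbitrary ("narrow vs. wide" refers to this number).
one_hidden = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# Two hidden layers: features derived from derived features ("deep").
two_hidden = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)
```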
Example: Word embeddings

Sometimes, even shallow architectures can do surprisingly well!

Representation of text (words, sentences):
■ Important for many real-world apps: search, ads recommendation, ranking, spam filtering, . . .
■ Local representations (a concept is represented by a single node):
  ■ N-grams, 1-of-N coding, bag of words.
  ■ Easy to construct.
  ■ Large and sparse.
  ■ No notion of similarity (synonyms, words with similar meaning); see the sketch below.
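To make the drawbacks of local representations concrete, here is a minimal sketch of 1-of-N (one-hot) coding in plain Python; the toy vocabulary is a hypothetical example.

```python
# Minimal sketch: local (1-of-N, "one-hot") word representation.
# The vocabulary is a toy example; real vocabularies have ~10^5-10^6
# words, which is why these vectors are large and sparse.
vocab = ["cat", "dog", "car", "bank"]

def one_hot(word):
    """Return the 1-of-N coding of `word` over `vocab`."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("cat"))  # [1, 0, 0, 0]
print(one_hot("dog"))  # [0, 1, 0, 0]
# "cat" and "dog" are exactly as dissimilar as "cat" and "car": the dot
# product of any two distinct one-hot vectors is 0, so this
# representation carries no notion of similarity between words.
```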