Introduction to Machine Learning Part 1 and Part 2 Yingyu Liang - - PowerPoint PPT Presentation



SLIDE 1

Introduction to Machine Learning Part 1 and Part 2

Yingyu Liang (yliang@cs.wisc.edu)
Computer Sciences Department, University of Wisconsin, Madison

[Partially Based on slides from Jerry Zhu and Mark Craven]

SLIDE 2

What is machine learning?

  • Short answer: a recent buzzword
SLIDE 3

Industry

  • Google
SLIDE 4

Industry

  • Facebook
SLIDE 5

Industry

  • Microsoft
SLIDE 6

Industry

  • Toyota
SLIDE 7

Academia

  • NIPS 2015: ~4000 attendees, double the number of NIPS 2014

SLIDE 8

Academia

  • Science special issue
  • Nature invited review
SLIDE 9

Image

  • Image classification

– 1000 classes

Slides from Kaiming He, MSRA

Human performance: ~5% error rate

SLIDE 10

Image

  • Object location

Slides from Kaiming He, MSRA

SLIDE 11

Image

  • Image captioning

Figure from the paper “DenseCap: Fully Convolutional Localization Networks for Dense Captioning”, by Justin Johnson, Andrej Karpathy, Li Fei-Fei

SLIDE 12

Text

  • Question & Answer

Figures from the paper “Ask Me Anything: Dynamic Memory Networks for Natural Language Processing”, by Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Richard Socher

SLIDE 13

Game

Google DeepMind's Deep Q-learning playing Atari Breakout From the paper “Playing Atari with Deep Reinforcement Learning”, by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

SLIDE 14

Game

SLIDE 15

The impact

  • Revival of Artificial Intelligence
  • The next technology revolution?
  • A major ongoing development that should not be missed
SLIDE 16

MACHINE LEARNING BASICS

SLIDE 17

What is machine learning?

  • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

  • ------ Machine Learning, Tom Mitchell, 1997

SLIDE 18

Example 1: image classification

Task: determine if the image is indoor or outdoor
Performance measure: probability of misclassification

SLIDE 19

Example 1: image classification

[Figure: example images labeled “indoor” and “outdoor”]

Experience/Data: images with labels

SLIDE 20

Example 1: image classification

  • A few terminologies

– Instance
– Training data: the images given for learning
– Test data: the images to be classified

SLIDE 21

Example 1: image classification (multi-class)

ImageNet figure borrowed from vision.stanford.edu

SLIDE 22

Example 2: clustering images

Task: partition the images into 2 groups
Performance: similarities within groups
Data: a set of images

SLIDE 23

Example 2: clustering images

  • A few terminologies

– Unlabeled data vs. labeled data
– Supervised learning vs. unsupervised learning

SLIDE 24

Feature vectors

Indoor

[Figure: extract features → feature space; feature vector 𝑦𝑗 with label 𝑧𝑗]

SLIDE 25

Feature vectors

Outdoor

[Figure: extract features → feature space; feature vector 𝑦𝑘 with label 𝑧𝑘]

SLIDE 26

Feature Example 2: little green men

  • The weight and height of 100 little green men

Feature space

SLIDE 27

Feature Example 3: Fruits

  • From Iain Murray http://homepages.inf.ed.ac.uk/imurray2/
SLIDE 28

Feature example 4: text

  • Text document

– Vocabulary of size D (~100,000)

  • “bag of words”: counts of each vocabulary entry

– To marry my true love ➔ (3531:1 13788:1 19676:1)
– I wish that I find my soulmate this year ➔ (3819:1 13448:1 19450:1 20514:1)

  • Often remove stopwords: the, of, at, in, …
  • A special “out-of-vocabulary” (OOV) entry catches all unknown words
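The encoding can be sketched in a few lines (an illustrative sketch, not from the slides; the toy vocabulary, its indices, and the stopword list below are assumptions, whereas the slide's indices come from a much larger ~100,000-word vocabulary):

```python
# Minimal bag-of-words sketch. The vocabulary, its indices, and the
# stopword list are toy assumptions, not the slide's real vocabulary.
from collections import Counter

STOPWORDS = {"the", "of", "at", "in", "to", "my", "that", "i", "this"}

def bag_of_words(text, vocab, oov_index=0):
    """Map a document to sparse {index: count}, sending unknown words to OOV."""
    counts = Counter()
    for word in text.lower().split():
        if word in STOPWORDS:
            continue                                # stopwords are dropped
        counts[vocab.get(word, oov_index)] += 1     # unknowns share one OOV slot
    return dict(counts)

# Toy vocabulary; index 0 is reserved for the OOV entry.
vocab = {"marry": 1, "true": 2, "love": 3, "wish": 4, "find": 5,
         "soulmate": 6, "year": 7}

print(bag_of_words("To marry my true love", vocab))  # {1: 1, 2: 1, 3: 1}
print(bag_of_words("I wish that I find my soulmate this year xyzzy", vocab))
```

The second call shows the OOV mechanism: the made-up word "xyzzy" is counted under index 0.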

SLIDE 29

UNSUPERVISED LEARNING BASICS

SLIDE 30

Unsupervised learning

Common tasks:

  • clustering: separate the n instances into groups
  • novelty detection: find instances that are very different from the rest
  • dimensionality reduction: represent each instance with a lower-dimensional feature vector while maintaining key characteristics of the training samples

SLIDE 31

Anomaly detection

[Figure: learning task vs. performance task]

SLIDE 32

Anomaly detection example

Let’s say our model is represented by the 1979-2000 average, ±2 stddev. Does the data for 2012 look anomalous?
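The mean ± 2 stddev rule can be sketched as a simple check (an illustrative sketch; the baseline numbers below are made up, not the 1979-2000 data):

```python
# Sketch of the slide's anomaly rule: flag a value that falls outside
# mean +/- k standard deviations of the baseline. Baseline values are
# hypothetical, standing in for the 1979-2000 measurements.
from statistics import mean, stdev

def is_anomalous(baseline, x, k=2.0):
    """True if x lies outside mean +/- k stddev of the baseline."""
    m, s = mean(baseline), stdev(baseline)
    return abs(x - m) > k * s

baseline = [6.2, 6.0, 6.5, 6.1, 6.3, 6.4, 5.9, 6.2, 6.1, 6.3]  # made-up values
print(is_anomalous(baseline, 6.2))  # inside the band -> False
print(is_anomalous(baseline, 3.4))  # far below the band -> True
```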

SLIDE 33

Dimensionality reduction

SLIDE 34

Dimensionality reduction example

We can represent a face using all of the pixels in a given image. A more effective method (for many tasks): represent each face as a linear combination of eigenfaces.

SLIDE 35

Clustering

SLIDE 36

Example 1: Irises

SLIDE 37

Example 2: your digital photo collection

  • You probably have >1000 digital photos, ‘neatly’ stored in various folders…
  • After this class you’ll be able to organize them better

– Simplest idea: cluster them using image creation time (EXIF tag)
– More complicated: extract image features

SLIDE 38

Two most frequently used methods

  • Many clustering algorithms exist. We’ll look at the two most frequently used ones:

– Hierarchical clustering, where we build a binary tree over the dataset

– K-means clustering, where we specify the desired number of clusters and use an iterative algorithm to find them

SLIDE 39

HIERARCHICAL CLUSTERING

SLIDE 40

Hierarchical clustering

SLIDE 41

Building a hierarchy

SLIDE 42
SLIDE 43

Hierarchical clustering

  • Initially every point is in its own cluster
SLIDE 44

Hierarchical clustering

  • Find the pair of clusters that are the closest
SLIDE 45

Hierarchical clustering

  • Merge the two into a single cluster
SLIDE 46

Hierarchical clustering

  • Repeat…
SLIDE 47

Hierarchical clustering

  • Repeat…
SLIDE 48

Hierarchical clustering

  • Repeat…until the whole dataset is one giant cluster
  • You get a binary tree (not shown here)
SLIDE 49

Hierarchical Agglomerative Clustering

SLIDE 50

Hierarchical clustering

  • How do you measure the closeness between two clusters?

SLIDE 51
Hierarchical clustering

  • How do you measure the closeness between two clusters? At least three ways:

– Single-linkage: the shortest distance from any member of one cluster to any member of the other cluster. Formula?
– Complete-linkage: the greatest distance from any member of one cluster to any member of the other cluster
– Average-linkage: you guessed it! (the average over all cross-cluster distances)
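The three linkage criteria can be sketched in a few lines (an illustrative sketch, not from the slides; the 1-D toy points and absolute-difference distance are assumptions):

```python
# Sketch of the three linkage criteria between two clusters A and B,
# using toy 1-D points and absolute difference as the distance.
from itertools import product

def dist(a, b):
    return abs(a - b)  # toy 1-D distance; any metric works here

def single_linkage(A, B):
    return min(dist(a, b) for a, b in product(A, B))       # closest cross-pair

def complete_linkage(A, B):
    return max(dist(a, b) for a, b in product(A, B))       # farthest cross-pair

def average_linkage(A, B):
    pairs = list(product(A, B))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)  # mean over all pairs

A, B = [0.0, 1.0], [4.0, 6.0]
print(single_linkage(A, B))    # 3.0  (points 1 and 4)
print(complete_linkage(A, B))  # 6.0  (points 0 and 6)
print(average_linkage(A, B))   # (4 + 6 + 3 + 5) / 4 = 4.5
```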

SLIDE 52

Hierarchical clustering

SLIDE 53

K-MEANS CLUSTERING

SLIDE 54

K-means clustering

SLIDE 55

K-means clustering

SLIDE 56

K-means clustering

SLIDE 57

K-means clustering

  • Randomly pick 5 positions as initial cluster centers (not necessarily data points)

SLIDE 58

K-means clustering

  • Each point finds which cluster center it is closest to. The point is assigned to that cluster.

SLIDE 59

K-means clustering

  • Each cluster computes its new centroid, based on which points belong to it

SLIDE 60

K-means clustering

  • Each cluster computes its new centroid, based on which points belong to it
  • Repeat until convergence (cluster centers no longer move)…
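The two alternating steps can be sketched in pure Python (a minimal 1-D sketch with assumed toy data, not the lecture's code):

```python
# Minimal 1-D k-means sketch: alternate assignment (step 1) and centroid
# update (step 2) until the centers stop moving.

def kmeans(points, centers, max_iter=100):
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        # Step 1: assign each point to the closest center
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[j].append(x)
        # Step 2: move each center to the centroid of its assigned points
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:   # converged: centers no longer move
            break
        centers = new_centers
    return centers, clusters

points = [1.0, 1.5, 0.5, 5.0, 5.5, 4.5]          # toy data: two obvious groups
centers, clusters = kmeans(points, centers=[0.0, 6.0])
print(centers)   # -> [1.0, 5.0]
```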

SLIDE 61

K-means algorithm

SLIDE 62

Questions on k-means

  • What is k-means trying to optimize?
  • Will k-means stop (converge)?
  • Will it find a global or local optimum?
  • How to pick starting cluster centers?
  • How many clusters should we use?
SLIDE 63

Distortion

SLIDE 64

The optimization objective

SLIDE 65

Step 1

SLIDE 66

Step 2

SLIDE 67

Step 2

SLIDE 68

Repeat (step1, step2)

SLIDE 69

Repeat (step1, step2)

– There are a finite number of points
– So there are finitely many ways of assigning points to clusters
– In step 1, an assignment that reduces distortion has to be a new assignment not used before
– So step 1 will terminate, and so will step 2
– So k-means terminates
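The argument rests on the distortion objective. In standard k-means notation (not copied from the slides), for points $x_1, \dots, x_n$, centers $\mu_1, \dots, \mu_k$, and assignment $c(i) \in \{1, \dots, k\}$:

```latex
J = \sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert^2
% Step 1 (reassignment) and step 2 (centroid update) each never increase J,
% and there are only finitely many assignments, so the loop must terminate.
```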

SLIDE 70

Will find global optimum?

  • Sadly no guarantee
SLIDE 71

Will find global optimum?

SLIDE 72

Will find global optimum?

SLIDE 73

Picking starting cluster centers

SLIDE 74

Picking the number of clusters

  • Difficult problem
  • Domain knowledge?
  • Otherwise, shall we find k which minimizes distortion?
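A toy sketch (assumed data and hand-picked centers, not from the slides) of why minimizing distortion over k alone fails: distortion can only shrink as k grows, reaching zero when every point is its own cluster.

```python
# Distortion can only decrease as k grows, hitting 0 at k = n, so
# minimizing distortion over k would always pick k = n.

def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((x - m) ** 2 for m in centers) for x in points)

points = [0.0, 1.0, 4.0, 5.0]

# Hand-picked best centers for each k on this toy data (an assumption,
# not the output of an actual clustering run):
best_centers = {
    1: [2.5],
    2: [0.5, 4.5],
    4: [0.0, 1.0, 4.0, 5.0],   # k = n: every point is its own center
}
for k, centers in best_centers.items():
    print(k, distortion(points, centers))   # prints: 1 17.0 / 2 1.0 / 4 0.0
```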

SLIDE 75

Picking the number of clusters

[Slide formula involving #dimensions, #clusters, and #points]