
Unsupervised Learning
Shan-Hung Wu (shwu@cs.nthu.edu.tw), Department of Computer Science, National Tsing Hua University, Taiwan
Machine Learning


  1. Convolutional Autoencoders: convolution + deconvolution layers. The decoder is a simplified DeconvNet [28] trained from scratch: unpooling → upsampling (no need to remember max positions), and deconvolution → convolution.
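
A minimal sketch of such a decoder, assuming PyTorch; the layer sizes and the 32×32 single-channel output are illustrative assumptions, not taken from the slides. Upsampling replaces unpooling (no max positions to remember) and plain convolutions replace deconvolutions:

```python
import torch.nn as nn

# Hypothetical decoder: maps an 8x8x8 code volume back to a 1x32x32 image.
decoder = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='nearest'),   # 8x8 -> 16x16 (upsampling instead of unpooling)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),    # convolution instead of deconvolution
    nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='nearest'),   # 16x16 -> 32x32
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
    nn.Sigmoid(),                                  # pixel intensities in [0, 1]
)
```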

  2. Codes & Reconstructed x: a 32-bit code can roughly represent a 32 × 32 (1024-dimensional) MNIST image.

  3. Manifolds I: in many applications, data concentrate around one or more low-dimensional manifolds. A manifold is a topological space that is locally linear.

  4. Manifolds II: for each point x on a manifold, we have its tangent space spanned by tangent vectors. These local directions specify how one can change x infinitesimally while staying on the manifold.

  5. Learning Manifolds I: how can we make the code c produced by an autoencoder denote a coordinate on a low-dimensional manifold? Contractive autoencoder [20]: regularize the code c so that it is invariant to local changes of x, using the penalty Ω(c) = ∑_n ‖∂c^(n)/∂x^(n)‖²_F, where ∂c^(n)/∂x^(n) is a Jacobian matrix. Hence c represents only the variations needed to reconstruct x, i.e., c changes most along the tangent vectors.
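
A minimal sketch of the contractive penalty, assuming PyTorch; the one-layer encoder and the code size are hypothetical, and the Jacobian is assembled row by row with autograd, which is fine for small code dimensions but not optimized:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())   # hypothetical encoder c(x)

def contractive_penalty(x):
    """Omega(c) = sum_n || dc^(n)/dx^(n) ||_F^2, computed with autograd."""
    x = x.clone().requires_grad_(True)
    c = encoder(x)                                  # shape (N, 32)
    penalty = 0.0
    for j in range(c.shape[1]):                     # one Jacobian row per code unit
        grads = torch.autograd.grad(c[:, j].sum(), x, create_graph=True)[0]
        penalty = penalty + (grads ** 2).sum()      # accumulate the squared Frobenius norm
    return penalty

# total loss = reconstruction_loss + lambda * contractive_penalty(x)
```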

  6. Learning Manifolds II: in practice, it is easier to train a denoising autoencoder [26]. Encoder: encodes x corrupted by random noise. Decoder: reconstructs x without the noise.
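
A minimal denoising-autoencoder training step, assuming PyTorch; the network shapes and the Gaussian noise scale are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):                                  # x: (N, 784) clean inputs
    x_noisy = x + 0.3 * torch.randn_like(x)         # corrupt the input ...
    x_hat = decoder(encoder(x_noisy))               # ... but reconstruct the clean target
    loss = ((x_hat - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```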

  7. Getting Tangent Vectors I: the code c represents a coordinate on a low-dimensional manifold. How do we get the tangent vectors at a given c?

  8. Getting Tangent Vectors II: recall that the directions in the input space that change c most should be tangent vectors. Given a point x, let c be the code of x and J(x) = ∂c/∂x be the Jacobian matrix of c at x; J(x) summarizes how c changes in terms of x. Decompose J(x) with the SVD J(x) = UDV⊤, and take as tangent vectors the right singular vectors (rows of V⊤) corresponding to the largest singular values in D.
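
A minimal sketch, assuming PyTorch and a hypothetical encoder, of extracting tangent directions from the SVD of the code's Jacobian at a single point:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())     # hypothetical encoder c(x)

def tangent_vectors(x, k=2):
    """Top-k tangent directions at x, from J(x) = dc/dx = U D V^T."""
    J = torch.autograd.functional.jacobian(encoder, x)        # (code_dim, input_dim) Jacobian
    U, S, Vt = torch.linalg.svd(J, full_matrices=False)       # singular values come sorted descending
    return Vt[:k]                                              # rows of V^T for the largest singular values

x = torch.rand(784)
v = tangent_vectors(x, k=2)   # two approximate tangent vectors (input-space directions) at x
```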

  9. Getting Tangent Vectors III: in practice, J(x) usually has only a few large singular values. The tangent vectors found by contractive/denoising autoencoders can be used by Tangent Prop [23]: let {v^(i,j)}_j be the tangent vectors of each example x^(i), and train an NN classifier f with the cost penalty Ω[f] = ∑_{i,j} (∇_x f(x^(i))⊤ v^(i,j))², under the assumption that points on the same manifold share the same label.
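
A minimal sketch of this penalty, assuming PyTorch, a classifier f that outputs one scalar score per example, and precomputed per-example tangent vectors; all names here are illustrative:

```python
import torch

def tangent_prop_penalty(f, x, vs):
    """Omega[f] = sum_{i,j} (grad_x f(x^(i))^T v^(i,j))^2.

    x: (N, d) inputs; vs: (N, J, d) tangent vectors for each example."""
    x = x.clone().requires_grad_(True)
    out = f(x).sum()                                           # per-example scores, summed to a scalar
    g = torch.autograd.grad(out, x, create_graph=True)[0]      # (N, d) per-example input gradients
    proj = torch.einsum('nd,njd->nj', g, vs)                   # directional derivative along each tangent
    return (proj ** 2).sum()

# total loss = classification_loss + lambda * tangent_prop_penalty(f, x, vs)
```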

  10. Outline: 1 Unsupervised Learning, 2 Self-Supervised Learning, 3 Autoencoders & Manifold Learning, 4 Generative Adversarial Networks (The Basics, Challenges, More GANs).

  11. Decoder as Data Generator: the decoder of an autoencoder can be used to generate data points, even from synthetic codes. Problems: same c, same output → dropout layers, variational autoencoders [9]; blurry images.

  12. Why Blurry Images? The cost function is argmin_Θ −log P(X | Θ) = argmin_Θ −∑_n log P(x^(n) | Θ). For image generation with linear output units, a^(L) = z^(L) = μ̂ for x ∼ N(μ, Σ), so −log P(x^(n) | Θ) = ‖x^(n) − a^(L)‖². Would a better assumed distribution for x help? P(x) may be very complex. A better "goodness" measure? Why not use an NN to tell whether a generated image is of good quality?
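
A quick numerical check, assuming NumPy/SciPy, that the Gaussian negative log-likelihood with identity covariance is the (halved) squared reconstruction error plus a constant, which is why this cost averages plausible outputs into blurry images:

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 4
x, mu = np.random.rand(d), np.random.rand(d)
nll = -multivariate_normal(mean=mu, cov=np.eye(d)).logpdf(x)   # -log N(x; mu, I)
print(np.isclose(nll, 0.5 * np.sum((x - mu) ** 2) + 0.5 * d * np.log(2 * np.pi)))   # True
```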

  13. Outline: 1 Unsupervised Learning, 2 Self-Supervised Learning, 3 Autoencoders & Manifold Learning, 4 Generative Adversarial Networks (The Basics, Challenges, More GANs).

  14. Generative Adversarial Networks (GANs) [4]. Generator g: generates data points from random codes; no "encoder" is needed, since the task is data synthesis. Discriminator f: separates generated points from real ones; the weights applied to x and x̂ are tied, and f is a binary classifier with sigmoid output unit a^(L) = ρ̂ for P(y = true point | x) ∼ Bernoulli(ρ). Goal: train a g that tricks f into believing g(c) is real.

  15. Cost Function: given N real training points and N generated points,
        argmin_Θg max_Θf log P(X | Θg, Θf) = argmin_Θg max_Θf ∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m))))
                                           = argmin_Θg max_Θf ∑_{n=1}^{N} log ρ̂^(n) + ∑_{m=1}^{N} log(1 − ρ̂^(m)).
      Recall that f maximizes the log likelihood log P(X | Θ) ∝ ∑_n log P(y^(n) | x^(n), Θ) = ∑_n log[(ρ̂^(n))^{y^(n)} (1 − ρ̂^(n))^{1 − y^(n)}].
      Inner max first, then outer min: ρ̂^(n) depends on Θf only, while ρ̂^(m) depends on both Θf and Θg.
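
A minimal sketch, assuming NumPy, of evaluating the two players' objectives given the discriminator's sigmoid outputs; `d_real` and `d_fake` stand for f(x^(n)) and f(g(c^(m))) and are illustrative placeholders:

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """What f maximizes: the Bernoulli log-likelihood with y=1 on real points, y=0 on fakes."""
    return np.sum(np.log(d_real)) + np.sum(np.log(1.0 - d_fake))

def generator_objective(d_fake):
    """What g minimizes: only this second term depends on Theta_g."""
    return np.sum(np.log(1.0 - d_fake))

d_real = np.array([0.9, 0.8, 0.95])   # f's outputs on real points
d_fake = np.array([0.2, 0.3, 0.1])    # f's outputs on generated points
print(discriminator_objective(d_real, d_fake), generator_objective(d_fake))
```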

  16. Training: Alternating SGD for argmin_Θg max_Θf ∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m)))).
      Initialize Θg for g and Θf for f. At each SGD step/iteration:
        1. Repeat K times (with Θg fixed):
           a. Sample N real points {x^(n)}_n from X and N codes c^(m) ∼ N(0, I)
           b. Θf ← Θf + η ∇_Θf [∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m))))]
        2. Execute once (with Θf fixed):
           a. Sample N codes c^(m) ∼ N(0, I)
           b. Θg ← Θg − η ∇_Θg [∑_m log(1 − f(g(c^(m))))]
      Why limit the number of steps K when updating Θf? f may overfit the data and give very different values once g is updated; limiting K prevents g from being updated toward a "wrong" target.
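
A minimal sketch of the alternating updates, assuming PyTorch; the MLP generator/discriminator shapes, K, and the learning rate are illustrative assumptions, not taken from the slides:

```python
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
f = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_f = torch.optim.SGD(f.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(g.parameters(), lr=1e-3)

def gan_step(real_x, K=1, N=64):
    # 1. K ascent steps on Theta_f (Theta_g fixed): maximize log f(x) + log(1 - f(g(c)))
    for _ in range(K):
        c = torch.randn(N, 32)
        loss_f = -(torch.log(f(real_x)).sum() + torch.log(1 - f(g(c).detach())).sum())
        opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    # 2. One descent step on Theta_g: minimize log(1 - f(g(c))) (only opt_g steps, so f stays fixed)
    c = torch.randn(N, 32)
    loss_g = torch.log(1 - f(g(c))).sum()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```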

  17. Results: with a domain-specific architecture, e.g., DC-GAN [18].

  18. GANs Are Hard to Train! See, e.g., "Tips for Training Stable GANs", "Keep Calm and train a GAN. Pitfalls and Tips...", "10 Lessons I Learned Training GANs for one Year", and the "GAN hacks" repository on GitHub.

  19. Outline: 1 Unsupervised Learning, 2 Self-Supervised Learning, 3 Autoencoders & Manifold Learning, 4 Generative Adversarial Networks (The Basics, Challenges, More GANs).

  20. Challenge: Non-Convergence. GAN training may not converge: the goal of a GAN is to find a saddle point of argmin_Θg max_Θf ∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m)))), and the updates to Θf and Θg may cancel each other's progress, so training requires human monitoring and termination.

  21. Mode Collapsing. Even worse: mode collapse, where g oscillates from generating one kind of point to generating another. When K is small, alternating SGD does not distinguish between min_Θg max_Θf and max_Θf min_Θg of ∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m)))). In the max_Θf min_Θg case, g is encouraged to map every code to the single "mode" that f currently believes is most likely to be real.

  22. Solutions. Minibatch discrimination [22]: in the max_Θf min_Θg case, g collapses because ∇_Θf C is computed independently for each point, so why not augment each x^(n) / x̂^(n) with batch features? If g collapses, f can tell this from the batch features and reject the fake points; now g needs to generate dissimilar points to fool f (see the sketch below). Unrolled GANs [15]: back-propagate through several max steps when computing ∇_Θg C.
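
A minimal sketch of the batch-feature idea, assuming PyTorch; it appends a simple statistic (mean pairwise L1 distance within the batch) rather than the learned tensor features of [22], so it illustrates batch-feature augmentation, not a faithful reimplementation of minibatch discrimination:

```python
import torch

def append_batch_feature(x):
    """x: (N, d). Append, per point, its mean L1 distance to the other points in the batch.

    A collapsed batch (near-identical fakes) yields near-zero features,
    which the discriminator can learn to reject."""
    dists = torch.cdist(x, x, p=1)                   # (N, N) pairwise L1 distances
    n = x.shape[0]
    mean_dist = dists.sum(dim=1, keepdim=True) / (n - 1)
    return torch.cat([x, mean_dist], dim=1)          # (N, d + 1)

# The discriminator then takes inputs of size d + 1 instead of d.
```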

  23. Challenge: Balance between g and f. For argmin_Θg max_Θf ∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m)))), alternating SGD performs
        Θf ← Θf + η ∇_Θf [∑_n log f(x^(n)) + ∑_m log(1 − f(g(c^(m))))], repeated K times,
        Θg ← Θg − η ∇_Θg [∑_m log(1 − f(g(c^(m))))].
      Why limit K when updating Θf? Too large a K: f may overfit the data, making g update toward a "wrong" target f, and the gradients vanish: ∇_Θg [∑_m log(1 − f(g(c^(m))))] becomes too small to learn from (see the illustration below). Too small a K: g is updated toward a "meaningless" f.
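
A tiny numerical illustration, assuming PyTorch, of the vanishing-gradient problem when f is too strong: once f confidently scores a generated point near 0, the gradient of log(1 − f(g(c))) with respect to that point's logit is approximately −f(g(c)), which is almost zero:

```python
import torch

a = torch.tensor(-8.0, requires_grad=True)     # logit of a generated sample under a confident f
loss_g = torch.log(1 - torch.sigmoid(a))       # the term g descends on
loss_g.backward()
print(torch.sigmoid(a).item(), a.grad.item())  # f(g(c)) ~ 3e-4 and gradient ~ -3e-4: almost no learning signal
```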

  24. Solution: Wasserstein GAN [1]. Let f be a regressor (a critic) without the sigmoid output layer, with cost function argmin_Θg max_Θf ∑_n f(x^(n)) − ∑_m f(g(c^(m))). Initialize Θg for g and Θf for f, then at each SGD step/iteration: …
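
A minimal sketch of one critic/generator update under this objective, assuming PyTorch; the weight-clipping constant and RMSprop learning rate follow the recipe in the WGAN paper [1], while the network shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
f = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))   # no sigmoid: a critic/regressor
opt_f = torch.optim.RMSprop(f.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(g.parameters(), lr=5e-5)

def wgan_step(real_x, N=64, clip=0.01):
    # Critic ascent on sum f(x) - sum f(g(c)), implemented as descent on the negation
    c = torch.randn(N, 32)
    loss_f = -(f(real_x).sum() - f(g(c).detach()).sum())
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    for p in f.parameters():                      # weight clipping keeps f roughly Lipschitz [1]
        p.data.clamp_(-clip, clip)
    # Generator descent: minimize -sum f(g(c)), i.e., make the critic score fakes higher
    c = torch.randn(N, 32)
    loss_g = -f(g(c)).sum()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```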
