

  1. GANs + Final Practice Questions. Lecture 23, CS 753. Instructor: Preethi Jyothi

  2. Final Exam Syllabus
 1. WFST algorithms/WFSTs used in ASR
 2. HMM algorithms/EM/Tied-state triphone models
 3. DNN-based acoustic models
 4. N-gram/Smoothing/RNN language models
 5. End-to-end ASR (CTC, LAS, RNN-T)
 6. MFCC feature extraction
 7. Search & Decoding
 8. HMM-based speech synthesis models
 9. Multilingual ASR
 10. Speaker Adaptation
 11. Discriminative training of HMMs
 Questions can be asked on any of the 11 topics listed above. You will be allowed a single A4 cheat sheet of handwritten notes; content on both sides is permitted.

  3. Final Project Deliverables
 4-5 page final report:
 ✓ Task definition, Methodology, Prior work, Implementation Details, Experimental Setup, Experiments and Discussion, Error Analysis (if any), Summary
 Short talk summarizing the project:
 ✓ Each team will get 8-10 minutes for their presentation and roughly 5 minutes for Q/A
 ✓ Clearly demarcate which team member worked on what part

  4. Final Project Grading
 Break-up of 20 points:
 • 6 points for the report
 • 4 points for the presentation
 • 6 points for Q/A
 • 4 points for overall evaluation of the project

  5. Final Project Schedule
 • Presentations will be held on Nov 23rd and Nov 24th
 • The final report in pdf format should be sent to pjyothi@cse.iitb.ac.in before Nov 24th
 • The order of presentations will be decided on a lottery basis and shared via Moodle before Nov 9th

  6. Generative Adversarial Networks (GANs)
 • Training process is formulated as a game between a generator network and a discriminator network
 • Objective of the generator: create samples that seem to be from the same distribution as the training data
 • Objective of the discriminator: examine a generated sample and distinguish between fake and real samples
 • The generator tries to fool the discriminator network
 [Figure: the generator maps noise z to a sample x = G(z); the discriminator D(x) scores real samples x against generated ones]

  7. Generative Adversarial Networks
 max_G min_D L(G, D), where L(G, D) = E_{x ∼ data}[−log D(x)] + E_z[−log(1 − D(G(z)))]
 • Cost function of the generator is the opposite of the discriminator's
 • Minimax game: the generator and discriminator are playing a zero-sum game against each other
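A minimal NumPy sketch of this objective (the function name, the epsilon smoothing, and the NumPy framing are illustrative assumptions, not from the lecture). The discriminator tries to minimise this quantity while the generator tries to maximise it:

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-8):
    """Monte-Carlo estimate of L(G, D) above.

    d_real: discriminator outputs D(x) on a minibatch of real samples, in (0, 1)
    d_fake: discriminator outputs D(G(z)) on a minibatch of generated samples
    """
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return np.mean(-np.log(d_real + eps)) + np.mean(-np.log(1.0 - d_fake + eps))
```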

  8. Training Generative Adversarial Networks
 for number of training iterations do
   for k steps do
     • Sample a minibatch of m noise samples {z(1), ..., z(m)} from the noise prior p_g(z).
     • Sample a minibatch of m examples {x(1), ..., x(m)} from the data-generating distribution p_data(x).
     • Update the discriminator by ascending its stochastic gradient:
       ∇_{θ_d} (1/m) Σ_{i=1}^{m} [log D(x(i)) + log(1 − D(G(z(i))))]
   end for
   • Sample a minibatch of m noise samples {z(1), ..., z(m)} from the noise prior p_g(z).
   • Update the generator by descending its stochastic gradient:
     ∇_{θ_g} (1/m) Σ_{i=1}^{m} log(1 − D(G(z(i))))
 end for
 The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.
 Image from [Goodfellow16]: https://arxiv.org/pdf/1701.00160.pdf
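A hedged PyTorch sketch of the alternating updates above, assuming generator and discriminator modules G and D (with D outputting probabilities), a data loader `loader`, and a noise dimension `z_dim`; all names, the learning rate, and the use of SGD with momentum are illustrative choices, not the lecture's reference implementation:

```python
import torch

def train_gan(G, D, loader, z_dim, iters=10, k=1, lr=2e-4, device="cpu"):
    # The slide notes momentum was used; SGD with momentum is assumed here.
    opt_d = torch.optim.SGD(D.parameters(), lr=lr, momentum=0.9)
    opt_g = torch.optim.SGD(G.parameters(), lr=lr, momentum=0.9)
    for _ in range(iters):
        for x in loader:
            x = x.to(device)
            m = x.size(0)
            for _ in range(k):  # k discriminator steps
                z = torch.randn(m, z_dim, device=device)
                # Ascend log D(x) + log(1 - D(G(z))), i.e. descend its negation
                d_loss = -(torch.log(D(x)).mean()
                           + torch.log(1 - D(G(z).detach())).mean())
                opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # One generator step: descend log(1 - D(G(z)))
            z = torch.randn(m, z_dim, device=device)
            g_loss = torch.log(1 - D(G(z))).mean()
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```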

  9. Better objective for the generator
 • Problem of saturation: if the generated sample is really poor, the generator's cost is relatively flat
 • Original cost: L_GEN(G, D) = E_z[log(1 − D(G(z)))]
 • Modified cost: L_GEN(G, D) = E_z[−log D(G(z))]
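The switch from the saturating to the non-saturating generator cost is a one-line change; a small sketch (the function and argument names are illustrative):

```python
import torch

def generator_loss(d_fake, saturating=False):
    """d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1)."""
    if saturating:
        # Original cost: gradient is nearly flat when D(G(z)) is close to 0
        return torch.log(1 - d_fake).mean()
    # Modified cost: strong gradient precisely when the generator is doing badly
    return -torch.log(d_fake).mean()
```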

  10. Large (& growing!) list of GANs Image from https://github.com/hindupuravinash/the-gan-zoo

  11. Conditional GANs
 • Generator and discriminator receive some additional conditioning information C
 [Figure: as before, the generator maps noise z to x = G(z) and the discriminator D(x) scores real vs. generated samples, but both networks also take C as input]
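One common way to implement the conditioning is to concatenate C with the inputs of both networks; this is a generic sketch under that assumption, not the architecture of any specific cited paper:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, z_dim, c_dim, x_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, z, c):
        return self.net(torch.cat([z, c], dim=1))   # x = G(z, C)

class CondDiscriminator(nn.Module):
    def __init__(self, x_dim, c_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x, c):
        return self.net(torch.cat([x, c], dim=1))   # D(x, C)
```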

  12. Image-to-image Translation using C-GANs
 [Figure: example input/output pairs: Labels to Street Scene, Labels to Facade, BW to Color, Aerial to Map, Day to Night, Edges to Photo]
 Image from Isola et al., CVPR 2017, https://arxiv.org/pdf/1611.07004.pdf

  13. Text-to-Image Synthesis
 [Figure: images synthesized from text descriptions, e.g. "this small bird has a pink breast and crown, and black primaries and secondaries", "this magnificent fellow is almost all black with a red crest, and white cheek patch", "this white and yellow flower have thin white petals and a round yellow stamen", "the flower has petals that are bright pinkish purple with white stigma"]
 Image from Reed et al., ICML 2016, https://arxiv.org/pdf/1605.05396.pdf

  14. Text-to-Image Synthesis
 [Figure: generator and discriminator networks, both conditioned on the text embedding of "This flower has small, round violet petals with a dark purple center"]
 Image from Reed et al., ICML 2016, https://arxiv.org/pdf/1605.05396.pdf

  15. Three Speech Applications of GANs

  16. GANs for speech synthesis
 • Generator: produces synthesised speech
 • Discriminator: binary classifier which distinguishes the synthesised speech from real speech
 • During synthesis, random noise + linguistic features are fed to the generator, which generates speech
 [Figure: generator fed with linguistic features and noise; training uses an MSE loss on predicted vs. natural samples and/or an adversarial loss from the discriminator]
 Image from Yang et al., "SPSS using GANs", 2017
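A sketch of a generator objective that combines a frame-level MSE term with an adversarial term, in the spirit of the figure; the weight `lam` and all names are illustrative assumptions, not the exact loss used by Yang et al.:

```python
import torch

def synthesis_generator_loss(pred, natural, d_fake, lam=1.0):
    """pred/natural: predicted and natural acoustic frames (same shape);
    d_fake: discriminator outputs on the predicted frames, in (0, 1);
    lam: assumed weight on the adversarial term."""
    mse = torch.mean((pred - natural) ** 2)
    adv = -torch.log(d_fake).mean()   # non-saturating adversarial term
    return mse + lam * adv
```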

  17. SEGAN: GANs for speech enhancement
 • Enhancement: given an input noisy signal x̃, we want to clean it to obtain an enhanced signal x̂
 • Generator G will take both x̃ and z as inputs; G is fully convolutional
 Image from https://arxiv.org/pdf/1703.09452.pdf

  18. Voice Conversion Using Cycle-GANs Image from https://arxiv.org/abs/1711.11293

  19. Practice Questions

  20. HMM 101
 A water sample collected from Powai lake is either Clean or Polluted. However, this information is hidden from us and all we can observe is whether the water is muddy, clear, odorless or cloudy. We start at time step 1 in the Clean state. The HMM below models this problem. Let q_t and O_t denote the state and observation at time step t, respectively.
 a) What is P(O_2 = clear)?
 b) What is P(q_2 = Clean | O_2 = clear)?
 c) What is P(O_200 = cloudy)?
 d) What is the most likely sequence of states for the following observation sequence: {O_1 = clear, O_2 = clear, O_3 = clear, O_4 = clear, O_5 = clear}?
 [HMM diagram: states Clean and Polluted with transition probabilities 0.8 and 0.2 on the arcs; emissions: Clean: Pr(muddy) = 0.5, Pr(clear) = 0.1, Pr(odorless) = 0.2, Pr(cloudy) = 0.2; Polluted: Pr(muddy) = 0.1, Pr(clear) = 0.5, Pr(odorless) = 0.2, Pr(cloudy) = 0.2]
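A small NumPy sketch for checking parts (a)-(c). The emission probabilities are taken from the slide; the exact assignment of the 0.8/0.2 arcs (self-loop vs. cross transition) is an assumption read off the damaged figure, so adjust A if the diagram says otherwise:

```python
import numpy as np

obs_symbols = ["muddy", "clear", "odorless", "cloudy"]
# Assumed transitions: A[i, j] = P(q_{t+1} = j | q_t = i), states 0 = Clean, 1 = Polluted
A = np.array([[0.8, 0.2],
              [0.2, 0.8]])
# Emissions from the slide: B[i, k] = P(O_t = k | q_t = i)
B = np.array([[0.5, 0.1, 0.2, 0.2],    # Clean
              [0.1, 0.5, 0.2, 0.2]])   # Polluted
pi = np.array([1.0, 0.0])              # we start in Clean at t = 1

def state_dist(t):
    """Distribution of q_t as a vector over states (t >= 1)."""
    return pi @ np.linalg.matrix_power(A, t - 1)

def obs_prob(t, symbol):
    """P(O_t = symbol), marginalising over the hidden state."""
    return float(state_dist(t) @ B[:, obs_symbols.index(symbol)])

p_a = obs_prob(2, "clear")                                        # part (a)
p_b = state_dist(2)[0] * B[0, obs_symbols.index("clear")] / p_a   # part (b), Bayes' rule
p_c = obs_prob(200, "cloudy")                                     # part (c)
print(p_a, p_b, p_c)
```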

  21. HMM 101
 Say that we are now given a modified HMM for the water samples as shown below. Initial probabilities and transition probabilities are shown next to the arcs. (Note: You do not need to use the Viterbi algorithm to answer the next two questions.)
 a) What is the most likely sequence of states given a sequence of three observations: {muddy, muddy, muddy}?
 b) Say we observe a very long sequence of "muddy" (e.g. 10 million "muddy" in a row). What happens to the most likely state sequence then?
 [HMM diagram: initial probabilities 0.01 and 0.99; self-loop transition probabilities 0.9 for both states and cross transitions 0.1; emissions: Clean: Pr(muddy) = 0.51, Pr(clear) = 0.49; Polluted: Pr(muddy) = 0.49, Pr(clear) = 0.51]
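A Viterbi sketch one could use to sanity-check the intuition for (a) and (b). The emission probabilities are from the slide; which state gets the 0.99 vs. 0.01 initial probability, and the 0.9 self-loop / 0.1 cross-transition layout, are assumptions read off the damaged figure, so swap them if the diagram differs:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence for a list of observation indices (log-space DP)."""
    n, S = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((n, S), dtype=int)
    for t in range(1, n):
        scores = logd[:, None] + np.log(A)   # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]                         # 0 = Clean, 1 = Polluted

# Assumed parameters read off the figure; observation indices: 0 = muddy, 1 = clear
pi = np.array([0.01, 0.99])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B = np.array([[0.51, 0.49],
              [0.49, 0.51]])
print(viterbi(pi, A, B, [0, 0, 0]))     # three "muddy" observations
print(viterbi(pi, A, B, [0] * 1000))    # a much longer run of "muddy"
```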
