  1. Addressing Data Scarcity In Deep Learning Anima Anandkumar & Zachary Lipton

  2. DATA AUGMENTATION
 • To improve generalization, augment the training data.
 • In computer vision, simple techniques: rotation, cropping, noise
 • In speech recognition: additive background noise and spectral transforms
 • More sophisticated approaches?
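The simple vision techniques above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline; a real system would use a library such as torchvision, and the function names here are my own:

```python
import random

def random_crop(img, size):
    """Randomly crop a square patch of side `size` from a 2D image (list of rows)."""
    h, w = len(img), len(img[0])
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def add_noise(img, sigma=0.05):
    """Perturb every pixel with zero-mean Gaussian noise."""
    return [[p + random.gauss(0.0, sigma) for p in row] for row in img]

def augment(img, crop_size):
    """One stochastic augmentation pass: crop, then noise."""
    return add_noise(random_crop(img, crop_size))
```

Each call produces a different variant of the same image, which is what lets a small labeled set act like a larger one.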

  3. PREDICTIVE VS GENERATIVE MODELS
 [diagram: predictive model P(y | x) maps image x to label y; generative model P(x | y) maps label y to image x]
 • Predictive P(y | x): impressive gains with deep learning
 • Generative P(x | y): far more challenging
 • Information loss (the domain of y is much smaller than that of x)
 • Need to model latent variations

  4. DATA AUGMENTATION 1: MIXED REALITY GAN
 GAN
 • Merits: captures statistics of natural images; learnable
 • Perils: quality of generated images not high; introduces artifacts
 Synthetic Data
 • Merits: high-quality rendering; full annotation for free; generate infinite data
 • Perils: domain mismatch; rendering optimized for visual appeal, not classification
 Our GAN-based framework – Mr.GANs – narrows the gap between synthetic and real data

  5. MIXED-REALITY GENERATIVE ADVERSARIAL NETWORKS (MR-GAN)
 Tan Nguyen, Hao Chen, Zachary Lipton, Leo Dirac, Stefano Soatto, A.

  6. MIXED-REALITY GENERATIVE ADVERSARIAL NETWORKS (MR-GAN)
 • Two domains X and Y.
 • CycleGAN: transforms from domain X to Y and vice versa.
 • Enforcing cycle consistency: F(G(X)) ~ X.
 • MR-GAN: progressive CycleGAN
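The cycle-consistency constraint F(G(X)) ~ X can be sketched as an L1 reconstruction loss. Here G and F are stand-ins for the two learned generators; this is a hedged illustration of the loss term only, not the MR-GAN training code:

```python
def cycle_consistency_loss(x_batch, G, F):
    """Mean absolute error between each x and its round-trip F(G(x)).

    G maps domain X to Y; F maps Y back to X. Driving this loss to zero
    enforces F(G(x)) ~ x, the CycleGAN cycle-consistency constraint.
    """
    total, count = 0.0, 0
    for x in x_batch:
        x_rec = F(G(x))
        total += sum(abs(a - b) for a, b in zip(x, x_rec))
        count += len(x)
    return total / count
```

In training this term is added to the usual adversarial losses for both generators; a perfect round trip gives a loss of zero.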

  7. MIXED-REALITY GENERATIVE ADVERSARIAL NETWORKS (MR-GAN)

  8. CLASSIFICATION RESULTS ON CIFAR-DA4
 1 million synthetic images from 3D models; 0.25% real from CIFAR100; 4 classes.
 Improvement over training on real data alone:
 • Real + Synthetic: 5.43% (Stage 0)
 • Real + CycleGAN: 5.09% (Stage 1)
 • Real + Mr.GAN: 8.85% (Stage 2)

  9. THE REAL, THE SYNTHETIC AND THE REFINED
 [panels: Real | Synthetic | Refined Synthetic | Refined Real]
 Mr.GAN pushes both real and synthetic images closer to one another

  10. (R-S): SYNTHETIC -> REFINED SYNTHETIC
 [panels: Synthetic | Refined Synthetic]

  11. (R-S): REAL -> REFINED REAL
 [panels: Real | Refined Real]

  12. PREDICTIVE VS GENERATIVE MODELS
 [diagram: predictive P(y | x) and generative P(x | y); one model to do both?]
 • SOTA prediction from CNN models
 • What class of p(x | y) yields CNN models for p(y | x)?

  13. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)
 Nhat Ho, Tan Nguyen, Ankit Patel, A., Michael Jordan, Richard Baraniuk

  14. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)
 [diagram: object category and latent variables drive intermediate rendering, which produces the image]
 Design joint priors for latent variables based on reverse-engineering CNN predictive architectures

  15. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)

  16. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)

  17. LATENT-DEPENDENT DEEP RENDERING MODEL (LD-DRM)

  18. STATISTICAL GUARANTEES FOR THE LD-DRM
 Training loss in the CNN is equivalent to the likelihood in the LD-DRM
 • Generalization for prediction depends on the generative model
 • Better generalization when the number of active rendering paths is minimized
 • Rendering path normalization: a new form of regularization
 • Improves performance significantly

  19. SEMI-SUPERVISED LEARNING RESULTS
 Error rate (%) on CIFAR-10 and CIFAR-100.
 LD-DRM achieves results comparable to state-of-the-art SSL methods

  20. DATA AUGMENTATION 3: SYMBOLIC EXPRESSIONS
 Goal: learn a domain of functions (sin, cos, log, add, ...)
 • Training on numerical input-output pairs alone does not generalize.
 Solution: data augmentation with symbolic expressions
 • Symbolic expressions efficiently encode relationships between functions.
 • Design networks to use both symbolic and numerical data.

  21. COMMON STRUCTURE: TREES
 • Symbolic expression trees and function evaluation trees.
 • Decimal trees: encode numbers in their decimal representation (numerical).
 • Trees can encode any expression, function evaluation, and number.
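A symbolic expression tree and its numerical evaluation can be sketched in a few lines. This is an illustrative encoding, not the paper's data format; the `Node` class and operator table are my own names:

```python
import math

class Node:
    """A node in a symbolic expression tree: an operator plus child subtrees,
    or a leaf constant when op == "const"."""
    def __init__(self, op, *children, value=None):
        self.op, self.children, self.value = op, children, value

# Operator table: the function domain the networks are trained over.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": math.sin,
    "cos": math.cos,
    "log": math.log,
}

def evaluate(node):
    """Recursively evaluate the function-evaluation tree bottom-up."""
    if node.op == "const":
        return node.value
    args = [evaluate(child) for child in node.children]
    return OPS[node.op](*args)

# sin(0) + 2 * 3, encoded as a tree:
expr = Node("add",
            Node("sin", Node("const", value=0.0)),
            Node("mul", Node("const", value=2.0), Node("const", value=3.0)))
```

The same tree structure can serve symbolic tasks (is an equation valid?) and numerical tasks (what does it evaluate to?), which is what lets one network consume both kinds of data.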

  22. STRUCTURE: TREE LSTM
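A Tree LSTM composes children's states bottom-up instead of processing a flat sequence. Below is a minimal pure-Python sketch of the child-sum Tree-LSTM cell of Tai et al. (2015), one standard variant; the slides do not specify which variant is used, and the class and weight names here are my own:

```python
import math
import random

def sigmoid(z): return 1.0 / (1.0 + math.exp(-z))
def matvec(W, v): return [sum(w * x for w, x in zip(row, v)) for row in W]
def vadd(*vs): return [sum(t) for t in zip(*vs)]
def hadamard(a, b): return [x * y for x, y in zip(a, b)]

class ChildSumTreeLSTM:
    def __init__(self, in_dim, mem_dim, seed=0):
        rng = random.Random(seed)
        def mat(r, c):
            return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.W = {g: mat(mem_dim, in_dim) for g in "ifou"}
        self.U = {g: mat(mem_dim, mem_dim) for g in "ifou"}
        self.b = {g: [0.0] * mem_dim for g in "ifou"}
        self.mem_dim = mem_dim

    def node_forward(self, x, children):
        """x: this node's input vector; children: (h, c) pairs of subtrees."""
        h_sum = [0.0] * self.mem_dim          # sum of child hidden states
        for h_k, _ in children:
            h_sum = vadd(h_sum, h_k)
        def gate(g, h):
            return vadd(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])
        i = [sigmoid(z) for z in gate("i", h_sum)]
        o = [sigmoid(z) for z in gate("o", h_sum)]
        u = [math.tanh(z) for z in gate("u", h_sum)]
        c = hadamard(i, u)
        for h_k, c_k in children:             # one forget gate per child
            f_k = [sigmoid(z) for z in gate("f", h_k)]
            c = vadd(c, hadamard(f_k, c_k))
        h = hadamard(o, [math.tanh(z) for z in c])
        return h, c
```

Applied to the expression trees of the previous slide, leaves are processed first and each internal node combines its children's `(h, c)` states, so the root summarizes the whole expression.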

  23. RESULTS: EQUATION COMPLETION & FUNCTION EVAL

  24. RESULTS: EQUATION VERIFICATION Generalization to unseen depth

  25. RESULTS SUMMARIZED
 • Vastly improved numerical evaluation: 90% over the function-fitting baseline.
 • Generalization to verifying symbolic equations of greater depth:
   LSTM (symbolic): 76.40% | TreeLSTM (symbolic): 93.27% | TreeLSTM (symbolic + numeric): 96.17%
 • Combining symbolic and numerical data improves generalization on both tasks: symbolic verification and numerical evaluation.

  26. CONCLUSION
 Data scarcity must be addressed in a number of ways
 • Collection: active learning and partial feedback
 • Aggregation: crowdsourcing models
 • Augmentation: graphics rendering + GANs; semi-supervised learning; symbolic expressions

  27. Thank you
