Some Success Stories in Bridging Theory and Practice
Anima Anandkumar, Bren Professor at Caltech; Director of ML Research at NVIDIA
SIGNSGD: COMPRESSED OPTIMIZATION FOR NON-CONVEX PROBLEMS
JEREMY BERNSTEIN, JIAWEI ZHAO, KAMYAR AZIZZADENESHELI, YU-XIANG WANG, ANIMA ANANDKUMAR
DISTRIBUTED TRAINING INVOLVES COMPUTATION & COMMUNICATION
[Diagram: a parameter server coordinating GPU 1 and GPU 2, each holding 1/2 of the data; a second build asks whether the gradients on each link between the GPUs and the server can be compressed]
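To see why compressing those links matters, here is a quick back-of-the-envelope sketch; the 25M-parameter count is an assumption, roughly ResNet-50 scale.

```python
# Per-round communication cost, assuming fp32 gradients and a
# hypothetical 25M-parameter model (roughly ResNet-50 scale).
d = 25_000_000                # number of gradient coordinates (assumption)
fp32_mb = 32 * d / 8 / 1e6    # full-precision gradient: 32 bits per coordinate
sign_mb = 1 * d / 8 / 1e6     # sign vector: 1 bit per coordinate

print(fp32_mb)                # 100.0 MB per worker per round
print(sign_mb)                # 3.125 MB per worker per round
print(fp32_mb / sign_mb)      # 32.0x compression on every link
```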
DISTRIBUTED TRAINING BY MAJORITY VOTE
[Diagram: round 1, each of GPUs 1-3 sends sign(g) to the parameter server; round 2, the server broadcasts sign[sum(sign(g))] back to all workers]
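A minimal NumPy sketch of the two-round scheme in the diagram above, with toy worker gradients: each worker ships only sign(g), the server aggregates by elementwise majority vote, and the broadcast back down is again a 1-bit-per-coordinate sign vector. This is the logic only, not the paper's production implementation.

```python
import numpy as np

def majority_vote_step(worker_grads, params, lr=1e-3):
    """One distributed signSGD step with majority-vote aggregation.

    worker_grads: list of stochastic gradients, one per worker.
    """
    # Round 1: each worker compresses its gradient to signs (1 bit/coordinate).
    signs = [np.sign(g) for g in worker_grads]
    # Server: elementwise majority vote over workers, compressed again to signs.
    vote = np.sign(np.sum(signs, axis=0))
    # Round 2: server broadcasts the 1-bit vote; every worker applies the update.
    return params - lr * vote

# Toy usage: 3 workers, 5-dimensional parameter vector, noisy gradients.
rng = np.random.default_rng(0)
params = rng.normal(size=5)
grads = [params + 0.1 * rng.normal(size=5) for _ in range(3)]
params = majority_vote_step(grads, params)
```

With an odd number of workers the vote sum is never zero, so the broadcast is a true sign vector.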
LARGE-BATCH ANALYSIS: SINGLE WORKER RESULTS

Assumptions:
➤ Objective function lower bound $f^*$
➤ Coordinate-wise variance bound $\vec\sigma$
➤ Coordinate-wise gradient Lipschitz $\vec L$

Define:
➤ Number of iterations $K$
➤ Number of backpropagations $N$

SGD gets rate
$$\mathbb{E}\Big[\frac{1}{K}\sum_{k=0}^{K-1}\|g_k\|_2^2\Big] \le \frac{1}{\sqrt{N}}\Big[\|\vec L\|_\infty\,(f_0 - f^*) + \|\vec\sigma\|_2^2\Big]$$

signSGD gets rate
$$\mathbb{E}\Big[\frac{1}{K}\sum_{k=0}^{K-1}\|g_k\|_1\Big]^2 \le \frac{1}{\sqrt{N}}\Big[\sqrt{\|\vec L\|_1}\,\Big(f_0 - f^* + \frac{1}{2}\Big) + 2\|\vec\sigma\|_1\Big]^2$$

The two rates are comparable up to density: in the worst case $\|g_k\|_1^2 \le d\,\|g_k\|_2^2$, $\|\vec\sigma\|_1^2 \le d\,\|\vec\sigma\|_2^2$, and $\|\vec L\|_1 \le d\,\|\vec L\|_\infty$, so the comparison hinges on how dense the gradient is relative to the noise and curvature vectors.
VECTOR DENSITY & ITS RELEVANCE IN DEEP LEARNING

Natural measure of density: $\phi(\vec v) = \dfrac{\|\vec v\|_1^2}{d\,\|\vec v\|_2^2}$
➤ $\phi(\vec v) = 1$ for a fully dense $\vec v$
➤ $\phi(\vec v) \approx 0$ for a fully sparse $\vec v$
The fully dense extreme is a sign vector.
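The density measure above in a few lines of NumPy; the endpoints match the slide: $\phi = 1$ for a sign vector and $\phi = 1/d \approx 0$ for a 1-sparse vector.

```python
import numpy as np

def density(v):
    """phi(v) = ||v||_1^2 / (d * ||v||_2^2): 1 if fully dense, ~0 if fully sparse."""
    return np.linalg.norm(v, 1) ** 2 / (v.size * np.linalg.norm(v, 2) ** 2)

sign_vec = np.ones(1000)                   # a sign vector: every coordinate is +/-1
sparse = np.zeros(1000); sparse[0] = 3.0   # a 1-sparse vector

print(density(sign_vec))  # 1.0   (fully dense)
print(density(sparse))    # 0.001 (= 1/d, approaches 0 as d grows)
```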
DISTRIBUTED SIGNSGD: MAJORITY VOTE THEORY
If gradient noise is unimodal and symmetric (reasonable by the central limit theorem), majority vote with $M$ workers converges at rate
$$\mathbb{E}\Big[\frac{1}{K}\sum_{k=0}^{K-1}\|g_k\|_1\Big]^2 \le \frac{1}{\sqrt{N}}\Big[\sqrt{\|\vec L\|_1}\,\Big(f_0 - f^* + \frac{1}{2}\Big) + \frac{2}{\sqrt{M}}\|\vec\sigma\|_1\Big]^2$$
Same variance reduction as SGD: the noise term shrinks as $1/\sqrt{M}$.
MINI-BATCH ANALYSIS
Under the symmetric noise assumption: [convergence bound shown on slide]
CIFAR-10 SNR
[Plot: gradient signal-to-noise ratio on CIFAR-10 over training]
SIGNSGD PROVIDES "FREE LUNCH"
p3.2xlarge machines on AWS, ResNet-50 on ImageNet: throughput gain with only a tiny accuracy loss.
SIGNSGD: TIME PER EPOCH
SIGNSGD ACROSS DOMAINS AND ARCHITECTURES Huge throughput gain!
BYZANTINE FAULT TOLERANCE
Under the symmetric noise assumption: [robustness result shown on slide]
SIGNSGD IS ALSO BYZANTINE FAULT TOLERANT
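A toy simulation of why majority vote helps here; the adversary model (workers sending huge inverted gradients) and all sizes are assumptions for illustration, not the paper's exact threat model. The point: under plain averaging a single adversary's influence is unbounded, while under majority vote each adversary casts exactly one ±1 vote per coordinate.

```python
import numpy as np

rng = np.random.default_rng(1)
d, honest_n, byz_n = 1000, 9, 2          # assumption: 9 honest vs 2 Byzantine workers
g = rng.normal(size=d)                   # "true" gradient for this toy example

honest = [g + rng.normal(size=d) for _ in range(honest_n)]
byz = [-1e6 * g for _ in range(byz_n)]   # adversaries send huge inverted gradients

# Averaging (plain distributed SGD): two adversaries dominate the sum.
avg = np.mean(honest + byz, axis=0)
print(np.mean(np.sign(avg) == np.sign(g)))   # ~0.0: the update is fully hijacked

# Majority vote on signs: each adversary gets one bounded vote per coordinate.
vote = np.sign(np.sum([np.sign(w) for w in honest + byz], axis=0))
print(np.mean(vote == np.sign(g)))           # stays well above chance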
TAKE-AWAYS FOR SIGNSGD
• Convergence even under biased gradients and noise.
• Faster than SGD in theory and in practice.
• For distributed training, similar variance reduction as SGD.
• In practice, similar accuracy but with far less communication.
LEARNING FROM NOISY SINGLY-LABELED DATA ASHISH KHETAN, ZACHARY C. LIPTON, ANIMA ANANDKUMAR
CROWDSOURCING: AGGREGATION OF CROWD ANNOTATIONS
Majority rule
• Simple and common.
• Wasteful: ignores the differing quality of annotators.
Annotator-quality models
• Can improve accuracy.
• Hard: quality must be estimated without ground-truth labels.
SOME INTUITIONS
Use majority rule to estimate the annotator-quality model (probability of being correct).
• Justification: majority rule approaches the ground truth when there are enough workers.
• Downside: requires many annotations per example before the majority is reliable, as the arithmetic below shows.
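That downside is just binomial arithmetic. A small sketch, assuming every annotator is independently correct with probability p = 0.7: the majority of r annotators is correct with probability P(Bin(r, p) > r/2), which climbs only slowly with r.

```python
from math import comb

def majority_correct(r, p):
    """P(majority of r independent annotators is correct), each correct w.p. p."""
    return sum(comb(r, k) * p**k * (1 - p)**(r - k)
               for k in range(r // 2 + 1, r + 1))

for r in (1, 3, 5, 9, 15):
    print(r, round(majority_correct(r, 0.7), 3))
# 1: 0.7, 3: 0.784, 5: 0.837, 9: 0.901, 15: ~0.95
# -> each extra annotation buys a little accuracy but costs a full unit of budget.
```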
PROPOSED CROWDSOURCING ALGORITHM
Input: noisy crowdsourced annotations. Repeat:
1. Compute the posterior over ground-truth labels given the current annotator-quality model.
2. Train the predictor with a weighted loss, using the posterior as weights.
3. Use the trained model to infer ground-truth labels.
4. MLE: update annotator quality using the inferred labels.
A schematic sketch of this loop follows.
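A schematic Python sketch of the loop, under simplifying assumptions: binary labels, one annotation per example, and a single per-worker correctness probability as the annotator-quality model. `train_weighted` and `predict_proba` are hypothetical stand-ins for any supervised learner; this is an illustration of the alternation, not the paper's exact algorithm.

```python
import numpy as np

def crowd_em(X, ann, worker, train_weighted, predict_proba, n_iters=5):
    """Alternate between (E) posterior over true labels given worker quality,
    (M1) weighted supervised training, and (M2) MLE update of worker quality.

    X: features; ann: one binary annotation per example;
    worker: annotator id (0..W-1) per example.
    train_weighted(X, y, w) -> model; predict_proba(model, X) -> P(y=1|x).
    """
    q = np.full(worker.max() + 1, 0.75)   # init: every worker 75% reliable
    prior = np.full(len(ann), 0.5)        # prior on y=1 before any model exists
    for _ in range(n_iters):
        # E-step: posterior P(y=1 | annotation, worker quality) via Bayes' rule.
        like1 = np.where(ann == 1, q[worker], 1 - q[worker])  # P(ann | y=1)
        like0 = np.where(ann == 0, q[worker], 1 - q[worker])  # P(ann | y=0)
        post = like1 * prior / (like1 * prior + like0 * (1 - prior))
        # M-step 1: weighted training -- hard labels weighted by posterior
        # confidence (a simplification of the weighted loss on the slide).
        model = train_weighted(X, (post > 0.5).astype(int), np.abs(2 * post - 1))
        prior = predict_proba(model, X)   # the model now informs the posterior
        # M-step 2: MLE of each worker's quality against the inferred labels.
        inferred = (prior > 0.5).astype(int)
        for w in range(len(q)):
            m = worker == w
            if m.any():
                q[w] = np.clip(np.mean(ann[m] == inferred[m]), 0.51, 0.99)
    return model, q
```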
LABELING ONCE IS OPTIMAL: THEORY
Theorem: Under a fixed budget, generalization error is minimized with a single annotation per sample.
Assumptions:
• The best predictor is accurate enough (under no label noise).
• Simplified case: all workers have the same quality.
• Probability of being correct > 83%.
LABELING ONCE IS OPTIMAL: PRACTICE
[Plots: MS-COCO with a fixed budget of 35k annotations; ImageNet with simulated workers under a fixed budget. Accuracy vs. number of workers, ~5% gain over majority rule]
NEURAL RENDERING MODEL (NRM): JOINT GENERATION AND PREDICTION FOR SEMI-SUPERVISED LEARNING
Nhat Ho, Tan Nguyen, Ankit Patel, Anima Anandkumar, Michael Jordan, Richard Baraniuk
SEMI-SUPERVISED LEARNING WITH GENERATIVE MODELS?
GAN
Merits
• Captures statistics of natural images.
• Learnable.
Perils
• Feedback is real vs. fake: different from prediction.
• Introduces artifacts.
PREDICTIVE VS GENERATIVE MODELS
[Diagram: a predictive model P(y | x) mapping image x to label y, and a generative model P(x | y) mapping label y to image x. One model to do both?]
• SOTA prediction comes from CNN models.
• What class of p(x | y) yields CNN models for p(y | x)?
NEURAL RENDERING MODEL (NRM)
[Diagram: generative rendering path from the object category y through intermediate latent variables down to the image x, mirroring the CNN's predictive path from x back up to y]
Design joint priors for the latent variables based on reverse-engineering CNN architectures.