
GPT3 - Atishya Jain. The content of this presentation has been sourced from various YouTube videos and blogs, apart from the original paper - PowerPoint PPT Presentation



  1. GPT3 - Atishya Jain. The content of this presentation has been sourced from various YouTube videos and blogs, apart from the original paper

  2. Let's Dissect it

  3. Let's Dissect it

  4. Let's Dissect it

  5. Merge aa

  6. Merge aa, Merge ab

  7. Merge aa, Merge ab, Merge ZY
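
The "Merge aa", "Merge ab", ... "Merge ZY" build (slides 5-7) appears to illustrate byte-pair encoding (BPE), the tokenization scheme GPT-2/GPT-3 use: repeatedly merge the most frequent adjacent symbol pair into a new vocabulary entry. A minimal sketch of that loop on a toy corpus (the corpus, the number of merges, and the resulting merge order are illustrative assumptions, not GPT-3's actual merge table):

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word (split into characters) -> frequency.
words = {tuple("aab"): 9, tuple("aac"): 5, tuple("ab"): 3}

for _ in range(3):                                   # learn 3 merges
    pair = get_pair_counts(words).most_common(1)[0][0]
    words = merge_pair(words, pair)
    print("Merge", "".join(pair))                    # e.g. "Merge aa" first
```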

  8. Let's Dissect it

  9. BERT uses the Encoder part only; GPT uses the Decoder part only

  10. Architecture

  11. Architecture

  12. Architecture
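
Slides 9-12 contrast BERT (encoder-only) with GPT (decoder-only). The defining ingredient of the decoder-only architecture is causal self-attention: each position may attend only to itself and earlier positions. A minimal single-head NumPy sketch with a causal mask (toy dimensions; the real GPT-3 175B uses 96 layers, 96 heads, d_model = 12288, and alternates dense with locally banded sparse attention):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask.

    x: (seq_len, d_model) token representations.
    Wq, Wk, Wv: (d_model, d_head) projection matrices.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq_len, seq_len)

    # Causal mask: position i may only attend to positions j <= i.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (seq_len, d_head)

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 16, 8, 5
x = rng.normal(size=(seq_len, d_model))
out = causal_self_attention(x, *(rng.normal(size=(d_model, d_head)) for _ in range(3)))
print(out.shape)   # (5, 8)
```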

  13. Let's Dissect it

  14. 355 years on the fastest V100; $4,600,000 on the lowest-priced GPU cloud provider
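
Slide 14's "355 years / $4,600,000" figures match a widely circulated back-of-the-envelope estimate. A rough reconstruction, assuming the paper's ~3.14e23 total training FLOPs for the 175B model, an assumed ~28 TFLOPS sustained throughput on one V100, and an assumed ~$1.50/GPU-hour price (the throughput and price are assumptions, not measured values):

```python
# Rough reconstruction of the "355 V100-years / $4.6M" estimate.
total_flops = 3.14e23          # GPT-3 175B total training compute (paper's estimate)
v100_flops_per_sec = 28e12     # ~28 TFLOPS sustained on a single V100 (assumed)
price_per_gpu_hour = 1.50      # assumed low-end cloud price in USD

seconds = total_flops / v100_flops_per_sec
years = seconds / (365 * 24 * 3600)
cost = seconds / 3600 * price_per_gpu_hour

print(f"{years:.0f} V100-years, ~${cost / 1e6:.1f}M")   # ~356 V100-years, ~$4.7M
```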

  15. Let's understand Few Shot Learning

  16. Zero Shot Learning There is a Dairy Cow

  17. Zero Shot Learning There is a Horse

  18. Zero Shot Learning Zebra is a horse with a Dairy Cow's color

  19. Zero Shot Learning You see Dad, it's better than a Zebra CNN !!

  20. One Shot Learning There is a Monkey

  21. One Shot Learning You see Dad, it's better than a Monkey CNN !!

  22. Few Shot Learning There is a Dog

  23. Few Shot Learning There is another Dog

  24. Few Shot Learning You see Dad, it's better than a Dog CNN !!

  25. Few Shot Learning

  26. Few Shot Learning

  27. Few Shot Learning
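
In GPT-3, zero-shot, one-shot, and few-shot differ only in how many solved demonstrations are packed into the prompt; no gradient updates happen at evaluation time. A sketch of the three prompt formats in the paper's English-to-French framing (the demonstration sentences are illustrative):

```python
# In-context learning: the "learning" happens purely inside the prompt.
task_description = "Translate English to French:"

zero_shot = f"{task_description}\ncheese =>"

one_shot = (
    f"{task_description}\n"
    "sea otter => loutre de mer\n"       # one demonstration
    "cheese =>"
)

few_shot = (
    f"{task_description}\n"
    "sea otter => loutre de mer\n"       # K demonstrations (K is typically 10-100,
    "peppermint => menthe poivrée\n"     # as many as fit in the 2048-token context)
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)

for name, prompt in [("zero-shot", zero_shot), ("one-shot", one_shot), ("few-shot", few_shot)]:
    print(f"--- {name} ---\n{prompt}\n")
```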

  28. Compute Power

  29. Transformer Variants
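
On transformer variants: the paper says GPT-3 alternates dense and locally banded sparse attention patterns across layers, similar to the Sparse Transformer. A sketch of a dense causal mask next to a locally banded causal mask (the band width here is an arbitrary illustration, not GPT-3's actual pattern):

```python
import numpy as np

def dense_causal_mask(seq_len):
    """True where attention is allowed: every position j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def banded_causal_mask(seq_len, band):
    """True only for the `band` most recent positions (local attention)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - band)

print(dense_causal_mask(6).astype(int))
print(banded_causal_mask(6, band=3).astype(int))
```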

  30. Training Dataset

  31. Training Dataset - Filtering

  32. Training Dataset - Filtering - Fuzzy Deduplication

  33. Training Dataset - Filtering - Fuzzy Deduplication - Adding high quality dataset

  34. Training Dataset - Filtering - Fuzzy Deduplication - Adding high quality dataset - Overlapping Test Set
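
For the fuzzy-deduplication step, the paper reports removing near-duplicate documents with Spark's MinHashLSH implementation using 10 hashes. A self-contained sketch of the underlying idea, MinHash as an estimator of Jaccard similarity over word shingles (shingle size, hash construction, and any dedup threshold are illustrative assumptions):

```python
import hashlib

def shingles(text, n=5):
    """Set of n-word shingles for a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def minhash_signature(shingle_set, num_hashes=10):
    """One min-hash value per seeded hash function."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching min-hashes approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the old river bank"

sig_a = minhash_signature(shingles(doc_a))
sig_b = minhash_signature(shingles(doc_b))
# Documents whose estimated similarity exceeds some threshold would be dropped.
print(f"estimated Jaccard similarity: {estimated_jaccard(sig_a, sig_b):.2f}")
```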

  35. Evaluations

  36. Language Modelling - SOTA on PTB - Omits the 4 Wikipedia-related tasks and the One Billion Word benchmark

  37. LAMBADA
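
LAMBADA asks the model to predict the final word of a passage. The paper notes that few-shot prompting lets GPT-3 treat it as a fill-in-the-blank (cloze) task, so the model knows exactly one word is expected. A sketch of that prompt format (the sentences follow the paper's illustration; treat them as an example, not a fixed template):

```python
# Few-shot LAMBADA framed as fill-in-the-blank, so the model produces
# a single word instead of continuing with a full sentence.
prompt = (
    "Alice was friends with Bob. Alice went to visit her friend ____. -> Bob\n"
    "George bought some baseball equipment, a ball, a glove, and a ____. ->"
)
print(prompt)   # the model should complete with one word, e.g. "bat"
```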

  38. TriviaQA

  39. Translation

  40. Synthetic and Qualitative Tasks - Arithmetic - Word Scrambling and Manipulation - SAT Analogies - News Article Generation - Learning and Using Novel Words - Correcting English Grammar

  41. Arithmetic
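
The arithmetic tasks are posed as plain natural-language questions, e.g. "Q: What is 48 plus 76? A:". A small sketch that generates two-digit addition prompts in that style (the sampling ranges and helper function are illustrative):

```python
import random

def two_digit_addition_prompt(rng):
    """Build a 2-digit addition question in the Q/A phrasing used for GPT-3."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"Q: What is {a} plus {b}? A:", a + b

rng = random.Random(0)
prompt, answer = two_digit_addition_prompt(rng)
print(prompt)   # e.g. "Q: What is 59 plus 63? A:"
print(answer)   # the model's completion is compared against this value
```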

  42. Word Scramble and Manipulation
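
The word scrambling and manipulation suite includes cycled letters, anagrams that keep the first and last characters fixed, random insertions, and reversed words. A sketch of the "anagram keeping the first and last letters" variant on a few sample words (the word list and RNG seed are illustrative):

```python
import random

def anagram_keep_ends(word, rng):
    """Scramble the interior letters, keeping the first and last letters fixed."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

rng = random.Random(42)
for word in ["corruption", "opponent", "inevitably"]:
    scrambled = anagram_keep_ends(word, rng)
    # The model sees the scrambled form and must recover the original word.
    print(f"{scrambled} -> {word}")
```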

  43. News Generation

  44. Limitations - Low performance on some NLP tasks; starts to lose coherence over sufficiently long passages - Special difficulty with “common sense physics”, e.g. “If I put cheese in the fridge, will it melt?” - Architectural drawback: it has no bidirectional information and no denoising objectives

  45. Limitations - Poor sample efficiency - Ambiguity in few-shot learning: does it learn the task from scratch? - Difficult inference with such a huge model - Lack of structured knowledge

  46. Fairness and Bias

  47. Fairness and Bias Race

  48. Fairness and Bias Race Religion

  49. Demos

  50. GPT3 : Demos

  51. GPT3 : Interaction with your own AR bot https://twitter.com/i/status/1294380308209508359

  52. GPT3 : Animate Your Maths From English https://twitter.com/i/status/1294652394739912704

  53. GPT3 : Building a Website https://youtu.be/LOhIS7kiKvM

  54. GPT3 : Context Based Dictionary https://twitter.com/i/status/1294631853224206339

  55. GPT3 : Describe Your Design

  56. Weaknesses - Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul) - No saturation yet (Vipul)

  57. Weaknesses - Fails miserably on reasoning tasks, so in essence, GPT-3 is not a very good reasoning module at all (Vipul) - No saturation yet (Vipul) - In zero-shot or one-shot, the choice of words for the task description in in-context learning can introduce variance (Shantanu) - Limited context window of 2048 tokens (Shantanu)

  58. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu)

  59. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu)

  60. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu) - A distilled version of GPT-3 (Shantanu)

  61. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu) - A distilled version of GPT-3 (Shantanu) - Addressing the limited context window of 2048 tokens (Shantanu)

  62. Extensions - A bidirectional model with similar size and experiments (Vipul, Shantanu) - Explainable few-shot learning and analysis to see if GPT-3 is actually learning (Vipul, Shantanu) - A distilled version of GPT-3 (Shantanu) - Addressing the limited context window of 2048 tokens (Shantanu) - Adversarial experiments that carefully tweak the training samples and present the adversarial examples at test time for inference (Vipul)

  63. Thank you

  64. References
      - https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a
      - https://www.youtube.com/watch?v=SY5PvZrJhLE
      - https://jalammar.github.io/how-gpt3-works-visualizations-animations/
      - https://www.youtube.com/watch?v=8psgEDhT1MM&vl=en
      - https://www.youtube.com/watch?v=7qPDwsCLbZc&t=3959s
      - Language Models are Few-Shot Learners (Brown et al.)
      - https://www.youtube.com/watch?v=Mq97CF02sRY
