The Art of Predictive Analytics: More Data, Same Models [STUDY SLIDES]


  1. The Art of Predictive Analytics: More Data, Same Models [STUDY SLIDES] Joseph Turian joseph@metaoptimize.com @turian 2012.02.02 MetaOptimize

  2. NOTE: These are the STUDY slides from my talk at the predictive analytics meetup: http://bit.ly/xVLBuS I have removed some graphics, and added some text. Please email me any questions

  3. Who am I? Engineer with 20 yrs coding experience; PhD; 10 yrs experience in large-scale ML + NLP; founded MetaOptimize

  4. What is MetaOptimize? Consultancy + community on: large-scale ML + NLP; well-engineered solutions

  5. “Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge.” - @aria42 http://metaoptimize.com/qa/

  6. http://metaoptimize.com/qa/

  7. http://metaoptimize.com/qa/

  8. “A lot of expertise in machine learning is simply developing effective biases.” - Dan Melamed (quoted from memory)

  9. What's a good choice of learning rate for the second layer of this neural net on image patches? [intuition] 0.02! (Yoshua Bengio)

  10. Occam's Razor is a great example of ML intuition

  11. “Without the aid of prejudice and custom I should not be able to find my way across the room.” - William Hazlitt

  12. It's fun to be a geek

  13. Be an artist

  14. Be an artist

  15. How to build the world's biggest langid (langcat) model?

  16. + Vowpal Wabbit = Win

  17. How to build the world's biggest langid (langcat) model? SOLVED.
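
The deck names the ingredients but not the code. As a minimal sketch of the recipe the slides imply — hashed character n-gram features fed to a fast online linear learner, which is the Vowpal Wabbit style — here is a toy binary language identifier. The training data, n-gram order, and hash size are illustrative assumptions, not from the talk.

```python
# VW-style langid sketch: hashing trick + online logistic regression.
# Binary (en vs. de) for brevity; real langid would use one-vs-all.
import math

D = 2 ** 18  # size of the hashed feature space (assumption)

def features(text, n=3):
    """Hash character n-grams into a set of sparse feature indices."""
    return {hash(text[i:i + n]) % D for i in range(len(text) - n + 1)}

w = [0.0] * D
lr = 0.1

train = [("the quick brown fox", 1),        # English -> 1 (made-up data)
         ("der schnelle braune fuchs", 0)]  # German  -> 0

for _ in range(20):                          # online SGD passes
    for text, y in train:
        x = features(text)
        p = 1.0 / (1.0 + math.exp(-sum(w[i] for i in x)))
        g = p - y                            # logistic-loss gradient
        for i in x:
            w[i] -= lr * g

def predict(text):
    p = 1.0 / (1.0 + math.exp(-sum(w[i] for i in features(text))))
    return "en" if p > 0.5 else "de"

print(predict("a quick fox"))  # likely 'en': shares n-grams with the English example
```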

  18. The art of predictive analytics: 1) Know the data out there 2) Know the code out there 3) Intuition (bias)

  19. A lot of data with one feature correlated with the label

  20. Twitter sentiment analysis?

  21. “Distant supervision” (Go et al., 09): “Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History http://bit.ly/Eusxi :)” (use emoticons as labels)
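
A sketch of the distant-supervision trick as described on the slide: treat emoticons as noisy labels and strip them from the text so the model can't cheat. The emoticon lists and the extra tweets are made up.

```python
# Emoticons act as noisy "distant" labels (Go et al., 09).
POS = [":)", ":-)", ":D"]
NEG = [":(", ":-("]

def distant_label(tweet):
    """Return (text_without_emoticon, label) or None if unlabelled."""
    for emo, label in [(e, 1) for e in POS] + [(e, 0) for e in NEG]:
        if emo in tweet:
            return tweet.replace(emo, "").strip(), label
    return None

raw = [
    "Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History :)",
    "my flight got cancelled again :(",
    "no emoticon here, so no label",
]
labelled = [pair for t in raw if (pair := distant_label(t))]
print(labelled)  # two (text, label) pairs; emoticons stripped from the text
```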

  22. Recipe: You know a lot about the problem => Smart Priors

  23. You know a lot about the problem: Smart Priors. Yarowsky (1995), WSD (word sense disambiguation): 1) One sense per collocation. 2) One sense per discourse.
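
A toy rendering of the “one sense per discourse” prior: within one document, override per-occurrence guesses with the document-level majority sense. The base classifier and the data are hypothetical stand-ins, not Yarowsky's system.

```python
from collections import Counter

def base_sense(token, context):
    """Stand-in for a per-occurrence WSD classifier (hypothetical)."""
    return "financial" if "money" in context else "river"

def one_sense_per_discourse(occurrences):
    """occurrences: (token, context) pairs from ONE document.
    Snap every occurrence to the document's majority sense."""
    guesses = [base_sense(tok, ctx) for tok, ctx in occurrences]
    majority, _ = Counter(guesses).most_common(1)[0]
    return [majority] * len(guesses)   # Yarowsky-style hard prior

doc = [("bank", "deposited money at the bank"),
       ("bank", "the bank near the bridge"),
       ("bank", "bank fees and money transfers")]
print(one_sense_per_discourse(doc))  # all 'financial'
```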

  24. Recipe: You know a lot about the problem => Create new features

  25. You know a lot about the problem: Create new features Error-analysis

  26. What errors is your model making? DO SOME EXPLORATORY DATA ANALYSIS (EDA)

  27. Andrew Ng: “Advice for applying ML” Where do the errors come from?
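
Ng's advice is diagnostic: before fixing anything, count where the errors come from. A minimal error-bucketing sketch over made-up predictions:

```python
from collections import Counter

# (gold, predicted, bucket) triples -- all illustrative.
results = [("pos", "neg", "short_doc"), ("pos", "pos", "short_doc"),
           ("neg", "pos", "has_url"), ("neg", "neg", "long_doc"),
           ("pos", "neg", "has_url")]

errors = Counter(b for gold, pred, b in results if gold != pred)
total = Counter(b for _, _, b in results)
for bucket in total:
    print(f"{bucket}: {errors[bucket]}/{total[bucket]} errors")
# If most errors land in one bucket (e.g. has_url), engineer features there.
```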

  28. Recipe: You know a little about the problem => Semi-supervised learning

  29. You know a little about the problem: Semi-supervised learning JOINT semi-supervised learning Ando and Zhang (2005) Suzuki and Isozaki (2008) Suzuki et al. (2009), etc. => effective but task-specific

  30. You know a little about the problem: Semi-supervised learning Unsupervised learning, followed by Supervised learning

  31. How can Bob improve his model? [diagram: sup data → supervised training → sup model]

  32. Semi-sup training? [diagram: sup data → supervised training → sup model]

  33. Semi-sup training? [diagram: sup data + more feats → supervised training → sup model]

  34. More features can be used on different tasks [diagram: more feats + sup task 1 data → sup task 1 model; more feats + sup task 2 data → sup task 2 model]

  35. Joint semi-sup (standard semi-sup setup) [diagram: unsup data + sup data → joint semi-sup training → semi-sup model]

  36. Unsupervised, then supervised [diagram: unsup data → unsup pretraining → unsup model; then sup data → semi-sup fine-tuning → semi-sup model]

  37. Use unsupervised learning to create new features [diagram: unsup data → unsup training → unsup feats]

  38. These features can then be shared with other people [diagram: unsup feats + sup data → sup training → semi-sup model]

  39. [diagram: unsup data → unsup training → unsup feats → sup task 1, sup task 2, sup task 3]
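
The diagrams above reduce to one pattern: run unsupervised learning once, emit features, and reuse them across supervised tasks. A toy version using k-means cluster ids as the shared features (scikit-learn assumed available; the data and tasks are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Step 1: unsupervised training on a big unlabelled pool (toy data here).
unsup_X = np.random.RandomState(0).randn(200, 5)
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit(unsup_X)

def unsup_feats(X):
    """Shared feature map: raw features + one-hot cluster id."""
    onehot = np.eye(8)[clusters.predict(X)]
    return np.hstack([X, onehot])

# Step 2: every supervised task reuses the SAME feature map.
rng = np.random.RandomState(1)
for task in ("task1", "task2"):
    X = rng.randn(50, 5)
    y = (X[:, 0] > 0).astype(int)          # illustrative labels
    clf = LogisticRegression().fit(unsup_feats(X), y)
    print(task, "train acc:", clf.score(unsup_feats(X), y))
```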

  40. Recipe: You know almost nothing about the problem => Build cool generic features

  41. Know almost nothing about problem: Build cool generic features Word features (Turian et al., 2010) http://metaoptimize.com/projects/wordreprs/

  42. Brown clustering (Brown et al. 92): cluster(chairman) = ‘0010’; 2-prefix(cluster(chairman)) = ‘00’ (image from Terry Koo)
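
The prefix trick in code: emit features at several granularities from the cluster bit string. The cluster table here is hypothetical; real bit strings come from running Brown clustering on a large corpus.

```python
# Hypothetical Brown-cluster bit strings (real ones come from Brown
# clustering over a large unlabelled corpus).
cluster = {"chairman": "0010", "president": "0011", "run": "1101"}

def brown_features(word, prefixes=(2, 4)):
    """Indicator features from cluster-bit-string prefixes."""
    bits = cluster.get(word)
    if bits is None:
        return []
    return [f"brown{p}={bits[:p]}" for p in prefixes if len(bits) >= p]

print(brown_features("chairman"))   # ['brown2=00', 'brown4=0010']
print(brown_features("president"))  # shares the coarse 'brown2=00' feature
```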

  43. 50-dim embeddings: Collobert + Weston (2008); t-SNE visualization by van der Maaten + Hinton (2008)

  44. Know almost nothing about problem: Build cool generic features. Document features: document clustering, LSA/LDA, deep models

  45. Document features Salakhutdinov + Hinton 06

  46. Document features example Domain adaptation for sentiment analysis (Glorot et al. 11)

  47. Recipe: You know a little about the problem => Make more REAL training examples

  48. Make more real training examples, ’cuz you have some time or a small budget: Amazon Mechanical Turk

  49. Snow et al. 08, “Cheap and Fast – But is it Good?”: 1K Turk labels per dollar; average over (5) Turks to reduce noise => http://crowdflower.com/
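
Averaging over Turks, sketched as a plain majority vote per item. The labels are made up, and Snow et al. also weight annotators by reliability, which this omits.

```python
from collections import Counter

def aggregate(labels_per_item):
    """Majority vote over ~5 Turk labels per item (Snow et al. 08 style)."""
    return {item: Counter(labs).most_common(1)[0][0]
            for item, labs in labels_per_item.items()}

turk = {"tweet1": ["pos", "pos", "neg", "pos", "pos"],
        "tweet2": ["neg", "neg", "neg", "pos", "neg"]}
print(aggregate(turk))  # {'tweet1': 'pos', 'tweet2': 'neg'}
```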

  50. Soylent (Bernstein et al. 10). Find-Fix-Verify: crowd control design pattern [diagram: find a problem → fix each problem → verify quality of each fix, applied to the text “Soylent, a prototype...”]

  51. Make more real training examples Active learning

  52. Dualist (Settles 11) http://code.google.com/p/dualist/

  53. Dualist (Settles 11) http://code.google.com/p/dualist/ Applications: Document categorization WSD Information Extraction Twitter sentiment analysis
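
Dualist pairs active learning with semi-supervised learning; the active half is typically uncertainty sampling, sketched below with a stub model. This is not Settles' actual code — just the selection rule.

```python
def model_proba(x):
    """Stub classifier confidence P(y=1|x); hypothetical values."""
    return {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.10}[x]

def most_uncertain(pool, k=1):
    """Pick the k unlabelled items closest to the decision boundary."""
    return sorted(pool, key=lambda x: abs(model_proba(x) - 0.5))[:k]

print(most_uncertain(["doc_a", "doc_b", "doc_c"]))  # ['doc_b'] -> ask a human
```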

  54. You know a little about the problem: Make more training examples => FAKE training examples

  55. NOISE

  56. FAKE training examples: denoising autoencoder (AA), RBM

  57. MNIST distortions (LeCun et al. 98)
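
The distortion/denoising idea in code: corrupt real inputs to mint extra (fake) training pairs. The masking-noise model below is one simple assumption; LeCun et al. used affine and elastic image distortions, and denoising autoencoders train to undo exactly this kind of corruption.

```python
import random

def corrupt(pixels, p=0.1, rng=random.Random(0)):
    """Zero out a random fraction of inputs (masking noise)."""
    return [0.0 if rng.random() < p else v for v in pixels]

real = [0.0, 0.2, 0.9, 0.7, 0.1]
fake_examples = [(corrupt(real), real) for _ in range(3)]
# Train a denoiser/classifier on (corrupted, original-or-label) pairs:
for x, y in fake_examples:
    print(x, "->", y)
```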

  58. No negative examples?

  59. FAKE training examples Multi-view / multi-modal

  60. Multi-view / multi-modal How do you evaluate an IR system, if you have no labels? See how good the title is at retrieving the body text.
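
That evaluation trick, sketched: use each title as a query against all bodies and check how often the matching body ranks first — the title–body pairing supplies free labels. Word-overlap scoring here stands in for a real TF-IDF/BM25 ranker, and the documents are made up.

```python
def score(query, doc):
    """Toy relevance: word overlap (a real system would use TF-IDF/BM25)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

titles = ["Harry Potter film history", "flight cancellations rise"]
bodies = ["the film marks its place in history for Harry Potter fans",
          "airlines cancelled more flights this winter as storms rise"]

hits = sum(max(range(len(bodies)), key=lambda j: score(t, bodies[j])) == i
           for i, t in enumerate(titles))
print(f"precision@1 = {hits}/{len(titles)}")  # free labels: title i <-> body i
```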

  61. 2) KNOW THE DATA

  62. Know the data. Labelled/structured data: ODP, Freebase, Wikipedia, DBpedia, etc.

  63. Know the data. Unlabelled data: WaCky, ClueWeb09, CommonCrawl, Ngram corpora

  64. Ngrams: Google, Bing, Google Books. Roll your own: Common Crawl

  65. Know the data Do something stupid on a lot of data

  66. Do something stupid on a lot of data: Ngrams. Spell-checking, phrase segmentation, word breaking, synonyms, language models. See “An Overview of Microsoft Web N-gram Corpus and Applications” (Wang et al 10)
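
Word breaking is a good example of “stupid on a lot of data”: with unigram counts alone you can segment run-together text by dynamic programming. A Norvig-style sketch over a tiny hypothetical count table (real counts would come from web-scale n-gram corpora):

```python
from functools import lru_cache

# Tiny stand-in for web-scale unigram counts (Google/Bing/Books).
counts = {"choose": 50, "spain": 40, "chooses": 5, "pain": 30}
TOTAL = 1_000_000

def pword(w):
    return counts.get(w, 0.0) / TOTAL or 1e-12  # crude smoothing for unseen words

@lru_cache(maxsize=None)
def segment(text):
    """Best split of 'text' into words, by product of unigram probabilities."""
    best = ((text,), pword(text))
    for i in range(1, len(text)):
        rest, p_rest = segment(text[i:])
        cand = ((text[:i],) + rest, pword(text[:i]) * p_rest)
        if cand[1] > best[1]:
            best = cand
    return best

print(segment("choosespain")[0])  # ('choose', 'spain'), not ('chooses', 'pain')
```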

  67. Do something stupid on a lot of data Web-scale k-means for NER (Lin and Wu 09)

  68. Do something stupid on a lot of data Web-scale clustering

  69. Know the data Multi-modal learning

  70. Multi-modal learning: images and captions [diagram: image features + caption features, e.g. caption “facepalm”]

  71. Multi-modal learning: titles and article body [diagram: title features + article-body features]

  72. Multi-modal learning: audio and tags [diagram: audio features + tag features, e.g. tags “upbeat”, “hip hop”]

  73. 3) IT'S MODELS ALL THE WAY DOWN

  74. Break down a pipeline: 1-best (greedy), k-best, Finkel et al. 06

  75. Good code to build on Stanford NLP tools, clustering algorithms, Terry Koo's parser, etc.

  76. Good code to build on YOUR MODEL

  77. Eat your own dogfood: bootstrapping (Yarowsky 95); co-training (Blum+Mitchell 98); EM (Nigam et al., 00); self-training (McClosky et al., 06)

  78. Dualist (Settles '11): Active learning + semi-sup learning

  79. Eat your own dogfood. Cheap bootstrapping: one step of EM (Settles 11). “Awesome! What a great movie!”
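
Cheap bootstrapping, sketched: train on the seed labels, hard-label the unlabelled pool with the model's confident guesses, retrain once (one EM-ish step). The “classifier” is a deliberately dumb word-vote stub, not Settles' model; the texts are made up.

```python
def train(examples):
    """Stub 'classifier': remember which words appeared with which label."""
    assoc = {}
    for text, y in examples:
        for w in text.split():
            assoc.setdefault(w, []).append(y)
    return assoc

def predict(model, text, threshold=0.99):
    votes = [y for w in text.split() for y in model.get(w, [])]
    if not votes:
        return None
    p = sum(votes) / len(votes)
    if p >= threshold: return 1      # keep only confident guesses:
    if p <= 1 - threshold: return 0  # low recall, high precision
    return None

seed = [("awesome great movie", 1), ("terrible boring plot", 0)]
pool = ["what a great movie", "so boring", "a movie"]

model = train(seed)
new = [(t, y) for t in pool if (y := predict(model, t)) is not None]
model = train(seed + new)            # one step of "EM"
print(new)  # note 'a movie' gets pseudo-labelled from one weak vote:
            # bootstrapping can drift, hence the high-precision threshold
```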

  80. It's models all the way down Use models to annotate Low recall + high precision + lots of data = win

  81. Use models to annotate Face modeling

  82. Pose-invariant face features

  83. Pose-invariant face features

  84. It's models all the way down THE FUTURE? Joins on large noisy data sets

  85. Joins on large noisy data sets ReVerb (Fader et al., 11) http://reverb.cs.washington.edu Extractions over entire ClueWeb09 (826 MB compressed)

  86. ReVerb (Fader et al., 11)

  87. Joins on noisy data sets (can we clean up the data?) [diagram: ⋈ join of two noisy extraction tables]
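
What such a join might look like: normalize entity strings, then join two ReVerb-style (arg1, relation, arg2) tables on the shared argument. The tuples and the crude normalizer are illustrative, not from ReVerb's output.

```python
def norm(entity):
    """Crude canonicalization; real noisy joins need much more than this."""
    return entity.lower().strip().rstrip(".").replace("the ", "")

# Two ReVerb-style (arg1, relation, arg2) extraction tables (made up).
born_in = [("Barack Obama", "was born in", "Honolulu"),
           ("albert einstein", "was born in", "Ulm.")]
located = [("honolulu", "is located in", "Hawaii"),
           ("ulm", "is located in", "Germany")]

index = {norm(a1): a2 for a1, rel, a2 in located}
joined = [(person, place, index[norm(place)])
          for person, rel, place in born_in if norm(place) in index]
print(joined)  # [('Barack Obama', 'Honolulu', 'Hawaii'), ('albert einstein', 'Ulm.', 'Germany')]
```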

  88. The art of predictive analytics: 1) Know the data out there 2) Know the code out there 3) Intuition (bias)

  89. Summary of recipes: Know your problem; throw in good features; use others' good models in yr pipeline; make more training examples; use a lot of data

  90. "It especially annoys me when racists are accused of 'discrimination.' The ability to discriminate is a precious facility; by judging all members of one 'race' to be the same, the racist precisely shows himself incapable of discrimination." - Christopher Hitchens (RIP)

  91. Other cool research to look at: * Frustratingly easy domain adaptation (Daume 07) * The Unreasonable Effectiveness of Data (Halevy et al 09) * Web-scale algorithms (search on http://metaoptimize.com/qa/) * Self-taught learning (Raina et al 07)

  92. Please email me any questions Joseph Turian joseph@metaoptimize.com @turian http://metaoptimize.com/qa/ 2012.02.02
