  1. New Developments in Large Data that have Immediate Application in Industry (but you haven’t heard of yet) Joseph Turian @turian #strataconf MetaOptimize

  2. perhaps you should close your laptops

  3. How do you get a competitive advantage with data?

  4. How do you get a competitive advantage with data? • More data

  5. How do you get a competitive advantage with data? • More data • Better algorithms

  6. When big data gives diminishing returns, you need better algorithms

  7. When big data gives diminishing returns, you need better algorithms @turian #strataconf

  8. When should you use better algorithms?

  9. When should you use better algorithms? • If they are really cool algorithms

  10. When should you use better algorithms? • If they are really cool algorithms

  11. When should you use better algorithms? • If they are really cool algorithms • If you have a lot of time on your hands

  12. When should you use better algorithms? • If they are really cool algorithms • If you have a lot of time on your hands

  13. Only use better algorithms if they will qualitatively improve your product

  14. Only use better algorithms if they will qualitatively improve your product @turian #strataconf

  15. Who am I?

  16. Who am I? • Engineer with 20 years of coding experience • Ph.D., 10 years of experience in large-scale ML + NLP

  17. What is MetaOptimize?

  18. What is MetaOptimize? optimizing the process of

  19. What is MetaOptimize? optimizing the process of optimizing the process of

  20. What is MetaOptimize? optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process

  21. What is MetaOptimize? • Consultancy on: • Large scale ML + NLP • Well-engineered solutions

  22. “Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge.” - @aria42 http://metaoptimize.com/qa/

  23. Outline • Deep Learning – Semantic Hashing • Graph parallelism • Unsupervised semantic parsing

  24. Outline • Deep Learning – Semantic Hashing • Graph parallelism • Unsupervised semantic parsing

  25. Opportunity with Deep Learning • Machine learning that’s – Large-scale (>1B examples) – Can use all sorts of data – General purpose – Highly accurate

  26. Deep Learning

  27. Deep Learning • Artificial intelligence???

  28. Natural Intelligence

  29. Natural Intelligence Works!

  30. Artificial Intelligence

  31. Artificial Intelligence • Still far from the goal! • Why?

  32. Where does intelligence come from?

  33. Intelligence comes from knowledge

  34. How can a machine get knowledge? Human input

  35. NO!

  36. Intelligence comes from knowledge. Knowledge comes from learning.

  37. Intelligence comes from knowledge. Knowledge comes from learning. @turian #strataconf

  38. Statistical Learning • New multi-disciplinary field • Numerous applications

  39. Memorize? or Generalize? • Memorize: mathematically easy for machines, but harder for humans • Generalize: mathematically fundamentally difficult, but easier for humans

  40. How do we build a learning machine?

  41. Deep learning architecture (bottom to top) • Input: raw pixels • Primitive features: edges • Abstract features: shapes • Highest-level features: faces • Output: is this Bob?

  42. Shallow learning architecture

  43. Why deep architectures?

  44. “Deep” computer program: main calls sub1, sub2, sub3; each sub calls subsub1, subsub2, subsub3; each subsub calls subsubsub1, subsubsub2, subsubsub3 (a hierarchy of small, reusable subroutines)

  45. “Shallow” computer program: main includes subroutine1 and subroutine2, each of which copies the subsub1, subsub2, subsub3 and subsubsub code inline, so the same code is duplicated instead of reused
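To make the analogy concrete, here is a small illustrative Python sketch (mine, not from the deck): the “deep” version factors work into small routines that are reused and composed, while the “shallow” version inlines everything and duplicates the same logic.

```python
# "Deep" program: small routines, written once, composed hierarchically.
def normalize(xs):
    total = sum(xs) or 1
    return [x / total for x in xs]

def word_counts(doc):
    counts = {}
    for w in doc.split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def doc_vector(doc, vocab):
    # Reuses the two routines above.
    counts = word_counts(doc)
    return normalize([counts.get(w, 0) for w in vocab])

# "Shallow" program: the same computation with every step copied inline.
def doc_vector_shallow(doc, vocab):
    counts = {}
    for w in doc.split():
        counts[w] = counts.get(w, 0) + 1
    raw = [counts.get(w, 0) for w in vocab]
    total = sum(raw) or 1
    return [x / total for x in raw]
```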

  46. “Deep” circuit

  47. “Shallow” circuit: inputs 1 … n feed one very wide layer (up to 2^n units) that feeds the output

  48. Insufficient depth • Sufficient depth = compact representation (roughly O(n) units) • Insufficient depth = may require an exponential-size architecture (on the order of 2^n units)
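A classic illustration of this (my example, not from the slides) is the n-bit parity function: a “deep” circuit is a chain of n-1 XOR gates, while a two-level “shallow” AND/OR circuit needs on the order of 2^(n-1) terms.

```python
from functools import reduce
from itertools import product
from operator import xor

def parity_deep(bits):
    """'Deep' computation: a chain of n-1 XOR gates, O(n) size."""
    return reduce(xor, bits, 0)

def parity_shallow(bits):
    """'Shallow' two-level circuit: an OR over one AND-term per
    odd-parity input pattern -- about 2**(n-1) terms for n inputs."""
    n = len(bits)
    odd_patterns = [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]
    return int(any(tuple(bits) == p for p in odd_patterns))

# Both agree on every input, but the shallow "circuit" blows up with n.
for x in product([0, 1], repeat=4):
    assert parity_deep(x) == parity_shallow(x)
```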

  49. What’s wrong with a fat architecture?

  50. Overfitting! bad generalization

  51. Occam’s Razor

  52. Other motivations for deep architectures?

  53. Learning Brains • 10^11 neurons, 10^14 synapses • Complex neural network • Learning: modify synapses

  54. Visual System

  55. Deep Architecture in the Brain • Retina: pixels • Area V1: edge detectors • Area V2: primitive shape detectors • Area V4: higher-level visual abstractions

  56. Deep architectures are Awesome!!! • Because they’re compact but…

  57. Why not deep architectures? • How do we train them?

  58. Before 2006 Failure of deep architectures

  59. Mid 2006 Breakthrough!

  60. Signal-to-noise ratio • More signal!

  61. Deep training tricks • Unsupervised learning

  62. Deep training tricks • Create one layer of features at a time
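A minimal numpy sketch of this stacking pattern (my illustration, assuming a plain tied-weight sigmoid autoencoder rather than the RBMs or denoising autoencoders used in the breakthrough papers): train one layer to reconstruct its input, then train the next layer on the first layer's hidden representation, and so on.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(X, n_hidden, lr=0.1, epochs=20, seed=0):
    """Train one tied-weight sigmoid autoencoder layer with plain SGD.
    Returns encoder parameters (W, b) used to build the next layer's input."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(n_in)       # decoder bias
    for _ in range(epochs):
        for x in X:
            h = sigmoid(x @ W + b)                     # encode
            x_hat = sigmoid(h @ W.T + c)               # decode (tied weights)
            d_out = (x_hat - x) * x_hat * (1 - x_hat)  # squared-error gradient
            d_hid = (d_out @ W) * h * (1 - h)
            W -= lr * (np.outer(x, d_hid) + np.outer(d_out, h))
            b -= lr * d_hid
            c -= lr * d_out
    return W, b

def greedy_layerwise_pretrain(X, layer_sizes):
    """Create one layer of features at a time: each new layer is trained,
    unsupervised, on the representation produced by the layers below it."""
    reps, params = X, []
    for n_hidden in layer_sizes:
        W, b = train_autoencoder_layer(reps, n_hidden)
        params.append((W, b))
        reps = sigmoid(reps @ W + b)   # features that feed the next layer
    return params
```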

  63. Bengio (Montréal) • Hinton (Toronto) • Le Cun (New York)

  64. Bengio (Montréal) (I did my postdoc here) • Hinton (Toronto) • Le Cun (New York)

  65. Deep learning a success! Since 2006, deep learning breaks records in: • handwritten character recognition • a component of the winning Netflix entry • language modeling. Interest in deep learning: • NSF and DARPA

  66. Opportunity with Deep Learning • Machine learning that’s – Large-scale (>1B examples) – Can use all sorts of data – General purpose – Highly accurate

  67. Outline • Deep Learning – Semantic Hashing • Graph parallelism • Unsupervised semantic parsing

  68. Opportunity with Semantic Hashing • Fast semantic search

  69. What’s wrong with keyword search?

  70. Keyword search • Search for tweets on “Hadoop”

  71. Keyword search • Search for tweets on “Hadoop” • Misses the following tweets: – “Just started using HBase” – “I really like Amazon Elastic MapReduce”

  72. What’s wrong with keyword search?

  73. What’s wrong with keyword search? Misses relevant results!

  74. Standard search: Inverted Index
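For reference, a toy inverted index in Python (illustrative only; real indexes add tokenization, ranking, compression, and so on). Note how a keyword query for “hadoop” finds only the document that contains that literal term:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def keyword_search(index, query):
    """Return doc ids containing every query term (boolean AND)."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = ["Just started using HBase",
        "I really like Amazon Elastic MapReduce",
        "Tuning our Hadoop cluster"]
index = build_inverted_index(docs)
print(keyword_search(index, "hadoop"))   # {2} -- misses the other two tweets
```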

  75. Hashing • Another technique for search

  76. Hashing • FAST!

  77. Hashing • Compact! • Without hashing: – Billions of images => 40 TB • With 64-bit hashing: – Billions of images => 8GB

  78. “Dumb” hashing • Typically no learning, not data-driven • Examples: – Random Projections – Count-Min Sketch – Bloom filters – Locality Sensitive Hashing
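As one concrete example of “dumb” (data-independent) hashing, here is a sketch of random-projection locality-sensitive hashing (my illustration, using numpy): similar vectors tend to land on the same side of each random hyperplane, so they get bit codes with small Hamming distance, and nothing is ever learned from the data.

```python
import numpy as np

class RandomProjectionHash:
    """LSH via random hyperplanes: bit i is the sign of the dot product
    with random direction i. The projections are drawn once and never
    look at the data -- no learning involved."""
    def __init__(self, dim, n_bits=64, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))

    def hash(self, x):
        bits = (self.planes @ x) >= 0
        return int("".join("1" if b else "0" for b in bits), 2)

hasher = RandomProjectionHash(dim=300, n_bits=64)
# Similar vectors agree on most bits, i.e. have small Hamming distance.
```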

  79. “Smart” Hashing • As fast as “dumb” hashing • Data-driven • Examples: – Semantic Hashing (2007) – Kulis (2009) – Kumar, Wang, Chang (2010) – Etc.

  80. Semantic Hashing = ??

  81. Semantic Hashing = Smart hashing + deep learning Salakhutdinov + Hinton (2007)

  82. Semantic Hashing architecture

  83. Semantic Hashing architecture (diagram; related representations labelled: TF*IDF, LSA/LSI, LDA)
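A rough sketch of the idea (not Salakhutdinov and Hinton's actual model, which pretrains a stacked RBM autoencoder): push each document's word-count vector through a deep encoder whose narrow top layer is thresholded into a short binary code, so that semantically similar documents receive nearby codes.

```python
import numpy as np

def semantic_hash(doc_vector, encoder_layers, n_bits=30):
    """Encode a document vector with a (pre-trained) deep encoder and
    threshold the narrow code layer into an n_bits binary code.
    `encoder_layers` is a hypothetical list of (W, b) pairs, assumed to
    come from a deep autoencoder trained on a document corpus."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = doc_vector
    for W, b in encoder_layers:
        h = sigmoid(h @ W + b)
    assert h.shape[0] == n_bits            # the narrow code layer
    bits = h >= 0.5                        # binarize the code
    return int("".join("1" if b else "0" for b in bits), 2)
```

Documents whose codes differ in only a few bits are treated as semantically similar, which is what makes hash-table-style lookup possible.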

  84. Opportunity with Semantic Hashing Semantic search that is: • General purpose • Fast • Compact

  85. Opportunity with Semantic Hashing Semantic search that is: • General purpose – Search text, images, videos, audio, etc. • Fast • Compact

  86. Opportunity with Semantic Hashing Semantic search that is: • General purpose • Fast – Indexing: few weeks for 1B docs, using 100 cores – Retrieval: 3.6 ms for 1 million docs, scales sublinearly • Compact
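The reason retrieval is so fast: with short codes you can probe the query's code and every code within a small Hamming distance directly in a hash table, instead of scanning the collection. A toy sketch of that lookup (my illustration):

```python
from collections import defaultdict
from itertools import combinations

def build_code_table(codes):
    """codes: dict of doc_id -> integer hash code."""
    table = defaultdict(list)
    for doc_id, code in codes.items():
        table[code].append(doc_id)
    return table

def search(table, query_code, n_bits=30, radius=2):
    """Return docs whose code is within `radius` bit flips of the query:
    probe every code in the small Hamming ball around it."""
    hits = list(table.get(query_code, []))
    for r in range(1, radius + 1):
        for positions in combinations(range(n_bits), r):
            probe = query_code
            for p in positions:
                probe ^= 1 << p          # flip bit p
            hits.extend(table.get(probe, []))
    return hits
```

For 30-bit codes and radius 2 this is only 466 hash-table probes, no matter how many documents are indexed.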

  87. Opportunity with Semantic Hashing Semantic search that is: • General purpose • Fast • Compact – 1B docs, 30-bit hashes => 4GB – 1B images, 64-bit hashes => 8GB (vs. 40 TB naïve)
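The storage arithmetic behind those figures, as a quick sanity check (the roughly 40 KB-per-image raw size is the figure implied by the 40 TB number):

```python
N = 1_000_000_000                      # one billion items
print(N * 30 / 8 / 1e9, "GB")          # 30-bit codes  -> 3.75 GB (~4 GB)
print(N * 64 / 8 / 1e9, "GB")          # 64-bit codes  -> 8.0 GB
print(N * 40_000 / 1e12, "TB")         # ~40 KB/image  -> 40 TB without hashing
```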

  88. Prediction Smart hashing will revolutionize search

  89. Prediction Smart hashing will revolutionize search @turian #strataconf

  90. Outline • Deep Learning – Semantic Hashing • Graph parallelism • Unsupervised semantic parsing

  91. The rise of Graph stores • Neo4J, HyperGraphDB, InfiniteGraph, InfoGrid, AllegroGraph, sones, DEX, FlockDB, OrientDB, VertexDB

  92. Opportunity with graph-based parallelism • Scale sophisticated ML algorithms • Larger data sets • Higher accuracy

  93. Useful machine learning algorithms • Gibbs sampling • Matrix factorization • EM • Lasso • Etc. Have graph-like data dependencies
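To see what “graph-like data dependencies” means, consider a toy Gibbs sampler for an Ising-style model (my illustration): each variable is resampled conditioned only on its neighbours in the graph, so the natural unit of work is a vertex plus its neighbourhood rather than an independent map task.

```python
import math
import random

def gibbs_sweep(spins, neighbors, coupling=1.0):
    """One Gibbs sweep over a graph of +/-1 spins.
    spins: dict node -> +1 or -1; neighbors: dict node -> list of nodes.
    Every update reads the current values of the node's neighbours:
    a graph-shaped dependency that Map-Reduce expresses awkwardly."""
    for node in spins:
        field = coupling * sum(spins[n] for n in neighbors[node])
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field))
        spins[node] = 1 if random.random() < p_up else -1
    return spins
```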

  94. Machine learning in Map-Reduce

  95. Machine learning in Map-Reduce

  96. Machine learning in Map-Reduce: “Map-Abuse” (Carlos Guestrin)

  97. Many ML algorithms have too many graph-like dependencies to fit Map-Reduce cleanly

  98. Parallel abstractions for graph operations • Pregel (Malewicz et al, 2009, 2010) – Erlang implementation called Phoebus • GraphLab (Low et al, 2010) – Source code available
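For a flavour of what these abstractions look like, here is a Pregel-style vertex program for PageRank in plain Python (a sketch of the programming model only, not either system's real API): you write per-vertex logic, messages flow along edges, and the framework worries about distribution.

```python
def pagerank_superstep(vertices, incoming, damping=0.85):
    """One Pregel-style superstep.
    vertices: dict id -> {"rank": float, "out": [destination ids]}
    incoming: dict id -> list of rank contributions received last superstep.
    Returns the messages to deliver in the next superstep."""
    n = len(vertices)
    outgoing = {vid: [] for vid in vertices}
    for vid, v in vertices.items():
        # Vertex update: fold the incoming messages into a new rank.
        v["rank"] = (1 - damping) / n + damping * sum(incoming.get(vid, []))
        # Send an equal share of the new rank along every outgoing edge.
        if v["out"]:
            share = v["rank"] / len(v["out"])
            for dst in v["out"]:
                outgoing[dst].append(share)
    return outgoing
```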
