New Developments in Large Data that have Immediate Application in Industry (but you haven’t heard of yet)

Joseph Turian (@turian, #strataconf), MetaOptimize


SLIDE 1

New Developments in Large Data that have Immediate Application in Industry

(but you haven’t heard of yet) Joseph Turian @turian MetaOptimize

#strataconf

SLIDE 2

perhaps you should close your laptops

SLIDE 3

How do you get a competitive advantage with data?

SLIDE 4

How do you get a competitive advantage with data?

  • More data
SLIDE 5

How do you get a competitive advantage with data?

  • More data
  • Better algorithms
SLIDE 6

When big data gives diminishing returns, you need better algorithms

SLIDE 7

When big data gives diminishing returns, you need better algorithms

@turian #strataconf

SLIDE 8

When should you use better algorithms?

SLIDE 9

When should you use better algorithms?

  • If they are really cool algorithms
SLIDE 11

When should you use better algorithms?

  • If they are really cool algorithms
  • If you have a lot of time on your hands
SLIDE 13

Only use better algorithms if they will qualitatively improve your product

SLIDE 14

Only use better algorithms if they will qualitatively improve your product

@turian #strataconf

SLIDE 15

Who am I?

SLIDE 16

Who am I?

  • Engineer with 20 years coding experience
  • Ph.D., 10 years’ experience in large-scale ML + NLP
SLIDE 17

What is MetaOptimize?

SLIDE 18

What is MetaOptimize?

  • Optimizing the process of…
SLIDE 19

What is MetaOptimize?

  • Optimizing the process of optimizing the process of…
SLIDE 20

What is MetaOptimize?

  • Optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process of optimizing the process of…
SLIDE 21

What is MetaOptimize?

  • Consultancy on:
  • Large scale ML + NLP
  • Well-engineered solutions
SLIDE 22

http://metaoptimize.com/qa/

“Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge.” - @aria42

SLIDE 23

Outline

  • Deep Learning

– Semantic Hashing

  • Graph parallelism
  • Unsupervised semantic parsing
SLIDE 25

Opportunity with Deep Learning

  • Machine learning that’s

– Large-scale (>1B examples) – Can use all sorts of data – General purpose – Highly accurate

SLIDE 26

Deep Learning

SLIDE 27

Deep Learning

  • Artificial intelligence???
SLIDE 28

Natural Intelligence

SLIDE 29

Natural Intelligence

Works!

SLIDE 30

Artificial Intelligence

SLIDE 31

Artificial Intelligence

  • Still far from the goal!
  • Why?
SLIDE 32

Where does intelligence come from?

SLIDE 33

Intelligence comes from knowledge

SLIDE 34

How can a machine get knowledge?

Human input

SLIDE 35

NO!

SLIDE 36

Intelligence comes from knowledge. Knowledge comes from learning.

SLIDE 37

Intelligence comes from knowledge. Knowledge comes from learning.

@turian #strataconf

SLIDE 38

Statistical Learning

  • New multidisciplinary field
  • Numerous applications

SLIDE 39
Memorize? Generalize?

  • Memorize: easy for machines, harder for humans
  • Generalize: easier for humans; mathematically, fundamentally difficult
SLIDE 40

How do we build a learning machine?

SLIDE 41

Deep learning architecture

[Diagram, bottom to top: Input: raw pixels -> Primitive features: edges -> Abstract features: shapes -> Highest-level features: faces -> Output: is Bob?]

SLIDE 42

Shallow learning architecture

[Diagram: input and output connected through a single wide layer]

SLIDE 43

Why deep architectures?

SLIDE 44

[Diagram: call tree: main -> sub1, sub2, sub3 -> subsub1..3 -> subsubsub1..3]

“Deep” computer program

SLIDE 45

[Diagram: main inlines subroutine1 (subsub1, subsub2, and subsubsub1 code) and subroutine2 (subsub2, subsub3, and subsubsub3 code), …]

“Shallow” computer program

SLIDE 46

“Deep” circuit

SLIDE 47

“Shallow” circuit

[Diagram: n inputs feeding a single layer of 2^n units, then one output]
SLIDE 48

Insufficient Depth

Insufficient depth = may require exponential-size architecture
Sufficient depth = compact representation

[Diagram: shallow circuit with 2^n units vs. deep circuit with O(n) units]
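The classic example behind these size claims is parity: a deep circuit computes the parity of n bits with a chain of n-1 XOR gates, while a depth-2 (shallow) formula needs one AND term per odd-parity input pattern, i.e. 2^(n-1) terms. A quick count, with the circuit model simplified to gate/term totals (this illustration is not from the slides):

```python
def deep_parity_gates(n):
    """Deep circuit: chain XORs, one gate per extra input -> O(n) gates."""
    return n - 1

def shallow_parity_terms(n):
    """Depth-2 DNF for parity: one AND term per odd-parity assignment -> 2^(n-1)."""
    return sum(1 for bits in range(2 ** n) if bin(bits).count("1") % 2 == 1)

# (deep gates, shallow terms) for growing input sizes
sizes = {n: (deep_parity_gates(n), shallow_parity_terms(n)) for n in (2, 4, 8, 16)}
```

The deep count grows linearly while the shallow count doubles with every added input, which is exactly the "sufficient depth = compact representation" point.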

SLIDE 49

What’s wrong with a fat architecture?

SLIDE 50

bad generalization

Overfitting!

SLIDE 51

Occam’s Razor

SLIDE 52

Other motivations for deep architectures?

SLIDE 53

Learning Brains

  • 10^11 neurons, 10^14 synapses
  • Complex neural network
  • Learning: modify synapses

SLIDE 54

Visual System

SLIDE 55

Deep Architecture in the Brain

Retina (pixels) -> Area V1 (edge detectors) -> Area V2 (primitive shape detectors) -> Area V4 (higher-level visual abstractions)

SLIDE 56

Deep architectures are Awesome!!!

  • Because they’re compact

but…

SLIDE 57

Why not deep architectures?

  • How do we train them?
SLIDE 58

Before 2006

Failure of deep architectures

SLIDE 59

Breakthrough! Mid 2006

SLIDE 60

Signal-to-noise ratio

  • More signal!
SLIDE 61

Deep training tricks

  • Unsupervised learning
SLIDE 62

Deep training tricks

  • Create one layer of features at a time
SLIDE 63

Montréal (Bengio), Toronto (Hinton), New York (Le Cun)

SLIDE 64

Montréal (Bengio), Toronto (Hinton), New York (Le Cun)

(I did my postdoc here)

SLIDE 65

Deep learning a success!

Since 2006 Deep learning breaks records in:

  • Handwritten character recognition
  • Component of the winning Netflix Prize entry
  • Language modeling

Interest in deep learning:

  • NSF and DARPA
SLIDE 66

Opportunity with Deep Learning

  • Machine learning that’s

– Large-scale (>1B examples) – Can use all sorts of data – General purpose – Highly accurate

SLIDE 67

Outline

  • Deep Learning

– Semantic Hashing

  • Graph parallelism
  • Unsupervised semantic parsing
SLIDE 68

Opportunity with Semantic Hashing

  • Fast semantic search
SLIDE 69

What’s wrong with keyword search?

SLIDE 70

Keyword search

  • Search for tweets on “Hadoop”
SLIDE 71

Keyword search

  • Search for tweets on “Hadoop”
  • Misses the following tweets:

– “Just started using HBase” – “I really like Amazon Elastic Map-Reduce”

SLIDE 72

What’s wrong with keyword search?

SLIDE 73

What’s wrong with keyword search?

Misses relevant results!

SLIDE 74

Standard search: Inverted Index
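An inverted index maps each term to the set of documents containing it, which is also why plain keyword search misses the HBase and Elastic MapReduce tweets from the earlier slides. A minimal sketch (the tweets and ids are invented for illustration):

```python
from collections import defaultdict

# Toy tweet collection (invented): only doc 3 literally contains "hadoop".
tweets = {
    1: "just started using hbase",
    2: "i really like amazon elastic map-reduce",
    3: "scaling our hadoop cluster today",
}

# Build the inverted index: term -> set of doc ids containing that term.
index = defaultdict(set)
for doc_id, text in tweets.items():
    for term in text.split():
        index[term].add(doc_id)

# Keyword search is a single lookup -- fast, but literal:
results = index.get("hadoop", set())   # misses docs 1 and 2, though relevant
```

The lookup itself is the strength of the approach; the miss on semantically related documents is the weakness the following slides address.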

SLIDE 75

Hashing

  • Another technique for search
SLIDE 76

Hashing

  • FAST!
SLIDE 77

Hashing

  • Compact!
  • Without hashing:

– Billions of images => 40 TB

  • With 64-bit hashing:

– Billions of images => 8GB

SLIDE 78

“Dumb” hashing

  • Typically no learning, not data-driven
  • Examples:

– Random Projections – Count-Min Sketch – Bloom filters – Locality Sensitive Hashing
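As a concrete instance of "dumb" hashing, sign-random-projection LSH can be sketched in a few lines of NumPy; the vectors and seed below are made up for illustration:

```python
import numpy as np

def random_projection_hash(vectors, n_bits=64, seed=0):
    """Sign-random-projection LSH: each bit is the sign of a dot product
    with a random hyperplane. No learning involved -- "dumb" hashing."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(vectors.shape[1], n_bits))   # random hyperplanes
    bits = (vectors @ planes) >= 0                         # (n, n_bits) booleans
    # Pack each row of bits into one integer code.
    return [int("".join("1" if b else "0" for b in row), 2) for row in bits]

def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

# Two similar vectors and one dissimilar vector (invented toy data).
v = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 2.9],
              [-3.0, 0.5, -2.0]])
codes = random_projection_hash(v, n_bits=64)
```

Nearby vectors land on the same side of most random hyperplanes, so their codes differ in few bits; the "smart" schemes on the next slide learn the hyperplanes from data instead of drawing them at random.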

SLIDE 79

“Smart” Hashing

  • As fast as “dumb” hashing
  • Data-driven
  • Examples:

– Semantic Hashing (2007) – Kulis (2009) – Kumar, Wang, Chang (2010) – Etc.

SLIDE 80

Semantic Hashing

= ??

SLIDE 81

Semantic Hashing

= Smart hashing + deep learning

Salakhutdinov + Hinton (2007)

SLIDE 82

Semantic Hashing architecture

SLIDE 83

Semantic Hashing architecture

LSA/LSI, LDA TF*IDF

SLIDE 84
SLIDE 85
SLIDE 86

Opportunity with Semantic Hashing

Semantic search that is:

  • General purpose
  • Fast
  • Compact
SLIDE 87

Opportunity with Semantic Hashing

Semantic search that is:

  • General purpose

– Search text, images, videos, audio, etc.

  • Fast
  • Compact
SLIDE 88

Opportunity with Semantic Hashing

Semantic search that is:

  • General purpose
  • Fast

– Indexing: few weeks for 1B docs, using 100 cores – Retrieval: 3.6 ms for 1 million docs, scales sublinearly

  • Compact
SLIDE 89

Opportunity with Semantic Hashing

Semantic search that is:

  • General purpose
  • Fast
  • Compact

– 1B docs, 30-bit hashes => 4GB – 1B images, 64-bit hashes => 8GB (vs. 40 TB naïve)
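The sub-linear retrieval above works because a hash code can be treated as a memory address: probe the index at every code within a small Hamming radius, and each probe is O(1) regardless of collection size. A toy sketch (the 8-bit codes and doc ids are invented):

```python
from itertools import combinations

# Toy index: 8-bit "semantic" hash code -> doc ids (codes are invented).
index = {
    0b10110010: ["doc-1", "doc-7"],
    0b10110011: ["doc-2"],   # 1 bit away from 0b10110010
    0b00110010: ["doc-3"],   # 1 bit away from 0b10110010
    0b01011101: ["doc-4"],   # far away
}

def hamming_ball(code, n_bits=8, radius=1):
    """Yield every code within `radius` bit flips of `code`."""
    yield code
    for r in range(1, radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped

def search(code, radius=1):
    """Probe the index at each nearby code -- one dict lookup per probe,
    independent of how many documents are stored."""
    hits = []
    for probe in hamming_ball(code, radius=radius):
        hits.extend(index.get(probe, []))
    return hits

results = search(0b10110010, radius=1)
```

With real 30- or 64-bit codes the same idea applies; the number of probes depends only on the code length and radius, not on the billion documents behind them.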

SLIDE 90

Prediction

Smart hashing will revolutionize search

SLIDE 91

Prediction

Smart hashing will revolutionize search

@turian #strataconf

SLIDE 92

Outline

  • Deep Learning

– Semantic Hashing

  • Graph parallelism
  • Unsupervised semantic parsing
SLIDE 93

The rise of Graph stores

  • Neo4J, HyperGraphDB, InfiniteGraph, InfoGrid,

AllegroGraph, sones, DEX, FlockDB, OrientDB, VertexDB

SLIDE 94

Opportunity with graph-based parallelism

  • Scale sophisticated ML algorithms
  • Larger data sets
  • Higher accuracy
SLIDE 95

Useful machine learning algorithms

  • Gibbs sampling
  • Matrix factorization
  • EM
  • Lasso
  • Etc.

Have graph-like data dependencies

SLIDE 96

Machine learning in Map-Reduce


SLIDE 98

Machine learning in Map-Reduce

Map-Abuse

  • Carlos Guestrin
SLIDE 99

There are too many graph-like dependencies in many ML algorithms

SLIDE 100

Parallel abstractions for graph operations

  • Pregel (Malewicz et al, 2009, 2010)

– Erlang implementation called Phoebus

  • GraphLab (Low et al, 2010)

– Source code available
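To make the vertex-centric model behind Pregel/GraphLab concrete, here is a minimal single-machine sketch of a BSP superstep loop running PageRank; it mimics only the programming model (compute, message, barrier), not the distributed runtime, and the graph is invented:

```python
# Each superstep: every vertex consumes its inbox, updates its value,
# and sends messages along its out-edges; a barrier separates supersteps.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # toy adjacency (invented)
rank = {v: 1.0 / len(graph) for v in graph}
DAMPING = 0.85

for superstep in range(30):
    inbox = {v: [] for v in graph}
    # "Compute" phase: each vertex sends its rank mass to its neighbors.
    for v, out in graph.items():
        for w in out:
            inbox[w].append(rank[v] / len(out))
    # Barrier, then apply received messages.
    rank = {v: (1 - DAMPING) / len(graph) + DAMPING * sum(msgs)
            for v, msgs in inbox.items()}
```

The point of the abstraction is that the per-vertex update above is all the user writes; partitioning, message delivery, and barriers are the framework's job.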

SLIDE 101

http://metaoptimize.com/qa/

SLIDE 102

http://metaoptimize.com/qa/questions/285/

SLIDE 103

Opportunity with graph-based parallelism

  • Scale sophisticated ML algorithms
  • Larger data sets
  • Higher accuracy
SLIDE 104

Prediction

Map-Reduce for simple algorithms, graph parallelism for sophisticated ML

SLIDE 105

Prediction

Map-Reduce for simple algorithms, graph parallelism for sophisticated ML

@turian #strataconf

SLIDE 106

Outline

  • Deep Learning

– Semantic Hashing

  • Graph parallelism
  • Unsupervised semantic parsing
SLIDE 107

Opportunity with Semantic Parsing

  • Simply reads texts and understands them
  • Applicable to general domains
  • Applications

– Question Answering (cf. Wolfram Alpha) – Natural language search (cf. Powerset) – Spam generation / Spam detection – Knowledge Extraction from Wikipedia, web, etc.

SLIDE 108


Question-Answer: Example

Q: What does IL-2 control? A: ???

SLIDE 109


Question-Answer: Example

Q: What does IL-2 control? A: ??? Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

SLIDE 110


Question-Answer: Example

Q: What does IL-2 control? A: The DEX-mediated IkappaBalpha induction. Sentence: Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

SLIDE 111


Challenge: Same Meaning, Many Variations

IL-4 induces CD11b Protein
IL-4 enhances the expression of CD11b
CD11b expression is induced by IL-4 protein
The cytokine interleukin-4 induces CD11b expression
IL-4’s up-regulation of CD11b, …
……

SLIDE 112

Semantic Parsing

Microsoft buys Powerset.

SLIDE 113

Semantic Parsing

Microsoft buys Powerset. BUYS(MICROSOFT,POWERSET)

SLIDE 114

Where does intelligence come from?

Knowledge!

SLIDE 115


Extracting Knowledge From Text

……

SLIDE 116

Ontology

SLIDE 117

How do we extract knowledge from text?

SLIDE 118

How do we extract knowledge from text?

Hire a handful of Ph.D. linguists to write a grammar

SLIDE 119

How do we extract knowledge from text?

SLIDE 120

Manual approach

Costly! Ineffective! Inflexible!

SLIDE 121

Manual approach

  • Challenge: Same meaning can be expressed in many different ways

Microsoft buys Powerset
Microsoft acquires semantic search engine Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …
……

  • Manual encoding of variations?
SLIDE 122

Manual approach

  • Challenge: Domain specific

– Grammar for newspaper articles ≠ grammar for biomed articles ≠ grammar for tweets ≠ etc.

SLIDE 123

Knowledge extraction that is:

  • Large-scale,
  • Open-domain,
  • Automatic,
  • End-to-end

SLIDE 124

Learn an Ontology?

Q: What does IL-2 regulate? A: The DEX-mediated IkappaBalpha induction

Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

[Ontology fragment: INHIBIT ISA REGULATE]

SLIDE 125

Ontology Learning

  • Step 1: Induction
  • Step 2: Population

  • Limitations in existing approaches

– Require heuristic patterns or existing KBs – Pursue each task in isolation


SLIDE 126

Unsupervised Semantic Parsing with Ontologies

Jointly conducts:

  • Ontology induction,
  • Ontology population,
  • and knowledge extraction

Why is this so cool?

Poon + Domingos (2009, 2010)

SLIDE 127

Intuition

  • Cluster syntactic or lexical variations of the same

meaning

BUYS(-,-) = {buys, acquires, ’s purchase of, …}: cluster of various expressions for acquisition
MICROSOFT = {Microsoft, the Redmond software giant, …}: cluster of various mentions of Microsoft

SLIDE 128

Microsoft buys Powerset
Microsoft acquires semantic search engine Powerset
Powerset is acquired by Microsoft Corporation
The Redmond software giant buys Powerset
Microsoft’s purchase of Powerset, …

SLIDE 133

Clusters And Compositions

  • Clusters in core forms

{investigate, examine, evaluate, analyze, study, assay}
{diminish, reduce, decrease, attenuate}
{synthesis, production, secretion, release}
{dramatically, substantially, significantly}
……

  • Compositions

amino acid, t cell, immune response, transcription factor, initiation site, binding site …
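The clustering intuition can be illustrated with a hand-built (not learned) lookup: map every surface form in a cluster to its canonical symbol, then read off a relation triple. USP induces these clusters from parsed text; the dictionaries and the naive string matching below are invented for illustration:

```python
# Hand-written stand-ins for learned clusters (USP would induce these).
relation_cluster = {"buys": "BUYS", "acquires": "BUYS", "purchase of": "BUYS"}
entity_cluster = {
    "microsoft": "MICROSOFT",
    "the redmond software giant": "MICROSOFT",
    "powerset": "POWERSET",
}

def parse(sentence):
    """Toy 'semantic parse': split on a known relation word, then map
    both sides through the entity clusters."""
    text = sentence.lower().rstrip(".")
    for verb, rel in relation_cluster.items():
        if f" {verb} " in f" {text} ":
            left, right = text.split(f" {verb} ", 1)
            subj = entity_cluster.get(left.strip())
            obj = entity_cluster.get(right.strip())
            if subj and obj:
                return f"{rel}({subj},{obj})"
    return None

p1 = parse("Microsoft buys Powerset.")
p2 = parse("The Redmond software giant buys Powerset.")
```

Both surface variations collapse to the same logical form, which is what lets the QA system on the next slides answer questions regardless of phrasing.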

SLIDE 134


Experiments

  • Evaluated on question answering
  • Evaluation: number of answers and accuracy
  • GENIA dataset: 1,999 PubMed abstracts
  • 2,000 questions, e.g.:

  • What does anti-STAT1 inhibit?
  • What regulates MIP-1 alpha?
SLIDE 135

[Chart: total vs. correct answers (0–500) for KW-SYN, TextRunner, RESOLVER, DIRT, USP, and OntoUSP]

SLIDE 136

[Chart: total vs. correct answers (0–500) for KW-SYN, TextRunner, RESOLVER, DIRT, USP, and OntoUSP]

Five times as many correct answers as TextRunner. Highest accuracy: 91%.

SLIDE 137

Opportunity with Semantic Parsing

  • Simply reads texts and understands them
  • Applicable to general domains
  • Applications

– Question Answering (cf. Wolfram Alpha) – Natural language search (cf. Powerset) – Spam generation / Spam detection – Knowledge Extraction from Wikipedia, web, etc.

SLIDE 138

Prediction

Automated knowledge extraction will become widespread

SLIDE 139

Prediction

Automated knowledge extraction will become widespread

@turian #strataconf

SLIDE 140

Outline

  • Deep Learning

– Semantic Hashing

  • Graph parallelism
  • Unsupervised semantic parsing
SLIDE 141

Take-home points

SLIDE 142

When big data gives diminishing returns, you need better algorithms

@turian #strataconf

SLIDE 143

Only use better algorithms if they will qualitatively improve your product

@turian #strataconf

SLIDE 144

Intelligence comes from knowledge. Knowledge comes from learning.

@turian #strataconf

SLIDE 145

Prediction

Smart hashing will revolutionize search

@turian #strataconf

SLIDE 146

Prediction

Map-Reduce for simple algorithms, graph parallelism for sophisticated ML

@turian #strataconf

SLIDE 147

Prediction

Automated knowledge extraction will become widespread

@turian #strataconf

SLIDE 148

Thanks to the following people for letting me adapt their slides

  • Hoifung Poon
  • Yoshua Bengio
  • Geoff Hinton
  • Ruslan Salakhutdinov
SLIDE 149

Questions?

Joseph Turian @turian MetaOptimize http://metaoptimize.com/qa/

SLIDE 150
SLIDE 151

Why not deep architectures?

  • How do we train them?
SLIDE 152

Supervised Training Example

[Diagram: Input X -> Output f(X): “six”; Target Y: “two!”]

SLIDE 153

Gradient descent

[Diagram: Input X -> layers -> Output f(X) “six” vs. Target Y “two!”; weight gradients = ?]

SLIDE 154

Gradient descent

[Diagram: Input X -> layers -> Output f(X) “six” vs. Target Y “two!”; weight gradients = ?]
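The picture being built here is just the chain rule: compare the output f(X) against the target Y and push the error back into the weights. In one dimension the idea reduces to the following toy example (the data and learning rate are invented, not from the slides):

```python
# Toy gradient descent: fit w in f(x) = w * x to targets generated by y = 3 * x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
w = 0.0      # initial guess
lr = 0.05    # learning rate

for step in range(100):
    # d/dw of the mean squared error (w * x - y)^2 over the dataset
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad   # step downhill
```

Deep networks do exactly this, with the gradient computed layer by layer via backpropagation; the "problem on deep architectures" slide is about that layered gradient becoming uninformative.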

SLIDE 155

Problem on deep architectures

[Diagram: deep network with many layers]

SLIDE 156

Before 2006

Failure of deep architectures

SLIDE 157

Breakthrough! Mid 2006

SLIDE 158

Montréal (Bengio), Toronto (Hinton), New York (Le Cun)

SLIDE 159

Montréal (Bengio), Toronto (Hinton), New York (Le Cun)

(I did my postdoc here)

SLIDE 160

Signal-to-noise ratio

  • More signal!
SLIDE 161

Deep training tricks

  • Unsupervised learning
SLIDE 162

Deep training tricks

  • Create one layer of features at a time
SLIDE 163

Deep training

… input

SLIDE 164

Deep training

… … input features

SLIDE 165

Deep training

[Diagram: input -> features -> reconstruction; reconstruction of input = ?]

SLIDE 166

Deep training

… … input features

SLIDE 167

Deep training

… … input features … More abstract features

SLIDE 168

Deep training

[Diagram: features -> more abstract features -> reconstruction; reconstruction of features = ?]

SLIDE 169

Deep training

… … input features … More abstract features

SLIDE 170

Deep training

… … input features … More abstract features …

Even more abstract features

SLIDE 171

Deep training

[Diagram: input -> features -> more abstract features -> even more abstract features -> Output f(X) “six” vs. Target Y “two!” = ?]
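The layer-by-layer procedure in these last slides can be sketched with tiny tied-weight autoencoders; all sizes, data, and the plain-SGD training loop below are invented for illustration, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))           # toy unlabeled data (invented)

def train_autoencoder_layer(data, n_hidden, lr=0.01, epochs=200):
    """Fit one layer: encode with tanh, reconstruct with the tied decoder,
    and reduce squared reconstruction error by gradient descent."""
    W = rng.normal(scale=0.1, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        H = np.tanh(data @ W)            # features
        R = H @ W.T                      # reconstruction of the input
        err = R - data
        # Gradient of ||R - data||^2 w.r.t. W (encoder path + decoder path)
        dH = err @ W
        dW = data.T @ (dH * (1 - H ** 2)) + err.T @ H
        W -= lr * dW / len(data)
    return W

# Greedy stack: train layer 1 on raw input, layer 2 on layer-1 features, ...
W1 = train_autoencoder_layer(X, 10)
F1 = np.tanh(X @ W1)                     # features
W2 = train_autoencoder_layer(F1, 5)
F2 = np.tanh(F1 @ W2)                    # more abstract features
```

Each layer is trained only to reconstruct its own input, exactly the "one layer of features at a time" trick; a supervised output layer and fine-tuning pass would then sit on top of `F2`.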