Inference and Representation: Case study in Computational Cognitive Science (PowerPoint presentation, Brenden Lake)


SLIDE 1

Inference and Representation:

  • Case study in Computational Cognitive Science

Brenden Lake

SLIDE 2

“Learning classifiers” in cognitive science

concept learning (cognitive science & psychology) = classification (data science & machine learning)

Generalization task: given labeled data for “dogs” and labeled data for “cats”, is a new image a dog or a cat?

SLIDE 3

human-level concept learning

  • the speed of learning (“one-shot learning”)
  • the richness of representation: generating new concepts, generating new examples, parsing

SLIDE 4

spring-loaded camming device, drawing knife, portable immersion circulator, bucket-wheel excavator

SLIDE 5

A testbed for studying human-level concept learning

We would like to investigate a domain with…

  • 1) A relatively even playing field for comparing humans and machines.
  • 2) Natural, high-dimensional concepts.
  • 3) A reasonable chance of building computational models that can see most of the structure that people see.
  • 4) Insights that generalize across domains.

SLIDE 6

Standard machine learning: MNIST (10 concepts, 6,000 examples each)

Our testbed: Omniglot dataset (1,600+ concepts, 20 examples each), https://github.com/brendenlake
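To make the testbed concrete, here is a minimal sketch of how a 20-way one-shot classification episode could be sampled from an on-disk copy of Omniglot. The alphabet/character/drawing.png layout matches the released images, but the function itself is an illustrative assumption, not part of the repo's API:

```python
import random
from pathlib import Path

def sample_one_shot_episode(omniglot_root, n_way=20, seed=0):
    """Sample a 20-way, one-shot, within-alphabet classification episode.

    Assumes a directory layout of alphabet/character/drawing.png, as in the
    images_background folder of the Omniglot release; each character needs
    at least 2 drawings and the alphabet at least `n_way` characters.
    """
    rng = random.Random(seed)
    alphabets = [d for d in Path(omniglot_root).iterdir() if d.is_dir()]
    alphabet = rng.choice(alphabets)
    characters = rng.sample([d for d in alphabet.iterdir() if d.is_dir()], n_way)

    support, queries = [], []
    for label, char_dir in enumerate(characters):
        a, b = rng.sample(sorted(char_dir.glob("*.png")), 2)
        support.append((a, label))  # the single training example per class
        queries.append((b, label))  # a held-out test example per class
    rng.shuffle(queries)
    return support, queries
```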

SLIDE 7

Sanskrit, Tagalog, Latin, Braille, Balinese, Hebrew

SLIDE 8

Angelic, Alphabet of the Magi, Futurama, ULOG

SLIDE 9

[Figure: “Original Image” of a character and “20 People’s Strokes”, with stroke order numbered for each drawing]

SLIDE 10

[Figure: “Original Image” and “20 People’s Strokes” for another character, stroke order numbered]

SLIDE 11

[Figure: “Original Image” and “20 People’s Strokes” for a more complex character (up to 11 strokes), stroke order numbered]

SLIDE 12

human-level concept learning

  • the speed of learning (“one-shot learning”)
  • the richness of representation: generating new concepts, generating new examples, parsing

SLIDE 13

human-level concept learning

  • the speed of learning
  • the richness of representation: generating new examples, generating new concepts, parsing

SLIDE 14

Bayesian Program Learning

[Figure: generative model spanning the type level (i) primitives, ii) sub-parts, iii) parts, iv) object template with relations such as “attached along” and “attached at start”) and the token level (v) exemplars, vi) raw data), shown for two concepts A and B]

SLIDE 15

human-level concept learning

  • the speed of learning
  • the richness of representation: generating new examples, generating new concepts, parsing

SLIDE 16

generating new examples

SLIDE 17

“Draw a new example”

Which grid is produced by the model? [four grid pairs: A vs. B]

SLIDE 18

“Draw a new example”

Which grid is produced by the model? [four grid pairs: A vs. B]

SLIDE 19

“Draw a new example”

Which grid is produced by the model? [four grid pairs: A vs. B]

SLIDE 20

“Draw a new example”

Which grid is produced by the model? [four grid pairs: A vs. B]

SLIDE 21

human-level concept learning

  • the speed of learning
  • the richness of representation: generating new examples, generating new concepts, parsing

SLIDE 22

generating new concepts

Task: “Design a new character from the same alphabet” (on-screen timer: “3 seconds remaining”)

SLIDE 23

Task: “Design a new character from the same alphabet”

Which grid is produced by the model? [three grid pairs: A vs. B]

SLIDE 24

Task: “Design a new character from the same alphabet”

Which grid is produced by the model? [three grid pairs: A vs. B]

SLIDE 25

Task: “Design a new character from the same alphabet”

Which grid is produced by the model? [three grid pairs: A vs. B]

SLIDE 26

Task: “Design a new character from the same alphabet”

Which grid is produced by the model? [three grid pairs: A vs. B]

SLIDE 27

Generating new characters from the same alphabet

[Figure: two panels, each showing an alphabet of characters next to new machine-generated characters in that alphabet]

SLIDE 28

Bayesian Program Learning

[Figure: generative hierarchy from primitives (1D curvelets, 2D patches, 3D geons, actions, sounds, etc.) to sub-parts, parts, and an object template with relations (“connected at” relation), then exemplars and raw data]

Inference: given a raw binary image I, infer the latent variables θ by Bayes’ rule,

P(θ|I) = P(I|θ) P(θ) / P(I)

with a prior on parts, relations, etc. for P(θ) and a renderer for the likelihood P(I|θ).

Concept learning as program induction.

  • Key ingredients for learning good programs:

1) Learning-to-learn
2) Compositionality
3) Causality
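Since P(I) is the same for every candidate program explaining a fixed image, candidate programs can be ranked by the unnormalized log posterior log P(I|θ) + log P(θ). A minimal sketch of that comparison (the candidate tuple format is hypothetical, not a BPL data structure):

```python
def log_posterior_unnorm(log_prior, log_likelihood):
    """log P(theta | I) up to the additive constant -log P(I), which
    cancels when comparing programs for the same image."""
    return log_prior + log_likelihood

def best_program(candidates):
    """candidates: iterable of (theta, log P(theta), log P(I|theta)) tuples."""
    return max(candidates, key=lambda c: log_posterior_unnorm(c[1], c[2]))
```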

SLIDE 29

Bayesian Program Learning

[Figure: same generative hierarchy as the previous slide, from primitives (1D curvelets, 2D patches, 3D geons, actions, sounds, etc.) through sub-parts, parts, and the object template with relations]

procedure GENERATETYPE
    κ ← P(κ)                                      ▷ Sample number of parts
    for i = 1 … κ do
        nᵢ ← P(nᵢ | κ)                            ▷ Sample number of sub-parts
        Sᵢ ← P(Sᵢ | nᵢ)                           ▷ Sample sequence of sub-parts
        Rᵢ ← P(Rᵢ | S₁, …, Sᵢ₋₁)                  ▷ Sample relation
    end for
    ψ ← {κ, R, S}
    return @GENERATETOKEN(ψ)                      ▷ Return handle to a stochastic program
end procedure

procedure GENERATETOKEN(ψ)
    for i = 1 … κ do
        Sᵢ⁽ᵐ⁾ ← P(Sᵢ⁽ᵐ⁾ | Sᵢ)                     ▷ Add motor variance
        Lᵢ⁽ᵐ⁾ ← P(Lᵢ⁽ᵐ⁾ | Rᵢ, T₁⁽ᵐ⁾, …, Tᵢ₋₁⁽ᵐ⁾)   ▷ Sample part’s start location
        Tᵢ⁽ᵐ⁾ ← f(Lᵢ⁽ᵐ⁾, Sᵢ⁽ᵐ⁾)                    ▷ Compose a part’s pen trajectory
    end for
    A⁽ᵐ⁾ ← P(A⁽ᵐ⁾)                                ▷ Sample affine transform
    I⁽ᵐ⁾ ← P(I⁽ᵐ⁾ | T⁽ᵐ⁾, A⁽ᵐ⁾)                   ▷ Render and sample the binary image
    return I⁽ᵐ⁾
end procedure
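As a toy illustration of the type/token split above, the sketch below substitutes placeholder distributions for the learned ones and omits rendering entirely; none of the numeric choices come from the model:

```python
import random

rng = random.Random(0)

# Toy stand-ins for the learned distributions in the pseudocode above; the
# real BPL model fits them to the Omniglot background set.
def sample_num_parts():                    # kappa ~ P(kappa)
    return rng.randint(1, 4)

def sample_num_subparts(kappa):            # n_i ~ P(n_i | kappa)
    return rng.randint(1, max(1, 5 - kappa))

def sample_subparts(n):                    # S_i ~ P(S_i | n_i): primitive ids
    return [rng.randrange(1250) for _ in range(n)]

def sample_relation(prev_parts):           # R_i ~ P(R_i | S_1..S_{i-1})
    if not prev_parts:
        return ("independent", None)
    kind = rng.choice(["independent", "attached-at-start",
                       "attached-at-end", "attached-along"])
    return (kind, rng.randrange(len(prev_parts)))

def generate_type():
    """Type level: sample kappa parts with sub-parts and relations, then
    return a closure playing the role of GENERATETOKEN."""
    kappa = sample_num_parts()
    S, R = [], []
    for _ in range(kappa):
        S.append(sample_subparts(sample_num_subparts(kappa)))
        R.append(sample_relation(S[:-1]))

    def generate_token():
        # Token level: jitter each sub-part (motor variance) and keep the
        # relation that would determine the part's start location.
        return [{"subparts": [(p, rng.gauss(0.0, 1.0)) for p in subparts],
                 "relation": rel}
                for subparts, rel in zip(S, R)]

    return generate_token

token_sampler = generate_type()   # a stochastic program for one character type
exemplar_a, exemplar_b = token_sampler(), token_sampler()
```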

SLIDE 30

Learning-to-learn programs

[Figure: a seed primitive, the learned action primitives, and learned primitive transitions]

1,250 primitives: scale-selective but translation-invariant
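One plausible, simplified way to build such a primitive library is to cluster normalized sub-stroke trajectories; the k-means sketch below is illustrative only and is not the fitting procedure the paper actually uses:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_primitive_library(substrokes, n_primitives=1250, n_points=10, seed=0):
    """Cluster sub-stroke trajectories into a library of action primitives.

    substrokes: list of (T_i, 2) arrays of pen positions (needs at least
    n_primitives of them). Each trajectory is resampled to a fixed length
    and normalized for translation; scale is discarded here for brevity,
    whereas the paper's primitives keep it (they are scale-selective).
    """
    feats = []
    for traj in substrokes:
        t = np.linspace(0.0, 1.0, len(traj))
        t_new = np.linspace(0.0, 1.0, n_points)
        resampled = np.column_stack(
            [np.interp(t_new, t, traj[:, d]) for d in range(2)])
        resampled -= resampled[0]                    # translation invariance
        scale = max(np.abs(resampled).max(), 1e-8)
        feats.append((resampled / scale).ravel())
    km = KMeans(n_clusters=n_primitives, n_init=4, random_state=seed)
    km.fit(np.asarray(feats))
    return km.cluster_centers_.reshape(n_primitives, n_points, 2)
```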

SLIDE 31

Learning-to-learn programs

  • number of strokes: [histogram of stroke counts 1–10, with frequencies up to ~5,000]
  • stroke start positions: [start-position maps for the 1st, 2nd, and ≥3rd stroke]
  • global transformations
  • number of sub-strokes: [histograms of sub-stroke counts for a character with κ strokes, κ = 1 … 10]
  • relations between strokes: independent (34%), attached at start (5%), attached at end (11%), attached along (50%)
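For the stroke-count and relation statistics, learning-to-learn reduces to empirical tallies over the background set of drawings; a small sketch, assuming hypothetical drawing and parse data structures:

```python
from collections import Counter

def stroke_count_distribution(drawings):
    """Empirical P(kappa): distribution over the number of strokes, tallied
    from background drawings (each drawing is a list of strokes)."""
    counts = Counter(len(drawing) for drawing in drawings)
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}

def relation_frequencies(parses):
    """Tally relation types across parsed characters; with the slide's data
    this would recover roughly: independent 34%, attached at start 5%,
    attached at end 11%, attached along 50%."""
    counts = Counter(rel for parse in parses for rel in parse["relations"])
    total = sum(counts.values())
    return {r: n / total for r, n in counts.items()}
```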

SLIDE 32

Inferring latent motor programs

[Figure: inference from a raw binary image I to latent variables θ, spanning primitives, sub-strokes, strokes, an object template with a “connected” relation, and raw data]

Bayes’ rule, with a prior on programs and a renderer for the likelihood:

P(θ|I) = P(I|θ) P(θ) / P(I)

Discrete (K = 5) approximation to the posterior:

P(θ|I) ≈ Σᵢ₌₁ᴷ wᵢ δ(θ − θ[i]),  where wᵢ ∝ P(θ[i]|I) such that Σᵢ₌₁ᴷ wᵢ = 1

Intuition: fit strokes to the observed pixels as closely as possible, with these constraints:

  • fewer strokes
  • high-probability primitive sequence
  • use relations
  • stroke order
  • stroke directions
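Normalizing the K retained parse scores into the weights wᵢ is a log-sum-exp style computation; a small numerically stable sketch, using the five parse scores that appear on the next slide:

```python
import numpy as np

def posterior_weights(log_posteriors):
    """Normalize unnormalized log P(theta[i] | I) values into weights w_i
    with sum(w_i) = 1, shifting by the max for numerical stability."""
    lp = np.asarray(log_posteriors, dtype=float)
    w = np.exp(lp - lp.max())
    return w / w.sum()

# The five parse scores shown on the next slide:
weights = posterior_weights([-531, -560, -568, -582, -588])
# The best parse dominates: weights[0] is within ~1e-12 of 1.
```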
SLIDE 33

Inference

Step 1: characters as undirected graphs (binary image → thinned image → cleaned graph)
Step 2: guided random parses (random-walk “planning” on the graph)
Step 3: top-down fitting with gradient-based optimization

[Figure: a) binary and thinned images; b) candidate parses with stroke order; c) parses ranked by log-probability, from more likely to less likely: −531, −560, −568, −582, −588]
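A skeletal sketch of Steps 2 and 3: propose many guided random parses and keep the best-scoring few. The proposal and scoring functions here are assumed helpers standing in for the random-walk proposals and the gradient-refined posterior score, not BPL's actual interface:

```python
import random

def search_parses(image, propose_parse, score, n_proposals=100, top_k=5, seed=0):
    """Draw many guided random parses of `image` and keep the top few.

    propose_parse(image, rng): a random walk on the skeleton graph (assumed).
    score(theta, image): log P(I|theta) + log P(theta), after refinement (assumed).
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_proposals):
        theta = propose_parse(image, rng)
        scored.append((score(theta, image), theta))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]   # e.g. K = 5 parses for the posterior approximation
```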

SLIDE 34

One-shot classification

SLIDE 35

HBPL: computing the classification score

Which class is image I in?

log P(I | class 1) ≈ −758
log P(I | class 2) ≈ −1880

[Figure: parses of the test image under each class, with per-parse log scores; class 1 explains the image far better]
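A sketch of the decision rule implied here: score the test image under each class's inferred parses and pick the class with the higher approximate log-likelihood. The helper functions are assumptions, not BPL's published code:

```python
import math

def classify_one_shot(test_image, class_examples, fit_parses, refit_log_score):
    """Choose the class c maximizing an approximate log P(I_test | class c):
    parse each class's single example, re-score those parses against the
    test image, and combine them with log-sum-exp. `fit_parses` and
    `refit_log_score` are assumed helpers."""
    def log_sum_exp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    scores = {label: log_sum_exp([refit_log_score(theta, test_image)
                                  for theta in fit_parses(example)])
              for label, example in class_examples.items()}
    return max(scores, key=scores.get)   # above: −758 (class 1) beats −1880
```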

slide-36
SLIDE 36

Comparing human and machine performance on five tasks

In the “Human or Machine?” visual Turing tests, the Identification (ID) level is the % of judges who correctly ID machine vs. human output; 50% is chance.

  • One-shot classification (20-way): 4.5% human error rate, 3.2% machine error rate
  • Generating new examples: 51% ID level
  • Generating new examples (dynamic): 59% ID level
  • Generating new concepts (unconstrained): 49% ID level
  • Generating new concepts (from type): 51% ID level
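The ID level is simply judge accuracy on the “Human or Machine?” forced choice; a two-line sketch of the metric:

```python
def id_level(judgments):
    """Identification (ID) level for a visual Turing test: the percentage
    of judge responses that correctly pick out the machine-generated grid.
    judgments: list of (guess, truth) pairs such as ("A", "B"). 50% is
    chance, so values near 50% mean the model is hard to tell apart."""
    correct = sum(guess == truth for guess, truth in judgments)
    return 100.0 * correct / len(judgments)
```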
SLIDE 37

Analyzing the core ingredients

[Figure: A) one-shot classification (20-way) error rate; B) Identification (ID) level (% judges who correctly ID machine vs. human) for generating new exemplars, new exemplars (dynamic), new concepts (from type), and new concepts (unconstrained). Conditions: People; Bayesian Program Learning models (BPL, BPL lesion with no compositionality, BPL lesion with no learning-to-learn); Deep Learning models with no causality (Deep Siamese Convnet, Hierarchical Deep, Deep Convnet). Dashed line marks chance.]

SLIDE 38

[Same results figure as Slide 37]

SLIDE 39

[Same results figure as Slide 37]

SLIDE 40

[Same results figure as Slide 37]

SLIDE 41

the speed of learning; the richness of representation: generating new examples, generating new concepts, parsing

How can people acquire such rich concepts from only one or a few examples?

Conclusion

  • Probabilistic programs can help us understand how people learn rich concepts from sparse data.
  • Programs can represent abstract causal processes.
  • Probability allows models to handle noise and produce creative outputs.
  • Many challenges remain for future work, including developing new inference algorithms and extending the approach to other real-world tasks.