

  1. Inference and Representation: Case study in Computational Cognitive Science (Brenden Lake)

  2. “Learning classifiers” in cognitive science. Concept learning (cognitive science & psychology) = classification (data science & machine learning). Given labeled data for “dogs” and labeled data for “cats”, the generalization task is: dog or cat?

  3. Human-level concept learning: the speed of learning (“one-shot learning”) and the richness of representation (parsing, generating new examples, generating new concepts).

  4. Portable immersion circulator, bucket-wheel excavator, spring-loaded camming device, drawing knife.

  5. A testbed for studying human-level concept learning. We would like to investigate a domain with: 1) a relatively even slate for comparing humans and machines; 2) natural, high-dimensional concepts; 3) a reasonable chance of building computational models that can see most of the structure that people see; and 4) insights that generalize across domains.

  6. Standard machine learning vs. our testbed: MNIST has 10 concepts with 6,000 examples each; the Omniglot dataset has 1,600+ concepts with 20 examples each (https://github.com/brendenlake).
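  This contrast drives the evaluation protocol: instead of training on thousands of examples per class, a learner sees a single example of each class. Below is a minimal sketch of sampling a 20-way one-shot classification episode, assuming a hypothetical in-memory layout where the dataset is a dict mapping each concept name to its list of 20 example images:

      import random

      def make_one_shot_episode(dataset, n_way=20, seed=0):
          """Sample one 20-way, one-shot episode: one labeled support example
          per concept, plus a held-out query drawn from one of the concepts."""
          rng = random.Random(seed)
          concepts = rng.sample(sorted(dataset), n_way)
          support = {c: rng.choice(dataset[c]) for c in concepts}  # 1 example each
          target = rng.choice(concepts)
          others = [x for x in dataset[target] if x is not support[target]]
          query = rng.choice(others)
          return support, query, target  # task: classify `query` among `concepts`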

  7. Sanskrit Tagalog Balinese Hebrew Latin Braille

  8. Angelic Alphabet of the Magi Futurama ULOG

  9. [Figure: an original character image alongside the drawings of 20 people, with each stroke numbered in the order it was produced.]

  10. [Figure: a second original image alongside 20 people's strokes, with stroke order numbered.]

  11. [Figure: a third, more complex original image alongside 20 people's strokes, with stroke order numbered.]

  12. Human-level concept learning: the speed of learning (“one-shot learning”) and the richness of representation (parsing, generating new examples, generating new concepts).

  13. Human-level concept learning: the speed of learning (“one-shot learning”) and the richness of representation (parsing, generating new examples, generating new concepts).

  14. Bayesian Program Learning. The generative hierarchy: i) primitives; ii) sub-parts; iii) parts; iv) object template, with relations between parts (attached at start, attached along); v) exemplars; vi) raw data. Stages i) through iv) constitute the type level; v) and vi) the token level. [Figure shows two example characters, A and B, generated through this hierarchy.]

  15. Human-level concept learning: the speed of learning (“one-shot learning”) and the richness of representation (parsing, generating new examples, generating new concepts).

  16. generating new examples

  17. “Draw a new example.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  18. “Draw a new example.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  19. “Draw a new example.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  20. “Draw a new example.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  21. Human-level concept learning: the speed of learning (“one-shot learning”) and the richness of representation (parsing, generating new examples, generating new concepts).

  22. Generating new concepts. Task: “Design a new character from the same alphabet” (participants drew under a countdown timer, e.g., “3 seconds remaining”).

  23. Task: “Design a new character from the same alphabet.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  24. Task: “Design a new character from the same alphabet.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  25. Task: “Design a new character from the same alphabet.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  26. Task: “Design a new character from the same alphabet.” Which grid is produced by the model? [Pairs of grids labeled A and B.]

  27. Generating new characters from the same alphabet. [Two panels, each pairing an alphabet of characters with new machine-generated characters in that alphabet.]

  28. Bayesian Program Learning. Primitives (1D curvelets, 2D patches, 3D geons, actions, sounds, etc.) compose into sub-parts, parts, and an object template with relations (e.g., connected at), which in turn generates exemplars and raw data. With latent variables θ, a raw binary image I, a prior on parts, relations, etc., and a renderer, inference follows Bayes' rule: P(θ | I) = P(I | θ) P(θ) / P(I). Concept learning as program induction. Key ingredients for learning good programs: 1) learning-to-learn, 2) compositionality, 3) causality.
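  As a concreteness check, here is a minimal sketch of that Bayes' rule computation over a finite set of candidate programs; log_prior and log_likelihood are hypothetical stand-ins for the model's actual prior and rendering likelihood:

      import math

      def log_posterior_unnorm(theta, image, log_prior, log_likelihood):
          """log P(theta | I) up to the constant -log P(I):
          log P(I | theta) + log P(theta)."""
          return log_likelihood(image, theta) + log_prior(theta)

      def posterior_over_candidates(thetas, image, log_prior, log_likelihood):
          """Normalize over a finite candidate set, so P(I) becomes a sum."""
          scores = [log_posterior_unnorm(t, image, log_prior, log_likelihood)
                    for t in thetas]
          m = max(scores)  # log-sum-exp for numerical stability
          log_z = m + math.log(sum(math.exp(s - m) for s in scores))
          return [math.exp(s - log_z) for s in scores]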

  29. Bayesian Program Learning: the generative model as pseudocode. Primitives (1D curvelets, 2D patches, 3D geons, actions, sounds, etc.) feed into sub-parts, parts, and an object template with relations (connected at):

      procedure GENERATETYPE
          κ ← P(κ)                                  ▷ sample number of parts
          for i = 1 ... κ do
              n_i ← P(n_i | κ)                      ▷ sample number of sub-parts
              S_i ← P(S_i | n_i)                    ▷ sample sequence of sub-parts
              R_i ← P(R_i | S_1, ..., S_{i-1})      ▷ sample relation
          end for
          ψ ← {κ, S, R}
          return @GENERATETOKEN(ψ)                  ▷ return handle to a stochastic program
      end procedure

      procedure GENERATETOKEN(ψ)
          for i = 1 ... κ do
              S_i^(m) ← P(S_i^(m) | S_i)            ▷ add motor variance
              L_i^(m) ← P(L_i^(m) | R_i, T_1^(m), ..., T_{i-1}^(m))   ▷ sample part's start location
              T_i^(m) ← f(L_i^(m), S_i^(m))         ▷ compose a part's pen trajectory
          end for
          A^(m) ← P(A^(m))                          ▷ sample affine transform
          I^(m) ← P(I^(m) | T^(m), A^(m))           ▷ render and sample the binary image
          return I^(m)
      end procedure
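  The pseudocode translates almost line for line into executable form. Here is a runnable toy version, with the model's learned distributions replaced by simple stand-ins (the primitive inventory, relation set, and all sampling choices below are illustrative assumptions, not the trained model):

      import random

      PRIMITIVES = ["arc", "line", "hook"]  # stand-in for the ~1,250 learned primitives
      RELATIONS = ["independent", "attached at start", "attached at end", "attached along"]

      def generate_type(rng=random):
          """GENERATETYPE: sample parts, sub-parts, and relations, then return
          a handle to a stochastic program that produces tokens of this type."""
          kappa = rng.randint(1, 3)                    # number of parts
          parts = [[rng.choice(PRIMITIVES) for _ in range(rng.randint(1, 3))]
                   for _ in range(kappa)]              # sub-part sequences S_i
          rels = ["independent" if i == 0 else rng.choice(RELATIONS)
                  for i in range(kappa)]               # relations R_i
          return lambda: generate_token(parts, rels, rng)

      def generate_token(parts, rels, rng):
          """GENERATETOKEN: add motor variance and place each part
          (rendering to pixels is omitted in this sketch)."""
          token = []
          for sub_parts, rel in zip(parts, rels):
              start = (rng.gauss(0, 1), rng.gauss(0, 1))        # location L_i
              jitter = [rng.gauss(0, 0.1) for _ in sub_parts]   # motor variance
              token.append({"sub_parts": sub_parts, "relation": rel,
                            "start": start, "jitter": jitter})
          return token

      new_character = generate_type()  # a type is itself a generator...
      exemplar_a, exemplar_b = new_character(), new_character()  # ...of tokens

  The key design choice mirrors the slide: a concept is returned as a program (here a closure), so every call yields a new exemplar with fresh motor noise.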

  30. Learning-to-learn programs: action primitives are learned from background data (growing from a seed primitive to 1,250 primitives), along with learned transitions between primitives. The primitives are scale-selective but translation-invariant.

  31. Learning-to-learn programs: statistics estimated from background drawings include stroke start positions by stroke number; the distribution over the number of strokes per character; the number of sub-strokes for a character with κ strokes (κ = 1, ..., 10); global transformations; and relations between strokes: attached along (50%), independent (34%), attached at end (11%), attached at start (5%). [Figure: histograms of these empirical distributions.]
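  Much of this learning-to-learn reduces to estimating empirical distributions from background drawings. A minimal sketch, assuming a hypothetical data format in which each drawing is a list of strokes and each stroke carries a "relation" tag:

      from collections import Counter

      def stroke_count_prior(drawings):
          """Empirical P(kappa): relative frequency of each stroke count."""
          counts = Counter(len(d) for d in drawings)
          total = sum(counts.values())
          return {k: c / total for k, c in sorted(counts.items())}

      def relation_prior(drawings):
          """Empirical P(relation): attached along, independent, etc."""
          counts = Counter(s["relation"] for d in drawings for s in d)
          total = sum(counts.values())
          return {r: c / total for r, c in counts.items()}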

  32. Inferring latent motor programs. Given latent variables θ, a raw binary image I, primitives, a renderer, and a prior on programs, Bayes' rule gives P(θ | I) = P(I | θ) P(θ) / P(I). The posterior is approximated with a discrete (K = 5) set of weighted parses: P(θ | I) ≈ Σ_{i=1}^{K} w_i δ(θ − θ^[i]) / Σ_{i=1}^{K} w_i, such that w_i ∝ P(θ^[i] | I). Intuition: fit strokes to the observed pixels as closely as possible, under these constraints: fewer strokes; a high-probability primitive sequence; use of relations; plausible stroke order; plausible stroke directions.
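  A minimal sketch of that discrete approximation, assuming a hypothetical log_score(theta, image) that returns log P(theta | I) up to an additive constant (log prior plus log likelihood):

      import math

      def particle_posterior(candidate_parses, image, log_score, K=5):
          """Keep the K highest-scoring parses theta^[i] and normalize
          their weights so that w_i is proportional to P(theta^[i] | I)."""
          scored = sorted(((log_score(t, image), t) for t in candidate_parses),
                          key=lambda pair: pair[0], reverse=True)[:K]
          logs = [s for s, _ in scored]
          m = max(logs)
          raw = [math.exp(s - m) for s in logs]  # subtract max for stability
          z = sum(raw)
          return [(w / z, theta) for w, (_, theta) in zip(raw, scored)]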

  33. Inference. Step 1: represent characters as undirected graphs (binary image → thinned image → cleaned graph, used for planning). Step 2: generate guided random parses, ranked from more likely to less likely. Step 3: top-down fitting with gradient-based optimization, scoring candidate parses by log-probability (e.g., −531, −560, −568, −582, −588).
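  A minimal sketch of Step 1, assuming numpy and scikit-image are available (the actual model uses its own thinning and graph-construction code):

      import numpy as np
      from skimage.morphology import skeletonize

      def skeleton_graph(binary_image):
          """Thin the ink to one pixel wide, then treat skeleton pixels as the
          nodes of an undirected graph, with edges between 8-connected neighbors."""
          skel = skeletonize(binary_image.astype(bool))
          nodes = set(zip(*np.nonzero(skel)))
          edges = {frozenset((p, (p[0] + dr, p[1] + dc)))
                   for p in nodes
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0) and (p[0] + dr, p[1] + dc) in nodes}
          return nodes, edges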

  34. One-shot classification

  35. HBPL: computing the classification score. Which class is image I in? The motor programs inferred for each class example are re-fit to the test image: log P(I | class 1) ≈ −758 versus log P(I | class 2) ≈ −1880, so I is assigned to class 1.
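  The decision rule itself is a simple argmax once those scores exist. A minimal sketch, where log_p_image_given_class is a hypothetical stand-in for HBPL's refitting procedure:

      def classify_one_shot(test_image, class_examples, log_p_image_given_class):
          """Assign the test image to the class whose single example best
          explains it: argmax over classes of log P(I | class)."""
          scores = {label: log_p_image_given_class(test_image, example)
                    for label, example in class_examples.items()}
          best = max(scores, key=scores.get)
          return best, scores

      # With the slide's scores {"class 1": -758, "class 2": -1880},
      # the argmax picks "class 1".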

  36. Comparing human and machine performance on five tasks:
  - One-shot classification (20-way): 4.5% human error rate vs. 3.2% machine error rate.
  - Generating new examples: 51% identification (ID) level, i.e., the percentage of judges who correctly identify machine vs. human output in “Human or Machine?” comparisons (50% is chance).
  - Generating new examples (dynamic): 59% ID level.
  - Generating new concepts (from type): 49% ID level.
  - Generating new concepts (unconstrained): 51% ID level.
