
CS4495 Computer Vision: Introduction to Recognition, Aaron Bobick



  1. CS4495 Computer Vision: Introduction to Recognition. Aaron Bobick, School of Interactive Computing

  2. What does recognition involve? Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.

  3. Verification: is that a lamp?

  4. Detection: are there people?

  5. Identification: is that Potala Palace?

  6. Object categorization: mountain, tree, building, banner, street lamp, vendor, people

  7. Scene and context categorization • outdoor • city • …

  8. Instance-level recognition problem: John’s car

  9. Generic categorization problem

  10. Object Categorization Task: Given a (small) number of training images of a category, recognize a priori unknown instances of that category and assign the correct category label. K. Grauman, B. Leibe

  11. Object Categorization “Fido”, German shepherd, dog, animal, living being. Which categories are the best for visual identification?

  12. Visual Object Categories Basic Level Categories in human categorization [Rosch 76, Lakoff 87] • The highest level at which category members have similar perceived shape • The highest level at which a single mental image reflects the entire category

  13. Visual Object Categories Basic Level Categories in human categorization [Rosch 76, Lakoff 87] • The level at which human subjects are usually fastest at identifying category members • The first level named and understood by children • The highest level at which a person uses similar motor actions for interaction with category members

  14. Object Categorization “Fido”, German shepherd, dog, animal, living being. Which categories are the best for visual identification?

  15. How many object categories are there? Biederman 1987

  16. Other Types of Categories Functional Categories e.g. chairs = “something you can sit on” K. Grauman, B. Leibe

  17. Other Types of Categories Ad-hoc categories e.g. “something you can find in an office environment” K. Grauman, B. Leibe

  18. Words: Why recognition? • Recognition a fundamental part of perception • e.g., robots, autonomous agents • Organize and give access to visual content • Connect to information • Detect trends and themes • Because it is a very human way of thinking about things…

  19. Autonomous agents able to detect objects http://www.darpa.mil/grandchallenge/gallery.asp

  20. Labeling people

  21. Posing visual queries Belhumeur et al.

  22. Finding visually similar objects

  23. So why is this hard?

  24. Challenges: Robustness • Clutter • Object pose • Illumination Kristen Grauman

  25. Challenges: Robustness • Occlusions • Intra-class appearance • Viewpoint

  26. Challenges: Robustness Realistic scenes are crowded, cluttered, have overlapping objects.

  27. Challenges: Importance of context Fei-Fei, Fergus & Torralba

  28. Challenges: Importance of context Fei-Fei, Fergus & Torralba

  29. Challenges: complexity • Thousands to millions of pixels in an image • 3,000-30,000 human recognizable object categories • 30+ degrees of freedom in the pose of articulated objects (humans) Kristen Grauman

  30. Challenges: complexity • Billions of images indexed by Google Image Search • In 2011, 6 billion photos uploaded per month • Approx. one billion camera phones sold in 2013 • About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991] Kristen Grauman

  31. So what works?

  32. What worked most reliably “yesterday” • Reading license plates (real easy), zip codes, checks Lana Lazebnik

  33. What worked most reliably “yesterday” • Reading license plates, zip codes, checks • Fingerprint recognition Lana Lazebnik

  34. What worked most reliably “yesterday” • Reading license plates, zip codes, checks • Fingerprint recognition • Face detection (today: recognition)

  35. What worked most reliably “yesterday” • Reading license plates, zip codes, checks • Fingerprint recognition • Face detection (today: recognition) • Recognition of flat textured objects (CD covers, book covers, etc.) Lana Lazebnik

  36. Just in: GoogLeNet (2014)

  37. Just in: GoogLeNet: no context needed?

  38. Supervised classification Given a collection of labeled examples, come up with a function that will predict the labels of new examples. (Figure: training examples labeled “four” and “nine”, and a novel input whose label is unknown.) Kristen Grauman

  39. Supervised classification How good is the function we come up with to do the classification? (What does “good” mean?) Depends on: • What mistakes does it make • Cost associated with the mistakes Kristen Grauman

  40. Supervised classification Since we know the desired labels of training data, we want to minimize the expected misclassification

  41. Supervised classification Two general strategies • Use the training data to build a representative probability model; separately model class-conditional densities and priors (Generative) • Directly construct a good decision boundary; model the posterior (Discriminative)

  42. Supervised classification: Generative Given labeled training examples, predict labels for new examples • Notation: $(4 \to 9)$ means the object is a ‘4’ but you call it a ‘9’ • We’ll assume the cost of a correct decision, $(X \to X)$, is zero. Kristen Grauman

  43. Supervised classification: Generative Consider the two-class (binary) decision problem: • $L(4 \to 9)$: loss of classifying a 4 as a 9 • $L(9 \to 4)$: loss of classifying a 9 as a 4 Kristen Grauman

  44. Supervised classification: Generative Risk of a classifier strategy $S$ is the expected loss: $R(S) = \Pr(4 \to 9 \mid \text{using } S)\, L(4 \to 9) + \Pr(9 \to 4 \mid \text{using } S)\, L(9 \to 4)$. We want to choose a classifier so as to minimize this total risk. Kristen Grauman
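
The risk of a strategy can be worked through numerically. The following is a minimal sketch, not taken from the slides: the function name `total_risk` and all error rates and loss values are hypothetical, chosen only to show that the better strategy can change when the two kinds of mistake are priced differently.

```python
def total_risk(p_4_as_9, p_9_as_4, loss_4_as_9, loss_9_as_4):
    """Expected loss of a two-class strategy.

    p_4_as_9: probability that the strategy meets a true '4' and calls it '9'
    p_9_as_4: probability that the strategy meets a true '9' and calls it '4'
    Correct decisions are assumed to cost nothing.
    """
    return p_4_as_9 * loss_4_as_9 + p_9_as_4 * loss_9_as_4


# Hypothetical error profiles for two strategies A and B.
strategy_a = (0.05, 0.10)
strategy_b = (0.12, 0.02)

# Equal costs: B has the lower risk (0.14 versus 0.15).
risk_a_equal = total_risk(*strategy_a, loss_4_as_9=1.0, loss_9_as_4=1.0)
risk_b_equal = total_risk(*strategy_b, loss_4_as_9=1.0, loss_9_as_4=1.0)

# Calling a '4' a '9' is five times as costly: now A has the lower risk.
risk_a_skewed = total_risk(*strategy_a, loss_4_as_9=5.0, loss_9_as_4=1.0)
risk_b_skewed = total_risk(*strategy_b, loss_4_as_9=5.0, loss_9_as_4=1.0)
```

Under these assumed numbers the ranking of the two strategies flips, which is why the loss values have to be fixed before a "best" classifier can be named.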

  45. Supervised classification: minimal risk At best decision boundary, either choice of label yields same expected loss. (Figure axis: feature value $x$.) If we choose class “four” at boundary, expected loss is: $P(\text{class is } 9 \mid x)\, L(9 \to 4) + P(\text{class is } 4 \mid x)\, L(4 \to 4) = P(\text{class is } 9 \mid x)\, L(9 \to 4)$. If we choose class “nine” at boundary, expected loss is: $P(\text{class is } 4 \mid x)\, L(4 \to 9)$. Kristen Grauman

  46. Supervised classification: minimal risk At best decision boundary, either choice of label yields same expected loss. (Figure axis: feature value $x$.) So, best decision boundary is at point $x$ where: $P(\text{class is } 9 \mid x)\, L(9 \to 4) = P(\text{class is } 4 \mid x)\, L(4 \to 9)$. To classify a new point, choose class with lowest expected loss; i.e., choose “four” if: $P(4 \mid x)\, L(4 \to 9) > P(9 \mid x)\, L(9 \to 4)$. Kristen Grauman
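
The minimal-risk rule can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's code: the names `choose_label` and `boundary_posterior` are hypothetical, there are only two classes so that P(9 | x) = 1 - P(4 | x), and correct decisions cost nothing.

```python
def choose_label(p4_given_x, loss_4_as_9, loss_9_as_4):
    """Pick the label with the lower expected loss at feature value x."""
    p9_given_x = 1.0 - p4_given_x
    # Saying "four" is only wrong when the truth is 9, and vice versa.
    expected_loss_say_four = p9_given_x * loss_9_as_4
    expected_loss_say_nine = p4_given_x * loss_4_as_9
    return "four" if expected_loss_say_four < expected_loss_say_nine else "nine"


def boundary_posterior(loss_4_as_9, loss_9_as_4):
    """Value of P(4 | x) at which both labels have equal expected loss.

    Solves (1 - p) * L(9->4) = p * L(4->9) for p.
    """
    return loss_9_as_4 / (loss_4_as_9 + loss_9_as_4)


equal_costs = choose_label(0.6, loss_4_as_9=1.0, loss_9_as_4=1.0)
costly_false_four = choose_label(0.6, loss_4_as_9=1.0, loss_9_as_4=3.0)
```

With equal losses the boundary sits at a posterior of 0.5, so a point with P(4 | x) = 0.6 is labeled "four". If mistaking a 9 for a 4 is three times as costly, the boundary moves to 0.75 and the same point is labeled "nine".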

  47. Supervised classification: minimal risk At best decision boundary, either choice of label yields same expected loss. (Figure: the curves $P(4 \mid x)\, L(4 \to 9)$ and $P(9 \mid x)\, L(9 \to 4)$ plotted against feature value $x$.) So, best decision boundary is at point $x$ where: $P(\text{class is } 9 \mid x)\, L(9 \to 4) = P(\text{class is } 4 \mid x)\, L(4 \to 9)$. How to evaluate these probabilities? Kristen Grauman

  48. Example: learning skin colors Feature $x$ = hue. (Figure: two histograms over hue bins, $P(x \mid \text{skin})$, the percentage of skin pixels in each bin, and $P(x \mid \text{not skin})$.) Kristen Grauman

  49. Example: learning skin colors Feature $x$ = hue. (Figure: two histograms over hue bins, $P(x \mid \text{skin})$, the percentage of skin pixels in each bin, and $P(x \mid \text{not skin})$.) Now we get a new image, and want to label each pixel as skin or non-skin. Kristen Grauman
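
Learning the two hue histograms amounts to counting labeled pixels per bin and normalizing. The sketch below is illustrative only: the function name `hue_histogram`, the choice of 8 bins, the hue range of 0 to 360, and the sample hue values are all assumptions, not data from the lecture.

```python
def hue_histogram(hues, n_bins=8, max_hue=360.0):
    """Fraction of the given pixels in each hue bin, an estimate of P(x | class)."""
    if not hues:
        raise ValueError("need at least one labeled pixel")
    counts = [0] * n_bins
    for h in hues:
        # Clamp so that h == max_hue lands in the last bin.
        idx = min(int(h / max_hue * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(hues) for c in counts]


# Hypothetical hue values of hand-labeled training pixels.
skin_hues = [10, 15, 20, 25, 30, 350, 12, 18]
non_skin_hues = [100, 120, 200, 220, 240, 300, 20, 180]

p_x_given_skin = hue_histogram(skin_hues)
p_x_given_not_skin = hue_histogram(non_skin_hues)
```

Each list sums to one, so an entry can be read directly as the likelihood of observing that hue bin given the class.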

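  50. Bayes rule $P(\text{skin} \mid x) = \dfrac{P(x \mid \text{skin})\, P(\text{skin})}{P(x)}$, where $P(\text{skin} \mid x)$ is the posterior, $P(x \mid \text{skin})$ the likelihood, and $P(\text{skin})$ the prior. Where does the prior come from?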

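  51. Bayes rule in (ab)use Likelihood ratio test (assuming cost of errors is the same): If $P(\text{skin} \mid x) > P(\text{not skin} \mid x)$, classify $x$ as skin … so … if $P(x \mid \text{skin})\, P(\text{skin}) > P(x \mid \text{not skin})\, P(\text{not skin})$, classify as skin (Bayes rule). (If the costs are different, just re-weight.)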
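
A small numeric sketch of this test follows. The function names, the likelihood values, the priors, and the cost parameters are hypothetical; the re-weighting shown is one straightforward reading of "re-weight", namely multiplying each side by the cost of the mistake made when that side is the truth.

```python
def posterior_skin(lik_skin, lik_not_skin, prior_skin):
    """Bayes rule: P(skin | x) from the two likelihoods and the prior."""
    joint_skin = lik_skin * prior_skin
    joint_not_skin = lik_not_skin * (1.0 - prior_skin)
    evidence = joint_skin + joint_not_skin  # P(x)
    if evidence == 0.0:
        # x was never seen in either class; falling back to the prior
        # is an arbitrary choice made for this sketch.
        return prior_skin
    return joint_skin / evidence


def is_skin(lik_skin, lik_not_skin, prior_skin,
            cost_missed_skin=1.0, cost_false_skin=1.0):
    """Compare cost-weighted joint probabilities; P(x) cancels on both sides."""
    return (lik_skin * prior_skin * cost_missed_skin
            > lik_not_skin * (1.0 - prior_skin) * cost_false_skin)


p = posterior_skin(0.6, 0.1, prior_skin=0.2)           # 0.12 / (0.12 + 0.08)
equal_cost = is_skin(0.6, 0.1, prior_skin=0.1)         # compares 0.06 with 0.09
reweighted = is_skin(0.6, 0.1, prior_skin=0.1,
                     cost_missed_skin=2.0)             # compares 0.12 with 0.09
```

With a prior of 0.1 the pixel is rejected under equal costs even though its hue is six times more likely under the skin model, and it is accepted once a missed skin pixel is assumed to cost twice as much as a false one.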

  52. Bayes rule in (ab)use … but I don’t really know the prior $P(\text{skin})$ … but I can assume it is some constant Ω … so with some training data I can estimate Ω … and with the same training data I can measure the likelihood densities of both $P(x \mid \text{skin})$ and $P(x \mid \text{not skin})$. So… I can more or less come up with a rule… Steve Seitz

  53. Example: classifying skin pixels Now for every pixel in a new image, we can estimate the probability that it is generated by skin, $P(\text{skin} \mid x)$: if it exceeds a chosen threshold, classify as skin; otherwise not. Brighter pixels have higher probability of being skin. Kristen Grauman
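
Applied to a whole image, the per-pixel estimate yields a probability map that can then be thresholded. The sketch below assumes histogram likelihoods over 4 hue bins, a hue range of 0 to 360, a prior of 0.3, and a threshold of 0.5; all of these values and both function names are hypothetical.

```python
def skin_probability_map(hue_image, p_x_given_skin, p_x_given_not_skin,
                         prior_skin, max_hue=360.0):
    """Return P(skin | hue) for every pixel of a 2-D list of hue values."""
    n_bins = len(p_x_given_skin)
    prob_map = []
    for row in hue_image:
        out_row = []
        for h in row:
            b = min(int(h / max_hue * n_bins), n_bins - 1)
            joint_skin = p_x_given_skin[b] * prior_skin
            evidence = joint_skin + p_x_given_not_skin[b] * (1.0 - prior_skin)
            # A bin never seen in training gets probability 0 in this sketch.
            out_row.append(joint_skin / evidence if evidence > 0 else 0.0)
        prob_map.append(out_row)
    return prob_map


def threshold(prob_map, theta=0.5):
    """Binary skin mask: True where the posterior exceeds theta."""
    return [[p > theta for p in row] for row in prob_map]


# Hypothetical learned histograms (4 bins) and a tiny 2x2 hue image.
p_skin = [0.8, 0.1, 0.05, 0.05]
p_not_skin = [0.1, 0.3, 0.3, 0.3]
image = [[10, 200],
         [40, 100]]

prob_map = skin_probability_map(image, p_skin, p_not_skin, prior_skin=0.3)
mask = threshold(prob_map)
```

Displaying `prob_map` as a grayscale image would give the kind of picture the slide describes, with brighter pixels where the skin posterior is higher.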

  54. Example: classifying skin pixels Gary Bradski, 1998

  55. Example: classifying skin pixels Gary Bradski, 1998

  56. More general generative models For a given measurement $x$ and set of classes $c_i$, choose $c^*$ by: $c^* = \arg\max_c p(c \mid x) = \arg\max_c p(c)\, p(x \mid c)$
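
For more than two classes the same idea is an argmax over prior times likelihood. A minimal sketch, assuming a discrete measurement x in {0, 1, 2} and made-up class names, priors, and likelihood tables (none of which come from the slides):

```python
def map_class(x, priors, likelihoods):
    """Return the class c maximizing p(c) * p(x | c).

    priors:      dict mapping class name -> p(c)
    likelihoods: dict mapping class name -> function returning p(x | c)
    p(x) is the same for every class, so it can be ignored in the argmax.
    """
    return max(priors, key=lambda c: priors[c] * likelihoods[c](x))


# Hypothetical three-class problem with table-based likelihoods.
priors = {"four": 0.4, "nine": 0.4, "seven": 0.2}
tables = {
    "four": [0.5, 0.5, 0.0],
    "nine": [0.1, 0.8, 0.1],
    "seven": [0.9, 0.1, 0.0],
}
likelihoods = {c: (lambda x, t=t: t[x]) for c, t in tables.items()}

label_for_0 = map_class(0, priors, likelihoods)
label_for_1 = map_class(1, priors, likelihoods)
```

Note that for x = 0 the class with the largest likelihood is "seven", yet "four" wins because of its larger prior.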

  57. Continuous generative models • If $x$ is continuous, need a likelihood density model of $p(x \mid c)$ • Typically parametric: Gaussian or mixture of Gaussians (Figure panels: Gaussian; mixture of Gaussians.)
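
A one-dimensional sketch of such a parametric likelihood follows. It assumes maximum-likelihood estimates of mean and variance and a mixture whose weights and components are given rather than learned; the function names and sample values are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of 1-D samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, components):
    """Density of a mixture; components is a list of (weight, mean, var)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


# Hypothetical feature values observed for one class.
mean, var = fit_gaussian([1.0, 2.0, 3.0])
density_at_mean = gaussian_pdf(mean, mean, var)

# A two-component mixture with assumed (not fitted) parameters.
bimodal = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]
density_between_modes = mixture_pdf(3.0, bimodal)
density_at_mode = mixture_pdf(0.0, bimodal)
```

The single Gaussian needs only two numbers per class, while the mixture can represent a bimodal class at the cost of more parameters; fitting the mixture itself (for example with EM) is outside this sketch.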

  58. Continuous generative models • Why not just some histogram or some KNN (Parzen window) method? • You might… • But you would need lots and lots of data everywhere you might get a point • The whole point of modeling with a parameterized model is not to need lots of data.
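
The data requirement of a non-parametric estimate can be seen in a small sketch. It uses a Parzen window with a Gaussian kernel, which is one common choice and an assumption here; the function name, the samples, and the bandwidth of 0.3 are hypothetical.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Parzen-window estimate: average of Gaussian kernels centred on the samples."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / (len(samples) * norm)


# Hypothetical training values: a cluster near 1.0 and one point at 5.0.
samples = [1.0, 1.2, 0.9, 5.0]

near_data = parzen_density(1.0, samples, bandwidth=0.3)
far_from_data = parzen_density(3.0, samples, bandwidth=0.3)
```

With these assumed values the estimate at 3.0 is practically zero simply because no training point fell nearby, whereas a fitted parametric model would still assign that point a density determined by its few parameters.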

  59. Summary of generative models: + Firm probabilistic grounding + Allows inclusion of prior knowledge + Parametric modeling of likelihood permits using small number of examples + New classes do not perturb previous models + Others: • Can take advantage of unlabelled data • Can be used to generate samples
