  1. COMP 546 Lecture 15: Cue combinations, Bayesian models. Thurs. March 1, 2018

  2. Visual Cues: image properties that can tell us about scene properties.
     - texture gradient (size, shape, density) → surface slant, tilt
     - shading → surface curvature
     - binocular disparities → depth
     - motion (from a moving observer) → depth
     - defocus blur → depth

  3. Last lecture: Likelihood p(I = i | S = s)
     • The probability of measuring image I = i when the scene is S = s (called the "likelihood" of the scene S = s, given the image I = i).
     • Maximum likelihood method: choose the S = s that maximizes p(I = i | S = s).

  4. This lecture: How to combine cues?  p(I1, I2 | S)

  5. Example [Hillis 2004]: texture only (monocular), stereo only, and texture and stereo together.

  6. Assume the likelihood function is "conditionally independent": p(I1, I2 | S) = p(I1 | S) p(I2 | S), e.g. I1 is texture and I2 is binocular disparity.

  7. Assume p(I1 = i1 | S = s) and p(I2 = i2 | S = s) are Gaussian shaped. (Figure: the two likelihood curves p(I1 | S) and p(I2 | S) plotted against S = s.)

  8. Assume p(I1 = i1 | S = s) and p(I2 = i2 | S = s) are Gaussian shaped, with maxima at s1 and s2. Their maxima might occur at different values of s. Why?

  9. We want to find the s that maximizes:
     p(I1 | S = s) p(I2 | S = s) = e^( −(s − s1)² / (2σ1²) ) · e^( −(s − s2)² / (2σ2²) )

  10. We want to find the s that maximizes:
      p(I1 | S = s) p(I2 | S = s) = e^( −(s − s1)² / (2σ1²) ) · e^( −(s − s2)² / (2σ2²) )
      So, we want to find the s that minimizes:
      (s − s1)² / (2σ1²) + (s − s2)² / (2σ2²)
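The minimizing s can be found by setting the derivative with respect to s to zero; a sketch of the step that the lecture notes spell out:

```latex
\frac{d}{ds}\!\left[ \frac{(s-s_1)^2}{2\sigma_1^2} + \frac{(s-s_2)^2}{2\sigma_2^2} \right]
= \frac{s-s_1}{\sigma_1^2} + \frac{s-s_2}{\sigma_2^2} = 0
\quad\Longrightarrow\quad
s = \frac{\sigma_2^2\, s_1 + \sigma_1^2\, s_2}{\sigma_1^2 + \sigma_2^2}
```

Grouping terms gives the linear cue combination on the next slide, with the weights determined by the two variances.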

  11. The lecture notes show that the solution S = s is:
      s = w1 s1 + w2 s2, where w1 + w2 = 1 and 0 < wi < 1.
      ("Linear Cue Combination")

  12. The lecture notes show that the solution S = s is:
      s = w1 s1 + w2 s2, where w1 + w2 = 1 and 0 < wi < 1, with
      w1 = σ2² / (σ1² + σ2²),  w2 = σ1² / (σ1² + σ2²).
      Thus, the less reliable cue (larger σ) gets less weight.
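The linear cue combination above is easy to sketch in code. A minimal Python illustration (the slant values and σ's here are made up, not from the Hillis experiment):

```python
def combine_cues(s1, sigma1, s2, sigma2):
    """Maximum-likelihood combination of two Gaussian-shaped cue estimates.

    Each weight is the other cue's variance over the total variance,
    so the less reliable cue (larger sigma) gets less weight.
    """
    w1 = sigma2**2 / (sigma1**2 + sigma2**2)
    w2 = sigma1**2 / (sigma1**2 + sigma2**2)
    return w1 * s1 + w2 * s2

# Texture suggests slant 30 deg (sigma 4); stereo suggests 20 deg (sigma 2).
# Stereo is more reliable, so the combined estimate lands closer to 20:
s = combine_cues(30.0, 4.0, 20.0, 2.0)  # -> 22.0
```

Here w1 = 4/20 = 0.2 and w2 = 16/20 = 0.8, so the estimate is pulled four times as strongly toward the stereo cue.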

  13. Example [Hillis 2004]: texture only (monocular), stereo only. Measure slant discrimination thresholds for the cues in isolation. Estimate the likelihood function parameters (s1, σ1, s2, σ2).

  14. … then
      • present the cues together (texture and stereo)
      • measure thresholds for S
      • convert the thresholds to likelihood parameters (s, σ)

  15. … then
      • present the cues together (texture and stereo)
      • measure thresholds for S
      • convert the thresholds to likelihood parameters (s, σ)
      • examine whether these values are consistent with the model*: s = w1 s1 + w2 s2
      *The model also makes a prediction about σ in the combined case.
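The footnote's prediction about the combined σ follows from multiplying the two Gaussian-shaped likelihoods; a sketch of the standard product-of-Gaussians result (not spelled out on the slide):

```latex
e^{-\frac{(s-s_1)^2}{2\sigma_1^2}}\,
e^{-\frac{(s-s_2)^2}{2\sigma_2^2}}
\;\propto\;
e^{-\frac{(s-\hat{s})^2}{2\sigma^2}},
\qquad
\frac{1}{\sigma^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}
\;\Longrightarrow\;
\sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
\le \min(\sigma_1^2, \sigma_2^2)
```

So the model predicts that the two-cue threshold should be at least as low as the better single-cue threshold, which is testable in the combined condition.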

  16. The experimenter can manipulate s1, s2, σ1, σ2 and predict the effect on the perceived slant. (Figure: likelihood curves p(I1 | S) and p(I2 | S) for texture and stereo, plotted against S = s.)

  17. COMP 546 Lecture 15: Cue combinations, Bayesian models. Thurs. March 1, 2018

  18. p(I = i | S = s) ≠ p(S = s | I = i)
      The left side is the likelihood of scene s, given image i; the right side is the probability of scene s, given image i. What is the crucial difference?

  19. (Figure: a wire frame with independently chosen depths, a regular solid cube, and a flat drawing.) All the scenes above have the same likelihood p(I = i | S = s). Why do we prefer the regular solid cube? [Kersten & Yuille 2003]

  20. Some scenes may have a larger probability p(S = s). The marginal probability p(S = s) is called the "prior".

  21. p(I | S) ≡ p(I, S) / p(S)  and  p(S | I) ≡ p(I, S) / p(I).
      Thus, p(I | S) p(S) = p(S | I) p(I).

  22. Bayes' Theorem. From p(I | S) ≡ p(I, S) / p(S) and p(S | I) ≡ p(I, S) / p(I):
      p(S | I) = p(I | S) p(S) / p(I)
      i.e. posterior = likelihood × scene prior / image prior.

  23. Maximum 'a posteriori' (MAP): given an image I = i, find the scene S = s that maximizes p(S = s | I = i), where
      p(S | I) = p(I | S) p(S) / p(I)
      (posterior = likelihood × scene prior / image prior).

  24. Maximum 'a posteriori' (MAP): given an image I = i, find the scene S = s that maximizes p(S = s | I = i). We don't care about p(I = i). Why not?

  25. If the prior p(S) is uniform (constant), then maximum likelihood gives the same solution as maximum a posteriori (MAP), since p(S | I) = p(I | S) p(S) / p(I). Interesting cases arise when the prior is non-uniform.
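The ML-versus-MAP distinction can be sketched on a discrete grid of candidate slants. In this toy Python example (the numbers and the "frontoparallel" prior are invented for illustration), p(I = i) is a constant over s and is simply dropped:

```python
import math

def map_estimate(likelihood, prior, candidates):
    """Return the candidate s maximizing likelihood(s) * prior(s).

    p(I = i) does not depend on s, so it can be ignored when maximizing.
    """
    return max(candidates, key=lambda s: likelihood(s) * prior(s))

# Gaussian-shaped likelihood centered on a measured slant of 10 deg (sigma 5).
lik = lambda s: math.exp(-(s - 10.0)**2 / (2 * 5.0**2))

slants = [k * 0.5 for k in range(-60, 61)]  # candidate slants, -30 .. 30 deg

uniform = lambda s: 1.0
# A made-up prior favoring frontoparallel surfaces (slant near 0, sigma 8).
fronto = lambda s: math.exp(-s**2 / (2 * 8.0**2))

ml_answer = map_estimate(lik, uniform, slants)   # uniform prior: same as ML
map_answer = map_estimate(lik, fronto, slants)   # non-uniform prior pulls toward 0
```

With the uniform prior the answer sits at the likelihood peak (10 deg); the frontoparallel prior drags the MAP estimate toward 0, landing at 7 deg on this grid.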

  26. (Figure: likelihood and prior curves.)

  27. Ames Room http://www.youtube.com/watch?v=Ttd0YjXF0no https://www.youtube.com/watch?v=gJhyu6nlGt8

  28. Priors ("Natural Scene Statistics"):
      • intensity
      • orientation of image lines, edges
      • disparity
      • motion
      • surface slant, tilt

  29. Orientation θ of lines and edges: p(S = θ) [Girshick 2011]. People are indeed better at discriminating vertical and horizontal orientations than oblique orientations. Why? Because they use a prior?

  30. Surface slant σ and tilt τ (ceiling vs. floor). Here we represent (slant, tilt) using a concave hemisphere. See next slide.

  31. p(S = (σ, τ)): each disk shows p(σ, τ) for surfaces visible over a range of viewing-direction elevations, relative to the line of sight. [Adams & Elder 2016]

  32. p(S = (σ, τ)) (figure)

  33. p(S = (σ, τ)) (figure)

  34. Maximum a Posteriori (MAP): choose the S = (slant, tilt) that maximizes the posterior
      p(S | I = i) ∝ p(I = i | S) · p(S)
      (posterior ∝ likelihood × prior).

  35. Likelihood functions can have more than one maximum, e.g. p(I = i | S) over the overall (slant, tilt): convex or concave?

  36. Depth Reversal Ambiguity and Shading (see Exercise). The likelihood p(I = i | S) over (slant, tilt) has two maxima: a valley illuminated from the right produces the same shading as a hill illuminated from the left.

  37. What "priors" does the visual system use to resolve such twofold ambiguities? Let's look at a few related examples.

  38. You can perceive the center point as a hill or a valley. When you see it as a hill, you perceive the tilt as 180 deg (leftward). But when you see it as a valley, you perceive the tilt as 0 deg (rightward).

  39. We tend to see the center as a hill. Why ?

  40. We tend to see the center as a valley. Why ?

  41. The visual system uses three priors to resolve the depth reversal ambiguity:
      - surface orientation: p(floor) > p(ceiling)
      - light source direction: p(above) > p(below)
      - 'global' surface curvature: p(convex) > p(concave)

  42. Example in which all three prior assumptions are met: light from above, viewpoint from above (floor), shape is convex.

  43. Example in which all three prior assumptions fail: shape is concave, viewpoint from below (ceiling), light from below.

  44. Convex shape, illuminated from above the line of sight floor ceiling

  45. Concave shape, illuminated from below the line of sight ceiling floor

  46. We showed how people combined the three different "priors". Percent correct in judging a local "hill" or "valley":
      ≈ 50  ± 10 (floor vs. ceiling)  ± 10 (light from above vs. below)  ± 10 (globally convex vs. concave)
      [Langer and Buelthoff, 2001]

  47. Best case, all three priors satisfied: 80%. Worst case, all three violated: 20%.
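The 80%/20% endpoints are consistent with each prior contributing roughly ±10 percentage points around chance (50%). A toy additive check (this simple additive model is an illustration, not the paper's fitted model):

```python
def predicted_percent_correct(floor, light_above, convex, base=50, weight=10):
    """Each satisfied prior adds `weight` points; each violated one subtracts."""
    score = base
    for satisfied in (floor, light_above, convex):
        score += weight if satisfied else -weight
    return score

best = predicted_percent_correct(True, True, True)      # all priors met -> 80
worst = predicted_percent_correct(False, False, False)  # all priors fail -> 20
```

Intermediate conditions, with one or two priors violated, would fall at 40/60 under this scheme.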

  48. These look weird, but in different ways. How ?

  49. Reminder:
      • A2 is due tonight
      • Midterm (optional) is the first class after Study Break
