COMP 546 Lecture 15 Cue combinations, Bayesian models Thurs. March 1, 2018
Visual Cues: image properties that can tell us about scene properties
Image | Scene
texture (size, shape, density) | depth gradient (slant, tilt)
shading | surface curvature
binocular disparities, motion (from moving observer), defocus blur | depth
Last lecture: Likelihood $p(I = i \mid S = s)$
• Probability of measuring image $I = i$ when the scene is $S = s$ (called the "likelihood" of scene $S = s$, given the image $I = i$).
• Maximum likelihood method: choose the $S = s$ that maximizes $p(I = i \mid S = s)$.
This lecture: How to combine cues? $p(I_1, I_2 \mid S)$
Example [Hillis 2004]: texture only (monocular), stereo only, texture and stereo.
Assume the likelihood function is "conditionally independent": $p(I_1, I_2 \mid S) = p(I_1 \mid S)\, p(I_2 \mid S)$, e.g. $I_1$ is texture and $I_2$ is binocular disparity.
[Figure: likelihood curves $p(I_1 \mid S)$ and $p(I_2 \mid S)$ plotted against $S = s$, with peaks at $s_1$ and $s_2$.] Assume $p(I_1 = i_1 \mid S = s)$ and $p(I_2 = i_2 \mid S = s)$ are Gaussian shaped. Their maxima might occur at different values of $s$. Why?
We want to find the $s$ that maximizes: $p(I_1 \mid S = s)\, p(I_2 \mid S = s) = e^{-\frac{(s - s_1)^2}{2\sigma_1^2}}\, e^{-\frac{(s - s_2)^2}{2\sigma_2^2}}$. So, we want to find the $s$ that minimizes: $\frac{(s - s_1)^2}{2\sigma_1^2} + \frac{(s - s_2)^2}{2\sigma_2^2}$
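The minimization can be worked through in one step. This is a sketch of the standard calculus argument, assuming the two Gaussian-shaped likelihoods above with peaks $s_1$, $s_2$ and widths $\sigma_1$, $\sigma_2$; the name $E(s)$ for the quantity being minimized is introduced here only for convenience.

```latex
E(s) = \frac{(s - s_1)^2}{2\sigma_1^2} + \frac{(s - s_2)^2}{2\sigma_2^2}
\qquad
\frac{dE}{ds} = \frac{s - s_1}{\sigma_1^2} + \frac{s - s_2}{\sigma_2^2} = 0
\quad\Longrightarrow\quad
s = \frac{s_1/\sigma_1^2 + s_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}
```

Since $E(s)$ is a sum of two upward parabolas, this stationary point is the unique minimum.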
The lecture notes show that the solution $S = s$ is $s = w_1 s_1 + w_2 s_2$, where $w_1 + w_2 = 1$ and $0 < w_i < 1$ ("Linear Cue Combination"). The weights are $w_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$ and $w_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$. Thus, the less reliable cue (larger $\sigma$) gets less weight.
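The weighting rule can be checked numerically. The sketch below computes the weights and the combined estimate, then compares the result with a brute-force search over a grid of candidate slants. The function name `combine_cues` and all numerical values are made up for illustration; they are not from [Hillis 2004].

```python
import numpy as np

def combine_cues(s1, sigma1, s2, sigma2):
    """Linear cue combination for two Gaussian-shaped likelihoods.

    Returns (w1, w2, s_hat), where s_hat = w1 * s1 + w2 * s2.
    """
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    w1 = var2 / (var1 + var2)  # cue 1 is weighted by the variance of cue 2
    w2 = var1 / (var1 + var2)  # and vice versa, so the noisier cue counts less
    return w1, w2, w1 * s1 + w2 * s2

# Illustrative (hypothetical) values: texture peaks at 30 deg, stereo at 40 deg,
# and texture is the less reliable cue.
w1, w2, s_hat = combine_cues(s1=30.0, sigma1=4.0, s2=40.0, sigma2=2.0)

# Brute-force check: maximize the log of the product of the two likelihoods.
s_grid = np.linspace(0.0, 90.0, 9001)  # candidate slants, step 0.01 deg
log_lik = (-(s_grid - 30.0) ** 2 / (2 * 4.0 ** 2)
           - (s_grid - 40.0) ** 2 / (2 * 2.0 ** 2))
s_brute = s_grid[np.argmax(log_lik)]

print(w1, w2, s_hat, s_brute)
```

With these values the combined estimate lies closer to the stereo peak, because stereo has the smaller $\sigma$.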
Example [Hillis 2004]: texture only (monocular), stereo only. Measure slant discrimination thresholds for cues in isolation. Estimate likelihood function parameters ($s_1, \sigma_1, s_2, \sigma_2$).
… then
• present cues together (texture and stereo)
• measure thresholds for $S$
• convert thresholds to likelihood parameters ($s$, $\sigma$)
• examine if these values are consistent with the model* $s = w_1 s_1 + w_2 s_2$
*Model also makes a prediction about $\sigma$ in the combined case.
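The slide does not state the prediction for the combined $\sigma$. A standard result, assumed here, is that the product of two Gaussian-shaped functions of $s$ is again Gaussian shaped, with $1/\sigma^2 = 1/\sigma_1^2 + 1/\sigma_2^2$. A minimal sketch, with a hypothetical function name and made-up values:

```python
import math

def predicted_combined_sigma(sigma1, sigma2):
    """Width of the product of two Gaussian-shaped likelihoods.

    Uses 1/sigma^2 = 1/sigma1^2 + 1/sigma2^2 (standard result, assumed here).
    """
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    return math.sqrt(var1 * var2 / (var1 + var2))

# Hypothetical single-cue widths (deg): texture 4.0, stereo 2.0.
sigma_combined = predicted_combined_sigma(4.0, 2.0)
print(sigma_combined)
```

Under this assumption the predicted combined $\sigma$ is never larger than the smaller of the two single-cue values, so thresholds measured with both cues present should be at least as good as with the better cue alone.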
[Figure: texture and stereo likelihoods $p(I_1 \mid S)$ and $p(I_2 \mid S)$ plotted against $S = s$, with peaks at $s_1$ and $s_2$.] The experimenter can manipulate $s_1, s_2, \sigma_1, \sigma_2$ and predict the effect on perception of slant.
$p(I = i \mid S = s) \ne p(S = s \mid I = i)$. The left side is the likelihood of scene $s$, given image $i$; the right side is the probability of scene $s$, given image $i$. What is the crucial difference?
[Figure: three scenes: wire frame with independently chosen depths, regular solid cube, flat drawing.] All scenes above have the same likelihood $p(I = i \mid S = s)$. Why do we prefer the regular solid cube? [Kersten & Yuille 2003]
Some scenes may have a larger probability $p(S = s)$. The marginal probability $p(S = s)$ is called the "prior".
Bayes Theorem
$p(I \mid S) \equiv \frac{p(I, S)}{p(S)}$ and $p(S \mid I) \equiv \frac{p(I, S)}{p(I)}$
Thus, $p(I \mid S)\, p(S) = p(S \mid I)\, p(I)$, i.e.
$p(S \mid I) = \frac{p(I \mid S)\, p(S)}{p(I)}$
where $p(S \mid I)$ is the posterior, $p(I \mid S)$ the likelihood, $p(S)$ the scene prior, and $p(I)$ the image prior.
Maximum "a Posteriori" (MAP): Given an image $I = i$, find the scene $S = s$ that maximizes the posterior $p(S = s \mid I = i) = \frac{p(I = i \mid S = s)\, p(S = s)}{p(I = i)}$. We don't care about $p(I = i)$. Why not?
If the prior $p(S)$ is uniform, then maximum likelihood gives the same solution as maximum a posteriori (MAP): in $p(S \mid I) = \frac{p(I \mid S)\, p(S)}{p(I)}$, the scene prior $p(S)$ is a constant. Interesting cases arise when the prior is non-uniform.
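A small grid computation illustrates the point. The sketch below compares the maximum likelihood estimate with the MAP estimate under a uniform prior and under a non-uniform prior. The Gaussian shapes, the peak positions, and the helper names are illustrative assumptions, not values from the lecture.

```python
import numpy as np

s = np.linspace(0.0, 90.0, 901)  # candidate scene values (e.g. slant in deg)

def gaussian_shape(x, mu, sigma):
    """Unnormalized Gaussian-shaped function of x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Hypothetical likelihood p(I = i | S = s), peaked at 40 deg.
likelihood = gaussian_shape(s, 40.0, 10.0)

# Two hypothetical priors p(S = s): uniform, and peaked at 20 deg.
uniform_prior = np.full_like(s, 1.0 / s.size)
peaked_prior = gaussian_shape(s, 20.0, 10.0)
peaked_prior /= peaked_prior.sum()

def map_estimate(likelihood, prior):
    # p(I = i) does not depend on s, so it cannot change where the maximum is;
    # maximizing likelihood * prior is enough.
    return s[np.argmax(likelihood * prior)]

s_ml = s[np.argmax(likelihood)]
s_map_uniform = map_estimate(likelihood, uniform_prior)
s_map_peaked = map_estimate(likelihood, peaked_prior)
print(s_ml, s_map_uniform, s_map_peaked)
```

With the uniform prior the MAP estimate coincides with the maximum likelihood estimate; with the peaked prior it is pulled from the likelihood peak toward the prior peak.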
[Figure: likelihood and prior.]
Ames Room http://www.youtube.com/watch?v=Ttd0YjXF0no https://www.youtube.com/watch?v=gJhyu6nlGt8
Priors ("Natural Scene Statistics")
• intensity
• orientation of image lines, edges
• disparity
• motion
• surface slant, tilt
Orientation $\theta$ of lines, edges: prior $p(\Theta = \theta)$ [Girshick 2011]. People are indeed better at discriminating vertical and horizontal orientations than oblique orientations. Why? Because they use a prior?
Surface slant $\sigma$ and tilt $\tau$. [Figure labels: ceiling, floor.] Here we represent (slant, tilt) using a concave hemisphere. See next slide.
$p(S = (\sigma, \tau))$: each disk shows $p(\sigma, \tau)$ for surfaces visible over a range of viewing direction elevations, relative to the line of sight. Slants and tilts are represented using a concave hemisphere. [Adams & Elder 2016]
Maximum a Posteriori (MAP): Choose the $S$ = (slant, tilt) that maximizes the posterior, $p(S \mid I = i) \propto p(I = i \mid S)\, p(S)$ (posterior ∝ likelihood × prior).
Likelihood functions $p(I = i \mid S)$ over (slant, tilt) can have more than one maximum, i.e. is the overall shape convex or concave?
Depth Reversal Ambiguity and Shading (see Exercise). [Figure: likelihood $p(I = i \mid S)$ over (slant, tilt).] A valley illuminated from the right produces the same shading as a hill illuminated from the left.
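The ambiguity can be reproduced with a simple shading model. The sketch below assumes a Lambertian surface, a distant light source, a 1-D height profile, and no cast shadows; none of these modelling choices are stated on the slide, and the function name is hypothetical.

```python
import numpy as np

def lambertian_shading(z, x, light):
    """Shading of a 1-D height profile z(x) lit from direction light = (lx, lz).

    lx < 0 means light from the left, lx > 0 from the right; lz > 0 points
    toward the viewer. Assumes Lambertian reflectance, ignores cast shadows.
    """
    slope = np.gradient(z, x)
    normal = np.stack([-slope, np.ones_like(slope)])  # surface normal (nx, nz)
    normal /= np.linalg.norm(normal, axis=0)
    light = np.asarray(light, dtype=float)
    light = light / np.linalg.norm(light)
    return np.clip(light @ normal, 0.0, None)  # n . l, zero where facing away

x = np.linspace(-1.0, 1.0, 201)
hill = np.exp(-x ** 2 / (2 * 0.3 ** 2))  # bump toward the viewer
valley = -hill                           # same profile, depth reversed

shade_hill_left = lambertian_shading(hill, x, (-1.0, 1.0))
shade_valley_right = lambertian_shading(valley, x, (1.0, 1.0))
print(np.allclose(shade_hill_left, shade_valley_right))
```

Reversing the depth flips the sign of the slope, and moving the light to the other side flips the sign of its horizontal component, so the dot product between normal and light direction is unchanged at every point.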
What "priors" does the visual system use to resolve such twofold ambiguities? Let's look at a few related examples.
You can perceive the center point as a hill or a valley. When you see it as a hill, you perceive the tilt as 180 deg (leftward). But when you see it as a valley, the tilt is 0 deg (rightward).
We tend to see the center as a hill. Why ?
We tend to see the center as a valley. Why ?
The visual system uses three priors to resolve the depth reversal ambiguity:
- surface orientation: p(floor) > p(ceiling)
- light source direction: p(above) > p(below)
- "global" surface curvature: p(convex) > p(concave)
Example in which all three prior assumptions are met: light from above, viewpoint from above (floor), shape is convex.
Example in which all three prior assumptions fail: shape is concave, viewpoint from below (ceiling), light from below.
Convex shape, illuminated from above the line of sight. [Figure panels: floor, ceiling.]
Concave shape, illuminated from below the line of sight. [Figure panels: ceiling, floor.]
We showed how people combined the three different "priors" [Langer and Buelthoff, 2001]. Percent correct in judging local "hill" or "valley":
= 50
+/- 10 floor vs. ceiling
+/- 10 light from above vs. below
+/- 10 globally convex vs. concave
Best: 80%. Worst: 20%.
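The additive summary above can be enumerated over all eight conditions. This sketch takes the slide's rule literally (a baseline of 50 plus or minus 10 for each prior, with the plus sign when the prior's assumption is met); the function and variable names are hypothetical, and the numbers are the slide's round summary rather than raw data.

```python
from itertools import product

def predicted_percent_correct(floor, light_above, convex):
    """Additive reading of the slide: 50, +/- 10 for each of the three priors."""
    score = 50
    score += 10 if floor else -10        # surface orientation prior
    score += 10 if light_above else -10  # light source direction prior
    score += 10 if convex else -10       # global curvature prior
    return score

# Keys are (floor, light_above, convex) for all eight combinations.
table = {cond: predicted_percent_correct(*cond)
         for cond in product([True, False], repeat=3)}

best = max(table.values())
worst = min(table.values())
print(best, worst)
```

The best case is the condition where all three assumptions hold and the worst is the one where all three fail, matching the 80% and 20% figures.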
These look weird, but in different ways. How ?
Reminder
• A2 is due tonight
• Midterm (optional) is first class after Study Break