  1. Lecture 18: Depth estimation

  2. Announcements
     • PS9 out tonight: panorama stitching
     • New grading policies from UMich (details TBA)
     • Final presentation will take place over video chat.
       - We’ll send a sign-up sheet next week

  3. Today
     • Stereo matching
     • Probabilistic graphical models
     • Belief propagation
     • Learning-based depth estimation

  4. Basic stereo algorithm
     For each epipolar line:
       For each pixel in the left image:
         • compare with every pixel on the same epipolar line in the right image
         • pick the pixel with the minimum match cost
     Improvement: match windows instead of single pixels.
     Source: N. Snavely

  5. Stereo matching based on SSD
     [Figure: SSD cost plotted against disparity d; the best matching disparity is the d that minimizes the SSD.]
     Source: N. Snavely
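(Not from the slides: a minimal numpy sketch of window-based SSD matching for one pixel along its epipolar line. The window size W, the disparity range max_disp, and the absence of bounds checking are all simplifying assumptions.)

```python
import numpy as np

def ssd_costs(left, right, y, x, W=3, max_disp=64):
    """SSD between the window around left[y, x] and candidate windows
    on the same epipolar line (row y) of the right image."""
    h = W // 2
    patch = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    costs = []
    for d in range(max_disp):  # candidate disparities (no bounds checking)
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.float64)
        costs.append(((patch - cand) ** 2).sum())
    return np.array(costs)

# Best matching disparity = the d minimizing the SSD curve, e.g.:
#   d_best = ssd_costs(left_img, right_img, y=141, x=200).argmin()
```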

  6. Window size
     [Figure: disparity maps computed with window sizes W = 3 and W = 20. Small windows preserve detail but are noisy; large windows are smoother but blur depth boundaries.]
     Source: N. Snavely

  7. Stereo as energy minimization
     • What defines a good stereo correspondence?
       1. Match quality: want each pixel to find a good match in the other image
       2. Smoothness: if two pixels are adjacent, they should (usually) move about the same amount
     Source: N. Snavely

  8. Stereo as energy minimization
     • Find the disparity map d that minimizes an energy function E(d)
     • Simple pixel / window matching:
       E(d) = Σ_(x,y) C(x, y, d(x,y)),
       where C(x, y, d) is the squared distance between the windows at I(x, y) and J(x + d(x, y), y).
     Source: N. Snavely

  9. Stereo as energy minimization
     [Figure: left image I(x, y), right image J(x, y), and the slice of C(x, y, d) at scanline y = 141, plotted over x and d: the disparity space image (DSI).]
     Source: N. Snavely

  10. Stereo as energy minimization
     Simple pixel / window matching: choose the minimum of each column in the DSI independently:
       d(x, y) = argmin_d′ C(x, y, d′)
     [Figure: the y = 141 DSI slice with the per-column minima marked.]
     Source: N. Snavely
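(A hedged sketch, not from the slides: building the full DSI as a cost volume and taking the greedy per-pixel minimum. The window size, disparity range, and the scipy dependency are assumptions.)

```python
import numpy as np
from scipy.ndimage import uniform_filter

def disparity_space_image(left, right, max_disp, W=5):
    """C(y, x, d): window-SSD cost for every pixel and candidate disparity."""
    H, Wimg = left.shape
    C = np.empty((H, Wimg, max_disp))
    for d in range(max_disp):
        diff2 = np.full(left.shape, 1e10)  # huge cost where x - d is out of frame
        diff2[:, d:] = (left[:, d:].astype(np.float64)
                        - right[:, :Wimg - d].astype(np.float64)) ** 2
        C[:, :, d] = uniform_filter(diff2, size=W)  # average over the W x W window
    return C

# Greedy selection: minimize each column of the DSI independently.
#   disparity = disparity_space_image(L_img, R_img, max_disp=64).argmin(axis=2)
```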

  11. Greedy selection of best match
     Source: N. Snavely

  12. Stereo as energy minimization
     • Better objective function: E(d) = E_d(d) + λ E_s(d)
       - Match cost E_d: want each pixel to find a good match in the other image
       - Smoothness cost E_s: adjacent pixels should (usually) move about the same amount
     Source: N. Snavely

  13. Stereo as energy minimization
     Match cost: E_d(d) = Σ_(x,y) C(x, y, d(x,y))
     Smoothness cost: E_s(d) = Σ_((p,q)∈ε) V(d_p, d_q),
     where ε is the set of neighboring pixel pairs (4-connected or 8-connected neighborhood).
     Source: N. Snavely

  14. Smoothness cost
     How do we choose V?
     • L1 distance: V(d_p, d_q) = |d_p − d_q|
     • “Potts model”: V(d_p, d_q) = 0 if d_p = d_q, and 1 otherwise
     Source: N. Snavely
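(A small sketch of the two choices of V named above; the function names are mine.)

```python
def V_l1(dp, dq):
    """L1 distance: penalty grows linearly with the size of the disparity jump."""
    return abs(dp - dq)

def V_potts(dp, dq):
    """Potts model: any jump costs the same, which preserves sharp depth edges."""
    return 0 if dp == dq else 1
```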

  15. Probabilistic interpretation
     Exponentiate: exp(−E(d)) = exp(−(E_d(d) + λ E_s(d)))
     Normalize (make it sum to 1): P(d | I) = (1/Z) exp(−(E_d(d) + λ E_s(d))),
       where Z = Σ_d′ exp(−E(d′))
     Rewrite as P(d | I): low energy corresponds to high probability, so minimizing E is maximizing P.
     Example adapted from Freeman, Torralba, Isola

  16. Probabilistic interpretation
     P(d | I) factors into:
     • “Local evidence”: how good are the matches?
     • “Pairwise compatibility”: is the depth smooth?
     Example adapted from Freeman, Torralba, Isola

  17. Probabilistic interpretation
     Local evidence: φ_i(d_i) = exp(−C(x_i, y_i, d_i))
     Pairwise compatibility: ψ(d_i, d_j) = exp(−λ V(d_i, d_j))
     so that P(d | I) ∝ Π_i φ_i(d_i) · Π_((i,j)∈ε) ψ(d_i, d_j)
     Example adapted from Freeman, Torralba, Isola
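(A sketch of turning the energy terms into these potentials; the variable names and the choice of L1 smoothness are illustrative.)

```python
import numpy as np

def local_evidence(match_costs):
    """phi_i(d) = exp(-C(x_i, y_i, d)); match_costs is one pixel's DSI column."""
    return np.exp(-match_costs)

def pairwise_compatibility(num_labels, lam=1.0):
    """psi(d_i, d_j) = exp(-lambda * V(d_i, d_j)), here with V = L1 distance."""
    d = np.arange(num_labels)
    return np.exp(-lam * np.abs(d[:, None] - d[None, :]))
```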

  18. Probabilistic graphical models
     Graph structure:
     • Open circles for latent variables x_i
       - d_i in our problem
     • Filled circles for observations y_i
       - Pixels in our problem
     • Edges between interacting variables
       - In general, graph cliques for interactions of 3+ variables
     Example adapted from Freeman, Torralba, Isola

  19. Probabilistic graphical models
     Why formulate it this way?
     • Exploit sparse graph structure for fast inference, usually using dynamic programming
     • Can use probabilistic inference methods
     • Provides a framework for learning parameters

  20. Probabilistic graphical models
     • Undirected graphical model: also known as a Markov random field (MRF)
     • Directed graphical model: also known as a Bayesian network (not covered in this course)

  21. Marginalization
     What’s the marginal distribution for x_1? i.e., what’s the probability of x_1 being in a particular state?
       p(x_1 | y) = Σ_(x_2, …, x_N) p(x_1, …, x_N | y)
     • But this is expensive: O(|L|^N) for N nodes with |L| labels each
     • Exploit graph structure!

  22. Marginalization
     Exploiting the chain structure, the sums distribute over the factors:
       p(x_1 | y) ∝ φ_1(x_1) Σ_(x_2) ψ(x_1, x_2) φ_2(x_2) Σ_(x_3) ψ(x_2, x_3) φ_3(x_3)
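(A sketch contrasting brute-force marginalization with the factored sums on a three-node chain; the potentials are random stand-ins.)

```python
import numpy as np

L = 8                              # number of labels per node
rng = np.random.default_rng(0)
phi1, phi2, phi3 = rng.random(L), rng.random(L), rng.random(L)
psi12, psi23 = rng.random((L, L)), rng.random((L, L))

# Brute force: enumerate all |L|^N joint states -- O(L^3) here.
p1 = np.einsum('a,b,c,ab,bc->a', phi1, phi2, phi3, psi12, psi23)

# Factored: push each sum inside -- O(L^2) per edge.
m32 = psi23 @ phi3                 # sum over x3 of psi(x2, x3) * phi3(x3)
m21 = psi12 @ (phi2 * m32)         # sum over x2, reusing m32
p1_fast = phi1 * m21

assert np.allclose(p1, p1_fast)    # identical unnormalized marginals
```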

  23. Message passing
     Can think of the factors φ_i as “local evidence” and the factored sums as message passing:
     • m_32(x_2) = Σ_(x_3) ψ(x_2, x_3) φ_3(x_3): the message that node x_3 sends to node x_2
     • m_21(x_1) = Σ_(x_2) ψ(x_1, x_2) φ_2(x_2) m_32(x_2): the message that x_2 sends to x_1

  24. Message passing
     • The message m_ij is the sum over all states of all nodes in the subtree on node i’s side of the edge (i, j)
     • It summarizes what this node “believes”: e.g., “if you have label x_2, what’s the probability of my subgraph?”
     • Shared computation! E.g., we can reuse m_32 to help estimate p(x_2 | y).

  25. Belief propagation
     • Estimate all marginals p(x_i | y) at once! [Pearl 1982]
     • Given a tree-structured graph, send messages in topological order.
     Sending a message from j to i (sketched below):
     1. Multiply all incoming messages (except for the one from i)
     2. Multiply by the pairwise compatibility
     3. Marginalize over x_j:
       m_ji(x_i) = Σ_(x_j) ψ(x_i, x_j) φ_j(x_j) Π_(k∈N(j)∖{i}) m_kj(x_j)
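(A minimal sum-product update for a tree, following the three steps above; the data structures, dicts of numpy arrays, are my own choice, and psi is assumed symmetric, as smoothness costs are.)

```python
import numpy as np

def send_message(j, i, phi, psi, msgs, neighbors):
    """m_ji(x_i) = sum_{x_j} psi(x_i, x_j) phi_j(x_j) prod_{k in N(j)\\{i}} m_kj(x_j)."""
    prod = phi[j].copy()                 # local evidence at node j
    for k in neighbors[j]:               # 1. multiply incoming messages,
        if k != i:                       #    except the one coming from i
            prod *= msgs[(k, j)]
    m = psi @ prod                       # 2. multiply pairwise compatibility and
    return m / m.sum()                   # 3. marginalize over x_j (normalized)

def belief(i, phi, msgs, neighbors):
    """Estimated marginal p(x_i | y): local evidence times all incoming messages."""
    b = phi[i].copy()
    for k in neighbors[i]:
        b *= msgs[(k, i)]
    return b / b.sum()
```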

  26. General graphs
     • Vision problems are often defined on grid graphs
     • Pretend the graph is tree-structured and do belief propagation iteratively!
     • Can also have compatibility functions over N > 2 variables
       - But complexity is exponential in N!
     Loopy belief propagation (sketched below):
     1. Initialize all messages to 1
     2. Walk through the edges in an arbitrary order (e.g., random)
     3. Apply the message updates
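(A loopy-BP sketch reusing send_message and belief from the previous block; the graph encoding and iteration count are assumptions.)

```python
import numpy as np

def loopy_bp(phi, psi, neighbors, n_iters=10):
    """phi: {node: (L,) evidence}; neighbors: {node: list of adjacent nodes}."""
    L = len(next(iter(phi.values())))
    edges = [(j, i) for j in neighbors for i in neighbors[j]]
    msgs = {e: np.ones(L) / L for e in edges}    # 1. initialize all messages (uniform)
    for _ in range(n_iters):
        for (j, i) in edges:                     # 2. walk edges in a fixed order
            msgs[(j, i)] = send_message(j, i, phi, psi, msgs, neighbors)  # 3. update
    return {i: belief(i, phi, msgs, neighbors) for i in phi}
```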

  27. Finding best labels
     Often we want the labels that jointly maximize the probability:
       argmax_(x_1, x_2, x_3) p(x_1, x_2, x_3 | y)
     This is called maximum a posteriori estimation (MAP estimation).
     Marginal: b(x_1 | y) = Σ_(x_2, x_3) p(x_1, x_2, x_3 | y)
     “Max marginal” instead: b(x_1 | y) = max_(x_2, x_3) p(x_1, x_2, x_3 | y),
     i.e., replace each Σ_(x_j) with max_(x_j) in the message updates.
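(For MAP estimation the same machinery applies with max replacing sum, i.e. max-product; only the marginalization step of send_message changes.)

```python
# Sum-product (marginals): sum over x_j
#   m = psi @ prod
# Max-product (max marginals): take the max over x_j instead
#   m = (psi * prod[None, :]).max(axis=1)
```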

  28. Application to stereo
     [Felzenszwalb & Huttenlocher, “Efficient Belief Propagation for Early Vision”, 2006]

  29. Deep learning + MRF refinement
     [Figure: a query patch from the left image with positive and negative candidate patches from the right image.]
     CNN-based matching + MRF refinement [Zbontar & LeCun, 2015]

  30. Learning to estimate depth without ground truth
     Learn a “volume”: color + occupancy
     [Figure: cameras at multiple viewpoints observing a 3D scene.]
     [Mildenhall*, Srinivasan*, Tancik*, et al., Neural Radiance Fields, 2020]

  31. Learning to estimate depth without ground truth
     A good volume should reconstruct the input views.
     [Mildenhall*, Srinivasan*, Tancik*, et al. 2020]
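(Not the authors' code: a minimal numpy sketch of the reconstruction idea: composite color and occupancy along each camera ray and compare with the observed pixel. The sampling scheme, the network predicting colors and densities, and the training loop are all omitted assumptions.)

```python
import numpy as np

def render_ray(colors, densities, deltas):
    """Front-to-back compositing along one ray.
    colors: (S, 3) RGB at S samples; densities: (S,) occupancy;
    deltas: (S,) spacing between consecutive samples."""
    alpha = 1.0 - np.exp(-densities * deltas)      # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # light surviving so far
    weights = alpha * trans                        # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)  # rendered pixel color
    depth = (weights * np.cumsum(deltas)).sum()    # expected depth, a free by-product
    return rgb, depth

# Training signal: minimize ||rgb - observed pixel||^2 summed over rays from
# all input viewpoints; depth emerges without any depth ground truth.
```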

  32. Learning to estimate depth without ground truth
     [Mildenhall*, Srinivasan*, Tancik*, et al. 2020]

  33. Learning to estimate depth without ground truth
     [Figure panels: view synthesis; inserting virtual objects.]
     [Mildenhall*, Srinivasan*, Tancik*, et al. 2020]

  34. Next class: motion
