
Class logistics: The take-home exam is due tonight at midnight. Next week is spring break. The following week, on Thursday, your project proposals are due. Feel free to ask Xiaoxu or me for feedback or ideas regarding the project.


  1. Belief propagation messages. A message can be thought of as a set of weights on each of your possible states. To send a message: multiply together all the incoming messages, except the one from the node you're sending to, then multiply by the compatibility matrix and marginalize over the sender's states.
$$M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$

  2. Beliefs. To find a node's beliefs: multiply together all the messages coming in to that node.
$$b_j(x_j) = \prod_{k \in N(j)} M_j^k(x_j)$$

  3. Belief and message updates
$$b_j(x_j) = \prod_{k \in N(j)} M_j^k(x_j), \qquad M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$
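These two updates translate directly into code. Below is a minimal numpy sketch of synchronous BP on a pairwise MRF (my own illustration, not from the lecture); the `psi` and `phi` tables are assumed inputs, and local evidence `phi` is folded into the products as in the region-marginal form used later in these slides.

```python
import numpy as np

def bp_iteration(messages, psi, phi, neighbors):
    """One synchronous round of belief propagation on a pairwise MRF.

    messages[(j, i)]: current message from node j to node i, a vector over x_i.
    psi[(i, j)]: pairwise compatibility table, rows indexed by x_i, columns by x_j.
    phi[i]: local evidence vector for node i.
    neighbors[i]: list of neighbors of node i.
    """
    new_messages = {}
    for (j, i) in messages:                        # message j -> i
        # product of local evidence at j and all messages into j, except from i
        prod = phi[j].copy()
        for k in neighbors[j]:
            if k != i:
                prod *= messages[(k, j)]
        # multiply by the compatibility matrix and marginalize over the sender's states x_j
        m = psi[(i, j)] @ prod
        new_messages[(j, i)] = m / m.sum()         # normalize for numerical stability
    return new_messages

def beliefs(messages, phi, neighbors):
    """Node beliefs: local evidence times all incoming messages, normalized."""
    b = {}
    for i in neighbors:
        prod = phi[i].copy()
        for k in neighbors[i]:
            prod *= messages[(k, i)]
        b[i] = prod / prod.sum()
    return b
```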

  4. Optimal solution in a chain or tree: Belief Propagation • “Do the right thing” Bayesian algorithm. • For Gaussian random variables over time: Kalman filter. • For hidden Markov models: forward/backward algorithm (and MAP variant is Viterbi).

  5. No factorization with loops! With the extra compatibility Ψ(x_1, x_3) closing a loop among x_1, x_2, x_3 (observations y_1, y_2, y_3), the sums no longer factor into a chain of local computations:
$$x_1^{MMSE} = \mathrm{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)\, \Psi(x_1, x_3)$$

  6. Justification for running belief propagation in networks with loops • Experimental results: – Error-correcting codes (Kschischang and Frey, 1998; McEliece et al., 1998) – Vision applications (Freeman and Pasztor, 1999; Frey, 2000) • Theoretical results: – For Gaussian processes, means are correct (Weiss and Freeman, 1999) – Large neighborhood local maximum for MAP (Weiss and Freeman, 2000) – Equivalent to the Bethe approximation in statistical physics (Yedidia, Freeman, and Weiss, 2000) – Tree-weighted reparameterization (Wainwright, Willsky, Jaakkola, 2001)

  7. Statistical mechanics interpretation. Free energy = U - TS, where
$$U = \text{avg. energy} = \sum_{\text{states}} p(x_1, x_2, \ldots)\, E(x_1, x_2, \ldots), \qquad T = \text{temperature}, \qquad S = \text{entropy} = -\sum_{\text{states}} p(x_1, x_2, \ldots) \ln p(x_1, x_2, \ldots)$$

  8. Free energy formulation. Defining
$$\Psi_{ij}(x_i, x_j) = e^{-E(x_i, x_j)/T}, \qquad \Phi_i(x_i) = e^{-E(x_i)/T},$$
the probability distribution $P(x_1, x_2, \ldots)$ that minimizes the free energy is precisely the true probability of the Markov network,
$$P(x_1, x_2, \ldots) = \prod_{ij} \Psi_{ij}(x_i, x_j) \prod_i \Phi_i(x_i)$$
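A quick check of that claim (standard statistical mechanics, not spelled out on the slide): minimizing F = U - TS over normalized distributions p gives the Boltzmann distribution, which is exactly the Markov-network probability once E decomposes into the pairwise and single-node terms above.
$$F[p] = \sum_{\mathbf{x}} p(\mathbf{x})\, E(\mathbf{x}) + T \sum_{\mathbf{x}} p(\mathbf{x}) \ln p(\mathbf{x}), \qquad \frac{\partial}{\partial p(\mathbf{x})}\Big( F[p] + \lambda \sum_{\mathbf{x}} p(\mathbf{x}) \Big) = E(\mathbf{x}) + T \ln p(\mathbf{x}) + T + \lambda = 0 \;\Rightarrow\; p(\mathbf{x}) \propto e^{-E(\mathbf{x})/T}$$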

  9. Approximating the Free Energy. Exact: $F[p(x_1, x_2, \ldots, x_N)]$. Mean Field Theory: $F[b_i(x_i)]$. Bethe Approximation: $F[b_i(x_i), b_{ij}(x_i, x_j)]$. Kikuchi Approximations: $F[b_i(x_i), b_{ij}(x_i, x_j), b_{ijk}(x_i, x_j, x_k), \ldots]$

  10. Mean field approximation to free energy (Free energy = U - TS):
$$F_{\mathrm{MeanField}}(b) = \sum_{(ij)} \sum_{x_i, x_j} b_i(x_i)\, b_j(x_j)\, E_{ij}(x_i, x_j) + T \sum_i \sum_{x_i} b_i(x_i) \ln b_i(x_i)$$
The variational free energy is, up to an additive constant, equal to the Kullback-Leibler divergence between b(x) and the true probability, P(x). KL divergence:
$$D_{KL}(b \,\|\, P) = \sum_{x_1, x_2, \ldots} \prod_i b_i(x_i) \ln \frac{\prod_i b_i(x_i)}{P(x_1, x_2, \ldots)}$$

  11. Setting the derivative w.r.t. $b_i$ to zero (corresponds to eq. 18 in the Jordan and Weiss ms.):
$$b_i(x_i) = \alpha \exp\!\left( -\sum_{(ij)} \sum_{x_j} b_j(x_j)\, E_{ij}(x_i, x_j) / T \right)$$
In words: set the probability of each state $x_i$ at node i to be proportional to e to the minus expected energy corresponding to that state, given the expected values of all the neighboring states.
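A minimal numpy sketch of that fixed-point iteration (my own illustration; the pairwise energy tables `E[(i, j)]` and temperature `T` are assumed inputs):

```python
import numpy as np

def mean_field_sweep(b, E, neighbors, T=1.0):
    """One sweep of the mean-field fixed-point equations.

    b[i]: current belief vector over the states of node i.
    E[(i, j)]: pairwise energy table, rows indexed by x_i, columns by x_j
               (assumed stored for both orderings of each edge).
    """
    for i in neighbors:
        # expected energy of each state x_i, given the neighbors' current beliefs
        expected_E = np.zeros_like(b[i])
        for j in neighbors[i]:
            expected_E += E[(i, j)] @ b[j]     # sum_{x_j} E(x_i, x_j) b_j(x_j)
        b[i] = np.exp(-expected_E / T)
        b[i] /= b[i].sum()                     # the proportionality constant alpha
    return b
```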

  12. Bethe Approximation. On tree-like lattices, an exact formula holds ($q_i$ is the number of neighbors of node i):
$$p(x_1, x_2, \ldots, x_N) = \prod_{(ij)} p_{ij}(x_i, x_j) \prod_i [p_i(x_i)]^{1 - q_i}$$
$$F_{\mathrm{Bethe}}(b_i, b_{ij}) = \sum_{(ij)} \sum_{x_i, x_j} b_{ij}(x_i, x_j)\,\big(E_{ij}(x_i, x_j) + T \ln b_{ij}(x_i, x_j)\big) + \sum_i (1 - q_i) \sum_{x_i} b_i(x_i)\,\big(E_i(x_i) + T \ln b_i(x_i)\big)$$

  13. Gibbs Free Energy. Add Lagrange multiplier terms for the normalization and marginalization constraints:
$$F_{\mathrm{Bethe}}(b_i, b_{ij}) + \sum_{(ij)} \gamma_{ij} \Big\{ \sum_{x_i, x_j} b_{ij}(x_i, x_j) - 1 \Big\} + \sum_{(ij)} \sum_{x_j} \lambda_{ij}(x_j) \Big\{ \sum_{x_i} b_{ij}(x_i, x_j) - b_j(x_j) \Big\}$$

  14. Gibbs Free Energy (continued)
$$F_{\mathrm{Bethe}}(b_i, b_{ij}) + \sum_{(ij)} \gamma_{ij} \Big\{ \sum_{x_i, x_j} b_{ij}(x_i, x_j) - 1 \Big\} + \sum_{(ij)} \sum_{x_j} \lambda_{ij}(x_j) \Big\{ \sum_{x_i} b_{ij}(x_i, x_j) - b_j(x_j) \Big\}$$
Set the derivative of the Gibbs free energy w.r.t. the $b_{ij}$ and $b_i$ terms to zero:
$$b_{ij}(x_i, x_j) = k\, \Psi_{ij}(x_i, x_j) \exp\!\big( -\lambda_{ij}(x_i)/T \big), \qquad b_i(x_i) = k\, \Phi_i(x_i) \exp\!\Big( \sum_{j \in N(i)} \lambda_{ij}(x_i) / T \Big)$$

  15. Belief Propagation = Bethe. The Lagrange multipliers $\lambda_{ij}(x_j)$ enforce the constraints
$$b_j(x_j) = \sum_{x_i} b_{ij}(x_i, x_j)$$
Bethe stationary conditions = message update rules, with
$$\lambda_{ij}(x_j) = T \ln \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$

  16. Region marginal probabilities
$$b_i(x_i) = k\, \Phi_i(x_i) \prod_{k \in N(i)} M_i^k(x_i)$$
$$b_{ij}(x_i, x_j) = k\, \Psi_{ij}(x_i, x_j) \prod_{k \in N(i)\setminus j} M_i^k(x_i) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$

  17. Belief propagation equations. The belief propagation equations come from the marginalization constraints:
$$M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$

  18. Results from Bethe free energy analysis • Fixed point of the belief propagation equations iff stationary point of the Bethe approximation. • Belief propagation always has a fixed point. • Connection with variational methods for inference: both minimize approximations to the free energy – variational: usually uses primal variables – belief propagation: fixed-point equations for dual variables. • Kikuchi approximations lead to more accurate belief propagation algorithms. • Other Bethe free energy minimization algorithms: Yuille, Welling, etc.

  19. Kikuchi message-update rules. Groups of nodes send messages to other groups of nodes; a typical choice of Kikuchi cluster is a square of four nodes i, j, k, l. (Figure: update rules for single-node and cluster messages.)

  20. Generalized belief propagation. (Figure: marginal probabilities for nodes in one row of a 10x10 spin glass.)

  21. References on BP and GBP • J. Pearl, 1985 – classic. • Y. Weiss, NIPS 1998 – inspires application of BP to vision. • W. Freeman et al., Learning Low-Level Vision, IJCV 1999 – applications in super-resolution, motion, shading/paint discrimination. • H. Shum et al., ECCV 2002 – application to stereo. • M. Wainwright, T. Jaakkola, A. Willsky – reparameterization version. • J. Yedidia, AAAI 2000 – the clearest place to read about BP and GBP.

  22. Graph cuts • Algorithm: uses node label swaps or expansions as moves to reduce the energy; swaps many labels at once, not just one at a time as with ICM. • Finds which pixel labels to swap using min-cut/max-flow algorithms from network theory. • Can offer bounds on optimality. • See Boykov, Veksler, Zabih, IEEE PAMI 23 (11), Nov. 2001 (available on the web).
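To make the min-cut step concrete, here is a hedged Python sketch (not the Boykov-Veksler-Zabih code) that solves a binary-label MRF exactly with one s-t min cut via networkx; this single cut is the building block that the expansion moves repeat for multi-label problems. The cost names and the toy data are my own.

```python
import networkx as nx

def binary_mrf_map(unary, pairwise, edges):
    """Exact MAP labelling of a binary MRF via one s-t min cut.

    unary[i] = (cost of label 0, cost of label 1) for node i.
    pairwise = cost paid whenever the two endpoints of an edge get
               different labels (an Ising/Potts smoothness term).
    """
    G = nx.DiGraph()
    s, t = "src", "snk"
    for i, (c0, c1) in unary.items():
        G.add_edge(s, i, capacity=c0)   # cut if i ends on the sink side (label 0)
        G.add_edge(i, t, capacity=c1)   # cut if i ends on the source side (label 1)
    for (i, j) in edges:
        G.add_edge(i, j, capacity=pairwise)   # exactly one of these two edges is cut
        G.add_edge(j, i, capacity=pairwise)   # when i and j take different labels
    cut_value, (src_side, _snk_side) = nx.minimum_cut(G, s, t)
    labels = {i: (1 if i in src_side else 0) for i in unary}
    return labels, cut_value

# Toy usage: two pixels that prefer different labels, smoothed together.
labels, energy = binary_mrf_map(
    unary={"a": (0.0, 2.0), "b": (1.5, 0.0)},
    pairwise=1.0,
    edges=[("a", "b")])   # -> labels {"a": 0, "b": 1}, energy 1.0
```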

  23. Comparison of graph cuts and belief propagation. "Comparison of Graph Cuts with Belief Propagation for Stereo, using Identical MRF Parameters," ICCV 2003, Marshall F. Tappen and William T. Freeman.

  24. Ground truth, graph cuts, and belief propagation disparity solution energies

  25. Graph cuts versus belief propagation • Graph cuts consistently gave slightly lower energy solutions for that stereo-problem MRF, although BP ran faster (and there is now a faster graph cuts implementation than the one we used). • However, here's why I still use belief propagation: – It works for any compatibility functions, not a restricted set as with graph cuts. – I find it very intuitive. – Extensions: the sum-product algorithm computes the MMSE estimate, and generalized belief propagation gives very accurate solutions, at a cost in time.

  26. MAP versus MMSE
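For reference (standard definitions, not spelled out on the slide), the two estimators report different functionals of the node marginals computed by BP:
$$\hat{x}_i^{\mathrm{MMSE}} = \sum_{x_i} x_i\, b_i(x_i) \quad \text{(sum-product: mean of the marginal)}, \qquad \hat{x}_i^{\mathrm{MAP}} = \arg\max_{x_i} b_i(x_i) \quad \text{(max-product: mode of the max-marginal)}$$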

  27. Demo: program comparing some methods on a simple MRF (testMRF.m).
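testMRF.m itself is not reproduced here; as a rough Python stand-in, here is a minimal sketch of one of the simpler methods such a comparison would include (ICM on a binary denoising MRF, with made-up data and parameters):

```python
import numpy as np

def icm_denoise(noisy, beta=1.0, sweeps=5):
    """Iterated conditional modes on a binary image MRF.

    Energy: sum_i (x_i - y_i)^2 + beta * sum_{neighbors} [x_i != x_j].
    Each pixel is greedily set to the label that minimizes its local energy.
    """
    x = noisy.copy()
    H, W = x.shape
    for _ in range(sweeps):
        for r in range(H):
            for c in range(W):
                best_label, best_cost = x[r, c], np.inf
                for label in (0, 1):
                    cost = (label - noisy[r, c]) ** 2
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < H and 0 <= cc < W:
                            cost += beta * (label != x[rr, cc])
                    if cost < best_cost:
                        best_label, best_cost = label, cost
                x[r, c] = best_label
    return x

# Toy usage: a noisy binary square.
rng = np.random.default_rng(0)
clean = np.zeros((20, 20)); clean[5:15, 5:15] = 1
noisy = np.abs(clean - (rng.random(clean.shape) < 0.2))
denoised = icm_denoise(noisy, beta=2.0)
```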

  28. Outline of MRF section • Inference in MRF's: – Gibbs sampling, simulated annealing – Iterated conditional modes (ICM) – Variational methods – Belief propagation – Graph cuts • Vision applications of inference in MRF's. • Learning MRF parameters: – Iterative proportional fitting (IPF)

  29. Vision applications of MRF’s • Stereo • Motion estimation • Super-resolution • Many others…

  30. Vision applications of MRF’s • Stereo • Motion estimation • Super-resolution • Many others…

  31. Motion application. (Figure: Markov network with observed image patches connected to hidden scene patches.)

  32. What behavior should we see in a motion algorithm? • Aperture problem • Resolution through propagation of information • Figure/ground discrimination

  33. The aperture problem

  34. The aperture problem

  35. Program demo

  36. Motion analysis: related work • Markov network – Luettgen, Karl, Willsky and collaborators. • Neural network or learning-based – Nowlan & T. J. Sejnowski; Sereno. • Optical flow analysis – Weiss & Adelson; Darrell & Pentland; Ju, Black & Jepson; Simoncelli; Grzywacz & Yuille; Hildreth; Horn & Schunck; etc.

  37. Motion estimation results (maxima of the scene probability distributions displayed). (Figure: image data; inference at iterations 0 and 1.) Initial guesses only show motion at edges.

  38. Motion estimation results (maxima of the scene probability distributions displayed). (Figure: iterations 2 and 3.) Figure/ground is still unresolved here.

  39. Motion estimation results (maxima of the scene probability distributions displayed). (Figure: iterations 4 and 5.) The final result compares well with the vector-quantized true (uniform) velocities.

  40. Vision applications of MRF’s • Stereo • Motion estimation • Super-resolution • Many others…

  41. Super-resolution • Image: low-resolution image • Scene: high-resolution image (the ultimate goal...). (Figure: Markov network of image and scene patches.)

  42. Pixel-based images are not resolution independent; polygon-based graphics images are resolution independent. (Figure panels: pixel replication; cubic spline; cubic spline, sharpened; training-based super-resolution.)

  43. 3 approaches to perceptual sharpening: (1) Sharpening: boost existing high frequencies. (2) Use multiple frames to obtain a higher sampling rate in a still frame. (3) Estimate high frequencies not present in the image, although implicitly defined. In this talk, we focus on (3), which we'll call "super-resolution". (Figure: amplitude vs. spatial frequency spectra for (1) and (3).)

  44. Super-resolution: other approaches • Schultz and Stevenson, 1994 • Pentland and Horowitz, 1993 • Fractal image compression (Polvere, 1998; Iterated Systems) • Astronomical image processing (e.g., Gull and Daniell, 1978; "pixons," http://casswww.ucsd.edu/puetter.html)

  45. Training images, ~100,000 image/scene patch pairs Images from two Corel database categories: “giraffes” and “urban skyline”.

  46. Do a first interpolation. (Figure: low-resolution input and zoomed low-resolution.)

  47. (Figure: low-resolution input, zoomed low-resolution, and full-frequency original.)

  48. Representation. (Figure: zoomed low-frequency image and full-frequency original.)

  49. Representation. (Figure: zoomed low-freq.; full-freq. original; true high freqs.; low-band input, contrast normalized and PCA fitted.) To minimize the complexity of the relationships we have to learn, we remove the lowest frequencies from the input image and normalize the local contrast level.
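A rough sketch of that preprocessing (my own approximation using scipy; the actual filter widths, and the PCA step, are not specified on the slide and are guessed or omitted here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def band_pass_and_normalize(image, low_sigma=3.0, contrast_sigma=5.0, eps=1e-3):
    """Remove the lowest spatial frequencies and normalize local contrast."""
    low = gaussian_filter(image, low_sigma)       # low-frequency band, removed from the input
    band = image - low                            # mid/high frequencies kept as the input representation
    local_energy = np.sqrt(gaussian_filter(band ** 2, contrast_sigma)) + eps
    return band / local_energy, low               # contrast-normalized band, plus the low band
```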

  50. Gather ~100,000 patches. (Figure: training data samples, magnified, pairing low-freq. and high-freq. patches.)

  51. Nearest neighbor estimate. (Figure: input low freqs.; estimated high freqs.; true high freqs.; training data samples, magnified.)

  52. Nearest neighbor estimate. (Figure: input low freqs.; estimated high freqs.; training data samples, magnified.)

  53. Example: input image patch, and closest matches from the database. (Figure: input patch; closest image patches from the database; corresponding high-resolution patches from the database.)
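The database lookup behind this slide is a nearest-neighbor search over stored low-frequency patches; a hedged numpy sketch (the array layout and the L2 metric are assumptions):

```python
import numpy as np

def closest_matches(input_low, db_low, db_high, k=10):
    """Return the k high-frequency candidate patches whose stored
    low-frequency patches are closest (in L2) to the input patch.

    db_low:  (N, d) array of flattened low-frequency training patches.
    db_high: (N, D) array of the corresponding high-frequency patches.
    """
    d2 = np.sum((db_low - input_low.ravel()) ** 2, axis=1)
    idx = np.argsort(d2)[:k]
    return db_high[idx], d2[idx]
```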

  54. Scene-scene compatibility function, Ψ(x_i, x_j). Assume the overlapped regions, d, of the high-resolution patches differ by Gaussian observation noise. This is a uniqueness constraint, not a smoothness constraint.
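Written out (the Gaussian form is what the slide describes; the noise level σ is not given there), the compatibility penalizes disagreement between the two candidates in their overlap region d:
$$\Psi(x_i, x_j) \propto \exp\!\left( -\frac{\lVert d(x_i) - d(x_j) \rVert^2}{2\sigma^2} \right)$$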

  55. Image-scene compatibility function, Φ(x_i, y_i). Assume Gaussian noise takes you from the observed image patch to the synthetic sample.
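Similarly (again with an unspecified noise level σ), the image-scene compatibility compares the observed low-resolution patch y_i with the low-resolution rendering of candidate x_i, written here as y(x_i):
$$\Phi(x_i, y_i) \propto \exp\!\left( -\frac{\lVert y_i - y(x_i) \rVert^2}{2\sigma^2} \right)$$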

  56. Markov network. (Figure: image patches connected to scene patches through Φ(x_i, y_i); neighboring scene patches connected through Ψ(x_i, x_j).)

  57. Belief Propagation. After a few iterations of belief propagation, the algorithm selects spatially consistent high-resolution interpretations for each low-resolution patch of the input image. (Figure: input; iterations 0, 1, and 3.)

  58. Zooming 2 octaves. We apply the super-resolution algorithm recursively, zooming up 2 powers of 2, or a factor of 4 in each dimension. (Figure: 85x51 input; cubic spline zoom to 340x204; maximum-likelihood zoom to 340x204.)

  59. Now we examine the effect of the prior assumptions made about images on the high-resolution reconstruction. First, cubic spline interpolation (cubic spline implies a thin-plate prior). (Figure: original 50x58; true 200x232.)

  60. (Figure: original 50x58; cubic spline result, which implies a thin-plate prior; true 200x232.)

  61. Next, train the Markov network algorithm on a world of random noise images. (Figure: original 50x58; training images; true.)

  62. The algorithm learns that, in such a world, we add random noise when zooming to a higher resolution. (Figure: original 50x58; training images; Markov network result; true.)

  63. Next, train on a world of vertically oriented rectangles. (Figure: original 50x58; training images; true.)

  64. The Markov network algorithm hallucinates those vertical rectangles that it was trained on. (Figure: original 50x58; training images; Markov network result; true.)

  65. Now train on a generic collection of images. (Figure: original 50x58; training images; true.)

  66. The algorithm makes a reasonable guess at the high-resolution image, based on its training images. (Figure: original 50x58; training images; Markov network result; true.)

  67. Generic training images. Next, train on a generic set of training images: a random collection of photographs taken with the same camera as the test image.

  68. (Figure: original 70x70; cubic spline; Markov network, generic training; true 280x280.)

  69. Kodak Imaging Science Technology Lab test. 3 test images, 640x480, to be zoomed up by 4 in each dimension. 8 judges, making 2-alternative, forced-choice comparisons.

  70. Algorithms compared • Bicubic Interpolation • Mitra's Directional Filter • Fuzzy Logic Filter • Vector Quantization • VISTA

  71. (Figure: VISTA, Altamira, and bicubic spline results.)

  72. (Figure: VISTA, Altamira, and bicubic spline results.)

  73. User preference test results “The observer data indicates that six of the observers ranked Freeman’s algorithm as the most preferred of the five tested algorithms. However the other two observers rank Freeman’s algorithm as the least preferred of all the algorithms…. Freeman’s algorithm produces prints which are by far the sharpest out of the five algorithms. However, this sharpness comes at a price of artifacts (spurious detail that is not present in the original scene). Apparently the two observers who did not prefer Freeman’s algorithm had strong objections to the artifacts. The other observers apparently placed high priority on the high level of sharpness in the images created by Freeman’s algorithm.”

  74. Training images
