

  1. Class logistics
     Bill Freeman, MIT 6.869, March 29, 2005
     • Exam results back today.
     • This Thursday, your project proposals are due.
       – Feel free to ask Xiaoxu or me for feedback or ideas regarding the project.
       – Auditors are welcome to do a project, and we'll read them and give feedback.

     A medley of project ideas
     • Implement a photographic vs. photorealistic discrimination function.
     • Read and compare 3 papers on a computer vision / machine learning topic that excites you.
     • Evaluate how well the "visual gist" (blurry texture representation) does at categorizing a large collection of images.
     • Implement and evaluate example-based super-resolution.
     • Implement and evaluate Soatto's temporal texture model (conceptually simple; neat results).
     • Digitize a bird book, and make SVM classifiers for owls, pelicans, eagles, etc.
     • Make a broken glass detector.
     • Ask, and answer, what is the dimensionality of the manifold of image patches of various sizes?
     • Digitize tree identification books, and develop a texture-based classifier that will categorize trees from their leaf/needle textures.

     Making probability distributions modular, and therefore tractable: probabilistic graphical models
     Vision is a problem involving the interactions of many variables: things can seem hopelessly complex. Everything is made tractable, or at least simpler, if we modularize the problem. That's what probabilistic graphical models do, and let's examine that.

     Readings: the Jordan and Weiss intro article (fantastic!) and the Kevin Murphy web page (comprehensive, with pointers to many advanced topics).

     By the chain rule, P(a, b) = P(b | a) P(a), so for any probability distribution we have

       P(x_1, x_2, x_3, x_4, x_5) = P(x_1) P(x_2, x_3, x_4, x_5 | x_1)
                                  = P(x_1) P(x_2 | x_1) P(x_3, x_4, x_5 | x_1, x_2)
                                  = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2) P(x_4, x_5 | x_1, x_2, x_3)
                                  = P(x_1) P(x_2 | x_1) P(x_3 | x_1, x_2) P(x_4 | x_1, x_2, x_3) P(x_5 | x_1, x_2, x_3, x_4).

     But if we exploit the assumed modularity of the probability distribution over the 5 variables (in this case, the assumed Markov chain structure x_1 - x_2 - x_3 - x_4 - x_5), then that expression simplifies:

       P(x_1, x_2, x_3, x_4, x_5) = P(x_1) P(x_2 | x_1) P(x_3 | x_2) P(x_4 | x_3) P(x_5 | x_4).

     Now our marginalization summations distribute through those terms:

       \sum_{x_2, x_3, x_4, x_5} P(x_1, x_2, x_3, x_4, x_5) = P(x_1) \sum_{x_2} P(x_2 | x_1) \sum_{x_3} P(x_3 | x_2) \sum_{x_4} P(x_4 | x_3) \sum_{x_5} P(x_5 | x_4).
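The distributive trick in that last line is easy to verify numerically. Below is a minimal sketch (the 10-state variables, random probability tables, and variable names are illustrative assumptions, not from the lecture): marginalizing the full joint of a 5-node Markov chain over x_2, ..., x_5 gives the same result as pushing each sum inward and collapsing one variable at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10                                        # 10 discrete states per variable

# P(x1) and conditional tables P(x_{t+1} | x_t); each row of a table sums to 1
p1 = rng.random(K)
p1 /= p1.sum()
cond = [rng.random((K, K)) for _ in range(4)]
cond = [c / c.sum(axis=1, keepdims=True) for c in cond]

# Brute force: build the full joint P(x1,...,x5), then sum over x2..x5
joint = np.einsum('a,ab,bc,cd,de->abcde', p1, *cond)
marg_brute = joint.sum(axis=(1, 2, 3, 4))

# Distributed partial sums, innermost sum first; each step collapses one variable
m = cond[3].sum(axis=1)                       # sum_{x5} P(x5|x4)
m = cond[2] @ m                               # sum_{x4} P(x4|x3) m(x4)
m = cond[1] @ m                               # sum_{x3} P(x3|x2) m(x3)
m = cond[0] @ m                               # sum_{x2} P(x2|x1) m(x2)
marg_fast = p1 * m

assert np.allclose(marg_brute, marg_fast)     # both equal P(x1) for this chain
```

Each matrix-vector step touches only one pair of neighboring variables, which is exactly why the partial-sum version needs far fewer additions than summing the full joint.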

  2. Belief propagation
     Performing the marginalization by doing the partial sums is called "belief propagation":

       \sum_{x_2, x_3, x_4, x_5} P(x_1, x_2, x_3, x_4, x_5) = P(x_1) \sum_{x_2} P(x_2 | x_1) \sum_{x_3} P(x_3 | x_2) \sum_{x_4} P(x_4 | x_3) \sum_{x_5} P(x_5 | x_4).

     In this example, it has saved us a lot of computation. Suppose each variable has 10 discrete states. Then, not knowing the special structure of P, we would have to perform 10,000 additions (10^4) to marginalize over the four variables. But doing the partial sums on the right-hand side, we only need 40 additions (10*4) to perform the same marginalization!

     Another modular probabilistic structure, more common in vision problems, is an undirected graph (here, the chain x_1 - x_2 - x_3 - x_4 - x_5). The joint probability for this graph is given by

       P(x_1, x_2, x_3, x_4, x_5) = \Phi(x_1, x_2) \Phi(x_2, x_3) \Phi(x_3, x_4) \Phi(x_4, x_5),

     where \Phi(x_1, x_2) is called a "compatibility function". We can define compatibility functions that result in the same joint probability as for the directed graph described in the previous slides; for that example, we could use either form.

     Markov Random Fields
     • Allow rich probabilistic models for images.
     • But built in a local, modular way: learn local relationships, get global effects out. (Winkler, 1995, p. 32)

     MRF nodes can represent pixels or patches. With image patches y and scene patches x, the network joint probability is

       P(x, y) = (1/Z) \prod_{i,j} \Psi(x_i, x_j) \prod_i \Phi(x_i, y_i),

     where \Psi(x_i, x_j) is the scene-scene compatibility function between neighboring scene nodes, and \Phi(x_i, y_i) is the image-scene compatibility function between a scene node and its local observation.
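The claim that compatibility functions can be chosen to reproduce the directed chain's joint probability can be made concrete with a small sketch (the random conditional tables and the particular choice of folding P(x_1) into the first pairwise factor are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10
p1 = rng.random(K)
p1 /= p1.sum()
cond = [rng.random((K, K)) for _ in range(4)]          # cond[t] holds P(x_{t+2} | x_{t+1})
cond = [c / c.sum(axis=1, keepdims=True) for c in cond]

# Joint of the directed chain: P(x1) P(x2|x1) P(x3|x2) P(x4|x3) P(x5|x4)
joint_directed = np.einsum('a,ab,bc,cd,de->abcde', p1, *cond)

# One valid choice of pairwise compatibilities for the undirected chain:
#   Phi(x1, x2) = P(x1) P(x2|x1),  Phi(x_t, x_{t+1}) = P(x_{t+1}|x_t) for t > 1
phi = [p1[:, None] * cond[0]] + cond[1:]

joint_undirected = np.einsum('ab,bc,cd,de->abcde', *phi)
assert np.allclose(joint_directed, joint_undirected)
```

Other factorizations work too (for example, splitting P(x_1) across several factors); the compatibility functions are not unique.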

  3. Outline of MRF section
     • Inference in MRFs.
     • Vision applications of inference in MRFs.
     • Learning MRF parameters.

     In order to use MRFs:
     • Given observations y, and the parameters of the MRF, how do we infer the hidden variables x?
       – Gibbs sampling, simulated annealing
       – Iterated conditional modes (ICM)
       – Variational methods
       – Belief propagation
       – Graph cuts
     • How do we learn the parameters of the MRF?
       – Iterative proportional fitting (IPF)

     Belief propagation messages
     A message can be thought of as a set of weights on each of your possible states. To send a message: multiply together all the incoming messages, except the one from the node you are sending to, then multiply by the compatibility matrix and marginalize over the sender's states:

       M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j) \setminus i} M_j^k(x_j).

     Beliefs
     To find a node's beliefs: multiply together all the messages coming in to that node:

       b_j(x_j) = \prod_{k \in N(j)} M_j^k(x_j).

     These belief and message updates give the optimal solution in a chain or tree.

     Belief propagation
     • "Do the right thing" Bayesian algorithm.
     • For Gaussian random variables over time: the Kalman filter.
     • For hidden Markov models: the forward/backward algorithm (and the MAP variant is the Viterbi algorithm).
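Here is a minimal sketch of these message and belief updates on a 3-node chain MRF with local evidence (the node count, random tables, and names are assumptions for illustration); since a chain is a tree, the resulting beliefs should match the exact marginals:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
phi = [rng.random(K) for _ in range(3)]        # local evidence Phi(x_i, y_i) with y fixed
psi = [rng.random((K, K)) for _ in range(2)]   # psi[0] couples nodes (0,1), psi[1] couples (1,2)

def normalize(v):
    return v / v.sum()

# Messages along the chain 0 - 1 - 2: the sender multiplies its incoming messages
# (except the one from the recipient) and its local evidence, applies the
# compatibility matrix, and marginalizes over its own states
m01 = normalize(psi[0].T @ phi[0])             # node 0 -> node 1
m12 = normalize(psi[1].T @ (phi[1] * m01))     # node 1 -> node 2
m21 = normalize(psi[1] @ phi[2])               # node 2 -> node 1
m10 = normalize(psi[0] @ (phi[1] * m21))       # node 1 -> node 0

# Beliefs: product of local evidence and all incoming messages
beliefs = [normalize(phi[0] * m10),
           normalize(phi[1] * m01 * m21),
           normalize(phi[2] * m12)]

# Check against brute-force marginals of the joint (BP is exact on a chain/tree)
joint = np.einsum('a,b,c,ab,bc->abc', phi[0], phi[1], phi[2], psi[0], psi[1])
joint /= joint.sum()
for axis, b in enumerate(beliefs):
    others = tuple(i for i in range(3) if i != axis)
    assert np.allclose(b, joint.sum(axis=others))
```

In this sketch the observation terms \Phi(x_i, y_i) are folded directly into each belief and outgoing message, which plays the role of the message arriving from the observed node in the MRF form P(x, y) = (1/Z) \prod \Psi(x_i, x_j) \prod \Phi(x_i, y_i) used on the previous slides.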

  4. Justification for running belief propagation in networks with loops
     • Experimental results:
       – Error-correcting codes (Kschischang and Frey, 1998; McEliece et al., 1998)
       – Vision applications (Freeman and Pasztor, 1999; Frey, 2000)
     • Theoretical results:
       – For Gaussian processes, the means are correct. (Weiss and Freeman, 1999)
       – Large neighborhood local maximum for MAP. (Weiss and Freeman, 2000)
       – Equivalent to the Bethe approximation in statistical physics. (Yedidia, Freeman, and Weiss, 2000)
       – Tree-weighted reparameterization. (Wainwright, Willsky, and Jaakkola, 2001)

     No factorization with loops!
     With a loop among x_1, x_2, x_3 (with observations y_1, y_2, y_3), the sums no longer factor, because the compatibility \Psi(x_1, x_3) ties the innermost sum back to x_1:

       x_{1,MMSE} = mean_{x_1} \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2) \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3) \Psi(x_2, x_3) \Psi(x_1, x_3).

     Statistical mechanics interpretation
     Free energy = U - T S, where
       U = average energy = \sum_{states} P(x_1, x_2, ...) E(x_1, x_2, ...),
       T = temperature,
       S = entropy = -\sum_{states} P(x_1, x_2, ...) \ln P(x_1, x_2, ...).

     Free energy formulation
     Defining

       \Psi_{ij}(x_i, x_j) = e^{-E(x_i, x_j)/T},   \Phi_i(x_i) = e^{-E(x_i)/T},

     the probability distribution P(x_1, x_2, ...) that minimizes the free energy is precisely the true probability of the Markov network,

       P(x_1, x_2, ...) = \prod_{ij} \Psi_{ij}(x_i, x_j) \prod_i \Phi_i(x_i).

     Approximating the free energy
       Exact:                  F[p(x_1, x_2, ..., x_N)]
       Mean field theory:      F[b_i(x_i)]
       Bethe approximation:    F[b_i(x_i), b_{ij}(x_i, x_j)]
       Kikuchi approximations: F[b_i(x_i), b_{ij}(x_i, x_j), b_{ijk}(x_i, x_j, x_k), ...]

     Mean field approximation to the free energy:

       F_{MeanField}(b) = \sum_{(ij)} \sum_{x_i, x_j} b_i(x_i) b_j(x_j) E_{ij}(x_i, x_j) + T \sum_i \sum_{x_i} b_i(x_i) \ln b_i(x_i).

     The variational free energy is, up to an additive constant, equal to the Kullback-Leibler divergence between b(x) and the true probability P(x). KL divergence:

       D_{KL}(b || P) = \sum_{x_1, x_2, ...} \prod_i b_i(x_i) \ln \frac{\prod_i b_i(x_i)}{P(x)}.
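That last relation can be checked numerically. The sketch below (toy random energies over three 3-state variables and a fully factorized trial distribution are assumptions for illustration) verifies that the variational free energy U - TS equals T D_KL(b || P) plus the constant -T ln Z:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1.5                                        # temperature
K, N = 3, 3                                    # 3 variables, 3 states each
E = rng.normal(size=(K,) * N)                  # arbitrary energy for every joint state

# True Gibbs distribution P(x) = exp(-E(x)/T) / Z
P = np.exp(-E / T)
Z = P.sum()
P /= Z

# Fully factorized trial distribution b(x) = prod_i b_i(x_i) (mean-field form)
b_i = [rng.random(K) for _ in range(N)]
b_i = [v / v.sum() for v in b_i]
b = np.einsum('a,b,c->abc', *b_i)

U = (b * E).sum()                              # average energy under b
S = -(b * np.log(b)).sum()                     # entropy of b
F = U - T * S                                  # variational free energy

D_kl = (b * np.log(b / P)).sum()               # D_KL(b || P)
assert np.isclose(F, T * D_kl - T * np.log(Z))
```

Minimizing F over the factorized b is therefore the same as minimizing the KL divergence to the true distribution, which is the sense in which mean field theory approximates the Markov network.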
