Being Bayesian About Network Structure
A Bayesian Approach to Structure Discovery in Bayesian Networks
Nir Friedman and Daphne Koller
Roadmap
• Bayesian learning of Bayesian networks
  – Exact vs. approximate learning
• Markov Chain Monte Carlo method
  – MCMC over structures
  – MCMC over orderings
• Experimental results
• Conclusions
Bayesian Networks
• Compact representation of probability distributions via conditional independence
• Qualitative part: directed acyclic graph (DAG)
  – Nodes: random variables
  – Edges: direct influence
• Quantitative part: a set of conditional probability distributions, e.g. P(A | E, B):

    E    B    P(a | E,B)   P(!a | E,B)
    e    b    0.9          0.1
    e    !b   0.2          0.8
    !e   b    0.9          0.1
    !e   !b   0.01         0.99

• Together they define a unique distribution in factored form:

    P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A)
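To make the factored form concrete, here is a minimal sketch that evaluates the joint for the five-variable network above. Only the P(A | E, B) table comes from the slide; the other CPT values are assumptions made up for illustration.

```python
# Sketch: evaluate P(B,E,A,C,R) = P(B) P(E) P(A|B,E) P(R|E) P(C|A).
# Only P_A matches the slide's CPT; P_B, P_E, P_R, P_C are assumed values.
P_B = {True: 0.01, False: 0.99}                    # assumed P(B = true)
P_E = {True: 0.02, False: 0.98}                    # assumed P(E = true)
P_A = {(True, True): 0.9, (True, False): 0.2,      # P(A = true | E, B),
       (False, True): 0.9, (False, False): 0.01}   # keyed by (E, B), from slide
P_R = {True: 0.9, False: 0.001}                    # assumed P(R = true | E)
P_C = {True: 0.7, False: 0.05}                     # assumed P(C = true | A)

def joint(b, e, a, c, r):
    """Joint probability from the factored form."""
    p_a = P_A[(e, b)] if a else 1.0 - P_A[(e, b)]
    p_r = P_R[e] if r else 1.0 - P_R[e]
    p_c = P_C[a] if c else 1.0 - P_C[a]
    return P_B[b] * P_E[e] * p_a * p_r * p_c

print(joint(b=True, e=False, a=True, c=True, r=False))
```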
Why Learn Bayesian Networks?
• Conditional independencies & graphical representation capture the structure of many real-world distributions
  – Provides insights into the domain
• Graph structure allows "knowledge discovery"
  – Is there a direct connection between X & Y?
  – Does X separate two "subsystems"?
  – Does X causally affect Y?
• Bayesian networks can be used for many tasks
  – Inference, causality, etc.
• Examples: scientific data mining
  – Disease properties and symptoms
  – Interactions between the expression of genes
Learning Bayesian Networks
• Inducer: data + prior information → a Bayesian network (structure plus CPTs such as the P(A | E,B) table above)
• The inducer needs the prior probability distribution P(B) over networks
• Using Bayesian conditioning, update the prior: P(B) → P(B | D)
Why Struggle for Accurate Structure?
[Figure: the "true" structure over E, A, B, S, alongside one variant with an added arc and one with a missing arc]
• Adding an arc
  – Increases the number of parameters to be fitted
  – Encodes wrong assumptions about causality and domain structure
• Missing an arc
  – Cannot be compensated for by accurate fitting of parameters
  – Also misses causality and domain structure
Score-Based Learning
• Define a scoring function that evaluates how well a structure matches the data, e.g. samples over (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,Y,Y>, ..., <N,N,N>
• Candidate structures over E, B, A are ranked by their scores
• Search for a structure that maximizes the score
Bayesian Score of a Model

    P(G | D) = P(D | G) P(G) / P(D)

where the marginal likelihood is

    P(D | G) = ∫ P(D | θ, G) P(θ | G) dθ

with P(D | θ, G) the likelihood and P(θ | G) the prior over parameters.
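For discrete networks with Dirichlet parameter priors, the integral above has a closed form that decomposes over families (a node and its parents). The slides don't specify the prior, so the sketch below assumes a BDeu prior with equivalent sample size alpha; the function name and interface are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def family_log_marginal_likelihood(counts, alpha=1.0):
    """Closed-form log marginal likelihood of one family under a BDeu
    Dirichlet prior. log P(D|G) is the sum of these over all families,
    and the Bayesian score then adds log P(G).

    counts: (num_parent_configs, num_node_states) array of N_ijk
    alpha:  equivalent sample size of the Dirichlet prior
    """
    q, r = counts.shape
    a_jk = alpha / (q * r)        # prior pseudo-count per table cell
    a_j = alpha / q               # prior pseudo-count per parent config
    n_j = counts.sum(axis=1)      # N_ij: samples per parent config
    score = np.sum(gammaln(a_j) - gammaln(a_j + n_j))
    score += np.sum(gammaln(a_jk + counts) - gammaln(a_jk))
    return score
```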
Discovering Structure: Model Selection
[Figure: posterior P(G | D) concentrated on a single structure]
• Current practice: model selection
  – Pick a single high-scoring model
  – Use that model to infer domain structure
Discovering Structure: Model Averaging
[Figure: posterior P(G | D) spread across many structures]
• Problem
  – Small sample size ⇒ many high-scoring models
  – An answer based on one model is often useless
  – Want features common to many models
Bayesian Approach
• Estimate the probability of features, e.g.:
  – Edge X → Y
  – Markov edge X -- Y
  – Path X → ... → Y
  – ...

    P(f | D) = Σ_G f(G) P(G | D)

  where f(G) is the indicator function for feature f (e.g., X → Y) and P(G | D) is the Bayesian score of G
• Huge (super-exponential: 2^Θ(n²)) number of networks G
• Exact learning is intractable
Approximate Bayesian Learning
• Restrict the search space to G_k, the set of graphs with indegree bounded by k
  – The space is still super-exponential
• Find a set G of high-scoring structures and estimate (see the sketch below)

    P(f | D) ≈ Σ_{G∈G} f(G) P(G | D) / Σ_{G∈G} P(G | D)

• Hill-climbing yields a biased sample of structures
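A minimal sketch of this estimate: given candidate structures and their log Bayesian scores, weight each graph by its relative posterior and average the feature indicator. The function and argument names are hypothetical; graphs can be any representation the feature function understands.

```python
import numpy as np

def feature_posterior(graphs, log_scores, feature):
    """Estimate P(f|D) by model averaging over a candidate set:
    sum_G f(G) P(G|D) / sum_G P(G|D), with scores kept in log space
    for numerical stability.
    """
    log_scores = np.asarray(log_scores, dtype=float)
    weights = np.exp(log_scores - log_scores.max())  # relative posteriors
    weights /= weights.sum()
    return sum(w * feature(g) for w, g in zip(weights, graphs))

# Example feature: indicator for the edge X -> Y (assumes graphs
# expose an `edges` set of (parent, child) pairs).
edge_x_to_y = lambda g: 1.0 if ("X", "Y") in g.edges else 0.0
```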
Markov Chain Monte Carlo over Networks
• MCMC sampling
  – Define a Markov chain over Bayesian networks
  – Perform a walk through the chain to get samples G whose distribution converges to the posterior P(G | D)
• Possible pitfalls:
  – Still a super-exponential number of networks
  – Time for the chain to converge to the posterior is unknown
  – Islands of high posterior, connected by low bridges
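A Metropolis-Hastings walk over structures might look like the sketch below. Here `propose` and `log_score` are assumed callables: `propose(G)` returns a random single-edge change (add/delete/reverse) that keeps the graph acyclic, and `log_score(G) = log P(D|G) + log P(G)`. The sketch assumes a symmetric proposal; a real implementation must correct the acceptance ratio for asymmetric neighborhood sizes.

```python
import math
import random

def structure_mcmc(init_graph, propose, log_score, n_steps, rng=random):
    """Metropolis-Hastings over DAG structures (sketch, symmetric proposal).
    Accepts a proposed graph with probability min(1, P(G'|D) / P(G|D)).
    """
    g, s = init_graph, log_score(init_graph)
    samples = []
    for _ in range(n_steps):
        g_new = propose(g)                      # random acyclic edge change
        s_new = log_score(g_new)
        if math.log(rng.random()) < s_new - s:  # log-space acceptance test
            g, s = g_new, s_new
        samples.append(g)                       # chain state after this step
    return samples
```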
Better Approach to Approximate Learning
• Further constrain the search space
  – Perform model averaging over the structures consistent with some known (fixed) total ordering ≺
• Ordering of variables:
  – X1 ≺ X2 ≺ ... ≺ Xn ⇒ parents of Xi must come from {X1, X2, ..., Xi-1}
• Intuition: the order decouples the choices of parents
  – The choice of Pa(X7) does not restrict the choice of Pa(X12)
• Can compute efficiently in closed form (see the sketch below):
  – Likelihood P(D | ≺)
  – Feature probability P(f | D, ≺)
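Why the closed form exists: once the order is fixed, each node's parent set can be chosen independently from its predecessors, so the sum over all consistent DAGs factors into per-node sums over candidate parent sets. A sketch, assuming a bounded indegree and a uniform prior over parent sets (the slides don't spell out the prior); `family_log_score` is the per-family marginal likelihood, e.g. the BDeu sketch above.

```python
from itertools import combinations
from scipy.special import logsumexp

def log_marginal_likelihood_of_order(order, family_log_score, max_indegree):
    """log P(D | order): a product over nodes of sums over parent sets
    drawn from each node's predecessors (up to max_indegree parents).
    """
    total = 0.0
    for i, x in enumerate(order):
        preds = order[:i]
        # Log scores of all candidate parent sets for x under this order
        # (size 0 gives the empty parent set).
        terms = [family_log_score(x, parent_set)
                 for size in range(min(max_indegree, len(preds)) + 1)
                 for parent_set in combinations(preds, size)]
        total += logsumexp(terms)  # log of the per-node sum
    return total
```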
Sample Orderings
• We can write

    P(f | D) = Σ_≺ P(f | D, ≺) P(≺ | D)

• Sample orderings ≺1, ..., ≺n and approximate

    P(f | D) ≈ (1/n) Σ_{i=1..n} P(f | ≺i, D)

• MCMC sampling
  – Define a Markov chain over orderings
  – Run the chain to get samples from the posterior P(≺ | D)
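An order-MCMC step can reuse the closed-form score above. Below is a sketch with a simple symmetric proposal (transposing two positions); `log_order_score(order)` is an assumed callable returning log P(D | order) plus the log order prior.

```python
import math
import random

def order_mcmc(variables, log_order_score, n_steps, rng=random):
    """Metropolis-Hastings over total orderings (sketch).
    Proposal: swap two random positions, a symmetric move.
    """
    order = list(variables)
    s = log_order_score(order)
    samples = []
    for _ in range(n_steps):
        i, j = rng.sample(range(len(order)), 2)   # positions to transpose
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        s_new = log_order_score(proposal)
        if math.log(rng.random()) < s_new - s:    # log-space acceptance test
            order, s = proposal, s_new
        samples.append(order[:])
    return samples
```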
Experiments: Exact Posterior over Orders versus Order-MCMC
[Figure]
Experiments: Convergence
[Figure]
Experiments: Structure-MCMC, Posterior Correlation for Two Different Runs
[Figure]
Experiments: Order-MCMC, Posterior Correlation for Two Different Runs
[Figure]
Conclusion
• Order-MCMC outperforms structure-MCMC: its posterior estimates converge faster and agree across independent runs
References
• N. Friedman and D. Koller. Being Bayesian about Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning Journal, 2002.
• N. Friedman and D. Koller. NIPS 2001 Tutorial on Learning Bayesian Networks from Data.
• N. Friedman and M. Goldszmidt. AAAI-98 Tutorial on Learning Bayesian Networks from Data.
• D. Heckerman. A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, M. Jordan, ed., MIT Press, Cambridge, MA, 1999. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995; an earlier version appears as Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery, 1:79-119, 1997.
• C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, 2002.
• S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach.