Journal of Artificial Intelligence Research 12 (2000) 219-234. Submitted 5/99; published 5/00.

Randomized Algorithms for the Loop Cutset Problem

Ann Becker (anyuta@cs.technion.ac.il)
Reuven Bar-Yehuda (reuven@cs.technion.ac.il)
Dan Geiger (dang@cs.technion.ac.il)
Computer Science Department, Technion, Haifa, 32000, Israel

Abstract

We show how to find a minimum weight loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in the method of conditioning for inference. Our randomized algorithm for finding a loop cutset outputs a minimum loop cutset after $O(c \cdot 6^k \cdot kn)$ steps with probability at least $1 - (1 - 1/6^k)^{c \cdot 6^k}$, where $c > 1$ is a constant specified by the user, $k$ is the minimal size of a minimum weight loop cutset, and $n$ is the number of vertices. We also show empirically that a variant of this algorithm often finds a loop cutset that is closer to the minimum weight loop cutset than the ones found by the best deterministic algorithms known.

1. Introduction

The method of conditioning is a well-known inference method for the computation of posterior probabilities in general Bayesian networks (Pearl, 1986, 1988; Suermondt & Cooper, 1990; Peot & Shachter, 1991), as well as for finding MAP values and solving constraint satisfaction problems (Dechter, 1999). This method has two conceptual phases: first, find an optimal or close-to-optimal loop cutset, and then perform a likelihood computation for each instance of the variables in the loop cutset. This method is routinely used by geneticists via several genetic linkage programs (Ott, 1991; Lange, 1997; Becker, Geiger, & Schäffer, 1998). A variant of this method was developed by Lange and Elston (1975).

Finding a minimum weight loop cutset is NP-complete, and thus heuristic methods have often been applied to find a reasonable loop cutset (Suermondt & Cooper, 1990). Most methods in the past had no guarantee of performance and performed very badly when presented with an appropriate example. Becker and Geiger (1994, 1996) offered an algorithm that finds a loop cutset for which the logarithm of the state space is guaranteed to be at most a constant factor off the optimal value. An adaptation of these approximation algorithms has been included in version 4.0 of FASTLINK, a popular software package for analyzing large pedigrees with a small number of genetic markers (Becker et al., 1998). Similar algorithms in the context of undirected graphs are described by Bafna, Berman, and Fujito (1995) and Fujito (1996).

While approximation algorithms for the loop cutset problem are quite useful, it is still worthwhile to invest in finding a minimum loop cutset rather than an approximation, because the cost of finding such a loop cutset is amortized over the many iterations of the conditioning method. In fact, one may invest an effort of complexity exponential in the size of the loop cutset in finding a minimum weight loop cutset, because the second phase of the conditioning algorithm, which is repeated for many iterations, uses a procedure of such complexity. The same considerations apply to constraint satisfaction problems as well as to other problems in which the method of conditioning is useful (Dechter, 1990, 1999).
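To see why the success probability quoted in the abstract (and restated below) tends to 1, the following short calculation may help. It is a sketch added here for exposition, assuming only that each of $c \cdot 6^k$ independent runs of the randomized algorithm succeeds with probability at least $1/6^k$; using $1 - x \le e^{-x}$,

$$\Pr[\text{at least one run succeeds}] \;=\; 1 - \bigl(1 - 1/6^k\bigr)^{c \cdot 6^k} \;\ge\; 1 - \bigl(e^{-1/6^k}\bigr)^{c \cdot 6^k} \;=\; 1 - e^{-c}.$$

Hence the failure probability decays exponentially in the user-chosen constant $c$, independently of $k$ and $n$; the same argument with $1/4^k$ in place of $1/6^k$ gives the bound quoted below for unweighted graphs.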

In this paper we describe several randomized algorithms that compute a loop cutset. As done by Bar-Yehuda, Geiger, Naor, and Roth (1994), our solution is based on a reduction to the weighted feedback vertex set problem. A feedback vertex set (FVS) $F$ is a set of vertices of an undirected graph $G = (V, E)$ such that by removing $F$ from $G$, along with all the edges incident with $F$, a set of trees is obtained. The Weighted Feedback Vertex Set (WFVS) problem is to find a feedback vertex set $F$ of a vertex-weighted graph with a weight function $w: V \to \mathbb{R}^+$ such that $\sum_{v \in F} w(v)$ is minimized. When $w(v) \equiv 1$, this problem is called the FVS problem. The decision version associated with the FVS problem is known to be NP-complete (Garey & Johnson, 1979, pp. 191-192). (A small verification sketch illustrating this definition is given at the end of this section.)

Our randomized algorithm for finding a WFVS, called RepeatedWGuessI, outputs a minimum weight FVS after $O(c \cdot 6^k \cdot kn)$ steps with probability at least $1 - (1 - 1/6^k)^{c \cdot 6^k}$, where $c > 1$ is a constant specified by the user, $k$ is the minimal size of a minimum weight FVS, and $n$ is the number of vertices. For unweighted graphs we present an algorithm that finds a minimum FVS of a graph $G$ after $O(c \cdot 4^k \cdot kn)$ steps with probability at least $1 - (1 - 1/4^k)^{c \cdot 4^k}$. In comparison, several deterministic algorithms for finding a minimum FVS are described in the literature. One has a complexity of $O((2k+1)^k n^2)$ (Downey & Fellows, 1995b) and others have a complexity of $O((17k^4)!\, n)$ (Bodlaender, 1990; Downey & Fellows, 1995a).

A final variant of our randomized algorithms, called WRA, has the best performance because it utilizes information from previous runs. This algorithm is harder to analyze, and its investigation is mostly experimental. We show empirically that the actual run time of WRA is comparable to that of a Modified Greedy Algorithm (MGA), described by Becker and Geiger (1996), which is the best available deterministic algorithm for finding close-to-optimal loop cutsets, and yet the output of WRA is often closer to the minimum weight loop cutset than the output of MGA.

The rest of the paper is organized as follows. In Section 2 we outline the method of conditioning, explain the related loop cutset problem, and describe the reduction from the loop cutset problem to the WFVS problem. In Section 3 we present three randomized algorithms for the WFVS problem and their analysis. In Section 4 we compare WRA and MGA experimentally with respect to output quality and run time.
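To make the FVS definition above concrete, here is a minimal verification sketch (added for illustration, not part of the paper's algorithms): it tests whether a candidate set $F$ is a feedback vertex set by deleting $F$ and checking, with a union-find pass, that no cycle remains. The graph representation and the function name are illustrative choices.

```python
# Sketch: check whether a candidate set F is a feedback vertex set of an
# undirected graph (vertices, edges), i.e. whether removing F together with
# its incident edges leaves a forest.

def is_feedback_vertex_set(vertices, edges, fvs):
    """Return True iff `fvs` is a feedback vertex set of (vertices, edges)."""
    remaining = set(vertices) - set(fvs)
    # Keep only edges whose both endpoints survive the removal of F.
    kept = [(u, v) for (u, v) in edges if u in remaining and v in remaining]

    # Union-find cycle test: the remaining graph is a forest iff no kept edge
    # connects two vertices that already lie in the same component.
    parent = {v: v for v in remaining}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in kept:
        ru, rv = find(u), find(v)
        if ru == rv:          # adding (u, v) would close a cycle
            return False
        parent[ru] = rv       # merge the two components

    return True


# Example: a 4-cycle a-b-c-d-a; removing any single vertex breaks the cycle.
verts = ["a", "b", "c", "d"]
edgs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_feedback_vertex_set(verts, edgs, {"a"}))   # True
print(is_feedback_vertex_set(verts, edgs, set()))   # False
```

In the paper's setting, the graph handed to such a check is the undirected graph obtained from the Bayesian network by the reduction described in Section 2, and the algorithms of Section 3 search for a minimum weight feedback vertex set rather than merely verifying one.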
2. Background: The Loop Cutset Problem

A short overview of the method of conditioning and definitions related to Bayesian networks are given below; see the book by Pearl (1988) for more details. We then define the loop cutset problem.

Let $P(u_1, \ldots, u_n)$ be a probability distribution where each variable $u_i$ has a finite set of possible values called the domain of $u_i$. A directed graph $D$ with no directed cycles is called a Bayesian network of $P$ if there is a 1-1 mapping between $\{u_1, \ldots, u_n\}$ and the vertices of $D$, such that $u_i$ is associated with vertex $i$ and $P$ can be written as follows:

$$P(u_1, \ldots, u_n) \;=\; \prod_{i=1}^{n} P(u_i \mid u_{i_1}, \ldots, u_{i_{j(i)}}) \qquad (1)$$

where $i_1, \ldots, i_{j(i)}$ are the source vertices of the incoming edges to vertex $i$ in $D$.
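As a small illustration of Equation (1) (an example added here, not drawn from the paper): in a three-variable network whose only edges are $1 \to 2$ and $2 \to 3$, vertex 1 has no incoming edges, the only parent of vertex 2 is vertex 1, and the only parent of vertex 3 is vertex 2, so the factorization reads

$$P(u_1, u_2, u_3) \;=\; P(u_1)\, P(u_2 \mid u_1)\, P(u_3 \mid u_2).$$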
