Learning Polytrees with Constant Number of Roots from Data
Jan Manuch¹,², Javad Safaei¹, Ladislav Stacho²
1. University of British Columbia, Department of Computer Science
2. Simon Fraser University, Department of Mathematics
Introduction
• The goal is to learn a probabilistic graphical model (a Directed Acyclic Graph, or DAG) that optimizes an objective function, given a dataset.
• Types of objective functions:
▫ Bayesian Score
▫ Maximum Likelihood (ML) Score
• Chickering (1996) [1] showed that learning optimal Bayesian DAGs is NP-complete. Similarly, learning minimal ML DAGs is NP-complete.
• A minimal ML DAG is an ML DAG with the minimum number of edges.
Data Set
• Data is a set of m vectors ($D_j$, $1 \le j \le m$).
• Each vector has a fixed number n of features ($X_i$, $1 \le i \le n$).
• Each feature can take different values: $val(X_i) = \{v_1, v_2, \dots, v_{m_i}\}$.
• The value of the j-th vector at the i-th feature is denoted $d_{j,i}$.
• ML Score of D and DAG $\bar{G}$:
$$P_{\bar{G}}(D) = \prod_{j=1}^{m} P_{\bar{G}}(D_j)$$
• Example (m = 16, n = 4, $m_i$ = 2; encoded in the sketch below):

  Vector  X1 X2 X3 X4
   1       0  1  1  0
   2       0  1  1  0
   3       1  1  0  0
   4       1  1  0  0
   5       1  1  1  1
   6       0  0  1  1
   7       1  1  1  0
   8       1  0  1  1
   9       0  1  0  1
  10       0  1  1  1
  11       1  1  1  0
  12       1  0  0  1
  13       0  0  0  0
  14       0  1  1  0
  15       0  0  0  0
  16       0  1  0  1
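For concreteness, a minimal Python sketch (ours, not part of the poster) that encodes the example dataset above and computes an empirical marginal; the function name is our own.

```python
import numpy as np

# The 16 x 4 example dataset from the slide (m = 16 vectors, n = 4 binary features).
D = np.array([
    [0, 1, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [1, 1, 0, 0],
    [1, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1],
    [0, 1, 0, 1], [0, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 1],
    [0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 1, 0, 1],
])

def empirical_marginal(D, i, v):
    """P(X_i = v): the fraction of the m vectors whose i-th feature equals v."""
    return np.mean(D[:, i] == v)

print(empirical_marginal(D, 0, 1))  # P(X_1 = 1) = 7/16 = 0.4375
```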
Learning Tree Structures
• Chow and Liu (1968) [2] showed that ML trees can be learned in polynomial time, namely $O(n^2(m + \log n))$:
1. Compute the mutual information (MI) of every pair of vertices.
2. Find the MST (maximum-weight spanning tree) using MI as edge weights.
3. Pick any vertex as the root and orient the edges away from it (it can be shown that the choice of root does not affect the ML score). A sketch follows below.
• Definition. Polytrees are directed graphs with no undirected cycles.
• Dasgupta (1999) [3] showed that learning ML polytrees from data is NP-complete.
• We study finding ML polytrees with a constant number of roots.
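A compact Python sketch of the three Chow–Liu steps, assuming discrete data in a NumPy array; the helper names and the use of NetworkX are our choices, not the poster's.

```python
import numpy as np
import networkx as nx

def mutual_information(D, a, b):
    """Empirical mutual information I(X_a; X_b) from the data matrix D."""
    mi = 0.0
    for u in np.unique(D[:, a]):
        for v in np.unique(D[:, b]):
            p_uv = np.mean((D[:, a] == u) & (D[:, b] == v))
            p_u, p_v = np.mean(D[:, a] == u), np.mean(D[:, b] == v)
            if p_uv > 0:
                mi += p_uv * np.log(p_uv / (p_u * p_v))
    return mi

def chow_liu_tree(D):
    n = D.shape[1]
    G = nx.Graph()
    # Step 1: mutual information of every pair of vertices.
    for a in range(n):
        for b in range(a + 1, n):
            G.add_edge(a, b, weight=mutual_information(D, a, b))
    # Step 2: maximum-weight spanning tree with MI as edge weights.
    mst = nx.maximum_spanning_tree(G)
    # Step 3: pick any root and orient all edges away from it.
    return nx.bfs_tree(mst, source=0)
```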
Factorization
• Definition. The probability of every input vector $D_j$ given a DAG $\bar{G}$ is defined as:
$$P_{\bar{G}}(D_j) = \prod_{i=1}^{n} P(X_i = d_{j,i} \mid \Pi_i = \pi_{j,i}),$$
where $\Pi_i$ is the set of all parent nodes of $X_i$ in $\bar{G}$, and $\pi_{j,i}$ are their values in vector $D_j$.
• $P_{\bar{G}}$ is also called the factorized form of the distribution $P$ with respect to $\bar{G}$ (see the sketch below).
• $P$ itself is called the empirical distribution and is computed from the data:
$$P(X_i = v) = \frac{\sum_{j=1}^{m} \langle d_{j,i} = v \rangle}{m},$$
where $\langle \cdot \rangle$ denotes the indicator of a condition.
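A self-contained Python sketch of this factorization (our own helper, with empirical conditional probabilities estimated from the data itself); the function and parameter names are ours.

```python
import numpy as np

def avg_log_likelihood(D, parents):
    """Average log-likelihood of D under the factorization above, with each
    conditional P(X_i | Pi_i) estimated empirically from D itself.
    `parents` maps a feature index i to a tuple of parent indices (Pi_i)."""
    m, n = D.shape
    total = 0.0
    for j in range(m):
        for i in range(n):
            pa = list(parents.get(i, ()))
            # Rows of D that agree with D_j on the values of X_i's parents.
            rows = np.all(D[:, pa] == D[j, pa], axis=1) if pa else np.ones(m, bool)
            total += np.log(np.mean(D[rows, i] == D[j, i]))
    return total / m

# Toy usage on a 4-row dataset with the chain X1 -> X2 (indices 0 -> 1):
D = np.array([[0, 1], [0, 1], [1, 0], [1, 1]])
print(avg_log_likelihood(D, {1: (0,)}))
```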
Merging nodes and edges
• Definition. Vertices having more than one parent in a DAG are called merging nodes, and merging edges are all incoming edges of merging nodes.
• Proposition 1 (Verma and Pearl 1990 [4]). Two DAGs with the same skeleton and the same merging edges factorize a distribution identically, i.e., if $skel(\bar{G}) = skel(\bar{G}')$ and $ME(\bar{G}) = ME(\bar{G}')$, then $P_{\bar{G}} = P_{\bar{G}'}$, where $ME(\bar{G})$ is the collection of all merging edges of $\bar{G}$.
• Proposition 1 helps us avoid enumerating all orientations of edges (a small equivalence test is sketched below).
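A small sketch (our own helpers, not from the poster) that extracts a DAG's skeleton and merging edges so Proposition 1's equivalence condition can be checked directly.

```python
def skeleton_and_merging_edges(parents):
    """`parents` maps each node to the set of its parents.  Returns the
    skeleton as a set of unordered edges, plus all incoming edges of
    merging nodes (nodes with more than one parent) as ordered pairs."""
    skeleton = {frozenset((p, c)) for c, ps in parents.items() for p in ps}
    merging = {(p, c) for c, ps in parents.items() if len(ps) > 1 for p in ps}
    return skeleton, merging

def same_factorization(parents_a, parents_b):
    """Proposition 1: equal skeleton and equal merging edges imply the same
    factorized distribution for every dataset."""
    return (skeleton_and_merging_edges(parents_a)
            == skeleton_and_merging_edges(parents_b))

# X -> Z <- Y in both graphs; only the non-merging edge between W and X flips.
A = {"Z": {"X", "Y"}, "W": {"X"}}   # edges X->Z, Y->Z, X->W
B = {"Z": {"X", "Y"}, "X": {"W"}}   # edges X->Z, Y->Z, W->X
print(same_factorization(A, B))      # True
```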
Learning Polytrees Algorithm
• Proposition 2. In a polytree with $k > 1$ roots and $s$ merging nodes, the following properties hold:
$$2 \le |S_\ell| \le k, \qquad \sum_{\ell=1}^{s} |S_\ell| = k + s - 1,$$
where $|S_\ell| = d_\ell + 2$ is the number of parents of the $\ell$-th merging node.
• Algorithm for k-root polytrees (a sketch of step 2 follows below):
1. Generate a set of merging edges respecting Proposition 2.
2. For each selection of merging edges, run the MST algorithm, but do not allow components to contain more than one merging node.
3. Pick any orientation of the remaining undirected edges that does not introduce a new merging node (valid by Proposition 1).
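The following Python sketch illustrates step 2 under our reading of the constraint; all names are ours, and we assume the merging-edge selection keeps distinct merging nodes in distinct components.

```python
import itertools

def constrained_mst(weights, merging_edges, n):
    """Kruskal-style sketch of step 2: the chosen merging edges are fixed
    up front, then remaining vertex pairs are considered in decreasing
    mutual-information order, and an edge is rejected if it would join two
    components that both already contain a merging node.
    `weights[a][b]` is the MI of vertices a and b.
    Returns the undirected edge set of the resulting structure."""
    parent = list(range(n))
    merging_nodes = {c for (_, c) in merging_edges}
    has_merging = {v: v in merging_nodes for v in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        parent[ra] = rb
        has_merging[rb] = has_merging[ra] or has_merging[rb]

    edges = {frozenset(e) for e in merging_edges}
    for (p, c) in merging_edges:          # merging edges are fixed first
        union(p, c)

    for (a, b) in sorted(itertools.combinations(range(n), 2),
                         key=lambda e: -weights[e[0]][e[1]]):
        ra, rb = find(a), find(b)
        if ra != rb and not (has_merging[ra] and has_merging[rb]):
            union(a, b)
            edges.add(frozenset((a, b)))
    return edges
```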
Example ◮ Pick a selection of merging edges ( n = 7, k = 3)
Example ◮ Run modified MST algorithm (figures: successive steps of the modified MST on the example)
Example ◮ Orient edges in components containing merging nodes (the merging node is the root)
Example ◮ Orient edges in other components (roots in each component can be picked arbitrarily)
Counting selections of merging edges
• Let $C(n, k)$ be the total number of selections of merging edges in polytrees with $n$ nodes and $k$ roots. Summing over the number of merging nodes $s$ and the degree profiles $(d_1, \dots, d_s)$ of Proposition 2:
$$C(n,k) \;\le\; \sum_{s=1}^{k-1} \; \sum_{\substack{d_1+d_2+\cdots+d_s = k-s-1 \\ d_\ell \ge 0}} \binom{n}{s} \prod_{\ell=1}^{s} \binom{n}{d_\ell+2} \;\le\; \sum_{s=1}^{k-1} \binom{k-2}{s-1}\, n^{2s+k-1} \;\le\; n^{k+1}\,(1+n^2)^{k-2} \;\in\; O(n^{3k-3}),$$
where $\binom{n}{s}\prod_\ell \binom{n}{d_\ell+2} \le n^{2s+k-1}$ bounds the choices of the $s$ merging nodes and their parent sets, and $\binom{k-2}{s-1}$ counts the degree profiles for a fixed $s$ (by Proposition 2; see the sanity check below).
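As a sanity check on the binomial count used above, this small Python snippet (ours) enumerates the degree profiles $(d_1, \dots, d_s)$ of Proposition 2 and compares their number with $\binom{k-2}{s-1}$.

```python
from math import comb

def compositions(total, parts):
    """All tuples (d_1, ..., d_parts) of nonnegative integers summing to total."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

# Proposition 2: with s merging nodes, the parent-set sizes d_l + 2 satisfy
# d_1 + ... + d_s = k - s - 1; the number of such degree profiles is the
# stars-and-bars count binom(k-2, s-1) used in the bound above.
k = 5
for s in range(1, k):
    profiles = list(compositions(k - s - 1, s))
    assert len(profiles) == comb(k - 2, s - 1)
    print(s, len(profiles))
```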
Algorithm's Complexity
• Our algorithm enumerates $C(n,k) \in O(n^{3k-3})$ merging edge selections, and for each spends $O(n^2 + n + nm)$ time on edge completion, orientation assignment, and likelihood computation. Hence, the total complexity of our algorithm is $O(mn^{3k-2})$ (for $m \ge n$).
• Gaspers et al. [5] introduced $k$-branchings: polytrees that can be turned into directed forests by removing $k$ arcs. They gave an algorithm for learning $k$-branchings that runs in time $O(mn^{3k+4})$.
• Proposition. Learning a $k$-branching is equivalent to learning a polytree with up to $k+1$ roots.
• Our algorithm is faster than the algorithm of Gaspers et al. [5] by a factor of $O(n^3)$.
Experiment: Identification of phosphorylation sites
• Peptides are short sequences of amino acids. We consider peptides of length 9 centered at a phosphorylation site (Serine, Threonine, or Tyrosine) which is phosphorylated by protein kinases.
• Two peptide datasets are used:
▫ 803 peptides that are phosphorylated by protein kinase PKC
▫ 1000 randomly selected peptides that are phosphorylated by some kinase
• We learn maximum likelihood polytrees with two and three roots.
Results

  Algorithm              |     Peptides of PKC          |   1000 Random Peptides
                         |  Score    Time   # Trees    |  Score    Time   # Trees
                         |                   tested    |                   tested
  -----------------------+-----------------------------+-----------------------------
  MST (1 root) = tree    | -19.15     0.15        1    | -21.47     0.04        1
  Heuristic:             |                             |
    MST+ (2 roots)       | -18.14     1.07        9    | -20.40     1.13        8
    MST+ (3 roots)       | -17.26     2.86       23    | -19.35     2.77       18
  Exact:                 |                             |
    2 roots              | -18.02    27.47      252    | -20.37    35.98      252
    3 roots              | -16.97  2551.50    23184    | -19.30  3235.38    23184

• PKC peptides have a higher average likelihood score than random peptides, as expected, since they are more convergent (homogeneous).
• The higher the number of roots, the better the likelihood score, as expected.
Application?: Predicting peptide structure
• If we assume that nodes connected in the learned polytree correspond to positions that are close in the 3D structure of the peptides, then the learned structure gives some information about that 3D structure.
(Figures: the tree structure learned by MST vs. the structure learned by the 3-root polytree algorithm.)
Conclusions
• We presented an $O(mn^{3k-2})$ algorithm for learning ML polytrees with $n$ nodes and $k$ roots, which improves on the algorithm of Gaspers et al. [5] by a factor of $O(n^3)$.
• We applied this algorithm to predicting whether peptides are phosphorylated (or phosphorylated by a particular kinase).
• Open question: is there an FPT algorithm for this problem?
References • [1] Chickering, D.M.: Learning Bayesian networks is NP-complete. In: Learning from data, pp. 121–130. Springer (1996) • [2] Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14(3), 462– 467 (1968) • [3] Dasgupta, S.: Learning polytrees. In: Uncertainty in Artificial Intelligence, pp. 134–141 (1999) • [4] Verma, T.S., Pearl, J.: Equivalence and synthesis of causal models. In: Uncertainty in Artificial Intelligence (UAI). pp. 220–227 (1990) • [5] Gaspers, S., Koivisto, M., Liedloff, M., Ordyniak, S., Szeider, S.: On finding optimal polytrees. In: Twenty-Sixth AAAI Conference on Artificial Intelligence (2012)
Thank you • Any questions?