1. Vers un apprentissage subquadratique pour les mélanges d'arbres (Towards sub-quadratic learning for mixtures of trees). F. Schnitzler¹, P. Leray², L. Wehenkel¹, fschnitzler@ulg.ac.be. ¹Université de Liège, ²Université de Nantes. 10 May 2010.

2. The goal of this research is to improve the learning of Bayesian networks in high-dimensional problems. This has great potential in many applications: bioinformatics, power networks.

3. Outline: 1 Motivation, 2 Algorithms, 3 Experiments, 4 Conclusion.

4. The choice of the structure search space is a compromise. The set of all Bayesian networks: ability to model any density, but a superexponential number of structures ⇒ structure learning is difficult ⇒ overfitting; inference is difficult. Sets of simpler structures: reduced modeling power, but learning and inference are potentially easier. A tree is a graph without cycles in which each variable has at most one parent.

5. Mixtures of trees combine qualities of Bayesian networks and trees. A forest is a tree with some edges missing. A mixture of trees is an ensemble method: $P_{MT}(x) = \sum_{i=1}^{m} w_i \, P_{T_i}(x)$.
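To make the model concrete, here is a minimal sketch of mixture-of-trees inference in Python. The representation (a `parents` array plus one conditional table per variable) and all function names are assumptions of this sketch, not the authors' code; it only illustrates that evaluating $P_{MT}(x)$ costs one $O(n)$ pass per tree.

```python
import numpy as np

def tree_density(x, parents, tables):
    """P_T(x) for one tree over binary variables: the product of
    P(X_i | X_parent(i)) -- a single O(n) pass, hence linear inference.

    parents[i]: parent of variable i, or -1 for a root.
    tables[i]:  P(X_i | X_parent) of shape (2, 2), indexed
                [parent value, child value]; a root holds its
                marginal, of shape (2,).
    """
    p = 1.0
    for i, par in enumerate(parents):
        p *= tables[i][x[i]] if par == -1 else tables[i][x[par], x[i]]
    return p

def mixture_density(x, weights, trees):
    """P_MT(x) = sum_i w_i P_Ti(x)."""
    return sum(w * tree_density(x, parents, tables)
               for w, (parents, tables) in zip(weights, trees))

# Two toy terms over (X0, X1): a chain X0 -> X1 and an edgeless forest.
chain  = ([-1, 0],  [np.array([0.6, 0.4]),
                     np.array([[0.9, 0.1], [0.2, 0.8]])])
forest = ([-1, -1], [np.array([0.5, 0.5]), np.array([0.5, 0.5])])
print(mixture_density((1, 1), [0.7, 0.3], [chain, forest]))  # 0.299
```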

6. Mixtures of trees combine qualities of Bayesian networks and trees. Several models → large modeling power. Simple models → low complexity: inference is linear, while learning is quadratic for most algorithms. Quadratic complexity may be too high for very large problems; in this work, we try to decrease it. (Learning with Mixtures of Trees, M. Meila & M.I. Jordan, JMLR 2001.)

7. The quadratic scaling is due to the Chow-Liu algorithm, which maximizes the data likelihood in two steps: construction of a complete graph whose edge weights are the empirical mutual informations, in $O(n^2 N)$; computation of the maximum weight spanning tree, in $O(n^2 \log n)$. (Approximating Discrete Probability Distributions with Dependence Trees, C. Chow & C. Liu, IEEE Trans. Inf. Theory 1968.)
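As a reference point, here is a hedged sketch of the two Chow-Liu steps for binary data; the helper names (`empirical_mi`, `mwst`, `chow_liu`) are assumptions, and a production version would vectorize the counts rather than loop over pairs:

```python
import itertools
import numpy as np

def empirical_mi(data, i, j):
    """Empirical mutual information (nats) of binary columns i and j."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((data[:, i] == a) & (data[:, j] == b))
            p_a, p_b = np.mean(data[:, i] == a), np.mean(data[:, j] == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def mwst(n, weighted_edges):
    """Maximum weight spanning tree (Kruskal + union-find); returns a
    forest when the input graph is not connected."""
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

def chow_liu(data):
    """O(n^2 N) mutual informations on the complete graph, then the MWST."""
    n = data.shape[1]
    return mwst(n, [(empirical_mi(data, i, j), i, j)
                    for i, j in itertools.combinations(range(n), 2)])
```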

8. We propose to consider only a random fraction δ of the edges of the complete graph. The resulting tree is no longer optimal, but the complexity of each term is reduced: construction of an incomplete graph in $O(\delta n^2 N)$; computation of the maximum weight spanning tree in $O(\delta n^2 \log n)$.
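The randomized variant then only scores the sampled pairs. Here is a sketch reusing `empirical_mi` and `mwst` from the Chow-Liu code above (again, names and the spanning-forest fallback are assumptions of this sketch):

```python
import itertools
import random

def random_edge_tree(data, delta, seed=0):
    """Score a random fraction delta of the n(n-1)/2 candidate edges,
    then take the MWST of that subgraph: O(delta n^2 N) for the MI
    terms, O(delta n^2 log n) for the spanning step. The result can
    be a forest and is no longer the optimal Chow-Liu tree."""
    n = data.shape[1]
    pairs = list(itertools.combinations(range(n), 2))
    k = min(len(pairs), max(n - 1, int(delta * len(pairs))))
    sampled = random.Random(seed).sample(pairs, k)
    return mwst(n, [(empirical_mi(data, i, j), i, j) for i, j in sampled])
```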

9. Intuitively, the structure of the problem can be exploited to improve on random sampling. In a Euclidean space, similar problems can be solved approximately by sub-quadratic algorithms: when two points B and C are both close to A, they are likely to be close to each other as well, since $d(B,C) \le d(A,B) + d(A,C)$. Mutual information is not a Euclidean distance, but the same reasoning can be applied: if the pairs (A,B) and (A,C) have high mutual information, $I(B;C)$ may be high as well, since $I(B;C) \ge I(A;B) + I(A;C) - H(A)$.
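The bound is easy to check numerically; the small script below (a sanity check added here, not from the slides) draws a random joint distribution over three binary variables and verifies it:

```python
import numpy as np

def H(p):
    """Entropy in nats of an array of probabilities."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
pABC = rng.random((2, 2, 2))
pABC /= pABC.sum()                      # random joint over (A, B, C)

pA, pB, pC = pABC.sum((1, 2)), pABC.sum((0, 2)), pABC.sum((0, 1))
pAB, pAC, pBC = pABC.sum(2), pABC.sum(1), pABC.sum(0)

I_AB = H(pA) + H(pB) - H(pAB)
I_AC = H(pA) + H(pC) - H(pAC)
I_BC = H(pB) + H(pC) - H(pBC)

# The lower bound proved on slide 28:
assert I_BC >= I_AB + I_AC - H(pA) - 1e-12
print(f"I(B;C) = {I_BC:.4f} >= {I_AB + I_AC - H(pA):.4f}")
```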

10. We want to obtain knowledge about the structure. The algorithm builds a set of clusters over the variables and relationships between these clusters, and then exploits them to target interesting edges.

11. We build the clusters iteratively: a centre (X5) is randomly chosen and compared to the 12 other variables.

12. We build the clusters iteratively: the first cluster is created; it is composed of 5 members and 1 neighbour. Variables are assigned to a cluster based on two thresholds on their empirical mutual information with the centre of the cluster.

13. We build the clusters iteratively: the second cluster is built around X13, the variable furthest away from X5. It is compared only to the 7 remaining variables.

14. We build the clusters iteratively: after 4 iterations, all variables belong to a cluster, and the algorithm stops.

15. We build the clusters iteratively: mutual information is then computed among variables belonging to the same cluster.

16. We build the clusters iteratively: mutual information is computed between variables belonging to neighbouring clusters. (A sketch of the whole procedure follows.)
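One plausible reading of this procedure in code. This is a sketch only: the threshold names, the neighbour bookkeeping, and the stopping rule are assumptions about the illustrated algorithm, with `mi(i, j)` standing for the empirical mutual information (e.g. `empirical_mi` above):

```python
import random

def build_clusters(n, mi, t_member, t_neighbour, seed=0):
    """Iteratively grow clusters around centres until every variable
    is assigned. t_member > t_neighbour are the two MI thresholds."""
    rng = random.Random(seed)
    unassigned = set(range(n))
    centre = rng.choice(sorted(unassigned))
    clusters, neighbours = [], []
    while unassigned:
        unassigned.discard(centre)
        scores = {v: mi(centre, v) for v in unassigned}
        members = {centre} | {v for v, s in scores.items() if s >= t_member}
        clusters.append(members)
        neighbours.append({v for v, s in scores.items()
                           if t_neighbour <= s < t_member})
        unassigned -= members
        if unassigned:  # next centre: the variable furthest from this one
            centre = min(unassigned, key=scores.__getitem__)
    return clusters, neighbours

def targeted_edges(clusters, neighbours):
    """Candidate edges: all pairs inside a cluster, plus all pairs
    between a cluster and any cluster holding one of its neighbours."""
    edges = set()
    for c, members in enumerate(clusters):
        edges |= {(i, j) for i in members for j in members if i < j}
        for d, other in enumerate(clusters):
            if d != c and neighbours[c] & other:
                edges |= {(min(i, j), max(i, j))
                          for i in members for j in other}
    return edges
```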

17. Outline: 1 Motivation, 2 Algorithms, 3 Experiments, 4 Conclusion.

18. Our algorithms were compared against two similar methods: for variance reduction, bagging ($O(n^2 \log n)$); for complexity reduction, random tree sampling ($O(n)$), which has no connection to the data set. (Probability Density Estimation by Perturbing and Combining Tree Structured Markov Networks, S. Ammar et al., ECSQARU 2009.)

19. Experimental settings. Tests were conducted on synthetic binary problems: 1000 variables; results averaged over 10 target distributions × 10 data sets; targets were generated randomly. Accuracy evaluation: the exact Kullback-Leibler divergence, $D_{KL}(P_t \| P_l) = \sum_x P_t(x) \log \frac{P_t(x)}{P_l(x)}$, is too computationally expensive, so it is replaced by a Monte Carlo estimate over a sample $S$ drawn from the target: $\hat{D}_{KL}(P_t \| P_l) = \frac{1}{|S|} \sum_{x \in S,\, x \sim P_t} \log \frac{P_t(x)}{P_l(x)}$.
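A minimal sketch of that estimator (function names are assumptions; it simply averages $\log P_t(x) - \log P_l(x)$ over samples drawn from the known target instead of summing over all $2^{1000}$ configurations):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_monte_carlo(draw_t, logp_t, logp_l, k=10_000):
    """hat{D}_KL(P_t || P_l): mean of log P_t(x) - log P_l(x) over
    k samples x ~ P_t."""
    xs = [draw_t() for _ in range(k)]
    return float(np.mean([logp_t(x) - logp_l(x) for x in xs]))

# Sanity check on two Bernoulli distributions, where the divergence
# has the closed form p log(p/q) + (1-p) log((1-p)/(1-q)).
p, q = 0.7, 0.5
est = kl_monte_carlo(
    draw_t=lambda: rng.random() < p,
    logp_t=lambda x: np.log(p if x else 1 - p),
    logp_l=lambda x: np.log(q if x else 1 - q),
)
exact = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
print(f"estimate {est:.4f} vs exact {exact:.4f}")
```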

20. The proposed algorithm succeeds in improving on the random strategy. [Figure: edges in common with the MWST, for single trees of 200 variables.]

21. Variation of the proportion of edges selected. Results for a mixture of size 100: random edge sampling is better than the optimal tree for small data sets but worse for larger ones; the more edges considered, the closer the result is to the optimal tree. [Figure: curves for 60%, 35%, 20% and 5% of the edges.]

22. The more terms in the mixture, the better the performance (300 samples): more sophisticated methods tend to converge more slowly; random trees are always worse than an optimal tree; the other mixtures outperform the Chow-Liu tree.

23. The fewer the samples, the better (relatively) the randomized methods perform; for high-dimensional problems, data sets will be small. Results for a mixture of size 100: random trees are better when samples are few; bagging is better for N > 50; clever edge targeting is always better than random edge sampling.

24. Methods can also be mixed. A combination of bagging and random edge sampling (35% of the edges): its performance lies between the two base methods; it improves on the complexity of bagging; the fewer the samples, the closer it is to bagging.

25. Conclusion. Our results on randomized mixtures of trees: the accuracy loss is in line with the gain in complexity; the interest of randomization increases as the sample size decreases; clever strategies improve the results without hurting the complexity, and are therefore worth developing. Future work: experiment with other strategies; include and test these improvements in other algorithms for building mixtures of trees.

  26. Significance of the curves

27. Computation time

    Method                 Training CPU time
    Rand. trees                    2,063 s
    Rand. edge sampling           64,569 s
    Clever edge sampling          59,687 s
    Bagging                      168,703 s

Table: training CPU times, cumulated over 100 data sets of 1000 samples (Mac OS X; Intel dual 2 GHz; 4 GB DDR3; GCC 4.0.1).

28. Proof of the bound $I(B;C) \ge I(A;B) + I(A;C) - H(A)$:

$$\begin{aligned}
H(B,C,A) &\ge H(B,C) \\
H(A) + H(B|A) + H(C|A,B) &\ge H(B,C) \\
H(A) + H(B|A) + H(C|A) &\ge H(B,C) \\
H(B) + H(C) - H(B,C) &\ge H(B) - H(B|A) + H(C) - H(C|A) - H(A) \\
H(B) + H(C) - H(B,C) &\ge \bigl(H(B) + H(A) - H(B,A)\bigr) + \bigl(H(C) + H(A) - H(C,A)\bigr) - H(A) \\
I(B;C) &\ge I(A;B) + I(A;C) - H(A)
\end{aligned}$$

(The second line expands the chain rule; the third uses $H(C|A) \ge H(C|A,B)$.)
