Decision trees
Nicolas Sutton-Charani (PRISM), Artificial intelligence, 20/01/2020
Outline
1. Introduction
2. Use of decision trees
   2.1 Prediction
   2.2 Interpretability: Descriptive data analysis
3. Learning of decision trees
   3.1 Purity criteria
   3.2 Stopping criteria
   3.3 Learning algorithm
4. Pruning of decision trees
   4.1 Cost-complexity trade-off
5. Extension: Random forest
Introduction: What is a decision tree? → supervised learning
[Diagram: a tree whose internal nodes test attributes, whose branches carry attribute values, and whose leaves carry class labels and predictions.]
Introduction: A little history
Warning: decision trees in machine learning (or data mining) are not the decision trees of decision theory; the two notions should not be confused.
Introduction: Types of decision trees
Type of class label: numerical → regression tree; nominal → classification tree.
Type of algorithm (→ structure): CART (statistics, binary trees); C4.5 (computer science, small trees).
Use of decision trees: Prediction
Classification trees: Will the badminton match take place?
Classification trees: What fruit is it?
Classification trees: Will he/she come to my party?
Classification trees: Will they wait?
Classification trees: Who will win the election in this county?
Regression trees: What grade will a student get (given his homework average grade)?
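To make this kind of prediction concrete, here is a minimal sketch using scikit-learn's DecisionTreeRegressor; the homework averages and final grades below are invented for illustration and do not come from the slides.

    # Minimal sketch: predicting a final grade from the homework average grade
    # with a regression tree. The data is invented purely for illustration.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    homework_avg = np.array([[6.0], [8.5], [10.0], [11.5], [13.0], [15.5], [17.0]])
    final_grade = np.array([7.0, 9.0, 10.5, 12.0, 12.5, 15.0, 16.5])

    tree = DecisionTreeRegressor(max_depth=2)   # a shallow tree stays readable
    tree.fit(homework_avg, final_grade)
    print(tree.predict([[12.0]]))               # predicted grade for a 12/20 homework average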
Use of decision trees: Interpretability, descriptive data analysis
Data analysis tool
Trees are very interpretable: they partition the attribute space
→ a tree can be summarised by its leaves, which define a mixture of distributions
→ a wonderful collaboration tool with domain experts
Warning: INSTABILITY ← overfitting!
Learning of decision trees: Formalism
Learning dataset (supervised learning): $\{(x_i, y_i)\}_{i=1,\dots,N}$ with $x_i = (x_i^1, \dots, x_i^J)$; the samples are assumed to be i.i.d.
Attributes: $X = (X^1, \dots, X^J) \in \mathcal{X} = \mathcal{X}^1 \times \dots \times \mathcal{X}^J$, where each space $\mathcal{X}^j$ can be categorical or numerical.
Class label: $Y \in \Omega = \{\omega_1, \dots, \omega_K\}$ (in $\mathbb{R}$ for regression).
Tree: a partition $P_H = \{t_1, \dots, t_H\}$ of the attribute space into leaves, with $|t_h| = \#\{i : x_i \in t_h\}$ and $\pi_h = P(t_h) \approx \frac{|t_h|}{N}$.
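As a minimal sketch of this formalism, the leaf sizes $|t_h|$ and weights $\pi_h$ can be computed from an encoding of the partition as one leaf index per sample; the toy data and variable names below are my own, not taken from the slides.

    import numpy as np

    # Toy dataset: N = 5 samples, J = 2 attributes, class labels in {w1, w2}.
    x = np.array([[1.2, 0.0], [0.7, 1.0], [2.3, 0.0], [1.9, 1.0], [0.4, 0.0]])
    y = np.array(["w1", "w2", "w1", "w1", "w2"])
    N = len(y)

    # The partition P_H is encoded as the index of the leaf each sample falls into.
    leaf_of = np.array([0, 0, 1, 1, 0])     # hypothetical assignment to two leaves

    for h in np.unique(leaf_of):
        size_h = np.sum(leaf_of == h)       # |t_h| = #{i : x_i in t_h}
        pi_h = size_h / N                   # pi_h = P(t_h) ~ |t_h| / N
        print(h, size_h, pi_h)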
Learning of decision trees: Recursive partitioning
[Figures: successive steps of the recursive partitioning of the attribute space and the corresponding growth of the tree.]
Learning of decision trees: Learning principle
Start with the whole dataset in the initial node.
Choose the best splits (on the attributes) in order to get pure leaves.
Classification trees: purity = homogeneity in terms of class labels.
CART → Gini impurity: $i(t_h) = \sum_{k=1}^{K} p_k (1 - p_k)$, with $p_k = P(Y = \omega_k \mid t_h)$
ID3, C4.5 → Shannon entropy: $i(t_h) = - \sum_{k=1}^{K} p_k \log_2(p_k)$
Regression trees: purity = low variance of the class labels.
→ $i(t_h) = \widehat{Var}(Y \mid t_h) = \frac{1}{|t_h|} \sum_{x_i \in t_h} \big(y_i - \widehat{E}(Y \mid t_h)\big)^2$, with $\widehat{E}(Y \mid t_h) = \frac{1}{|t_h|} \sum_{x_i \in t_h} y_i$
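A minimal sketch of these three impurity measures (the function names are mine); it is reused by the sketches further below.

    import numpy as np

    def gini(labels):
        # Gini impurity: sum_k p_k (1 - p_k) = 1 - sum_k p_k^2
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def entropy(labels):
        # Shannon entropy: -sum_k p_k log2(p_k)
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def variance_impurity(values):
        # Regression trees: empirical variance of the labels in the leaf
        return np.mean((values - np.mean(values)) ** 2)

    # Example: a node containing 9 positive and 5 negative samples
    node = np.array(["+"] * 9 + ["-"] * 5)
    print(gini(node), entropy(node))    # ~0.459 and ~0.940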
Learning of decision trees: Impurity measures
[Figure: comparison of the impurity measures.]
Learning of decision trees: Purity criteria
Purity criteria
[Figure: a leaf $t_h$ to be split.]
Impurity measure + tree structure → criterion
CART, ID3: purity gain
C4.5: information gain ratio
Regression trees, CART: variance minimisation
Purity criteria
[Figure: candidate split of the leaf $t_h$ on an attribute, producing two children $t_L$ and $t_R$.]
Impurity measure + tree structure → criterion
CART, ID3: purity gain → $\Delta i = i(t_h) - \pi_L\, i(t_L) - \pi_R\, i(t_R)$
C4.5: information gain ratio → $IGR = \frac{\Delta i}{H(\pi_L, \pi_R)}$
Regression trees, CART: variance minimisation → $\Delta i = i(t_h) - \pi_L\, i(t_L) - \pi_R\, i(t_R)$
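A small sketch of these two criteria for a candidate binary split, reusing numpy and the gini and entropy functions from the previous sketch; the function names are mine.

    def purity_gain(parent, left, right, impurity=gini):
        # Delta i = i(t_h) - pi_L i(t_L) - pi_R i(t_R), where pi_L and pi_R are
        # the proportions of the parent's samples sent to each child.
        pi_l, pi_r = len(left) / len(parent), len(right) / len(parent)
        return impurity(parent) - pi_l * impurity(left) - pi_r * impurity(right)

    def information_gain_ratio(parent, left, right):
        # C4.5: entropy gain normalised by the split entropy H(pi_L, pi_R)
        pi_l, pi_r = len(left) / len(parent), len(right) / len(parent)
        split_entropy = -(pi_l * np.log2(pi_l) + pi_r * np.log2(pi_r))
        return purity_gain(parent, left, right, impurity=entropy) / split_entropy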
Learning of decision trees: Stopping criteria
Stopping criteria (pre-pruning)
For all leaves $\{t_h\}_{h=1,\dots,H}$ and their potential children:
leaf purity: $\exists k \in \{1, \dots, K\} : p_k = 1$
leaf and children sizes: $|t_h| \le$ minLeafSize
leaf and children weights: $\pi_h = \frac{|t_h|}{|t_0|} \le$ minLeafProba
number of leaves: $H \ge$ maxNumberLeaves
tree depth: depth$(P_H) \ge$ maxDepth
purity gain: $\Delta i \le$ minPurityGain
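One possible way of checking these criteria for a single leaf, as a sketch reusing numpy from above; only the parameter names echo the slide, and the default thresholds are arbitrary assumptions.

    def should_stop(labels, depth, n_leaves, best_gain, n_root,
                    min_leaf_size=5, min_leaf_proba=0.01,
                    max_n_leaves=50, max_depth=10, min_purity_gain=1e-3):
        # Pre-pruning: stop splitting this leaf as soon as one criterion holds.
        counts = np.unique(labels, return_counts=True)[1]
        return (counts.max() == len(labels)              # pure leaf: some p_k = 1
                or len(labels) <= min_leaf_size          # leaf too small
                or len(labels) / n_root <= min_leaf_proba
                or n_leaves >= max_n_leaves
                or depth >= max_depth
                or best_gain <= min_purity_gain)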
Learning of decision trees: Learning algorithm
Learning algorithm (recursive partitioning)
Result: learnt tree
Start with all the learning data in an initial node (a single leaf);
while the stopping criteria are not verified for all leaves do
    for each splittable leaf do
        compute the purity gains obtained from all possible splits;
    end
    SPLIT: select the split achieving the maximum purity gain;
end
prune the obtained tree;
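A compact sketch of this greedy procedure for a binary classification tree on numerical attributes, reusing numpy, gini and purity_gain from the sketches above; for brevity it grows the tree recursively with only depth, leaf-size and purity-gain stopping criteria, and omits the pruning step.

    def grow_tree(x, y, depth=0, max_depth=3, min_leaf_size=2, min_gain=1e-6):
        # Recursive partitioning: find the best (attribute, threshold) split,
        # apply it, and recurse on the two children.
        best = None
        for j in range(x.shape[1]):                      # candidate attribute
            for thr in np.unique(x[:, j])[:-1]:          # candidate threshold
                left = x[:, j] <= thr
                gain = purity_gain(y, y[left], y[~left], impurity=gini)
                if best is None or gain > best["gain"]:
                    best = {"gain": gain, "attr": j, "thr": thr, "left": left}

        # Stopping criteria: no useful split, maximal depth, or leaf too small
        if (best is None or best["gain"] <= min_gain
                or depth >= max_depth or len(y) <= min_leaf_size):
            values, counts = np.unique(y, return_counts=True)
            return {"leaf": True, "prediction": values[np.argmax(counts)]}

        l, r = best["left"], ~best["left"]
        return {"leaf": False, "attr": best["attr"], "thr": best["thr"],
                "left": grow_tree(x[l], y[l], depth + 1, max_depth, min_leaf_size, min_gain),
                "right": grow_tree(x[r], y[r], depth + 1, max_depth, min_leaf_size, min_gain)}

    def predict(node, sample):
        # Follow the splits from the root down to a leaf.
        while not node["leaf"]:
            node = node["left"] if sample[node["attr"]] <= node["thr"] else node["right"]
        return node["prediction"]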
ID3: Training examples [9+, 5-]
[Table of training examples: 9 positive and 5 negative.]
ID3: Selecting the next attribute
[Figures: information gain of the candidate attributes, used to choose the attribute of the next split.]
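To make the selection concrete, a small usage sketch reusing numpy, entropy and purity_gain from above on the [9+, 5-] node; the candidate split below (one child [6+, 2-], the other [3+, 3-]) is invented for illustration and is not taken from the slides.

    node = np.array(["+"] * 9 + ["-"] * 5)       # the [9+, 5-] training set
    left = np.array(["+"] * 6 + ["-"] * 2)       # hypothetical first child
    right = np.array(["+"] * 3 + ["-"] * 3)      # hypothetical second child

    print(entropy(node))                                       # ~0.940 bits
    print(purity_gain(node, left, right, impurity=entropy))    # information gain of this split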