Optimal Sparse Decision Trees
Xiyang Hu (Carnegie Mellon University), Cynthia Rudin (Duke University), Margo Seltzer (University of British Columbia)
Decision Trees
Should I click on the link in this email?
– Tree 1: root "Do I recognize the from address?"; children "Do the contents seem odd?" and "Can I see the URL for the link?"
– Tree 2: root "Can I see the URL for the link?"; children "Do the contents seem odd?" and "Do I recognize the from address?"
Why not just find the Best Tree?
Could we Effectively Search that Space?
If so, the tree we find is optimal!
The Optimization Problem
$$L\big(\text{tree}, \{(x_i, y_i)\}_i\big) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\big[\text{tree}(x_i) \neq y_i\big] + C \cdot (\#\text{leaves in tree})$$
• First term: misclassification error
• Second term: sparsity
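As a concrete reading of the objective, the sketch below evaluates it for a given classifier. The function name `objective`, the toy data, and the one-split `stump` are hypothetical and chosen only for illustration; they are not taken from the OSDT code.

```python
def objective(predict, X, y, n_leaves, C):
    """Misclassification error plus C times the number of leaves."""
    n = len(y)
    # Count the samples the tree gets wrong.
    errors = sum(1 for xi, yi in zip(X, y) if predict(xi) != yi)
    return errors / n + C * n_leaves


# Toy data (hypothetical): one binary feature, four samples.
X = [(0,), (0,), (1,), (1,)]
y = [0, 1, 1, 1]


def stump(x):
    """A two-leaf tree: predict 1 when the feature is 1, else 0."""
    return 1 if x[0] == 1 else 0


# One of four samples is misclassified, and the tree has two leaves.
loss = objective(stump, X, y, n_leaves=2, C=0.01)
```

With these values the error term is 1/4 and the sparsity term is 2 × 0.01, so a larger C makes every extra leaf more expensive.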
Optimal Sparse Decision Tree (Broward County Recidivism Data)
Prior offenses > 3?
– yes: Predict Arrest
– no: Age < 26?
    – no: Predict No Arrest
    – yes: Prior offenses > 1?
        – yes: Predict Arrest
        – no: Any juvenile crimes?
            – no: Predict No Arrest
            – yes: Predict Arrest
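A tree this small can be read as a handful of nested conditions. The sketch below writes it out that way; the function and parameter names are hypothetical, and the branch layout is an assumption reconstructed from the flattened slide figure.

```python
def predict_arrest(prior_offenses: int, age: int, any_juvenile_crimes: bool) -> bool:
    """Five-leaf tree from the slide; branch layout is a reconstruction."""
    if prior_offenses > 3:
        return True            # leaf 1: Predict Arrest
    if not age < 26:
        return False           # leaf 2: Predict No Arrest
    if prior_offenses > 1:
        return True            # leaf 3: Predict Arrest
    # Leaves 4 and 5: the prediction follows the juvenile-crimes answer.
    return any_juvenile_crimes
```

Each `return` corresponds to one leaf, so the sparsity term of the objective for this tree is 5C.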
Optimal Sparse Decision Trees
• Branch and Bound
    – Good Scheduling Order
    – Strong Bounds
    – Incremental Computation
• FAST
• Accurate
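To make the idea of searching with a bound concrete, here is a minimal sketch that finds the optimal objective value on binary features. It is not the OSDT implementation: it uses plain recursion with memoization rather than a scheduled queue, and it applies only one simple bound (any split creates at least two leaves, so it costs at least 2C). The name `optimal_tree_cost` and the data layout are assumptions made for this example.

```python
from functools import lru_cache


def optimal_tree_cost(X, y, C):
    """Smallest value of error/n + C * leaves over all trees on binary features."""
    n = len(y)
    n_features = len(X[0])

    @lru_cache(maxsize=None)
    def best(idx):
        # idx is a tuple of sample indices reaching this node.
        positives = sum(y[i] for i in idx)
        # Cost of stopping here with a majority-vote leaf.
        leaf_cost = min(positives, len(idx) - positives) / n + C
        # Bound: a split yields at least two leaves, costing at least 2*C,
        # so a leaf that already costs no more than 2*C cannot be beaten.
        if leaf_cost <= 2 * C:
            return leaf_cost
        best_cost = leaf_cost
        for j in range(n_features):
            left = tuple(i for i in idx if X[i][j] == 0)
            right = tuple(i for i in idx if X[i][j] == 1)
            if not left or not right:
                continue  # this feature does not separate the samples
            best_cost = min(best_cost, best(left) + best(right))
        return best_cost

    return best(tuple(range(n)))


# XOR-like toy data (hypothetical): a single split cannot separate it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
cheap_leaves = optimal_tree_cost(X, y, C=0.01)   # four leaves are worth it
costly_leaves = optimal_tree_cost(X, y, C=0.3)   # a single leaf is optimal
```

The memoization stands in, very loosely, for incremental computation: a subset of samples reached by different split orders is solved only once.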
Bounding the Search Space: Lower Bound on Node Support
Theorem: For an optimal tree, the support of each node must be above 2C.
[Figure: tree with root Prior offenses > 3 (yes: Predict Arrest; no: Age > 70). A further split on Prior Offenses > 2 below Age > 70 is marked x: node support insufficient to produce optimal solution.]
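The support check can be sketched as follows, taking support to mean the fraction of all samples that reach a node. The helper names, the `(feature_index, value)` path encoding, and the strict comparison (the slide says "above") are assumptions for illustration.

```python
def node_support(X, path):
    """Fraction of samples that satisfy every (feature_index, value) on the path."""
    covered = sum(1 for x in X if all(x[j] == v for j, v in path))
    return covered / len(X)


def passes_support_bound(X, path, C):
    """True when the node's support is above 2*C (strictness is an assumption)."""
    return node_support(X, path) > 2 * C


# Toy data (hypothetical): only 1 of 10 samples has feature 0 equal to 1.
X = [(1,)] + [(0,)] * 9
rare_node = [(0, 1)]

keep_small_c = passes_support_bound(X, rare_node, C=0.01)  # 0.1 > 0.02
keep_large_c = passes_support_bound(X, rare_node, C=0.1)   # 0.1 > 0.2 fails
```

Nodes that fail the check need not be split further, which removes every tree that would grow below them from the search.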
Bounding the Search Space: Lower Bound on Classification Accuracy
Theorem: Each leaf of an optimal tree correctly classifies at least fraction C of the data.
[Figure: tree with root Prior offenses > 3 (yes: Predict Arrest; no: Felony > 5). A Predict Arrest leaf below Felony > 5 is marked x: doesn't classify at least Cn points correctly.]
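A minimal sketch of this check for binary labels, assuming each leaf predicts its majority label. The function name and the toy numbers are hypothetical.

```python
def leaf_passes_accuracy_bound(leaf_labels, n_total, C):
    """True when the leaf correctly classifies at least C * n_total samples."""
    if not leaf_labels:
        return False
    # With a majority-vote prediction, the correct count is the larger class.
    correct = max(leaf_labels.count(0), leaf_labels.count(1))
    return correct >= C * n_total


# Toy example (hypothetical): 100 samples overall, C = 0.05, so Cn = 5.
tiny_leaf = [1, 0]                 # at most 1 correct
solid_leaf = [1] * 6 + [0] * 2     # 6 correct

tiny_ok = leaf_passes_accuracy_bound(tiny_leaf, n_total=100, C=0.05)
solid_ok = leaf_passes_accuracy_bound(solid_leaf, n_total=100, C=0.05)
```

Intuitively, a leaf that gets fewer than Cn points right does not earn back the penalty C it adds to the objective.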
Bounding the Search Space: Permutation Bound
Theorem: If two trees have the same leaves, up to a permutation, all their child trees will be the same, so one of them can be pruned.
[Figure: two trees built from the same two splits, Prior offenses > 3 and Age > 18, applied in a different order. One tree is rooted at Prior offenses > 3, the other at Age > 18; each has four leaves predicting Arrest or No Arrest.]
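One way to detect such duplicates is to give every tree an order-independent key built from its leaves. The sketch below assumes a tree is represented as a list of leaves, and each leaf as a list of `(condition, answer)` pairs along its path; this representation and the name `canonical_key` are assumptions for illustration, not the OSDT data structure.

```python
def canonical_key(leaves):
    """Order-independent key: a set of leaves, each a set of path conditions."""
    return frozenset(frozenset(leaf) for leaf in leaves)


# Tree A splits on prior offenses first, then on age.
tree_a = [[("prior_offenses>3", p), ("age>18", a)]
          for p in (False, True) for a in (False, True)]
# Tree B applies the same two splits in the opposite order.
tree_b = [[("age>18", a), ("prior_offenses>3", p)]
          for a in (False, True) for p in (False, True)]

seen = set()
pruned = 0
for tree in (tree_a, tree_b):
    key = canonical_key(tree)
    if key in seen:
        pruned += 1        # same leaves as an earlier tree: skip it
    else:
        seen.add(key)
```

Both trees map to the same key, so only the first one is kept for further expansion.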
Bounding the Search Space
• Other bounds enable even more pruning
    – Equivalent points bound: Samples with the same features but different labels will produce misclassifications regardless of the model.
    – Bound on the number of leaves: The regularization value C bounds the number of leaves.
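Both bounds can be sketched in a few lines. The first groups samples by feature vector and counts the minority labels in each group, which no tree can classify correctly. The second follows from the objective itself: a tree with H leaves has objective at least C × H, so a tree with more than (best objective so far) / C leaves cannot improve on the best known tree. The function names are hypothetical, and the leaf bound is this simple derivation rather than necessarily the exact bound used in OSDT.

```python
import math
from collections import Counter, defaultdict


def equivalent_points_lower_bound(X, y):
    """Fraction of samples that any classifier on these features must get wrong."""
    groups = defaultdict(Counter)
    for xi, yi in zip(X, y):
        groups[tuple(xi)][yi] += 1
    # In each group of identical feature vectors, all but the majority are errors.
    unavoidable = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return unavoidable / len(y)


def max_leaves(best_objective, C):
    """Largest leaf count whose sparsity penalty alone stays within best_objective."""
    return math.floor(best_objective / C)


# Toy data (hypothetical): three identical samples with conflicting labels.
X = [(0, 1), (0, 1), (0, 1), (1, 0)]
y = [1, 1, 0, 0]
error_floor = equivalent_points_lower_bound(X, y)   # one unavoidable mistake in four
leaf_cap = max_leaves(best_objective=0.25, C=0.1)   # trees with 3+ leaves cost at least 0.3
```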
Optimal Sparse Decision Trees
• Accurate
• FAST
• Interpretable
• Open Source: https://github.com/xiyanghu/OSDT