Type-Driven Automated Learning with Lale
Martin Hirzel, Kiran Kate, Avi Shinnar, Pari Ram, and Guillaume Baudart
Tuesday 4 November 2019
https://github.com/ibm/lale
Value Proposition
Augment, but don’t replace, the data scientist.
• Automation: Easy search and tuning of pipelines
• Interop: Python building blocks & beyond
• Usability: Like scikit-learn plus types
Categorical + Continuous Dataset
https://nbviewer.jupyter.org/github/IBM/lale/blob/master/examples/talk_2019-1105-lale.ipynb
Manual Pipeline
Pipeline Combinators
Lale features         Name   Description    Scikit-learn features
>> or make_pipeline   pipe   feed to next   make_pipeline
& or make_union       and    run both       make_union or ColumnTransformer
| or make_choice      or     choose one     N/A (specific to given Auto-ML tool)
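A minimal sketch of how the three combinators can be expressed through Python operator overloading. The `Op` class and the operator instances are hypothetical stand-ins for illustration only, not Lale's implementation; the sketch just records the composed expression as a string.

```python
# Toy stand-in for an operator; hypothetical, not Lale's actual class.
class Op:
    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):  # a >> b : pipe, feed a's output to b
        return Op(f"({self.name} >> {other.name})")

    def __and__(self, other):     # a & b : and, run both on the same input
        return Op(f"({self.name} & {other.name})")

    def __or__(self, other):      # a | b : or, an Auto-ML tool chooses one
        return Op(f"({self.name} | {other.name})")


Norm, OneHot, Concat = Op("Norm"), Op("OneHot"), Op("Concat")
LR, XGBoost = Op("LR"), Op("XGBoost")

# Python binds >> tighter than &, and & tighter than |,
# so the choice at the end needs explicit parentheses.
pipeline = (Norm & OneHot) >> Concat >> (LR | XGBoost)
print(pipeline.name)
```

Because `>>` has higher precedence than `&`, an expression such as `A >> B & C >> D` groups as `(A >> B) & (C >> D)` without extra parentheses.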
Automated Pipeline
Displaying Automation Results
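Bindings as Lifecycle: Venn Diagram
[Figure: lifecycle states and the bindings fixed at each state, for individual operators and for pipelines]
• Meta-model: schemas, priors (individual operator); steps, grammar (pipeline)
• arrange → Planned: graph topology (pipeline)
• init → Trainable: hyperparameters (individual operator); operator choices (pipeline)
• fit → Trained: learned coefficients
• compose (>>, &, |): combines individual operators into pipelines
“Type-Driven Automated Learning with Lale”, https://arxiv.org/pdf/1906.03957.pdf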
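The lifecycle on this slide can be read as a small state machine. The sketch below assumes the transitions are exactly arrange, init, and fit in that order, as reconstructed from the slide text; the `State` enum and `step` helper are hypothetical names, not part of Lale.

```python
from enum import Enum


class State(Enum):
    META_MODEL = "meta-model"
    PLANNED = "planned"
    TRAINABLE = "trainable"
    TRAINED = "trained"


# Each lifecycle action moves an operator or pipeline to the next state.
TRANSITIONS = {
    ("arrange", State.META_MODEL): State.PLANNED,
    ("init", State.PLANNED): State.TRAINABLE,
    ("fit", State.TRAINABLE): State.TRAINED,
}


def step(state, action):
    """Return the state reached by `action`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"cannot {action} from state {state.value}") from None


state = State.META_MODEL
for action in ("arrange", "init", "fit"):
    state = step(state, action)
print(state.value)
```

Calling `step(State.PLANNED, "fit")` raises `ValueError`, since a planned pipeline still has unbound hyperparameters and operator choices.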
Semi-Automated Data Science
Manual control over automation, with examples:
• Restrict available operator choices
  • Interpretable
  • Based on licenses
  • Based on GPU requirements
• Tweak graph topology
  • Custom preprocessing
  • Multi-modal data
  • Fairness mitigation
• Tweak hyperparameter schemas
  • Adjust range for continuous
  • Restrict choices for categorical
• Expand available operator choices
  • Wrap existing library
  • Write your own operators
Data scientist actions: arrange, init, freeze; search, fit, score; pretty-print, visualize
pipeline = (
    (Project(columns={'type': 'number'}) >> Norm
     & Project(columns={'type': 'string'}) >> OneHot)
    >> Concat
    >> (LR | XGBoost | LinearSVC))
Constraints in Scikit-learn
Type-Driven Manual Learning in Lale
[Figure: the data scientist supplies hyperparameters; schemas validate them for the trainable pipeline]
Trainable pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> XGBoost
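A minimal sketch of validating hyperparameters against a schema before training. `LR_SCHEMA` and `validate` are hypothetical and heavily simplified; they only borrow the keywords `enum` and `exclusiveMinimum` from JSON Schema and do not reproduce Lale's schemas or validator.

```python
# Hypothetical, simplified schema for a logistic-regression operator.
LR_SCHEMA = {
    "solver": {"enum": ["liblinear", "sag", "lbfgs"]},
    "penalty": {"enum": ["l1", "l2"]},
    "C": {"exclusiveMinimum": 0.0},
}


def validate(hyperparams, schema):
    """Return a list of error messages; an empty list means all values are valid."""
    errors = []
    for name, value in hyperparams.items():
        rule = schema.get(name)
        if rule is None:
            errors.append(f"unknown hyperparameter {name!r}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}={value!r} not in {rule['enum']}")
        elif "exclusiveMinimum" in rule and value <= rule["exclusiveMinimum"]:
            errors.append(f"{name}={value!r} must be > {rule['exclusiveMinimum']}")
    return errors


for message in validate({"solver": "adam", "C": -1.0}, LR_SCHEMA):
    print(message)
```

The point of the design is that an invalid value is reported when the operator is configured, with a message naming the offending hyperparameter, rather than surfacing later during training.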
Constraints in Lale
Types as Documentation
Constraints in Auto-ML
Problem: Some automated trials raise exceptions
Solution 1: Unconstrained search space
• {solver: [linear, sag, lbfgs], penalty: [l1, l2]}
• Catch exception (after some time)
• Return a made-up loss (e.g. np.finfo(float).max)
Solution 2: Constrained search space
• {solver: [linear, sag, lbfgs], penalty: [l1, l2]} and (if solver: [sag, lbfgs] then penalty: [l2])
• No exceptions (no time wasted)
• No made-up loss
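The difference between the two solutions can be sketched by enumerating both search spaces. This is an illustration under assumptions: the slide's "linear" is taken to mean scikit-learn's `liblinear` solver, and `satisfies_constraint` is a hypothetical helper, not how any particular Auto-ML tool encodes constraints.

```python
from itertools import product

SOLVERS = ["liblinear", "sag", "lbfgs"]  # assumption: "linear" means liblinear
PENALTIES = ["l1", "l2"]


def satisfies_constraint(solver, penalty):
    """if solver in [sag, lbfgs] then penalty must be l2."""
    return solver not in ("sag", "lbfgs") or penalty == "l2"


# Solution 1: every combination, including ones that would fail at fit time.
unconstrained = list(product(SOLVERS, PENALTIES))

# Solution 2: invalid combinations are never proposed in the first place.
constrained = [c for c in unconstrained if satisfies_constraint(*c)]

print(len(unconstrained), len(constrained))
```

In this small space, two of the six combinations (`sag` with `l1` and `lbfgs` with `l1`) are excluded up front, so no trial is spent discovering that they raise exceptions.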
Types as Search Spaces
[Figure: from planned pipeline to search space to trainable pipeline]
• Planned pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> (LR | XGBoost | LinearSVC)
• generate → Search space: Lale can generate search spaces for various Auto-ML tools including hyperopt, GridSearchCV, and SMAC
• acquire → Search point: sample from search space, encoded by given Auto-ML tool
• decode → Trainable pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> XGBoost
• Other labels in the figure: Schemas, Data Scientist, Hyperparameters
“Type-Driven Automated Learning with Lale”, https://arxiv.org/pdf/1906.03957.pdf
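A rough sketch of the generate and decode steps for the final operator choice only. The `SCHEMAS` dictionary, `generate_search_space`, and `decode` are hypothetical and enum-only; real search-space generation for hyperopt, GridSearchCV, or SMAC involves tool-specific encodings that are not shown here.

```python
from itertools import product

# Hypothetical per-operator hyperparameter schemas (enum-only for brevity).
SCHEMAS = {
    "LR": {"penalty": ["l1", "l2"]},
    "XGBoost": {"max_depth": [3, 6]},
    "LinearSVC": {"loss": ["hinge", "squared_hinge"]},
}


def generate_search_space(choices, schemas):
    """Expand an operator choice into one search point per operator and setting."""
    space = []
    for op in choices:
        names = sorted(schemas[op])
        for values in product(*(schemas[op][n] for n in names)):
            space.append({"operator": op, **dict(zip(names, values))})
    return space


def decode(point):
    """Turn a search point back into a readable trainable-operator expression."""
    params = ", ".join(f"{k}={v!r}" for k, v in point.items() if k != "operator")
    return f"{point['operator']}({params})"


space = generate_search_space(["LR", "XGBoost", "LinearSVC"], SCHEMAS)
print(len(space), decode(space[0]))
```

Each search point fixes both the operator choice and its hyperparameters, which is what turns a planned pipeline into a trainable one.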
Types as Single Source of Truth
[Figure: the same schemas drive both validation and search-space generation]
• Planned pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> (LR | XGBoost | LinearSVC)
• generate → Search space: Lale can generate search spaces for various Auto-ML tools including hyperopt, GridSearchCV, and SMAC
• acquire → Search point: sample from search space, encoded by given Auto-ML tool
• decode → Trainable pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> XGBoost
• validate: schemas check the hyperparameters supplied by the data scientist
“Type-Driven Automated Learning with Lale”, https://arxiv.org/pdf/1906.03957.pdf
Customizing Types
Scikit-learn Compatible Interoperability
Modality      Dataset                                             Pipeline (best found choice shown in bold on the slide)
Text          Movie reviews (sentiment analysis)                  (BERT | TFIDF) >> (LR | MLP | KNN | SVC | PAC)
Table         Car (structured with categorical features)          J48 | ArulesCBA | LR | KNN
Images        CIFAR-10 (image classification)                     ResNet50
Time-series   Epilepsy (seizure classification)                   WindowTransformer >> (KNN | XGBoost | LR) >> Voting
Ongoing Work
• General improvements
  • More operators
  • More Auto-ML tools
  • More robustness
• Resource usage
  • Memory
  • Compute
• Expressiveness
  • Grammars
  • Ensembles
We welcome your suggestions and contributions!
Conclusion
• Automation: Easy search and tuning of pipelines
• Interop: Python building blocks & beyond
• Usability: Like scikit-learn plus types
• Scikit-learn compatible interop
github.com/ibm/lale