
Type-Driven Automated Learning with Lale



  1. Type-Driven Automated Learning with Lale. Martin Hirzel, Kiran Kate, Avi Shinnar, Pari Ram, and Guillaume Baudart. Tuesday 4 November 2019. https://github.com/ibm/lale

  2. Value Proposition: Augment, but don’t replace, the data scientist.
     • Automation: easy search and tuning of pipelines
     • Interop: Python building blocks and beyond
     • Usability: like scikit-learn, plus types

  3. Categorical + Continuous Dataset: https://nbviewer.jupyter.org/github/IBM/lale/blob/master/examples/talk_2019-1105-lale.ipynb

  4. Manual Pipeline

  5. Pipeline Combinators (Lale feature; name; description; scikit-learn feature)
     • >> or make_pipeline; pipe; feed to next; make_pipeline
     • & or make_union; and; run both; make_union or ColumnTransformer
     • | or make_choice; or; choose one; N/A (specific to given Auto-ML tool)
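To make the three combinators concrete, here is a minimal sketch of how `>>`, `&`, and `|` can build a pipeline expression through Python operator overloading. The classes `Op`, `Leaf`, and `Combo` are hypothetical names invented for this illustration. They are not Lale's implementation, and nothing here trains a model; the sketch only records the structure of the expression.

```python
from dataclasses import dataclass


class Op:
    """Toy base class (hypothetical, not Lale's API) that overloads >>, &, |."""

    def __rshift__(self, other: "Op") -> "Combo":
        return Combo(">>", self, other)  # pipe: feed to next

    def __and__(self, other: "Op") -> "Combo":
        return Combo("&", self, other)   # and: run both

    def __or__(self, other: "Op") -> "Combo":
        return Combo("|", self, other)   # or: choose one


@dataclass(frozen=True)
class Leaf(Op):
    """A single named operator."""
    name: str

    def show(self) -> str:
        return self.name


@dataclass(frozen=True)
class Combo(Op):
    """Two operators joined by one combinator symbol."""
    symbol: str
    left: Op
    right: Op

    def show(self) -> str:
        return f"({self.left.show()} {self.symbol} {self.right.show()})"


# Python's precedence (>> binds tighter than &, which binds tighter than |)
# lets the inner expression be written without extra parentheses.
pipeline = (
    (Leaf("ProjectNumbers") >> Leaf("Norm") & Leaf("ProjectStrings") >> Leaf("OneHot"))
    >> Leaf("Concat")
    >> (Leaf("LR") | Leaf("XGBoost") | Leaf("LinearSVC"))
)
print(pipeline.show())
```

Printing the expression shows the nesting that the operators produce, which is a quick way to check how precedence grouped the steps.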

  6. Automated Pipeline

  7. Displaying Automation Results

  8. Bindings as Lifecycle: Venn Diagram
     • Meta-model: schemas and priors (individual operator); steps and grammar (pipeline)
     • Planned (after arrange): graph topology
     • Trainable (after init): hyperparameters; operator choices
     • Trained (after fit): learned coefficients
     • Individual operators are composed into pipelines with >>, &, and |
     “Type-Driven Automated Learning with Lale”, https://arxiv.org/pdf/1906.03957.pdf
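The lifecycle on this slide can be sketched as three small classes, one per state, where each step binds more information. This is a toy under stated assumptions: `PlannedMeanRegressor`, `Trainable`, `Trained`, and the `shrink` hyperparameter are all hypothetical names for illustration and do not come from Lale.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Trained:
    """Trained state: hyperparameters plus a learned coefficient."""
    hyperparams: Dict[str, float]
    coefficient: float

    def predict(self, xs: List[float]) -> List[float]:
        # This toy model predicts the same learned value for every input.
        return [self.coefficient for _ in xs]


@dataclass
class Trainable:
    """Trainable state: hyperparameters are bound, coefficients are not."""
    hyperparams: Dict[str, float]

    def fit(self, xs: List[float], ys: List[float]) -> Trained:
        shrink = self.hyperparams["shrink"]  # hypothetical hyperparameter
        return Trained(self.hyperparams, (1.0 - shrink) * mean(ys))


class PlannedMeanRegressor:
    """Planned state: the operator is chosen, hyperparameters are still free."""

    def __call__(self, **hyperparams: float) -> Trainable:
        # "init" step: binding hyperparameters yields a trainable operator.
        return Trainable(dict(hyperparams))


planned = PlannedMeanRegressor()
trainable = planned(shrink=0.5)                             # init
trained = trainable.fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # fit
print(trained.predict([10.0, 20.0]))
```

Keeping each state in its own type means that calling `predict` before `fit` is an attribute error rather than a silent mistake.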

  9. Semi-Automated Data Science
     Manual control over automation, with examples:
     • Restrict available operator choices: interpretable; based on licenses; based on GPU requirements
     • Tweak graph topology: custom preprocessing; multi-modal data; fairness mitigation
     • Tweak hyperparameter schemas: adjust range for continuous; restrict choices for categorical
     • Expand available operator choices: wrap existing library; write your own operators
     Example pipeline:
     pipeline = ((Project(columns={'type': 'number'}) >> Norm & Project(columns={'type': 'string'}) >> OneHot) >> Concat >> (LR | XGBoost | LinearSVC))
     Data scientist actions: arrange, init, freeze; search, fit, score; pretty-print, visualize

  10. Constraints in Scikit-learn

  11. Type-Driven Manual Learning in Lale
      • The data scientist supplies hyperparameters, which are validated against schemas
      • Trainable pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> XGBoost
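Validating hyperparameters against schemas can be illustrated with a small hand-written checker. This is only a sketch: the `SCHEMA` dictionary, its entries, and the `validate` function are hypothetical and cover just `enum`, `type: number`, and `minimum`. It is a simplified stand-in, not a JSON Schema implementation and not Lale's own validation code.

```python
from typing import Any, Dict, List

# Hypothetical schema for a logistic-regression-like operator.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "solver": {"enum": ["linear", "sag", "lbfgs"]},
    "penalty": {"enum": ["l1", "l2"]},
    "C": {"type": "number", "minimum": 0.0},
}


def validate(hyperparams: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors: List[str] = []
    for name, value in hyperparams.items():
        if name not in schema:
            errors.append(f"unknown hyperparameter {name!r}")
            continue
        rule = schema[name]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}={value!r} is not one of {rule['enum']}")
        if rule.get("type") == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}={value!r} is not a number")
            elif "minimum" in rule and value < rule["minimum"]:
                errors.append(f"{name}={value!r} is below minimum {rule['minimum']}")
    return errors


print(validate({"solver": "sag", "penalty": "l2", "C": 1.0}, SCHEMA))
print(validate({"solver": "adam", "C": -1}, SCHEMA))
```

The point of checking at construction time is that a mistyped or out-of-range value is reported before any training starts.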

  12. Constraints in Lale

  13. Types as Documentation

  14. Constraints in Auto-ML
      Problem: Some automated trials raise exceptions.
      Solution 1: Unconstrained search space
      • {solver: [linear, sag, lbfgs], penalty: [l1, l2]}
      • Catch exception (after some time)
      • Return a made-up loss, e.g. np.finfo(float).max
      Solution 2: Constrained search space
      • {solver: [linear, sag, lbfgs], penalty: [l1, l2]} and (if solver: [sag, lbfgs] then penalty: [l2])
      • No exceptions (no time wasted)
      • No made-up loss
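The difference between the two solutions can be seen by enumerating the grid from this slide with and without the constraint. The sketch below uses only the standard library; the helper names `grid` and `satisfies_constraint` are made up for the illustration.

```python
from itertools import product
from typing import Dict, List

# The search space from the slide, written as a plain dictionary.
SPACE: Dict[str, List[str]] = {
    "solver": ["linear", "sag", "lbfgs"],
    "penalty": ["l1", "l2"],
}


def grid(space: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Enumerate every combination of hyperparameter values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


def satisfies_constraint(point: Dict[str, str]) -> bool:
    """If solver is sag or lbfgs, then penalty must be l2."""
    if point["solver"] in ("sag", "lbfgs"):
        return point["penalty"] == "l2"
    return True


unconstrained = grid(SPACE)
constrained = [p for p in unconstrained if satisfies_constraint(p)]

print(len(unconstrained), "points without the constraint")
print(len(constrained), "points with the constraint")
```

Every point dropped by the filter is a trial that would otherwise have run until it raised an exception.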

  15. Types as Search Spaces
      • Planned pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> (LR | XGBoost | LinearSVC)
      • Search space: Lale can generate search spaces for various Auto-ML tools, including hyperopt, GridSearchCV, and SMAC
      • Search point: a sample from the search space, encoded by the given Auto-ML tool
      • Trainable pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> XGBoost
      • Diagram labels: Schemas, Data Scientist, Hyperparameters; arrows: generate, acquire, decode
      “Type-Driven Automated Learning with Lale”, https://arxiv.org/pdf/1906.03957.pdf
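The generate and decode steps can be sketched for the final choice among classifiers. The example flattens a choice among three operators into one dictionary of categorical dimensions, samples a point, and decodes it back into an operator name plus its hyperparameters. The `SCHEMAS` values, the `classifier` key, and the `__` naming convention are assumptions made for this illustration; they are not the encoding Lale produces for hyperopt, GridSearchCV, or SMAC.

```python
import random
from typing import Any, Dict, List, Tuple

# Hypothetical enum-only schemas for the three classifiers in the choice.
SCHEMAS: Dict[str, Dict[str, List[Any]]] = {
    "LR": {"penalty": ["l1", "l2"]},
    "XGBoost": {"max_depth": [3, 6]},
    "LinearSVC": {"loss": ["hinge", "squared_hinge"]},
}


def generate_search_space(schemas: Dict[str, Dict[str, List[Any]]]) -> Dict[str, List[Any]]:
    """Encode the operator choice as one more categorical dimension."""
    space: Dict[str, List[Any]] = {"classifier": sorted(schemas)}
    for op, hyperparams in schemas.items():
        for name, values in hyperparams.items():
            space[f"{op}__{name}"] = list(values)
    return space


def sample(space: Dict[str, List[Any]], rng: random.Random) -> Dict[str, Any]:
    """Draw one search point, standing in for what an Auto-ML tool would propose."""
    return {key: rng.choice(values) for key, values in space.items()}


def decode(point: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Keep only the chosen operator and the hyperparameters that belong to it."""
    op = point["classifier"]
    prefix = op + "__"
    hyperparams = {k[len(prefix):]: v for k, v in point.items() if k.startswith(prefix)}
    return op, hyperparams


space = generate_search_space(SCHEMAS)
point = sample(space, random.Random(0))
print(decode(point))
```

A flat encoding like this samples values for operators that end up unused, and the decode step is what discards them.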

  16. Types as Single Source of Truth
      • Planned pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> (LR | XGBoost | LinearSVC)
      • Search space: Lale can generate search spaces for various Auto-ML tools, including hyperopt, GridSearchCV, and SMAC
      • Search point: a sample from the search space, encoded by the given Auto-ML tool
      • Trainable pipeline: (Project >> Norm & Project >> OneHot) >> Concat >> XGBoost
      • Diagram labels: Schemas, Data Scientist, Hyperparameters; arrows: generate, validate, acquire, decode
      “Type-Driven Automated Learning with Lale”, https://arxiv.org/pdf/1906.03957.pdf

  17. Customizing Types

  18. Scikit-learn Compatible Interoperability (modality; dataset; pipeline, with the best found choice shown in bold on the slide)
      • Text; Movie reviews (sentiment analysis); (BERT | TFIDF) >> (LR | MLP | KNN | SVC | PAC)
      • Table; Car (structured with categorical features); J48 | ArulesCBA | LR | KNN
      • Images; CIFAR-10 (image classification); ResNet50
      • Time-series; Epilepsy (seizure classification); WindowTransformer >> (KNN | XGBoost | LR) >> Voting

  19. Ongoing Work
      • General improvements: more operators; more Auto-ML tools; more robustness
      • Resource usage: memory; compute
      • Expressiveness: grammars; ensembles
      We welcome your suggestions and contributions!

  20. Conclusion
      • Automation: easy search and tuning of pipelines
      • Interop: Python building blocks and beyond; scikit-learn compatible interop
      • Usability: like scikit-learn, plus types
      github.com/ibm/lale
