The state of machine learning
RSE seminar, University of Sheffield
Tania Allard, PhD
https://trallard.github.io/Talks/RSE-sheffield

Tania Allard
Developer advocate, Research Software Engineer, Data expert
GitHub: trallard
Twitter: @ixek

Machine learning everywhere
So much that it is starting to not make sense anymore... like when you say a word 50 times in a row

For good or for bad it is everywhere:
- Deployed in healthcare and warfare
- In the creative industry (from music to books)
- Reading CVs and judging your creditworthiness
- Making us more Instagram worthy

The big players: Apple, Facebook, Google, IBM, Intel, Microsoft, Nvidia, OpenAI, Twitter

Machine learning generalised in two workflows:
- Model development (R&D)
- Model serving (production, for customer consumption)

What are these giants' issues?
Mainly scale... in multiple areas

If we have a small team we have a smaller number of issues... right?
- Small number of models to maintain
- People have the knowledge in their heads
- They have their own methods to track progress

That is the small team performance fallacy
We still need processes and best practices in place... so let me get back to this later

As the team demand grows, the problems grow
- Increased complexity of data flow
- Larger number of workflows
- Managing complexity of flows and scheduling becomes a nightmare
- Resource allocation has to be on point

Serving models becomes harder

How do they serve millions of customers across the globe?

Three main players: Infrastructure / resources, Processes, People

Infrastructure as code

Everything as code
- Version control
- Less ambiguity in the configurations
- Shorter turnarounds
- Deterministic environments
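
A minimal sketch of why declaring everything as code reduces ambiguity. It assumes a hypothetical `EnvironmentSpec` record; the class, its fields, and the pinned versions are illustrative and do not come from the talk. Once the environment is plain data, it can live in version control and be fingerprinted, so two people can check that they are running the same configuration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical declarative description of a training environment."""
    python_version: str
    packages: dict = field(default_factory=dict)  # name -> pinned version
    random_seed: int = 0


def fingerprint(spec: EnvironmentSpec) -> str:
    """Return a stable SHA-256 digest of the spec.

    Keys are sorted before hashing, so the digest depends only on the
    content of the spec and not on the order packages were listed in.
    """
    canonical = json.dumps(asdict(spec), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only: same content in a different order, then a changed pin.
spec_a = EnvironmentSpec("3.7", {"numpy": "1.16.2", "pandas": "0.24.2"}, random_seed=42)
spec_b = EnvironmentSpec("3.7", {"pandas": "0.24.2", "numpy": "1.16.2"}, random_seed=42)
spec_c = EnvironmentSpec("3.7", {"numpy": "1.16.3", "pandas": "0.24.2"}, random_seed=42)

same_environment = fingerprint(spec_a) == fingerprint(spec_b)
drift_detected = fingerprint(spec_a) != fingerprint(spec_c)
```

Here `same_environment` is true because the two specs differ only in ordering, while `drift_detected` flags the single changed package pin.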

Processes

Data and code as first class citizens

People: data scientist, data engineer, ML engineer

What does academia have to offer?
Much more than you think

People: researchers, research software engineers, librarians

Resources and Infrastructure
We still need to figure this out... it is pretty much an ad-hoc case

Processes: scientific rigour, peer review, data management

Which areas could benefit from academic collaborations?

Meta-learning
Humans learn across tasks (learn from experience)

If prior tasks are similar then we can carry prior knowledge

AlphaGo uses some sort of meta-learning
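
The intuition of carrying prior knowledge between similar tasks can be sketched with a warm start. This is a far simpler technique than meta-learning proper and is swapped in purely for illustration. The two tasks, their coefficients, and the `fit_line` helper are assumptions made up for this example.

```python
def fit_line(xs, ys, w=0.0, b=0.0, lr=0.1, tol=1e-4, max_steps=10_000):
    """Fit y = w*x + b by gradient descent on mean squared error.

    Returns the fitted (w, b) and the number of update steps taken
    before the loss dropped below `tol`.
    """
    n = len(xs)
    for step in range(max_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tol:
            return w, b, step
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_steps


xs = [i / 10 for i in range(-10, 11)]

# Two similar toy tasks: lines with nearby slopes and intercepts.
task_a = [2.0 * x + 1.0 for x in xs]
task_b = [2.2 * x + 0.9 for x in xs]

# Learn task A from scratch, then reuse its solution as the start for task B.
w_a, b_a, _ = fit_line(xs, task_a)
w_cold, b_cold, cold_steps = fit_line(xs, task_b)              # no prior knowledge
w_warm, b_warm, warm_steps = fit_line(xs, task_b, w=w_a, b=b_a)  # prior from task A
```

Because task B is close to task A, starting from task A's solution reaches the same tolerance in fewer steps (`warm_steps < cold_steps`). If the tasks were unrelated, the prior would offer no such head start.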

Algorithmic fairness
It has become increasingly important to ensure that models are making justified calls that are free from unintended bias.
The one way to make progress is through interdisciplinary collaboration
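
One way to make "free from unintended bias" measurable is to compare how often each group receives the favourable decision. The sketch below computes the demographic parity difference, which is only one of several fairness metrics and is not named in the talk. The function names and the toy data are assumptions for illustration.

```python
def positive_rate(predictions):
    """Fraction of cases that received the favourable decision (1)."""
    return sum(predictions) / len(predictions)


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-decision rate between any two groups.

    A gap of 0.0 means every group is approved at the same rate; larger
    values mean the model treats groups differently on this metric.
    Returns the gap together with the per-group rates.
    """
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: positive_rate(p) for g, p in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Toy data, made up for illustration: 1 = approved, 0 = rejected.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(predictions, groups)
```

A large gap does not by itself prove that a model is unfair. Deciding which metric applies and which gaps are acceptable is where the interdisciplinary input comes in.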

Towards model explainability
Address the trade-off between performance and interpretability

Reinforcement learning deadly triad
Following nature's paradigms, RL agents receive rewards and then learn to maximise success by performing optimal actions.

How to keep an algorithm learning if there are far too many potential variables or outcomes to be evaluated, without being fed ridiculous amounts of data?
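
The reward loop described above can be sketched with an epsilon-greedy agent on a multi-armed bandit, one of the simplest RL settings. The talk does not name a specific algorithm, so this choice, the `run_bandit` function, and the arm probabilities are assumptions for illustration.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    Each arm pays a reward of 1 with its own fixed probability. The agent
    keeps a running average reward per arm and mostly picks the arm with
    the highest estimate, exploring a random arm with probability epsilon.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

Even with only three possible actions the agent needs thousands of interactions to settle on the best arm reliably. That hints at why problems with many more variables or outcomes demand so much data.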

In brief
Focus on the 3 pillars: People, Infrastructure, Processes

Thank you
Twitter: @ixek
Email: tania.allard@microsoft.com
