Machine Learning 101, QCon SF 2019. Grishma Jena, Data Scientist, IBM

Machine Learning 101 QCon SF 2019 Grishma Jena Data Scientist, IBM @DebateLover

About me
● Cross-portfolio Data Scientist with IBM Data and AI in San Francisco
● Infusing data science in UX and Design
● Background in Machine Learning and Natural Language Processing
● Love to encourage women and youngsters in tech
● Speaker and mentor
  ○ Started with teaching Python at San Francisco Public Library
  ○ Mentor for non-profit AI4ALL for teenagers
  ○ Spoken at PyCon, OSCON and other conferences
Handles: gjena.github.io, grishmajena, DebateLover

How much data is produced every year?
16.3 Zettabytes*
*1 Zettabyte = 1 trillion Gigabytes

How much data does the brain hold?
2.5 Petabytes*
*2.5 Petabytes = three million hours of TV shows, i.e. the video recorder in the TV would be playing continuously for 300 years
*1 Petabyte = 1 million Gigabytes

We generate more data than we realize...
2.5 Exabytes per day, which is equivalent to:
● 5 million laptops
● 90 years of HD video
● 530,000,000 million songs
● 150,000,000 iPhones

44 zettabytes: the Digital Universe represented by the memory in a stack of iPad Air tablets
iPad Air: 128 GB memory, 0.29’’ thick
Source: EMC
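The unit conversions and the iPad Air comparison on the slides above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal (SI) units. The Earth-Moon distance is not on the slide and is added here only to give the stack height a sense of scale.

```python
# Decimal (SI) storage units, in bytes.
GIGABYTE = 10**9
PETABYTE = 10**15
ZETTABYTE = 10**21

# Unit conversions quoted on the slides.
gb_per_zettabyte = ZETTABYTE // GIGABYTE   # 1 trillion gigabytes
gb_per_petabyte = PETABYTE // GIGABYTE     # 1 million gigabytes

# Figures from the slide: 44 ZB in total, 128 GB per iPad Air, 0.29 inches thick.
tablets = 44 * ZETTABYTE / (128 * GIGABYTE)
stack_km = tablets * 0.29 * 2.54 / 100 / 1000   # inches -> cm -> m -> km

# Assumption (not on the slide): mean Earth-Moon distance of about 384,400 km.
EARTH_MOON_KM = 384_400
moon_trips = stack_km / EARTH_MOON_KM

print(f"{tablets:.3e} tablets, a stack of roughly {stack_km:,.0f} km")
print(f"about {moon_trips:.1f} times the Earth-Moon distance")
```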

Buzzwords
● Data - any piece of information that can be stored and processed
● Data science - set of methods, processes, heuristics, and algorithms to extract insights from data
● Big data - extremely large amounts of data which traditional data processing systems fail to handle
● Artificial Intelligence - the study of intelligent agents, or the development of intelligent systems
● Machine Learning - allows computer systems to learn from data without being explicitly programmed
(Image caption: "It’s a dog!")

Data pipeline (cycle): Question → Data → Clean, wrangle, pre-process → Explore → Model → Validate → Tell story
Outcome: actionable insight

What question to answer?
Formulate a question the stakeholder is trying to answer:
● How do we identify and classify spam emails?
● Is this a fraudulent credit card transaction?
● Who are the next 1000 customers we will lose and why?
● How likely is it the user will buy our product?
● How can we predict housing prices for the next few years?

Data sources
Data comes from a variety of sources in different formats and is often messy.

Data wrangling
Data wrangling - gathering, selecting, and transforming data for easy access and analysis
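As a small illustration of gathering, selecting, and transforming, the sketch below cleans a messy CSV export using only the Python standard library. The records, column names, and cleaning rules are hypothetical and not from the talk. Real projects often use a library such as pandas for this step.

```python
import csv
import io

# Hypothetical messy export: stray spaces, inconsistent casing,
# a missing value, and a duplicate row.
RAW = """name,city,age
 Alice ,san francisco,34
Bob,Oakland,
alice,San Francisco,34
Carol,  BERKELEY,29
"""

def wrangle(raw_text):
    """Gather rows, select the usable ones, and transform them into a tidy form."""
    seen = set()
    clean = []
    for row in csv.DictReader(io.StringIO(raw_text)):    # gather
        name = row["name"].strip().title()               # transform
        city = row["city"].strip().title()
        age = row["age"].strip()
        if not age:                                      # select: skip missing ages
            continue
        record = (name, city, int(age))
        if record in seen:                               # select: skip duplicates
            continue
        seen.add(record)
        clean.append({"name": name, "city": city, "age": int(age)})
    return clean

rows = wrangle(RAW)
for row in rows:
    print(row)
```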

Data exploration

Model building
● Feature engineering - select important features and construct more meaningful ones, using domain knowledge
● Divide the data into training and test sets
● Create Machine Learning model
  ○ Choose supervised or unsupervised learning
  ○ Tune model parameters
  ○ Train the model
  ○ Monitor against overfitting
  ○ Evaluate model on unseen data, i.e. the test set
● Iterative process with different features
● Can have an ensemble of models
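The split, train, and evaluate steps above can be sketched with a deliberately tiny model. The data and the one-feature "decision stump" below are hypothetical and stand in for whatever model a real project would use. The point is only that the cut-off is chosen on the training set and then scored on rows the model has not seen.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def accuracy(rows, threshold):
    """Fraction of rows where the rule 'x >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)

def fit_threshold(train):
    """Train a decision stump: pick the cut-off with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))

# Hypothetical data: hours of product usage -> whether the user bought.
data = [(x, x >= 5) for x in range(20)]

train, test = train_test_split(data)
threshold = fit_threshold(train)
train_acc = accuracy(train, threshold)
test_acc = accuracy(test, threshold)   # evaluated on unseen data

print(f"threshold={threshold} train={train_acc:.2f} test={test_acc:.2f}")
```

A large gap between the training and test accuracy is one simple signal of overfitting.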

Machine learning approaches
● Supervised learning
● Unsupervised learning
● Reinforcement learning

Tool: Jupyter notebook
Jupiter? Jupyter.

Algorithms: Classification
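The slide names no specific classifier, so the following is only one possible illustration: a minimal k-nearest-neighbours sketch applied to the spam question from earlier in the talk. The features and the labelled examples are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per email: (number of links, number of exclamation marks).
train = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"), ((1, 1), "ham"),
    ((8, 6), "spam"), ((9, 7), "spam"), ((7, 9), "spam"), ((10, 8), "spam"),
]

print(knn_predict(train, (1, 2)))   # close to the "ham" examples
print(knn_predict(train, (8, 8)))   # close to the "spam" examples
```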

Algorithms: Regression
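As one illustration of regression, the sketch below fits a straight line by ordinary least squares, in the spirit of the housing-price question from earlier. The slide does not say which regression method was shown. The sizes and prices are made-up numbers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: house size (in 100s of square feet) vs. price (in $1000s).
sizes = [10, 15, 20, 25, 30]
prices = [200, 290, 410, 500, 600]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 22 + intercept   # predicted price for a size of 22

print(f"price ~ {slope:.1f} * size + {intercept:.1f}; prediction: {predicted:.1f}")
```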

Algorithms: Clustering
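A common clustering example is k-means (Lloyd's algorithm), sketched below. The slide does not name a particular algorithm, so treat this as an illustration only. The points and the starting centroids are hypothetical and chosen by hand to keep the run deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```

No labels are supplied, which is what makes this unsupervised learning.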

Algorithms: Anomaly detection
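One simple way to illustrate anomaly detection is a z-score rule: flag values that lie far from the mean, measured in standard deviations. The slide does not specify a method. The transaction amounts and the threshold of 2 are assumptions chosen for this sketch.

```python
import statistics

def zscore_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:          # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical credit card transaction amounts.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 500]

print(zscore_anomalies(amounts))
```

With very small samples a single extreme value inflates the standard deviation, so more robust rules (for example, ones based on the median) are often preferred in practice.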

Reinforcement learning
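Reinforcement learning means learning from rewards by trial and error. A minimal sketch of that idea is an epsilon-greedy agent on a multi-armed bandit. The talk does not describe this setup. The reward values, noise range, and parameters below are all hypothetical.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Learn which arm pays best from noisy rewards, by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)   # running mean reward per arm
    pulls = [0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))                           # explore
        else:
            arm = max(range(len(true_means)), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.uniform(-0.1, 0.1)                  # noisy feedback
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]           # update running mean
    return estimates, pulls

# Hypothetical environment: three actions with different average rewards.
estimates, pulls = epsilon_greedy_bandit([0.2, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])

print(f"best arm: {best_arm}, pulls per arm: {pulls}")
```

The agent is never told which arm is best. It only sees the rewards that follow its own actions.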

Model validation
● Measure model quality - how good is it?
● Use cross-validation for robustness
● Use metrics like accuracy, precision, recall, F1 score, confusion matrix
● H0 is the null hypothesis, i.e. any observed difference in samples is due to chance or sampling error
(Figure labels: False positive, False negative)
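The metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below computes them and adds a simple k-fold index split for cross-validation. The labels and predictions are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count the confusion-matrix cells for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # true negatives
    return tp, fp, fn, tn

def metrics(actual, predicted):
    """Accuracy, precision, recall, and F1 score from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(actual),
            "precision": precision, "recall": recall, "f1": f1}

def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) so every sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        train = [i for i in indices if i % k != fold]
        yield train, test

# Hypothetical labels and model predictions.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

scores = metrics(actual, predicted)
folds = list(k_fold_splits(6, 3))
print(scores)
```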

Data visualization and storytelling
● Tell a story with data
● Communicate findings to key stakeholders
● Use plots and interactive visualizations
● Answer the original questions
● Use powerful narratives for storytelling

Ethics in Data Science
All involved in handling data should have an ethical discussion about the way the data is used.
Checklist by Mike Loukides, Hilary Mason, DJ Patil:
● How can the tech be attacked or misused?
● Fair and representative training data
● Study and understand possible sources of bias
● Diverse team - opinions, backgrounds, thoughts
● Clear, explicit user consent and data protection
● Ensure fairness over time, and for different groups
● Shut down in production if behaving badly and redress those harmed

Recap
● What is Machine Learning?
● Data pipeline
  ○ Question
  ○ Data sources
  ○ Data cleaning
  ○ Data exploration
  ○ Model building
  ○ Model validation
  ○ Data visualization and storytelling
● Machine Learning approaches
  ○ Supervised (Classification, Regression)
  ○ Unsupervised (Clustering)
  ○ Reinforcement learning
● Ethics

Resources
● IBM’s Cognitive class
● Jupyter
● KD Nuggets
● Kaggle
● Towards Data Science
● Coursera
● Free Code Camp
● School of AI
● Seattle Data Guy’s Python resources
● Fast.ai
● Google ML crash course
● FiveThirtyEight

Contact: gjena.github.io, grishmajena, DebateLover
