Explainable Machine Learning Models for Structured Data
Dr Georgiana Ifrim, georgiana.ifrim@insight-centre.org
(joint work with Severin Gsponer, Thach Le Nguyen, Iulia Ilie)
30 July 2018
Overview
• Structured Data
  • Symbolic Sequences (e.g., DNA, malware)
  • Numeric Sequences (e.g., time series)
• Explainable Learning Models
  • Black-Box vs Linear Models with Rich Features
• SEQL: Sequence Learning with All-Subsequences
  • Framework for Sequence Classification & Regression
Structured Data: Sequences & Time Series
Many Applications:
• DNA — symbolic sequences with a real-valued target, e.g.:
  Value     | Data points
  290.507   | AGGGCATCATGGAGCTGTCCAG
  679.305   | ATCACAATTTTGCCGAGAGCGA
  1998.715  | GTACACCCCGTTCGGCGGCCCA
  447.803   | CCTTTAGCCCATCGTTGGCCAA
• Malware — byte sequences (from assembly code) with a class label, e.g.:
  Class | Data points (byte sequence)
  +1    | C7 01 24 04 5F 0E EA DC 00 E9 D6 4A 00 0C 66 89
  +1    | 74 13 BA EF 01 00 06 68 95 14 88 B7 00 0F 0E EA
  -1    | 08 F9 C8 1A 80 C1 8B 48 40 00 89 51 10 B8 04 00
  -1    | B8 00 00 00 00 50 E8 D8 00 00 00 83 C4 04 53 FF
• Sensors — numeric time series
Explainable Machine Learning Models
• Accuracy & Efficiency:
  • Many accurate algorithms, e.g., ensembles (Random Forest), Deep Neural Networks; but big, complex models are hard to interpret
  • Large volumes of data require efficient models
• Interpretability:
  • White box (linear models) vs black box (deep nets)
  • Interpretable AI is a big deal: DARPA Explainable AI (XAI, 2016), EU GDPR legislation (May 2018)
DARPA Explainable AI (XAI)
[Source: http://www.darpa.mil/program/explainable-artificial-intelligence]
SEQL: Sequence Learning with All-Subsequences
Key Idea: Linear Models with Rich Features are Accurate and Interpretable
• Linear models are interpretable and well understood (linear regression, logistic regression).
• Linear models with rich features are accurate (similar accuracy to ensembles, kernel-SVM, deep nets).
• Efficiently optimize linear models: we exploit the structure of a massive feature space (all-subsequences) to quickly select good features.
SEQL: Linear Models for Symbolic Sequences
Solution Approach — SEQL: all subsequences are candidate features; focus on selecting good features quickly.
Training data (Score | Sequence, with matched k-mers set off by spaces):
  290.5 | AGTC CACAA GGCTAGGATAGCTA TCCG GATCGA
  315.1 | TATCCTGCAGTACAAG TCCG TAATT CACAA TCCA
  805.6 | AGTCCGC TAGGCT AGGATAGCTAGCCCGATCGA
  799.7 | AGCCAAGACCTGAAA TAGGCT CCTGAGATACAG
  ???   | CGGGTCGTA TCCG CACTGAATATC TAGGCT TACG
SEQL Model (Weight | k-mer):
   796.6 | TAGGCT
   402.5 | CACAA
  -125.3 | TCCG
Goal is to learn a mapping f : S → R.
Linear model (weighted sum of features): f(x) = β^T x, with β the feature weights and x the feature vector.
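To make the weighted-sum model concrete, here is a minimal Python sketch (not the SEQL implementation; the k-mer weights and the query sequence are the toy values from this slide) of scoring a sequence with binary k-mer occurrence features:

```python
# Minimal sketch: score a symbolic sequence with a linear model over
# subsequence (k-mer) features, f(x) = beta^T x.

def kmer_features(seq, kmers):
    """Binary feature vector: does each k-mer occur in the sequence?"""
    return {k: int(k in seq) for k in kmers}

def score(seq, beta):
    """Weighted sum of the matched k-mer features."""
    x = kmer_features(seq, beta.keys())
    return sum(beta[k] * x[k] for k in beta)

beta = {"TAGGCT": 796.6, "CACAA": 402.5, "TCCG": -125.3}  # learned weights
print(score("CGGGTCGTATCCGCACTGAATATCTAGGCTTACG", beta))   # 796.6 - 125.3 = 671.3
```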
SEQL: Linear Models for Symbolic Sequences
Add features iteratively with greedy coordinate descent + branch-and-bound (bound the search for the best feature).

Algorithm 1: Coordinate Descent with Gauss-Southwell Selection
1: Set β(0) = 0
2: while termination condition not met do
3:   Calculate objective function L(β(t))
4:   Find coordinate j_t with maximum gradient value
5:   Find optimal step size η_{j_t}
6:   Update β(t) = β(t−1) − η_{j_t} · ∂L/∂β_{j_t}(β(t−1)) · e_{j_t}
7:   Add corresponding feature to feature set
8: end while

How do we find coordinate j_t efficiently?
Key Idea: bound the gradient of a k-mer using only information about its sub-k-mers.
Example: given s_p = "ACT", calculate the bound µ(s_p); then for super-sequences of s_p:
  s_1 = "ACTC"  → gradient(s_1) ≤ µ(s_p)
  s_2 = "AACT"  → gradient(s_2) ≤ µ(s_p)
  s_3 = "TACTG" → gradient(s_3) ≤ µ(s_p)
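A minimal sketch of the bound-and-prune idea, under simplifying assumptions (squared-error loss, binary occurrence features, contiguous k-mers extended only to the right); the helper names are hypothetical and this is not the authors' code:

```python
# Find the k-mer with the largest |gradient|, pruning extensions whenever the
# bound computed from the parent k-mer cannot beat the current best.

def occurrences(kmer, seqs):
    return [i for i, s in enumerate(seqs) if kmer in s]

def gradient(kmer, seqs, residuals):
    # d/dbeta_kmer of 0.5 * sum_i (y_i - f(x_i))^2  =  -sum_{i: kmer in x_i} r_i
    return -sum(residuals[i] for i in occurrences(kmer, seqs))

def bound(kmer, seqs, residuals):
    # Any super-sequence of `kmer` occurs in a subset of these sequences, so its
    # |gradient| is at most the larger of the positive / negative residual mass.
    occ = occurrences(kmer, seqs)
    pos = sum(residuals[i] for i in occ if residuals[i] > 0)
    neg = -sum(residuals[i] for i in occ if residuals[i] < 0)
    return max(pos, neg)

def best_kmer(seqs, residuals, alphabet="ACGT", max_len=6):
    best, best_grad = None, 0.0
    frontier = [c for c in alphabet if occurrences(c, seqs)]
    while frontier:
        kmer = frontier.pop()
        g = abs(gradient(kmer, seqs, residuals))
        if g > best_grad:
            best, best_grad = kmer, g
        # Prune: no extension of this k-mer can beat the current best.
        if bound(kmer, seqs, residuals) <= best_grad or len(kmer) >= max_len:
            continue
        frontier += [kmer + c for c in alphabet if occurrences(kmer + c, seqs)]
    return best, best_grad

seqs = ["AGTCCACAAGG", "TATCTCCGTA", "AGTCCGCTAG"]
residuals = [2.0, -1.5, 0.5]        # y_i - f(x_i) at the current iterate
print(best_kmer(seqs, residuals))   # prints the selected k-mer and its |gradient|
```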
SEQL for Time Series Classification
Time Series → Discretisation (SAX, SFA) → Symbolic Sequence → Sequence Learner (SEQL)
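As an illustration of the discretisation step, a minimal SAX sketch (assumed parameters; the exact SAX/SFA settings used with SEQL may differ): z-normalise the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a symbol using equiprobable Gaussian breakpoints.

```python
import numpy as np
from scipy.stats import norm

def sax(ts, n_segments=8, alphabet="abcd"):
    ts = np.asarray(ts, dtype=float)
    ts = (ts - ts.mean()) / (ts.std() + 1e-8)                     # z-normalise
    paa = [seg.mean() for seg in np.array_split(ts, n_segments)]  # PAA means
    # Breakpoints splitting N(0,1) into len(alphabet) equiprobable regions.
    cuts = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[np.searchsorted(cuts, m)] for m in paa)

print(sax(np.sin(np.linspace(0, 2 * np.pi, 64))))  # roughly 'cddcbaab'
```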
SEQL for Time Series Classification
[Figure: the time series is discretised with SAX/SFA at multiple resolutions (1..n), producing one symbolic sequence per resolution; SEQL is run on each representation to select feature sets F_1, F_2, ..., F_n; the selected features are combined into a single feature vector per series, on which a final classifier M is trained.]
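A minimal sketch of how such a multi-resolution pipeline could be assembled (the symbolic sequences and the per-resolution selected k-mers below are hypothetical, and scikit-learn's LogisticRegression stands in for the final classifier M):

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical symbolic representations: resolution -> one sequence per series.
symbolic = {
    8:  ["aabbccdd", "cabbccda", "ddccbbaa", "adccbbda"],
    16: ["aabbccddaabbccdd", "cabbccdacabbccda",
         "ddccbbaaddccbbaa", "adccbbdaadccbbda"],
}
labels = [0, 0, 1, 1]
# Hypothetical features selected by SEQL at each resolution.
selected = {8: ["abb", "dcc"], 16: ["bbcc", "ccbb"]}

def featurise(seqs_by_res):
    """Binary occurrence features, concatenated across resolutions."""
    rows = []
    for i in range(len(labels)):
        row = []
        for res, kmers in selected.items():
            row += [int(k in seqs_by_res[res][i]) for k in kmers]
        rows.append(row)
    return rows

clf = LogisticRegression().fit(featurise(symbolic), labels)
print(clf.coef_)   # one interpretable weight per (resolution, k-mer) feature
```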
Evaluation on Time Series Classification
Ranking of learning algorithms by accuracy on the UCR Archive (85 TSC datasets: sensors, images, ECG).
Top-3 models:
1. mtSS-SEQL+LR (our method, a linear model)
2. FCN (deep neural network)
3. COTE (ensemble of 35 classifiers)
[Figure: critical difference diagram of average ranks, comparing mtSS-SEQL+LR, FCN, COTE, WEASEL, ResNet, mtSFA-SEQL+LR, mtSS-SEQL, ST, mtSAX-SEQL+LR and BOSS.]
Interpretability
• GunPoint dataset: time series tracking hand movement, with vs without a gun.
[Figure: Gun time series annotation — steady pointing; hand moving to shoulder level; hand moving down to grasp gun; hand moving above holster; hand at rest (time axis 0–90).]
[Figure: Point time series annotation — steady pointing; hand moving to shoulder level; hand at rest (time axis 0–90).]
Interpretability
Coefficients | Subsequences
  0.06584    | cbaab
  0.06247    | db
  0.06223    | ddddb
  0.06200    | da
  0.05972    | bbbbbbbbbbcdddd
 -0.05372    | aaaaaabbbb
 -0.05439    | bbbbaaaaaa
 -0.05458    | bbbcddddd
[Figure: Point (top) and Gun (bottom) — salient region for the classification decision.]
GitHub code for our work: https://github.com/heerme?tab=repositories
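A minimal sketch (an assumed mapping, not the authors' code) of how a learned subsequence can be traced back to a salient window of the raw time series: find where the subsequence matches the SAX string, then map that match back to raw-series indices.

```python
def salient_window(sax_string, ts_length, pattern):
    """Return (start, end) indices in the raw series covered by the matched pattern."""
    pos = sax_string.find(pattern)
    if pos < 0:
        return None
    step = ts_length / len(sax_string)            # raw points per SAX symbol
    return int(pos * step), int((pos + len(pattern)) * step)

# Hypothetical discretised GunPoint series (length-150 raw series, 15 SAX symbols)
# and one of the negatively weighted subsequences from the table above.
print(salient_window("cbaabbbbaaaaaab", ts_length=150, pattern="bbbbaaaaaa"))
# -> (40, 140): the raw-series region highlighted as driving the decision
```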
Recap SEQL
• Family of machine learning algorithms to train/predict (with) linear models for sequences
• Coordinate descent with Gauss-Southwell feature selection + branch-and-bound for efficient feature search
• Sequence Classification (KDD08, KDD11): logistic loss, L2-SVM loss
• Sequence Regression (ECMLPKDD17): least-squares loss
• Time Series Classification (ICDE17): SEQL + SAX discretisation
• Future Work: multi-dimensional sequences
References
• [DMKD18, under review] T. Le Nguyen, S. Gsponer, I. Ilie, G. Ifrim. Interpretable Time Series Classification using All-Subsequence Learning and Symbolic Representations in Time and Frequency Domains. DMKD, 2018.
• [In prep] S. Gsponer, B. Smyth, G. Ifrim. Symbolic Sequence Classification with Gradient Boosted Linear Models. 2018.
• [ECMLPKDD17] S. Gsponer, B. Smyth, G. Ifrim. Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space. ECML-PKDD, 2017.
• [ICDE17] T. Le Nguyen, S. Gsponer, G. Ifrim. Time Series Classification by Sequence Learning in All-Subsequence Space. ICDE, 2017.
• [PlosOne14] B.P. Pedersen, G. Ifrim, P. Liboriussen, K.B. Axelsen, M.G. Palmgren, P. Nissen, C. Wiuf, C. Pedersen. Large scale identification and categorization of protein sequences using structured logistic regression. PLoS ONE 9(1), 2014.
• [KDD11] G. Ifrim, C. Wiuf. Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. KDD, 2011.
• [KDD08] G. Ifrim, G. Bakir, G. Weikum. Fast logistic regression for text categorization with variable-length n-grams. KDD, 2008.