1 Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding (NNSE)
Brian Murphy, Partha Talukdar, Tom Mitchell
Machine Learning Department, Carnegie Mellon University
2 Distributional Semantic Modeling
• Words are represented in a high-dimensional vector space
• Long history: (Deerwester et al., 1990), (Lin, 1998), (Turney, 2006), ...
• Although effective, these models are often not interpretable
• Examples of top 5 words from 5 randomly chosen dimensions of SVD-300 (table shown on slide)
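To make the interpretability question concrete, here is a minimal sketch of how such a table of top words is produced; `emb` (a word-by-dimension SVD embedding matrix) and `vocab` (the word list) are hypothetical names, not from the slides:

```python
import numpy as np

def top_words_per_dimension(emb, vocab, n_dims=5, top_k=5, seed=0):
    """Print the top-k highest-loading words on a few randomly chosen dimensions."""
    rng = np.random.default_rng(seed)
    for d in rng.choice(emb.shape[1], size=n_dims, replace=False):
        top = np.argsort(emb[:, d])[::-1][:top_k]  # indices of the largest values on dim d
        print(f"dim {d}: {[vocab[i] for i in top]}")
```

If the top words on a dimension form no coherent theme, the dimension is hard to interpret; that observation motivates the rest of the talk.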
3 Why interpretable dimensions?
Semantic Decoding (Mitchell et al., Science 2008):
• Input stimulus word: "motorbike"
• Semantic representation: (0.87, ride), (0.29, see), ..., (0.00, rub), (0.00, taste)
• A mapping learned from fMRI data yields the predicted activity for "motorbike"
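A minimal sketch of the kind of mapping involved (not the exact model of Mitchell et al.; the ridge regularizer and all names here are assumptions): learn a linear map from semantic feature vectors to fMRI voxel activations, then predict the image for a new word.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical inputs: F is (n_train_words, n_features) of semantic features
# such as verb co-occurrence scores; Y is (n_train_words, n_voxels) of
# observed fMRI activations for the same words.
def fit_decoder(F, Y, alpha=1.0):
    """Fit one linear regression per voxel, mapping semantic features to activity."""
    model = Ridge(alpha=alpha)
    model.fit(F, Y)
    return model

# predicted_activity = fit_decoder(F, Y).predict(f_motorbike.reshape(1, -1))
```

The key point for this talk: if each feature dimension is interpretable, the learned per-voxel weights can be read off as brain activation patterns for that feature.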
4 Why interpretable dimensions? (Mitchell et al., Science 2008)
• Interpretable dimensions reveal insightful brain activation patterns!
• But the features in the semantic representation were based on 25 hand-selected verbs:
  ‣ can't represent arbitrary concepts
  ‣ need data-driven, broad-coverage semantic representations
5 What is an Interpretable, Cognitively Plausible Representation?
A word w is a row in a words × features matrix, e.g. w = ( ... 0 0 0 ... 1.2 ... )
1. Compact representation: Sparse, many zeros
2. Uneconomical to store negative (or inferable) characteristics: Non-Negative
3. Meaningful dimensions: Coherent
6 Properties of Different Semantic Representations
Representations are compared on five properties: Effective, Sparse, Non-Negative, Coherent, and Broad Coverage (the slide marks which representation has which property):
• Hand Tuned
• Corpus-derived (existing)
• NNSE (our proposal)
Prediction accuracy on neurosemantic decoding (Murphy, Talukdar, Mitchell, *SEM 2012): Hand Coded 83.5, Corpus-derived 83.1
7 Non-Negative Sparse Embedding (NNSE)
The input matrix X (corpus co-occurrence + SVD) is factorized as X ≈ A × D, where row X_i is the input representation for word w_i, row A_i is the NNSE representation for word w_i, and D is the basis matrix.
• matrix A is non-negative
• sparsity penalty on the rows of A
• alternating minimization between A and D, using the SPAMS package (objective spelled out below)
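Spelled out, these constraints correspond to an optimization of roughly the following form; this is a reconstruction following the associated paper's notation, not a quote from the slide (X is m × d over m words, A is m × n, D is n × d):

```latex
\operatorname*{arg\,min}_{A,\,D} \sum_{i=1}^{m} \big\lVert X_{i,:} - A_{i,:} \times D \big\rVert^{2} + \lambda \lVert A_{i,:} \rVert_{1}
\quad \text{s.t.} \quad D_{j,:} D_{j,:}^{\top} \le 1,\ \forall\, 1 \le j \le n, \qquad A_{i,j} \ge 0,\ \forall\, i, j
```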
8 NNSE Optimization
X ≈ A × D, solved by alternating minimization (a sketch follows)
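The slides solve this with the SPAMS package; the following is a self-contained numpy sketch of the same alternating scheme (the initialization, step-size rule, and iteration count are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def nnse_sketch(X, k=300, lam=0.05, n_iter=100, seed=0):
    """Alternating minimization for X ~ A @ D with A >= 0 and sparse rows of A."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    A = np.abs(rng.standard_normal((m, k))) * 0.01
    D = rng.standard_normal((k, d))
    D /= np.linalg.norm(D, axis=1, keepdims=True)          # enforce ||D_j|| <= 1
    for _ in range(n_iter):
        # A-step: one projected (sub)gradient step on 0.5*||X - A D||^2 + lam*||A||_1,
        # clipped to keep A non-negative
        step = 1.0 / max(np.linalg.norm(D, 2) ** 2, 1e-8)  # 1 / Lipschitz constant of grad
        G = (A @ D - X) @ D.T
        A = np.maximum(A - step * (G + lam), 0.0)
        # D-step: least squares for D given A, then project rows back into the unit ball
        D = np.linalg.lstsq(A, X, rcond=None)[0]
        D /= np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1.0)
    return A, D
```

Each row of the returned A is the sparse, non-negative NNSE representation of one word; the rows of D are the learned basis directions.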
9 Experiments
• Three main questions:
  1. Are NNSE representations effective in practice?
  2. What is the degree of sparsity of NNSE?
  3. Are NNSE dimensions coherent?
• Setup:
  • partial ClueWeb09: 16bn tokens, 540m sentences, 50m documents
  • dependency parsed using the Malt parser
10 Baseline Representation: SVD
• For about 35k words (~adult vocabulary), extract:
  • document co-occurrence
  • dependency features from the parsed corpus
• Reduce dimensionality using SVD; subsets of this reduced-dimensional space are the baseline (sketch below)
• This is also the input (X) to NNSE
• Other representations were also compared (e.g., LDA, Collobert and Weston, etc.); details in the paper
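A hedged sketch of this baseline pipeline, using scikit-learn's TruncatedSVD as a stand-in for whatever SVD implementation was actually used; `cooc` is a hypothetical sparse word-by-feature matrix built from the two feature types above:

```python
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Hypothetical input: a sparse (n_words, n_features) matrix of document
# co-occurrence counts and dependency features for the ~35k-word vocabulary.
def svd_baseline(cooc: sparse.csr_matrix, k: int = 300):
    """Reduce the raw feature matrix to k dimensions; this X also feeds NNSE."""
    svd = TruncatedSVD(n_components=k, random_state=0)
    X = svd.fit_transform(cooc)  # (n_words, k) dense embedding
    return X
```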
11 Is NNSE effective in Neurosemantic Decoding?