Foundations of Comparative Analytics for Uncertainty in Graphs

Lise Getoor, University of Maryland
Alex Pang, UC Santa Cruz
Lisa Singh, Georgetown University
Students: Steve Bach, Matthias Broecheler, Hossam Sharara, Galileo Namata, Nathaniel Cesario, Awalin Sopan, Denis Dimitrov, Katarina Yang

Objectives § Develop mathematical models for capturing uncertainty in graphs: - node merging uncertainty (entity resolution) - edge existence uncertainty (link prediction) - node label uncertainty (collective classification) § Develop visual analytic tools for comparative analysis of uncertainty such models

Proposed Approaches
§ Uncertainty in Graphs: Foundations
  - Probabilistic Soft Logic (PSL)
  - http://psl.umiacs.umd.edu/
§ Uncertainty in Graphs: Comparative Analytics
  - G-Pare (Graph Compare)
  - http://www.cs.umd.edu/projects/linqs/gpare

PSL Foundations
• Declarative language based on logic to express collective probabilistic inference problems
• Probabilistic Model
  § Undirected graphical model
  § Constrained Continuous Markov Random Field (CCMRF)
• Key distinctions
  § Continuous-valued random variables
  § Efficiently compute and propagate similarity
  § Ability to efficiently reason about sets and aggregates
  § Scalable inference using consensus optimization

What is PSL Good for?
§ Specifying probabilistic models for:
  - Information Alignment
  - Information Fusion
  - Information Diffusion
§ Each of these requires:
  - Entity resolution
  - Link prediction
Recent applications:
• Sentiment Analysis - Node Labeling
• Models of Group Affiliation
• Graph Summarization
• Role Identification in Online Discussions

Entity Resolution § Entities - People References John Smith J. Smith name name § Attributes A B - Name friend friend § Relationships C D F G - Friendship § Goal: Identify E = H references that denote the same person =

Entity Resolution § References, names, friendships John Smith J. Smith name name § Use rules to express A B evidence friend friend - ‘’If two people have similar names, they are probably the same’’ C D F G - ‘’If two people have similar friends, they are probably the same’’ E - ‘’If A=B and B=C, then A and C must = H also denote the same person’’ =

Entity Resolution A.name ≈ {str_sim} B.name => A ≈ B : 0.8 § References, names, friendships John Smith J. Smith name name § Use rules to express A B evidence friend friend - ‘’If two people have similar names, they are probably the same’’ C D F G - ‘’If two people have similar friends, they are probably the same’’ E - ‘’If A=B and B=C, then A and C must = H also denote the same person’’ =

Entity Resolution § References, names, friendships John Smith J. Smith name name § Use rules to express A B evidence friend friend - ‘’If two people have similar names, they are probably the same’’ C D F G - ‘’If two people have similar friends, they are probably the same’’ E - ‘’If A=B and B=C, then A and C must = H also denote the same person’’ = {A.friends} ≈ {} {B.friends} => A ≈ B : 0.6

Entity Resolution § References, names, friendships John Smith J. Smith name name § Use rules to express A B evidence friend friend - ‘’If two people have similar names, they are probably the same’’ C D F G - ‘’If two people have similar friends, they are probably the same’’ E - ‘’If A=B and B=C, then A and C must = H also denote the same person’’ = A ≈ B ^ B ≈ C => A ≈ C : ∞

Link Prediction
§ Entities
  - People, Emails
§ Attributes
  - Words in emails
§ Relationships
  - communication, work relationship
§ Goal: Identify work relationships
  - Supervisor, subordinate, colleague

Link Prediction
§ People, emails, words, communication, relations
§ Use rules to express evidence
  - "If email content suggests type X, it is of type X"
  - "If A sends deadline emails to B, then A is the supervisor of B"
  - "If A is the supervisor of B, and A is the supervisor of C, then B and C are colleagues"
[Figure annotations: "complete by", "due"]

Node Labeling
[Figure: node labeling example; the unknown label is marked "?"]

Voter Opinion Modeling
[Figure: a voter whose opinion is unknown ("?"); annotations: "$", "Status update", "Tweet"]

Voter Opinion Modeling
[Figure: social network with edges labeled friend, spouse, and colleague]
vote(A,P) ∧ friend(B,A) → vote(B,P) : 0.3
vote(A,P) ∧ spouse(B,A) → vote(B,P) : 0.8
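To make the two weighted rules concrete, here is a small sketch of how rule templates like these could be instantiated ("grounded") over a social network. The network, the party names, and the helper `ground_rules` are hypothetical, and the observed relation atoms are treated as fully true so that only the `vote` atoms remain in each ground rule. This illustrates the idea, not PSL's actual grounding code.

```python
from itertools import product

# Rule templates from the slide: relation name -> rule weight.
RULE_WEIGHTS = {"friend": 0.3, "spouse": 0.8}


def ground_rules(relations, parties):
    """Instantiate vote(A,P) ∧ rel(B,A) → vote(B,P) for every edge and party.

    relations: list of (rel, b, a) tuples meaning rel(b, a) is observed.
    Returns a list of (weight, body_atom, head_atom) tuples.
    """
    grounded = []
    for (rel, b, a), party in product(relations, parties):
        weight = RULE_WEIGHTS.get(rel)
        if weight is None:  # no rule template for this relation (e.g. colleague)
            continue
        grounded.append((weight, ("vote", a, party), ("vote", b, party)))
    return grounded


# Hypothetical social network and parties.
relations = [
    ("friend", "bob", "alice"),
    ("spouse", "carol", "alice"),
    ("colleague", "dave", "bob"),
]
parties = ["P1", "P2"]

rules = ground_rules(relations, parties)
for weight, body, head in rules:
    print(f"{weight}: {body} -> {head}")
```

Two edges with templates and two parties yield four ground rules. The colleague edge contributes none because the slide gives no rule for it.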

Mathematical Foundation

Rules
H_1 ∨ ... ∨ H_m ← B_1 ∧ B_2 ∧ ... ∧ B_n
§ Atoms are real valued, in [0,1]
§ Combination functions: Lukasiewicz T-norm
  § a_1 ∨ a_2 = min(1, a_1 + a_2)
  § a_1 ∧ a_2 = max(0, a_1 + a_2 - 1)
§ Distance to Satisfaction
  § h_1 ← b_1 ∧ b_2
  Example: R ≈ T ← A ≈ B:0.7 ∧ D ≈ E:0.8
  - The body has value max(0, 0.7 + 0.8 - 1) = 0.5, so the rule is satisfied when R ≈ T ≥ 0.5
  - R ≈ T:0.7 gives distance 0.0
  - R ≈ T:0.2 gives distance 0.3
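The slide's example can be checked with a few lines of Python. The two combination functions are taken directly from the slide. The distance formula max(0, body - head) is not spelled out in the slide text; it is assumed here because it reproduces the example distances 0.0 and 0.3. Function names are illustrative.

```python
def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def luk_or(a: float, b: float) -> float:
    """Lukasiewicz disjunction: min(1, a + b)."""
    return min(1.0, a + b)


def distance_to_satisfaction(head: float, body: float) -> float:
    """How far the rule head <- body is from being satisfied (assumed formula)."""
    return max(0.0, body - head)


# Slide example: R ≈ T <- A ≈ B:0.7 ∧ D ≈ E:0.8
body = luk_and(0.7, 0.8)
for head in (0.7, 0.5, 0.2):
    d = distance_to_satisfaction(head, body)
    print(f"R ≈ T = {head}: distance {d:.1f}")
```

Any head value at or above the body value gives distance 0, which is why the slide marks the rule as satisfied for R ≈ T ≥ 0.5.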

Probabilistic Model
f(I) = (1/Z) exp[ - Σ_{r ∈ R} λ_r (d_r(I))^p ]
§ f(I): probability density over interpretation I
§ Z: normalization constant
§ R: set of ground rules
§ λ_r: rule's weight
§ d_r(I): rule's distance to satisfaction
§ p: distance exponent, in {1, 2}
Constrained Continuous Markov Random Field (CCMRF)
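As a rough illustration of how the labeled quantities combine, the sketch below evaluates the unnormalized density exp(-Σ weight × distance^p) for a list of ground rules. It rests on several assumptions: the normalization constant is omitted because it requires integrating over all interpretations, rules with infinite weight are treated as hard constraints (density 0 when violated), and the weights and distances in the example are made up.

```python
import math


def unnormalized_density(ground_rules, p=1):
    """Return exp(-sum(weight * distance**p)) over (weight, distance) pairs.

    Rules with infinite weight act as hard constraints: a violated one
    makes the density 0, a satisfied one contributes nothing.
    """
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    energy = 0.0
    for weight, distance in ground_rules:
        if math.isinf(weight):
            if distance > 0.0:
                return 0.0
            continue
        energy += weight * distance ** p
    return math.exp(-energy)


# Hypothetical ground rules as (weight, distance to satisfaction).
rules = [(0.8, 0.0), (0.6, 0.3), (math.inf, 0.0)]
print(unnormalized_density(rules, p=1))
print(unnormalized_density(rules, p=2))
print(unnormalized_density(rules + [(math.inf, 0.1)]))
```

With p = 2 the same violation is penalized less when the distance is below 1, so the squared form is more tolerant of small violations.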

PSL Inference
§ CCMRF translates to a conic program in which:
  § MAP inference is tractable (O(n^3.5)) using off-the-shelf interior point method (IPM) optimization packages [Broecheler et al. UAI 2010]
  § Marginal inference is based on sampling algorithms adapted from computational geometry methods for volume computation in high-dimensional polytopes [Broecheler & Getoor, NIPS 2010]
§ While a naïve approach is tractable, it still suffers from scalability problems
  § IPMs operate on matrices. These matrices become large and dense when many variables are all interdependent, as is common in alignment problems.
  § Scaling to large data requires an alternative to forming and operating on such matrices

Consensus Optimization [Bach et al., NIPS 12]
[Diagram: rules with local copies of random variables on one side, original random variables on the other]
§ Optimize truth values & agreement with original variables per rule
§ Update variables to average of copies
§ key: fast solutions
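The two steps in the diagram (optimize each rule's local copies, then average the copies) can be sketched as a small ADMM-style loop. This is a minimal illustration under stated assumptions, not the authors' implementation: each potential is assumed to have the form weight × max(0, coeffs·x + const), the closed-form local update and the dual variables follow the standard consensus ADMM recipe, and the function names, the example potentials, and the prior with weight 0.3 are hypothetical.

```python
def solve_hinge(weight, coeffs, const, v, rho):
    """Minimize weight*max(0, coeffs·x + const) + (rho/2)*||x - v||^2 in closed form."""
    value = sum(c * vi for c, vi in zip(coeffs, v)) + const
    if value <= 0.0:  # hinge is inactive at v, so v itself is optimal
        return list(v)
    x = [vi - (weight / rho) * c for c, vi in zip(coeffs, v)]
    if sum(c * xi for c, xi in zip(coeffs, x)) + const >= 0.0:
        return x  # the step stays in the linear region of the hinge
    scale = value / sum(c * c for c in coeffs)
    return [vi - scale * c for c, vi in zip(coeffs, v)]  # land on the kink


def consensus_map(num_vars, potentials, rho=1.0, iterations=200):
    """potentials: list of (weight, var_indices, coeffs, const)."""
    z = [0.0] * num_vars                                   # original variables
    local = [[0.0] * len(p[1]) for p in potentials]        # local copies per rule
    dual = [[0.0] * len(p[1]) for p in potentials]         # scaled dual variables
    for _ in range(iterations):
        # Step 1: each rule optimizes its own copies independently.
        for k, (weight, idx, coeffs, const) in enumerate(potentials):
            v = [z[i] - dual[k][j] for j, i in enumerate(idx)]
            local[k] = solve_hinge(weight, coeffs, const, v, rho)
        # Step 2: update each original variable to the clipped average of its copies.
        totals, counts = [0.0] * num_vars, [0] * num_vars
        for k, (_, idx, _, _) in enumerate(potentials):
            for j, i in enumerate(idx):
                totals[i] += local[k][j] + dual[k][j]
                counts[i] += 1
        z = [min(1.0, max(0.0, t / c)) if c else zi
             for t, c, zi in zip(totals, counts, z)]
        # Step 3: accumulate the disagreement between copies and originals.
        for k, (_, idx, _, _) in enumerate(potentials):
            for j, i in enumerate(idx):
                dual[k][j] += local[k][j] - z[i]
    return z


# One unknown x = vote(B,P). A spouse rule with a fully true body gives
# 0.8 * max(0, 1 - x); a hypothetical prior against voting gives 0.3 * max(0, x).
potentials = [(0.8, [0], [-1.0], 1.0), (0.3, [0], [1.0], 0.0)]
solution = consensus_map(1, potentials)
print(solution)
```

Because every local update has a closed form and touches only the variables in one rule, no large matrix is ever formed.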

Linear Constraints
[Plot: time in seconds (0 to 600) against number of potential functions and constraints (125K to 375K); series: CO-Linear, Interior-point method]

Quadratic Constraints
[Plot: time in seconds (0K to 60K) against number of potential functions and constraints (125K to 375K); series: CO-Quad, Naive CO-Quad, Interior-point method]

Comparative Visual Analytics

G-Pare § A visual analytic tool that: - Supports the comparison of uncertain graphs - Integrates three coordinated views that enable users to visualize the output at different abstraction levels - Incorporates an adaptive exploration framework for identifying the models’ commonalities and differences

G-Pare
[Screenshot: Tabular View, Network View, Matrix View]
