As Strong as the Weakest Link: Mining Diverse Cliques in Weighted Graphs

As Strong as the Weakest Link: Mining Diverse Cliques in Weighted Graphs. Petko Bogdanov (UC Santa Barbara), with Ben Baumer (Smith College), Prithwish Basu (Raytheon BBN), Amotz Bar-Noy (CUNY) and Ambuj K. Singh (UC Santa Barbara). ECML/PKDD, Prague, 2013

Example: Collaboration in sports ● Significance of a pair’s success when on a team

Influential groups

Multiple teams

Cliques in gene networks ● Gene interaction networks ● Complexes: interacting functional units* * Leemor Joshua-Tor, Structure and Function of Nucleic Acid Regulatory Complexes, http://www.hhmi.org/research/structure-and-function-nucleic-acid-regulatory-complexes

Cliques in other domains ● Sets of duplicates and near-duplicates in similarity networks ○ images ○ video ○ other complex objects with a similarity function ● Co-evolving time series ○ stocks of companies related in a supply chain ○ brain regions co-associated in performing a task* * Hagmann P, Cammoun L, Gigandet X, Meuli R, Honey CJ, Wedeen VJ, Sporns O (2008) Mapping the structural core of human cerebral cortex. PLoS Biology Vol. 6, No. 7, e159

Challenges ● Enumeration of cliques ○ MAX CLIQUE is NP-hard ● Ensuring diversity in the result set ○ Managing overlap adds complexity ● Size and density of real-world networks ● How to find the best diverse cliques efficiently while maintaining good quality of the solution

Outline ● Motivation and examples ● Problem statement and properties ● Proposed solutions ● Experiments ● Conclusion

Basic notions ● A graph G(V,E,w) represents a network of entities V with edges E among them ● w defines weights on edges ○ higher weight means stronger association ● A clique is a complete subgraph, i.e. all edges among the selected entities exist

Clique strength ● Strong ties of all pairwise edges ● A clique is as strong as its weakest link ● “Flat” teams in which all connections are important ● Bigger cliques featuring all strong edges are better
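The weakest-link idea can be illustrated with a short Python sketch. The toy graph, the function names, and in particular the size-times-weakest-edge `clique_score` are assumptions for illustration; the slide only states that a clique is as strong as its weakest link and that bigger cliques with strong edges are better.

```python
from itertools import combinations

# Hypothetical toy graph: edge weights keyed by unordered node pairs.
WEIGHTS = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.7,
    frozenset(("c", "d")): 0.4,
}

def is_clique(nodes, weights):
    """True if every pair of the given nodes is joined by an edge."""
    return all(frozenset(p) in weights for p in combinations(nodes, 2))

def weakest_link(nodes, weights):
    """Weight of the weakest edge among the given nodes (at least 2, forming a clique)."""
    if len(nodes) < 2 or not is_clique(nodes, weights):
        raise ValueError("need at least two nodes that form a clique")
    return min(weights[frozenset(p)] for p in combinations(nodes, 2))

def clique_score(nodes, weights):
    """Assumed score: clique size times weakest edge, so bigger strong cliques win."""
    return len(nodes) * weakest_link(nodes, weights)
```

Here `weakest_link(["a", "b", "c"], WEIGHTS)` is driven entirely by the 0.7 edge between b and c, no matter how strong the other two edges are.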

Diversity ● Linear combination of score and diversity via α ● A higher number of distinct nodes in the solution means higher diversity
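The slide's formula is not preserved in this transcript, so the following Python sketch assumes one plausible form of the objective: α times the total clique score plus (1 − α) times the number of distinct nodes covered. The function name and the example scores are hypothetical.

```python
def objective(cliques, scores, alpha):
    """Assumed objective: alpha * total score + (1 - alpha) * distinct nodes covered."""
    covered = set().union(*cliques)
    return alpha * sum(scores) + (1 - alpha) * len(covered)

# Two candidate top-2 solutions with made-up scores.
overlapping = objective([{"a", "b", "c"}, {"a", "b", "d"}], [2.1, 2.0], alpha=0.5)
diverse = objective([{"a", "b", "c"}, {"d", "e", "f"}], [2.1, 1.8], alpha=0.5)
```

Under these assumptions the second solution has a slightly lower total score but covers six distinct nodes instead of four, so it ranks higher overall.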

Example: Top-2 cliques ● Too much overlap

Example: Top-2 cliques ● Slightly lower score but less overlap

Complexity ● m-Diverse k-Structures (mDkS) is NP-hard ○ reduction from SET COVER ● Even if we are interested in sets of arbitrary structure, maximizing diversity is NP-hard

Approximation ● Good news ○ monotonic ○ submodular (diminishing returns for each candidate added to the solution) ○ Allows a (1-1/e)-APX ● Challenges ○ Requires greedily finding the next best clique ○ MAX CLIQUE is NP-hard to approximate within a constant factor ● Questions ○ Can we develop a solution with APX guarantees that is fast? Limitations? ○ Can we develop a very fast solution of good quality?
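The greedy scheme behind the (1-1/e) guarantee can be sketched in Python as follows. The sketch assumes the candidate cliques are already enumerated and scored, which is exactly the hard part the slide points out, and it assumes a marginal gain of α·score + (1 − α)·(newly covered nodes). All names and numbers are hypothetical.

```python
def greedy_select(candidates, m, alpha):
    """Pick up to m (nodes, score) candidates, each time taking the largest marginal gain."""
    chosen, covered = [], set()
    remaining = list(candidates)

    def gain(candidate):
        nodes, score = candidate
        # Only nodes not yet covered add diversity: this is the diminishing return.
        return alpha * score + (1 - alpha) * len(nodes - covered)

    for _ in range(m):
        if not remaining:
            break
        best = max(remaining, key=gain)
        chosen.append(best)
        covered |= best[0]
        remaining.remove(best)
    return chosen

CANDIDATES = [
    (frozenset("abc"), 2.1),
    (frozenset("abd"), 2.0),
    (frozenset("def"), 1.8),
]
picked = greedy_select(CANDIDATES, m=2, alpha=0.5)
```

After {a, b, c} is chosen, {a, b, d} contributes only one new node, so the lower-scoring but disjoint {d, e, f} is picked second.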

Intuition ● How to obtain good cliques while reducing the cost of enumeration? ● Exploit the distribution of edge weights in a real network. ● Consider good edges first. ● Include good cliques in the solution before all edges have been considered, by bounding the contribution of partial cliques

Upper bound for an incomplete clique contribution ● Optimistic completion of a partial clique C ○ The current lowest weight will be the lowest in the whole clique ○ The rest of the nodes will not overlap
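The optimistic completion can be written as a small Python function. This is a sketch under assumptions not stated on the slide: the clique score is taken to be size times weakest edge, the contribution is α·score + (1 − α)·(new nodes), and the function and parameter names are hypothetical.

```python
def upper_bound(partial, min_weight, k, covered, alpha):
    """Optimistic contribution of a partial clique once completed to size k."""
    if len(partial) > k:
        raise ValueError("partial clique is already larger than k")
    missing = k - len(partial)
    # Optimistic assumption 1: the current lowest weight stays the lowest.
    best_score = k * min_weight
    # Optimistic assumption 2: none of the nodes still to be added overlap the solution.
    new_nodes = len(set(partial) - set(covered)) + missing
    return alpha * best_score + (1 - alpha) * new_nodes

ub = upper_bound({"a", "b"}, min_weight=0.9, k=3, covered={"a"}, alpha=0.5)
```

Any real completion can only lower the minimum weight or add overlapping nodes, so under these assumptions the true contribution never exceeds this value.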

DiCliQ - threshold and prune 1. Enumerate cliques in a thresholded graph 2. Compute an upper bound (UB) on the contribution of partial cliques 3. If there is a candidate with a better score contribution than the best UB, add it to the solution 4. Lower the threshold and repeat
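The threshold-and-prune loop above can be sketched in Python. This is a simplified illustration, not the authors' implementation: it enumerates cliques by brute force, re-enumerates at every threshold, and bounds everything not yet enumerated with one coarse UB instead of per-partial-clique bounds. It assumes a score of size times weakest edge and a gain of α·score + (1 − α)·(new nodes). All names and the toy graph are hypothetical.

```python
from itertools import combinations

def cliques_up_to_k(weights, threshold, k):
    """Brute-force cliques of size 2..k using only edges with weight >= threshold."""
    kept = {e: w for e, w in weights.items() if w >= threshold}
    nodes = sorted(set().union(*kept))
    found = []
    for size in range(2, k + 1):
        for combo in combinations(nodes, size):
            pairs = [frozenset(p) for p in combinations(combo, 2)]
            if all(p in kept for p in pairs):
                found.append((frozenset(combo), size * min(kept[p] for p in pairs)))
    return found

def threshold_and_prune(weights, m, k, alpha):
    thresholds = sorted(set(weights.values()), reverse=True)
    solution, covered, i = [], set(), 0

    def gain(cand):
        return alpha * cand[1] + (1 - alpha) * len(cand[0] - covered)

    while len(solution) < m:
        taken = {nodes for nodes, _ in solution}
        cands = [c for c in cliques_up_to_k(weights, thresholds[i], k) if c[0] not in taken]
        best = max(cands, key=gain, default=None)
        next_t = thresholds[i + 1] if i + 1 < len(thresholds) else None
        # A clique not yet enumerated has an edge of weight <= next_t and at most k new nodes.
        unseen_ub = float("-inf") if next_t is None else alpha * k * next_t + (1 - alpha) * k
        if best is not None and gain(best) >= unseen_ub:
            solution.append(best)
            covered |= best[0]
        elif next_t is None:
            break  # every edge has been considered and nothing is left to add
        else:
            i += 1  # lower the threshold and repeat
    return solution

def edge(u, v, w):
    return (frozenset((u, v)), w)

TOY = dict([edge("a", "b", 0.9), edge("a", "c", 0.9), edge("b", "c", 0.9),
            edge("d", "e", 0.8), edge("d", "f", 0.8), edge("e", "f", 0.8),
            edge("c", "d", 0.3)])
result = threshold_and_prune(TOY, m=2, k=3, alpha=0.5)
```

On this toy graph both triangles are accepted before the weak c-d edge is ever enumerated, which is the point of bounding before lowering the threshold.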

DiCliQ - threshold and prune ● Implements GREEDY and hence has a (1-1/e)-approximation factor ● Exhaustive enumeration of all cliques might incur high cost in very large/dense instances ● How to scale up the discovery of diverse cliques without compromising the quality much?

BUDiC - Bottom-up greedy heuristic ● Greedy expansion around a node based on the UB contribution ● Incorporates diversity: grow away from included nodes based on UB ● Repeat for all nodes ● Scales much better: O(m*k*|E|) ● No APX guarantee ● Good quality on real datasets [Figure: expansion around node A; clique C is already in the solution]
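A bottom-up expansion of this kind might look like the Python sketch below. It is an illustrative heuristic in the spirit of the slide, not the authors' BUDiC code, and it makes no attempt to match the stated O(m*k*|E|) bound. It assumes a score of size times weakest edge and ranks each neighbor by an optimistic UB, α·k·(lowest weight so far) + (1 − α)·(new nodes, counting unfilled slots as new). All names and the toy graph are hypothetical.

```python
def build_adj(edges):
    """Adjacency map {node: {neighbor: weight}} from (u, v, w) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def grow(seed, adj, k, covered, alpha):
    """Greedily grow a clique from seed, steering away from already covered nodes."""
    clique, low = [seed], float("inf")
    common = set(adj[seed])  # nodes adjacent to every clique member

    def lowest_with(v):
        return min(low, min(adj[v][u] for u in clique))

    def ub(v):
        fresh = len({*clique, v} - covered) + (k - len(clique) - 1)
        return alpha * k * lowest_with(v) + (1 - alpha) * fresh

    while len(clique) < k and common:
        v = max(sorted(common), key=ub)
        low = lowest_with(v)
        clique.append(v)
        common &= set(adj[v])
    return frozenset(clique), low

def bottom_up(adj, m, k, alpha):
    solution, covered = [], set()
    for _ in range(m):
        best, best_gain = None, float("-inf")
        for seed in sorted(adj):  # repeat the expansion for all nodes
            nodes, low = grow(seed, adj, k, covered, alpha)
            if len(nodes) < 2 or nodes in solution:
                continue
            gain = alpha * len(nodes) * low + (1 - alpha) * len(nodes - covered)
            if gain > best_gain:
                best, best_gain = nodes, gain
        if best is None:
            break
        solution.append(best)
        covered |= best
    return solution

EDGES = [("a", "b", 0.9), ("a", "c", 0.9), ("b", "c", 0.9),
         ("d", "e", 0.8), ("d", "f", 0.8), ("e", "f", 0.8), ("c", "d", 0.3)]
groups = bottom_up(build_adj(EDGES), m=2, k=3, alpha=0.5)
```

Because each expansion commits to one neighbor at a time, the result carries no approximation guarantee; a weak early choice is never revisited.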

Data

Scalability ● Compare running time to a Baseline (no thresholding) and relative quality to iMDV* ● α = 0.5, m = 10, k = 5 [Figure annotations: "Apx. guarantee"; "Scalable, High Quality"] * S. Bandyopadhyay and M. Bhattacharyya. Mining the largest dense vertexlet in a weighted scale-free graph. Fundam. Inform., 96(1-2):1–25, 2009

Scalability on YeastNet [Figure panels: α=0.5, k=5; α=0.5, m=5]

Quality

Discovering gene complexes

Conclusion ● General results for diverse clique mining ○ application to discovery of effective groups in collaboration ○ complexes in gene networks ○ similarity/correlation graphs ● Two scalable algorithms, one with a constant-factor approximation ● More than 3 orders of magnitude running time improvement while preserving good quality

Thank You Q&A The research was supported by the Army Research Laboratory under cooperative agreement W911NF-09-2-0053 (NS-CTA).

Effect of diversity parameter α

Groups in the other datasets ● The Harry Potter cast in the movies data set ● NBA: Nowitzki-Chandler-Stevenson of the defending champion Dallas Mavericks (addition of Chandler positive) ● MLB: Ramirez-Blake-Kuo of the LA Dodgers (13/14 with an otherwise unremarkable lineup reached the playoffs in 2008)

Related work ● Quasi-cliques ○ frequency of clique occurrence (not score) ○ non-unique labels ● Weighted cliques ○ Bandyopadhyay et al. 2009: no APX guarantees, single clique, extended version does not have as good quality ● Other subgraph types ○ Steiner trees ○ Clique percolation (CFinder) ○ Edge weights are constraints and not part of score ● Diversity of node labels within a clique
