Some graph optimization problems in data mining P. Van Dooren, - PowerPoint PPT Presentation

Some graph optimization problems in data mining P. Van Dooren, CESAME, Univ. catholique Louvain based on work in collaboration with the group on University of Chicago, October 16, 2012

Figure: call density over 6 months, Leuven (Lambiotte et al., Phys. Rev., 2008)

Figure: call density over 6 months, Brussels (Lambiotte et al., Phys. Rev., 2008)

Ref: Melchior, Eng. Thesis, UCL

Outline of the talk
• Reputation systems: application to the MovieLens database
• Similarity matrix of two graphs: application to synonym extraction
• Concluding remarks

What is a reputation system? (MovieLens)

Motivation
• Detecting dishonest participants in auction systems
• Removing spammers in on-line review databases (MovieLens)
• Giving a grade (reputation) to web raters
• Evaluating the trust of nodes in peer-to-peer systems

Reputation of raters and objects
Given a bipartite graph with n raters and m objects and votes on the edges, what should the reputation of these n + m items be?
Example (graph and matrix form) with raters r1, r2, r3 and objects o1, o2: the votes form the matrix $X = \begin{pmatrix} 5 & 1 \\ 1 & \cdot \\ 2 & 3 \end{pmatrix}$, with one row per rater, one column per object and $\cdot$ for a missing vote.
Characterize the reputation f of the raters and r of the objects.

Reputation of raters and objects
Belief divergence = variance
Figure: rater reputations f1 = 4.6, f2 = 4.2, f3 = 3; object reputations r = ?

Reputation of raters and objects
Belief divergence = variance
Figure, after convergence: rater reputations f1 = 5, f2 = 4.8, f3 = 1.4; object reputations r = ?

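Our approach
Assume that every rater evaluates all objects with a vote in [0,1], and let X and f > 0 be the voting matrix and the raters' reputation vector.
• The objects' reputation vector r is the weighted sum of the votes, with the raters' reputations f as weights.
• The raters' reputation f depends on the discrepancy between their votes and the other votes.
There is a unique pair of vectors r and f satisfying these two relations when the parameter d of the raters' formula is large enough (d > m suffices, by the theorem on the nonlinear iteration).
Ref: de Kerchove & Van Dooren, SIAM News, 2008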

Nonlinear iteration
These two formulas define an iteration that alternately updates r and f. The voting matrix may be dynamic, i.e., $X_k$ may change at each iteration. If the matrix X is fixed, one can prove:
Theorem. If d > m, the iteration converges to the unique fixed point, which gives the reputations r of the objects and f(r) of the raters.
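A minimal sketch of this alternating iteration, under stated assumptions: votes lie in [0,1] and every rater votes on every object; the objects' update is the weighted average $r = X^T f / (\mathbf{1}^T f)$; and the raters' update is taken to be $f_i = d - \|x_i - r\|_2^2$, with $x_i$ the i-th row of X. That second formula is an assumed instance of "reputation depends on the discrepancy", not necessarily the exact formula of the talk. With votes in [0,1] each squared distance is at most m, so d > m keeps every $f_i$ positive. The function name `iterative_filtering` is hypothetical.

```python
def iterative_filtering(X, d, tol=1e-12, max_iter=1000):
    """Alternate the two reputation formulas until r stops changing.

    X: n rows (raters) of m votes in [0, 1]; every rater votes on every object.
    d: parameter of the ASSUMED rater formula f_i = d - ||x_i - r||^2;
       d > m keeps every f_i > 0 because each squared distance is at most m.
    Returns (r, f): object reputations and rater reputations.
    """
    n, m = len(X), len(X[0])
    f = [1.0] * n  # start with equal trust in every rater
    r = [0.0] * m
    for _ in range(max_iter):
        total = sum(f)
        # Objects: f-weighted average of the votes, r = X^T f / (1^T f).
        r_new = [sum(f[i] * X[i][j] for i in range(n)) / total for j in range(m)]
        # Raters: d minus the squared distance between their votes and r.
        f = [d - sum((X[i][j] - r_new[j]) ** 2 for j in range(m)) for i in range(n)]
        change = max(abs(a - b) for a, b in zip(r_new, r))
        r = r_new
        if change < tol:
            break
    return r, f


# Toy example: three raters who roughly agree and one rater voting against them.
votes = [[0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.0, 1.0]]
r, f = iterative_filtering(votes, d=3.0)
print("objects:", r)
print("raters:", f)
```

Starting from equal weights, the first step gives the plain mean; later steps shrink the weight of the rater who disagrees with the consensus, which pulls r back toward the other raters.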

Cost function
If d > m, the fixed point of our iteration corresponds to the minimum of a cost function defined on the unit hypercube $[0,1]^m$.
Figure: the energy function for m = 2, for d > 2 and for d = 1.5.

Convergence
One iteration step corresponds to a steepest-descent step (with a particular step size), and the iterates converge monotonically to $r^*$, as shown by a bound on $\|r_{k+1} - r_k\|_2$.
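To see how the update of r can be a steepest-descent step, assume as a hypothetical instance of the two formulas that $r = X^T f / (\mathbf{1}^T f)$ and $f_i(r) = d - \|x_i - r\|_2^2$, and consider the candidate cost $C(r) = -\tfrac14 \sum_{i=1}^n f_i(r)^2$ (introduced here for illustration, not necessarily the cost function of the talk). Differentiating:

```latex
\begin{aligned}
\nabla f_i(r) &= 2\,(x_i - r), \\
\nabla C(r) &= -\tfrac12 \sum_{i=1}^n f_i(r)\,\nabla f_i(r) = \sum_{i=1}^n f_i(r)\,(r - x_i), \\
r - \frac{\nabla C(r)}{\mathbf{1}^T f} &= r - \frac{\sum_i f_i\,(r - x_i)}{\sum_i f_i}
  = \frac{\sum_i f_i\, x_i}{\sum_i f_i} = \frac{X^T f}{\mathbf{1}^T f}.
\end{aligned}
```

Under these assumptions the update of r is a gradient step on C with step size $1/\mathbf{1}^T f$, and the fixed points of the iteration are exactly the stationary points $\nabla C(r) = 0$.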

The data set consists of 100,000 ratings (1 to 5) from 943 users on 1682 movies; each user has rated at least 20 movies. The data were collected through the MovieLens web site (movielens.umn.edu) during a seven-month period.
237 spammers (who always vote 1, except for their unique "best friend", which receives the maximum vote 5) are added (+25%).
Figure: the mean (left) is less robust than our iteration (middle), which also gives good results for the raters' reputations (right).
Figure: convergence of the spammer separation after steps 1, 2 and ∞.

Some remarks
Strengths:
• linear complexity (in the number of votes)
• applicable to any graph and with any rating matrix
• can be dynamic (varying matrix $X_k$)
• reputations for the raters
• robust against attackers and spammers
Further study:
• choice of the function
• stability for the dynamic case
• mixing raters and objects

Similarity matrix of two arbitrary graphs
For adjacency matrices A and B of the two graphs, S solves $\rho S = A S B^T + A^T S B$. This matrix can be obtained as the fixed point of a (linear) power method.
Ref: Blondel et al., SIAM Rev., 2004

Element $S_{54}$ says how similar node 5 of A is to node 4 of B, and element $S_{43}$ says how similar node 4 of A is to node 3 of B.

Two nodes are similar if their parents and children are similar. Such a recursive definition leads to an eigenvector equation.
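Written entrywise, $\rho S = A S B^T + A^T S B$ makes this recursive definition explicit. Assuming 0/1 adjacency matrices with $A_{ip} = 1$ when node i of A has an edge to node p (and likewise for B), the first term sums the scores of pairs of children and the second the scores of pairs of parents:

```latex
\rho\, S_{ij} = \sum_{p,q} A_{ip}\, B_{jq}\, S_{pq} + \sum_{p,q} A_{pi}\, B_{qj}\, S_{pq}
             = \sum_{\substack{i \to p \\ j \to q}} S_{pq} + \sum_{\substack{p \to i \\ q \to j}} S_{pq}
```

So $S_{ij}$ is large when the children of i are similar to the children of j and the parents of i are similar to the parents of j.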

Algorithm
The normalized sequence $Z_{k+1} = (A Z_k B^T + A^T Z_k B) / \|A Z_k B^T + A^T Z_k B\|_F$ has two limit points, $Z_{even}$ and $Z_{odd}$ (the limits of its even and odd subsequences), for every $Z_0 > 0$. The similarity matrix is $S = \lim_{k \to \infty} Z_{2k}$ with $Z_0 = \mathbf{1}$ (the all-ones matrix); $S_{ij}$ is the similarity score between vertex $V_i(A)$ of A and vertex $V_j(B)$ of B. With $z_k = \operatorname{vec}(Z_k)$, this is equivalent to $z_{k+1} = (B \otimes A + B^T \otimes A^T)\, z_k / \|(B \otimes A + B^T \otimes A^T)\, z_k\|_2$, which is the power method on $M = B \otimes A + B^T \otimes A^T$.
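The equivalence with the power method rests on the standard identity $\operatorname{vec}(P X Q) = (Q^T \otimes P)\operatorname{vec}(X)$ for column-stacking vec, applied to each term of the update with $(P, Q) = (A, B^T)$ and $(P, Q) = (A^T, B)$:

```latex
\begin{aligned}
\operatorname{vec}(A Z_k B^T) &= (B \otimes A)\, z_k, \\
\operatorname{vec}(A^T Z_k B) &= (B^T \otimes A^T)\, z_k, \\
\operatorname{vec}(A Z_k B^T + A^T Z_k B) &= (B \otimes A + B^T \otimes A^T)\, z_k = M z_k, \\
\|A Z_k B^T + A^T Z_k B\|_F &= \|M z_k\|_2 .
\end{aligned}
```

Since $(B \otimes A)^T = B^T \otimes A^T$, M is symmetric and nonnegative, so its eigenvalues are real. When $-\rho$ is also an eigenvalue, the normalized iterates alternate between two limits, which is why the even and odd subsequences must be treated separately.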

Some properties
• S satisfies $\rho S = ASB^T + A^TSB$ with $\rho = \|ASB^T + A^TSB\|_F$
• It is the nonnegative fixed point S of largest 1-norm
• It solves the optimization problem $\max \langle ASB^T + A^TSB, S \rangle$ subject to $\|S\|_F = 1$
• Extension of Kleinberg's HITS method
• Linear convergence (power method for sparse M)
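A small pure-Python sketch of the normalized iteration, running an even number of steps from the all-ones matrix so that the result approximates $S = \lim Z_{2k}$. It assumes dense list-of-lists adjacency matrices for self-containment; the linear cost per step quoted for sparse M would require sparse storage instead. The names `mat_mul`, `transpose` and `similarity_matrix` are hypothetical. Taking B to be the two-node graph 1 → 2 (hub → authority) shows the HITS connection: the update then reduces to $s_1 \leftarrow A s_2$, $s_2 \leftarrow A^T s_1$, so the two columns of S are hub and authority scores.

```python
def mat_mul(P, Q):
    """Product of two dense matrices stored as lists of rows."""
    inner, cols = len(Q), len(Q[0])
    return [[sum(P[i][t] * Q[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


def similarity_matrix(A, B, steps=100):
    """Approximate S = lim Z_{2k} for adjacency matrices A (n_A x n_A) and B (n_B x n_B).

    Runs 2 * steps iterations of Z <- (A Z B^T + A^T Z B) / ||A Z B^T + A^T Z B||_F
    from the all-ones matrix; an even count keeps us on the Z_even subsequence.
    Raises ZeroDivisionError if the update is the zero matrix (e.g. an edgeless graph).
    """
    At, Bt = transpose(A), transpose(B)
    Z = [[1.0] * len(B) for _ in range(len(A))]
    for _ in range(2 * steps):
        W1 = mat_mul(mat_mul(A, Z), Bt)   # A Z B^T: children of i vs children of j
        W2 = mat_mul(mat_mul(At, Z), B)   # A^T Z B: parents of i vs parents of j
        W = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W2)]
        norm = sum(x * x for row in W for x in row) ** 0.5
        Z = [[x / norm for x in row] for row in W]
    return Z


# HITS as a special case: B is the two-node graph 1 -> 2 (hub -> authority).
A = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
S = similarity_matrix(A, [[0, 1], [0, 0]])
hubs = [row[0] for row in S]
authorities = [row[1] for row in S]
print("hubs:", hubs)
print("authorities:", authorities)
```

In this toy graph node 0 points to both other nodes and node 2 receives the most links, so node 0 gets the largest hub score and node 2 the largest authority score.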

The dictionary graph
• Nodes = words present in the dictionary: 112,169 nodes
• Edge (u, v) if v appears in the definition of u: 1,398,424 edges
• Average of 12 edges per node
Ref: Blondel et al., SIAM Rev., 2004
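A toy sketch of how such a graph can be built, assuming the dictionary is available as a mapping from each headword to its definition text. Tokenization here is deliberately naive (lower-cased runs of letters, no stemming or lemmatization), which is an assumption and not the preprocessing used for the real dictionary; `dictionary_graph` is a hypothetical name.

```python
import re


def dictionary_graph(definitions):
    """Dictionary graph: an edge u -> v when headword v occurs in the definition of u.

    definitions: dict mapping each headword to its definition text.
    Returns a dict mapping each headword to the set of headwords it points to.
    Tokenization is naive: lower-cased runs of letters, no stemming.
    """
    headwords = set(definitions)
    graph = {}
    for u, text in definitions.items():
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        graph[u] = tokens & headwords  # ignore words that have no entry
    return graph


toy = {
    "vanish": "to disappear suddenly",
    "disappear": "to vanish from sight",
    "sight": "the ability to see",
    "see": "to perceive with the eyes",
}
g = dictionary_graph(toy)
n_edges = sum(len(targets) for targets in g.values())
print(n_edges / len(g))  # average number of edges per node
```

Words such as "to" or "the" produce no edge here only because they have no entry in the toy dictionary.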

The neighborhood graph is the subset of vertices used for finding synonyms: it contains "all" parents and children of the node.
Figure: neighborhood graph of "likely".
"Central" uses this subgraph to rank synonyms automatically: each node in the graph is ranked by its similarity to node c in the structure graph b → c → e.
Ref: Blondel et al., SIAM Rev., 2004
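A sketch of this ranking under stated assumptions: the dictionary graph is a dict mapping each word to the set of words in its definition; the neighborhood graph of w is taken as the subgraph induced by w, its parents and its children; and each node is scored by its similarity to the middle node c of the structure graph b → c → e. For this particular B, the update $S \leftarrow A S B^T + A^T S B$ on the columns $S = [s_b, s_c, s_e]$ splits into three vector updates (shown in the docstring), so no matrix products are needed. The functions `neighborhood` and `central_scores` are hypothetical names.

```python
def neighborhood(graph, w):
    """Subgraph induced by w, its parents (u -> w) and its children (w -> v)."""
    nodes = {w} | graph.get(w, set()) | {u for u, vs in graph.items() if w in vs}
    return {u: {v for v in graph.get(u, set()) if v in nodes} for u in nodes}


def central_scores(sub, steps=50):
    """Similarity of each node of `sub` to node c of the structure graph b -> c -> e.

    With S = [s_b, s_c, s_e], one step of S <- A S B^T + A^T S B reads
        s_b <- A s_c,   s_c <- A s_e + A^T s_b,   s_e <- A^T s_c,
    followed by normalization; 2 * steps iterations keep us on the even subsequence.
    Returns (ranking by decreasing central score, dict of central scores).
    """
    nodes = sorted(sub)
    idx = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    edges = [(idx[u], idx[v]) for u in nodes for v in sub[u]]

    def mul_a(x):
        # (A x)_i = sum of x_j over edges i -> j (children of i)
        y = [0.0] * n
        for i, j in edges:
            y[i] += x[j]
        return y

    def mul_at(x):
        # (A^T x)_j = sum of x_i over edges i -> j (parents of j)
        y = [0.0] * n
        for i, j in edges:
            y[j] += x[i]
        return y

    sb, sc, se = [1.0] * n, [1.0] * n, [1.0] * n
    for _ in range(2 * steps):
        nb = mul_a(sc)
        nc = [p + q for p, q in zip(mul_a(se), mul_at(sb))]
        ne = mul_at(sc)
        norm = sum(x * x for col in (nb, nc, ne) for x in col) ** 0.5
        sb = [x / norm for x in nb]
        sc = [x / norm for x in nc]
        se = [x / norm for x in ne]
    scores = dict(zip(nodes, sc))
    ranking = sorted(nodes, key=lambda u: -scores[u])
    return ranking, scores


graph = {"a": {"w"}, "w": {"b"}, "b": set(), "z": {"a"}}
sub = neighborhood(graph, "w")  # keeps a -> w -> b, drops z
ranking, scores = central_scores(sub)
print(ranking)
```

In this toy path a → w → b the query word w is exactly the middle node, so it ranks first; on a real neighborhood graph the candidate synonyms of w are read off the top of the ranking.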

Disappear

| Rank | Vectors | Central | ArcRank | Wordnet | Microsoft Word |
|---|---|---|---|---|---|
| 1 | vanish | vanish | epidemic | vanish | vanish |
| 2 | wear | pass | disappearing | go away | cease to exist |
| 3 | die | die | port | end | fade away |
| 4 | sail | wear | dissipate | finish | die out |
| 5 | faint | faint | cease | terminate | go |
| 6 | light | fade | eat | cease | evaporate |
| 7 | port | sail | gradually | | wane |
| 8 | absorb | light | instrumental | | expire |
| 9 | appear | dissipate | darkness | | withdraw |
| 10 | cease | cease | efface | | pass away |
| Mark | 3.6 | 6.3 | 1.2 | 7.5 | 8.6 |
| Std Dev | 1.8 | 1.7 | 1.2 | 1.4 | 1.3 |

Vectors, Central and ArcRank are automatic; Wordnet and Microsoft Word are manual.

Sugar

| Rank | Vectors | Central | ArcRank | Wordnet | Microsoft Word |
|---|---|---|---|---|---|
| 1 | juice | cane | granulation | sweetening | darling |
| 2 | starch | starch | shrub | sweetener | baby |
| 3 | cane | sucrose | sucrose | carbohydrate | honey |
| 4 | milk | milk | preserve | saccharide | dear |
| 5 | molasses | sweet | honeyed | organic compound | love |
| 6 | sucrose | dextrose | property | saccharify | dearest |
| 7 | wax | molasses | sorghum | sweeten | beloved |
| 8 | root | juice | grocer | dulcify | precious |
| 9 | crystalline | glucose | acetate | edulcorate | pet |
| 10 | confection | lactose | saccharine | dulcorate | babe |
| Mark | 3.9 | 6.3 | 4.3 | 6.2 | 4.7 |
| Std Dev | 2.0 | 2.4 | 2.3 | 2.9 | 2.7 |


Optimization problems
The fixed point of $\rho S = ASB^T + A^TSB$, $\rho = \|ASB^T + A^TSB\|_F$, corresponds to $\max \langle ASB^T + A^TSB, S \rangle$ subject to $\|S\|_F = 1$.
The fixed point of $U \Sigma V^T = \Pi_{opt}(AUV^TB^T + A^TUV^TB)$ corresponds to $\max \langle AUV^TB^T + A^TUV^TB, UV^T \rangle$ subject to $U^TU = V^TV = I_k$.
This is no longer an eigenvalue problem, but it can be computed using iterative techniques with linear complexity per step.

Projected correlation max  AUV T B T +A T UV T B , UV T  subject to U T U=V T V=I k Is also equivalent to max  U T AU ,V T BV  subject to U T U=V T V=I k U T AU and V T BV can be viewed as kxk “Rayleigh quotients” Linearly converging iteration (truncated SVD) U k+1 Σ k+1 V T +U ┴ Σ ┴ V ┴ T = AU k V T B T + A T U k V T B + sU k V T k+1 k k k

Correlation of graphs
For graphs with similar structure, the correlation is nearly optimal.
Ref: Fraikin, Nesterov, Van Dooren, LAA, 2007

Some remarks
• Optimization is on large sparse graphs
• The complexity of one iteration step is linear in the number of nodes in both graphs
• We have methods with linear convergence (power-like and gradient-like methods)
• We have Newton-like methods with manifold constraints ($U^TU = V^TV = I_k$)
• Extensions to colored nodes and edges
