Some graph optimization problems in data mining
P. Van Dooren, CESAME, Univ. catholique de Louvain
based on work in collaboration with the group on
University of Chicago, October 16, 2012
Figure: call density over 6 months in Leuven (Lambiotte et al., Phys. Rev., 2008)
Figure: call density over 6 months in Brussels (Lambiotte et al., Phys. Rev., 2008)
Ref: Melchior, Eng. Thesis, UCL
Outline of the talk
• Reputation systems: application to the MovieLens database
• Similarity matrix of two graphs: application to synonym extraction
• Concluding remarks
What is a reputation system?
Figure: MovieLens
Motivation
• Detecting dishonest participants in auction systems
• Removing spammers in online review databases (MovieLens)
• Giving a grade (reputation) to web raters
• Evaluating the trust of nodes in peer-to-peer systems
Reputation of raters and objects
Given a bipartite graph with n raters and m objects and votes on the edges, what should be the reputation of these n + m items?
Example (graph and matrix form; rows are the raters $r_1, r_2, r_3$, columns are the objects $o_1, o_2$, and "·" marks a missing vote):
$$X = \begin{pmatrix} 5 & 1 \\ 1 & \cdot \\ 2 & 3 \end{pmatrix} \quad \text{(votes)}$$
Characterize the reputation f of the raters and r of the objects.
Belief divergence = variance; f = ?, r = ?
Figure: three raters with $f_1 = 4.6$, $f_2 = 4.2$, $f_3 = 3$ and the votes 4.2, 3.4, 4.5, 3.3, 2.8 and 4.9 on the edges.
After convergence: $f_1 = 5$, $f_2 = 4.8$, $f_3 = 1.4$; r = ?
Our approach
Assume that every rater evaluates all objects with a vote in [0,1], and that $X \in [0,1]^{n \times m}$ and $f > 0$ are the voting matrix and the raters' reputations.
The objects' reputation vector r is the weighted sum of the votes.
The raters' reputation f depends on the discrepancy between each rater's votes and the other votes.
There is a unique pair of vectors r and f satisfying these formulas when d is large enough.
Ref: de Kerchove, Van Dooren, SIAM News, '08
Nonlinear iteration
These two formulas define the following iteration, in which the voting matrix may be dynamic, i.e. change at each iteration. If the matrix X is fixed, one can prove:
Theorem. If d > m, the iteration converges to the unique fixed point, which gives the reputations r of the objects and f(r) of the raters.
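A minimal NumPy sketch of this iteration for a fixed voting matrix X with votes in [0,1]. The slide's formulas are not reproduced in this transcript, so the update rules below are assumptions: the objects' reputations are the normalized weighted sum $r = X^T f / \mathbf{1}^T f$, and the raters' reputations use the hypothetical rule $f_i = d - \|x_i - r\|_2^2$, where $x_i$ is the i-th row of X. This choice is consistent with the condition d > m, since $\|x_i - r\|_2^2 \le m$ on the unit hypercube keeps every $f_i$ positive. The function name `iterative_filtering` is hypothetical.

```python
import numpy as np


def iterative_filtering(X, d, tol=1e-12, max_iter=1000):
    """Sketch of the reputation fixed-point iteration (assumed update rules).

    X: (n raters x m objects) array of votes in [0, 1]; every rater votes on
    every object. d: parameter, assumed d > m so that every f_i stays positive
    (||x_i - r||^2 <= m on the unit hypercube).
    Returns (r, f): object reputations and rater reputations.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    if d <= m:
        raise ValueError("this sketch assumes d > m")
    r = X.mean(axis=0)  # plain mean = all raters weighted equally
    for _ in range(max_iter):
        f = d - np.sum((X - r) ** 2, axis=1)   # hypothetical rater rule f(r)
        r_new = X.T @ f / f.sum()              # weighted average of the votes
        converged = np.linalg.norm(r_new - r) < tol
        r = r_new
        if converged:
            break
    f = d - np.sum((X - r) ** 2, axis=1)       # raters' reputations at the final r
    return r, f


# Usage: three consistent raters and one outlier voting on two objects.
X = np.array([[0.9, 0.2],
              [0.8, 0.3],
              [1.0, 0.1],
              [0.0, 1.0]])
r, f = iterative_filtering(X, d=3.0)
print("objects:", r.round(3), "raters:", f.round(3))
```

In the usage example the outlying fourth rater receives the lowest reputation, and the object reputations move away from the plain column mean toward the consensus of the other three raters.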
Cost function
If d > m, the fixed point of our iteration corresponds to the minimum of the following cost function, defined on the unit hypercube $[0,1]^m$:
Figure: the energy function for m = 2, for d > 2 and for d = 1.5.
Convergence
One iteration step corresponds to a steepest-descent step (with a particular step size), and the iteration converges monotonically to $r^*$ since we have $\|r_{k+1} - r_k\|_2$
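The steepest-descent interpretation can be checked on a candidate energy. Assuming, hypothetically, that the rater reputation is $f_i(r) = d - \|x_i - r\|_2^2$ (with $x_i$ the i-th row of X), the energy E below reproduces the update $r_{k+1} = X^T f(r_k) / \mathbf{1}^T f(r_k)$ exactly as a gradient step with step size $1/\mathbf{1}^T f(r_k)$. Whether E coincides with the slide's cost function (up to constants) is itself an assumption.

```latex
\begin{aligned}
f_i(r) &= d - \|x_i - r\|_2^2, \qquad \nabla f_i(r) = 2\,(x_i - r),\\
E(r) &= -\tfrac{1}{4} \sum_{i=1}^{n} f_i(r)^2,\\
\nabla E(r) &= -\tfrac{1}{4} \sum_{i=1}^{n} 2\, f_i(r)\, \nabla f_i(r)
            = -\sum_{i=1}^{n} f_i(r)\,(x_i - r)
            = (\mathbf{1}^T f)\, r - X^T f,\\
r_{k+1} &= r_k - \frac{1}{\mathbf{1}^T f(r_k)}\, \nabla E(r_k)
         = \frac{X^T f(r_k)}{\mathbf{1}^T f(r_k)}.
\end{aligned}
```

Fixed points of the iteration are therefore exactly the stationary points of this E.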
The data set consists of 100,000 ratings (from 1 to 5) from 943 users on 1682 movies. Each user has rated at least 20 movies. The data were collected through the MovieLens web site (movielens.umn.edu) during a seven-month period.
237 spammers are added (+25%); each spammer always votes 1, except for a single "best friend" that receives the maximum vote 5.
The mean (left) is less robust than our iteration (middle), which also gives good results for the raters' reputations (right).
Figure: separation of the spammers after iteration steps 1, 2 and ∞ (convergence).
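The spammer experiment can be imitated on synthetic data. This is a hypothetical stand-in, not the MovieLens data or the authors' code: honest raters vote on the 1 to 5 scale around a random hidden quality, 25% extra spammers vote 1 everywhere except 5 for one "best friend" object, votes are rescaled to [0,1] by (v - 1)/4, and the rater rule $f_i = d - \|x_i - r\|_2^2$ is an assumption. The real MovieLens matrix is sparse, whereas this sketch, like the approach above, assumes every rater votes on every object.

```python
import numpy as np


def iterative_filtering(X, d, tol=1e-10, max_iter=10_000):
    # Assumed rules: f_i = d - ||x_i - r||^2 (hypothetical), r = X^T f / 1^T f.
    r = X.mean(axis=0)
    for _ in range(max_iter):
        f = d - np.sum((X - r) ** 2, axis=1)
        r_new = X.T @ f / f.sum()
        converged = np.linalg.norm(r_new - r) < tol
        r = r_new
        if converged:
            break
    return r, d - np.sum((X - r) ** 2, axis=1)


rng = np.random.default_rng(0)
n_honest, m = 40, 20
n_spam = n_honest // 4                       # +25% spammers
quality = rng.uniform(0.0, 1.0, size=m)      # hypothetical hidden quality

# Honest raters: noisy integer votes on the 1..5 scale around the quality.
noise = rng.normal(0.0, 0.5, size=(n_honest, m))
honest = np.clip(np.rint(1 + 4 * quality + noise), 1, 5)
# Spammers: always 1, except 5 for their single "best friend" (object 0 here).
spam = np.ones((n_spam, m))
spam[:, 0] = 5
X = (np.vstack([honest, spam]) - 1) / 4      # rescale votes 1..5 to [0, 1]

r_iter, f = iterative_filtering(X, d=m + 1.0)  # d > m, as required
r_mean = X.mean(axis=0)
r_ref = X[:n_honest].mean(axis=0)            # reference: honest raters only

err_mean = np.abs(r_mean - r_ref).mean()
err_iter = np.abs(r_iter - r_ref).mean()
print(f"mean: {err_mean:.3f}  iterative: {err_iter:.3f}")
print("spammers weighted below all honest raters:",
      f[n_honest:].max() < f[:n_honest].min())
```

Both estimates are compared with the mean over the honest raters only. On this synthetic example the iteration shrinks the spammers' influence, so its error is smaller than that of the plain mean, and every spammer ends up with a lower reputation than every honest rater.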
Some remarks
Strengths:
• linear complexity (in the number of votes)
• applicable to any graph and with any rating matrix
• can be dynamic (varying matrix $X_k$)
• reputations for the raters
• robust against attackers and spammers
Further study:
• choice of the function
• stability for the dynamic case
• mixing raters and objects
Similarity matrix of two arbitrary graphs
For A and B the adjacency matrices of the two graphs, S solves $\rho S = A S B^T + A^T S B$.
This matrix can be obtained as the fixed point of a (linear) power method.
Ref: Blondel et al., SIAM Rev., '04
Element $S_{54}$ says how similar node 5 of A is to node 4 of B; element $S_{43}$ says how similar node 4 of A is to node 3 of B.
Two nodes are similar if their parents and children are similar. Such a recursive definition leads to an eigenvector equation.
Algorithm?
The normalized sequence
$$Z_{k+1} = \frac{A Z_k B^T + A^T Z_k B}{\|A Z_k B^T + A^T Z_k B\|_F}$$
has, for every $Z_0 > 0$, two limit points: the even iterates converge to $Z_{\text{even}}$ and the odd iterates to $Z_{\text{odd}}$.
Similarity matrix: $S = \lim_{k \to \infty} Z_{2k}$ with $Z_0 = \mathbf{1}$ (the all-ones matrix).
$S_{ij}$ is the similarity score between $V_i(A)$ and $V_j(B)$.
With $z_k = \operatorname{vec}(Z_k)$, this is equivalent to
$$z_{k+1} = \frac{(B \otimes A + B^T \otimes A^T)\, z_k}{\|(B \otimes A + B^T \otimes A^T)\, z_k\|_2},$$
which is the power method on $M = B \otimes A + B^T \otimes A^T$.
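A minimal NumPy sketch of the normalized iteration, tracking the even subsequence $Z_{2k}$ from $Z_0 = \mathbf{1}$. The function name `similarity_matrix` and the stopping rule are assumptions of this sketch. The usage example compares a small directed graph with the path graph 1 → 2: the update then reduces to $z_1 \leftarrow A z_2$, $z_2 \leftarrow A^T z_1$, so the two columns of S are Kleinberg's hub and authority scores.

```python
import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=10_000):
    """Sketch: limit of the even iterates Z_{2k} of
        Z <- (A Z B^T + A^T Z B) / ||A Z B^T + A^T Z B||_F,   Z_0 = ones.
    A (n x n) and B (p x p) are adjacency matrices; returns S (n x p).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    Z = np.ones((A.shape[0], B.shape[0]))
    for _ in range(max_iter):
        Z_prev = Z
        for _ in range(2):  # two normalized steps: Z_{2k} -> Z_{2k+2}
            W = A @ Z @ B.T + A.T @ Z @ B
            Z = W / np.linalg.norm(W, "fro")
        if np.linalg.norm(Z - Z_prev, "fro") < tol:
            break
    return Z


# Usage: a small digraph A (edges 0->1, 0->2, 1->2, 3->2) versus the path 1 -> 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 2)]:
    A[i, j] = 1.0
B = np.array([[0.0, 1.0],
              [0.0, 0.0]])
S = similarity_matrix(A, B)
print("hub scores:", S[:, 0].round(3))
print("authority scores:", S[:, 1].round(3))
```

For the column-stacking vec operator, $\operatorname{vec}(A Z B^T) = (B \otimes A)\operatorname{vec}(Z)$ and $\operatorname{vec}(A^T Z B) = (B^T \otimes A^T)\operatorname{vec}(Z)$, which is why the vectorized form is the power method on $M = B \otimes A + B^T \otimes A^T$. Working with the n × p matrix Z avoids ever forming the np × np matrix M.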
Some properties
• Satisfies $\rho S = A S B^T + A^T S B$, with $\rho = \|A S B^T + A^T S B\|_F$
• It is the nonnegative fixed point S of largest 1-norm
• It solves the optimization problem $\max_S \langle A S B^T + A^T S B, S \rangle$ subject to $\|S\|_F = 1$
• Extension of Kleinberg's HITS method
• Linear convergence (power method for sparse M)
The dictionary graph
Nodes = words present in the dictionary: 112,169 nodes
Edge (u, v) if v appears in the definition of u: 1,398,424 edges
Average of 12 edges per node
Ref: Blondel et al., SIAM Rev., '04
Neighborhood graph
The neighborhood graph is the subset of vertices used for finding synonyms: it contains "all" parents and children of the node.
Figure: neighborhood graph of "likely".
"Central" uses this subgraph to rank synonyms automatically: each node in the graph is ranked by its similarity to node c in the path graph b → c → e.
Ref: Blondel et al., SIAM Rev., '04
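A toy version of this ranking, assuming a hypothetical dictionary given as a Python dict from each headword to the words of its definition. Following the slide, the neighborhood graph collects the parents and children of the query word (leaving out the query word itself is an assumption of this sketch), and each vertex is ranked by its similarity to the central node c of the path graph b → c → e, i.e. by the middle column of S. The helper names `similarity_matrix` and `rank_synonyms` are hypothetical.

```python
import numpy as np


def similarity_matrix(A, B, tol=1e-12, max_iter=10_000):
    # Even iterates of Z -> (A Z B^T + A^T Z B) / ||.||_F, starting from Z_0 = ones.
    Z = np.ones((A.shape[0], B.shape[0]))
    for _ in range(max_iter):
        Z_prev = Z
        for _ in range(2):
            W = A @ Z @ B.T + A.T @ Z @ B
            Z = W / np.linalg.norm(W)
        if np.linalg.norm(Z - Z_prev) < tol:
            break
    return Z


def rank_synonyms(defs, word):
    """Rank the neighborhood of `word` by similarity to the central node c.

    defs maps each headword to the words of its definition, so the dictionary
    graph has an edge (u, v) when v appears in the definition of u.
    """
    children = set(defs.get(word, []))
    parents = {u for u, ws in defs.items() if word in ws}
    nodes = sorted((children | parents) - {word})  # assumption: drop `word` itself
    index = {w: i for i, w in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in defs.get(u, []):
            if v in index:
                A[index[u], index[v]] = 1.0
    B = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)  # b -> c -> e
    scores = similarity_matrix(A, B)[:, 1]  # middle column: similarity to c
    order = np.argsort(-scores, kind="stable")
    return [(nodes[i], float(scores[i])) for i in order]


# Hypothetical toy dictionary (not the dictionary graph of the talk).
defs = {
    "disappear": ["vanish", "cease", "fade", "sight"],
    "vanish": ["disappear", "fade", "sight"],
    "fade": ["disappear", "gradually", "vanish"],
    "cease": ["stop", "end", "exist"],
    "die": ["cease", "exist", "live"],
    "evaporate": ["vanish", "vapour", "disappear"],
    "eclipse": ["disappear", "light"],
}
ranking = rank_synonyms(defs, "disappear")
for w, s in ranking:
    print(f"{w:10s} {s:.3f}")
```

For this toy dictionary "vanish" comes out on top: with the path graph b → c → e, the middle column of S is driven by $A A^T + A^T A$, whose dominant eigenvector here is concentrated on "vanish".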
Disappear

| | Vectors | Central | ArcRank | WordNet | Microsoft Word |
|---|---|---|---|---|---|
| 1 | vanish | vanish | epidemic | vanish | vanish |
| 2 | wear | pass | disappearing | go away | cease to exist |
| 3 | die | die | port | end | fade away |
| 4 | sail | wear | dissipate | finish | die out |
| 5 | faint | faint | cease | terminate | go |
| 6 | light | fade | eat | cease | evaporate |
| 7 | port | sail | gradually | | wane |
| 8 | absorb | light | instrumental | | expire |
| 9 | appear | dissipate | darkness | | withdraw |
| 10 | cease | cease | efface | | pass away |
| Mark | 3.6 | 6.3 | 1.2 | 7.5 | 8.6 |
| Std Dev | 1.8 | 1.7 | 1.2 | 1.4 | 1.3 |

Vectors, Central and ArcRank are automatic; WordNet and Microsoft Word are manual.
Sugar

| | Vectors | Central | ArcRank | WordNet | Microsoft Word |
|---|---|---|---|---|---|
| 1 | juice | cane | granulation | sweetening | darling |
| 2 | starch | starch | shrub | sweetener | baby |
| 3 | cane | sucrose | sucrose | carbohydrate | honey |
| 4 | milk | milk | preserve | saccharide | dear |
| 5 | molasses | sweet | honeyed | organic compound | love |
| 6 | sucrose | dextrose | property | saccharify | dearest |
| 7 | wax | molasses | sorghum | sweeten | beloved |
| 8 | root | juice | grocer | dulcify | precious |
| 9 | crystalline | glucose | acetate | edulcorate | pet |
| 10 | confection | lactose | saccharine | dulcorate | babe |
| Mark | 3.9 | 6.3 | 4.3 | 6.2 | 4.7 |
| Std Dev | 2.0 | 2.4 | 2.3 | 2.9 | 2.7 |
Optimization problems
The fixed point of $\rho S = A S B^T + A^T S B$, $\rho = \|A S B^T + A^T S B\|_F$, corresponds to
$$\max_S \langle A S B^T + A^T S B, S \rangle \quad \text{subject to } \|S\|_F = 1.$$
The fixed point of $U \Sigma V^T = \Pi_{\text{opt}}(A U V^T B^T + A^T U V^T B)$ corresponds to
$$\max_{U,V} \langle A U V^T B^T + A^T U V^T B, U V^T \rangle \quad \text{subject to } U^T U = V^T V = I_k.$$
This is no longer an eigenvalue problem, but it can be computed using iterative techniques with linear complexity per step.
Projected correlation
$$\max \langle A U V^T B^T + A^T U V^T B, U V^T \rangle \quad \text{subject to } U^T U = V^T V = I_k$$
is also equivalent to
$$\max \langle U^T A U, V^T B V \rangle \quad \text{subject to } U^T U = V^T V = I_k.$$
$U^T A U$ and $V^T B V$ can be viewed as $k \times k$ "Rayleigh quotients".
Linearly converging iteration (truncated SVD):
$$U_{k+1} \Sigma_{k+1} V_{k+1}^T + U_\perp \Sigma_\perp V_\perp^T = A U_k V_k^T B^T + A^T U_k V_k^T B + s\, U_k V_k^T$$
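A minimal NumPy sketch of the truncated-SVD iteration: at each step it keeps the k dominant singular pairs of $A W B^T + A^T W B + s W$ with $W = U V^T$. The slide leaves the shift s unspecified; the default $s = 2\|A\|_2\|B\|_2$ below is an assumption, chosen so that $W \mapsto \langle A W B^T + A^T W B, W \rangle + s\|W\|_F^2$ is convex, which makes the objective nondecreasing from step to step. The function name and the random orthonormal start are also hypothetical. The example checks numerically that $\langle A U V^T B^T + A^T U V^T B, U V^T \rangle = 2\langle U^T A U, V^T B V \rangle$, so the two formulations have the same maximizers (they differ only by a factor 2).

```python
import numpy as np


def projected_correlation(A, B, k, s=None, iters=200, seed=0):
    """Truncated-SVD iteration for
        max <A W B^T + A^T W B, W>,  W = U V^T,  U^T U = V^T V = I_k.

    Each step keeps the k dominant singular pairs of
        A W B^T + A^T W B + s W    (W = current U V^T).
    Returns U, V and the objective value before each step.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if s is None:
        # Assumed shift: makes W -> <L(W), W> + s ||W||_F^2 convex, which
        # guarantees a nondecreasing objective for this kind of iteration.
        s = 2.0 * np.linalg.norm(A, 2) * np.linalg.norm(B, 2)
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))  # random start
    V, _ = np.linalg.qr(rng.standard_normal((B.shape[0], k)))
    history = []
    for _ in range(iters):
        W = U @ V.T
        LW = A @ W @ B.T + A.T @ W @ B
        history.append(float(np.sum(LW * W)))     # <L(W), W>
        P, _, Qt = np.linalg.svd(LW + s * W, full_matrices=False)
        U, V = P[:, :k], Qt[:k].T                  # k dominant singular pairs
    return U, V, history


# Usage on two hypothetical random digraphs with 8 and 6 nodes.
rng = np.random.default_rng(3)
A = (rng.random((8, 8)) < 0.3).astype(float)
B = (rng.random((6, 6)) < 0.3).astype(float)
U, V, history = projected_correlation(A, B, k=2)
print("objective: first", round(history[0], 4), "last", round(history[-1], 4))
```

Each step costs a few sparse products plus a thin SVD; for large sparse graphs one would replace the dense `np.linalg.svd` by a truncated solver, which this sketch does not attempt.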
Correlation of graphs
Figure: graphs with similar structure; the correlation is nearly optimal.
Ref: Fraikin, Nesterov, Van Dooren, LAA, '07
Some remarks
• Optimization is on large sparse graphs
• The complexity of one iteration step is linear in the number of nodes in both graphs
• We have methods with linear convergence (power-like and gradient-like methods)
• We have Newton-like methods with manifold constraints ($U^T U = V^T V = I_k$)
• Extensions to colored nodes and edges