Cross-Domain Recommendation via Clustering on Multi-Layer Graphs
Aleksandr Farseev, Ivan Samborskii, Andrey Filchenkov, Tat-Seng Chua
By Aleksandr Farseev, http://farseev.com, Aug 8th, 2017
Venue Category Recommendation
Collaborative venue category recommendation: recommendation of venue categories (e.g. restaurant, cinema) to a user based on information about his/her profile (e.g. past visits) and/or information about users from the same domain.
Venue categories (examples): Clothing Store, Hotel, Ice Cream Shop. Total: 764 different categories.
Idea 1: Utilization of Individual And Group Knowledge for Better Recommendation
User Community-Based Collaborative Recommendation
We perform venue category recommendation based on both individual and group knowledge, which naturally models the impact of society on an individual's behavior when selecting a new place to go:
$rec(u) = sort\left(\gamma \cdot vec_u + \frac{1}{|C_u|} \sum_{v \in C_u} vec_v\right)$
where $vec_u$ is the venue category vector of user $u$ and $C_u$ is the community of user $u$.
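The ranking rule on this slide can be illustrated with a minimal sketch. It assumes one reading of the formula: each user has a venue category count vector, the user's own vector is weighted by a hypothetical parameter `gamma`, and the community contribution is the mean of the members' vectors. The function name `recommend` and the toy data are illustrative only and are not from the slides.

```python
def recommend(user, vectors, community, gamma=1.0, top_n=2):
    """Rank venue categories by own history plus the community average."""
    scores = {}
    # Individual knowledge: the user's own category counts, weighted by gamma.
    for category, count in vectors[user].items():
        scores[category] = scores.get(category, 0.0) + gamma * count
    # Group knowledge: mean category counts over all community members.
    for member in community:
        for category, count in vectors[member].items():
            scores[category] = scores.get(category, 0.0) + count / len(community)
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:top_n]


# Toy data (hypothetical users and visit counts).
vectors = {
    "ann": {"Hotel": 2, "Ice Cream Shop": 1},
    "bob": {"Clothing Store": 4},
    "eve": {"Clothing Store": 2, "Hotel": 1},
}
top = recommend("ann", vectors, community=["ann", "bob", "eve"], gamma=1.0)
print(top)
```

Here "Clothing Store" enters the ranking for "ann" only through the community term, which is the effect the slide attributes to group knowledge.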
What do we need user communities for? + Users from the same community (extracted from multi-source data) may have similar location preferences + Search within user community significantly reduces search space during the recommendation process
Example of User Communities (1) Community 1: Gingers Community K: Darker Hair
User Relation and Community Representations One way to find user communities is to model users' relationships in the form of a graph so that dense subgraphs are considered to be user communities.
Community Detection Based on a Single Data Source
One common formulation is the MinCut problem. For a given number $k$ of subsets, MinCut chooses a partition $C_1, \dots, C_k$ that minimizes the expression:
$cut(C_1, \dots, C_k) = \sum_{i=1}^{k} W(C_i, \bar{C}_i)$
* $W(C_i, \bar{C}_i)$ is the sum of the weights of edges that connect vertices in $C_i$ to vertices outside it ($\bar{C}_i$)
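As a small illustration of the cut objective, the sketch below evaluates it for a given partition of a weighted undirected graph. The edge dictionary layout and the name `cut_value` are assumptions made for this example. Because the objective sums over every cluster, each crossing edge is counted once for each of the two clusters it leaves.

```python
def cut_value(weights, partition):
    """Sum, over all clusters, of the weight of edges leaving the cluster."""
    label = {v: idx for idx, cluster in enumerate(partition) for v in cluster}
    total = 0.0
    for (a, b), w in weights.items():
        if label[a] != label[b]:
            # A crossing edge leaves both the cluster of a and the cluster of b.
            total += 2 * w
    return total


# Toy undirected graph: each edge is listed once.
weights = {("a", "b"): 3.0, ("b", "c"): 1.0, ("c", "d"): 3.0, ("a", "c"): 1.0}
balanced = cut_value(weights, [{"a", "b"}, {"c", "d"}])
lopsided = cut_value(weights, [{"a"}, {"b", "c", "d"}])
print(balanced, lopsided)
```

The partition that keeps the two heavy edges inside clusters has the smaller cut value, so it is the one MinCut would prefer.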
How to solve the MinCut problem?
Approximation of MinCut as a standard trace minimization problem:
$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}(U^T L U)$, s.t. $U^T U = I$
which can be solved by spectral clustering:
1. Calculate the Laplacian matrix $L \in \mathbb{R}^{N \times N}$
2. Build the matrix $U \in \mathbb{R}^{N \times k}$ of the first $k$ eigenvectors, which correspond to the smallest eigenvalues of $L$
3. Cluster the data in the new space $U$ using e.g. the $k$-means algorithm
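The three spectral clustering steps can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the unnormalized Laplacian $L = D - W$, and it uses a small deterministic k-means (farthest-point initialization) written for this example so that the result is reproducible.

```python
import numpy as np


def kmeans(X, k, n_iter=20):
    """Tiny Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def spectral_clustering(W, k):
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W   # step 1: Laplacian (unnormalized, an assumption)
    _, vecs = np.linalg.eigh(L)      # eigh returns eigenvalues in ascending order
    U = vecs[:, :k]                  # step 2: eigenvectors of the k smallest eigenvalues
    return kmeans(U, k)              # step 3: cluster the rows of U


# Two triangles joined by one weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = spectral_clustering(W, k=2)
print(labels)
```

On this toy graph the two triangles end up in different clusters; which triangle receives label 0 depends on the initialization.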
Idea 2: Utilization of Multi-Source Data
Most users actively use about 3 social networks
Accounts: ~6 registered social network accounts per person*
Active usage: people actively use ~3 social platforms simultaneously*
* GlobalWebIndex. 2016. GWI Social report. http://www.globalwebindex.net/blog/internet-users-have-average-of-5-social-media-accounts
Multi-source data describe a user from multiple views
Cross-Domain Venue Category Recommendation
Cross-domain venue category recommendation: recommendation of venue categories (e.g. restaurant, cinema) to a user based on information about his/her profile (e.g. past visits) and/or information about users from other sources (e.g. images, texts, location types).
Venue categories (examples): Clothing Store, Hotel, Ice Cream Shop.
[Figure: multi-source data]
Community detection must be performed in a cross-source manner...
Problems:
• Data source integration
• Community detection
How to represent multi-source data?
Multi-layer graph: a graph $G$, where $G = \{G_i\}$, $G_i = (V, E_i)$. All layers share the vertex set $V$, and each layer $i$ has its own edge set $E_i$.
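One possible in-memory representation of such a multi-layer graph is sketched below: a single shared vertex list plus one weighted edge dictionary per layer. The class name `MultiLayerGraph`, the layer names, and the edge weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MultiLayerGraph:
    vertices: list                               # shared vertex set V (users)
    layers: dict = field(default_factory=dict)   # layer name -> {(u, v): weight}

    def add_edge(self, layer, u, v, weight=1.0):
        if u not in self.vertices or v not in self.vertices:
            raise ValueError("every layer uses the same vertex set V")
        self.layers.setdefault(layer, {})[(u, v)] = weight

    def adjacency(self, layer):
        """Dense symmetric adjacency matrix of one layer, as nested lists."""
        idx = {v: i for i, v in enumerate(self.vertices)}
        n = len(self.vertices)
        W = [[0.0] * n for _ in range(n)]
        for (u, v), w in self.layers.get(layer, {}).items():
            W[idx[u]][idx[v]] = W[idx[v]][idx[u]] = w
        return W


g = MultiLayerGraph(["ann", "bob", "eve"])
g.add_edge("text", "ann", "bob", 0.8)       # similarity from text features
g.add_edge("location", "bob", "eve", 0.5)   # similarity from location features
text_adjacency = g.adjacency("text")
print(text_adjacency)
```

Keeping the vertex order fixed across layers means the per-layer adjacency matrices can later be combined row by row.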
Extending the definition of spectral clustering
$\min_{U \in \mathbb{R}^{N \times k}} \sum_{i=1}^{M} \mathrm{tr}(U^T L_i U)$, s.t. $U^T U = I$
which is equivalent to
$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}(U^T L_{sum} U)$, where $L_{sum} = \sum_{i=1}^{M} L_i$
Such an approximation could suffer from poor generalization ability.
Regularized Clustering on Multi-layer Graph - 1
Use Grassmann manifolds to keep the final latent representation "close" to all layers of the multi-layer graph*.
The projected distance between two subspaces $Y_1$ and $Y_2$ is
$d_{proj}^2(Y_1, Y_2) = \frac{1}{2} \lVert Y_1 Y_1^T - Y_2 Y_2^T \rVert_F^2$, where $\lVert A \rVert_F$ is the Frobenius norm.
For the final representation $U$ and the layer representations $U_i$:
$d_{proj}^2(U, \{U_i\}_{i=1}^{M}) = kM - \sum_{i=1}^{M} \mathrm{tr}(U U^T U_i U_i^T)$
* X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov. Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. IEEE Transactions on Signal Processing, 2014.
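A short worked step shows why the projected distance reduces to a trace. It assumes that the columns of $Y_1$ and $Y_2$ are orthonormal bases of $k$-dimensional subspaces, so that $Y^T Y = I$ and $\mathrm{tr}(Y Y^T Y Y^T) = \mathrm{tr}(Y^T Y) = k$; the symbols $Y_1$, $Y_2$ are used here only to name the two subspaces.

```latex
\begin{aligned}
d_{proj}^2(Y_1, Y_2)
 &= \tfrac{1}{2}\,\lVert Y_1 Y_1^T - Y_2 Y_2^T \rVert_F^2 \\
 &= \tfrac{1}{2}\left[\mathrm{tr}(Y_1 Y_1^T Y_1 Y_1^T)
    - 2\,\mathrm{tr}(Y_1 Y_1^T Y_2 Y_2^T)
    + \mathrm{tr}(Y_2 Y_2^T Y_2 Y_2^T)\right] \\
 &= \tfrac{1}{2}\left[k - 2\,\mathrm{tr}(Y_1 Y_1^T Y_2 Y_2^T) + k\right] \\
 &= k - \mathrm{tr}(Y_1 Y_1^T Y_2 Y_2^T)
\end{aligned}
```

Summing this distance between the final representation and each of the $M$ layer representations gives the $kM$ term minus the sum of traces.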
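Regularized Clustering on Multi-layer Graph - 2
Extends the objective function to introduce the subspace analysis regularization:
$\min_{U \in \mathbb{R}^{N \times k}} \sum_{i=1}^{M} \mathrm{tr}(U^T L_i U) + \alpha \left(kM - \sum_{i=1}^{M} \mathrm{tr}(U U^T U_i U_i^T)\right)$, s.t. $U^T U = I$
which is equivalent to
$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}(U^T L_{mod} U)$, where $L_{mod} = \sum_{i=1}^{M} (L_i - \alpha U_i U_i^T)$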
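The regularized objective can be sketched numerically: compute a spectral embedding per layer, subtract the weighted projection of each layer's subspace from its Laplacian, sum over layers, and take the bottom eigenvectors of the result. This sketch assumes unnormalized Laplacians and a hypothetical regularization weight `alpha`; the function names are illustrative, not taken from the slides or the cited paper.

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def smallest_eigvecs(L, k):
    _, vecs = np.linalg.eigh(L)   # eigenvalues come back in ascending order
    return vecs[:, :k]


def multilayer_embedding(layers, k, alpha=0.5):
    """Return the joint embedding U and the modified Laplacian."""
    Ls = [laplacian(W) for W in layers]
    Us = [smallest_eigvecs(L, k) for L in Ls]   # per-layer spectral embeddings
    L_mod = sum(L - alpha * U @ U.T for L, U in zip(Ls, Us))
    return smallest_eigvecs(L_mod, k), L_mod


# Two toy layers over the same four users.
layers = [
    [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]],
    [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]],
]
U, L_mod = multilayer_embedding(layers, k=2, alpha=0.5)
print(U.shape)
```

With `alpha=0` the modified Laplacian reduces to the plain sum of the layer Laplacians, which is the unregularized case.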
Idea 4: Making use of Inter-Layer (Inter-Source) Relations
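Incorporating inter-layer relationship (1)
By using the distance on Grassmann manifolds, we present the new objective function for the $i$-th layer:
$\min_{\hat{U}_i \in \mathbb{R}^{N \times k}} \mathrm{tr}(\hat{U}_i^T L_i \hat{U}_i) + \beta_i \left(kM - \sum_{j=1, j \neq i}^{M} w_{i,j}\, \mathrm{tr}(\hat{U}_i \hat{U}_i^T U_j U_j^T)\right)$
which is equivalent to
$\min_{\hat{U}_i \in \mathbb{R}^{N \times k}} \mathrm{tr}(\hat{U}_i^T \hat{L}_i \hat{U}_i)$, where $\hat{L}_i = L_i - \beta_i \sum_{j=1, j \neq i}^{M} w_{i,j} U_j U_j^T$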
But how can we determine $w_{i,j}$ when computing the $i$-th layer?
$\min_{\hat{U}_i \in \mathbb{R}^{N \times k}} \mathrm{tr}(\hat{U}_i^T \hat{L}_i \hat{U}_i)$, where $\hat{L}_i = L_i - \beta_i \sum_{j=1, j \neq i}^{M} w_{i,j} U_j U_j^T$
Inter-layer relationship graph: a weighted graph that represents the similarity between layers. For every pair of layers $i$, $j$:
$w_{i,j} = \frac{1}{K-1} \sum_{k=2}^{K} \left(1 - \frac{\lVert A_{i,k} - A_{j,k} \rVert}{N(N-1)}\right)$
where $A_{i,k}$ is the clustering co-occurrence matrix of layer $i$ for $k$ clusters: its entry for a pair of users is 1 if the two users are assigned to the same cluster, and 0 otherwise.
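One reading of the layer similarity weight is the average, over several cluster counts, of the fraction of user pairs on which two layers' co-occurrence matrices agree. The sketch below follows that reading. The choice of norm (a count of disagreeing ordered user pairs) is an assumption, and the function names are hypothetical.

```python
def cooccurrence(labels):
    """Entry (p, q) is 1 if users p and q share a cluster, else 0."""
    n = len(labels)
    return [[1 if labels[p] == labels[q] else 0 for q in range(n)]
            for p in range(n)]


def layer_similarity(labelings_i, labelings_j):
    """Each argument holds one cluster-label list per cluster count k."""
    n = len(labelings_i[0])
    total = 0.0
    for li, lj in zip(labelings_i, labelings_j):
        A, B = cooccurrence(li), cooccurrence(lj)
        # Number of ordered user pairs on which the two layers disagree.
        diff = sum(abs(A[p][q] - B[p][q]) for p in range(n) for q in range(n))
        total += 1 - diff / (n * (n - 1))
    return total / len(labelings_i)


# Same grouping under different label names: full agreement.
w_same = layer_similarity([[0, 0, 1, 1]], [[1, 1, 0, 0]])
# Groupings that pair the four users differently: partial agreement.
w_diff = layer_similarity([[0, 0, 1, 1]], [[0, 1, 0, 1]])
print(w_same, w_diff)
```

Because only co-membership is compared, the weight does not depend on how clusters are numbered in each layer.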
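Final objective function
Let's combine the equations from the previous slides to define the final objective function:
$\min_{U \in \mathbb{R}^{N \times k}} \sum_{i=1}^{M} \mathrm{tr}(U^T \hat{L}_i U) + \alpha \left(kM - \sum_{i=1}^{M} \mathrm{tr}(U U^T \hat{U}_i \hat{U}_i^T)\right)$
which is equivalent to
$\min_{U \in \mathbb{R}^{N \times k}} \mathrm{tr}\left(U^T \sum_{i=1}^{M} (\hat{L}_i - \alpha \hat{U}_i \hat{U}_i^T)\, U\right)$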
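Putting the pieces together, the final objective could be approached as sketched below: each layer's Laplacian is first adjusted by the weighted subspaces of the other layers, each adjusted layer gets its own embedding, and the joint embedding comes from the combined matrix. This is a sketch under stated assumptions (unnormalized Laplacians, precomputed inter-layer weights, hypothetical parameters `alpha` and `betas`), not the authors' implementation.

```python
import numpy as np


def laplacian(W):
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def bottom_eigvecs(L, k):
    _, vecs = np.linalg.eigh(L)   # ascending eigenvalues
    return vecs[:, :k]


def final_embedding(laplacians, weights, k, alpha=0.5, betas=None):
    M = len(laplacians)
    betas = [0.5] * M if betas is None else betas
    Us = [bottom_eigvecs(L, k) for L in laplacians]   # plain per-layer embeddings
    L_final = np.zeros_like(laplacians[0], dtype=float)
    for i in range(M):
        # Inter-layer term: pull layer i towards the subspaces of similar layers.
        L_hat = laplacians[i] - betas[i] * sum(
            weights[i][j] * Us[j] @ Us[j].T for j in range(M) if j != i)
        U_hat = bottom_eigvecs(L_hat, k)
        # Subspace term: keep the joint embedding close to every adjusted layer.
        L_final += L_hat - alpha * U_hat @ U_hat.T
    return bottom_eigvecs(L_final, k)


# Three toy layers over the same four users, with made-up similarity weights.
adjacencies = [
    [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]],
    [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]],
    [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
]
weights = [[0.0, 0.8, 0.6], [0.8, 0.0, 0.5], [0.6, 0.5, 0.0]]
U_final = final_embedding([laplacian(W) for W in adjacencies], weights, k=2)
print(U_final.shape)
```

The rows of the returned matrix would then be clustered (for example with k-means) to obtain user communities.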
Problems
• Community detection
• Data source integration
Recall: Community-Based Cross-Domain Recommendation
We perform venue category recommendation based on both individual and group knowledge, where group knowledge is obtained from multiple sources:
$rec(u) = sort\left(\gamma \cdot vec_u + \frac{1}{|C_u|} \sum_{v \in C_u} vec_v\right)$
NUS-MSS Dataset
The dataset* is presented as a set of features extracted from user-generated data in three social networks:
- text-based, from Twitter (LDA, LIWC, text features)
- image-based, from Instagram (concepts)
- location-based, from Foursquare (LDA, categories, mobility features)
Foursquare category data is split into two parts: 3 months of data (train) and 2 months (test).
* A. Farseev, N. Liqiang, M. Akbari, and T.-S. Chua. Harvesting multiple sources for user profile learning: a Big data study. ACM International Conference on Multimedia Retrieval (ICMR). China. June 23-26, 2015.
Data Sources
Text features:
- Linguistic features: LIWC
- Latent topics: LDA
- Heuristic features: writing behavior
Location features:
- Location semantics: venue category distribution
- Mobility features: areas of interest (AOI)
Image features:
- Image concept distribution: GoogLeNet concepts (ImageNet)
[Figure: feature groups extracted from texts, locations, and images; panel labels include Mobility, Location Type, and Preferences]
Evaluation Baselines
Recommender systems:
• Popular (POP): recommendation based on the user's past experience
• Popular All (POP All): recommendation based on the experience of all users
• Multi-Source Re-Ranking (MSRR): linearly combines recommendation results from all data modalities
• Nearest Neighbor Collaborative Filtering (CF): recommendation based on the top k most similar Foursquare users
• Early Fusion (EF): fuses multi-source data into a single feature vector
• SVD++: makes use of the "implicit feedback" information
• FM: brings together the advantages of different factorization-based models via regularization
Community detection approaches:
• C³R recommendation without inter-layer regularization
• C³R recommendation without inter-layer regularization and sub-space regularization
• C³R recommendation without user community extraction
• C³R (DBScan): C³R recommendation, where user communities are detected by density-based clustering (DBScan)
• C³R (x-means): C³R recommendation, where user communities are detected by x-means clustering
• C³R (Hierarchical): C³R recommendation, where user communities are detected by hierarchical clustering
• C³R: our approach
Evaluation against other recommender systems
Evaluation against other community detection approaches
+ Incorporation of group knowledge is important
+ Multi-modal clustering performs better than single-source clustering
+ Incorporation of inter-source relationships is crucial
Evaluation against source combinations + In different geo regions, different data sources are of different importance + Location data is more powerful than other data modalities