RaFM: Rank-Aware Factorization Machines
Yin Zheng, on behalf of Xiaoshuang Chen, Yin Zheng, Jiaxing Wang, Wenye Ma, Junzhou Huang
Motivation: Factorization Machines
• Different features have different frequencies of occurrence.
• Factorized embeddings for each feature: $V_i = v_i$
• Modeling pairwise interactions:
  $\hat{y} = \sum_{i,j \in F,\, i<j} \langle V_i, V_j \rangle\, x_i x_j + \sum_{i \in F} w_i x_i + \mathrm{bias}$
  $\langle V_i, V_j \rangle_{FM} = v_i^\top v_j = \sum_{f=1}^{D} v_{i,f}\, v_{j,f}$
• What is the best rank of the embeddings?
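The FM prediction on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names, the toy sizes, and the random values are assumptions made for the example. The fast variant uses the well-known identity that the pairwise sum equals half of (squared norm of the sum minus sum of squared norms), which avoids the double loop over feature pairs.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_pairwise_naive(V, x):
    """Sum of <v_i, v_j> x_i x_j over all pairs i < j (double loop)."""
    n = len(V)
    return sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def fm_pairwise_fast(V, x):
    """Same sum via 0.5 * (||sum_i v_i x_i||^2 - sum_i ||v_i x_i||^2)."""
    total = 0.0
    for f in range(len(V[0])):
        terms = [v[f] * xi for v, xi in zip(V, x)]
        s = sum(terms)
        total += 0.5 * (s * s - sum(t * t for t in terms))
    return total


def fm_predict(V, w, bias, x):
    """FM output: pairwise term + linear term + bias."""
    return fm_pairwise_fast(V, x) + dot(w, x) + bias


# Hypothetical toy problem: 5 features, rank D = 3.
rng = random.Random(0)
V = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
w = [rng.uniform(-1, 1) for _ in range(5)]
x = [1.0, 0.0, 1.0, 1.0, 0.0]
y_hat = fm_predict(V, w, 0.1, x)
```

Every feature here shares the same rank D, which is exactly the constraint the talk goes on to relax.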
Motivation: Performance of FMs with fixed ranks
[Figure: FM performance at fixed ranks on MovieLens Tag, with regions labeled Underfitting and Overfitting]
Basic Model: Rank-Aware Factorization Machines
[Figure panels: High-Rank FM, Low-Rank FM, Rank-Aware FM]
Basic Model: Rank-Aware Factorization Machines
  $\hat{y} = \sum_{i,j \in F,\, i<j} \langle V_i, V_j \rangle\, x_i x_j + \sum_{i \in F} w_i x_i + \mathrm{bias}$
• Multiple embeddings with different ranks: $V_i = \{ v_i^{(1)}, v_i^{(2)}, \dots, v_i^{(k_i)} \}$
  $k_i$: the largest rank to avoid overfitting (hyperparameters)
• $\langle V_i, V_j \rangle_{RaFM} = \langle v_i^{(k_{ij})}, v_j^{(k_{ij})} \rangle$, with $k_{ij} = \min(k_i, k_j)$
  $k_{ij}$: choose a proper rank for computation of the pairwise interaction
• What is the time and space complexity?
• How to efficiently train RaFM?
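The rank-aware interaction can be illustrated with a direct, quadratic-time sketch: each feature i carries embeddings for levels 1..k_i, and a pair (i, j) uses the embeddings at level min(k_i, k_j). The nested-list layout, the names, and the toy numbers below are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rafm_pairwise_naive(V, levels, x):
    """Pairwise term of RaFM, computed pair by pair.

    V[i][p - 1] is the level-p embedding of feature i (levels are 1-indexed),
    and levels[i] is k_i, the highest level stored for feature i.
    """
    n = len(V)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            k_ij = min(levels[i], levels[j])  # shared level for this pair
            total += dot(V[i][k_ij - 1], V[j][k_ij - 1]) * x[i] * x[j]
    return total


# Hypothetical toy: level 1 has dimension 1, level 2 has dimension 2.
levels = [2, 2, 1]
V = [
    [[2.0], [1.0, 1.0]],   # feature 0, k_0 = 2
    [[1.0], [3.0, -1.0]],  # feature 1, k_1 = 2
    [[4.0]],               # feature 2, k_2 = 1
]
x = [1.0, 1.0, 1.0]
pairwise = rafm_pairwise_naive(V, levels, x)
```

In this toy, the pair (0, 1) interacts at level 2, while both pairs involving feature 2 fall back to level 1.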
Space Complexity: Active and Inactive Factors
• Described by the feature sets $F_k = \{ i \in F : k_i \ge k \}$
• Inactive factors: $v^{(p)}_{F \setminus F_p}$ need NOT be stored!
• Active factors: $v^{(p)}_{F_p}$
• Space complexity: $O\left( \sum_{k=1}^{m} D_k\, |F_k| \right)$
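The storage count can be checked with a short sketch. It assumes that $D_k$ denotes the dimension of the level-k embedding and that only active factors (level p for features in $F_p$) are stored; the dimensions and level assignments below are made-up toy values.

```python
def feature_set_size(levels, k):
    """|F_k|: number of features whose highest level k_i is at least k."""
    return sum(1 for k_i in levels if k_i >= k)


def rafm_stored_parameters(dims, levels):
    """sum_k D_k * |F_k|, counting only the active factors."""
    return sum(
        dims[k - 1] * feature_set_size(levels, k)
        for k in range(1, len(dims) + 1)
    )


def fixed_rank_parameters(rank, n_features):
    """Embedding parameters of a plain FM with a single rank."""
    return rank * n_features


# Hypothetical toy: D_1 = 4, D_2 = 16; 8 rare features (k_i = 1), 2 frequent (k_i = 2).
dims = [4, 16]
levels = [1] * 8 + [2] * 2
rafm_size = rafm_stored_parameters(dims, levels)         # 4 * 10 + 16 * 2
fm_high_size = fixed_rank_parameters(16, len(levels))    # 16 * 10
```

With these toy numbers the rank-aware layout stores 72 embedding parameters, against 160 for a fixed rank of 16.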
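Time Complexity
• $F_k = \{ i \in F : k_i \ge k \}$
• Auxiliary variables:
  $k_{ij}^{l,k} = \max\left( l, \min(k_{ij}, k) \right)$
  $A_{l,k} = \sum_{i,j \in F_k,\, i<j} \langle v_i^{(l)}, v_j^{(l)} \rangle\, x_i x_j = \frac{1}{2} \left( \left\| \sum_{i \in F_k} v_i^{(l)} x_i \right\|_2^2 - \sum_{i \in F_k} \left\| v_i^{(l)} x_i \right\|_2^2 \right)$, computed in $O(D_l\, |F_k|)$
  $B_{l,k} = \sum_{i<j} \langle v_i^{(k_{ij}^{l,k})}, v_j^{(k_{ij}^{l,k})} \rangle\, x_i x_j$
• The RaFM pairwise term is $B_{1,m}$.
• It is easy to prove that $B_{l,k+1} = B_{l,k} - A_{k,k+1} + A_{k+1,k+1}$, so the time complexity is $O\left( \sum_{k=1}^{m} D_k\, |F_k| \right)$.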
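The recurrence over the auxiliary variables can be sketched and checked against the pair-by-pair definition. This is an illustrative sketch under assumptions: levels are 1-indexed, V[i][p - 1] holds the level-p embedding of feature i, the recursion starts from B_{1,1} = A_{1,1}, and all names and toy sizes are invented for the example.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rafm_pairwise_naive(V, levels, x):
    """Reference value: each pair (i, j) interacts at level min(k_i, k_j)."""
    n = len(V)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            k_ij = min(levels[i], levels[j])
            total += dot(V[i][k_ij - 1], V[j][k_ij - 1]) * x[i] * x[j]
    return total


def aux_A(V, levels, x, dims, l, k):
    """A_{l,k}: pairwise sum over features in F_k using level-l embeddings."""
    members = [i for i, k_i in enumerate(levels) if k_i >= k]
    total = 0.0
    for f in range(dims[l - 1]):
        terms = [V[i][l - 1][f] * x[i] for i in members]
        s = sum(terms)
        total += 0.5 * (s * s - sum(t * t for t in terms))
    return total


def rafm_pairwise_fast(V, levels, x, dims):
    """B_{1,m} via B_{1,k+1} = B_{1,k} - A_{k,k+1} + A_{k+1,k+1}."""
    B = aux_A(V, levels, x, dims, 1, 1)  # B_{1,1} = A_{1,1}
    for k in range(1, len(dims)):
        B += aux_A(V, levels, x, dims, k + 1, k + 1)
        B -= aux_A(V, levels, x, dims, k, k + 1)
    return B


# Hypothetical toy: three levels with dimensions 2, 3, 5 and 8 features.
rng = random.Random(1)
dims = [2, 3, 5]
levels = [rng.randint(1, 3) for _ in range(8)]
V = [
    [[rng.uniform(-1, 1) for _ in range(dims[p])] for p in range(k_i)]
    for k_i in levels
]
x = [rng.uniform(0, 1) for _ in range(8)]
fast = rafm_pairwise_fast(V, levels, x, dims)
slow = rafm_pairwise_naive(V, levels, x)
```

Each step of the loop swaps the level-k interactions among features in F_{k+1} for their level-(k+1) counterparts, and it only touches embeddings that are actually stored.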
Learning Algorithm
• $F_k = \{ i \in F : k_i \ge k \}$
• Inactive factors: $v^{(p)}_{F \setminus F_p}$
• Free factors: $v^{(p)}_{F_p \setminus F_{p+1}}$
• Dependent factors: $v^{(p)}_{F_{p+1}}$
• Bi-level optimization:
  $\min \frac{1}{N} \sum_{x} L\left( B_{1,m}, y \right)$
  s.t. $v^{(p)}_{F_{p+1}} = \arg\min \frac{1}{N} \sum_{x} L\left( B_{1,p}, B_{1,p+1} \right), \quad 1 \le p < m$
• Pushing dependent factors to approximate free factors
• Proved by Thm. 6
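The split into inactive, free, and dependent factors can be made concrete with a small sketch. It assumes the feature sets F_p = {i : k_i >= p}, so that a level-p factor of feature i is inactive when k_i < p, free when k_i = p, and dependent when k_i > p. The function names and the toy levels are hypothetical, and the sketch only labels factors; it does not implement the optimization itself.

```python
def factor_role(k_i, p):
    """Role of the level-p factor of a feature whose highest level is k_i."""
    if k_i < 1 or p < 1:
        raise ValueError("levels are 1-indexed")
    if k_i < p:
        return "inactive"   # i is outside F_p: not stored
    if k_i == p:
        return "free"       # i is in F_p but not in F_{p+1}
    return "dependent"      # i is in F_{p+1}


def partition_level(levels, p):
    """Group feature indices by the role of their level-p factor."""
    roles = {"inactive": [], "free": [], "dependent": []}
    for i, k_i in enumerate(levels):
        roles[factor_role(k_i, p)].append(i)
    return roles


# Hypothetical toy: 5 features with highest levels k_i, and m = 3 levels.
levels = [1, 3, 2, 1, 3]
level_2 = partition_level(levels, 2)
level_3 = partition_level(levels, 3)
```

At the top level m there are no dependent factors, since F_{m+1} is empty.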
Experiment
• RaFM outperforms FM.
• RaFM is also more computationally efficient than FM.
Improvement: 0.5%~15%
Model Size: 20%~66%
Training Time: 24%~95%
Experiment: Results on Tencent CTR Dataset
• RaFM vs. FM
• RaFM: 32 + 512
• RaFM-low has performance similar to FM-32.
Thanks!
Poster: Pacific Ballroom, Jun 13th, 6:30PM~9:00PM, Poster ID 220
Code: https://github.com/cxsmarkchan/RaFM
Xiaoshuang Chen: https://cxsmarkchan.github.io
Yin Zheng: https://sites.google.com/site/zhengyin1126