In-Database Factorized Learning

Dan Olteanu

Joint work with M. Schleich, J. Zavodny & FDB Team
M. Abo-Khamis, H. Ngo, X. Nguyen

http://www.cs.ox.ac.uk/projects/FDB/

Recent Trends in Knowledge Compilation
Dagstuhl, Sept 2017

1 / 32
We Work on In-Database Analytics

In-database analytics = solve optimization problems inside the database engine.

Why in-database analytics?

1. Bring analytics close to data
   ⇒ Save non-trivial export/import time
2. Large chunks of analytics code can be rewritten into database queries
   ⇒ Use scalable systems and low complexity for query processing
3. Used by LogicBlox retail-planning and forecasting applications

Unified in-database analytics solution for a host of optimization problems.

2 / 32
Problem Formulation 3 / 32
Problem Formulation

A typical machine learning task is to solve θ* := arg min_θ J(θ), where

    J(θ) := ∑_{(x,y) ∈ D} L(⟨g(θ), h(x)⟩, y) + Ω(θ).

θ = (θ_1, ..., θ_p) ∈ R^p are the parameters of the learned model

D is the training dataset with features x and response y
  ◮ Typically, D is the result of a feature extraction query over a database.

L is a loss function, Ω is the regularizer

functions g : R^p → R^m and h : R^n → R^m for n numeric features (m > 0)
  ◮ g = (g_j)_{j ∈ [m]} is a vector of multivariate polynomials
  ◮ h = (h_j)_{j ∈ [m]} is a vector of multivariate monomials

Example problems: ridge linear regression, degree-d polynomial regression,
degree-d factorization machines; logistic regression, SVM; PCA.

4 / 32
Ridge Linear Regression

General problem formulation:

    J(θ) := ∑_{(x,y) ∈ D} L(⟨g(θ), h(x)⟩, y) + Ω(θ).

Under square loss L and ℓ2-regularization, with

  data points x = (x_0, x_1, ..., x_n),
  p = n + 1 parameters θ = (θ_0, ..., θ_n),
    ◮ x_0 = 1 corresponds to the bias parameter θ_0
  g and h identity functions: g(θ) = θ and h(x) = x,
    ◮ ⟨g(θ), h(x)⟩ = ⟨θ, x⟩ = ∑_{k=0}^{n} θ_k x_k

we obtain the following formulation for ridge linear regression:

    J(θ) := 1/(2|D|) ∑_{(x,y) ∈ D} ( ∑_{k=0}^{n} θ_k x_k − y )² + λ/2 ‖θ‖²_2 .

5 / 32
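A minimal sketch (not from the slides) of this objective, evaluated naively with one pass over the materialized training dataset D. The array names, the toy data, and the helper name ridge_objective are assumptions for illustration only.

```python
import numpy as np

def ridge_objective(theta, X, y, lam):
    """Naive J(theta): average square loss over every (x, y) in D plus l2 regularization.

    X carries a leading column of ones (x_0 = 1), so theta[0] plays the role of the bias theta_0.
    """
    residuals = X @ theta - y                     # <theta, x> - y for each data point in D
    return residuals @ residuals / (2 * len(y)) + lam / 2 * theta @ theta

# toy training dataset D with n = 2 numeric features plus the bias column x_0 = 1
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.5, 1.0],
              [1.0, 4.0, 2.0]])
y = np.array([6.0, 2.0, 5.0])
print(ridge_objective(np.zeros(3), X, y, lam=0.1))
```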
Rewriting the Objective Function J

We decouple the parameters θ from the data-dependent features x in J.

We can rewrite the loss function

    J(θ) := 1/(2|D|) ∑_{(x,y) ∈ D} ( ∑_{k=0}^{n} θ_k x_k − y )² + λ/2 ‖θ‖²_2

as follows:

    J(θ) = 1/2 θ^⊤ Σ θ − ⟨θ, c⟩ + s_Y/2 + λ/2 ‖θ‖²_2 ,

where

    Σ = (σ_{i,j})_{i,j ∈ [n]},  σ_{i,j} = 1/|D| ∑_{(x,y) ∈ D} x_i · x_j
    c = (c_i)_{i ∈ [n]},        c_i = 1/|D| ∑_{(x,y) ∈ D} y · x_i
    s_Y = 1/|D| ∑_{(x,y) ∈ D} y² .

6 / 32
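A minimal sketch (not from the slides) of this decoupling: the aggregates Σ, c, s_Y depend only on the data and are computed once; J can then be evaluated for any θ without touching D again. The toy data mirrors the previous sketch; all names are assumptions.

```python
import numpy as np

# same toy dataset as in the previous sketch (bias column of ones, n = 2 features)
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.5, 1.0],
              [1.0, 4.0, 2.0]])
y = np.array([6.0, 2.0, 5.0])
m = len(y)

# data-dependent aggregates: each one is a single scan (or aggregate query) over D
Sigma = X.T @ X / m        # sigma_{i,j} = (1/|D|) sum of x_i * x_j
c     = X.T @ y / m        # c_i         = (1/|D|) sum of y * x_i
s_Y   = y @ y / m          # s_Y         = (1/|D|) sum of y^2

def J(theta, lam=0.1):
    """Rewritten objective: uses only Sigma, c, s_Y and theta, no pass over D."""
    return 0.5 * theta @ Sigma @ theta - theta @ c + s_Y / 2 + lam / 2 * theta @ theta

theta = np.array([0.1, 0.2, 0.3])
naive = ((X @ theta - y) ** 2).sum() / (2 * m) + 0.1 / 2 * theta @ theta
print(J(theta), naive)     # the two formulations agree
```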
Batch Gradient Descent for Parameter Computation

Repeatedly update θ in the direction of the gradient until convergence:

    θ := θ − α · ∇J(θ).

Since

    J(θ) = 1/2 θ^⊤ Σ θ − ⟨θ, c⟩ + s_Y/2 + λ/2 ‖θ‖²_2 ,

the gradient vector ∇J(θ) becomes:

    ∇J(θ) = Σ θ − c + λ θ.

7 / 32
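A minimal sketch (not from the slides) of this update loop. Every iteration touches only Σ and c, never the training dataset itself; the step size, tolerance, and toy aggregates are assumptions.

```python
import numpy as np

# toy aggregates with the shapes used above; in the in-database setting these
# come straight from aggregate queries over the (factorized) join
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.5, 1.0],
              [1.0, 4.0, 2.0]])
y = np.array([6.0, 2.0, 5.0])
Sigma, c = X.T @ X / len(y), X.T @ y / len(y)

def batch_gradient_descent(Sigma, c, lam=0.1, alpha=0.05, tol=1e-8, max_iters=100_000):
    """theta := theta - alpha * (Sigma @ theta - c + lam * theta), until convergence."""
    theta = np.zeros(len(c))
    for _ in range(max_iters):
        grad = Sigma @ theta - c + lam * theta
        if np.linalg.norm(grad) < tol:     # gradient small enough: converged
            break
        theta = theta - alpha * grad
    return theta

print(batch_gradient_descent(Sigma, c))
```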
Key Insights

The computation of the training dataset entails a high degree of redundancy,
which can be avoided by factorized joins.

These are compressed, lossless representations of the query result that are
deterministic Decomposable Ordered Multi-Valued Diagrams.

Aggregates can be computed directly over factorized joins.

8 / 32
Factorization Example 9 / 32
Factorization Example

Orders (O for short)         Dish (D for short)      Items (I for short)
customer  day      dish      dish    item            item     price
Elise     Monday   burger    burger  patty           patty    6
Elise     Friday   burger    burger  onion           onion    2
Steve     Friday   hotdog    burger  bun             bun      2
Joe       Friday   hotdog    hotdog  bun             sausage  4
                             hotdog  onion
                             hotdog  sausage

Consider the join of the above relations:

    O(customer, day, dish), D(dish, item), I(item, price)

customer  day      dish    item   price
Elise     Monday   burger  patty  6
Elise     Monday   burger  onion  2
Elise     Monday   burger  bun    2
Elise     Friday   burger  patty  6
Elise     Friday   burger  onion  2
Elise     Friday   burger  bun    2
...       ...      ...     ...    ...

10 / 32
Factorization Example

O(customer, day, dish), D(dish, item), I(item, price)

customer  day      dish    item   price
Elise     Monday   burger  patty  6
Elise     Monday   burger  onion  2
Elise     Monday   burger  bun    2
Elise     Friday   burger  patty  6
Elise     Friday   burger  onion  2
Elise     Friday   burger  bun    2
...       ...      ...     ...    ...

A relational algebra expression encoding the above query result is:

    ⟨Elise⟩ × ⟨Monday⟩ × ⟨burger⟩ × ⟨patty⟩ × ⟨6⟩
  ∪ ⟨Elise⟩ × ⟨Monday⟩ × ⟨burger⟩ × ⟨onion⟩ × ⟨2⟩
  ∪ ⟨Elise⟩ × ⟨Monday⟩ × ⟨burger⟩ × ⟨bun⟩   × ⟨2⟩
  ∪ ⟨Elise⟩ × ⟨Friday⟩ × ⟨burger⟩ × ⟨patty⟩ × ⟨6⟩
  ∪ ⟨Elise⟩ × ⟨Friday⟩ × ⟨burger⟩ × ⟨onion⟩ × ⟨2⟩
  ∪ ⟨Elise⟩ × ⟨Friday⟩ × ⟨burger⟩ × ⟨bun⟩   × ⟨2⟩
  ∪ ...

It uses relational product (×), union (∪), and data values (singleton relations).
The attribute names are not shown to avoid clutter.

11 / 32
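A minimal sketch (not from the slides) contrasting the flat join with one factorization obtained by distributivity: grouping by dish so that each dish's (customer, day) pairs and (item, price) pairs are stored once. The relation contents follow the example above; the encoding and counting convention are assumptions for illustration.

```python
orders = [("Elise", "Monday", "burger"), ("Elise", "Friday", "burger"),
          ("Steve", "Friday", "hotdog"), ("Joe", "Friday", "hotdog")]
dish   = [("burger", "patty"), ("burger", "onion"), ("burger", "bun"),
          ("hotdog", "bun"), ("hotdog", "onion"), ("hotdog", "sausage")]
items  = [("patty", 6), ("onion", 2), ("bun", 2), ("sausage", 4)]
price  = dict(items)

# flat join O(customer, day, dish), D(dish, item), I(item, price)
flat = [(c, da, d, i, price[i])
        for (c, da, d) in orders
        for (d2, i) in dish if d2 == d]
print(len(flat) * 5, "data values in the flat result")      # 5 values per tuple: 60

# factorized encoding: union over dish of (customer, day pairs) x (item, price pairs)
factorized = {
    d: ([(c, da) for (c, da, d2) in orders if d2 == d],
        [(i, price[i]) for (d2, i) in dish if d2 == d])
    for d in {d for (_, _, d) in orders}
}
size = sum(1 + 2 * len(cd) + 2 * len(ip) for cd, ip in factorized.values())
print(size, "data values in the factorized encoding")       # 22
```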
This is How a Factorized Join Looks Like

[Figure: the factorized join as a grounding of a variable order over the input
database. The variable order has dish at the root with two branches, day → customer
and item → price. The grounding is a union over the dish values ⟨burger⟩ and
⟨hotdog⟩; under each dish, a product of (i) a union over its day values, each with
a union over its customer values, and (ii) a union over its item values, each with
a union over its price values.]

There are several algebraically equivalent factorized joins, defined:
  by distributivity of product over union and their commutativity;
  as groundings of join trees.

This is a deterministic Decomposable Ordered Multi-Valued Diagram:
  Each union has children representing distinct domain values of a variable (deterministic).
  Each product has children over disjoint sets of variables (decomposable).

12-14 / 32
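A minimal sketch (not from the slides) of the earlier claim that aggregates can be computed directly over factorized joins: COUNT and SUM(price) are evaluated bottom-up over a small tree encoding of the grounding above, without ever enumerating the flat tuples. The tree encoding, node constructors, and helper names are assumptions for illustration.

```python
# node kinds: ("union", children), ("product", children), ("value", attribute, constant)
def agg(node):
    """Return (COUNT, SUM(price)) for the tuples encoded by a factorization node."""
    kind = node[0]
    if kind == "value":
        _, attr, v = node
        return 1, (v if attr == "price" else 0)
    if kind == "union":                      # children encode disjoint tuple sets: add up
        counts, sums = zip(*(agg(ch) for ch in node[1]))
        return sum(counts), sum(sums)
    # product: counts multiply; each child's price sum is scaled by the
    # number of tuples contributed by the other children
    count, total = 1, 0
    for ch in node[1]:
        c, s = agg(ch)
        total = total * c + s * count
        count *= c
    return count, total

V = lambda attr, v: ("value", attr, v)
U = lambda *ch: ("union", list(ch))
P = lambda *ch: ("product", list(ch))

burger = P(V("dish", "burger"),
           U(P(V("day", "Monday"), U(V("customer", "Elise"))),
             P(V("day", "Friday"), U(V("customer", "Elise")))),
           U(P(V("item", "patty"), U(V("price", 6))),
             P(V("item", "onion"), U(V("price", 2))),
             P(V("item", "bun"),   U(V("price", 2)))))
hotdog = P(V("dish", "hotdog"),
           U(P(V("day", "Friday"), U(V("customer", "Joe"), V("customer", "Steve")))),
           U(P(V("item", "bun"),     U(V("price", 2))),
             P(V("item", "onion"),   U(V("price", 2))),
             P(V("item", "sausage"), U(V("price", 4)))))
print(agg(U(burger, hotdog)))   # (12, 36): 12 joined tuples, total price 36
```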