
In-Database Machine Learning: Using Gradient Descent and Tensor Algebra - PowerPoint PPT Presentation

Chair III: Database Systems, Professorship for Data Mining and Analytics; Chair XXV: Data Science and Engineering; Department of Informatics, Technical University of Munich. In-Database Machine Learning: Using Gradient Descent and Tensor Algebra.


  1. In-Database Machine Learning: Using Gradient Descent and Tensor Algebra. Maximilian E. Schüle, Frédéric Simonis, Thomas Heyenbrock, Alfons Kemper, Stephan Günnemann, Thomas Neumann. Chair III: Database Systems, Professorship for Data Mining and Analytics; Chair XXV: Data Science and Engineering; Department of Informatics, Technical University of Munich. Rostock, 4 March 2019.

  2. What Do Database Systems Need for ML? Database systems and machine learning: why not use HyPer?

  3. What Do Database Systems Need for ML? Machine learning needs data in tensors and a parametrised loss function, so HyPer needs tensors and gradient descent (image: CC BY-SA 3.0, TimothyRias, https://commons.wikimedia.org/w/index.php?curid=14729540). Advantage: optimisation problems become solvable in the core of the database server. Goal: make database systems more attractive. What it is: an architectural blueprint for the integration of optimisation models into a DBMS. What it is not: a study of the quality of different optimisation problems.

  4. What Is Gradient Descent? Example: linear regression that predicts the median value (MEDV) from the number of rooms (RM). Model function: m_{a,b}(rm) = a * rm + b ≈ medv. Loss function: l_{rm,medv}(a,b) = (m_{a,b}(rm) - medv)^2. The training data consists of (RM, MEDV) pairs, the test data of RM values to be labelled. How to optimise the initial weights a and b? Gradient descent. How to label the test data? With the optimal weights.
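To make the optimisation step concrete, here is a minimal batch gradient descent sketch in Python for the model above; the training pairs, learning rate, and iteration count are made-up placeholder values, not data from the talk:

    # Batch gradient descent for m_{a,b}(rm) = a * rm + b with squared loss.
    training = [(6.5, 26.0), (5.9, 21.0), (7.2, 34.0)]  # hypothetical (rm, medv) pairs
    a, b = 0.0, 0.0              # initial weights
    lr, iterations = 0.01, 100   # learning rate and max. number of iterations

    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for rm, medv in training:
            error = (a * rm + b) - medv   # m_{a,b}(rm) - medv
            grad_a += 2 * error * rm      # d loss / d a
            grad_b += 2 * error           # d loss / d b
        a -= lr * grad_a / len(training)
        b -= lr * grad_b / len(training)

    print(a, b)  # optimised weights, later used to label the test data

In the setting of the talk, this loop runs inside a relational operator rather than in client code.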

  5. Approach. Integration as operators in relational algebra in HyPer (concept of pipelines); representation of mathematical functions on relations. Gradient descent: the gradient is needed, obtained via automatic differentiation. Representation of tensors: either one relation represents one tensor, or an own tensor data type (image: CC BY-SA 3.0, TimothyRias, https://commons.wikimedia.org/w/index.php?curid=14729540).
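As a rough illustration of the "one relation represents one tensor" option (the table layout here is an assumption for illustration, not the paper's schema), a matrix can be stored as (row, column, value) tuples, and a matrix-vector product then becomes a join plus aggregation:

    # Matrix A as a relation A(i, j, val), vector x as a relation X(j, val).
    A = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0)]
    x = [(0, 4.0), (1, 5.0)]

    # Relational reading:
    # SELECT A.i, SUM(A.val * X.val) FROM A JOIN X ON A.j = X.j GROUP BY A.i
    result = {}
    for i, j, a_val in A:
        for j_x, x_val in x:
            if j == j_x:
                result[i] = result.get(i, 0.0) + a_val * x_val

    print(result)  # {0: 13.0, 1: 15.0}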

  6. Integration in Relational Algebra. Operator tree: operators for labelling and for gradient descent. Model/loss function: representation of a loss function as well as of a model function as λ-expressions. Pipelining: integration as a pipeline breaker, with pipelines for weights and data. Model function m: m_w(x) = Σ_{i ∈ m} x_i * w_i ≈ y. Loss function l: l_{x,y}(w) = (m_w(x) - y)^2.
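For reference, the gradient that automatic differentiation has to provide for this squared loss, and the resulting update rule, are the standard ones (a textbook derivation, not copied from the slides):

    % Partial derivative of the loss with respect to one weight:
    \frac{\partial l_{x,y}}{\partial w_j}(w)
      = 2\,\bigl(m_w(x) - y\bigr)\,\frac{\partial m_w(x)}{\partial w_j}
      = 2\,\bigl(m_w(x) - y\bigr)\,x_j
    % One gradient descent step with learning rate \gamma
    % (0.05 in the SQL example on slide 9):
    w_j \leftarrow w_j - \gamma\,\frac{\partial l_{x,y}}{\partial w_j}(w)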

  7. Integration in Relational Algebra: Operator Tree. Two operators are needed: gradient descent to optimise the weights of a parametrised loss function, and a labelling operator to label predicted values. Gradient descent: initial weights and training data as input, optimised weights as output; a lambda expression supplies the loss function to be optimised. Labelling: input is the test dataset and the optimal weights; the label is the evaluated lambda expression (the model function) for each tuple. Operator tree: training data and initial weights feed the gradient descent operator with its λ loss function; the calculated weights and the test data feed the labelling operator with its λ model function.
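The labelling operator's effect can be sketched in a few lines of Python (illustrative only, with hypothetical weights and test values; the real operator evaluates the λ model function per tuple inside the DBMS):

    # Apply the model function with the optimised weights to every test tuple.
    def label(test_data, weights, model):
        return [(x, model(weights, x)) for x in test_data]   # tuple extended by its label

    optimised = (4.5, -3.0)        # (a, b) as produced by the gradient descent operator
    test_rm = [5.8, 6.4, 7.1]      # test tuples (RM only)
    print(label(test_rm, optimised, lambda w, x: w[0] * x + w[1]))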

  8. Integration in Rel. Algebra: Lambda Functions. A lambda expression injects user-defined code into an operator. Example: the k-means operator, with the input points in the right pipeline and the injected code (the Euclidean distance) in the left pipeline: λ(a, b) sqrt((a.x - b.x)^2 + (a.y - b.y)^2). In SQL:
select * from kmeans((table points), λ(a, b) sqrt((a.x - b.x)^2 + (a.y - b.y)^2), 2);
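A rough Python analogue of this design (not HyPer code, everything below is illustrative) keeps the clustering routine generic and passes the distance function in as a lambda, just as the SQL above injects it into the kmeans operator:

    import random

    def kmeans(points, distance, k, iterations=10):
        centres = random.sample(points, k)
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for p in points:
                # assign each point to the centre the injected lambda deems closest
                idx = min(range(k), key=lambda i: distance(p, centres[i]))
                clusters[idx].append(p)
            centres = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[i]
                       for i, cl in enumerate(clusters)]
        return centres

    points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.8, 8.4)]
    euclidean = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    print(kmeans(points, euclidean, k=2))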

  9. Integration in Rel. Algebra: Lambda Functions. Notation and the corresponding relations/lambda functions:
Weights: w = (w_1, w_2, ..., w_m), as relation W{[w_1, w_2, ..., w_m]}.
Data: x = (x_1, x_2, ..., x_m, y), i.e. n tuples with m attributes and label y, as relation X{[x_1, x_2, ..., x_m, y]}.
Model function: m_w(x) = Σ_{i ∈ m} x_i * w_i ≈ y, as λ(W, X)(W.w_1 * X.x_1 + ... + W.w_m * X.x_m).
Loss function: l_{x,y}(w) = (m_w(x) - y)^2, as λ(W, X)(W.w_1 * X.x_1 + ... + W.w_m * X.x_m - X.y)^2.
Lambda functions in SQL:
create table trainingdata (x float, y float);
create table testdata (x float);
create table weights (a float, b float);
insert into trainingdata …
insert into weights …
select * from gradientdescent(
  -- loss function as λ-expression
  λ(data, weights)(weights.a * data.x + weights.b - data.y)^2,
  -- training set and initial weights
  (select x, y from trainingdata),
  (select a, b from weights),
  -- learning rate and max. number of iterations
  0.05, 100);
select * from labeling(
  -- model function as λ-expression
  λ(data, weights)(weights.a * data.x + weights.b),
  -- test set and weights
  (select x from testdata),
  (select a, b from weights));

  10. Integration in Relational Algebra: Pipelining. [Figure: three operator-tree variants over the training data and initial weights, each with a λ loss function. Materialising: a single Batch/Stochastic GD operator with sub- and main pipelines, run for max. iterations. Pipelined: a chain of stochastic gradient descent operators, one per iteration (1 ... max. iterations). Combined: stochastic gradient descent in the pipelines for the first iteration, then a Batch/Stochastic GD operator for the remaining max. iterations - 1.]

  11. Integration in Relational Algebra: Pipelining. Materialising: materialisation of all tuples (parallel or serial) in a Batch/Stochastic GD operator; any optimisation method is possible; parallelism via parallel_for. Pipelined: no materialisation; stochastic gradient descent only; the tuples are distributed to the pipelines; downside: multiple copies of the operator tree, one per iteration (1 ... max. iterations). Combined: the first iteration runs in the pipelines (stochastic gradient descent), the remaining max. iterations - 1 in the main thread (Batch/Stochastic GD).
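The behavioural difference between the materialising and the pipelined variant is essentially batch versus stochastic gradient descent. A minimal sketch of the tuple-at-a-time update the pipelined variant performs (plain Python with hypothetical values, not a generated pipeline):

    # Stochastic gradient descent: update the weights once per incoming tuple,
    # without materialising the training data first.
    def sgd_pipelined(tuples, a, b, lr):
        for rm, medv in tuples:            # tuple-at-a-time, as in a pipeline
            error = (a * rm + b) - medv
            a -= lr * 2 * error * rm
            b -= lr * 2 * error
        return a, b

    training = [(6.5, 26.0), (5.9, 21.0), (7.2, 34.0)]   # hypothetical (rm, medv) pairs
    print(sgd_pipelined(training, 0.0, 0.0, 0.01))       # one pass = one iteration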
