Lara: A Language of Linear and Relational Algebra for Polystores
Dylan Hutchison, advised by Bill Howe and Dan Suciu
- Work in Progress -
Polystores
Polystores connect backend systems with frontend languages through a unifying "narrow API," using each system where it performs best.
Frontend languages: SQL, Matlab, Spark DataFrames, Streaming
Backend systems: Array Store, Table Store, Graph Engine, Key-Value Store
How to choose an algebra?
Goal: Implement algorithms!
Algorithms: Matrix Inverse, Data Cube, Max Flow, PageRank
Many candidate algebras…
Algebra := Objects + (closed) Operations on Objects

Objects:            Relations            Matrices              Graphs              Files
Ops:                Relational Algebra   BLAS/Linear Algebra   Node/Edge Updates   File Access
Execution Engines:  PostgreSQL           ScaLAPACK             Neo4J, Allegro      CSV, HDF5

Many algebras have optimized execution engines.
How to choose an algebra?
Goal: Implement algorithms!
Algorithms: Matrix Inverse, Data Cube, Max Flow, PageRank

Objects:            Relations            Matrices              Graphs              Files
Ops:                Relational Algebra   BLAS/Linear Algebra   Node/Edge Updates   File Access
Execution Engines:  PostgreSQL           ScaLAPACK             Neo4J, Allegro      CSV, HDF5

Answer: No choice necessary. Use Lara!
Lara objects: Associative Tables
Lara operations: ⋈_⊗, ⋈_⊕, map_f, promote_V
1. Write algorithm in any/all algebras
2. Translate to/from Lara common algebra
3. Use any/all execution engines
Operations of Lara
• ⋈_⊗ – Join: horizontally merge columns, select equal colliding keys, multiply colliding values
• ⋈_⊕ – Union: vertically merge columns, group by colliding keys, sum colliding values
• map_f – Map keys and old values to new values
• promote_V – Promote values to keys
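The join, union, and map operations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Lara implementation: a table is modelled as a list of key-column names plus a dict from key tuples to one numeric value, and the function names `join`, `union`, and `map_values` are hypothetical. Here `union` keeps the key columns that both tables share, which is one reading of "group by colliding keys". `promote` is left out because this single-value model has no remaining value column to keep.

```python
import operator

def join(a_keys, a, b_keys, b, mul):
    """Join: merge key columns, keep rows whose shared keys agree, multiply values."""
    shared = [k for k in a_keys if k in b_keys]
    out_keys = list(a_keys) + [k for k in b_keys if k not in a_keys]
    out = {}
    for ka, va in a.items():
        ra = dict(zip(a_keys, ka))
        for kb, vb in b.items():
            rb = dict(zip(b_keys, kb))
            if all(ra[k] == rb[k] for k in shared):
                row = {**ra, **rb}
                out[tuple(row[k] for k in out_keys)] = mul(va, vb)
    return out_keys, out

def union(a_keys, a, b_keys, b, add):
    """Union: stack both tables on their shared key columns, combine colliding values."""
    shared = [k for k in a_keys if k in b_keys]
    out = {}
    for keys, table in ((a_keys, a), (b_keys, b)):
        for k, v in table.items():
            row = dict(zip(keys, k))
            key = tuple(row[c] for c in shared)
            out[key] = add(out[key], v) if key in out else v
    return shared, out

def map_values(keys, table, f):
    """Map: compute a new value from each row's keys and old value."""
    return keys, {k: f(dict(zip(keys, k)), v) for k, v in table.items()}

# Matrix multiply as join-then-union: A is keyed by (i, j), B by (j, k).
A_keys, A = ["i", "j"], {(0, 0): 1, (0, 1): 2}
B_keys, B = ["j", "k"], {(0, 0): 3, (1, 0): 4}
E_keys, E = ["i", "k"], {}  # empty table that only fixes the output keys

AB_keys, AB = join(A_keys, A, B_keys, B, operator.mul)
C_keys, C = union(AB_keys, AB, E_keys, E, operator.add)
D_keys, D = map_values(C_keys, C, lambda key, v: 2 * v)
```

In this sketch, joining on the shared key `j` and then taking the union with an empty table keyed by `(i, k)` sums out `j`, which is the usual matrix product.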
Example: Ranking a Search
Suppose a user enters the search term "green delicious", as in input Q. Database D scores sites by search-term relevance. Table W weighs words by importance.
Goal: Compute ranks of sites in D for search query Q, weighing by W.

D
site             word        score
pizzanow.com     pizza       6
pizzanow.com     delicious   5
allrecipes.com   delicious   2
allrecipes.com   green       2
allrecipes.com   potatoes    5
recycle.org      green       2
(others)                     0

Q
word        score
delicious   1
green       1
(others)    0

W
word        score
delicious   1
pizza       1
potatoes    3
green       2
(others)    0

Desired Output
site             score
pizzanow.com     1*5*1 = 5
allrecipes.com   1*2*1 + 1*2*2 = 6
recycle.org      1*2*2 = 4
(others)         0

RA: γ_{site, +(score)} ( π_{site, word, (score*score') as score} ( π_word(Q) ⋈ D ⋈ ρ_{score→score'}(W) ) )
LA (Matlab): diag(Q) +.* D +.* W
Hybrid: π_word(Q) ⋈ D +.* W
Lara: (Q ⋈_* D ⋈_* W) ⋈_+ E_site
Executes on both RDBMS and BLAS, depending on cost model.

Many ways to express algorithms. Lara presents an economical algebra preserving
• LA's familiar math, numerical prowess
• RA's flexibility, scale-out optimization
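The desired output can be checked by hand with a short Python sketch that mirrors the two steps of the Lara expression: multiply values where the `word` keys collide, then sum per `site`. The dict-based encoding and the names `joined` and `ranks` are assumptions made for this illustration; missing keys take the default value 0, as in the "(others)" rows of the tables.

```python
from collections import defaultdict

# Tables from the slide; keys absent from a dict carry the default score 0.
D = {
    ("pizzanow.com", "pizza"): 6,
    ("pizzanow.com", "delicious"): 5,
    ("allrecipes.com", "delicious"): 2,
    ("allrecipes.com", "green"): 2,
    ("allrecipes.com", "potatoes"): 5,
    ("recycle.org", "green"): 2,
}
Q = {"delicious": 1, "green": 1}
W = {"delicious": 1, "pizza": 1, "potatoes": 3, "green": 2}

# Join step: match rows on `word` and multiply the three scores.
joined = {
    (site, word): Q.get(word, 0) * score * W.get(word, 0)
    for (site, word), score in D.items()
}

# Union step: group by `site` and sum the colliding scores.
ranks = defaultdict(int)
for (site, _word), score in joined.items():
    ranks[site] += score

# Drop entries equal to the default value 0.
ranks = {site: score for site, score in ranks.items() if score != 0}
print(ranks)
```

Under these assumptions the result matches the Desired Output table: 5 for pizzanow.com, 6 for allrecipes.com, and 4 for recycle.org.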
Lara: A Unifying Algebra
Do you have an application more easily expressed in several algebras?
Do you seek multi-system optimizations?
Let's discuss! ☺
Vision for Polystore Systems
[Figure: a script of interleaved SQL and Matlab statements is expressed in RA (∪, ×, π_C, σ_f, ρ, ∖, γ) and LA (⊕, ⊗, ⊕.⊗, T, f), translated to Lara (⋈_⊗, ⋈_⊕, map_f, promote_V), then passed through Optimize & Schedule to run on RDBMS and BLAS.]
APIs of RA and LA

Relational Algebra
Object: Relation
• ∪ – Union
• × – Cartesian Product
• π_C – (Extended) Projection
• σ_f – Select
• ρ – Rename
• ∖ – Difference
• γ – Aggregate

Linear Algebra
Object: N-D Matrix
• ⊕ – Element-wise add
• ⊗ – Element-wise multiply
• ⊕.⊗ – Matrix multiply
• Reduce – Sum along a dimension
• Apply function to each element
• T – Transpose
• (Construction & De-construction)
Objects of Lara
Associative Tables. Several interpretations:
• Relational table with key columns & value columns with default values
• Total function from key-space to value-space
• Sparse tensor
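The three interpretations above can be illustrated with one small Python class. This is a minimal sketch, assuming a single value column and a numeric default; the class name `AssocTable` and its methods are hypothetical and do not come from the slides.

```python
class AssocTable:
    """Hypothetical model of an associative table with one value column."""

    def __init__(self, key_names, rows, default=0):
        self.key_names = tuple(key_names)
        self.default = default
        # Sparse storage: keep only entries that differ from the default.
        self.rows = {tuple(k): v for k, v in rows.items() if v != default}

    def __call__(self, *key):
        """Total-function view: every key in the key-space has a value."""
        return self.rows.get(key, self.default)

    def as_relation(self):
        """Relational view: one record per stored key, plus a value column."""
        return [dict(zip(self.key_names, k), value=v)
                for k, v in sorted(self.rows.items())]

    def as_dense(self, row_domain, col_domain):
        """Tensor view (2-D case): fill in defaults over the given domains."""
        return [[self(r, c) for c in col_domain] for r in row_domain]


# A few rows of table D from the ranking example.
d = AssocTable(
    ("site", "word"),
    {
        ("pizzanow.com", "pizza"): 6,
        ("pizzanow.com", "delicious"): 5,
        ("recycle.org", "green"): 2,
    },
)

stored = d("pizzanow.com", "pizza")       # stored entry
missing = d("recycle.org", "pizza")       # falls back to the default
records = d.as_relation()
dense = d.as_dense(["pizzanow.com", "recycle.org"],
                   ["delicious", "green", "pizza"])
```

The same object answers point lookups like a function, lists its non-default rows like a relation, and materializes as a dense array when the key domains are given.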
Lara -> RA & LA

Lara         RA            LA
⋈_⊗          ⋈, π_⊗, ρ     Tensor product
⋈_⊕          γ_⊕, ∪        Reduce, e-wise sum
map_f        π_f           Apply
promote_V    Re-index      Re-key
Example derived operation: Outer Join
Inner Join: P ⋈ S
Outer Join: P ⟗ S
(formulas out of date)