people people name name id id age age stephanie




  3. Table people (name, id, age): (stephanie, 1, 19), (dylan, 2, 26), (mary kate, 3, 17)
     Table pets (name, owner): (catsidy, 2), (gigi, 3)
     Query 1: people.filter{p => p.age > 18}
     Query 2: people.join(pets, "id === owner").filter(people.age > 18)
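
The two queries can be tried on the slide's sample rows without a cluster. The following is a minimal sketch that uses plain Scala collections as a stand-in for the Spark tables; the case classes `Person` and `Pet` and the object name `SlideQueries` are assumptions made for illustration, not part of the original deck.

```scala
// Plain Scala collections stand in for the Spark tables; no Spark dependency.
object SlideQueries {
  // Hypothetical row types matching the columns shown on the slide.
  final case class Person(name: String, id: Int, age: Int)
  final case class Pet(name: String, owner: Int)

  val people: List[Person] = List(
    Person("stephanie", 1, 19),
    Person("dylan", 2, 26),
    Person("mary kate", 3, 17)
  )
  val pets: List[Pet] = List(Pet("catsidy", 2), Pet("gigi", 3))

  // Query 1: keep people older than 18.
  val query1: List[Person] = people.filter(p => p.age > 18)

  // Query 2: join people with pets on id === owner, then filter on age.
  val query2: List[(Person, Pet)] =
    (for {
      p <- people
      pet <- pets
      if p.id == pet.owner
    } yield (p, pet)).filter { case (p, _) => p.age > 18 }
}
```

On this data Query 1 keeps stephanie and dylan, and Query 2 keeps only the (dylan, catsidy) pair, because gigi's owner is 17.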

  4. Cache contents: filter { p => p.age > 18 } over table people
     Query: people.filter(age > 18)
     Pipeline stages: Optimization, Cache Substitution, Physical Planning
     Plan shown at each stage: filter { p => p.age > 18 } over table people; filter { p => p.age > 18 } over table people; filter { p => p.age > 18 } over FileScan people

  5. Cache contents: filter { p => p.age > 18 } over table people
     Query: people.join(pets, "id === owner").filter(people.age > 18)
     Pipeline stages: Optimization, Cache Substitution, Physical Planning
     Plans shown:
       select * over filter people.age > 18 over join (owner, id) of table people and table pets
       select * over join (owner, id) of (filter people.age > 18 over table people) and table pets
       select * over hashjoin (owner, id) of (filter people.age > 18 over filescan people) and filescan pets


  10. Current pipeline: Cache, Optimization, Physical Planning
      Optimization-first pipeline (slow!): Optimization, Cache, Physical Planning
      Insight: not all optimizations help caching!
      Pipeline with partial optimization: Partial Optimization, Cache, Optimization, Physical Planning
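
Why stage order matters for cache hits can be shown on a toy plan. The sketch below is not Spark's optimizer or cache manager: the `Plan` types, the string predicates, the single pushdown rule (hard-wired to predicates that start with `people.`), and the exact-match cache lookup are all simplifying assumptions. It also assumes that the current pipeline performs cache substitution before optimization, as the stage order on the slide suggests.

```scala
object PipelineOrder {
  // Hypothetical, minimal logical plan; predicates are plain strings.
  sealed trait Plan
  final case class Table(name: String) extends Plan
  final case class Filter(pred: String, child: Plan) extends Plan
  final case class Join(cond: String, left: Plan, right: Plan) extends Plan
  final case class CachedScan(id: Int) extends Plan

  // Assumed cache contents: the filtered people table, stored under id 0.
  val cache: Map[Plan, Int] =
    Map(Filter("people.age > 18", Table("people")) -> 0)

  // Cache substitution: replace any subtree that exactly matches a cached plan.
  def substitute(plan: Plan): Plan = cache.get(plan) match {
    case Some(id) => CachedScan(id)
    case None => plan match {
      case Filter(p, c)  => Filter(p, substitute(c))
      case Join(c, l, r) => Join(c, substitute(l), substitute(r))
      case leaf          => leaf
    }
  }

  // A single optimizer rule: push a filter on people below the join.
  def optimize(plan: Plan): Plan = plan match {
    case Filter(p, Join(c, l, r)) if p.startsWith("people.") =>
      Join(c, optimize(Filter(p, l)), optimize(r))
    case Filter(p, c)  => Filter(p, optimize(c))
    case Join(c, l, r) => Join(c, optimize(l), optimize(r))
    case leaf          => leaf
  }

  val query: Plan =
    Filter("people.age > 18", Join("id === owner", Table("people"), Table("pets")))

  // Cache first: the filter still sits above the join, so nothing matches.
  val cacheFirst: Plan = optimize(substitute(query))

  // Optimization first: pushdown exposes the cached subtree.
  val optimizeFirst: Plan = substitute(optimize(query))
}
```

In this toy setting only `optimizeFirst` contains a `CachedScan`; `cacheFirst` ends up recomputing the filter over `Table("people")`.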

  11. Boolean Simplification, Constant Propagation, ID Reassignment, Filter Pruning, Object Elimination, Custom Rules, ...


  14. UDFs are black boxes that hide caching opportunities
      Plan with a relational predicate: select * over where age > 18 over table people
      Plan with a UDF: select * over { p => p.age > 18 } over table people
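
One way to see the black-box problem is through equality. In this small sketch the `Expr` case classes are hypothetical stand-ins for relational expressions. Two structurally identical expressions compare equal, so a cache keyed on plans can match them. Two textually identical Scala closures are distinct function objects and do not.

```scala
object OpaqueUdf {
  // Hypothetical relational expressions with structural equality.
  sealed trait Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Two structurally identical predicates compare equal.
  val expr1: Expr = GreaterThan(Attribute("age"), Literal(18))
  val expr2: Expr = GreaterThan(Attribute("age"), Literal(18))
  val exprsMatch: Boolean = expr1 == expr2

  // Two closures with the same source text are separate objects,
  // and function values only support reference equality.
  val udf1: Int => Boolean = age => age > 18
  val udf2: Int => Boolean = age => age > 18
  val udfsMatch: Boolean = udf1 == udf2
}
```

Here `exprsMatch` is true and `udfsMatch` is false, which is why a plan containing a UDF cannot be matched against a cached plan by comparison alone.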

  19. Approaches compared (columns): Program Synthesis, User Annotation, Froid, Acorn
      Correct: ✓ ✓ ✓ ✓
      Transparent: X ✓ ✓ ✓
      General: X X ✓ ✓ (Java, Scala)
      Fast: X ✓ ✓ ✓

  20. Scala Native Spark

  21. Source: person.filter(p => p.age > 18)
      Bytecode: 1 aload_1; 2 invokeinterface; 3 dload_1; 4 ldc2_w; 5 dcmpg; 6 ifge 18; 7 iconst_1; 8 goto 10; 9 iconst_0; 10 aload_0; 11 aload_1
      Intermediate form: 1 Person r1 := @param0; 2 double $d0 = r1.age(); 3 int $d1 = 18; 4 if $d0 < $d1; 5 goto 8; 6 boolean $zo = 1; 7 goto 9; 8 $zo = 0; 9 return $zo

  28. Intermediate form: 1 Person r1 := @param0; 2 double $d0 = r1.age(); 3 int $d1 = 18; 4 if $d0 > $d1; 5 goto 8; 6 boolean $zo = 1; 7 goto 9; 8 $zo = 0; 9 return $zo
      Symbol table (Name, Type, Expression): (r1, class[Person], this); (d0, double, Attribute("age")); (d1, int, Literal(18))
      Expression tree: If with condition GreaterThan(Attribute("age"), Literal(18)) and branches cast(0) as boolean, cast(1) as boolean
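
The walk over the listing can be sketched as a small symbolic interpreter. Everything below is an assumption made for illustration: the `Stmt` types are a simplified, hypothetical encoding of the listing (the `if` and its `goto` are merged into one `IfGt` statement and positions are 0-based), the environment map plays the role of the symbol table, and the code only handles loop-free bodies. It is not the tool's actual implementation.

```scala
object UdfTranslation {
  // Target expression language; names follow the slide.
  sealed trait Expr
  case object This extends Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class CastToBoolean(child: Expr) extends Expr
  final case class If(cond: Expr, thenExpr: Expr, elseExpr: Expr) extends Expr

  // Hypothetical, simplified statements for the listing on the slide.
  sealed trait Stmt
  final case class Param(target: String) extends Stmt                        // r1 := @param0
  final case class Call(target: String, recv: String, method: String) extends Stmt // $d0 = r1.age()
  final case class Const(target: String, value: Int) extends Stmt            // $d1 = 18
  final case class IfGt(left: String, right: String, jump: Int) extends Stmt // if l > r goto jump
  final case class Goto(jump: Int) extends Stmt
  final case class Return(source: String) extends Stmt

  val program: Vector[Stmt] = Vector(
    Param("r1"),            // 0
    Call("d0", "r1", "age"),// 1
    Const("d1", 18),        // 2
    IfGt("d0", "d1", 6),    // 3: taken branch jumps to position 6
    Const("zo", 1),         // 4
    Goto(7),                // 5
    Const("zo", 0),         // 6
    Return("zo")            // 7
  )

  // Symbolically execute the statements; env is the symbol table.
  def translate(stmts: Vector[Stmt]): Expr = {
    def step(pc: Int, env: Map[String, Expr]): Expr = stmts(pc) match {
      case Param(t) => step(pc + 1, env + (t -> This))
      case Call(t, recv, m) if env(recv) == This =>
        // A getter on the row object becomes a column reference.
        step(pc + 1, env + (t -> Attribute(m)))
      case Call(_, recv, _) => sys.error(s"unsupported receiver: $recv")
      case Const(t, v) => step(pc + 1, env + (t -> Literal(v)))
      case IfGt(l, r, j) =>
        // Follow both branches and combine them under one If node.
        If(GreaterThan(env(l), env(r)), step(j, env), step(pc + 1, env))
      case Goto(j) => step(j, env)
      case Return(s) => CastToBoolean(env(s))
    }
    step(0, Map.empty)
  }
}
```

For the program above, `translate` produces an `If` over `GreaterThan(Attribute("age"), Literal(18))` with the two boolean casts as branches, in the same order as the expression on the slide.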

  29. Query: person.filter(p => p.age > 18)
      Plan: select age over filterUDF{ p => p.age > 18 } over table people
      Expression tree: IF with condition GreaterThan(Attribute("age"), Literal(18)) and branches cast(1) as boolean, cast(0) as boolean

  30. Query: person.filter(p => p.age > 18)
      Plan: select age over filter(If(GreaterThan("age", 18), cast 0 as boolean, cast 1 as boolean)) over table people
      Expression tree: IF with condition GreaterThan(Attribute("age"), Literal(18)) and branches cast(1) as boolean, cast(0) as boolean

  31. Plan before the Partial Optimizer: select * over filter(If(GreaterThan("age", 18), cast 0 as boolean, cast 1 as boolean)) over table people
      Plan after the Partial Optimizer: select * over filter "age" > 18 over table people
      Queries labeled on the slide: person.filter(age > 18) and person.filter(p => p.age > 18)
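
A rewrite of this kind can be approximated with two small rules: fold the constant casts, then collapse an `If` whose branches are boolean constants. The sketch below is an assumption about how such a step might look, not the deck's actual optimizer; the `Expr` types, `BoolLiteral`, and `simplify` are hypothetical names. Both branch orders are handled, since the order decides whether the result is the comparison itself or its negation.

```scala
object PartialOptimizer {
  // Hypothetical expression types for illustration.
  sealed trait Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class BoolLiteral(value: Boolean) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class Not(child: Expr) extends Expr
  final case class CastToBoolean(child: Expr) extends Expr
  final case class If(cond: Expr, thenExpr: Expr, elseExpr: Expr) extends Expr

  def simplify(e: Expr): Expr = e match {
    // Constant folding: a nonzero integer casts to true, zero to false.
    case CastToBoolean(Literal(v)) => BoolLiteral(v != 0)
    case CastToBoolean(c)          => CastToBoolean(simplify(c))
    case If(c, t, f) =>
      (simplify(c), simplify(t), simplify(f)) match {
        // Boolean simplification of an If with constant branches.
        case (cond, BoolLiteral(true), BoolLiteral(false)) => cond
        case (cond, BoolLiteral(false), BoolLiteral(true)) => Not(cond)
        case (cond, a, b)                                  => If(cond, a, b)
      }
    case GreaterThan(l, r) => GreaterThan(simplify(l), simplify(r))
    case Not(c)            => Not(simplify(c))
    case other             => other
  }
}
```

With the true constant in the first branch, `simplify` returns the bare `GreaterThan` comparison, which can then be matched against a cached plan for the same predicate.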
