people
name        id  age
stephanie   1   19
dylan       2   26
mary kate   3   17

pets
name      owner
catsidy   2
gigi      3

Query 1: people.filter{p => p.age > 18}
Query 2: people.join(pets, "id === owner").filter(people.age > 18)
Query: people.filter(age > 18)

Cache:
filter { p => p.age > 18 }
  table people

Plan after each stage (Cache Substitution → Optimization → Physical Planning):

Cache Substitution:
filter { p => p.age > 18 }
  table people

Optimization:
filter { p => p.age > 18 }
  table people

Physical Planning:
filter { p => p.age > 18 }
  FileScan people
Query: people.join(pets, "id === owner").filter(people.age > 18)

Cache:
filter { p => p.age > 18 }
  table people

Plan after each stage (Cache Substitution → Optimization → Physical Planning):

Cache Substitution:
select *
  filter people.age > 18
    join (owner, id)
      table people
      table pets

Optimization:
select *
  join (owner, id)
    filter people.age > 18
      table people
    table pets

Physical Planning:
select *
  hashjoin (owner, id)
    filter people.age > 18
      filescan people
    filescan pets
Current pipeline: Cache → Optimization → Physical Planning
Optimization-first pipeline (slow!): Optimization → Cache → Physical Planning
Insight: not all optimizations help caching!
Partial Optimization → Cache → Optimization → Physical Planning
Boolean Simplification, Constant Propagation, ID Reassignment, Filter Pruning, Object Elimination, Custom Rules, ...
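A minimal Scala sketch of why running a cheap rule before the cache lookup can turn a miss into a hit. Everything here is a hypothetical stand-in for illustration: the `Plan` and `Expr` types, the `simplify` rule (playing the role of Boolean Simplification), and the set-based cache are not Spark's or Acorn's actual classes.

```scala
object PipelineSketch {
  // Toy expression and plan trees (hypothetical, not Spark's Catalyst classes).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Gt(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  case object True extends Expr

  sealed trait Plan
  final case class Table(name: String) extends Plan
  final case class Filter(cond: Expr, child: Plan) extends Plan
  final case class CachedResult(key: Plan) extends Plan

  // Stand-in for a cheap rule such as Boolean Simplification: drop `True` conjuncts.
  def simplify(e: Expr): Expr = e match {
    case And(l, r) =>
      (simplify(l), simplify(r)) match {
        case (True, x) => x
        case (x, True) => x
        case (x, y)    => And(x, y)
      }
    case other => other
  }

  // Partial optimization: apply only the cheap normalizing rule to every filter.
  def partialOptimize(p: Plan): Plan = p match {
    case Filter(c, child) => Filter(simplify(c), partialOptimize(child))
    case other            => other
  }

  // Cache substitution: replace the largest subtree that matches a cached plan exactly.
  def substitute(p: Plan, cache: Set[Plan]): Plan =
    if (cache.contains(p)) CachedResult(p)
    else p match {
      case Filter(c, child) => Filter(c, substitute(child, cache))
      case other            => other
    }

  val cache: Set[Plan] = Set(Filter(Gt(Attr("age"), Lit(18)), Table("people")))

  // Same predicate as the cached plan, written with a redundant `True AND ...`.
  val query: Plan = Filter(And(True, Gt(Attr("age"), Lit(18))), Table("people"))

  // Cache lookup on the raw plan: no exact match, so the plan is left unchanged.
  val currentPipeline: Plan = substitute(query, cache)

  // Partial optimization first: the normalized plan matches the cached one.
  val partialFirst: Plan = substitute(partialOptimize(query), cache)
}
```

In this toy setup `currentPipeline` is the untouched query, while `partialFirst` is a `CachedResult` node. Rules that do not change how plans compare can be left to the later, full optimization stage.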
UDFs are black boxes that hide caching opportunities

select *                 select *
  where age > 18           { p => p.age > 18 }
    table people             table people
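A small Scala sketch of the black-box problem, under the assumption that cached plans are matched by structural equality. The `Plan` and `Expr` types and the `UdfFilter` node are hypothetical and only for illustration. Two lambdas with identical source text are still distinct function objects, so plans that embed them never compare equal, whereas plans built from expression trees do.

```scala
object BlackBoxSketch {
  final case class Person(name: String, id: Int, age: Double)

  // Toy native expressions: plain data, compared structurally.
  sealed trait Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  sealed trait Plan
  final case class Table(name: String) extends Plan
  final case class Filter(cond: Expr, child: Plan) extends Plan
  // A filter whose predicate is an opaque function object (the UDF case).
  final case class UdfFilter(f: Person => Boolean, child: Plan) extends Plan

  // Two UDFs with the same source text are two different function objects.
  val udf1: Person => Boolean = p => p.age > 18
  val udf2: Person => Boolean = p => p.age > 18

  // Functions compare by reference, so these plans do not match.
  val udfPlansMatch: Boolean =
    UdfFilter(udf1, Table("people")) == UdfFilter(udf2, Table("people"))

  // Expression trees compare by structure, so these plans match.
  val nativePlansMatch: Boolean =
    Filter(GreaterThan(Attribute("age"), Literal(18)), Table("people")) ==
      Filter(GreaterThan(Attribute("age"), Literal(18)), Table("people"))
}
```

Here `udfPlansMatch` is `false` and `nativePlansMatch` is `true`, which is the caching opportunity the UDF hides.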
             Program Synthesis  User Annotation  Froid  Acorn
Correct      ✓                  ✓                ✓      ✓
Transparent  X                  ✓                ✓      ✓
General      X                  X                ✓      ✓ (Java, Scala)
Fast         X                  ✓                ✓      ✓
Scala Native Spark
person.filter(p => p.age > 18)

Bytecode:
 1 aload_1
 2 invokeinterface
 3 dload_1
 4 ldc2_w
 5 dcmpg
 6 ifge 18
 7 iconst_1
 8 goto 10
 9 iconst_0
10 aload_0
11 aload_1

Typed IR:
1 Person r1 := @param0
2 double $d0 = r1.age()
3 int $d1 = 18
4 if $d0 > $d1
5 goto 8
6 boolean $zo = 1
7 goto 9
8 $zo = 0
9 return $zo
1 Person r1 := @param0
2 double $d0 = r1.age()
3 int $d1 = 18
4 if $d0 > $d1
5 goto 8
6 boolean $zo = 1
7 goto 9
8 $zo = 0
9 return $zo

Symbol table, built line by line:
Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")
d1    int            Literal(18)

Expression built from the branch:
lines 4-5 (if):       If, with condition GreaterThan(Attribute("age"), Literal(18))
line 6 ($zo = 1):     cast(1) as boolean
line 8 ($zo = 0):     cast(0) as boolean
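The walk-through above can be sketched in Scala as a pass over a simplified IR that fills a symbol table and then emits an expression tree. This is only an illustration under stated assumptions: the `Stmt` cases are a hypothetical, coarser IR than the one on the slide (lines 4 to 9 are collapsed into a single `ReturnIfGreater` statement), and the choice of `cast(1)` for the branch where the comparison holds is an assumption.

```scala
object UdfTranslationSketch {
  // Target expression tree; node names follow the slide, the classes are hypothetical.
  sealed trait Expr
  case object This extends Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class CastBoolean(child: Expr) extends Expr // "cast(...) as boolean"
  final case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr

  // Simplified IR statements (hypothetical).
  sealed trait Stmt
  final case class BindParam(local: String) extends Stmt                                // r1 := @param0
  final case class CallGetter(local: String, receiver: String, getter: String) extends Stmt // $d0 = r1.age()
  final case class LoadConst(local: String, value: Int) extends Stmt                    // $d1 = 18
  final case class ReturnIfGreater(left: String, right: String,
                                   whenTrue: Int, whenFalse: Int) extends Stmt          // lines 4-9

  // Walk the statements, recording what each local stands for, and build the
  // expression for the returned value. Returns None for anything unsupported.
  def translate(body: List[Stmt]): Option[Expr] = {
    var symbols = Map.empty[String, Expr]
    var result = Option.empty[Expr]
    body.foreach {
      case BindParam(local) =>
        symbols = symbols.updated(local, This)
      case CallGetter(local, receiver, getter) if symbols.get(receiver).contains(This) =>
        // A getter call on the row object becomes a column reference.
        symbols = symbols.updated(local, Attribute(getter))
      case LoadConst(local, value) =>
        symbols = symbols.updated(local, Literal(value))
      case ReturnIfGreater(l, r, t, f) =>
        result = for {
          a <- symbols.get(l)
          b <- symbols.get(r)
        } yield If(GreaterThan(a, b), CastBoolean(Literal(t)), CastBoolean(Literal(f)))
      case _ =>
        result = None // e.g. a getter on something other than the row object
    }
    result
  }

  // The UDF body of p => p.age > 18 in the simplified IR.
  val udfBody: List[Stmt] = List(
    BindParam("r1"),
    CallGetter("d0", "r1", "age"),
    LoadConst("d1", 18),
    ReturnIfGreater("d0", "d1", 1, 0)
  )

  val translated: Option[Expr] = translate(udfBody)
}
```

After the first three statements the symbol table holds the same three rows as the table above; the last statement produces the `If` node over `GreaterThan(Attribute("age"), Literal(18))`.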
person.filter(p => p.age > 18)

Plan with the UDF:
select age
  filterUDF{ p => p.age > 18 }
    table people

Translated UDF body:
IF
  GreaterThan(Attribute("age"), Literal(18))
  cast(1) as boolean
  cast(0) as boolean

Plan after translation:
select age
  filter(If(GreaterThan("age", 18), cast 1 as boolean, cast 0 as boolean))
    table people

Partial Optimizer:
select *
  filter(If(GreaterThan("age", 18), cast 1 as boolean, cast 0 as boolean))
    table people
becomes
select *
  filter "age" > 18
    table people

person.filter(p => p.age > 18) now yields the same plan as person.filter(age > 18).
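A minimal Scala sketch of the kind of rewrite the partial optimizer needs here, assuming the translated UDF uses `cast(1)` for the branch taken when the condition holds. The expression classes and the `simplify` rule are hypothetical illustrations, not Spark's or Acorn's actual rules.

```scala
object PartialOptimizerSketch {
  // Toy expression tree (hypothetical classes).
  sealed trait Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class CastBoolean(child: Expr) extends Expr
  final case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr

  // If(cond, true, false) is just cond, so the wrapper added by translation can go.
  def simplify(e: Expr): Expr = e match {
    case If(cond, CastBoolean(Literal(1)), CastBoolean(Literal(0))) => simplify(cond)
    case If(c, t, f) => If(simplify(c), simplify(t), simplify(f))
    case other => other
  }

  // Predicate produced by translating p => p.age > 18.
  val translated: Expr =
    If(GreaterThan(Attribute("age"), Literal(18)),
       CastBoolean(Literal(1)), CastBoolean(Literal(0)))

  // Predicate of the native query filter(age > 18).
  val native: Expr = GreaterThan(Attribute("age"), Literal(18))

  val simplified: Expr = simplify(translated)
}
```

After the rewrite, `simplified` is structurally equal to `native`, so a cache keyed on plan structure can treat the two queries as the same.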