The Big λ Project
Aws Albarghouthi · Calvin Smith
University of Wisconsin–Madison
[Figure: MapReduce dataflow — each input item i is mapped to m(i), the results are shuffled, and parallel reduce steps combine them into the output data.]
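The dataflow in the diagram (map each item i to m(i), shuffle, then reduce) can be sketched in sequential Python; a real runtime would distribute both phases across machines:

```python
from functools import reduce

def map_reduce(m, r, xs):
    """Map each input item i to m(i), then combine all results with r."""
    mapped = [m(i) for i in xs]   # map phase (conceptually parallel)
    return reduce(r, mapped)      # reduce phase (after the shuffle)

# Example: sum of squares over an input partition.
result = map_reduce(lambda i: i * i, lambda a, b: a + b, [1, 2, 3])
print(result)  # 14
```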
Big λ: Analyses from Examples [PLDI16]
Synthesize data-parallel programs from input/output examples.
Challenges
Non-determinism → generate proven-deterministic solutions
Variety of domains → parameterize by extensible APIs
Sparse search space → syntactically restrict to data-parallel programs
Higher-order sketches
Bias the search heavily towards data-parallel programs, e.g.:
  map x . reduce x
  flatmap x . reduce x . apply x
  map x . reduceByKey x . filter x
Big λ uses 8 templates, gathered from reference implementations.
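As a rough illustration (not Big λ's actual search), a higher-order sketch can be modeled as a skeleton with holes that synthesis fills in:

```python
from functools import reduce

# The template "map x . reduce x . apply x" as a skeleton with holes m, r, f.
def sketch_map_reduce_apply(m, r, f):
    return lambda xs: f(reduce(r, map(m, xs)))

# Filling the holes yields a concrete program, e.g. "longest string":
longest = sketch_map_reduce_apply(
    m=lambda s: (len(s), s),      # tag each string with its length
    r=lambda x, y: max(x, y),     # keep the larger tagged pair
    f=lambda p: p[1],             # project the string back out
)
print(longest(["ab", "abcd", "abc"]))  # abcd
```

Fixing the skeleton means the synthesizer only searches over the holes, which is what makes the sparse space tractable.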
Who uses the most #hashtags?

@Alice: “Hello AAIP #aaip #germany”
@Bob: “Coffee machine refilled yet? #caffeine #java #4thcup #zzz”
@Claire: “Torn between wine cellar and seminar #wine #seminar #zzz”

Hashtag counts 2, 4, 3 … must be @Bob!

let p = map m . reduce r . apply f
where m = λt. (len(filter(is_hashtag, t)), author(t))
      r = λx,y. max(x, y)
      f = λp. snd(p)

map m sends the tweets to {2, @Alice}, {4, @Bob}, {3, @Claire};
reduce r keeps the maximum pair, {4, @Bob};
apply f projects out the author: @Bob.
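The synthesized program can be rendered directly in Python; the (author, text) encoding of tweets is an assumption made for illustration:

```python
from functools import reduce

# Tweets as (author, text) pairs -- a hypothetical encoding of the slide's data.
tweets = [
    ("@Alice",  "Hello AAIP #aaip #germany"),
    ("@Bob",    "Coffee machine refilled yet? #caffeine #java #4thcup #zzz"),
    ("@Claire", "Torn between wine cellar and seminar #wine #seminar #zzz"),
]

def is_hashtag(word):
    return word.startswith("#")

m = lambda t: (len(list(filter(is_hashtag, t[1].split()))), t[0])  # (count, author)
r = lambda x, y: max(x, y)  # keep the pair with the higher hashtag count
f = lambda p: p[1]          # snd: project out the author

winner = f(reduce(r, map(m, tweets)))
print(winner)  # @Bob
```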
Synthesis modulo differential privacy? [in progress]
map m . reduce r
compute sensitivity → add noise → charge price
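One concrete reading of this pipeline uses the standard Laplace mechanism; in this sketch the sensitivity is supplied by hand rather than computed by a type system:

```python
import math
import random
from functools import reduce

def private_map_reduce(m, r, xs, sensitivity, epsilon):
    """Release reduce(r, map(m, xs)) with Laplace noise scaled to sensitivity/epsilon."""
    true_value = reduce(r, map(m, xs))
    scale = sensitivity / epsilon
    u = random.random() - 0.5           # u in [-0.5, 0.5)
    u = max(u, -0.4999999)              # avoid log(0) at the boundary
    # Sample Laplace(0, scale) via the inverse-CDF method.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise           # the "price" charged is epsilon

# Counting query: each record contributes at most 1, so sensitivity = 1.
noisy_count = private_map_reduce(lambda x: 1, lambda a, b: a + b,
                                 range(100), sensitivity=1.0, epsilon=0.5)
```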
Key idea
Use a linear type system to track sensitivity and induce the cheapest program.
How can we automatically learn relational specifications? [FSE17, best paper award]
add(x, y) = z ⇐⇒ add(y, x) = z

Observations of add:
  i1  i2   r
   1   2   3
   3   4   7
   5   6  11
   4   3   7
   …   …   …

Unsupervised learning: learn constraints consistent with the observations.
Exploratory evaluation
Applied the technique to learn specifications of Python APIs, using ~1000 randomly sampled inputs per function.

Strings: concat(y, reverse(y)) = x ⇒ reverse(x) = x
Z3:      valid(x) = p ∧ valid(y) = p ⇒ valid(and(x, y)) = p
Trig:    x = y + π/2 ⇒ (sin(x) = z ⇐⇒ cos(y) = z)
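The sampling methodology can be sketched as: draw random inputs, evaluate each candidate specification, and keep the ones that are never falsified (function names here are illustrative, not the paper's tool):

```python
import random

def holds_on_samples(spec, gen, n=1000):
    """True iff the candidate spec holds on n randomly generated inputs."""
    return all(spec(*gen()) for _ in range(n))

# Commutativity of add:  add(x, y) = z  <=>  add(y, x) = z
add = lambda x, y: x + y
commutative = lambda x, y: add(x, y) == add(y, x)
gen_ints = lambda: (random.randint(-100, 100), random.randint(-100, 100))

# String spec from the slide:  concat(y, reverse(y)) = x  =>  reverse(x) = x
def palindrome_spec(y):
    x = y + y[::-1]
    return x[::-1] == x
gen_str = lambda: ("".join(random.choice("abc") for _ in range(6)),)

print(holds_on_samples(commutative, gen_ints))     # True
print(holds_on_samples(palindrome_spec, gen_str))  # True
```

A spec that fails on even one sample is discarded, so random testing quickly filters out false candidates such as commutativity of subtraction.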
Other directions
Synthesis of Datalog programs (graph analytics)
Synthesis of fair decision-making programs
Active-learning-based user interaction
Proofs as programs
…