Towards Automatically Setting Language Bias in Relational Learning
Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak
Information and Data Management and Analytics (IDEA) Lab
Design a drug to treat HIV
What is the structure of compounds that have anti-HIV activity?
Oracle: A compound has anti-HIV activity if it has the following substructure:
[Figure: substructure with the atoms N, O, N]
Relational learning can learn a definition for anti-HIV

Training data:

compound
compId  atomId
c1      a1
c2      a10

atom
atomId  element
a1      N
a2      O

bond
atomId1  atomId2  type
a1       a2       single
a2       a3       single

anti-HIV
compId
c1
c3

no-anti-HIV
compId
c2
c4

Output of the relational learning algorithm:

anti-HIV(x) :- compound(x,u), atom(u,N),
               compound(x,v), atom(v,O),
               compound(x,w), atom(w,N),
               bond(u,v,single), bond(v,w,single).
Benefits of relational learning
✓ Leverage the structure of data and learn over complex schemas with multiple tables
✓ Automatic feature extraction and selection
✓ Results are interpretable (Datalog)

[Figure: the compound, atom, and bond tables feed the relational learning algorithm, which outputs:]

anti-HIV(x) :- compound(x,u), atom(u,N),
               compound(x,v), atom(v,O),
               compound(x,w), atom(w,N),
               bond(u,v,single), bond(v,w,single).
How relational learning works
What is the definition of the advisedBy relation?

paperAuthor
paperId  authorId
p1       f1
p1       s1
p2       s3
p2       f3
…

professor
id  position
f1  faculty
f2  faculty
f3  adjunct

student
id  phase         year
s1  post_quals    3
s2  pre_quals     2
s3  post_prelims  5

advisedBy
studId  profId
s1      f1
s3      f3

not-advisedBy
studId  profId
s2      f3
s1      f3

Relational learning algorithm: ?
Generic relational learning algorithm

Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year)

Scoring function: f = P - N
  P: positive examples covered
  N: negative examples covered

Start:
  advisedBy(x,y) :- true.

Step 1. Candidate literals for the empty body:
  paperAuthor(z,x)   f=1
  professor(y,z)     f=0
  (not labeled)      f=-1
Current definition:
  advisedBy(x,y) :- paperAuthor(z,x).

Step 2. Candidate literals to add after paperAuthor(z,x):
  student(x,v,w)     f=1
  paperAuthor(z,y)   f=2
  (not labeled)      f=0
Current definition:
  advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).

Step 3. Candidate literals to add after paperAuthor(z,y):
  f=2, f=1, f=1 (literals not labeled)
No improvement, so the definition stays:
  advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).
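The greedy search on this slide can be sketched in Python over the toy tables from the advisedBy example. This is a minimal sketch, not the authors' implementation: the candidate generator (every relation, each argument either a variable already in the clause or one fresh variable, no constants), the fresh-variable names, and the helper names `covers`, `score`, `candidates`, `learn`, and `show` are all assumptions made for illustration. Rows hidden behind "…" in the slides are omitted.

```python
from itertools import product

# Toy database from the slides (rows elided with "…" are left out).
DB = {
    "paperAuthor": [("p1", "f1"), ("p1", "s1"), ("p2", "s3"), ("p2", "f3")],
    "professor": [("f1", "faculty"), ("f2", "faculty"), ("f3", "adjunct")],
    "student": [("s1", "post_quals", 3), ("s2", "pre_quals", 2),
                ("s3", "post_prelims", 5)],
}
POS = [("s1", "f1"), ("s3", "f3")]   # advisedBy
NEG = [("s2", "f3"), ("s1", "f3")]   # not-advisedBy


def covers(body, example):
    """True if some binding of the body variables satisfies every literal."""
    def solve(i, binding):
        if i == len(body):
            return True
        rel, args = body[i]
        for row in DB[rel]:
            new = dict(binding)
            # Bind unbound variables; reject rows that clash with bound ones.
            if all(new.setdefault(a, v) == v for a, v in zip(args, row)) \
                    and solve(i + 1, new):
                return True
        return False
    return solve(0, {"x": example[0], "y": example[1]})


def score(body):
    """f = P - N: covered positives minus covered negatives."""
    return sum(covers(body, e) for e in POS) - sum(covers(body, e) for e in NEG)


def candidates(used):
    """Hypothetical generator: each argument is a used or a fresh variable."""
    fresh = [v for v in "zvwutsr" if v not in used]
    for rel, rows in DB.items():
        slots = [sorted(used) + [fresh[i]] for i in range(len(rows[0]))]
        for args in product(*slots):
            yield rel, args


def learn():
    """Add the best-scoring literal until the score stops improving."""
    body, best = [], score([])
    while True:
        used = {"x", "y"} | {a for _, args in body for a in args}
        lit = max(candidates(used), key=lambda l: score(body + [l]))
        if score(body + [lit]) <= best:
            return body, best
        body, best = body + [lit], score(body + [lit])


def show(body):
    lits = ", ".join(f"{rel}({','.join(args)})" for rel, args in body)
    return f"advisedBy(x,y) :- {lits or 'true'}."


if __name__ == "__main__":
    body, f = learn()
    print(show(body), "f =", f)
```

On this toy data the empty body scores f=0, adding paperAuthor(z,x) raises it to f=1, and adding paperAuthor(z,y) raises it to f=2, after which no candidate improves the score.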
Learned definition
What is the definition of the advisedBy relation?

[Figure: the paperAuthor, professor, student, advisedBy, and not-advisedBy tables feed the relational learning algorithm, which outputs:]

advisedBy(x,y) :- paperAuthor(z,x), paperAuthor(z,y).
Hypothesis space in relational learning algorithms is huge
• Hypothesis space: all Datalog definitions containing relations in the schema
• Current solution: users must set language bias to restrict the hypothesis space

Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year)

advisedBy(x,y) :- …
Candidate literals:
  paperAuthor(x,x)
  paperAuthor(z,x)
  paperAuthor(z,y)
  paperAuthor(x,y)
  paperAuthor(z,v)
  professor(x,z)
  professor(x,y)
  student(x,v,w)
  student(x,y,z)
  …
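A rough count gives a feel for how fast this space grows. The sketch below is an illustration under stated assumptions, not a figure from the talk: it fixes a pool of `num_vars` variable names, ignores constants, counts a body as a set of distinct literals, and does not merge clauses that are equivalent up to variable renaming. The names `SCHEMA`, `count_literals`, and `count_bodies` are hypothetical.

```python
from math import comb

# Relation arities taken from the example schema.
SCHEMA = {"professor": 2, "paperAuthor": 2, "student": 3}


def count_literals(schema, num_vars):
    """Literals obtainable by filling every argument with one of num_vars variables."""
    return sum(num_vars ** arity for arity in schema.values())


def count_bodies(schema, num_vars, max_literals):
    """Clause bodies made of 1..max_literals distinct literals."""
    n = count_literals(schema, num_vars)
    return sum(comb(n, k) for k in range(1, max_literals + 1))


if __name__ == "__main__":
    print(count_literals(SCHEMA, 5))
    print(count_bodies(SCHEMA, 5, 3))
```

With five variable names and at most three literals per body, this already yields 175 literals and 893,375 bodies for a three-table schema, before any constants are considered.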
Syntactic bias restricts the structure of learned Datalog definitions
• Which relations to query?
• Which relations to join and over which attributes?
• Should an attribute be a constant or a variable?

Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year)

Join paperId with professor id?
  advisedBy(x,y) :- paperAuthor(z,x), professor(z,v).

Constant or variable?
  advisedBy(x,y) :- professor(y,z), professor(y,faculty).
  (z is a variable; faculty is a constant)
Predicate definitions
• Assign types to each attribute in every relation
• Only attributes with the same type can join

attribute               type
professor[id]           professor
professor[position]     position
paperAuthor[paperId]    paper
paperAuthor[authorId]   student
paperAuthor[authorId]   professor
student[id]             student
…

Input to the algorithm:
  professor(professor,position)
  paperAuthor(paper,student)
  paperAuthor(paper,professor)
  student(student,phase,year)
  …

Example:
  advisedBy(x,y) :- paperAuthor(z,x), professor(z,v).
Here z would join paperAuthor[paperId] (type paper) with professor[id] (type professor). The types differ, so these definitions rule the clause out.
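The type rule can be checked mechanically. The following is a small sketch, assuming a clause body is a list of (relation, arguments) pairs and that a relation may carry several type signatures, as paperAuthor does above; `PREDICATE_DEFS` and `well_typed` are illustrative names, not part of any particular learner.

```python
from itertools import product

# Predicate definitions from the slide: one entry per allowed type signature.
PREDICATE_DEFS = {
    "professor": [("professor", "position")],
    "paperAuthor": [("paper", "student"), ("paper", "professor")],
    "student": [("student", "phase", "year")],
}


def well_typed(body):
    """True if one signature per literal gives every variable a single type."""
    options = [PREDICATE_DEFS[rel] for rel, _ in body]
    for signatures in product(*options):
        types = {}
        # setdefault records the first type seen for a variable and
        # returns the recorded one afterwards, so a mismatch fails the check.
        if all(types.setdefault(var, t) == t
               for (_, args), sig in zip(body, signatures)
               for var, t in zip(args, sig)):
            return True
    return False


if __name__ == "__main__":
    bad = [("paperAuthor", ("z", "x")), ("professor", ("z", "v"))]
    good = [("paperAuthor", ("z", "x")), ("paperAuthor", ("z", "y"))]
    print(well_typed(bad), well_typed(good))
```

The first body is rejected because z would need both type paper and type professor; the second is accepted because z is a paper in both literals while x and y can take the student and professor signatures.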
Mode definitions
• Define the mode to call relations and create literals
• Each attribute can be:
  – an existing variable (+)
  – an existing or new variable (-)
  – a constant (#)

Schema: professor(id, position), paperAuthor(paperId, authorId), student(id, phase, year)

Input to the algorithm:
  professor(+,-)
  professor(-,+)
  professor(+,#)
  …
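To make the three symbols concrete, here is a hedged sketch of how a mode such as professor(+,#) could be expanded into candidate literals. It follows the meanings listed above, but the details are assumptions: constants are drawn from the values stored in the corresponding column, one fresh variable named `new<i>` is offered per `-` position, and types are ignored. The function name `literals` is hypothetical.

```python
from itertools import product

# professor table from the running example.
PROFESSOR = [("f1", "faculty"), ("f2", "faculty"), ("f3", "adjunct")]


def literals(rel, mode, rows, existing):
    """Expand one mode definition into the literals it allows."""
    choices = []
    for i, symbol in enumerate(mode):
        if symbol == "+":      # must reuse a variable already in the clause
            choices.append(sorted(existing))
        elif symbol == "-":    # may reuse a variable or introduce a new one
            choices.append(sorted(existing) + [f"new{i}"])
        elif symbol == "#":    # must be a constant from this column
            choices.append(sorted({repr(row[i]) for row in rows}))
        else:
            raise ValueError(f"unknown mode symbol: {symbol}")
    return [f"{rel}({','.join(args)})" for args in product(*choices)]


if __name__ == "__main__":
    for lit in literals("professor", "+#", PROFESSOR, {"x", "y"}):
        print(lit)
```

For the mode `+#` and existing variables x and y, this produces four literals, one per pairing of a variable with the constants 'adjunct' and 'faculty'. The mode `+-` instead produces six, since the second argument may be x, y, or a new variable.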
Predicate and mode definitions are the “black magic” of relational learning
• All relational learning algorithms require syntactic bias
• Manually written by the user

[Figure: a cycle of Rewrite, Learn, Evaluate]
• Requires expertise
• Trial-and-error
• Difficult and time-consuming
Many lines of code to specify definitions

movies(+movieid,-title,-year)
movies2genres(+movieid,-genreid)
movies2prodcompanies(+movieid,-prodcompanyid)
movies2colors(+movieid,-colorid)
movies2directors(+movieid,-director)
movies2directors(-movieid,+director)
movies2producers(+movieid,-producer)
movies2producers(-movieid,+producer)
producers(+producer,-name)
directors(+director,-name)
colorinfo(+colorid,-color)
colorinfo(+colorid,#color)
movies2writers(+movieid,-writer)
movies2writers(-movieid,+writer)
writers(+writer,-name)
movies2actors(+movieid,-actor,-character)
actors(+actor,-name,-sex)
actors(+actor,-name,#sex)
movies2cinematgrs(+movieid,-cinemat)
movies2cinematgrs(-movieid,+cinemat)
cinematgrs(+cinemat,-name)
movies2composers(+movieid,-composer)
movies2composers(-movieid,+composer)
composers(+composer,-name)
movies2costdes(+movieid,-costdes)
movies2costdes(-movieid,+costdes)
costdesigners(+costdes,-name)
movies2editors(+movieid,-editor)
movies2editors(-movieid,+editor)
editors(+editor,-name)
movies2misc(+movieid,-misc)
misc(+misc,-name)
movies2proddes(+movieid,-proddes)
movies2proddes(-movieid,+proddes)
proddesigners(+proddes,-name)
genres(+genreid,-genre)
genres(+genreid,#genre)
prodcompanies(+prodcompanyid,-prodcompany)
ratings(+movieid,-rank,-votes)
certificates(+movieid,-country,-certificate)
certificates(+movieid,#country,-certificate)
certificates(+movieid,-country,#certificate)
certificates(+movieid,#country,#certificate)
countries(+countryid,-country)
countries(+countryid,#country)
runningtimes(+movieid,-time)
runningtimes(+movieid,#time)
akatitles(+movieid,-languageid,-title)
akanames(+name,-name)
altversions(+movieid,-text)
business(+movieid,-text)
plots(+movieid,-text)
biographies(+bio,-name,-text)
distributors(+movieid,-name)
mpaaratings(+movieid,-text)
mpaaratings(+movieid,#text)
releasedates(+movieid,-countryid,-date)
releasedates(+movieid,-countryid,#date)
technical(+movieid,-text)
technical(+movieid,#text)
language(+languageid,-language)
language(+languageid,#language)
movies2languages(+movieid,-languageid)
movies2countries(+movieid,-countryid)
AutoMode: automatically induce syntactic bias
• Leverage information in the schema and content of the database

[Figure: AutoMode runs Exact IND Discovery and Approximate IND Discovery, produces predicate and mode definitions, and passes them to the relational learning algorithm]
AutoMode: generate predicate definitions
• Use inclusion dependencies (referential integrity constraints) to find types of attributes
• Key idea: the most frequently used joins are the ones over the attributes that participate in an IND
  – E.g., primary-key to foreign-key relationship

professor
id  position
f1  faculty
f2  faculty
f3  adjunct

taughtBy
courseId  profId  term
c1        f1      Fall16
c2        f2      Fall16

taughtBy[profId] ⊆ professor[id]
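A unary inclusion dependency A ⊆ B holds when every value stored in attribute A also appears in attribute B. The sketch below finds such dependencies by brute force over the two example tables. It is not AutoMode's discovery algorithm, which the slides do not detail. The `min_coverage` threshold used to relax the check is an assumed stand-in for "approximate" INDs, and the names `column_values` and `unary_inds` are hypothetical.

```python
from itertools import permutations

# Example tables from the slide.
DB = {
    "professor": {
        "columns": ("id", "position"),
        "rows": [("f1", "faculty"), ("f2", "faculty"), ("f3", "adjunct")],
    },
    "taughtBy": {
        "columns": ("courseId", "profId", "term"),
        "rows": [("c1", "f1", "Fall16"), ("c2", "f2", "Fall16")],
    },
}


def column_values(db):
    """Map each attribute, written relation[attribute], to its set of values."""
    cols = {}
    for rel, table in db.items():
        for i, name in enumerate(table["columns"]):
            cols[f"{rel}[{name}]"] = {row[i] for row in table["rows"]}
    return cols


def unary_inds(db, min_coverage=1.0):
    """Pairs (A, B) where at least min_coverage of A's values occur in B.

    min_coverage=1.0 is an exact IND; lower values are an assumed way to
    tolerate a few violating values.
    """
    cols = column_values(db)
    found = []
    for lhs, rhs in permutations(cols, 2):
        if not cols[lhs]:
            continue  # skip empty columns to avoid dividing by zero
        coverage = len(cols[lhs] & cols[rhs]) / len(cols[lhs])
        if coverage >= min_coverage:
            found.append((lhs, rhs))
    return found


if __name__ == "__main__":
    for lhs, rhs in unary_inds(DB):
        print(f"{lhs} ⊆ {rhs}")
```

On these tables the only exact dependency found is taughtBy[profId] ⊆ professor[id], which suggests giving both attributes the same type so that they can be joined.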