declarative data science machines

Declarative Data Science Machines - PowerPoint PPT Presentation



  1. Declarative Data Science Machines. Sriraam Natarajan (U. Indiana), Amir Globerson (HUJI), Martin Mladenov (TUD, Google), Babak Ahmadi (PicoEgo), and many more …; Kristian Kersting, Martin Grohe (RWTH Aachen), Pavel Tokmakov (INRIA Grenoble), Christopher Ré (Stanford), Christian Bauckhage (U. Bonn)

  2. What about “-O” flags for Data Science Machines? Sriraam Natarajan (U. Indiana), Amir Globerson (HUJI), Martin Mladenov (TUD, Google), Babak Ahmadi (PicoEgo), and many more …; Kristian Kersting, Martin Grohe (RWTH Aachen), Pavel Tokmakov (INRIA Grenoble), Christopher Ré (Stanford), Christian Bauckhage (U. Bonn)

  3. Kristian Kersting - Declarative Data Science Machines

  4. Arms race to “deeply” understand data

  5. Take your spreadsheet … [axes: Features, Objects]

  6. … and apply Machine Learning to Big Data [axes: Features, Objects]: Matrix Factorization, Latent Dirichlet Allocation, Gaussian Processes, Decision Trees/Boosting, Distillation/LUPI (big model teaches small model), Diffusion Models (f(t), F(t) over t), Autoencoder/Deep Learning, and many more …

  7. Not only big on data but also on interpretability. Plant Phenotyping. IS IT REALLY THAT SIMPLE?

  8. What is the biological meaning of an eigenvector?

  9. [Thurau, Kersting, Bauckhage, DAMI 2012] Simplex Volume Maximization

  10. [Thurau, Kersting, Bauckhage, DAMI 2012] Simplex Volume Maximization. Mainly pigments. [Römer et al., Functional Plant Biology 2012; Wahabzada et al., PLoS ONE 2015; Wahabzada et al., Scientific Reports (Nature) 2016; Leuker et al., Functional Plant Biology 2016]

  11. Several statistics are used to characterize graphs: degree distribution, average path length, diameter, clustering coefficients, ... IS IT REALLY THAT SIMPLE?
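The graph statistics listed on slide 11 can be computed with a few lines of standard-library Python. This is a minimal sketch rather than anything taken from the talk: the function names (`build_graph`, `bfs_distances`, `graph_statistics`) and the toy edge list are illustrative assumptions, and the graph is assumed to be undirected, unweighted and connected.

```python
from collections import Counter, deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_statistics(adj):
    """Degree distribution, average path length, diameter, average clustering.

    Assumes a connected graph with at least two nodes; unreachable pairs
    would simply be left out of the path statistics.
    """
    degree_distribution = dict(Counter(len(nbrs) for nbrs in adj.values()))

    # Collect shortest-path lengths over all ordered pairs of distinct nodes.
    # Every unordered pair appears twice, which does not change mean or max.
    lengths = []
    for s in adj:
        dist = bfs_distances(adj, s)
        lengths.extend(d for t, d in dist.items() if t != s)

    def local_clustering(u):
        # Fraction of neighbour pairs of u that are themselves connected.
        nbrs = adj[u]
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        return 2.0 * links / (k * (k - 1))

    return {
        "degree_distribution": degree_distribution,
        "average_path_length": sum(lengths) / len(lengths),
        "diameter": max(lengths),
        "average_clustering": sum(local_clustering(u) for u in adj) / len(adj),
    }


# Toy example (hypothetical data): a triangle 0-1-2 with a pendant node 3.
toy = build_graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(graph_statistics(toy))
```

The all-pairs breadth-first search keeps the sketch dependency-free; for large graphs a dedicated graph library or sampling of source nodes would be the more practical choice.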

  12. MY IT‘S A (SMALL) WORLD. The (six-)degree of separation: S. Milgram, Psychology Today 2:60-67, 1967; K. Bacon

  13. [Bauckhage, Kersting, Hadiji UAI 2015] MY IT‘S A (SMALL) WORLD. The (six-)degree of separation is the mean of a generalized Gamma distribution. [Figure: nodes labelled I and R arranged by distance d = 0, 1, 2, 3] Proof: SIR model, multinomial over the histogram of distances, use Stirling’s formula to turn it into maxent, impose constraints such that polynomial reachability + finite moments

  14. [Bauckhage, Kersting, Hadiji UAI 2015] MY IT‘S A (SMALL) WORLD. The (six-)degree of separation is the mean of a generalized Gamma distribution. [Figure: frequency vs. node degree (log-log axes) and frequency vs. path length; data with GenGamma fit]

  15. “The subject of collective attention is central to an information age where millions of people are inundated with daily messages.” - Wu and Huberman, PNAS, 104(45), 2007

  16. Kristian Kersting - Declarative Data Science Machines

  17. Kristian Kersting - Declarative Data Science Machines

  18. Bauckhage, Kersting, Hadiji ICWSM 2015. Can YouTube videos really go viral? Yes, they can! Collective attention to YouTube videos follows an epidemic model. Closed-form density of two convolved exponential distributions:

      $$f(t) = \begin{cases} \dfrac{\alpha\lambda}{\alpha-\lambda}\, e^{-\lambda t} + \dfrac{\alpha\lambda}{\lambda-\alpha}\, e^{-\alpha t} & \text{if } \lambda \neq \alpha \\ \lambda^{2}\, t\, e^{-\lambda t} & \text{if } \lambda = \alpha \end{cases}$$

      [Figure: S, I, R compartment diagram with transitions i and r; view counts (YouTube) and search counts (Google Trends)]
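The density named on slide 18 is the convolution of two exponential densities with rates λ and α: f(t) = αλ/(λ − α) · (e^(−αt) − e^(−λt)) for λ ≠ α, and f(t) = λ² t e^(−λt) for λ = α. The sketch below evaluates it in plain Python; the function name `convolved_exponential_density`, the parameter names and the example rates are illustrative assumptions, not values from the talk.

```python
import math


def convolved_exponential_density(t, lam, alpha):
    """Density of the sum of two independent exponential waiting times.

    `lam` and `alpha` are the two rates (both > 0). The names are
    hypothetical; the slide only gives the symbols lambda and alpha.
    """
    if lam <= 0 or alpha <= 0:
        raise ValueError("rates must be positive")
    if t < 0:
        return 0.0
    if math.isclose(lam, alpha):
        # Equal rates: the general formula becomes 0/0, use its limit instead.
        return lam ** 2 * t * math.exp(-lam * t)
    # Distinct rates: same expression as the two-term form on the slide,
    # written as a single difference of exponentials.
    return alpha * lam / (lam - alpha) * (math.exp(-alpha * t) - math.exp(-lam * t))


# Example with made-up rates: attention rises from zero, peaks, then decays.
for hours in (0.0, 1.0, 2.0, 5.0, 10.0):
    print(hours, convolved_exponential_density(hours, lam=1.0, alpha=0.5))
```

The curve starts at zero, rises to a single peak and then decays, which is the rise-and-fall shape the slide associates with attention to a video.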

  19. Unsettledness in Politics and Business: “Let’s better not say anything. Otherwise, we will have an online firestorm tomorrow.”

  20. reuters.com/article/2011/11/22/us-qantas-idUSTRE7AL0HB20111122 Just a day after Qantas and its unions broke off contract negotiations, and after Qantas grounded its fleet in late October, Qantas invited users to enter a "Qantas Luxury" competition, asking people to describe their "dream luxury in-flight experience". [Figure: qantasluxury, attention (tweets per 1h and per 8h) vs. hours after outbreak, with fits SIR, SIR/C, CF1-3, CF1-4] Kersting, Bauckhage, Köcher, Swazinna, Pfeffer 2016

  21. IS IT REALLY THAT SIMPLE?

  22. Actually, most data in the world are stored in relational DBs. E.g., mining Electronic Health Records is an opportunity to save lives and a lot of money. Unfortunately, the records are dirty and interconnected.
      Patient Table: PatientID | Gender | Birthdate
        P1 | M | 3/22/63
      Visit Table: PatientID | Date | Physician | Symptoms | Diagnosis
        P1 | 1/1/01 | Smith | palpitations | hypoglycemic
        P1 | 2/1/03 | Jones | fever, aches | influenza
      Lab Tests: PatientID | Date | Lab Test | Result
        P1 | 1/1/01 | blood glucose | 42
        P1 | 1/9/01 | blood glucose | ??
      SNP Table: PatientID | SNP1 | SNP2 | … | SNP500K
        P1 | AA | AB | … | BB
        P2 | AB | BB | … | AA
      Prescriptions: PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
        P1 | 5/17/98 | 5/18/98 | Jones | prilosec | 10mg | 3 months

  23. [Kersting, Driessens ICML´08; Karwath, Kersting, Landwehr ICDM´08; Natarajan, Joshi, Tadepalli, Kersting, Shavlik IJCAI´11; Khot, Natarajan, Kersting, Shavlik ICDM´13, MLJ´12, Springer Brief´15, MLJ´15] Relational Mining of EHRs. Atherosclerosis is the cause of the majority of Acute Myocardial Infarctions (heart attacks) [Circulation; 92(8), 2157-62, 1995; JACC; 43, 842-7, 2004]. [Image: plaque in the left coronary artery] [Figure: probabilistic rule annotated with logical variables (placeholders) and probabilities]
      Algo | Likelihood | AUC-ROC | AUC-PR | Time
      Boosting | 0.810 | 0.961 | 0.930 | 9s
      MLN | 0.730 | 0.535 | 0.621 | 93 hrs

  24. [Lu, Krishna, Bernstein, Fei-Fei „Visual Relationship Detection“ CVPR 2016]

  25. [Getoor, Taskar MIT Press ’07; De Raedt, Frasconi, Kersting, Muggleton, LNCS’08; Domingos, Lowd Morgan Claypool ’09; Natarajan, Kersting, Khot, Shavlik Springer Brief’15; Russell CACM 58(7): 88-97 ’15] De Raedt, Kersting, Natarajan, Poole, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Morgan and Claypool Publishers, 2016. Statistical Relational Learning/AI: … the study and design of intelligent agents that act in noisy worlds composed of objects and relations among the objects. [Diagram labels: Uncertainty, Logic, Learning, Scaling, Optimization, Search, Mining, Graphs and Trees, Probabilistic DBs, IR, KR, CV, CogSci, DM/ML] And this had major impact on CogSci: Lake, Salakhutdinov, Tenenbaum, Science 350 (6266), 1332-1338, 2015; Tenenbaum, Kemp, Griffiths, Goodman, Science 331 (6022), 1279-1285, 2011

  26. [Ré, Sadeghian, Shan, Shin, Wang, Wu, Zhang IEEE Data Eng. Bull.’14; Natarajan, Picado, Khot, Kersting, Ré, Shavlik ILP’14; Natarajan, Soni, Wazalwar, Viswanathan, Kersting Solving Large Scale Learning Tasks’16; Mladenov, Heinrich, Kleinhans, Gonsior, Kersting DeLBP’16, …] Declarative Data Science Machines. The next breakthrough in data analytics may not be a new data analysis algorithm… …but may be in the ability to rapidly combine, deploy, and maintain existing algorithms. [Diagram labels: (Un-)Structured Data Sources; External Databases; Feature Extraction; Features and Rules; Weighted Relational/Graph Database; Domain Knowledge; Rules; Model; Representation; Declarative Mathematical Program; Symbolic-Numerical Inference; Probabilistic Inference; Inference Results; Feedback; Learning; DM and ML Algorithms: Graph Kernels, Diffusion Processes, Random Walks, Decision Trees, Frequent Itemsets, SVMs, Graphical Models, Topic Models, Gaussian Processes, Autoencoder, Matrix and Tensor Factorization, Reinforcement Learning, …]

  27. BUT WHAT DOES “RAPIDLY COMBINE AND DEPLOY” MEAN?

  28. Guy van den Broeck UCLA

  29. Guy van den Broeck UCLA. [Grid of variables: card(1,d2), card(1,d3), …, card(1,pAce); …; card(52,d2), card(52,d3), …, card(52,pAce)]

  30. Guy van den Broeck UCLA. [Grid of variables: card(1,d2), card(1,d3), …, card(1,pAce); …; card(52,d2), card(52,d3), …, card(52,pAce)]

  31. Guy van den Broeck UCLA. [Grid of variables: card(1,d2), card(1,d3), …, card(1,pAce); …; card(52,d2), card(52,d3), …, card(52,pAce)] No independencies. Fully connected. 2^2704 states.
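The state count on slide 31 can be checked with a short calculation. The sketch assumes, as the grid on the slide suggests, one Boolean variable card(p, c) for each of 52 positions and 52 card identities; the function name `propositional_state_count` is illustrative only.

```python
def propositional_state_count(num_positions, num_cards):
    """Count Boolean variables card(p, c) and their joint truth assignments.

    Assumption: one binary variable per (position, card) pair, so a model
    with no independencies has to distinguish 2 ** (positions * cards) states.
    """
    num_variables = num_positions * num_cards
    return num_variables, 2 ** num_variables


num_variables, num_states = propositional_state_count(52, 52)
print(num_variables)         # number of Boolean variables
print(len(str(num_states)))  # number of decimal digits in the state count
```

Python integers have arbitrary precision, so the count is exact. Its size is why enumerating states is hopeless and why the following slides turn to symmetries.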

  32. Guy van den Broeck UCLA. [Grid of variables: card(1,d2), card(1,d3), …, card(1,pAce); …; card(52,d2), card(52,d3), …, card(52,pAce)] A machine will not solve the problem.

  33. Faster modelling. Faster inference and learning.

  34. WHAT ARE SYMMETRIES IN APPROXIMATE PROBABILISTIC INFERENCE?
