Thinking Machine Learning. Kristian Kersting, with Sriraam Natarajan (U. Indiana), Amir Globerson (HUJI), Martin Mladenov (TUD, Google), Babak Ahmadi (PicoEgo), Martin Grohe (RWTH Aachen), Christopher Ré (Stanford), Pavel Tokmakov (INRIA Grenoble), and many more …
Take-away message: Statistical Machine Learning (ML) needs a crossover with data and programming abstractions. (Diagram: next-generation Data Science and next-generation Machine Learning, connected by high-level languages and automated reduction of computational costs.) • ML high-level languages increase the number of people who can successfully build ML applications and make experts more effective • To deal with the computational complexity, we need ways to automatically reduce the solver costs
Arms race to deeply understand data
Bottom line: Take your data spreadsheet of objects and their features …
… and apply machine learning: graphical models, Gaussian processes, graph mining, distillation/LUPI (a big model teaches a small model), boosting, diffusion models, matrix factorization, autoencoders, deep learning, and many more … Is it really that simple?
Complex data networks abound [Lu, Krishna, Bernstein, Fei-Fei „Visual Relationship Detection“ CVPR 2016]
Complex data networks abound [Bratzadeh 2016; Bratzadeh, Molina, Kersting „The Machine Learning Genome“ 2017]. The ML Genome is a dataset, a knowledge base, and an ongoing effort to learn and reason about ML concepts. Actually, most data in the world is stored in relational databases.
De Raedt, Kersting, Natarajan, Poole, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Morgan and Claypool Publishers, ISBN: 9781627058414, 2016. Punchline: two trends drive ML: 1. the arms race to deeply understand data, and 2. data networks in a large number of formats. It costs considerable human effort to develop, for a given dataset and task, a good ML algorithm. A crossover of ML with data and programming abstractions (databases/logic, data mining, uncertainty, scaling) makes the ML expert more effective and increases the number of people who can successfully build ML applications. And this has had major impact on CogSci: Lake et al., Science 350 (6266), 1332-1338, 2015; Tenenbaum et al., Science 331 (6022), 1279-1285, 2011.
[Ré, Sadeghian, Shan, Shin, Wang, Wu, Zhang IEEE Data Eng. Bull.’14; Natarajan, Picado, Khot, Kersting, Ré, Shavlik ILP’14; Natarajan, Soni, Wazalwar, Viswanathan, Kersting Solving Large Scale Learning Tasks’16; Mladenov, Heinrich, Kleinhans, Gonsior, Kersting DeLBP’16, …] Thinking Machine Learning. (Pipeline: (un-)structured data sources and external databases → feature extraction via rules and graph kernels, diffusion processes, random walks → declarative learning / symbolic-numerical programming over a machine learning database of data, weighted rules, loops and data structures, plus domain knowledge → solver → inference results, with feedback into automatic DM and rule learning. DM and ML algorithms: decision trees, frequent itemsets, SVMs, graphical models, topic models, Gaussian processes, autoencoders, matrix and tensor factorization, reinforcement learning, …)
This connects the CS communities. Jim Gray (Turing Award 1998): “Automated Programming”. Mike Stonebraker (Turing Award 2014): “One size does not fit all”. Data mining/machine learning, databases, AI, model checking, software engineering, optimization, knowledge representation, constraint programming, … !
However, machines that think and learn also complicate/enlarge the underlying computational models, making them potentially very slow. CAN THE MACHINE HELP TO REDUCE THE COSTS?
Guy van den Broeck, UCLA. (Card images: LIKO81, CCA 3.0)
Random variables card(1,d2), card(1,d3), …, card(1,pAce), …, card(52,d2), card(52,d3), …, card(52,pAce).
No independencies. Fully connected. 2^2704 states.
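Where 2^2704 comes from: a minimal back-of-the-envelope sketch in Python, assuming one binary indicator per (position, card) pair (an encoding choice not spelled out on the slide).

    # Back-of-the-envelope check of the state-space size for the card example.
    # Assumption: one binary random variable card(p, c) per (position, card) pair.
    positions = 52
    cards = 52
    num_variables = positions * cards      # 52 * 52 = 2704 binary indicators
    num_joint_states = 2 ** num_variables  # the 2^2704 joint assignments from the slide
    print(num_variables)                   # 2704
    print(len(str(num_joint_states)))      # 814 -- roughly an 814-digit number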
A machine will not solve the problem.
Faster modelling Faster ML
[Singla, Domingos AAAI’08; Kersting, Ahmadi, Natarajan UAI’09; Ahmadi, Kersting, Mladenov, Natarajan MLJ’13] What are symmetries in approximate probabilistic inference, one of the workhorses of ML? Lifted Loopy Belief Propagation: exploiting computational symmetries automatically. The big model is compressed into a small model; instead of running loopy belief propagation on the big model, run a modified loopy belief propagation on the compressed one.
Compression: Coloring the graph. § Color nodes according to the evidence you have: no evidence, say red; state “one”, say brown; state “two”, say orange; … § Color factors distinctly according to their equivalences: for instance, assuming f1 and f2 are identical and B appears at the second argument position in both, color both blue.
Compression: Pass the colors around.
1. Each factor collects the colors of its neighboring nodes.
2. Each factor “signs” its color signature with its own color.
3. Each node collects the signatures of its neighboring factors.
4. Nodes are recolored according to the collected signatures.
5. If no new color is created, stop; otherwise go back to 1.
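A minimal Python sketch of this color-passing loop (illustrative only; the data structures and variable names are ours, not from the papers; factors are given as ordered argument tuples so that argument positions matter):

    from collections import defaultdict

    def color_passing(factors, var_colors, factor_colors, max_iters=100):
        """Compress a factor graph by color passing.

        factors       : dict mapping factor name -> ordered tuple of its variables
        var_colors    : dict mapping variable -> initial color (from the evidence)
        factor_colors : dict mapping factor name -> initial color (identical
                        potentials get identical colors)
        Returns final variable colors; variables sharing a color form one supernode.
        """
        for _ in range(max_iters):
            # 1. + 2. each factor collects its neighbors' colors and "signs" the
            #         signature with its own color
            factor_sigs = {
                f: (factor_colors[f],) + tuple(var_colors[v] for v in args)
                for f, args in factors.items()
            }
            # 3. each node collects the signatures of its neighboring factors,
            #    remembering at which argument position it appears
            node_sigs = defaultdict(list)
            for f, args in factors.items():
                for pos, v in enumerate(args):
                    node_sigs[v].append((factor_sigs[f], pos))
            # 4. nodes are recolored according to the collected signatures
            new_colors, sig_to_color = {}, {}
            for v in var_colors:
                sig = (var_colors[v], tuple(sorted(node_sigs[v])))
                new_colors[v] = sig_to_color.setdefault(sig, len(sig_to_color))
            # 5. if no new color class was created, stop; otherwise repeat
            if len(set(new_colors.values())) == len(set(var_colors.values())):
                return new_colors
            var_colors = new_colors
        return var_colors

    # Toy model from the slides: f1(A, B) and f2(C, B) have identical potentials and
    # B appears at the second position in both; with no evidence, A and C get merged.
    factors = {"f1": ("A", "B"), "f2": ("C", "B")}
    var_colors = {"A": "red", "B": "red", "C": "red"}   # no evidence: everything red
    factor_colors = {"f1": "blue", "f2": "blue"}        # identical factors: same color
    print(color_passing(factors, var_colors, factor_colors))
    # {'A': 0, 'B': 1, 'C': 0}  ->  supernodes {A, C} and {B}, superfactor {f1, f2}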
Lifted Loopy Belief Propagation: exploiting computational symmetries automatically, in quasi-linear time. The big model is compressed into a small model (supernodes {A, C} and {B}, superfactor {f1, f2}); run a modified loopy belief propagation on the compressed model instead of standard loopy belief propagation on the big one.
[Singla, Domingos AAAI’08; Kersting, Ahmadi, Natarajan UAI’09; Ahmadi, Kersting, Mladenov, Natarajan MLJ’13] Compression can considerably speed up inference and training. Probabilistic inference using lifted (loopy) belief propagation: 114x faster (the lower the runtime, the better). Parameter training using a lifted stochastic gradient: 100x faster (the higher, the better; compared against a state-of-the-art baseline); on CORA entity resolution it converges before the data has been seen once. What is going on algebraically? Can we generalize this to other ML approaches?
Instead of looking at ML through the glasses of probabilities, let’s approach it using optimization. It turns out that color passing is well known in graph theory: it is the Weisfeiler-Lehman algorithm. WL computes a fractional automorphism of some matrix A, i.e., doubly stochastic matrices X_Q and X_P with X_Q A = A X_P (a relaxed form of automorphism).
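As a small illustration (our own toy example, not from the slides): for a symmetric matrix A and a single color partition, take X_Q = X_P = X, the doubly stochastic matrix that averages within each color class; the fractional-automorphism condition then becomes X A = A X.

    import numpy as np

    def partition_averaging_matrix(partition, n):
        """Doubly stochastic matrix averaging within each color class.

        partition : list of color classes (lists of indices), e.g. from color passing / WL.
        """
        X = np.zeros((n, n))
        for cls in partition:
            for i in cls:
                for j in cls:
                    X[i, j] = 1.0 / len(cls)
        return X

    # Toy example (ours): adjacency matrix of a star graph, center 0 and leaves 1-3.
    A = np.array([[0, 1, 1, 1],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [1, 0, 0, 0]], dtype=float)

    # Color passing / WL separates the center from the leaves.
    X = partition_averaging_matrix([[0], [1, 2, 3]], n=4)

    print(np.allclose(X @ A, A @ X))                                        # True: X A = A X
    print(np.allclose(X.sum(axis=0), 1.0), np.allclose(X.sum(axis=1), 1.0)) # doubly stochastic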