Random Walk Inference and Learning in A Large Scale Knowledge Base
Ni Lao, Tom Mitchell, William W. Cohen
Carnegie Mellon University
EMNLP 2011, Edinburgh, Scotland, UK, 7/28/2011
Outline
• Motivation
  – Inference in Knowledge-Bases
  – The NELL project
  – Random Walk Inference
• Approach
  – Path Ranking Algorithm (Recap)
  – Data-Driven Path Finding
  – Efficient Random Walk (Recap)
  – Low-Variance Sampling
• Results
  – Cross Validation
  – Mechanical Turk Evaluation
Large Scale Knowledge-Bases
• Human knowledge is being transformed into structured data at a fast pace, e.g.
  – KnowItAll (Univ. Washington): 0.5B facts extracted from 0.1B web pages
  – DBpedia (Univ. Leipzig): 3.5M entities, 0.7B facts extracted from Wikipedia
  – YAGO (Max-Planck Institute): 2M entities, 20M facts extracted from Wikipedia and WordNet
  – FreeBase: 20M entities, 0.3B links, integrated from different data sources and human judgments
  – NELL (Carnegie Mellon Univ.): 0.85M facts extracted from 0.5B webpages
The Need for Robust and Efficient Inference
• Knowledge is potentially useful in many tasks
  – Support information retrieval/recommendation
  – Bootstrap information extraction/integration
• Challenges
  – Robustness: extracted knowledge is incomplete and noisy
  – Scalability: the size of the knowledge base can be very large
[Figure: example knowledge graph fragment — HinesWard connects to Steelers via AthletePlaysForTeam and to NFL via TeamPlaysInLeague; the query AthletePlaysInLeague(HinesWard, ?) can be answered by combining edges such as IsA and isa⁻¹]
The NELL Case Study
• Never-Ending Language Learning:
  – "a never-ending learning system that operates 24 hours per day, for years, to continuously improve its ability to read (extract structured facts from) the web" (Carlson et al., 2010)
  – Closed domain, semi-supervised extraction
  – Combines multiple strategies: morphological patterns, textual context, HTML patterns, logical inference
  – Example beliefs [figure: table of sample extracted beliefs]
A Link Prediction Task
• We consider 48 relations for which the NELL database has more than 100 instances
• We create two link prediction tasks for each relation
  – AthletePlaysInLeague(HinesWard, ?)
  – AthletePlaysInLeague(?, NFL)
• The actual nodes y known to satisfy R(x, ?) are treated as labeled positive examples, and all other nodes are treated as negative examples
First Order Inductive Learner
• FOIL (Quinlan and Cameron-Jones, 1993) is a learning algorithm similar to decision trees, but in relational domains
• NELL implements two assumptions for efficient learning (N-FOIL)
  – The predicates are functional, e.g. an athlete plays in at most one league
  – Only find clauses that correspond to bounded-length paths of binary relations: relational pathfinding (Richards & Mooney, 1992)
First Order Inductive Learner
• Efficiency
  – Horn clauses can be very costly to evaluate
  – E.g. it takes days to train N-FOIL on the NELL data
• Robustness
  – FOIL can only combine rules with disjunctions, and therefore cannot leverage low-accuracy rules
  – E.g. rules for teamPlaysSports
Random Walk Inference
• Consider a low precision/high recall Horn clause:
  – isa(x, c) ∧ isa(x′, c) ∧ AthletePlaysInLeague(x′, y) ⇒ AthletePlaysInLeague(x, y)
• A Path Constrained Random Walk following the above edge type sequence (isa, isa⁻¹, AthletePlaysInLeague) generates a distribution over all leagues
[Figure: walk from HinesWard to (concept) via isa, to all athletes via isa⁻¹, then to all leagues via AthletePlaysInLeague]
• Prob(HinesWard → y) can be treated as a relational feature for predicting AthletePlaysInLeague(HinesWard, y)
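To make the mechanics concrete, here is a minimal Python sketch of a path-constrained random walk over an edge-typed graph. The nested-dict graph encoding, the isa_inv name for the inverse edge, and the toy TomBrady fact are illustrative assumptions, not data from the talk:

```python
from collections import defaultdict

def path_constrained_walk(graph, source, path):
    """Distribution over end nodes reached from `source` by following the
    edge-type sequence `path`, i.e. f_P(s, t) = Prob(s -> t; P).
    `graph[node][relation]` is assumed to be a list of neighbor nodes."""
    dist = {source: 1.0}                      # all probability mass on the source
    for relation in path:                     # one step per edge type in the path
        next_dist = defaultdict(float)
        for node, prob in dist.items():
            neighbors = graph.get(node, {}).get(relation, [])
            if not neighbors:
                continue                      # the walk dies at dead ends
            share = prob / len(neighbors)     # uniform choice among out-edges
            for nbr in neighbors:
                next_dist[nbr] += share
        dist = dict(next_dist)
    return dist

# Toy graph for the clause isa ∧ isa⁻¹ ∧ AthletePlaysInLeague
graph = {
    "HinesWard": {"isa": ["athlete"]},
    "athlete":   {"isa_inv": ["HinesWard", "TomBrady"]},
    "TomBrady":  {"AthletePlaysInLeague": ["NFL"]},
}
print(path_constrained_walk(graph, "HinesWard",
                            ["isa", "isa_inv", "AthletePlaysInLeague"]))
# {'NFL': 0.5} -- a fractional feature for AthletePlaysInLeague(HinesWard, NFL)
```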
Comparison
• Inductive logic programming (e.g. FOIL)
  – Brittle facing uncertainty
• Statistical relational learning (e.g. Markov Logic Networks, Relational Bayesian Networks)
  – Inference is costly when the domain contains many nodes
  – Inference is needed at each iteration of optimization
• Random walk inference
  – Decouples feature generation and learning (propositionalization)
    • No inference needed during optimization
  – Sampling schemes for efficient random walks
    • Trains in minutes, as opposed to days for N-FOIL
  – Low precision/high recall rules as features with fractional values
    • Doubles precision at rank 100 compared with N-FOIL
Outline
• Motivation
  – Inference in Knowledge-Bases
  – The NELL project
  – Random Walk Inference
• Approach
  – Path Ranking Algorithm (Recap)
  – Data-Driven Path Finding
  – Efficient Random Walk (Recap)
  – Low-Variance Sampling
• Results
  – Cross Validation
  – Mechanical Turk Evaluation
Path Ranking Algorithm (PRA) (Lao & Cohen, ECML 2010)
• A relation path P = (R₁, …, Rₙ) is a sequence of relations
• A PRA model scores a source-target node pair by a linear function of their path features:
  score(s, t) = Σ_{P ∈ 𝒫} θ_P f_P(s, t)
  – 𝒫 is the set of all relation paths with length ≤ L
  – f_P(s, t) = Prob(s → t; P)
• Training
  – For a relation R and a set of node pairs {(s_i, t_i)},
  – we construct a training dataset D = {(x_i, y_i)}, where
  – x_i is a vector of all the path features for (s_i, t_i), and
  – y_i indicates whether R(s_i, t_i) is true or not
  – θ is estimated using L1,L2-regularized logistic regression
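A hedged sketch of PRA training, reusing the path_constrained_walk helper from the previous block. The paper's L1,L2-regularized objective is approximated here with scikit-learn's elastic-net logistic regression; all names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pra_features(graph, pairs, paths):
    """Feature matrix X with X[i, j] = f_Pj(s_i, t_i) = Prob(s_i -> t_i; P_j)."""
    X = np.zeros((len(pairs), len(paths)))
    for i, (s, t) in enumerate(pairs):
        for j, path in enumerate(paths):
            X[i, j] = path_constrained_walk(graph, s, path).get(t, 0.0)
    return X

def train_pra(graph, pairs, labels, paths):
    """Fit theta; labels[i] = 1 if R(s_i, t_i) is true, else 0."""
    X = pra_features(graph, pairs, paths)
    # elastic net mixes L1 and L2 penalties, loosely mirroring the slide's
    # "L1,L2-regularized logistic regression"
    model = LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, max_iter=1000)
    model.fit(X, labels)
    return model        # model.coef_[0] holds one weight theta_P per path
```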
Data-Driven Path Finding
• Impractical to enumerate all possible paths even for small length L
  – Require any path to be instantiated in at least an α portion of the training queries, i.e. f_P(s, t) ≠ 0 for some t
  – Require any path to reach at least one target node in the training set
• Discover paths by a depth-first search (see the sketch below)
  – Start from a set of training queries; expand a node only if the instantiation constraint is satisfied
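A simplified sketch of the data-driven depth-first search, again assuming the path_constrained_walk helper from earlier. Only the α-instantiation constraint is enforced here; the second constraint (reaching at least one known target) would be an extra filter on the result:

```python
def find_paths(graph, relations, train_sources, max_len, alpha):
    """Depth-first search over relation sequences, pruning any prefix that is
    instantiated (reaches some node) for fewer than an alpha fraction of the
    training query sources. Pruning is sound because extending a path can
    only shrink the set of sources that instantiate it."""
    found = []

    def instantiated_fraction(path):
        hits = sum(1 for s in train_sources
                   if path_constrained_walk(graph, s, path))
        return hits / len(train_sources)

    def dfs(path):
        if path:
            if instantiated_fraction(path) < alpha:
                return                # prune this prefix and all extensions
            found.append(list(path))
        if len(path) == max_len:
            return
        for r in relations:           # try extending by every edge type
            path.append(r)
            dfs(path)
            path.pop()

    dfs([])
    return found
```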
Data-Driven Path Finding
• Dramatically reduces the number of paths
[Figure: number of discovered paths as a function of maximum path length ℓ]
Efficient Inference (Lao & Cohen, KDD 2010)
• Exact calculation of random walk distributions results in non-zero probabilities for many internal nodes in the graph,
• but computation should be focused on the few target nodes which we care about
Efficient Inference (Lao & Cohen, KDD 2010)
• Sampling approach
  – A few random walkers (or particles) are enough to distinguish good target nodes from bad ones
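A minimal sketch of the sampling idea: replace exact distribution propagation with a few independent random walkers and count where they land. The num_walkers default and uniform step are illustrative choices:

```python
import random
from collections import defaultdict

def sampled_walk(graph, source, path, num_walkers=100):
    """Monte Carlo estimate of Prob(source -> t; path): each walker follows the
    edge-type sequence with uniform random steps; dead ends kill the walker."""
    counts = defaultdict(int)
    for _ in range(num_walkers):
        node, alive = source, True
        for relation in path:
            neighbors = graph.get(node, {}).get(relation, [])
            if not neighbors:
                alive = False
                break
            node = random.choice(neighbors)   # one uniform step per edge type
        if alive:
            counts[node] += 1
    return {t: c / num_walkers for t, c in counts.items()}
```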
Low-Variance Sampling
• Sampling walkers/particles independently introduces variance into the result distributions
• Low-Variance Sampling (LVS) (Thrun et al., 2005) generates M correlated samples by drawing a single number r from [0, M⁻¹); the samples then fall at positions r + kM⁻¹, k = 0, …, M−1
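A sketch of systematic (low-variance) sampling from a discrete distribution, following the construction attributed to Thrun et al. (2005); it assumes `dist` is a non-empty, normalized dict of outcome probabilities. In the talk's setting this replaces the independent per-step choices made by the walkers above:

```python
import random

def low_variance_sample(dist, M):
    """Draw one r ~ U[0, 1/M), then take the M evenly spaced points r + k/M
    through the cumulative distribution -- M correlated samples instead of
    M independent ones, which reduces sampling variance."""
    outcomes = list(dist.keys())
    r = random.uniform(0.0, 1.0 / M)
    samples, i, cum = [], 0, dist[outcomes[0]]
    for k in range(M):
        u = r + k / M                 # the k-th correlated sample position
        while u > cum:                # advance along the CDF to cover u
            i += 1
            cum += dist[outcomes[i]]
        samples.append(outcomes[i])
    return samples
```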
Low Variance Sampling Low Variance Sampling Averaged over 96 tasks 100k • In our evaluation 10k 0.5 10k – LVS can slightly 1k improve prediction for 1k both finger printing MRR M and particle filtering and particle filtering Exact Independent Fingerprinting Low Variance Fingerprinting Low Variance Fingerprinting 100 Independent Filtering Low Variance Filtering 0.4 0 0 1 1 2 2 3 3 4 4 5 5 Random Walk Speedup EMNLP 2011, Edinburgh, Scotland, UK 7/28/2011 18
Outline
• Motivation
  – Inference in Knowledge-Bases
  – The NELL project
  – Random Walk Inference
• Approach
  – Path Ranking Algorithm (Recap)
  – Data-Driven Path Finding
  – Efficient Random Walk (Recap)
  – Low-Variance Sampling
• Results
  – Cross Validation
  – Mechanical Turk Evaluation