Slider: an Efficient Incremental Reasoner
Jules Chevalier
jules.chevalier@univ-st-etienne.fr
Laboratoire Hubert Curien, Télécom Saint Etienne, Université Jean Monnet
March 2015
Supervisors: Frédérique Laforest, Christophe Gravier, Julien Subercaze
Summary
Introduction
State of the art
Contribution
Experimental results
Conclusion
Semantic Web
◮ Formalises concepts to represent them
◮ Standardises this representation
◮ Makes it readable for both humans and computers
◮ Links these data together
◮ Allows automatic operations on these data
  ◮ Integrity constraint validation
  ◮ Querying the knowledge base
  ◮ Extraction of implicit data = Reasoning
Reasoning: Forward Chaining vs Backward Chaining
[Figure: family tree of Abraham, Homer, Marge, Bart and Liza]
◮ What we know:
  ◮ Abraham father Homer
  ◮ Homer father Liza
  ◮ Homer father Bart
  ◮ Marge mother Liza
  ◮ Marge mother Bart
◮ What Forward Chaining does:
  ◮ Abraham grandfather Liza
  ◮ Abraham grandfather Bart
  ◮ ...
  ◮ Abraham grandfather Liza? → yes
◮ What Backward Chaining does:
  ◮ Abraham grandfather Liza?
  ◮ Abraham father X & X father Liza?
  ◮ Abraham father Homer & Homer father Liza → yes
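The two strategies can be contrasted on the family facts above. This is a minimal sketch, not Slider's code: the function names and the single "father of father" grandfather rule are assumptions made for illustration.

```python
# Facts from the slide, as (subject, predicate, object) triples.
FACTS = {
    ("Abraham", "father", "Homer"),
    ("Homer", "father", "Liza"),
    ("Homer", "father", "Bart"),
    ("Marge", "mother", "Liza"),
    ("Marge", "mother", "Bart"),
}


def forward_chain(facts):
    """Materialise every grandfather triple before any query is asked.

    One pass is enough here because the assumed rule never consumes
    its own output.
    """
    inferred = set(facts)
    for (a, p1, x) in facts:
        for (y, p2, c) in facts:
            if p1 == "father" and p2 == "father" and x == y:
                inferred.add((a, "grandfather", c))
    return inferred


def backward_chain(facts, a, c):
    """Answer 'a grandfather c?' on demand by searching for a suitable X."""
    return any(
        (x, "father", c) in facts
        for (s, p, x) in facts
        if s == a and p == "father"
    )


# Forward chaining: the answer is a simple lookup in the materialised set.
print(("Abraham", "grandfather", "Liza") in forward_chain(FACTS))
# Backward chaining: the answer is derived at query time.
print(backward_chain(FACTS, "Abraham", "Liza"))
```

Forward chaining pays the inference cost once, at load time; backward chaining pays it on every query.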
Rule-based Reasoning
Rules
◮ An antecedent: allows the rule to be executed
◮ A consequent: the statement inferred
◮ Example (cax-sco): $\dfrac{c_1\ \text{subClassOf}\ c_2,\quad x\ \text{type}\ c_1}{x\ \text{type}\ c_2}$
Fragments
◮ A fragment is a set of inference rules
◮ Semantic Web standards suggest different predefined fragments (RDFS, OWL Lite, OWL DL, OWL Full, ...)
◮ The more expressive a fragment, the more complex its operations (from P to NEXPTIME)
◮ Choosing a fragment is a trade-off between expressivity and computational complexity
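As an illustration of how a single rule is applied, the sketch below runs cax-sco in a naive fixpoint loop. The function name and the sample class hierarchy are assumptions for illustration; a real reasoner would use indexes rather than nested scans.

```python
def cax_sco(triples):
    """Apply cax-sco repeatedly until no new triple appears.

    Antecedent: (c1 subClassOf c2) and (x type c1)
    Consequent: (x type c2)
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshots of the two antecedent patterns for this pass.
        sub_class = [(s, o) for (s, p, o) in closure if p == "subClassOf"]
        typed = [(s, o) for (s, p, o) in closure if p == "type"]
        for (c1, c2) in sub_class:
            for (x, c) in typed:
                if c == c1 and (x, "type", c2) not in closure:
                    closure.add((x, "type", c2))
                    changed = True
    return closure


# Hypothetical sample data.
data = {
    ("Cat", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Tom", "type", "Cat"),
}
result = cax_sco(data)
print(sorted(result - data))
```

The second pass is needed because the triple inferred in the first pass (Tom type Mammal) matches the antecedent again.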
Reasoning kinds
◮ Classical Reasoning
◮ Streaming Reasoning
◮ Incremental Reasoning
Problem Statement
What we want to do
◮ Efficient and scalable incremental forward-chaining reasoning
What the problems are
◮ Rules form a cyclic graph
  ◮ Complexity depends on the fragment!
◮ The number of triples generated is hard to predict
  ◮ Complexity also depends on the data!
◮ Big Data is not static
  ◮ We need to handle data streams!
Batch reasoning approaches
WebPie: a Web-scale Parallel Inference Engine
◮ 2009 - Jacopo Urbani's thesis [7]
  ◮ Uses MapReduce for OWL Horst and RDFS reasoning
◮ 2011 - Fixes some issues to improve OWL Horst reasoning [8]
  ◮ Duplicates limitation
  ◮ Indexation for sameAs
  ◮ Greedy scheduling
  ◮ Cleaner job after some rules, or at the end
MapResolve [6]
◮ Based on WebPie to provide EL+ classification
◮ Uses 3 sets of triples: usable, used, inferred
◮ Limits overheads, optimises
◮ Points out MapReduce limitations
Analysis: MapReduce approaches
MapReduce Framework
◮ Allows distributed tasks to be implemented
◮ The Hadoop framework
◮ Best suited to batch processing of huge amounts of data
◮ MapReduce requires an acyclic dataflow
◮ Jobs run in isolation
◮ Unsuitable network shuffling
◮ Hadoop distributed file system
WebPie and MapResolve Contributions
◮ Only provide batch reasoning
◮ Nodes must wait for each other
◮ Generate a lot of duplicates
◮ Fragment dependent
◮ Naive partitioning
◮ Critical letter for WebPie [5]
Incremental solutions
History Matters: Incremental Ontology Reasoning Using Modules [3]
◮ Maintains the classification of ontologies as they evolve
◮ Provides encouraging results
◮ Not viable for a static hierarchy of ontologies
◮ Not adapted to a high number of nominals
Incremental Reasoning in OWL EL without Bookkeeping [4]
◮ Handles both addition and deletion of knowledge
◮ Incremental classification of the TBox
◮ Limited to classification of the TBox
◮ Dedicated to the EL+ fragment
Proposed solution
Slider
◮ Parallel and Scalable Execution
  ◮ Rules mapped to independent modules
  ◮ Multiple rule instances allowed to run in parallel
◮ Duplicates Limitation
  ◮ Shared triple store
  ◮ Vertical partitioning [1] and multiple indexing
◮ Data Stream Support
  ◮ Streamed architecture
  ◮ Parallel parsing/reasoning
◮ Fragment Customization
  ◮ Dynamic support of rulesets
  ◮ ρdf and RDFS natively supported
  ◮ Extensible to any other fragment
Architecture
[Figure: Slider architecture, showing the Input Manager, the Rules Buffers (one per rule R1, R2, R3), the Thread Pool running the Rule Modules, the Distributors (one per rule), and the Triple Store (explicit, implicit and streamed triples; concurrent access). Labelled flows: incoming triples, evolving data, new triples.]
Architecture
Input Manager
◮ Receives incoming triples
◮ Sends them to
  ◮ The triple store
  ◮ The rules buffers
Rules Buffers
◮ A buffer for each rule
◮ Runs the rule when full
◮ Runs the rule when timed out
  ◮ Ensures completeness
Thread Pool
◮ Manages a pool of instances
◮ Ensures scalability
Rule instance
◮ Executes the inference
◮ Accesses the triple store concurrently
Distributor
◮ Stores inferred triples
◮ Dispatches them to the buffers
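The "run when full, run when timed out" behaviour of a rule buffer can be sketched as follows. This is a single-threaded illustration under stated assumptions: the class name, the `capacity` and `timeout_s` parameters, the injectable `clock`, and measuring the timeout from the last flush are all hypothetical choices, not Slider's actual implementation.

```python
import time


class RuleBuffer:
    """Collects triples for one rule and releases them in batches."""

    def __init__(self, capacity, timeout_s, clock=time.monotonic):
        self.capacity = capacity
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the timeout can be tested
        self.pending = []
        self.last_flush = clock()

    def add(self, triple):
        """Buffer a triple; return a batch for the rule if the buffer is full."""
        self.pending.append(triple)
        if len(self.pending) >= self.capacity:
            return self._flush()
        return None

    def poll(self):
        """Return the pending batch once the timeout has elapsed.

        Without this, a partially filled buffer would never fire and
        some inferences would be missed.
        """
        if self.pending and self.clock() - self.last_flush >= self.timeout_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self.pending = self.pending, []
        self.last_flush = self.clock()
        return batch
```

In a threaded setting, each returned batch would be handed to a rule instance from the thread pool.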
Inference: cax-sco
[Figure: cax-sco inference example]
Triple Store
Concurrent Access
◮ ReentrantReadWriteLocks ensure concurrency
◮ Write lock to add triples
◮ Read lock for other methods
Duplicates Elimination
◮ HashMap of MultiMaps*
◮ Bans duplicates
◮ Ensures uniqueness of triples
Vertical Partitioning
◮ Triples encoding: (1,2,3), (4,2,5), (6,7,8), (6,7,9)
[Figure: the encoded triples grouped by predicate: 2 → {1 → 3, 4 → 5}; 7 → {6 → 8, 9}]
Near-optimal indexing
◮ Indexing by predicates, subjects and objects
◮ Best trade-off for nearly all rules from the OWL fragments
* Google's Guava libraries
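A vertically partitioned store with duplicate elimination can be sketched in a few lines. This is an assumption-laden illustration: the class and method names are hypothetical, nested dictionaries of sets stand in for the HashMap of MultiMaps, and a plain `threading.Lock` stands in for the read/write lock because Python's standard library has no direct equivalent.

```python
import threading
from collections import defaultdict


class TripleStore:
    """Triples indexed as predicate -> subject -> set of objects."""

    def __init__(self):
        self._by_predicate = defaultdict(lambda: defaultdict(set))
        # Single mutex: a simplification of separate read and write locks.
        self._lock = threading.Lock()

    def add(self, s, p, o):
        """Insert a triple; return True only if it was not already stored."""
        with self._lock:
            objects = self._by_predicate[p][s]
            if o in objects:
                return False  # duplicate: nothing to propagate further
            objects.add(o)
            return True

    def objects(self, s, p):
        """Return a copy of the objects stored for (s, p)."""
        with self._lock:
            return set(self._by_predicate.get(p, {}).get(s, ()))

    def __len__(self):
        with self._lock:
            return sum(
                len(objs)
                for subjects in self._by_predicate.values()
                for objs in subjects.values()
            )


store = TripleStore()
# Encoded triples from the slide.
for triple in [(1, 2, 3), (4, 2, 5), (6, 7, 8), (6, 7, 9)]:
    store.add(*triple)
```

Returning a boolean from `add` lets a caller forward only genuinely new triples, which is one way to limit duplicates.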
Rules Dependency Graph
◮ Directed graph
◮ Nodes represent rules
◮ A → B: B can use the output of A
◮ Created at initialisation time
◮ Used to route new triples by
  ◮ The input manager
  ◮ The distributors
[Figure: Rules Dependency Graph for ρdf, with nodes Universal Input, PRP-SPO1, PRP-DOM, PRP-RNG, SCM-SCO, SCM-SPO, CAX-SCO, SCM-DOM2 and SCM-RNG2]
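One way such a graph could be derived is from the predicates each rule consumes and produces. The sketch below is a simplification for illustration only: the `RULES` table covers four rules, describes each rule by predicate sets alone, and the helper names are hypothetical; the graph Slider actually builds may differ.

```python
# rule name -> (predicates read by the antecedent, predicates written by the consequent)
RULES = {
    "SCM-SCO": ({"subClassOf"}, {"subClassOf"}),
    "CAX-SCO": ({"subClassOf", "type"}, {"type"}),
    "SCM-SPO": ({"subPropertyOf"}, {"subPropertyOf"}),
    "SCM-DOM2": ({"domain", "subPropertyOf"}, {"domain"}),
}


def dependency_graph(rules):
    """Edge A -> B whenever B can consume a predicate that A produces."""
    return {
        a: sorted(
            b for b, (inputs_b, _) in rules.items() if outputs_a & inputs_b
        )
        for a, (_, outputs_a) in rules.items()
    }


def route(rules, triple):
    """Rules whose buffers should receive a newly arrived triple."""
    _, predicate, _ = triple
    return sorted(
        name for name, (inputs, _) in rules.items() if predicate in inputs
    )


graph = dependency_graph(RULES)
for rule, successors in graph.items():
    print(rule, "->", successors)
```

The self-loops in the result (for example SCM-SCO feeding itself) show why the graph is cyclic and why inference must run to a fixpoint.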
Experiments
Baseline
◮ OWLIM-SE (Standard Edition)
  ◮ Semantic repository with reasoning features
  ◮ Fastest reasoner available, to the best of our knowledge
  ◮ Outperforms Jena and Sesame
  ◮ Natively supports RDFS; custom rule configuration for ρdf
Dataset
◮ 13 ontologies from 3 sets:
  ◮ 2 real-life ontologies: WordNet and Wikipedia
  ◮ 5 generated by BSBM, from 100,000 to 5 million triples
  ◮ 6 subClassOf ontologies (closure computation, duplicates intensive)