Tabled CLP for Reasoning over Stream Data
Joaquín Arias (1, 2)
(1) IMDEA Software Institute, (2) Technical University of Madrid
October 18, 2016
madrid institute for advanced studies in software development technologies
www.software.imdea.org

Goal: Change the Way Stream Data is Analyzed

Stream data is a continuous flow of data. Because the data keeps changing, its analysis has to be updated dynamically. Every day more sensors collect information, and we want to analyze this information to make decisions.

We propose a high-level language rooted in logic and constraints for writing the programs that analyze the data:
• The language will make the programs easier to maintain.
• Constraints will prune the search space at early stages.
• Previous results can be reused to update the analysis.

Use Case: Fast Flower Delivery

A consortium of flower stores uses independent van drivers to deliver the flowers. The system has to:
• Broadcast each delivery request to the drivers that satisfy the location and ranking requirements.
• Collect the drivers' bids and assign the delivery based on the shop's requirements.
• Monitor the delivery process and generate alerts when necessary.
• Evaluate each driver's ranking.

1 - Easy-to-maintain Programs: Prolog

In stream data analysis, not only the data changes but also the requirements of the problem, so the programs have to be modified. Most programs are written by combining computational and query languages. As a result, the bottleneck is on the human side rather than on the machine side.

With logic programming, problems are expressed in a more natural way:
• Etalis [2] (by Darko Anicic et al.): 2,500 lines of code instead of 150,000 lines of relational database code.
• DeALS [16] (developed at UCLA).
• LogiQL [8] (by LogicBlox).
• Yedalog [6] (by Google): 70% fewer lines of code than with C++.

1 - Easy-to-maintain Programs: Prolog

    ce1(Result) <- e(Name, Result) SEQ e(Name, Result)
                   WHERE (Name = "a", Result = 1).
    ce2(Result) <- ce1(Result) AND ce1(Result)
                   WHERE (Result = 1).
    ------------------------
    <Query name="ce1" text="insert into tmpE(ceName, Result)
        select "ce1" as ceName, e1.Result as Result
        from pattern [every ( + e1=e(e1.Name="a" and e1.Result=1)
                     -> e2=e(e2.Name="a" and e2.Result=1))]" />
    <Query name="ce2" text="select "ce2" as Name, e1.Result as Result
        from pattern [every ( + e1=tmpE(e1.ceName="ce1" and e1.Result=1)
                     and e2=tmpE(e2.ceName="ce1" and e2.Result=1))]" />

Figure: Two versions of event detection rules, written in ETALIS, a logic programming language (above), and in ESPER, a relational database language (below).

2 - Heterogeneous Data Sources: RDF Stream

Equivalent data (e.g. a GPS position) may be generated by different sources, each of which could provide extra information.

RDF represents the data as labeled directed edges (triples). RDF_Stream annotates them with a time reference: ⟨⟨Subject, Predicate, Object⟩, Time⟩.

With RDF / RDF_Stream the data model is independent of the source, so adapting the model is easier when the requirements change.

    :- rdf_register_ns(ffd, 'http://fast_flower_delivery.com/').

    delivery_ranking_position(Delivery, Area, Rank) :-
        rdf_s(Shop, ffd:request, Delivery, _Time),
        rdf(Shop, ffd:area, Area),
        rdf(Shop, ffd:ranking, Rank).

Figure: Join query written in Prolog combining RDF and RDF_Stream.
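
The join in the figure can be pictured with a minimal Python sketch. It is not the Prolog API from the slide: the triple stores are plain lists, and all shop names, delivery identifiers, and values are made-up illustration data.

```python
# Static RDF triples (subject, predicate, object): background data about shops.
# All values below are hypothetical illustration data.
TRIPLES = [
    ("shop1", "ffd:area", "north"),
    ("shop1", "ffd:ranking", 3),
    ("shop2", "ffd:area", "south"),
    ("shop2", "ffd:ranking", 5),
]

# RDF stream triples carry a timestamp: ((subject, predicate, object), time).
STREAM = [
    (("shop1", "ffd:request", "delivery42"), 100),
    (("shop2", "ffd:request", "delivery43"), 105),
]


def delivery_ranking_position(triples, stream):
    """Join each streamed request with the static area and ranking of its shop."""
    results = []
    for (shop, pred, delivery), _time in stream:
        if pred != "ffd:request":
            continue
        areas = [o for s, p, o in triples if s == shop and p == "ffd:area"]
        ranks = [o for s, p, o in triples if s == shop and p == "ffd:ranking"]
        # Like the Prolog conjunction, emit one answer per matching combination.
        for area in areas:
            for rank in ranks:
                results.append((delivery, area, rank))
    return results
```

Because both stores use the same triple shape, the join does not depend on which source produced the data; only the stream entries carry the extra timestamp.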

3 - Background Knowledge: Ontology Domains (OWL)

A shop's delivery request is contextualized with the position of the shop.

Defining a delivery hierarchy (⊑):
    ⟨premium_delivery, type, delivery⟩
• We do not have to duplicate code, because a premium_delivery is a delivery.
• Additionally, we can add specific rules for premium_delivery.

The Web Ontology Language (OWL) can be represented using RDF. OWL can be used to define concept hierarchies and predicate properties.

A common representation for stream data and background knowledge makes it easier to write, maintain and extend the programs. There are several RDF APIs and ontology reasoners in Prolog, such as F-OWL [18].

4 - Define Event Relationships: Constraints (CLP)

To increase the rank of a driver, we have to check that the driver picked up and delivered the flowers on time.

    ⟨⟨D, on_time, F⟩, T2⟩ ← T2 #< T1 + 10,
                            ⟨⟨D, pickup, F⟩, T1⟩,
                            ⟨⟨D, delivery, F⟩, T2⟩.

NOTE: The rule should also fire if ⟨⟨D, delivery, F⟩, T2⟩ reaches the system before the pickup event does.

Most event relationships are time-based, and we have to deal with out-of-order data arrival, which makes the problem even more complex. When an event relationship is detected, the system generates a new event, which may in turn be used in several rules.

Constraints make it possible to define more complex relationships.
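
The out-of-order requirement in the note can be sketched in Python as follows. This is only an illustration, not the CLP mechanism proposed in the slides: the class `OnTimeDetector` and its `push` method are hypothetical names, and the only condition checked is the one stated in the rule, T2 < T1 + 10.

```python
from collections import defaultdict


class OnTimeDetector:
    """Emit (driver, 'on_time', flowers, t2) when delivery time t2 < pickup time t1 + limit."""

    def __init__(self, limit=10):
        self.limit = limit
        self.pickups = defaultdict(list)     # (driver, flowers) -> pickup times seen so far
        self.deliveries = defaultdict(list)  # (driver, flowers) -> delivery times seen so far

    def push(self, driver, kind, flowers, time):
        """Store the event and return the on_time events it completes."""
        key = (driver, flowers)
        derived = []
        if kind == "pickup":
            self.pickups[key].append(time)
            # A delivery may already be waiting if events arrived out of order.
            for t2 in self.deliveries[key]:
                if t2 < time + self.limit:
                    derived.append((driver, "on_time", flowers, t2))
        elif kind == "delivery":
            self.deliveries[key].append(time)
            for t1 in self.pickups[key]:
                if time < t1 + self.limit:
                    derived.append((driver, "on_time", flowers, time))
        return derived
```

Both event kinds are buffered, so the derived event is produced whichever of the two arrives last.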

5 - Reuse Answers: Tabling (Answer on Demand)

The top-down execution of Prolog reduces the search tree, but it can enter loops in cases where the bottom-up execution of Datalog terminates. Tabling solves this drawback and makes it possible to reuse previous answers.

Many tabling implementations use local scheduling, which tries to find all the answers to a query (that is, to reach the fixpoint) before returning any of them.

Due to the unbounded nature of data streams, the tabling engine should:
• Discard repeated and redundant answers [15, 4, 3].
• Return answers on demand [7, 13, 5].
• Remove obsolete answers (a kind of non-monotonicity), e.g. because their timestamp has expired.
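
Two of the properties listed above, discarding repeated answers and returning answers on demand, can be mimicked with a small Python generator. This is not a tabling engine; it is a sketch of the observable behavior on a cyclic reachability query, and the function name `reachable` and the dictionary-based graph are assumptions made for the example.

```python
from collections import deque


def reachable(edges, start):
    """Lazily yield every node reachable from `start`, each exactly once."""
    table = set()             # answer table: answers already returned
    pending = deque([start])  # nodes whose successors still have to be explored
    while pending:
        node = pending.popleft()
        for succ in edges.get(node, ()):
            if succ not in table:  # discard repeated answers
                table.add(succ)
                pending.append(succ)
                yield succ         # hand the answer out before the fixpoint is reached
```

On a graph with cycles the answer table guarantees termination, and a consumer can stop after the first answers without waiting for the whole computation.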

6 - Aggregate Rules: New Semantics

If we want to know the number of deliveries requested by each shop during the last hour, we do not need to store the details of each delivery request. Aggregates (e.g. count) are meta-predicates that reduce the volume of data we have to store.

Some research has been done (see [9, 12, 17]), but the correct semantics of aggregates in recursive Prolog programs under tabled execution is still not clear.

    :- aggregate p(min).
    p(1).
    p(0) :- p(1).

Neither p(0) nor p(1) is a least Herbrand model consistent with the intended semantics of the program.
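
For the non-recursive count example, a minimal Python sketch shows how little state the aggregate needs: only one timestamp per request, not the request details. The class name `HourlyRequestCounter`, the window of 3600 time units, and the assumption that timestamps arrive in increasing order for each shop are all choices made for this illustration.

```python
from collections import defaultdict, deque


class HourlyRequestCounter:
    """Count delivery requests per shop over a sliding window of `window` time units."""

    def __init__(self, window=3600):
        self.window = window
        self.times = defaultdict(deque)  # shop -> timestamps still inside the window

    def add(self, shop, time):
        # Assumption: timestamps arrive in increasing order for each shop.
        self.times[shop].append(time)

    def count(self, shop, now):
        """Drop expired timestamps, then return how many requests remain."""
        queue = self.times[shop]
        while queue and queue[0] <= now - self.window:
            queue.popleft()
        return len(queue)
```

Handling out-of-order arrival or recursive aggregates would need more than this sketch provides, which is the open question raised on the slide.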

7 - Incremental Evaluation: Incremental TCLP

Assume we choose the vans based on their time-distance. When an accident is reported, their time-distance should be recalculated. Since not all the vans are affected, incremental strategies [10] can be used.

Figure: Sliding time window from time t to t+1. Example from [11].

Incremental tabling [14] dynamically updates the tabled results, taking the dependency structure into account in order to remove the results inferred from an expired fact. The presence of constraints makes this more complex.
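
The idea of removing only the results that depend on a changed fact can be sketched in Python. This is a simplified stand-in for incremental tabling, without constraints: the class `TimeDistanceTable`, the fixed routes per van, and the segment names are hypothetical.

```python
from collections import defaultdict


class TimeDistanceTable:
    """Cache of van time-distances that records which road segments each result used."""

    def __init__(self, segment_times, routes):
        self.segment_times = dict(segment_times)  # segment -> travel time
        self.routes = routes                      # van -> list of segments on its route
        self.cache = {}                           # van -> cached time-distance
        self.users = defaultdict(set)             # segment -> vans whose result used it
        self.recomputed = []                      # log of recomputations, for inspection

    def time_distance(self, van):
        if van not in self.cache:
            self.cache[van] = sum(self.segment_times[s] for s in self.routes[van])
            for s in self.routes[van]:
                self.users[s].add(van)
            self.recomputed.append(van)
        return self.cache[van]

    def report_accident(self, segment, new_time):
        """Update one fact and invalidate only the results that depend on it."""
        self.segment_times[segment] = new_time
        for van in self.users.pop(segment, set()):
            self.cache.pop(van, None)
```

After an accident on one segment, only the vans whose route uses that segment are recomputed on their next query; the other cached results are reused unchanged.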

Our Proposal: Stream TCLP

Diagram: the components of the proposal and the need each one addresses.
• Language: Prolog
• Heterogeneous data: RDF Stream
• Background knowledge: OWL
• Event relationships: CLP
• Reuse answers: Tabling AoD
• Aggregates: New semantics
• Incremental evaluation: Incremental TCLP

Preliminary Results: TCLP

TCLP facilitates the integration of CLP solvers with the tabling engine in Ciao Prolog.

We validate its advantages versus Prolog, CLP and tabling with respect to:
• Declarativeness and logical reading.
• Termination properties.
• Performance.

Example: Find nodes in a weighted graph within a distance K from each other. It is a typical query for the analysis of social networks [15].
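
To make the example query concrete, here is a Python sketch of a distance-bounded search. It is not TCLP and says nothing about its performance; it only shows how the bound K prunes the exploration early, assuming non-negative edge weights and reading "within a distance K" as "at distance at most K". The function name `within_distance` and the adjacency-list format are assumptions.

```python
import heapq


def within_distance(graph, source, k):
    """Return {node: shortest distance} for nodes at distance <= k from `source`."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for succ, weight in graph.get(node, ()):
            new_dist = dist + weight
            # The bound k plays the role of the constraint: longer paths are pruned early.
            if new_dist <= k and new_dist < best.get(succ, float("inf")):
                best[succ] = new_dist
                heapq.heappush(heap, (new_dist, succ))
    return best
```

The search terminates on cyclic graphs because a node is only revisited when a strictly shorter path to it is found.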