Simon Simon Schiff Schiff, Özgür Özgür L. L. Özçep Özçep Exploiting Exploiting Back-End Back-End APIs APIs fo for Feasible easible Ontology-Based Ontology-Based Stream Stream Access cess Fourth Stream Reasoning Workshop, Linköping, 17 th April Institute of Information Systems University of Lübeck
Q STARQL Q SQL historic data OBDA on huge datasets (w/o optimization) ⇒ Long processing times 2 / 6
Q STARQL Q SQL Q SparkSQL historic historic data data Processing times are reducible OBDA on huge datasets using additional hardware and (w/o optimization) Idea parallelisation ⇒ Long processing times ⇒ Short processing times . . . Horizontal scaling 2 / 6
STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 Information need for monotonicity Tell every minute whether the temperature measured by a sensor increased monotonically in the last 5 minutes. 3 / 6
STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6
STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6
STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6
STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 92 . 0 91 . 0 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6
STARQL Query Example Measurements Temp/C ◦ 94 . 0 93 . 0 y2 at t 3 92 . 0 91 . 0 y1 at t 1 90 . 0 Time/min 0 . 0 1 . 0 2 . 0 3 . 0 4 . 0 5 . 0 6 . 0 STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 3 / 6
STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 The FOL template language is domain independent 1 : STARQL HAVING clause can be unfolded into languages such as SQL. ⇒ Process historic (e.g. timestamped datasets) 1 Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases: The Logical Level. Addison-Wesley Longman Publishing Co., Inc., 1995. 3 / 6
STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 No function exists for executing the unfolded query per window! ◮ First idea: Create table with window intervals and join with historic dataset. ◮ Too slow? 3 / 6
STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 No function exists for executing the unfolded query per window! ◮ Second idea: Create a function for executing the unfolded query per window using PL/pgSQL. ◮ Sufficient? 3 / 6
STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 No function exists for executing the unfolded query per window! ◮ Second idea: Create a function for executing the unfolded query per window using PL/pgSQL. ◮ Sufficient? ◮ 3 / 6
STARQL Query Example CREATE TYPE window_state AS (memory measurements [], 1 wid bigint , start timestamp , stop timestamp , pulse timestamp); 2 CREATE TABLE measurements_data AS SELECT NULL :: bigint 3 AS wid , NULL :: timestamp AS timestamp , NULL :: integer AS sensor , NULL :: numeric (12 ,3) AS value WHERE false; 4 CREATE OR REPLACE FUNCTION moving_window (source text , 5 pulse interval , range interval , slide interval) RETURNS SETOF measurements_data AS $$ DECLARE 6 win window_state ; 7 8 cnt bigint; 9 10 line_cursor refcursor; 11 line measurements ; 12 BEGIN 13 OPEN line_cursor FOR EXECUTE ’SELECT * FROM ’ format 14 (’%1$s ’, source) ’ ORDER BY timestamp ASC ’; FETCH line_cursor INTO line; 15 16 win.start := line.timestamp; 17 3 / 6
STARQL Query Example win.stop := line.timestamp + range; 18 win.pulse := line.timestamp; 19 20 WHILE line.timestamp < win.stop LOOP 21 win.memory := array_append (win.memory , line); 22 FETCH line_cursor INTO line; 23 END LOOP; 24 25 win.wid := 0; 26 RETURN QUERY SELECT win.wid , (unnest(win.memory :: 27 measurements [])).*; 28 win.pulse := win.pulse + pulse; 29 WHILE line.timestamp IS NOT NULL LOOP 30 IF win.pulse < win.stop AND win.pulse < win.start + 31 slide THEN win.wid := win.wid + 1; 32 RETURN QUERY SELECT win.wid , (unnest(win.memory :: 33 measurements [])).*; win.pulse := win.pulse + pulse; 34 ELSIF win.pulse >= win.stop AND win.pulse < win. 35 start + slide THEN win.pulse := win.pulse + pulse; 36 37 win.start := win.start + slide; 38 3 / 6
STARQL Query Example win.stop := win.stop + slide; 39 WHILE line.timestamp < win.stop LOOP 40 win.memory := array_append (win.memory , line); 41 FETCH line_cursor INTO line; 42 END LOOP; 43 cnt := 1; 44 FOR i IN coalesce( array_lower (win.memory , 1), 1) .. 45 coalesce( array_upper (win.memory , 1), 1) LOOP IF win.memory[i]. timestamp < win.start THEN 46 cnt := cnt + 1; 47 ELSE 48 EXIT; 49 END IF; 50 END LOOP; 51 win.memory := win.memory[cnt :]; 52 ELSIF win.pulse >= win.start + slide THEN 53 win.start := win.start + slide; 54 win.stop := win.stop + slide; 55 WHILE line.timestamp < win.stop LOOP 56 win.memory := array_append (win.memory , line); 57 FETCH line_cursor INTO line; 58 END LOOP; 59 cnt := 1; 60 3 / 6
STARQL Query Example FOR i IN coalesce( array_lower (win.memory , 1), 1) .. 61 coalesce( array_upper (win.memory , 1), 1) LOOP IF win.memory[i]. timestamp < win.start THEN 62 cnt := cnt + 1; 63 ELSE 64 EXIT; 65 END IF; 66 END LOOP; 67 win.memory := win.memory[cnt :]; 68 END IF; 69 END LOOP; 70 win.wid := win.wid + 1; 71 RETURN QUERY SELECT win.wid , (unnest(win.memory :: 72 measurements [])).*; CLOSE line_cursor ; 73 END 74 $$ language plpgsql; 75 3 / 6
STARQL Query Example STARQL Representation of monotonicity SELECT x FROM measurements [NOW - PT5M, NOW] -> PT1M WHERE Sensor(x) HAVING FORALL t i , t j , y1, y2 IF hasVal(x,y1)<t i > AND hasVal(x,y2)<t j > AND t i < t j THEN y1 <= y2 Function exists for executing the . . . unfolded query per window! Horizontal scaling ◮ Apache Spark SQL scales horizontally and vertically. ◮ Scalable? 3 / 6
Processing times using different back ends 12 PostgreSQL PL/pgSQL Apache Spark SQL 10 Processing time in minutes 8 6 4 2 0 0 2 4 6 8 10 12 14 16 Measurement duration in days 4 / 6
Processing times using different back ends 12 PL/pgSQL Apache Spark SQL 10 Processing time in minutes 8 6 4 2 0 0 20 40 60 80 100 120 140 160 Measurement duration in days 5 / 6
Conclusion and Future Work Conclusion: ◮ Window function can be realized by using PL/pgSQL ◮ Speed gain by using Apache Spark SQL 1 ◮ Complexity hidden by STARQL Future Work: ◮ Incremental stream processing (Not possible for every STARQL query) 1 Simon Schiff, Özgür L. Özçep, and Ralf Möller. “Ontology-based Data Access to Big Data”. In: Open Journal of Databases (OJDB) 6 (1 2018). Postproceeding of Hidest’18, pp. 21–32. 6 / 6
Processing times using distributed Apache Spark SQL 4 1 Node 2 Nodes 3 . 5 3 Nodes 4 Nodes 3 Processing time in hours 2 . 5 2 1 . 5 1 0 . 5 0 0 10 20 30 40 50 60 Measurement duration in years 7 / 6
Recommend
More recommend