Presented by Mark Hosang, Wayne Wight, Sedat Behar, Yevgeny Ioffe, Archana Suhas Joshi, Izi Aviyente 1 Mar. 18,2003 Query Languages
Timeline ! 1:40 – Talk Outline ! 1:45 – CQL Primer ! 2:05 – SQuAl Primer ! 2:25 – ATLaS Primer ! 2:50 – Break ! 3:00 – Sequence Query Processing ! 3:40 – Similarities and Differences (ATLaS, CQL, Aurora) ! 3:55 – Class Discussion ! 4:25 – Review Presentation Shortcomings 2 Mar. 18,2003 Query Languages
References A. Arasu, S.Babu, J.Widom. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. Technical Report, November 2002. SQR- A Stream Query Repository. http://www.db.stanford.edu/stream/sqr. Joint effort of several data stream research groups. R. Motwani, J. Widom, et al. Query processing, resource management, and approximation in a data stream management system. In Proc. First Biennnial Conf. On Innovative Data Systems Research , January 2003. D.Abadi, D.Carney, U.Cetintemel, M. Cherniack, C, Convey, S. Lee, M. Stonebraker, N. Tatbul, S.Zdonik. Aurora: A New Model and Architecture for Data Stream Management. 3 Mar. 18,2003 Query Languages
Outline ! Example Application: Boston Marathon ! STREAM Solution ! Review of STREAM ! CQL ! Example Queries ! AURORA Solution: SQuAl ! Aurora operators ! Example Queries ! Comparisons between Query Languages 4 Mar. 18,2003 Query Languages
Example Application ! Boston Marathon ! Goal: To collect and process data about each runner continuously Satellite Speed Sensors Computer Base Station INTERNET INTERNET (update on (update on Finish Line Distance Timer Alerter Sensors Marathon) Marathon) 5 Mar. 18,2003 Query Languages
Example Queries Find all runners running with speed greater than 1. 6mph. Count number of runners who passed the 10 2. mile split in the last 15 minutes Specify the Athlete_Ids who finished the 3. marathon within 5 hours …. Problem: How can we express these queries??? 6 Mar. 18,2003 Query Languages
Previous Solutions Proposed ! TQL (1992) – SQL based query language of Tapestry Set て =- ∞ FOREVER DO Set t:= current time Execute Queries Q M (t) and Q M ( て ) Return Q M (t)-Q M ( て ) to user set て :=t Sleep for some period of time ENDLOOP 7 Mar. 18,2003 Query Languages
Previous Solutions Proposed ! NiagraCQ(2000) – XML-QL is used to express queries CREATE CQ_NAME XML-QL QUERY DO action | START start_time] |EVERY time_interval] |EXPIRE expiration_time] WHERE <athlete> <country>Spain</name> <name> <lastname>$1</lastname> </> </>IN* CONSTRUCT <lastname>$1</lastname> 8 Mar. 18,2003 Query Languages
Previous Solutions Proposed ! PSoup(2002) – SQL style queries, windows for Select Project Join queries SELECT * FROM SPEED WHERE (SPEED.RUNRATE<5) BEGIN (NOW-10) END (NOW) 9 Mar. 18,2003 Query Languages
STREAM’s solution: ! Define a concrete query language (CQL) that ! Exploits relational semantics ! Is easy and compact to write ! Uses standard relational transformations for query optimization 10 Mar. 18,2003 Query Languages
Recall: STREAM Define て : global, discrete, ordered time domain Stream S is unbounded bag of elements <s,t> Tuple belonging to the schema t ∈ て (timestamp) Relation R is a mapping from て to a bag of tuples where each tuple belongs to schema of the relation. (R(t)) 11 Mar. 18,2003 Query Languages
RECALL: STREAM (continued) Need mappings between streams and relations: Window Specification Language Relational Streams Relations Query Language Relation-to-Stream Operators (IStream, DStream, RStream) 12 Mar. 18,2003 Query Languages
Schema of Boston Marathon Speed (Athlete_ID /* unique identifier of the athlete */, S T R E A M S runRate /* the rate at which the athlete runs */) timestamp /* time of measurement */ Timer (Athlete_ID, time_elapsed /* time measured from start */) timestamp Distance_covered (Athlete_ID, distance /* distance from start */) timestamp Finished (Athlete_ID) timestamp Athlete (Athlete_ID, name, age, country) 13 Mar. 18,2003 Query Languages
IStream(R) contains all (s,T) Window Specification • Language where s Є R at time T but s Є R at R time T-1 Q Stream Relation L DStream(R) contains all (s,T) Relation- Stream • EXAMPLE where s Є R at time T-1 but s Є R Operators at time T Speed Stream RStream(R) contains all (s,T) We want to stream the • where s Є R at time T average of the last 5 speed recordings taken: Athlete_id Runrate ts S R 51 5 8:30:00 T E R L 62 6 8:30:00 A E Average ID Rate T Average I 73 7 8:30:01 A I Relation 83 5 S M 6.6 83 5 8:30:02 O T - Average N 54 7 54 7 8:30:03 R R - …. E E R 68 8 6.6 68 8 8:30:04 A E L 79 7 L 79 7 8:30:04 M A s A T 80 6 8:30:05 80 6 T I I …. …. …. O O N N 14 Mar. 18,2003 Query Languages
RECALL: Window Specification Language Time-based Windows: ! S[Range T] ! if T=0 S[Now] if T= ∞ S[Range Unbounded] Tuple-based Windows: ! S[Rows N] ! Partitioned Windows: ! S[Partition By A 1 , A 2 ,…..,A K Rows N] ! 15 Mar. 18,2003 Query Languages
Sample Query Find all runners running with speed greater than 6 mph SELECT * Unbounded FROM Speed S window by WHERE S.runRate > 6 default Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 16 Mar. 18,2003 Query Languages
Sample Query Count number of runners who passed the 10 mile split in the last 15 minutes SELECT Count(*) Relational FROM Distance_covered D [Range 15 Min] result WHERE D.distance >= 10 Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 17 Mar. 18,2003 Query Languages
Sample Query Specify the Athlete_IDs who finished within 5 hours of their starting SELECT Istream(F.Athlete_ID) FROM Finished[Now] F, Timer [Range 5 Hours] T WHERE F.Athlete_ID = T.Athlete_ID Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 18 Mar. 18,2003 Query Languages
Sample Query Specify runners from USA who have just finished SELECT Istream (A.*) FROM Finished[Now] F, Athletes A WHERE F.Athelete_ID = A.Athlete_ID AND A.country = ‘USA’ Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 19 Mar. 18,2003 Query Languages
Implementation: Query Optimization " Window Reduction SELECT Istream (*) FROM Speed as S[Range Unbounded] WHERE Speed>6 SELECT Istream(*) FROM Speed as S[Now] WHERE Speed>6 20 Mar. 18,2003 Query Languages
Query Optimization (Contd.) " Filter Window Commutativity SELECT Count(*) FROM Distance_covered D [Range 15 Min] WHERE D.distance >= 10 (SELECT Count(*) FROM Distance_covered D WHERE D.distance >= 10) [Range 15 Min] 21 Mar. 18,2003 Query Languages
Aurora GUI Rstream Filter Map i0 o0 Filter Box Map Box Tumble Box Slide Box Xsection Box Wsort Box Union Restream Box Merge Box Join Box Resample Box Filter Map Super Box i1 o1 Map Filter Rstream i2 o2 Starting an arc at Map Ending an arc at Filter Starting an arc at Filter Ending an arc at Map Box 4 Starting an arc at Map Box 4 Ending an arc at Union 22 Mar. 18,2003 Query Languages
SQuAl (Stream Query Algebra) Operators ! Order-Agnostic Operators ! Order-Sensitive Operators ! Filter ! BSort ! Map ! Aggregate ! Union ! Join ! Resample 23 Mar. 18,2003 Query Languages
•Filter •Map Order-Agnostic Operators •Union •BSort •Aggregate •Join •Resample Filter(P 1 ,…,P m )(S) " Similar to relational selection " P 1 ,…,P m : predicates " S: input stream " Output: m+1 streams with same schema as S 24 Mar. 18,2003 Query Languages
•Filter •Map Order-Agnostic Operators •Union •BSort •Aggregate t=(TS=ts, A 1 =v 1 ,……,A k =v k ) •Join •Resample Filter(P 1 ,P 2 )(Distance_covered) P 1 : Distance< 3 mile P 2 : Distance> 5 mile Result will have three streams… Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 25 Mar. 18,2003 Query Languages
•Filter •Map Order-Agnostic Operators •Union •BSort •Aggregate •Join •Resample Athlete_ID Distance Timestamp 9908 5 08:14 9909 4.1 08:15 9915 6.2 08:16 9911 2.8 08:17 9908 5.5 08:18 9909 4.9 08:19 9915 6.6 08:20 Speed (Athlete_ID, runRate, timestamp) SCHEMA: SCHEMA: Timer (Athlete_ID, time_elapsed, timestamp) Distance_covered (Athlete_ID, distance, timestamp) Finished (Athlete_ID, timestamp) Athlete (Athlete_ID, name, age, country) 26 Mar. 18,2003 Query Languages
Recommend
More recommend