✆ ☛☞ ✁ ✑ ✪ ☎ ✫ ✔ ☎ ✡ ✑ ✑ ✬ ✆ ☛ ✖✭ ✑ ✑ ✮ ✌ ✢ � ✣ ✡ ✆ ✁ ✂ ✄☎ ✳ ✝ ✛ ✑ ✞ ✟ ✤ ✠ ☎ ✝ ✑ ✞✩ ✛ ✪ ✣ ✤ ✕ ✆ ✡ ✄ ✎ ✧ ✓ ✖ ✲ ✌✭ ✫ ✌ ✣ ✆ ✤ ✤ ✣ ✑ ✳ ✳ ✳ ✳ ✳ ✂ ✝ ✢ ☎ ☎ ✠✯ ✌✰ ✖ ✄ ✂ ✧ ✁ ✂ ✡ ✄ ✍ ✖ ✄ ☛ ✱ ✡ ✆ ✲ ✌ ☛ ✂ ☎ ✔ ✍ ✍ ✄ ✌✎ ✓ ☎ ✟ ☛☞ ✕ ✂ ✖ ✡ ✑ ✁✗ ✘✙ ✌ ✡ ✜ ✂ � � � � ✁ ✂ ✄☎ ✆ ☎ ✝ ✞ ✆ ✟ ✠ ☎ ✝ ✚✛ ✡ ✖ ✌ ✦ ☎ ☞ ✌ ✍ ✡ ✄ ✎ ✖ ✏ ✑ ✓ ☎ ✔✟ ✕ ✂ ✖ ✖ ✍ ✌✥ ✤ ✌ ✄ ✍ ✡✢ ✌ ✣✤ ✌ ☞ ✣ � ☎ Opening Game - Who am I? Professor at the Query Processing University of Wisconsin – Madison Specializing in database performance issues (i.e. joins) Jon Frankel, Noi Jencharat, Ened Ketri, Bonus : What stream Anurag Maskey, Andy See, Larissa Smelkov system have I worked on? 3/25/03 Query Processing – Papers Query Processing – Today ’ s Agenda ����������� ✏✒✑ 1:40 Motivation & Setup Examples ��������������������������������������������� �������� 2:20 Rate Based Query Paper ✡✟★✧ 2:50 Break ��������������������������������������� �������� 3:00 Window Joins Paper �!���������"�#�������������� �������$����������%�������#����������&������� 3:30 K-Constraints Paper �����'���� ������ 4:00 Discussion 1
� � � � � � � � 127 Flashback Stream Challenges SQL Query Final Answer? Block Reads? Optimizer Parser/ Optimizer - cost based Cardinality? Translator Rewriter Select * from students Plan where major = ‘ cosi ’ Generator and birthday = ‘ 0325 ’ Evaluator Final Data Rate Based Analysis Stream Challenges MFC – 10/day; JDF 3/day Day New MFC Left JDF Left Final Answer? Rate Based 1 6 6 -> 3 0 3 -> 1 0 Block Reads? Cost 2 9 9 -> 4 0 3 -> 1 1 Cardinality? Estimating 3 12 10 -> 5 2 3 -> 1 3 4 0 2 -> 1 0 3 -> 1 1 5 •Load Shedding 1 -> 0 0 6 •Ad Hoc Queries 7 •Persistent Queries 8 9 2
✳ ✳ Rate Based Analysis Rate Based Analysis MFC – 10/day; JDF 3/day MFC – 10/day; JDF 3/day Day New MFC Left JDF Left JDF Left MFC Left Day New MFC Left JDF Left JDF Left MFC Left 1 1 6 6 -> 3 0 3 -> 1 0 3 -> 1 3 6 6 -> 3 0 3 -> 1 0 3 -> 1 3 2 2 9 9 -> 4 0 3 -> 1 1 3 -> 1 9 9 9 -> 4 0 3 -> 1 1 3 -> 1 9 3 3 12 10 -> 5 2 3 -> 1 3 3 -> 1 18 12 10 -> 5 2 3 -> 1 3 3 -> 1 18 4 4 0 2 -> 1 0 3 -> 1 1 3 -> 1 15 0 2 -> 1 0 3 -> 1 1 3 -> 1 15 5 5 1 -> 0 0 3 -> 1 12 1 -> 0 0 3 -> 1 12 6 6 3 -> 1 9 3 -> 1 9 7 7 3 -> 1 6 3 -> 1 6 8 8 3 -> 1 3 3 -> 1 3 9 9 3 -> 1 0 3 -> 1 0 9 -> 4 0 127 Flashback – Joins Cost Optimization - Speed?? Predicate Pushdown Select * from students as s, courses as c where s.major = ‘ cosi ’ Coming Up…. and c.dept = ‘ cosi ’ and s.sid = c.sid -Different ways to measure rates - SPJ applicability 3
� � � � � � � � Stream Challenges II Stream Challenges II Blocking Query Operators A A Blocking Query Operators A A (option: pipelined join) B1 C1 (option: pipelined join) B1 Lost/Delayed/Unordered Lost/Delayed/Unordered C B C1 Data Data D D1 C B And yet, benefits are And yet, benefits are F D2 D1 huge … huge … E E D D2 B2 C2 F E E B2 C2 Stock Market – Econ 2A Data is Out there! Stock prices are based on ? (http://biz.yahoo.com/cc/) Thu Mar 20 Times are U.S. Eastern 8:30 am CYCL Centennial Communications Earnings (Q3 2003) 8:30 am DV DeVry Inc. Acquires Ross University 8:30 am ENTG Entegris, Inc. Earnings (Q2 2003) 8:30 am PLXS Plexus Announcement 9:00 am HOLL Hollywood Media Corp. Fourth Quarter and Year-End 2002 9:00 am LEH Lehman Brothers Holdings First Quarter 2003 Earnings 10:00 am CSCO Cisco Systems Announces Agreement to Acquire The Linksys Group, Inc. 10:00 am FNLY Finlay Enterprises, Inc. Earnings (Q4 2002) 10:00 am GIII G-III Apparel Group Earnings (Q4 2003) 10:00 am GLYN Galyan's Trading Company, Inc. Fourth Quarter 2002 10:00 am MWD Morgan Stanley Earnings (Q1 2003) Papa 10:00 am TRMS Trimeris, Inc. Earnings (Q4 2002) S&P Federated Home 10:30 am GPN Global Payments Inc. Earnings (Q3 2003) John’s Dept Stores Depot 11:00 am GDT Biosensor`s Agreement/Drug Eluting Stent Update 11:00 am CRAI Charles River Associates Earnings (Q1 2003) 11:00 am CHKR Checkers Drive-In Restaurants Earnings (Q4 2002) 11:00 am CPWM Cost Plus Earnings (Q4 2002) 11:00 am JCREW J. Crew Group, Inc. Earnings (Q4 2003) 4
� � Query: Short-term Downward Momentum: Stocks & Stream Systems Find all NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes and there has been significant buying pressure Tickers (70% or more of the volume has traded EPS (est) EPS (actual) toward the ask price) in the last 2 minutes. 1. IBM 80.00 1. HD 0.27 1. INTC 15.00 1. HD 22.00 Or by: 2. IBM 80.50 2. HD 0.30 Earnings, News, 2. INTC 22.25 Industry 5. HD 22.75 9. HD 23.00 Aurora Example Join Challenges – Window Options Soldiers Tanks Problem? Aurora Option (by A A (time, ID, pos) (time, ID, pos) 1, S1, A 1, T1, C individual tuple) B1 C1 A S1, S2 1, S2, A Stream Option (slide) C B B T2, S3 C T1 2, S1, A 2, T1, C D D1 A S1 1, S3, B 1, T2, B Tuple vs Timestamp F D2 2, S3, B 2, T2, C B S3 C T1, T2, S2 E E 3, S1, A Order! A T1, S1 3, S3, B 3, T1, A B2 C2 2, S2, C 3, T2, C B S3 C T2, S2 3, S2, B 5
� � � � � � Join Challenges – Window Options Join Challenges – Window Options Aurora Option (by A A Aurora Option (by A A individual tuple) individual tuple) B1 C1 B1 C1 Stream Option (slide) Stream Option (slide) C B C B D D1 D D1 Tuple vs Timestamp Tuple vs Timestamp F D2 F D2 E E E E Order! Order! B2 C2 B2 C2 Join Challenges – Window Options Accuracy – How to Aurora Option (by A A window? individual tuple) B1 C1 Stream Option (slide) C B Coming Up…. D D1 Tuple vs Timestamp F D2 -Joining algorithms E E Order! B2 C2 -Lots of cool graphs 6
✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ Motivations What is Rate? Traditional Optimizers requires cardinality Number of records per a unit of time. of the input…. Output Rate = # output transmitted In streams, cardinality is not known and time needed for transmission inputs come at different rate… Output Rate = #papers RATE-BASED optimization processing time needed Output Rate Estimation Output Rate for Projections case 1: Mitch For Projections Time to read papers is shorter than time For Selections between getting the papers paper 2 paper 3 paper 1 For Joins 1 hour 1/2 hour 1 hour time paper 1 paper 2 paper 3 So the output rate = the input rate 7
✳ ✳ ✳ ✳ ✳ Output Rate for Projections Output Rate for Projections case 2: Jon In general, time to do projection is low. Time to read papers is longer than time So between getting the papers paper 1 1.5 hour paper 2 Output Rate = Input Rate 1.5 hour r o r i 1 hour time paper 1 paper 2 paper 3 So the output rate = 1/(time to do projection) Output Rate for Selections Output Rate for Selection case 1: Mitch Selectivity (f) = percentage of papers that takes 1/2 hour to read 1 paper, with will be selected selectivity = 0.5 2 hours paper 1 1/2 hour 1 hour time paper 1 paper 2 paper 3 output rate = 1/2 paper/hour So the output rate = f * the input rate 8
✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ ✳ Output Rate for Selection Output Rate for Selections case 2: John In general, time to perform selection is less than interval between inputs. takes 1.5 hour to read 1 paper, with selectivity = 0.5 So 3 hours 1.5 hour Output Rate = Selectivity * Input Rate 1 hour r o f * r i time paper 1 paper 2 paper 3 output rate = 1/3 paper/hour (= 1/2 * 1/1.5) So the output rate = f * (1/time to select) Output Rate for Joins Recall Total #’s of papers What are the papers by same author Mitch in output and Jon gives the same grading to? Output Rate = r M = No. of papers Mitch reads per hour # output transmitted r J = No. of papers Jon reads per hour time needed for transmission f = Selectivity of join Total time to do the Join C M = Time to handle reviews from Mitch C J = Time to handle reviews from Jon 9
Recommend
More recommend