nodb efficient query processing on
play

NoDB: Efficient Query Processing on Raw Data Files Ioannis - PowerPoint PPT Presentation

NoDB: Efficient Query Processing on Raw Data Files Ioannis Alagiannis Renata Borovica Miguel Branco Stratos Idreos * Anastasia Ailamaki cole Polytechnique *CWI, Amsterdam Fdrale de Lausanne From data to results Time


  1. NoDB: Efficient Query Processing on Raw Data Files Ioannis Alagiannis ‡ Renata Borovica ‡ Miguel Branco ‡ Stratos Idreos * Anastasia Ailamaki ‡ ‡ École Polytechnique *CWI, Amsterdam Fédérale de Lausanne

  2. From data to results Time query loading data preparation data Reduce data-to-query time

  3. Querying data in situ 250 raw file parsing 200 0|ALGERIA|0| haggle. carei … 0|ALGERIA|0| haggle. carefully fina 1|ARGENTINA|1|al foxes p… Execution Time (sec) 1|ARGENTINA|1|al foxes promise query 2|BRAZIL|1|y alongside o… 2|BRAZIL|1|y alongside of the pen 3|CANADA|1|eas hang ironic, silen 150 3|CANADA|1|eas hang iro … 4|EGYPT|4|y above the carefully th ……………………………….………… ... convert 100 token. convert to binary 50 Fields parse I/O 1 ARGENTINA 1 al foxes promis 0 tokenize Straw-man approach Straw-man approach is slow 3

  4. NoDB philosophy No data loading Time Instant gateway to data Raw files first-class citizen Driven by the workload Adaptive in situ DBMS 4

  5. NoDB in practice Efficient in situ querying Minimal changes to the query engine NoDB + philosophy = PostgresRaw 5

  6. PostgresRaw Adaptive indexing mechanism positional map scan cache raw files operator statistics PostgreSQL query engine 6

  7. Positional map positions of attributes attributes Raw File Reduce parsing tuples Reduce tokenizing Created on-the-fly 7

  8. Positional map known position looking for Raw File Reduce parsing Reduce tokenizing Created on-the-fly 8

  9. Positional map in action attributes tuples 1. Positional map is empty Indexed attributes: PM: 9

  10. Positional map in action attributes tuples 1. Positional map is empty 2. Q1 accesses a4 and a6 Indexed attributes: a4, a6 PM: p4, p6 p4, p6 p4, p6 p4, p6 10

  11. Positional map in action attributes tuples 1. Positional map is empty 2. Q1 accesses a4 and a6 3. Q2 accesses a4 and a9 Indexed attributes: a4, a6 a4, a6, a9 PM: p4, p6 p4, p6, p9 Make raw data access p4, p6 p4, p6, p9 progressively cheaper p4, p6 p4, p6, p9 p4, p6 p4, p6, p9 11

  12. PostgresRaw avoid raw file positional map accesses scan cache raw files operator statistics generate statistics PostgreSQL adaptively query engine 12

  13. Impact of positional map Random queries on 10 attributes Vary storage capacity (15MB-2GB) 50 Execution Time (sec) 40 15% from full positional map 2x improvement 30 20 10 0 0 200 400 600 800 1000 1200 # pointers (in millions) No need for the whole positional map 13

  14. Adapting to changes workload changes 100 PostgreSQL Execution Time (sec) - log scale PostgresRaw 10 1 Q1 Q5 Q10 Q15 Q20 Q25 Q30 Q35 Q40 Query Sequence Graceful adaptation to workload changes 14

  15. PostgresRaw vs. other DBMS Q20 Q19 Q18 1800 Q17 Q16 Q15 Q14 Q13 Q12 1600 Q11 Q10 Q9 Q8 Q7 Q6 1400 Q5 Q4 Q3 Execution Time (sec) 1200 Q2 Q1 Load 1000 800 Data Loading 600 400 200 0 MySQL CSV Engine DBMS X DBMS X PostgreSQL PostgresRaw MySQL w/ external files 15

  16. PostgresRaw vs. other DBMS Q20 Q19 Q18 1800 Q17 Q16 Q15 Q14 Q13 Q12 1600 Q11 Q10 Q9 Q8 Q7 Q6 1400 Q5 Q4 Q3 Execution Time (sec) 1200 Q2 Q1 Load 1000 800 600 400 200 0 MySQL CSV Engine DBMS X DBMS X PostgreSQL PostgresRaw MySQL w/ external files 16

  17. PostgresRaw vs. other DBMS Q20 Q19 Q18 1800 Q17 Q16 Q15 Q14 Q13 Q12 1600 Q11 Q10 Q9 Q8 Q7 Q6 1400 Q5 Q4 Q3 Execution Time (sec) 1200 Q2 Q1 Load 1000 800 600 400 200 0 MySQL CSV Engine DBMS X DBMS X PostgreSQL PostgresRaw MySQL w/ external files Competitive with conventional DBMS 17

  18. NoDB Today Data Data Query Preparation Query Query Query Adaptive load - store - execute Thank you!! 18

Recommend


More recommend