apache phoenix
play

Apache Phoenix We put the SQL back in NoSQL - PowerPoint PPT Presentation

Apache Phoenix We put the SQL back in NoSQL http://phoenix.incubator.apache.org James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/ About me Engineer at Salesforce.com in BigData group o Started Phoenix as internal project ~2.5


  1. Apache Phoenix We put the SQL back in NoSQL http://phoenix.incubator.apache.org James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/

  2. About me Engineer at Salesforce.com in BigData group o Started Phoenix as internal project ~2.5 years ago o Open-source on Github ~1.5 years ago o Apache incubator for past 4 months Completed o Engineer at BEA Systems o XQuery-based federated query engine o SQL-based complex event processing engine o

  3. Agenda What is Apache Phoenix? o Why is it so fast? o How does it help HBase scale? o Completed Roadmap o Q&A o

  4. What is Apache Phoenix? Completed

  5. What is Apache Phoenix? Turns HBase into a SQL database 1. Query Engine o MetaData Repository Completed o Embedded JDBC driver o Only for HBase data o

  6. What is Apache Phoenix? Fastest way to access HBase data 2. HBase-specific push down o Compiles queries into native Completed o HBase calls (no map-reduce) Executes scans in parallel o

  7. SELECT * FROM t WHERE k IN (?,?,?) Phoenix Stinger (Hive 0.11) Completed 7,000x faster 0.04 sec 280 sec * 110M row table

  8. What is Apache Phoenix? Lightweight 3. No additional servers required o 100% Java Completed o

  9. HBase Cluster Architecture

  10. HBase Cluster Architecture Phoenix

  11. HBase Cluster Architecture Phoenix Phoenix

  12. What is Apache Phoenix? Integration-friendly 4. Map to existing HBase table o Integrate with Apache Pig Completed o Integrate with Apache Flume o Integrate with Apache Sqoop (wip) o

  13. What is Apache Phoenix? Turns HBase into a SQL database 1. Fastest way to access HBase data 2. Lightweight 3. Completed Integration-friendly 4.

  14. Why is Phoenix so fast? Completed

  15. Why is Phoenix so fast? HBase 1. Fast, but “dumb” (on purpose) o Data model 2. Completed Support for composite primary key o Binary data sorts naturally o Client-side parallelization 3. Push down 4. Custom filters and coprocessors o

  16. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table

  17. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table Column Family A Column Family B

  18. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3

  19. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue

  20. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 2 KeyValue KeyValue

  21. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 2 KeyValue KeyValue Row Key 3 KeyValue

  22. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table HBase Table Column Family A Column Family B Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 1 Value Row Key 2 KeyValue KeyValue Row Key 2 Value Value Row Key 3 KeyValue Row Key 3 Value

  23. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table HBase Table HBase Table Column Family A Column Family B Column Family A Column Family B Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Qualifier 1 Qualifier 2 Qualifier 3 Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 1 Value Row Key 1 Value Row Key 2 KeyValue KeyValue Row Key 2 Value Value Row Key 2 Value Value Row Key 3 KeyValue Row Key 3 Value Row Key 3 Value

  24. Phoenix Data Model Phoenix maps HBase data model to the relational world HBase Table HBase Table HBase Table Column Family A Column Family B Column Family A Column Family B Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Qualifier 1 Qualifier 2 Qualifier 3 Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 1 Value Row Key 1 Value Row Key 2 KeyValue KeyValue Row Key 2 Value Value Row Key 2 Value Value Row Key 3 KeyValue Row Key 3 Value Row Key 3 Value Multiple Versions

  25. Phoenix Data Model Phoenix maps HBase data model to the relational world Phoenix Table HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 2 KeyValue KeyValue Row Key 3 KeyValue

  26. Phoenix Data Model Phoenix maps HBase data model to the relational world Phoenix Table HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 2 KeyValue KeyValue Row Key 3 KeyValue Key Value Columns

  27. Phoenix Data Model Phoenix maps HBase data model to the relational world Phoenix Table HBase Table Column Family A Column Family B Qualifier 1 Qualifier 2 Qualifier 3 Row Key 1 KeyValue Row Key 2 KeyValue KeyValue Row Key 3 KeyValue Primary Key Constraint Key Value Columns

  28. Example Over metrics data for servers with a schema like this: SERVER METRICS HOST VARCHAR Row Key DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER CPU_TIME INTEGER IO_TIME INTEGER

  29. Example Over metrics data for servers with a schema like this: SERVER METRICS HOST VARCHAR DATE DATE RESPONSE_TIME INTEGER GC_TIME INTEGER Key Values CPU_TIME INTEGER IO_TIME INTEGER

  30. Example DDL command looks like this: CREATE TABLE SERVER_METRICS ( HOST VARCHAR, DATE DATE, RESPONSE_TIME INTEGER, GC_TIME INTEGER, CPU_TIME INTEGER, IO_TIME INTEGER, CONSTRAINT pk PRIMARY KEY ( HOST , DATE ))

  31. Example With data that looks like this: SERVER METRICS HOST + DATE RESPONSE_TIME GC_TIME SF1 1396743589 1234 SF1 1396743589 8012 … SF3 1396002345 2345 SF3 1396002345 2340 SF7 1396552341 5002 1234 … Row Key

  32. Example With data that looks like this: SERVER METRICS HOST + DATE RESPONSE_TIME GC_TIME SF1 1396743589 1234 SF1 1396743589 8012 … SF3 1396002345 2345 SF3 1396002345 2340 SF7 1396552341 5002 1234 … Key Values

  33. Phoenix Push Down: Example SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ Completed GROUP BY host

  34. Phoenix Push Down: Example SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ Completed GROUP BY host

  35. Phoenix Push Down: Example SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ Completed GROUP BY host

  36. Phoenix Push Down: Example SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ Completed GROUP BY host

  37. Phoenix Push Down: Example SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ Completed GROUP BY host

  38. Phoenix Push Down 1. Skip scan filter 2. Aggregation 3. TopN 4. Hash Join

  39. Phoenix Push Down: Skip scan SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ GROUP BY host

  40. Phoenix Push Down: Skip scan R 1 R 2 Completed R 3 R 4

  41. Phoenix Push Down: Skip scan Client-side parallel scans R 1 scan 1 R 2 Completed scan 2 R 3 scan 3 R 4

  42. Phoenix Push Down: Skip scan Server-side filter SKIP Completed

  43. Phoenix Push Down: Skip scan Server-side filter INCLUDE Completed

  44. Phoenix Push Down: Skip scan Server-side filter SKIP Completed

  45. Phoenix Push Down: Skip scan Server-side filter INCLUDE Completed

  46. Phoenix Push Down: Skip scan Server-side filter SKIP

  47. Phoenix Push Down: Skip scan Server-side filter INCLUDE

  48. Phoenix Push Down: Skip scan Server-side filter INCLUDE INCLUDE INCLUDE

  49. Phoenix Push Down: Aggregation SELECT host, avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() – 7 AND host LIKE ‘SF%’ GROUP BY host

  50. Phoenix Push Down: Aggregation Aggregate on server-side SERVER METRICS SERVER METRICS HOST DATE KV 1 KV 2 KV 3 HOST AGGREGATE VALUES SF1 Jun 2 10:10:10.234 239 234 674 SF1 3421 SF1 Jun 3 23:05:44.975 23 234 SF2 2145 SF1 Jun 9 08:10:32.147 256 314 341 SF3 9823 SF1 Jun 9 08:10:32.147 235 256 SF1 Jun 1 11:18:28.456 235 23 SF1 Jun 3 22:03:22.142 234 314 SF1 Jun 3 22:03:22.142 432 234 256 SF2 Jun 1 10:29:58.950 23 432 SF2 Jun 2 14:55:34.104 314 876 23 SF2 Jun 3 12:46:19.123 256 234 314 SF2 Jun 3 12:46:19.123 432 SF2 Jun 8 08:23:23.456 876 876 235 SF2 Jun 1 10:31:10.234 234 234 876 SF3 Jun 1 10:31:10.234 432 432 234 SF3 Jun 3 10:31:10.234 890 SF3 Jun 8 10:31:10.234 314 314 235 SF3 Jun 1 10:31:10.234 256 256 876 SF3 Jun 1 10:31:10.234 235 234 SF3 Jun 8 10:31:10.234 876 876 432 SF3 Jun 9 10:31:10.234 234 234 SF3 Jun 3 10:31:10.234 432 276 … … … … …

Recommend


More recommend