CloudDB: A Data Store for all Sizes in the Cloud, by Hakan Hacigumus (PowerPoint presentation)

  1. CloudDB: A Data Store for all Sizes in the Cloud
     Hakan Hacigumus, Data Management Research, NEC Laboratories America
     http://www.nec-labs.com/dm, www.nec-labs.com

  2. What I will try to cover
     - Historical perspective and motivation
     - (Preliminary) Technical Approach
     - Current Status
     - Food for Thought

     NEC Labs Data Management Research

  3. Why Data Management Research?
     - Many Data Management Technologies and Products have been around
     - Data Centers have evolved over time
     - Data Center hosting became a business
     - Database Community was successful in creating technologies and business

  4. Why Data Management (Again)?
     (Figure: the "(Good Old) Database" in the center, surrounded by six pressures)
     - New Data Types: Relational databases only manage 10-15% of the available data
     - New Data Sources: Individual users via Web 2.0 applications, social sites, collaboration, mobile devices, sensors, etc.
     - Amount of Data: Amount of business data doubles every 12-18 months
     - New Type of Apps: Highly integrated, extremely data intensive
     - Large Number of Users: Around the world, highly interconnected
     - New Usage Patterns: Around the clock, unprecedented increase and fluctuations

  5. Cloud Computing
     - A paradigm shift in how and where a workload is generated and executed
     - Cloud service provider – Cloud service consumer (Figure labels: Cloud Provider, API)
     - Market Size
       - Data Management Market ~$20B
       - IT Cloud Service ~$42B (by 2012) (IDC)

  7. Animoto on Amazon EC2
     - A no-infrastructure startup
       - Biggest piece of hardware: a (fancy) espresso machine!
     - Rapid growth: in three days, the number of users increased from 25k to 250k
       - Number of servers from 50 to 3500
       - Assume $500 per machine: $1.75M!
       - Instead, they used Amazon EC2
     - Problem: It is not trivial to distribute users’ accesses to the data by just scaling out cloud computing nodes
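
The cost figure on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide; the $500 unit price is the slide's own assumption, and the variable and function names are made up for illustration.

```python
# Figures quoted on the slide. The per-machine price is the slide's assumption.
servers_before = 50
servers_after = 3500
cost_per_machine_usd = 500

users_before = 25_000
users_after = 250_000


def purchase_cost(num_servers: int, unit_cost_usd: int) -> int:
    """Up-front cost of buying every server outright."""
    return num_servers * unit_cost_usd


total = purchase_cost(servers_after, cost_per_machine_usd)
user_growth = users_after / users_before
server_growth = servers_after / servers_before

print(f"Up-front hardware cost: ${total:,}")
print(f"User growth: {user_growth:.0f}x, server growth: {server_growth:.0f}x")
```

Buying the peak fleet outright would cost $1,750,000 under that assumption, which is the $1.75M on the slide. The same numbers also show servers growing 70x while users grew 10x.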

  8. Database-as-a-Service? ICDE 2002!
     - Reaction: Cool but…
       - Technology
       - Business Model
       - Regulations
       - Psychological Acceptance

  9. Data Management in Cloud
     - Cloud computing model may provide a platform to address new challenges
     - But the problem is:
       - Data Management Systems were not designed and implemented with cloud computing model in mind
     - So the question is:
       - What are the data management challenges we need to address before the full potential of cloud computing can be realized?

  10. Need for New Solutions
      - Massive scalability to handle
        - Very large amount of data
        - Very large number of diverse users/requests
      - Elasticity to
        - handle varying demand
        - optimize operating costs
      - Flexibility to handle different data and processing models
      - Massively multi-tenanted to achieve economies of scale
      - More intelligent system monitoring and management

  11. Cloud Data Management Challenges
      (Figure: three axes: data scalability (# of records / query, small to large), query scalability (# of queries / sec), and multi-tenancy)
      - Analytic apps (OLAP). Key challenge: scalable scan and aggregation
      - Transactional apps (OLTP). Key challenge: scalable read/write
      - Small apps. Key challenge: scalable multi-tenant hosting
      - CloudDB (ultimate goal). Key challenge: seamless data management

  12. Buy All Sizes? NO! (Figure labels: OLAP, OLTP)

  13. Buy One Size? (Figure labels: OLAP, OLTP)

  14. Let Someone Else Do All That (Figure labels: Access and Management, OLAP, OLTP)

  15. Let Someone Else Do All That
      - Easier adoption by developers (dominant force for adoption of cloud!)
      - Easier integration with applications
      - Leveraging very specialized database technologies
      - Easier and more flexible deployment options in the middleware
      - (Figure labels: Access and Management, OLAP, OLTP)

  16. Wish Lists
      - Clients
        - Standard language API (e.g., SQL)
        - Identifiable and verifiable Service Level Agreements
        - Common DBMS maintenance tasks (e.g., backup, versioning, patching, etc.)
        - Availability of value-add services, such as business analytics, information sharing, collaboration, etc.
      - Service Provider
        - Satisfying clients’ SLAs to sustain revenue
        - Great cost efficiency via high level of automation and resource sharing to ensure profitability
        - Maintaining an extendable platform for value-add services

  17. (Some) Storage Models
      - Relational
        - Main purpose: Transaction processing
        - Pro: Standardization; higher performance on Online Transaction Processing (OLTP); ACID properties
        - Con: Scalability
      - Key/Value
        - Main purpose: Scalable data storage; read/write intensive workload
        - Pro: Scalability
        - Con: Standardization; performance issues; complex query capability; ACID properties (?)
      - Column-Oriented
        - Main purpose: Analytics processing; read optimized, throughput oriented
        - Pro: Higher performance on Online Analytical Processing (OLAP); more flexible schema evolution (?)
        - Con: Standardization; complex query capability

  18. Application Scenario
      - Application v1: Personal Information Management Portal, on a Key/Value Store
        - Profile Data (User 1, User 2, ...)
        - Address, Phone, Notes, Contacts, Calendars, Reminders
      - Application v2: Portal, on a Relational Database
        - Profile Data, Products Data, Reviews Data
        - Online Shopping Catalogs, Product Reviews, Subscriptions, …
      - Very difficult migration
        - Application developers (skills, time)
        - Architects (redesign)
        - Company (investment)
      - (Figure label: External Sources)

  19. Data Model Decisions
      - Problem: Users are forced to make a decision on the data model based on the current needs of the applications
        - Is it possible to make the “right” decision all the time?
      - Problem: The developer (client) has to re-architect their application in order to take advantage of different data models
        - How easy is it to change the architecture and the implementation?
      - (Figure: application versions 1.0, 2.0, 3.0, 4.0 plotted against # of queries / sec as the workload evolves; labels: Single RDBMS, Clustering, Sharding, Key-value store)

  20. Remember Data Independence? (Figure labels: 1968, 1970)

  21. Data Independence
      - Decouple application logic from data processing
      - Let them be optimized and managed independently
      - Enabled decades of innovation and improvement in databases

  22. Data Independence
      - The application should not have to be aware of the physical organization of the data (and how it can be accessed)
      - All it needs is a logical (declarative) specification
      - CloudDB makes decisions based on application context, workload characteristics, etc.
      - (Figure: the Application sends Data Load and Query/Update requests through a SQL API to "CloudDB: A layer for data independence", which sits over a Relational Store, an Analytics Store, and a Key/Value Store; axis label: # of queries / sec)
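
The slide says CloudDB picks a store from application context and workload characteristics, but it does not say how. The following is a minimal sketch of what such a policy could look like. `WorkloadProfile`, `choose_store`, and both thresholds are hypothetical names and values chosen for illustration; none of them come from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Store(Enum):
    RELATIONAL = "relational store"
    ANALYTICS = "analytics store"
    KEY_VALUE = "key/value store"


@dataclass
class WorkloadProfile:
    queries_per_sec: float     # observed request rate
    records_per_query: float   # average number of records a query touches
    needs_acid: bool           # whether the application requires transactions


def choose_store(w: WorkloadProfile,
                 high_qps: float = 10_000,
                 large_scan: float = 100_000) -> Store:
    """Hypothetical routing policy; the thresholds are illustrative only."""
    if w.records_per_query >= large_scan:
        # Scan- and aggregation-heavy work goes to the analytics store.
        return Store.ANALYTICS
    if w.queries_per_sec >= high_qps and not w.needs_acid:
        # Very high request rates without transactional needs go to key/value.
        return Store.KEY_VALUE
    # Everything else stays on the relational store.
    return Store.RELATIONAL


reporting = WorkloadProfile(queries_per_sec=50, records_per_query=5_000_000, needs_acid=False)
profiles = WorkloadProfile(queries_per_sec=50_000, records_per_query=1, needs_acid=False)
orders = WorkloadProfile(queries_per_sec=200, records_per_query=10, needs_acid=True)

for name, w in [("reporting", reporting), ("profiles", profiles), ("orders", orders)]:
    print(name, "->", choose_store(w).value)
```

The application only describes its workload. The store choice stays inside the layer, so it can change as the workload evolves without touching application code.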

  23. Language?
      - New Breed Databases
        - CouchDB, Project Voldemort (Dynamo), Cassandra, BigTable, Tokyo Cabinet, MongoDB, SimpleDB, …
      - MapReduce/Hadoop
      - …

  24. Some Reminders about SQL
      - By far the most widely used data access language
      - It has nothing to do with
        - How the data is stored
        - How the queries are executed
        - How the transactions are handled
      - Very large number of skilled programmers
      - Huge amount of existing applications and tools
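
The point that SQL does not depend on how data is stored can be seen in a small, self-contained sketch using Python's built-in `sqlite3` module. The table name, column names, and sample rows are invented for illustration. The same query text runs unchanged before and after the physical layout changes (here, adding an index).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("a", 5), ("b", 3), ("a", 4)],
)

# The application states only what it wants, not how to fetch it.
QUERY = "SELECT product, AVG(rating) FROM reviews GROUP BY product ORDER BY product"

before = conn.execute(QUERY).fetchall()

# Change the physical organization: add an index on the grouping column.
conn.execute("CREATE INDEX idx_reviews_product ON reviews (product)")

after = conn.execute(QUERY).fetchall()

print(before == after)  # the declarative query is unaffected by the index
conn.close()
```

The engine is free to pick a different execution plan once the index exists, while the query and its result stay the same.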

  25. SQL is actually good?
      - HIVE: SQL API on top of MapReduce
      - Google BigQuery: SQL over data stored in non-relational databases
      - …

  26. CloudDB - Guiding Principles
      - Embrace heterogeneity
        - One size does not fit all
        - Leverage specialized technologies
      - Maintain and restore “declarative” nature of data processing
      - Understand and define dimensions of scalability

  27. CloudDB Middleware – Opaque vs. Transparent
      - (Figure: Applications send SQL queries and transaction patterns to the CloudDB Middleware and receive results; the middleware contains API/Language Support (SQL) and a Distributed Query Processor; labels: Opaque: Data Stores; Transparent: Consistency / Scalability, …)
      - System Independence?
      - The middleware would be responsible for making all the decisions regarding the choice of data stores, processing the queries, and end-to-end system optimization
      - While the middleware can abstract away the underlying storage systems, it should explicitly express certain essential aspects of the system, such as consistency levels and scalability of transactions
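
One way to picture the opaque/transparent split is a toy interface in which the backing stores are hidden from the caller while the consistency level is an explicit part of every read. This is only a sketch under assumptions: `ToyMiddleware`, `Consistency`, and the primary/replica setup are hypothetical and are not CloudDB's actual API or architecture.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # read reflects every acknowledged write
    EVENTUAL = "eventual"  # read may lag behind recent writes


class ToyMiddleware:
    """Hypothetical stand-in for a middleware layer, for illustration only."""

    def __init__(self):
        # Opaque part: callers never see or choose these backing stores.
        self._primary = {}
        self._replica = {}

    def put(self, key, value):
        # Writes are acknowledged by the primary; the replica catches up later.
        self._primary[key] = value

    def replicate(self):
        # Simulates asynchronous replication completing.
        self._replica = dict(self._primary)

    def get(self, key, consistency):
        # Transparent part: the caller states the consistency level it needs.
        store = self._primary if consistency is Consistency.STRONG else self._replica
        return store.get(key)


mw = ToyMiddleware()
mw.put("user:1", "Alice")
stale = mw.get("user:1", Consistency.EVENTUAL)      # replica has not caught up yet
fresh = mw.get("user:1", Consistency.STRONG)
mw.replicate()
caught_up = mw.get("user:1", Consistency.EVENTUAL)  # replica now has the write
print(stale, fresh, caught_up)
```

The caller cannot tell which store served a read, but it always knows which guarantee it asked for.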
