Data Modeling in the NoSQL World

  1. Data Modeling in the NoSQL World. By: Ashutosh Kale, Adham Kamel, Jordan Mercado, Kevin Kim, Pratyusha Pogaru, Edgar Velazquez

  2. Link to paper: https://hal.archives-ouvertes.fr/hal-01611628/document
     Parts & names:
     1. Introduction & NoSQL Data Models (1) - Adham
     2. The NoAM data model (2) - Edgar, Pratyusha
     3. System-independent design of NoSQL databases with NoAM (2) - Ashutosh, Jordan
     4. Related Works & Conclusion (1) - Kevin

  3. Introduction
     - NoSQL systems are an effective way to manage large sets of data across multiple servers
     - Interest: they support next-generation web technologies where relational DBMSs do not
       - Data has a structure that does not fit the typical RDBMS
       - Access to data is based on read-write operations
       - Quality requirements include scalability, performance, and consistency
     - Main categories of NoSQL systems
       - Key-value stores
       - Document stores
       - Extensible record stores

  4. Key-Value Stores
     - Example: Oracle NoSQL
     - The database is a schemaless collection of key-value pairs; operations can access a single key-value pair or a group of related pairs
     - Keys are structured and contain both a major key and a minor key
       - Major key: non-empty sequence of strings
       - Minor key: sequence of strings (possibly empty)
       - Component: each element of a key
       - '/' separates key components
       - '-' separates the major key from the minor key
     - The distinction between major and minor keys is important for controlling data distribution and sharding
     - Value: uninterpreted binary string
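
The key structure described above can be sketched in a few lines of Python. This is only an illustration of the slide's separator convention, not Oracle NoSQL client code; the function name `format_key` and the sample components (`Player`, `mary`, `username`) are hypothetical.

```python
def format_key(major, minor=()):
    """Render a structured key as a path string.

    Assumption: components are joined with '/', and a lone '-' component
    marks the boundary between the major and the minor key.
    """
    major = list(major)
    minor = list(minor)
    if not major:
        # The major key must be a non-empty sequence of strings.
        raise ValueError("major key must contain at least one component")
    path = "/" + "/".join(major)
    if minor:
        path += "/-/" + "/".join(minor)
    return path


# Hypothetical example: one pair of the group that shares the major key.
key = format_key(["Player", "mary"], ["username"])
```

Pairs that share a major key form a group of related pairs, which is what lets the store keep them on the same shard.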

  5. Key-Value Stores
     - Two common representations of aggregates
       1. Representation using a single key-value pair
          - The major key is the aggregate identifier
          - The value is the complex value of the aggregate
       2. Representation using multiple key-value pairs
          - The aggregate is split into parts, each represented by a distinct key-value pair
          - The major key of every part is the aggregate identifier
          - The minor key identifies the individual part within the aggregate

  6. Document Stores
     - Example: MongoDB
     - The database is a set of documents, each having a complex structure and value
     - Each document is a set of attribute-value pairs, whose values can be simple values, lists, or nested documents
     - Documents are schemaless, so each document can have its own attributes, defined at runtime
     - Main document: a top-level document with a unique identifier, held in the "_id" attribute, whose value is of type ObjectId
     - An aggregate is represented by a single document
       - The document ID is the aggregate identifier
       - The content is the complex value of the aggregate in JSON/BSON

  7. Extensible Record Stores
     - Example: Amazon DynamoDB
     - The database is a set of tables; each table is a set of rows, and each row contains a set of columns
     - Rows in a table are not required to have the same attributes
     - Operations to access data typically work on individual rows
     - Each table designates a primary key
       - Composed of a partition key and an optional sort key
     - An aggregate can be represented by a record/row/item
       - The primary key (partition key) is the aggregate identifier
       - The item can have a distinct attribute-value pair for each attribute of the aggregate's value

  8. The NoAM Data Model
     - NoAM stands for NoSQL Abstract Model
     - System-independent data model for NoSQL databases
     - Intended to support scalability, performance, and consistency

  9. In most NoSQL databases, the distribution unit is:
     1. A group of related key-value pairs, in key-value stores
     2. A document, in document stores
     3. A record/row/item, in extensible record stores
     NoAM models this distribution unit as a BLOCK

  10. Blocks
      - A block represents a maximal data unit for which atomic, efficient, and scalable access operations are provided
      - In NoSQL databases it is easy to manipulate one block at a time, but operations that span multiple blocks, such as joins, are problematic

  11. Within a distribution unit, NoSQL databases can access:
      (i) an individual key-value pair, in key-value stores
      (ii) a field, in document stores
      (iii) a column, in extensible record stores
      NoAM calls each of these smaller access units an ENTRY. Collections keep their name: COLLECTIONS

  12. We can now summarize the NoAM characteristics
      - A database is a set of collections. Each collection has a distinct name.
      - A collection is a set of blocks. Each block in a collection is identified by a block key, which is unique within that collection.
      - A block is a non-empty set of entries. Each entry is a pair <ek, ev>, where ek is the entry key (which is unique within its block) and ev is its value.
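
The three rules above translate almost directly into data structures. The following is a minimal in-memory sketch, assuming Python dictionaries are an acceptable stand-in for sets of uniquely keyed items; the class and method names are hypothetical and not part of NoAM itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Block:
    """A non-empty set of entries <ek, ev>; dict keys keep ek unique."""
    entries: Dict[str, Any]

    def __post_init__(self):
        if not self.entries:
            raise ValueError("a block must contain at least one entry")


@dataclass
class Collection:
    """A named set of blocks, each identified by a unique block key."""
    name: str
    blocks: Dict[str, Block] = field(default_factory=dict)

    def put(self, block_key: str, block: Block) -> None:
        if block_key in self.blocks:
            raise KeyError(f"block key {block_key!r} already used in {self.name}")
        self.blocks[block_key] = block


@dataclass
class Database:
    """A set of collections with distinct names."""
    collections: Dict[str, Collection] = field(default_factory=dict)

    def collection(self, name: str) -> Collection:
        # Create the collection on first use, so names stay distinct.
        if name not in self.collections:
            self.collections[name] = Collection(name)
        return self.collections[name]


# Hypothetical sample data.
db = Database()
players = db.collection("Player")
players.put("mary", Block({"username": "mary", "firstName": "Mary"}))
```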

  13. Representation of aggregates in the NoAM model

  14. Another way to represent a NoAM database

  15. System-independent design of NoSQL databases with NoAM
      - The main goal of NoAM is to support a design methodology for NoSQL databases that is independent of any specific system
      - By abstracting features common to NoSQL systems (data access units & distribution units), we can design an intermediate, system-independent representation of data
      - This eases the design process & helps support the scalability and consistency qualities of the DB

  16. System design following the NoAM approach uses these steps:
      1. Conceptual data modeling & aggregate design: identifying necessary entities and relationships & grouping related entities into aggregates
      2. Aggregate partitioning & high-level NoSQL database design: partitioning aggregates into smaller data elements and then mapping to the NoAM intermediate data model
      3. Implementation: mapping the intermediate data representation to the specific features of a target database system

  17. Conceptual data modeling & aggregate design
      - Following domain-driven design (as in the paper's running example), the first step is to design a UML class diagram defining the entities, value objects, and relationships of the application
      - Next, group entities and value objects into aggregates based on data access patterns and scalability/consistency needs
      - Aggregates should be designed as units for which atomicity can be guaranteed

  18. Properties of good aggregate design
      - Each aggregate should be large enough to include all the data required by a relevant data access operation, but otherwise as small as possible
        - Small aggregates reduce concurrency collisions and support performance and scalability requirements
      - Each aggregate should include all the data involved in some integrity constraint or rule
        - This supports strong consistency/atomicity of update operations

  19. Data representation in NoAM
      - In the NoAM example:
        - Each class of aggregates is represented by a distinct collection
        - Each individual aggregate is represented by a block
      - This works because aggregates and blocks are both units of data access & distribution, at different abstraction levels
      - Thus, aggregates receive the same operational benefits (scalability, efficiency, atomicity) as blocks

  20. In general...
      - A dataset of aggregates can be represented in a NoAM database in many different ways
      - Examples include:
        - Entry per Aggregate Object (EAO): each individual aggregate is represented using a single entry
        - Entry per Top-level Field (ETF): each aggregate is represented by multiple entries, one per top-level field
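
To make the two strategies concrete, here is a small Python sketch that turns one aggregate into the entries of a block under each strategy. The aggregate contents and the single entry key `"value"` used for EAO are assumptions made for illustration only.

```python
def to_eao(aggregate, entry_key="value"):
    """Entry per Aggregate Object: the whole aggregate is one entry.

    The entry key name is a placeholder chosen for this sketch.
    """
    return {entry_key: aggregate}


def to_etf(aggregate):
    """Entry per Top-level Field: one entry per top-level field.

    Nested structures (lists, sub-objects) stay whole inside their entry.
    """
    return {field_name: value for field_name, value in aggregate.items()}


# Hypothetical aggregate with two simple fields and one nested list.
player = {
    "username": "mary",
    "firstName": "Mary",
    "games": [{"game": "Game:2345", "opponent": "Player:rick"}],
}

eao_block = to_eao(player)   # 1 entry
etf_block = to_etf(player)   # 3 entries
```

With EAO every read or write touches the entire aggregate; with ETF an operation can read or update `games` without touching the other fields.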

  21. EAO vs. ETF (figure panels: ETF, EAO)

  22. Aggregate partitioning
      - Aggregate partitioning is usually based on the following guidelines
        - If an aggregate is small, or all/most of its data are accessed or modified together, it should be represented by a single entry
        - If an aggregate is large and there are operations that access or modify specific portions of it, it should be partitioned into multiple entries
        - Data elements should belong to the same entry if they are usually accessed or modified together
        - Data elements should belong to distinct entries if they are usually accessed or modified separately
      - The access path (the sequence of steps needed to reach an element) determines which data elements are accessed/modified together
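
One simple way to apply the "accessed together / accessed separately" guidelines is to group fields that appear in exactly the same set of operations. The sketch below is a swapped-in heuristic for illustration, not the procedure from the paper; the field names and operations are hypothetical.

```python
from collections import defaultdict


def partition_fields(fields, operations):
    """Group fields by access signature.

    Two fields land in the same entry when exactly the same operations
    touch them, i.e. they are always accessed or modified together.
    `operations` is a list of sets of field names.
    """
    groups = defaultdict(list)
    for field_name in fields:
        signature = frozenset(
            index for index, op in enumerate(operations) if field_name in op
        )
        groups[signature].append(field_name)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical workload over a player aggregate.
fields = ["id", "firstName", "lastName", "score", "games"]
operations = [
    {"id", "firstName", "lastName"},           # show profile
    {"score"},                                 # update score
    {"games"},                                 # append a game
    {"id", "firstName", "lastName", "score"},  # show leaderboard row
]

entries = partition_fields(fields, operations)
```

Here the three profile fields end up in one entry, while `score` and `games` each get their own entry because some operation touches them alone.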

  23. General implementation
      - The mapping from the intermediate representation to a specific system differs slightly for each type of NoSQL system (key-value, document, extensible record)
      - The NoAM intermediate model for each example is described in Figure 8

  24. Key-Value Store Implementation: Oracle NoSQL
      - In the Oracle NoSQL example, each entry is represented by a key-value pair
      - The key is composed of a major key (collection name & block key) and a minor key (an encoding of the access path)
      - The major key controls data distribution and sharding
      - The value can be a simple value or a formatted entry (JSON)
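
The mapping just described can be sketched without any database client. This example assumes each entry key is a single path component and that values are stored as JSON-encoded bytes; the function name `block_to_kv_pairs` and the sample data are hypothetical, and a real application would pass the resulting pairs to its store's client library.

```python
import json


def block_to_kv_pairs(collection, block_key, entries):
    """Map one NoAM block to key-value pairs.

    Major key = collection name + block key (shared by all pairs of the
    block, so they are distributed together); minor key = entry key.
    """
    pairs = {}
    for entry_key, entry_value in entries.items():
        key = f"/{collection}/{block_key}/-/{entry_key}"
        # Values are opaque bytes to the store; JSON is one possible format.
        pairs[key] = json.dumps(entry_value).encode("utf-8")
    return pairs


# Hypothetical block with one simple and one structured entry.
kv_pairs = block_to_kv_pairs(
    "Player",
    "mary",
    {"username": "mary", "games": [{"game": "Game:2345"}]},
)
```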

  25. Extensible Record Store Implementation: DynamoDB
      - In the DynamoDB example, a distinct table represents each collection, and an individual item represents each block
      - The collection name becomes the table name, the block key becomes the table's primary key, and the set of entries in a block becomes the set of attribute-value pairs in the item
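
A minimal Python sketch of this mapping follows. It only builds the table name and item as plain data and does not call any AWS API; the primary key attribute name `id`, the function name, and the sample block are assumptions.

```python
def block_to_item(collection, block_key, entries, key_attribute="id"):
    """Map one NoAM block to (table name, item).

    The block key is stored under the primary key attribute; every entry
    becomes one attribute-value pair of the item.
    """
    if key_attribute in entries:
        # An entry must not overwrite the primary key attribute.
        raise ValueError(f"entry key {key_attribute!r} clashes with the primary key")
    item = {key_attribute: block_key}
    item.update(entries)
    return collection, item


# Hypothetical block.
table_name, item = block_to_item(
    "Player", "mary", {"username": "mary", "firstName": "Mary"}
)
```

Because rows in a table need not share attributes, blocks with different entry keys can live in the same table without any schema change.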

  26. Document Store Implementation: MongoDB
      - In the MongoDB example, a distinct MongoDB collection represents each collection of blocks, and an individual document represents each block
      - The block collection name becomes the MongoDB collection name, the block key becomes the special "_id" field of the document, and each entry in the block becomes a field of the document
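
The document-store mapping can be sketched the same way. The example below only constructs the document as a Python dictionary and assumes the block key is used directly as the `_id` value; the function name and sample data are hypothetical, and no MongoDB driver is involved.

```python
def block_to_document(collection, block_key, entries):
    """Map one NoAM block to (MongoDB collection name, document).

    The block key fills the "_id" field; each entry fills one field.
    """
    if "_id" in entries:
        raise ValueError('entry key "_id" is reserved for the block key')
    document = {"_id": block_key}
    document.update(entries)
    return collection, document


# Hypothetical block with a nested entry value.
collection_name, document = block_to_document(
    "Player",
    "mary",
    {"username": "mary", "games": [{"game": "Game:2345"}]},
)
```

Nested entry values need no special handling here, since a document field can itself hold lists and sub-documents.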
