CS 61: Database Systems MongoDB Schema Design Adapted mongodb.com unless otherwise noted
Agenda 1. Data relationships 2. Accessing embedded documents 2
The big schema design question is whether to embed documents or normalize Embedded vs normalized Embedded Normalized Students Grades for each class taken Each document in a collection has Each entity stands alone embedded documents Query second collection for details on primary collection 3
Embedded data model moves all fields into one document > db.Students.find().pretty() { Also known as a denormalized data model "_id" : ObjectId(”ABC"), "name" : "Alice", Allows applications to store related pieces "year" : 20, of information in the same database record "GPA" : 3.5, "grades" : [ Improves read performance { "class" : "CS1", Result is fewer database queries and "grade" : "A" updates (no joins needed) }, { "class" : "CS10", Writes to documents are atomic "grade" : "A-" } ] } Use when: • Have a “contains” or “has” relationship between entities • 1:M relationship when M ≲ 1000 and when many side will always appear with one side (not stand alone) 4 • Document must be < 16 MB in size
Normalized data model references other documents, like a relational database Normalized data model Grades collection > db.Grades.find().pretty() Like normalized { Students collection tables in RDBMS "_id" : ObjectId(”123"), > "student_id" : ObjectId(”ABC"), db.Students1.find().pretty() "class" : "CS1", { "grade" : "A" "_id" : ObjectId(”ABC"), } "name" : "Alice", "year" : 20 { } "_id" : ObjectId(”124"), "student_id" : ObjectId(”ABC"), Referential integrity "class" : "CS10", is not enforced "grade" : "A-" } Use when: • Embedding would result in duplication of data, but would not Writes are not improve read performance enough to outweigh duplication atomic across collections, but • To represent complex M:N relationships MongoDB has • To model large hierarchical datasets transactions 5 Source: https://docs.mongodb.com/manual/core/data-model-design/
1:1 relationships often suggest using embedded documents 1:1 relationships Normalized Embedded // patron collection { { _id: "joe", _id: "joe", name: "Joe Bookreader", name: "Joe Bookreader" address: { } street: "123 Fake Street", city: "Faketon", // address collection state: "MA", { zip: "12345" patron_id: "joe", //patron } street: "123 Fake Street", } city: "Faketon", Embed address into patron document • state: "MA", Now one database read gets both • zip: "12345" patron and address info vs. two reads } for normalized approach 6 Embedding is the preferred approach • Source: https://docs.mongodb.com/manual/core/data-model-design/
1:1 relationship counter-example is the subset problem, use normalized approach 1:1 relationship subset problem If you normally only { "_id": 1, need summary data "title": "The Arrival of a Train", about a movie, then "year": 1896, having plot and "plot": "A train is seen pulling into a station” fullplot means more "fullplot": "A group of people are standing in a straight line along… disk block reads "type": "movie", "directors": [ "Auguste Lumière", "Louis Lumière" ], Create separate "imdb": { collection for movie "rating": 7.3, "votes": 5043, "id": 12 details }, Easily store multiple "countries": [ "France" ], values in array "genres": [ "Documentary", "Short" ], Leave summary fields "tomatoes": { Would require in main collection "viewer": { multiple tables and "rating": 3.7, "numReviews": 59 JOINs in RDBMS Only read details } when needed } 7 Source: https://docs.mongodb.com/manual/core/data-model-design/
1:M relationships: embed documents if number of embedded document is small 1:M embedded relationships Max document size is 16MB Normalized Embedded // patron collection { "_id": "joe", { _id: "joe", "name": "Joe Bookreader", name: "Joe Bookreader” } "addresses": [ { "street": "123 Fake Street", // address collection "city": "Faketon", { patron_id: "joe", //patron "state": "MA", street: "123 Fake Street", "zip": "12345” }, city: "Faketon", state: "MA", { "street": "1 Some Other Street", zip: "12345” } "city": "Boston", "state": "MA", { patron_id: "joe", //patron "zip": "12345” } street: "1 Some Other Street", ] city: "Boston", } state: "MA", All addresses read in with one read of document • zip: "12345 } No need for a JOIN operation to get addresses • Subset problem applies here too • 8 Use if address does not need to stand alone • Source: https://docs.mongodb.com/manual/core/data-model-design/
1:M relationships: use normalized references to avoid duplication 1:M normalized relationships //publisher collection { _id: "oreilly", //books collection name: "O'Reilly Media", { title: "MongoDB: The Definitive Guide", founded: 1980, author: [ "Kristina Chodorow", "Mike Dirolf"], location: "CA” } published_date: ISODate("2010-09-24"), //books collection pages: 216, { _id: 123456789, language: "English", title: "MongoDB: The Definitive Guide", publisher: { name: "O'Reilly Media", author: [ "Kristina Chodorow", "Mike Dirolf"], founded: 1980, location: "CA” } published_date: ISODate("2010-09-24"), } pages: 216, language: "English", { title: "50 Tips and Tricks for MongoDB ", publisher_id: "oreilly” } author: "Kristina Chodorow", { _id: 234567890, published_date: ISODate("2011-05-06"), title: "50 Tips and Tricks for MongoDB ", pages: 68, author: "Kristina Chodorow", language: "English”, published_date: ISODate("2011-05-06"), publisher: { name: "O'Reilly Media", pages: 68, founded: 1980, location: "CA” } language: "English", } 9 publisher_id: "oreilly"} Source: https://docs.mongodb.com/manual/core/data-model-design/
M:N relationships can be easily implemented with two-way referencing M:N Two collections One person is assigned many tasks db.person.findOne() One task is assigned to many people { _id: ObjectId(”ABC"), name: ”Alice", tasks [ // Alice is assigned three tasks ObjectId(”123"), //write lesson plan below ObjectId(”124"), //another task ObjectId(”125") //Alice’s third task Create array of references ] • Person to task } • Task to person db.tasks.findOne() { _id: ObjectID(”123"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), assigned: [ObjectId(”ABC") // Reference to Alice ObjectId(“DEF”) //Reference to another person assigned to this task ] } Advantage: Easy to find who is assigned to tasks, and which tasks a person is assigned • Disadvantage: If person added to removed from task, must update two tables • 10 Adapted from https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
Sometimes it is useful to denormalize M:N Two collections One person is assigned many tasks db.person.findOne() One task is assigned to many people { _id: ObjectId(”ABC"), name: ”Alice", tasks [ // Alice is assigned three tasks ObjectId(”123"), //write lesson plan below ObjectId(”124"), //another task Denormalize to include person’s name ObjectId(”125") //still another task in tasks collection of assigned people ] Now do not need to look up the } names of people assigned to tasks db.tasks.findOne() { _id: ObjectID(”123"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), assigned: [{person _id: ObjectId(”ABC"), name: “Alice”}, // now have Alice’s name {person_id: ObjectId(“DEF”), name: “Bob”} //also have Bob’s name Use denormaliztion if many ] more reads than writes } Advantage: No need to lookup people’s name when finding tasks • Do not denormalize something that changes frequently! Disadvantage: If Alice’s name changes, must update person collection and all entries in task collection • 11 Adapted from https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
Recommend
More recommend