Continuous Database Evolution Prof. Dr. Uta Störl Darmstadt University of Applied Sciences Application version n Application version n + … Schema Management Schema Management Schema Evolution Data Migration February 2019
Motivation • Agile software development with frequent schema changes (weekly up to daily!) Schema-flexible NoSQL databases Application version n Application version n + … • However, how to migrate variational data in the productive database? – State of the art: Within the application code Optional schema management for NoSQL database systems necessary! Uta Störl, Darmstadt University of Applied Sciences 2
Remark: NoSQL Database Are Schema-Free – A ren’t They? NoSQL DBMS NoSQL DBMS NoSQL DBMS without native with optional with mandatory schema support schema support schema Couchbase, CouchDB, Neo4J, … MongoDB, OrientDB, ArangoDB , … Cassandra, … Uta Störl, Darmstadt University of Applied Sciences 3
Schema Management for NoSQL Databases Application Application Version n + 1 Version n • Forward Engineering Forward Engineering – Schema Creation Schema Schema Version n Version n+1 • Reverse Engineering – Schema Overview – Data Exploration Uta Störl, Darmstadt University of Applied Sciences 4
Forward Engineering: Schema Creation Create • JSON Schema • Proprietary schema formats (e.g. Mongoose for MongoDB, Ottoman for Couchbase, …) Uta Störl, Darmstadt University of Applied Sciences 5
Schema Management for NoSQL Databases Application Application Version n + 1 Version n • Forward Engineering Forward Engineering – Schema Creation Schema Schema Version n Version n+1 • Reverse Engineering – Schema Overview – Data Exploration Uta Störl, Darmstadt University of Applied Sciences 6
Reverse Engineering: Schema Overview Uta Störl, Darmstadt University of Applied Sciences 7
Reverse Engineering: Data Exploration Uta Störl, Darmstadt University of Applied Sciences 8
Tools for NoSQL Schema Management (Selection) Multi Data Store Tools • Hackolade – Support for MongoDB, Couchbase, Elasticsearch, HBase, Cassandra & Datastax , DynamoDB, Cosmos DB, Avro, Hive, and Hbase – Forward- and Reverse-Engineering (available in Professional edition only) – https://hackolade.com/ • erwin DM NoSQL – Support for MongoDB and Couchbase – Forward- and Reverse-Engineering (available in Professional edition only) – https://erwin.com/products/erwin-dm-nosql/ Uta Störl, Darmstadt University of Applied Sciences 9
Tools for NoSQL Schema Management (Selection) Single Data Store Tools (MongoDB) • MongoDB Compass – (Reverse-Engineering) available free of charge – https://www.mongodb.com/products/compass • Studio 3T – (Reverse-Engineering) (available in Pro edition only) – https://studio3t.com/ Uta Störl, Darmstadt University of Applied Sciences 10
Tools for NoSQL Schema Management (Selection) Research Prototypes (Multi Data Store Tools) • NoSQL Data Engineering Project NoSQL DEP – Support for MongoDB and CouchDB – Reverse Engineering – University of Murcia, Spain: https://www.researchgate.net/project/NoSQL-Data-Engineering – Source: https://github.com/catedrasaes-umu/NoSQLDataEngineering/ • Darwin: Schema Management for NoSQL Database Systems – Support for MongoDB and Couchbase – Reverse Engineering – Darmstadt University of Applied Sciences, University of Rostock, OTH Regensburg, Germany: https://fbi.h-da.de/personen/uta-stoerl/dfg-projekt-nosql-schema-evolution/ – https://www.researchgate.net/project/Darwin-Schema-Management-in-NoSQL- Databases Uta Störl, Darmstadt University of Applied Sciences 11
NoSQL Schema Management: So Far, So Good • We are able to – Define schemas and validate data (Forward Engineering) – Extract a schema overview and explore data (Reverse Engineering) • However, what about the heterogeneous data (due to different application releases, for example) in the NoSQL database? ??? Uta Störl, Darmstadt University of Applied Sciences 12
Continuous Database Evolution • (Optional) schema management for NoSQL databases Application version n Application version n + … Schema Management Schema Management Schema Evolution Data Migration Two main tasks – Schema evolution management – Data migration Uta Störl, Darmstadt University of Applied Sciences 13
Approaches to Realize Data Migration • Custom-coded Migration Scripts – Error-prone and expensive • Using Object-NoSQL-Mapper Annotations – Easy to realize for simple evolution operations like add, delete, and rename (e.g. @AlsoLoad) – More expensive for complex operations like split, merge, copy, and move (coding @PostLoad methods) Uta Störl, Darmstadt University of Applied Sciences 14
Approaches to Realize Data Migration Application Application Version n + 1 Version n • Forward Engineering Forward Engineering – Schema Creation Schema Schema Evolution Operations Version n Version n+1 • Reverse Engineering – Schema Overview – Data Exploration Advanced Reverse Engineering – Schema Version Extraction Uta Störl, Darmstadt University of Applied Sciences 15
Example from Marine Biology • JSON datasets for Species classification of the Baltic Sea and observation Protocols entity type Species {"id": 123, {"id": 124, {"id": 125, {"id": 126, "name": "Mya arenaria", "name": "Abra prismatica", "name": "Abra alba", "name": "Abra aequalis", "ts": 1} "ts": 3, "ts": 5, "ts": 7, " category ": 141436} " WoRMS ": 141433} "WoRMS": 293683} entity type Protocols {"id": 900, {"id": 901, {"id": 902, {"id": 903, "time": "2017-07-21", "time": "2017-07-21", "time": "2017-07-23", "time": "2017-07-24", "location": "location": "location": "location": {"x":19.863281, {"x":19.863285, {"x":19.863281, {"x":19.863285, "y":58.487952, "y":58.487952, "y":58.487961, "y":58.487952, "z":-1400}, "z":-1400}, "z":-1350}, "z":-1400}, "spec_id": 123, "spec_id": 123, "spec_id": 125, "spec_id": 126, "ts": 2}, "ts": 4} "ts": 6} "ts": 8, " WoRMS ": 293683} WoRMS: World Register of Marine Species Uta Störl, Darmstadt University of Applied Sciences 16
Short Excursion: Schema Version Extraction Step 1 - Building the Schema Version Graphs Species [1,3,5,7] id name ts category WoRMS [1,3,5,7] [1,3,5,7] [1,3,5,7] [3] [5,7] type: number type: string type: number type: number type: number Protocols [2,4,6,8] WoRMS id ts time location spec_id [8] [2,4,6,8] [2,4,6,8] [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number type: number type: object type: number x y z [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number Uta Störl, Darmstadt University of Applied Sciences 17
Short Excursion: Schema Version Extraction Step 2 - Deriving Schema Evolution Operations Species 2 [1,3,5,7] add integer Species.category id name ts category WoRMS 3 [1,3,5,7] [1,3,5,7] [1,3,5,7] [3] [5,7] rename Species.category to WoRMS type: number type: string type: number type: number type: number 2 3 3 4 or delete Species.category Protocols [2,4,6,8] add Species.WoRMS 4 WoRMS id ts time location spec_id [8] [2,4,6,8] [2,4,6,8] add integer Protocols.WoRMS [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number type: number type: object type: number or 4 x y z copy Species.WoRMS to Protocols.WoRMS [2,4,6,8] [2,4,6,8] [2,4,6,8] type: number type: number type: number where Species.id = Protocols.spec_id Uta Störl, Darmstadt University of Applied Sciences 18
Short Excursion: Schema Version Extraction Step 3 - Resolving Ambiguities • Interactively resolving ambiguous schema evolution operations: – alternative schema evolution operations – specifying join conditions for move or copy operations • Open questions – Automated choice in case of ambiguities – Suggestion of meaningful join conditions • Approach to a solution – Algorithm for deriving inclusion dependencies from NoSQL datasets proposed in [Klettke et al. 2017] Uta Störl, Darmstadt University of Applied Sciences 19
Advanced Reverse Engineering: Schema Version Extraction Uta Störl, Darmstadt University of Applied Sciences 20
Advanced Reverse Engineering: Schema Version Extraction Uta Störl, Darmstadt University of Applied Sciences 21
Continuous Database Evolution • (Optional) schema management for NoSQL databases Application version n Application version n + … Schema Management Schema Management Schema Evolution Data Migration Two main tasks – Schema evolution management – Data migration as safe process based on schema evolution management Uta Störl, Darmstadt University of Applied Sciences 22
Recommend
More recommend