Do’s and Don’ts of a Hybrid Environment MySQL and MongoDB
Introduction I’m Rick Vasquez a TAM at Percona. I have all three types of customers: MongoDB only 1. MySQL only 2. MongoDB and MySQL hybrid 3.
The “Do’s” of a Hybrid environment Use the Right tool for the right job 1. Examine the use-case prior to choosing a database technology 2. Scale your database with built-in features 3. Understand the difference between Consistency vs Eventual Consistency 4.
Right tool for the right job It’s better to use the right DB for the right use-case than manage the wrong DB for the same use-case: Phillips head screwdriver vs flat head screwdriver 1. http://www.howstuffcompares.com/doc/s/slotted-screwdriver-vs-philips-screwdriver .htm MongoDB for highly relational structures vs MySQL for sharded environments 2.
Examine your use-case Are you storing a high volume of data and not reading it? 1. a. What kind of storage engine do you need? What degree of consistency do you need? 2. a. Where are the writes/reads happening and what do the answers need to look like? Is some of your data used more? 3. a. How many data hotspots could you have? Is the data relational in nature? 4. a. Strong relational datasets b. Relationships exist c. No relationships
Scale your DB with Built-in features High write/low read concurrency? 1. a. MongoRocks High OLTP workload? 2. a. MySQL Partitioning and synchronous replication options Logically distributed data? 3. a. MongoDB sharding and balancing for horizontal scaling Non-production application load? 4. a. MongoDB - Hidden Secondaries b. MySQL - Asyncronous/semi-syncronous slaves
What is consistency? When a Database is considered “Consistent” it means that all of the nodes contain the same data and asking the same question will result in the same answer. Eventual consistency will eventually give you the same answer, but until the replication has been applied it could be different MongoDB with any secondary reads is by nature always Eventually Consistent, there is no way to configure it otherwise MySQL by nature is consistent only on the master node and can be configured to be consistent partially or throughout an entire cluster using semi-synchronous or synchronous replication.
To Be Consistent or To Be Eventually Consistent Does the application perform reads immediately after writes? • Consider MySQL to allow for built-in scalability options • Do not perform “secondaryPreferred” reads on MongoDB for these applications Does application perform reads and writes from different nodes? • MySQL has a few options for multi-master consistency - Galera Cluster - Percona XtraDB Cluster (based on Galera Cluster) - MySQL Group Replication (early in GA) Both Databases can have “strong consistency” even in asynchronous only
The Don’ts of a Hybrid environment Put both db’s in the same instance 1. Attempt to have all data reconciled in one technology as the database of 2. record Assume technologies work the same 3. Expect two database technologies to be the silver bullet 4.
Deploy the right way Do not have two database technologies fighting for the same resources 1. a. Fights over OS cache will happen b. Fights over resources will happen Docker containers with resource constraints are still not okay 2. a. Can work in a very limited scenario, still sharing underlying resources VM’s are better 3. a. VM’s typically work, but can stress a box if 2 high workload DB’s are taxing the box Separate physical hardware (even if VM’s) is best 4. a. Different physical spindles for storage b. Different memory modules c. Different physical CPU
Non-native replication Inconsistent data • columns/rows != documents Heavy ETL necessary Square peg round hole Bad things happen If you must have data together for an analytics use-case do it for the analytics use-case not to have all data in one system.
Joins are default! I can assure you that they are not. MongoDB can “join” documents, but it operates fundamentally differently than 1. a SQL join MySQL can store JSON documents (5.7), but querying them is fundamentally 2. different than a find in MongoDB Performance benchmarks are not apples to apples, test your usecase!
All your base are not belong to us Deploying NoSQL and SQL does not solve all of your problems Explore proper caching layers • Memcached • Redis Explore proper log management • Logstash • Splunk • fluentd Explore proper queue/task management • RabbitMQ • ZeroMQ
Conclusion Remember to treat your data right by: Thinking about how you will use it 1. Expect it to fit in any system 2. Scaling natively 3.
Recommend
More recommend