Security: External Authentication ● LDAP Authentication ○ Supported in PSMDB and MongoDB Enterprise ○ The following components are necessary for external authentication to work ■ LDAP Server: Remotely stores all user credentials (i.e., username and associated password). ■ SASL Daemon: Used as a MongoDB server-local proxy for the remote LDAP service. ■ SASL Library: Used by the MongoDB client and server to create authentication mechanism-specific data. ○ Creating a User: db.getSiblingDB("$external").createUser( {user: "christian", roles: [{role: "read", db: "test"}]} ); ○ Authenticating as a User: db.getSiblingDB("$external").auth({ mechanism: "PLAIN", user: "christian", pwd: "secret", digestPassword: false}) ○ Other auth methods possible with MongoDB Enterprise 22
Security: SSL Connections and Auth ● SSL / TLS Connections ○ Supported since MongoDB 2.6.x ■ May need to compile it in yourself on older binaries ■ Supported 100% in Percona Server for MongoDB ○ Minimum of 128-bit key length for security ○ Relaxed and strict (requireSSL) modes ○ System (default) or custom Certificate Authorities are accepted ● SSL Client Authentication (x.509) ○ MongoDB supports x.509 certificate authentication for use with a secure TLS/SSL connection as of 2.6.x. ○ x.509 client authentication allows clients to authenticate to servers with certificates rather than with a username and password. ○ Enabled with: security.clusterAuthMode: x509 23
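A minimal mongo shell sketch of x.509 client authentication, assuming a client certificate whose subject DN is the placeholder shown and a shell started over TLS with the matching PEM/CA files:

// The subject DN below is a placeholder; it must match the client certificate exactly
db.getSiblingDB("$external").createUser({
  user: "CN=client1,OU=clients,O=ExampleCorp,L=City,C=US",
  roles: [{role: "readWrite", db: "test"}]
});

// Connect with: mongo --ssl --sslPEMKeyFile client.pem --sslCAFile ca.pem <host>
db.getSiblingDB("$external").auth({
  mechanism: "MONGODB-X509",
  user: "CN=client1,OU=clients,O=ExampleCorp,L=City,C=US"
});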
Security: Encryption at Rest ● MongoDB Enterprise ○ Encryption supported in Enterprise binaries ($$$) ● Percona Server for MongoDB ○ Use an encrypted block device (e.g. LUKS/dm-crypt) for the data volume ○ Documentation published (or coming soon) ○ Completely open-source / free ● Application-Level ○ Selectively encrypt only the required fields in the application ○ Benefits ■ The data is only readable by the application (reduced touch points) ■ The resource cost of encryption is lower when it’s applied selectively ■ Offloads encryption overhead from the database 24
Security: Network Firewall ● MongoDB only requires a single TCP port to be reachable (on all nodes) ○ Default port 27017 ○ This does not include monitoring tools, etc ■ Percona PMM requires inbound connectivity to 1-2 TCP ports ● Restrict TCP port access to the nodes that require it! ● Sharded Cluster ○ Application servers only need access to ‘mongos’ ○ Block direct TCP access from application -> shard/mongod instances ■ Unless ‘mongos’ is bound to localhost! ● Advanced ○ Move inter-node replication to its own network fabric, VLAN, etc ○ Accept client connections on a public interface 25
More on Security (some overlap) Room: Field Suite #2 Time: Tuesday, 17:25 to 17:50 26
Monitoring
Monitoring: Methodology ● Monitor often ○ Polling every 60 - 300 seconds is not enough! ○ Problems can begin/end in seconds ● Correlate Database and Operating System together! ● Monitor a lot ○ Store more than you graph ○ Example: PMM gathers 700-900 metrics per poll ● Process ○ Use it to troubleshoot Production events / incidents ○ Iterate and improve monitoring ■ Add graphing for whatever made you SSH to a host ■ Blind QA with someone unfamiliar with the problem 28
Monitoring: Important Metrics ● Database ○ Operation counters ○ Cache Traffic and Capacity ○ Checkpoint / Compaction Performance ○ Concurrency Tickets (WiredTiger and RocksDB) ○ Document and Index scanning ○ Various engine-specific details ● Operating System ○ CPU ○ Disk ■ Bandwidth / Utilisation ■ Average Wait Time ○ Memory and Network 29
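A few mongo shell probes for the metrics above; the field names come from db.serverStatus(), and the WiredTiger section is only present when that engine is in use:

// Operation counters (insert, query, update, delete, getmore, command)
db.serverStatus().opcounters;

// WiredTiger cache usage vs. configured capacity, plus concurrency tickets
var wt = db.serverStatus().wiredTiger;
wt.cache["bytes currently in the cache"];
wt.cache["maximum bytes configured"];
wt.concurrentTransactions;

// Index keys vs. documents scanned since startup
db.serverStatus().metrics.queryExecutor;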
Monitoring: Percona PMM ● Open-source monitoring from Percona! ● Based on open-source technologies ○ Prometheus ○ Grafana ○ Go language ● Simple deployment ● Examples in this demo are from PMM! ● Correlation of OS and DB metrics ● 800+ metrics per poll 30
Architecture and High-Availability
High Availability ● Replication ○ Asynchronous ■ Write Concerns can provide pseudo-synchronous replication ■ Changelog based, using the “Oplog” ○ Maximum 50 members ○ Maximum 7 voting members ■ Use “votes: 0” for members $gt 7 ○ Oplog ■ The “oplog.rs” capped collection in “local” storing changes to data ■ Read by secondary members for replication ■ Written to by the local node after the “apply” of an operation 32
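Two quick mongo shell checks on the oplog, run against any replica set member:

// Oplog size, first/last event times and the estimated replication window
rs.printReplicationInfo();

// Newest entry in the capped collection local.oplog.rs
db.getSiblingDB("local").oplog.rs.find().sort({$natural: -1}).limit(1).pretty();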
Architecture ● Datacenter Recommendations ○ Minimum of 3 x physical servers required for High Availability ○ Ensure only 1 x member per Replica Set is on any single physical server!!! ● EC2 / Cloud Recommendations ○ Place Replica Set members in an odd number of Availability Zones in the same region ○ Use a hidden secondary node for Backup and Disaster Recovery in another region ○ Entire Availability Zones have been lost before! 33
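A sketch of configuring a hidden secondary for backup/DR, assuming the member to hide sits at index 3 of the replica set configuration (the index is a placeholder):

// A hidden member still replicates but never becomes primary and is invisible to clients
cfg = rs.conf();
cfg.members[3].priority = 0;
cfg.members[3].hidden = true;
rs.reconfig(cfg);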
Hardware
Hardware: Mainframe vs Commodity ● Databases: The Past ○ Buy some really amazing, expensive hardware ○ Buy some crazy expensive license ■ Don’t run a lot of servers due to the above ○ Scale up: ■ Buy even more amazing hardware for the monolithic host ■ Hardware came on a truck ○ HA: When it rains, it pours ● Databases: A New Era ○ Everything fails, nothing is precious ○ Elastic infrastructures (“The cloud”, Mesos, etc) ○ Scale out: add more cheap, commodity servers ○ HA: lots of cheap, commodity servers - still up! 35
Hardware: Block Devices ● Isolation ○ Run mongod dbPaths on a separate volume ○ Optionally, run the mongod journal on a separate volume ● RAID Level ○ RAID 10 == performance/durability sweet spot ○ RAID 0 == fast and dangerous ● SSDs ○ Benefit MMAPv1 a lot ○ Benefit WT and RocksDB a bit less ○ Keep about 30% free for internal GC on the SSD 36
Hardware: Block Devices ● EBS / NFS / iSCSI Risks / Drawbacks ○ Exponentially more things to break ○ Block device requests wrapped in TCP are extremely slow ○ You probably already paid for some fast local disks ○ More difficult (sometimes nearly impossible) to troubleshoot ○ MongoDB doesn’t really benefit from remote storage features/flexibility ■ Built-in High Availability of data via replication ■ MongoDB replication can bootstrap new members ■ Strong write concerns can be specified for critical data 37
Hardware: CPUs ● Cores vs Core Speed ○ Lots of cores > faster cores (4 CPUs minimum recommended) ○ Thread-per-connection model ● CPU Frequency Scaling ○ ‘cpufreq’: a daemon for dynamic scaling of the CPU frequency ○ Terrible idea for databases or any predictability! ○ Disable it or set the governor to 100% frequency always, i.e. mode: ‘performance’ ○ Disable any BIOS-level performance/efficiency tunables ○ ENERGY_PERF_BIAS ■ A CentOS/RedHat tuning for the energy vs performance balance ■ RHEL 6 = ‘performance’ ■ RHEL 7 = ‘normal’ (!) ● My advice: use ‘tuned’ to set it to ‘performance’ 38
Hardware: Network Infrastructure ● Datacenter Tiers ○ Network Edge ○ Public Server VLAN ■ Servers with Public NAT and/or port forwards from Network Edge ■ Examples: Proxies, Static Content, etc ■ Calls backends in Backend VLAN ○ Backend Server VLAN ■ Servers with port forwarding from Public Server VLAN (w/Source IP ACLs) ■ Optional load balancer for stateless backends ■ Examples: Webserver, Application Server/Worker, etc ■ Calls data stores in Data VLAN ○ Data VLAN ■ Servers, filers, etc with port forwarding from Backend Server VLAN (w/Source IP ACLs) ■ Examples: Databases, Queues, Filers, Caches, HDFS, etc 39
Hardware: Network Infrastructure ● Network Fabric ○ Try to use 10GbE for low latency ○ Use Jumbo Frames for efficiency ○ Try to keep all MongoDB nodes on the same segment ■ Goal: few or no network hops between nodes ■ Check with ‘traceroute’ ● Outbound / Public Access ○ Databases don’t need to talk to the internet* ■ Store a copy of your Yum, DockerHub, etc repos locally ■ Deny any access to the public internet or have no route to it ■ Hackers will try to upload a dump of your data out of the network!! ● Cloud? ○ Try to replicate the above with the features of your provider 40
Hardware: Why So Quick? ● MongoDB allows you to scale reads and writes with more nodes ○ Single-instance performance is important, but not a deal-breaker ● You are the most expensive resource! ○ Not hardware anymore 41
Tuning MongoDB
Tuning MongoDB: MMAPv1 ● A kernel-level function to map file blocks to memory ● MMAPv1 syncs data to disk once per 60 seconds (default) ○ Override with the --syncDelay <seconds> flag ○ If a server with no journal crashes, it can lose up to 1 minute of data!!! ● In-memory buffering of the Journal ○ Synced every 30ms if the ‘journal’ is on a different disk ○ Or every 100ms otherwise ○ Or 1/3rd of the above if the change uses the Journaled write concern (explained later) 43
Tuning MongoDB: MMAPv1 ● Fragmentation ○ Can cause serious slowdowns on scans, range queries, etc ○ db.<collection>.stats() ■ Shows various storage info for a collection ■ Fragmentation can be computed by dividing ‘storageSize’ by ‘size’ ■ Any value > 1 indicates fragmentation ○ Compact when you near a value of 2 by rebuilding secondaries or using the ‘compact’ command ○ WiredTiger and RocksDB have little to no fragmentation due to checkpoints / compaction 44
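A small mongo shell sketch of the fragmentation calculation above; "items" is a placeholder collection name:

// storageSize / size > 1 indicates fragmentation; consider compacting as it nears 2
var s = db.items.stats();
print("fragmentation ratio: " + (s.storageSize / s.size).toFixed(2));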
Tuning MongoDB: WiredTiger ● WT syncs data to disk in a process called “Checkpointing” ○ Every 60 seconds or >= 2GB of data changes ● In-memory buffering of the Journal ○ Journal buffer size 128kb ○ Synced every 50 ms (as of 3.2) ○ Or every change with the Journaled write concern (explained later) ○ While journal records remain in the buffer between syncs, those updates can be lost following a hard shutdown! 45
Tuning MongoDB: RocksDB ● Level-based strategy using immutable data level files ○ Built-in Compression ○ Block and Filesystem caches ● RocksDB uses “compaction” to apply changes to data files ○ Tiered level compaction ○ Follows the same logic as MMAPv1 for journal buffering ● MongoRocks ○ A layer between RocksDB and MongoDB’s storage engine API ○ Developed in partnership with Facebook 46
Tuning MongoDB: Storage Engine Caches ● WiredTiger ○ In heap ■ 50% available system memory ■ Uncompressed WT pages ○ Filesystem Cache ■ 50% available system memory ■ Compressed pages ● RocksDB ○ Internal testing planned from Percona in the future ○ 30% in-heap cache recommended by Facebook / Parse 47
Tuning MongoDB: Durability ● storage.journal.enabled = <true/false> ○ Default since 2.0 on 64-bit builds ○ Always enable unless data is transient ○ Always enable on cluster config servers ● storage.journal.commitIntervalMs = <ms> ○ Max time between journal syncs ● storage.syncPeriodSecs = <secs> ○ Max time between data file flushes 48
Tuning MongoDB: Don’t Enable! ● “cpu” ○ External monitoring is recommended ● “rest” ○ Will be deprecated in 3.6+ ● “smallfiles” ○ In most situations this is not necessary unless ■ You use MMAPv1, and ■ It is a Development / Test environment ■ You have 100s-1000s of databases with very little data inside (unlikely) ● Profiling mode ‘2’ ○ Unless troubleshooting an issue / intentional 49
Tuning Linux
Tuning Linux: The Linux Kernel ● Linux 2.6.x? ● Avoid Linux earlier than 3.10.x - 3.12.x ● Large improvements in parallel efficiency in 3.10+ (for Free!) ● More: https://blog.2ndquadrant.com/postgresql-vs-kernel-versions/ 51
Tuning Linux: NUMA ● A memory architecture that takes into account the locality of memory, caches and CPUs for lower latency ○ But no databases want to use it :( ● MongoDB codebase is not NUMA “aware”, causing unbalanced memory allocations on NUMA systems ● Disable NUMA ○ In the Server BIOS ○ Using ‘numactl’ in init scripts BEFORE the ‘mongod’ command (recommended for future compatibility): numactl --interleave=all /usr/bin/mongod <other flags> 52
Tuning Linux: Transparent HugePages ● Introduced in RHEL/CentOS 6, Linux 2.6.38+ ● Merges memory pages in the background (khugepaged process) ● Decreases overall performance when used with MongoDB! ● “AnonHugePages” in /proc/meminfo shows usage ● Disable Transparent HugePages! ○ Add “transparent_hugepage=never” to the kernel command line (GRUB) ○ Reboot the system ■ Disabling online does not clear previously-created THP pages ■ Rebooting tests that your system will come back up! 53
Tuning Linux: Time Source ● Replication and Clustering need consistent clocks ○ mongodb_consistent_backup relies on time sync, for example! ● Use a consistent time source/server ○ “It’s ok if everyone is equally wrong” ● Non-Virtualised ○ Run the NTP daemon on all MongoDB and Monitoring hosts ○ Enable the service so it starts on reboot ● Virtualised ○ Check if your VM platform has an “agent” syncing time ○ VMWare and Xen are known to have their own time sync ○ If no time sync is provided, install the NTP daemon 54
Tuning Linux: I/O Scheduler ● The algorithm the kernel uses to commit reads and writes to disk ● CFQ “Completely Fair Queue” ○ Default scheduler in 2.6-era Linux distributions ○ Perhaps too clever/inefficient for database workloads ○ Probably good for a laptop ● Deadline ○ Best general default IMHO ○ Predictable I/O request latencies ● Noop ○ Use with virtualised servers ○ Use with real-hardware BBU RAID controllers 55
Tuning Linux: Filesystems ● Filesystem Types ○ Use XFS or EXT4, not EXT3 ■ EXT3 has very poor pre-allocation performance ■ Use XFS only on WiredTiger ■ EXT4 “data=ordered” mode recommended ○ Btrfs not tested, yet! ● Filesystem Options ○ Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’ ○ Remount the filesystem after an options change, or reboot 56
Tuning Linux: Block Device Readahead ● A tuning that causes data ahead of a block on disk to be read and then cached ● Assumption: there is a sequential read pattern ○ Something will benefit from the extra cached blocks ● Risk ○ Set too high, it wastes cache space ○ Increases eviction work ○ MongoDB tends to have very random disk patterns ● A good start for MongoDB volumes is a ’32’ (16kb) read-ahead ○ Let MongoDB worry about optimising the access pattern 57
Tuning Linux: Block Device Readahead ● Change Readahead ○ Add a file to ‘/etc/udev/rules.d’ ■ /etc/udev/rules.d/60-mongodb-disk.rules: # set deadline scheduler and 32/16kb read-ahead for /dev/sda ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16" ○ Reboot (or use CLI tools to apply) 58
Tuning Linux: Virtual Memory Dirty Pages ● Dirty Pages ○ Pages stored in-cache that still need to be written to storage ● Dirty Ratio ○ Max percent of total memory that can be dirty ○ The VM stalls and flushes when this limit is reached ○ Start with ’10’; the default (30) is too high ● Dirty Background Ratio ○ Separate threshold for background dirty page flushing ○ Flushes without pauses ○ Start with ‘3’; the default (15) is too high 59
Tuning Linux: Swappiness ● A Linux kernel sysctl setting for preferring RAM or disk for swap ○ Linux default: 60 ○ To avoid disk-based swap: 1 (not zero!) ○ To allow some disk-based swap: 10 ○ ‘0’ can cause more swapping than ‘1’ on recent kernels ■ More on this here: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/ 60
Tuning Linux: Ulimit ● Allows per-Linux-user resource constraints ○ Number of User-level Processes ○ Number of Open Files ○ CPU Seconds ○ Scheduling Priority ○ And others… ● MongoDB ○ Should probably have a dedicated VM, container or server ○ Creates a new process ■ For every new connection to the Database ■ Plus various background tasks / threads ○ Creates an open file for each active data file on disk ○ 64,000 open files and 64,000 max processes is a good start 61
Tuning Linux: Ulimit ● Setting ulimits ○ /etc/security/limits.d file ○ Systemd Service ○ Init script ● Ulimits are set by Percona and MongoDB packages! ○ Example on left: PSMDB RPM (Systemd) 62
Tuning Linux: Network Stack ● Defaults are not good for > 100 Mbps Ethernet ● Suggested starting point: ● Set Network Tunings: ○ Add the above sysctl tunings to /etc/sysctl.conf ○ Run “/sbin/sysctl -p” as root to set the tunings ○ Run “/sbin/sysctl -a” to verify the changes 63
Tuning Linux: More on this... https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/ 64
Tuning Linux: “Tuned” ● Tuned ○ A “framework” for applying tunings to Linux ○ RedHat/CentOS 7 only for now ■ Debian added tuned, not sure if compatible yet ○ Cannot tune NUMA, filesystem type or fs mount opts ○ Sysctls, THP, I/O scheduler, etc ● My apology to the community for writing “Tuning Linux for MongoDB”: ○ https://github.com/Percona-Lab/tuned-percona-mongodb 65
Troubleshooting “The problem with troubleshooting is trouble shoots back” ~ Unknown
Troubleshooting: Usual Suspects ● Locking ○ Collection-level locks ○ Document-level locks ○ Software mutex/semaphore ● Limits ○ Max connections ○ Operation rate limits ○ Resource limits ● Resources ○ Lack of IOPS, RAM, CPU, network, etc 67
Troubleshooting: MongoDB Resources ● Memory ● CPU ○ System CPU ■ FS cache ■ Networking ■ Disk I/O ■ Threading ○ User CPU (MongoDB) ■ Compression (WiredTiger and RocksDB) ■ Session Management ■ BSON (de)serialisation ■ Filtering / scanning / sorting 68
Troubleshooting: MongoDB Resources ● User CPU (MongoDB) ○ Optimiser ● Disk I/O ○ Data file reads/writes ○ Journaling ○ Error logging ○ Oplog Reads / Writes ○ Background Flushing / Compactions / etc ● Network ○ Query request/response ○ Replication 69
Troubleshooting: MongoDB Resources ● Disk I/O ○ Page Faults (data not in cache) ○ Swapping ● Network ○ Client API ○ Replication ○ Sharding ■ Chunk Moves ■ Mongos -> Shards 70
Troubleshooting: db.currentOp() ● A function that dumps status info about running operations and various lock/execution details ● Only operations currently in progress are shown. ● The operation ID (opid) provided can be used to kill long-running queries. ● Includes ○ Original Query ○ Parsed Query ○ Query Runtime ○ Locking details ● Filter Documents ○ { "$ownOps": true } == Only show operations for the current user ○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples 71
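Example usage; the opid passed to db.killOp() is a placeholder taken from the db.currentOp() output:

// Only this user's operations that have been running longer than 5 seconds
db.currentOp({"$ownOps": true, "secs_running": {$gt: 5}});

// Kill a long-running operation by its opid
db.killOp(12345);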
Troubleshooting: db.stats() ● Returns ○ Document-data size (dataSize) ○ Index-data size (indexSize) ○ Real-storage size (storageSize) ○ Average Object Size ○ Number of Indexes ○ Number of Objects 72
Troubleshooting: db.currentOp() 73
Troubleshooting: Log File ● Interesting details are logged to the mongod/mongos log files ○ Slow queries ○ Storage engine details (sometimes) ○ Index operations ○ Sharding ■ Chunk moves ○ Elections / Replication ○ Authentication ○ Network ■ Connections ■ Errors ■ Client / Inter-node connections 74
Troubleshooting: Log File - Slow Query 2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined: 1 docsExamined: 1 nMatched: 1 nModified: 1 keysInserted: 1 keysDeleted: 1 numYields: 0 reslen: 604 locks: { Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol: op_command 106ms 75
Troubleshooting: Operation Profiler ● Writes slow database operations to a new MongoDB collection for analysis ○ Capped collection “system.profile” in each database, default 1mb ○ The collection is capped, ie: profile data doesn’t last forever ● Support for operationProfiling data in Percona Monitoring and Management is among current/future goals ● Enable operationProfiling in “slowOp” mode ○ Start with a very high threshold and decrease it in steps ○ Usually 50-100ms is a good threshold ○ Enable in mongod.conf: operationProfiling: slowOpThresholdMs: 100 mode: slowOp 76
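The profiler can also be toggled at runtime from the mongo shell (per database), which is handy for stepping the threshold down as suggested above:

// Level 1 == slow-operation mode; the second argument is the slowms threshold
db.setProfilingLevel(1, 100);

// Confirm the current level and threshold
db.getProfilingStatus();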
Troubleshooting: Operation Profiler ● Useful Profile Metrics ○ op/ns/query: type, namespace and query of a profile ○ keysExamined: # of index keys examined ○ docsExamined: # of docs examined to achieve the result ○ writeConflicts: # of write conflicts encountered during the update ○ numYields: # of times the operation yielded for others ○ locks: detailed lock statistics 77
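A couple of example queries against the per-database system.profile collection; the thresholds shown are illustrative:

// Most recent profiled operations slower than 100ms
db.system.profile.find({millis: {$gt: 100}}).sort({ts: -1}).limit(5).pretty();

// Operations that examined many documents but returned few (poor efficiency)
db.system.profile.find({docsExamined: {$gt: 1000}, nreturned: {$lt: 10}});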
Troubleshooting: .explain() ● Shows query explain plan for query cursors ● This will include ○ Winning Plan ■ Query stages ● Query stages may include sharding info in clusters ■ Index chosen by optimiser ○ Rejected Plans 78
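Example invocation; the collection and query shape are placeholders. The "executionStats" verbosity adds keys/docs-examined counters to the winning plan:

db.items.find({itemId: 123456}).sort({created: -1}).explain("executionStats");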
Troubleshooting: .explain() and Profiler 79
Troubleshooting: Cluster Metadata ● The “config” database on Cluster Config servers ○ Use .find() queries to view Cluster Metadata ● Contains ○ actionlog (3.0+) ○ changelog ○ databases ○ collections ○ shards ○ chunks ○ settings ○ mongos ○ locks ○ lockpings 80
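Two example metadata queries, run through a mongos; "test.items" is a placeholder namespace:

var conf = db.getSiblingDB("config");

// Chunk count per shard for one collection
conf.chunks.aggregate([
  {$match: {ns: "test.items"}},
  {$group: {_id: "$shard", chunks: {$sum: 1}}}
]);

// Any jumbo chunks that will block the balancer?
conf.chunks.find({jumbo: true});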
Troubleshooting: Percona PMM QAN ● The Query Analytics tool enables DBAs and developers to analyze queries over periods of time and find performance problems. ● Helps you optimise database performance by making sure that queries are executed as expected and within the shortest time possible. ● Central, web-based location for visualising data. ● Data is collected from the MongoDB Profiler (required) by the PMM agent. ● Great for reducing access to systems while providing valuable data to development teams! ● Query Normalization ○ ie: “{ item: 123456 }” -> “{ item: ##### }”. ● Command-line Equivalent: pt-mongodb-query-digest tool 81
Troubleshooting: Percona PMM QAN 82
Troubleshooting: mlogfilter ● A useful tool for processing mongod.log files ● A log-aware replacement for ‘grep’, ‘awk’ and friends ● Generally focus on ○ mlogfilter --scan <file> ■ Shows all collection scan queries ○ mlogfilter --slow <ms> <file> ■ Shows all queries that are slower than X milliseconds ○ mlogfilter --op <op-type> <file> ■ Shows all queries of the operation type X (eg: find, aggregate, etc) ● More on this tool here https://github.com/rueckstiess/mtools/wiki/mlogfilter 83
Troubleshooting: Common Problems ● Sharding ○ removeShard Doesn’t Complete ■ Check the ‘dbsToMove’ array of the removeShard response mongos> db.adminCommand({removeShard:"test2"}) { "msg" : "draining started successfully", "state" : "started", "shard" : "test2", "note" : "you need to drop or movePrimary these databases", "dbsToMove" : [ "wikipedia" ], "ok" : 1 } ■ Why? mongos> use config switched to db config mongos> db.databases.find() { "_id" : "wikipedia", "primary" : "test2" , "partitioned" : true } 84
Troubleshooting: Common Problems ● Sharding ○ removeShard Doesn’t Complete ■ Try ● Use movePrimary to move database(s) Primary-role to other shards ● Run the removeShard command once the shard being removed is NOT primary for any database ○ This starts the draining of the shard ● Run the same removeShard command to check on progress ○ If the draining and removal is complete this will respond with success ○ Jumbo Chunks ■ Will prevent balancing from occurring ■ The config.chunks collection document will contain jumbo: true as a key/value pair ■ Sharding ‘split’ commands can be used to reduce the chunk size (sh.splitAt, etc) ■ https://www.percona.com/blog/2016/04/11/dealing-with-jumbo-chunks-in-mongodb/ 85
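A sketch of the movePrimary / removeShard sequence described above; "wikipedia" comes from the dbsToMove example on the previous slide, and "test1" is a placeholder name for a remaining shard:

// Move the primary role for the database reported in dbsToMove, then continue draining
db.adminCommand({movePrimary: "wikipedia", to: "test1"});
db.adminCommand({removeShard: "test2"});   // re-run later to check draining progress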
Schema Design & Workflow
Schema Design: Data Types ● Strings ○ Only use strings if required ○ Do not store numbers as strings! ○ Look for {field: “123456”} instead of {field: 123456} ■ “12345678” moved to an integer uses 25% less space ■ Range queries on proper integers are more efficient ○ Example JavaScript to convert a field in an entire collection ■ db.items.find().forEach(function(x) { var newItemId = parseInt(x.itemId); db.items.update( { _id: x._id }, { $set: { itemId: newItemId } } ); }); 87
Schema Design: Data Types ● Strings ○ Do not store dates as strings! ■ "2017-08-17 10:00:04 CEST" stored as a BSON date uses 52.5% less space! ○ Do not store booleans as strings! ■ “true” -> true = 47% less space wasted ● DBRefs ○ DBRefs provide pointers to another document ○ DBRefs can be cross-collection ● NumberDecimal (3.4+) ○ Higher-precision decimal type for exact floating-point values 88
Schema Design: Indexes ● MongoDB supports BTree, text and geo indexes ● Default behaviour ○ Collection lock until indexing completes ● {background: true} ○ Runs indexing in the background, avoiding pauses ○ Hard to monitor and troubleshoot progress ○ Unpredictable performance impact ● Avoid drivers that auto-create indexes ○ Use real performance data to make indexing decisions; find out before Production! ● Too many indexes hurt write performance for an entire collection ● Indexes have a forward or backward direction ○ Try to cover .sort() with an index and match its direction! 89
Schema Design: Indexes ● Compound Indexes ○ Several fields supported ○ Fields can be in forward or backward direction ■ Consider any .sort() query options and match the sort direction! ○ Composite Keys are read Left -> Right ■ The index can be partially read ■ Left-most fields do not need to be duplicated! ■ All the indexes below are duplicates (left-most prefixes) of the first: ● {username: 1, status: 1, date: 1, count: -1} ● {username: 1, status: 1, date: 1} ● {username: 1, status: 1} ● {username: 1} ● Use db.collection.getIndexes() to view current Indexes 90
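A compound index sketch matching the pattern above; the collection and field names are examples:

// The direction of 'date' matches a .sort({date: -1}) on queries filtered by username/status;
// a background build avoids the collection lock at the cost of a slower, harder-to-track build
db.items.createIndex({username: 1, status: 1, date: -1}, {background: true});

// Review what already exists before adding more
db.items.getIndexes();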
Schema Design: Query Efficiency ● Query Efficiency Ratios ○ Index: keysExamined / nreturned ○ Document: docsExamined / nreturned ● End goal: examine only as many Index Keys/Docs as you return! ○ Tip: when using covered indexes zero documents are fetched (docsExamined: 0)! ○ Example: a query examining 10 documents to return 1 has a document ratio of 10 (the ideal is 1) ○ Scanning zero docs is possible if using a covered index! 91
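A quick way to compute the ratios from explain() output; the collection and query are placeholders:

var ex = db.items.find({itemId: 123456}).explain("executionStats").executionStats;
print("index ratio:    " + ex.totalKeysExamined / ex.nReturned);
print("document ratio: " + ex.totalDocsExamined / ex.nReturned);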
Schema Workflow ● MongoDB is optimised for single-document operations ● Single Document / Centralised ○ Great cache/disk-footprint efficiency ○ Centralised schemas may create a hotspot for write locking ● Multi Document / Decentralised ○ MongoDB rarely stores data sequentially on disk ○ Multi-document operations are less efficient ○ Less potential for hotspots/write locking ○ Increased overhead due to fan-out of updates ○ Example: Social Media status update, graph relationships, etc ○ More on this later... 92
Schema Workflow ● Read-Heavy Workflow ○ Read-heavy apps benefit from pre-computed results ○ Consider moving expensive read computation to insert/update/delete time ○ Example 1: An app does ‘count’ queries often ■ Move the .count() read query to a summary document with counters ■ Increment/decrement a single count value at write time ○ Example 2: An app that does groupings of data ■ Move the in-line .aggregate() read query to a backend summary worker ■ Read from a summary collection, like a view ● Write-Heavy Workflow ○ Reduce indexing as much as possible ○ Consider batching or a decentralised model with lazy updating (eg: social media graph) 93
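A sketch of the pre-computed counter pattern from Example 1; "posts" and "summary" are placeholder collection names:

// At write time: insert the document and bump the summary counter with one cheap extra update
db.posts.insert({user: "christian", text: "hello"});
db.summary.update(
  {_id: "post_count:christian"},
  {$inc: {count: 1}},
  {upsert: true}
);
// Reads now fetch a single summary document instead of running .count()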
Schema Workflow ● Batching Inserts/Updates ○ Requires fewer network commands ○ Allows the server to do some internal batching ○ Operations will be slower overall ○ Suited for queue-worker scenarios batching many changes ○ Traditional user-facing database traffic should aim to operate on a single (or few) document(s) ● Thread-per-connection model ○ 1 x DB operation = 1 x CPU core only ○ Executing Parallel Reads ■ Large batch queries benefit from several parallel sessions ■ Break the query range or conditions into several client->server threads ■ Not recommended for Primary nodes or Secondaries with heavy reads 94
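A batched insert sketch for a queue-worker scenario; the collection and documents are examples. An unordered batch lets the server continue past individual failures:

db.events.insertMany(
  [
    {type: "click", ts: new Date()},
    {type: "view",  ts: new Date()},
    {type: "click", ts: new Date()}
  ],
  {ordered: false}
);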
Schema Workflow ● No list of fields specified in .find() ○ MongoDB returns entire documents unless fields are specified ○ Only return the fields required for an application operation! ○ Covered-index operations require only the index fields to be specified ● Using $where operators ○ This executes JavaScript with a global lock ● Many $and or $or conditions ○ MongoDB (or any database) doesn’t handle large lists of $and or $or efficiently ○ Try to avoid this sort of model with ■ Data locality ■ Background Summaries / Views 95
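An example projection; the collection and field names are placeholders. With an index on {itemId: 1, price: 1} this query can be covered (docsExamined: 0):

// Return only the fields the application needs; exclude _id so the index alone can cover it
db.items.find({itemId: 123456}, {itemId: 1, price: 1, _id: 0});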
Fan Out / Fan In ● Fan-Out Systems ○ Decentralised ○ Data is eventually written in many locations ○ Complex write path (several updates) ■ Good use-case for Queue/Worker model ■ Batching possible ○ Simple read path (data locality) ● Fan-In ○ Centralised ○ Simple Write path ■ Possible Write locking ○ Complex Read Path ■ Potential for latency due to network 96
Data Integrity
Data Integrity: `whoami` (continued) ● Very Paranoid ● Previous RDBMS work ○ Online Marketing / Publishing ■ Paid for clicks coming in ■ Downtime = revenue + traffic (paid for) loss ○ Warehousing / Pricing SaaS ■ Stores real items in warehouses/stores/etc ■ Downtime = many businesses (customers)/warehouses/etc at a stand-still ■ Integrity problems = ● Orders shipped but not paid for (2010) ● Orders paid for but not shipped, etc ○ Moved on to Gaming, Percona ● So why MongoDB? 98
Data Integrity: Storage and Journaling ● The Journal provides durability in the event of failure of the server ● Changes are written ahead to the journal for each write operation ● On crash recovery, the server ○ Finds the last point of consistency to disk ○ Searches the journal file(s) for the record matching the checkpoint ○ Applies all changes in the journal since the last point of consistency ● Journal data is stored in the ‘journal’ subdirectory of the server data path (dbPath) ● Dedicated disks for data (random I/O) and journal (sequential I/O) improve performance 99
Data Integrity: Write Concern ● MongoDB Replication is Asynchronous ● Write Concerns ○ Allow control of the data integrity of a write to a Replica Set ○ Write Concern Modes ■ “w: <num>” - Writes must be acknowledged by the defined number of nodes ■ “majority” - Writes must be acknowledged by a majority of nodes ■ “<replica set tag>” - Writes must be acknowledged by members with the specified replica set tags ○ Durability ■ By default write concerns are NOT durable ■ “j: true” - Optionally, wait for node(s) to acknowledge journaling of the operation ■ In 3.4+ “writeConcernMajorityJournalDefault” allows enforcement of “j: true” via the replica set configuration! ● Must specify “j: false” or alter “writeConcernMajorityJournalDefault” to disable 100
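An example write with a durable, majority write concern; the collection and document are placeholders:

// Wait for a majority of members to acknowledge and journal the write, give up after 5s
db.items.insert(
  {sku: "abc-123", qty: 1},
  {writeConcern: {w: "majority", j: true, wtimeout: 5000}}
);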