ToroDB Stampede: Open-source BI connector for MongoDB lvaro - PowerPoint PPT Presentation

ToroDB Stampede: Open-source BI connector for MongoDB Álvaro Hernández Tortosa <aht@8kdata.com> Percona live 2017

Who I am CEO, 8Kdata.com • What we do @8Kdata: ALVARO HERNANDEZ ✓ Creators of ToroDB.com, NoSQL & SQL database ✓ Database R&D, product development ✓ Training and consulting in PostgreSQL ✓ PostgreSQL Support Twitter: @ahachete Linkedin: Founder, President Spanish Postgres User Group http://es.linkedin.com/in/alvarohernandeztortosa/ postgrespana.es ~ 750 members

MongoDB Connector for BI

MongoDB BI Connector •https://docs.mongodb.com/bi- connector/master/ •Developed by MongoDB themselves •Effectively allows to query mongo in a SQL-ish way But...

MongoDB BI Connector •How do they do it? You don’t know it. •Harder to detect bugs •They’re part of bigger solutions which may or may not fit you •You need Enterprise version, you are paying for the whole solution

MongoDB does *not* have consistent reads https://blog.meteor.com/mongodb-queries-dont-always-return-all-matching-documents-654b6594a827#.fplxodagr

ToroDB Stampede

ToroDB: from MongoDB to PostgreSQL

How do we generate the schema? ★ There’s no need to generate it, it’s implicit! { name: “Alice”, address: { street: “Wonderland”, number: 42 }, hobbies: [ “reading”, “talking to rabbits” ] } implicit schema

How do we generate the schema? •ToroDB analyzes every incoming document and separates metadata (schema) from data (tuples) •With that schema info, ToroDB creates 1+ tables per MongoDB collection •ToroDB also creates a RDBMS catalog schema per MongoDB database

How do we generate the schema? SELECT * FROM torodb.people; database: torodb did name_s _id_x collection: people { 1 Alice \x01d4a63dc6944f37a61cf3d6 name: “Alice”, _id: (implicit val), SELECT * FROM torodb.people_address; address: { street: did name_s seq number_d street_s “Wonderland”, number: 42 1 1 42 Wonderland }, hobbies: [ SELECT * FROM torodb.people_hobbies; “reading”, “talking to did rid seq value_s rabbits” 1 1 0 “reading” ] } 1 2 1 “talking to rabbits”

How to react to dynamic changes in the schema? [ { name: “Alice” }, { name: “Harry”, surname: “Potter” } ] ★ What happens if a new document in the collection brings new fields?

How to react to dynamic changes in the schema? did _id_x name_s surname_s 1 \x01d4a63dc6944f37a61cf3d6 Alice 2 \x21f4f563dc6944f37a61cf436 Harry Potter ★ ToroDB automatically performs: ✓ ALTER TABLE ... ADD COLUMN surname_s text

How to react to dynamic changes in the schema? [ { name: “Harry”, surname: “Potter” } { name: “Henry”, surname: 8 }, ] ★ What happens if a new document in the collection brings type conflicts?

How to react to dynamic changes in the schema? did _id_x name_s surname_s surname_i 1 \x01d4a63dc6944f37a61cf3d6 Harry Potter 2 \x21f4f563dc6944f37a61cf436 Henry 8 ★ Type conflicts are resolved by suffixing column names with the type of the data: "s" for text, "i" for integer and so forth

ToroDB Architecture This is your SQL replica! ToroDB ToroDB works as a secondary node on a MongoDB replica set

ToroDB Architecture • ToroDB works as a secondary node on a replica set • This node has two main modes: ✓ First, it imports all MongoDB data into PostgreSQL ✓ Then it stays tuned to changes in MongoDB Oplog and applies them in PostgreSQL

ToroDB Architecture Protocols MongoDB*, Couchbase, Cassandra… Abstract Doc Tables and Rows { Here is where ‘the magic’ happens transforming Docs into Data2Relational Backends PostgreSQL*, MySQL, SQL Server, Oracle… * currently available ToroDB offers the interface of any NoSQL database, but persisting data in tables and rows within a SQL database.

Measuring I/O with iotop required to answer a query Github Archive: top 10 actors (1,4GB dataset) 500 MongoDB storageSize: 536.37 MB MongoDB size: 1410,35 MB 375 Exactly 100% of the storageSize! db.githubarchive.aggregate([ Disk Read (MB) { $group: { _id: '$actor.login', events: { $sum: 1 } 536,36 250 } }, { $sort: { events: -1 }}, { $limit: 10 } ]) 125 SELECT count(*), login FROM actor GROUP BY login 87,93 6,5 ORDER BY 1 DESC LIMIT 10; 0 MongoDB PostgreSQL PostgreSQL + cstore

Building the benchmarks • Data from Github archive • Timing three different environments that contain the same data: ✓ MongoDB ✓ MongoDB with 3 shards ✓ PostgreSQL (obtained by migrating MongoDB environment via ToroDB) •We run 6 queries (PostgreSQL queries being the relational equivalent to MongoDB queries) ✓ Available at https://gist.github.com/ahachete

Github 500GB PostgreSQL MongoDB MongoDB (3 Shards) 880 A 1.851 621 83 B 4.333 1.184 57x 239 C 4.253 1.191 254 D 14.504 3.691 1.650 E 4.732 1.418 402 F 4.126 1.021 0 4.000 8.000 12.000 16.000 Seconds (Less is better)

Github 100GB PostgreSQL MongoDB MongoDB (3 Shards) 34 A 361 89 13 B 750 That’s 267x 194 faster! 38 C 747 163 9 D 2.405 383 240 E 768 193 53 F 664 117 0 750 1.500 2.250 3.000 Seconds (Less is better)

Github 100GB no idx PostgreSQL MongoDB MongoDB (3 Shards) 31 A 715 199 13 B 694 209 Important 37 because 94x C 668 many BI 193 queries 7 don’t use D 658 indexes 158 241 E 793 211 53 F 667 164 0 200 400 600 800 Seconds (Less is better)

What's on my MongoDB database?

Stampede infers and clearly shows you your data

Or use SQL tools to show your schema

Installation Prerequisites ToroDB Stampede MongoDB connector for BI • PostgreSQL   • MongoDB Enterprise 3.2+   • Java SE 8 • OpenSSL

Installation MongoDB connector for BI wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64- ubuntu1604-3.4.3.tgz tar xjf mongodb-linux-x86_64-ubuntu1604-3.4.3.tgz sudo install -m755 bin/mongo* /usr/local/bin/ ToroDB Stampede wget https://www.torodb.com/download/torodb-stampede-1.0.0- beta1.tar.bz2 tar xjf torodb-stampede-1.0.0-beta1.tar.bz2

Configuration MongoDB connector for BI ... It’s something like this schema: • You have to create an schema for - db: database each database (or collection). tables: - table: name_table collection: name_collection pipeline: [] mongodrdl --host example.com columns: -d dbname [-c collname] - Name: _id MongoType: bson.ObjectId -o schema.drdl SqlName: _id SqlType: varchar • You can edit manually… - Name: column_name MongoType: type SqlName: sqlname SqlType: sqltype

Configuration ToroDB Stampede • Requires a new user ( ‘torodb' ) and a new database ( ’torod' ) in PostgreSQL. • Access configuration to PostgreSQL detailed in .toropass : host:port:torod:torodb:<password> • You can also check the configuration used by ToroDB Stampede using the -l parameter: torodb-stampede -l

Start! MongoDB connector for BI mongosqld --schema schema.drdl --mongo-uri <your.mongohost.com> • You need one schema per database • It enables a MySQL connection ToroDB Stampede $TOROHOME/bin/torodb-stampede • And it starts the automagically replication

Problem resolution Insert Documents with new fields MongoDB connector for BI ToroDB Stampede • You will surely have to edit • Creates new columns and manually the mongodrdl tables if it’s necessary schema: ➡ mongodrdl infers the schema by sampling

Problem resolution Type Conflicts MongoDB connector for BI ToroDB Stampede • You should execute mongodrdl • The column names contain again, but surely you don’t have a postfix, indicating the the new type, so: data type ➡ You have to edit the schema ➡ If one path contains and add a new field with two different data types different sql name. then two different columns are created.

Problem resolution Array / Document exploding MongoDB Connector for BI database: stampede collection: restaurants stampede.restaurants { _id name address.street address.number _id: (implicit val), 01d4a63 “Glorious Food” “Wonderland” 42 name: “Glorious Food” dc6944f address: { street:”Wonderland”, number: 42, coord: [ -73.9832324, 40.6563633] stampede.restaurants_address_coord } _id address.coord address.coord_idx } 01d4a63 -73.9832324 0 dc6944f 01d4a63 40.6563633 1 dc6944f

Problem resolution Array / Document exploding ToroDB Stampede stampede.restaurants database: stampede did _id_x name_s address_e collection: restaurants 0 \x01d4a “Glorious Food” f { _id: (implicit val), name: “Glorious Food” address: { stampede.restaurants_address street:”Wonderland”, did rid seq street_s number_d coord_e number: 42, 0 0 “Wonderland” 42 t coord: [ -73.9832324, 40.6563633] } stampede.restaurants_address_coord } did rid pid seq v_d 0 0 0 0 -73,9832324 0 1 0 1 40.6563633

Use Case with… PostgreSQL 100.000 rows M 1.008 ms o n g o 1 D 0 B 0 B . 0 I 0 0 8 r o 2 w . 4 s 1 2 m ( s 8 0 x ! ! ) SELECT * FROM twitter.message INNER JOIN twitter.message_user ON SELECT * FROM `twitter.message` twitter.message.did = LIMIT 100000 twitter.message_user.did LIMIT 100000

ToroDB Stampede: Open-source BI connector for MongoDB lvaro - PowerPoint PPT Presentation

ToroDB Stampede: Open-source BI connector for MongoDB lvaro Hernndez Tortosa <aht@8kdata.com> Percona live 2017 Who I am CEO, 8Kdata.com What we do @8Kdata: ALVARO HERNANDEZ Creators of ToroDB.com, NoSQL & SQL

Percona Backup for MongoDB Akira Kurogane Percona 3 - 2 - 1 MongoDB Percona Server for

MongoDB Thomas Schwarz, SJ MongoDB History 2007 Developed by 10gen as a Platform as a Service

MongoDB Sharding 101 Agenda What is MongoDB? Single Instances Replica-set

Mylyn Industrial Connector Mylyn Industrial Connector 1 Why yet another connector? Mylyn is

MongoDB Building data model with MongoDB and Mongoose MVC Pattern Connect Express app to

Everything You Know About MongoDB is Wrong (Probably) Mark Smith | MongoDB | @Judy2K Myth 0

External Authentication with Percona Server for MongoDB and MongoDB Enterprise Jason Terpko DBA

1. Instillations o https://www.mongodb.com/download-center/community 2. Download and Install

Your First MongoDB Environment: What You Should Know Before Choosing MongoDB as Your Database Me

SR 874/Don Shula Expressway SR 874/Don Shula Expressway Ramp Connector Ramp Connector Ramp

Consumer Connector Collaboration May 16, 2017 Webinar Consumer Connector Collaboration Agenda

Information Retrieval in MongoDB Data storage, Indexing and Querying Kaustubh Dhokte (NB97699)

MongoDB Backups, All Grown up! David Murphy David Murphy MongoDB Practice Manager for Percona

What's New in Percona Server for MongoDB? 2019 Q3: Enterprise Enhancements and v4.2 4:00 PM -

MongoDB and Java 8 Agenda Java8 Main Features MongoDB + Java8 Few Examples RX Driver 3 Java

Geospatial and MongoDB MongoDB Geospatial Features Agenda Query Examples Optimizations 2

ALTERNATING PATHS AND CYCLES OF MINIMUM LENGTH William Evans University of British Columbia,

Duality by Example Duality by Example: II max z = 4 x 1 + x 2 + 3 x 3 max z = 4 x 1 + x 2 + 3 x

An Object-Oriented World David Van Horn Background & Motivation The first year Fall

Business Intelligence Matteo Francia , Matteo Golfarelli, Stefano Rizzi DISI University of

PARTNER 3 Transcatheter or Surgical Aortic Valve Replacement in Low Risk Patients with Aortic

Efficient Modular NIZK Arguments from Shift and Product Speaker: Bingsheng Zhang 1 Joint work with

Game Theory Strategic Form Games Levent Ko ckesen Ko c University Levent Ko ckesen

Verifiable Auctions for Online Ad Exchanges Sebastian Angel and Michael Walfish The University of

ToroDB Stampede: Open-source BI connector for MongoDB lvaro - PowerPoint PPT Presentation

ToroDB Stampede: Open-source BI connector for MongoDB lvaro Hernndez Tortosa <aht@8kdata.com> Percona live 2017 Who I am CEO, 8Kdata.com What we do @8Kdata: ALVARO HERNANDEZ Creators of ToroDB.com, NoSQL & SQL

Percona Backup for MongoDB Akira Kurogane Percona 3 - 2 - 1 MongoDB Percona Server for

MongoDB Thomas Schwarz, SJ MongoDB History 2007 Developed by 10gen as a Platform as a Service

MongoDB Sharding 101 Agenda What is MongoDB? Single Instances Replica-set

Mylyn Industrial Connector Mylyn Industrial Connector 1 Why yet another connector? Mylyn is

MongoDB Building data model with MongoDB and Mongoose MVC Pattern Connect Express app to

Everything You Know About MongoDB is Wrong (Probably) Mark Smith | MongoDB | @Judy2K Myth 0

External Authentication with Percona Server for MongoDB and MongoDB Enterprise Jason Terpko DBA

1. Instillations o https://www.mongodb.com/download-center/community 2. Download and Install

Your First MongoDB Environment: What You Should Know Before Choosing MongoDB as Your Database Me

SR 874/Don Shula Expressway SR 874/Don Shula Expressway Ramp Connector Ramp Connector Ramp

Consumer Connector Collaboration May 16, 2017 Webinar Consumer Connector Collaboration Agenda

Information Retrieval in MongoDB Data storage, Indexing and Querying Kaustubh Dhokte (NB97699)

MongoDB Backups, All Grown up! David Murphy David Murphy MongoDB Practice Manager for Percona

What's New in Percona Server for MongoDB? 2019 Q3: Enterprise Enhancements and v4.2 4:00 PM -

MongoDB and Java 8 Agenda Java8 Main Features MongoDB + Java8 Few Examples RX Driver 3 Java

Geospatial and MongoDB MongoDB Geospatial Features Agenda Query Examples Optimizations 2

ALTERNATING PATHS AND CYCLES OF MINIMUM LENGTH William Evans University of British Columbia,

Duality by Example Duality by Example: II max z = 4 x 1 + x 2 + 3 x 3 max z = 4 x 1 + x 2 + 3 x

An Object-Oriented World David Van Horn Background &amp; Motivation The first year Fall

Business Intelligence Matteo Francia , Matteo Golfarelli, Stefano Rizzi DISI University of

PARTNER 3 Transcatheter or Surgical Aortic Valve Replacement in Low Risk Patients with Aortic

Efficient Modular NIZK Arguments from Shift and Product Speaker: Bingsheng Zhang 1 Joint work with

Game Theory Strategic Form Games Levent Ko ckesen Ko c University Levent Ko ckesen

Verifiable Auctions for Online Ad Exchanges Sebastian Angel and Michael Walfish The University of

An Object-Oriented World David Van Horn Background & Motivation The first year Fall