
THE NOSQL MOVEMENT
GENOVEVA VARGAS-SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
Genoveva.Vargas@imag.fr
http://www.vargas-solar.com/bigdata-managment

STORING AND ACCESSING HUGE AMOUNTS OF DATA
• Data formats
• Data storage supports
• Data collection sizes
• Data delivery mechanisms
[Figure: data volume scale, Peta (10^15) on Disk, Exa (10^18) on RAID, Zetta (10^21) and Yotta (10^24) in the Cloud]

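DEALING WITH HUGE AMOUNTS OF DATA
[Figure: data volume scale from Peta (10^15) to Yotta (10^24) with storage supports (Disk, RAID, Cloud), annotated with data models (Relational, Graph, Key-value, Columns) and transaction properties (Concurrency, Consistency, Atomicity)]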

NOSQL STORES CHARACTERISTICS
• Simple operations
  - Key lookups: reads and writes of one record or a small number of records
  - No complex queries or joins
• Ability to dynamically add new attributes to data records
• Horizontal scalability
  - Distribute data and operations over many servers
  - Replicate and distribute data over many servers
  - No shared memory or disk
• High performance
  - Efficient use of distributed indexes and RAM for data storage
• Weak consistency model
  - Limited transactions
Next-generation databases mostly addressing some of these points: being non-relational, distributed, open-source and horizontally scalable [http://nosql-database.org]
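The horizontal-scalability bullets (distribute and replicate data over many servers, no shared memory or disk) can be illustrated with a minimal sketch. It assumes hash-based placement: each key gets a primary node chosen by hashing, plus the next nodes in the list as replicas. The names `owner_nodes` and `ShardedStore` are hypothetical and do not reproduce any system cited in these slides.

```python
import hashlib


def owner_nodes(key, nodes, replicas=2):
    """Hypothetical placement: a primary chosen by hashing `key`,
    followed by the next `replicas - 1` nodes in list order."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    # Stable hash: Python's built-in hash() of a str is salted per process.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas)]


class ShardedStore:
    """Toy shared-nothing store: every node owns a private dict."""

    def __init__(self, nodes, replicas=2):
        self.nodes = list(nodes)
        self.replicas = replicas
        self.data = {n: {} for n in self.nodes}

    def put(self, key, value):
        # Write the record to every node responsible for the key.
        for n in owner_nodes(key, self.nodes, self.replicas):
            self.data[n][key] = value

    def get(self, key):
        # Read from the first responsible node that still holds the key.
        for n in owner_nodes(key, self.nodes, self.replicas):
            if key in self.data[n]:
                return self.data[n][key]
        return None
```

With plain modulo placement, adding a server remaps most keys; Dynamo-style stores such as Riak and Voldemort use consistent hashing to limit that data movement. Reads still succeed if the primary copy is lost, because a replica answers.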

Data stores designed to scale simple OLTP-style application loads: read/write operations by thousands or millions of users
• Data model
• Availability
• Consistency
• Query support
• Storage
• Durability

DATA MODELS
• Tuple
  - Row in a relational table, where attributes are pre-defined in a schema and the values are scalar
• Document
  - Allows values to be nested documents or lists, as well as scalar values
  - Attributes are not defined in a global schema
• Extensible record
  - Hybrid between tuple and document, where families of attributes are defined in a schema, but new attributes can be added on a per-record basis
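The three data models can be contrasted in a short sketch, assuming Python structures as stand-ins: a frozen dataclass for a schema-fixed tuple, a nested dict for a document, and a dict of column families for an extensible record. `UserTuple`, `SCHEMA_FAMILIES` and `make_record` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


# Tuple: attributes fixed in advance by a schema, scalar values only.
@dataclass(frozen=True)
class UserTuple:
    uid: int
    name: str
    city: str


# Document: values may be nested documents or lists; no global schema.
user_doc = {
    "uid": 1,
    "name": "Ada",
    "addresses": [{"city": "Grenoble", "zip": "38000"}],
    "tags": ["admin", "beta"],
}

# Extensible record: column families are declared in the schema,
# but the columns inside a family may differ from record to record.
SCHEMA_FAMILIES = {"profile", "activity"}


def make_record(row_key, **families):
    unknown = set(families) - SCHEMA_FAMILIES
    if unknown:
        raise ValueError(f"undeclared column families: {sorted(unknown)}")
    return {"row_key": row_key, **{f: dict(cols) for f, cols in families.items()}}


r1 = make_record("u1", profile={"name": "Ada"})
r2 = make_record("u2", profile={"name": "Bob", "nickname": "b"},
                 activity={"last_login": "2013-05-01"})
```

Here `r2` gains a `nickname` column that `r1` lacks, while an undeclared family is rejected: this is the "hybrid" behavior the slide attributes to extensible records.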

DATA STORES
• Key-value
  - Systems that store values and an index to find them, based on a key
• Document
  - Systems that store documents, providing index and simple query mechanisms
• Extensible record
  - Systems that store extensible records that can be partitioned vertically and horizontally across nodes
• Graph
  - Systems that model data as graphs, where nodes can represent content modelled as document or key-value structures, and arcs represent relations between the data modelled by the nodes
• Relational
  - Systems that store, index and query tuples
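The graph-store description (nodes holding key-value content, arcs relating them) can be sketched as follows. `GraphStore` and its methods are hypothetical and assume an adjacency-list layout; they are not the API of any particular graph database.

```python
from collections import defaultdict


class GraphStore:
    """Toy graph store: nodes carry key-value properties,
    arcs are labeled relations between node ids."""

    def __init__(self):
        self.nodes = {}                    # node id -> dict of properties
        self.out_arcs = defaultdict(list)  # node id -> [(label, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = dict(props)

    def add_arc(self, source, label, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.out_arcs[source].append((label, target))

    def neighbors(self, node_id, label):
        """Ids of nodes reachable from node_id through one arc with this label."""
        return [target for (lab, target) in self.out_arcs[node_id] if lab == label]
```

Following an arc is a direct lookup in the adjacency list rather than a join, which is the main reason to choose a graph store for relationship-heavy data.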

KEY-VALUE STORES
• "Simplest data stores": use a data model similar to the memcached distributed in-memory cache
• Single key-value index for all data
• Provide a persistence mechanism
• Replication, versioning, locking, transactions, sorting
• API: inserts, deletes, index lookups
• No secondary indices or keys

SYSTEM      ADDRESS
Redis       code.google.com/p/redis
Scalaris    code.google.com/p/scalaris
Tokyo       tokyocabinet.sourceforge.net
Voldemort   project-voldemort.com
Riak        riak.basho.com
Membrain    schoonerinfotech.com/products
Membase     membase.com
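The key-value API on this slide (inserts, deletes, lookups by key only) and its versioning bullet can be sketched as a small in-memory class. `VersionedKV` is hypothetical; the compare-on-version write is one common way versioning supports concurrency control without explicit locks, assumed here for illustration rather than taken from any listed system.

```python
class VersionedKV:
    """In-memory key-value store with a per-key version counter.
    Only lookups by key are supported: there are no secondary indexes."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def put(self, key, value, expected_version=None):
        """Insert or update `key`. If expected_version is given, the write
        only succeeds when it matches the stored version (optimistic check)."""
        current = self._data.get(key, (0, None))[0]
        if expected_version is not None and expected_version != current:
            return False
        self._data[key] = (current + 1, value)
        return True

    def get(self, key):
        """Return (version, value), or (0, None) if the key is absent."""
        return self._data.get(key, (0, None))

    def delete(self, key):
        """Remove `key`; return True if it existed."""
        return self._data.pop(key, None) is not None
```

A client reads `(version, value)`, modifies the value, and writes back with `expected_version=version`; a `False` result means another writer got there first and the client should re-read. In this toy, a deleted key restarts at version 0.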

Example FQL queries:

SELECT name, pic, profile_url FROM user WHERE uid = me()

SELECT message, attachment FROM stream WHERE source_id = me() AND type = 80

SELECT name FROM friendlist WHERE owner = me()

SELECT name, pic FROM user
WHERE online_presence = "active"
  AND uid IN (SELECT uid2 FROM friend WHERE uid1 = me())

SELECT name FROM group
WHERE gid IN (SELECT gid FROM group_member WHERE uid = me())

https://developers.facebook.com/docs/reference/fql/
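The nested "active friends" query can be read as two steps: the inner query yields a set of keys, and the outer query filters the records with those keys. The Python sketch below evaluates it over hypothetical in-memory copies of the `user` and `friend` tables; the data and the `me()` stub are invented for illustration and are not Facebook's data or API.

```python
# Hypothetical in-memory copies of the FQL tables used in the query.
user = [
    {"uid": 1, "name": "Ada", "pic": "ada.png", "online_presence": "active"},
    {"uid": 2, "name": "Bob", "pic": "bob.png", "online_presence": "idle"},
    {"uid": 3, "name": "Cy", "pic": "cy.png", "online_presence": "active"},
    {"uid": 4, "name": "Di", "pic": "di.png", "online_presence": "active"},
]
friend = [{"uid1": 1, "uid2": 2}, {"uid1": 1, "uid2": 3}, {"uid1": 2, "uid2": 4}]


def me():
    return 1  # hypothetical stub: id of the logged-in user


def active_friends():
    # Inner query: SELECT uid2 FROM friend WHERE uid1 = me()
    friend_ids = {f["uid2"] for f in friend if f["uid1"] == me()}
    # Outer query: SELECT name, pic FROM user
    #              WHERE online_presence = "active" AND uid IN (...)
    return [
        {"name": u["name"], "pic": u["pic"]}
        for u in user
        if u["online_presence"] == "active" and u["uid"] in friend_ids
    ]
```

In a key-value store the outer step becomes a batch of lookups by `uid`, which is why such queries fit the "simple operations" model despite looking like SQL.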

Example key-value pair: <805114856, [value shown as a figure]>

DOCUMENT STORES
• Support more complex data: pointerless objects, i.e., documents
• Secondary indexes (e.g. B-trees), multiple types of documents (objects) per database, nested documents and lists
• Automatic sharding (scale writes), no explicit locks, weaker concurrency (eventual consistency for scaling reads) and atomicity properties
• API: select, delete, getAttributes, putAttributes on documents
• Queries can be distributed in parallel over multiple nodes using a map-reduce mechanism

SYSTEM      ADDRESS
SimpleDB    amazon.com/simpledb
CouchDB     couchdb.apache.org
MongoDB     mongodb.org
Terrastore  code.google.com/terrastore
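The last bullet (queries distributed in parallel over nodes with map-reduce) can be illustrated with a generic map-reduce in plain Python. This is a sketch of the pattern under the assumption that documents are already sharded over three hypothetical nodes; it is not CouchDB's view API or MongoDB's mapReduce command.

```python
from collections import Counter
from functools import reduce

# Hypothetical documents already sharded over three nodes.
shards = {
    "node-a": [{"type": "post", "tags": ["nosql", "mongodb"]},
               {"type": "photo", "tags": []}],
    "node-b": [{"type": "post", "tags": ["nosql"]}],
    "node-c": [{"type": "post", "tags": ["couchdb", "nosql"]}],
}


def map_phase(docs):
    """Runs locally on one node: emit (tag, 1) for every tag of every post."""
    return [(tag, 1) for d in docs if d["type"] == "post" for tag in d["tags"]]


def combine(pairs):
    """Partial aggregation on the node, so only small counts cross the network."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def reduce_phase(partials):
    """Merge the per-node partial counts into the final result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Each shard is processed independently (in parallel on a real cluster).
partials = [combine(map_phase(docs)) for docs in shards.values()]
tag_counts = reduce_phase(partials)
```

The map and combine steps touch only local documents, so they can run on every shard at once; only the small per-node counters are shipped to the reducer.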

DOCUMENT STORES
[Figure]

EXTENSIBLE RECORD STORES
• Basic data model is rows and columns
• Basic scalability model is splitting rows and columns over multiple nodes
  - Rows are split across nodes through sharding on the primary key
    - Split by range rather than by hash function
    - Queries on ranges of values do not go to every node
  - Rows are analogous to documents: variable number of attributes, attribute names must be unique
  - Rows are grouped into collections (tables)
  - Columns are distributed over multiple nodes using "column groups"
    - Column groups indicate which columns are best stored together
    - Column groups must be pre-defined with the extensible record stores

SYSTEM      ADDRESS
HBase       hbase.apache.com
HyperTable  hypertable.org
Cassandra   incubator.apache.org/cassandra
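Range sharding on the primary key, and why range queries avoid contacting every node, can be sketched with binary search over sorted split points. `RangePartitioner` is a hypothetical helper, assuming split points are chosen up front; real systems also split and move ranges as they grow.

```python
import bisect


class RangePartitioner:
    """Assign row keys to nodes by key range rather than by hash.
    split_points[i] is the first key owned by nodes[i + 1]."""

    def __init__(self, split_points, nodes):
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.nodes = list(nodes)

    def node_for(self, row_key):
        """Node that owns a single row key."""
        return self.nodes[bisect.bisect_right(self.split_points, row_key)]

    def nodes_for_range(self, low, high):
        """Nodes that can hold keys in [low, high]; no other node is contacted."""
        first = bisect.bisect_right(self.split_points, low)
        last = bisect.bisect_right(self.split_points, high)
        return self.nodes[first:last + 1]
```

For example, with split points `["g", "p"]` over nodes `n1`, `n2`, `n3`, a scan of keys from `"h"` to `"k"` involves only `n2`. With hash placement, neighboring keys land on unrelated nodes, so the same scan would have to ask all of them.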

SCALABLE RELATIONAL SYSTEMS
• SQL: rich declarative query language
• Databases enforce referential integrity
• ACID semantics
• Well-understood operations:
  - Configuration, care and feeding, backups, tuning, failure and recovery, performance characteristics
• Use small-scope operations
  - Challenge: joins that do not scale with sharding
• Use small-scope transactions
  - ACID transactions are inefficient because of communication and 2PC (two-phase commit) overhead
• Shared-nothing architecture for scalability
• Avoid cross-node operations

SYSTEM         ADDRESS
MySQL Cluster  mysql.com/cluster
VoltDB         voltdb.com
Clustrix       clustrix.com
ScaleDB        scaledb.com
ScaleBase      scalebase.com
NimbusDB       nimbusdb.com
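The 2PC overhead mentioned on this slide comes from the extra round of messages needed before any node may commit. A minimal sketch of the two phases follows; `Participant` and `two_phase_commit` are hypothetical, and the sketch deliberately omits the write-ahead logging, timeouts and coordinator-failure handling that a real implementation needs.

```python
class Participant:
    """A node holding part of the data touched by a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"
        self.messages = 0  # messages received from the coordinator

    def prepare(self):
        # Phase 1: vote yes (ready to commit) or no.
        self.messages += 1
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.messages += 1
        self.state = "committed"

    def abort(self):
        self.messages += 1
        self.state = "aborted"


def two_phase_commit(participants):
    """Phase 1 collects votes; phase 2 commits only if every vote is yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Every participant receives two coordinator messages and must wait for the slowest voter before finishing, so the cost grows with the number of nodes a transaction touches. This is the motivation for the slide's advice to keep transactions small in scope and avoid cross-node operations.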

NOSQL DESIGN AND CONSTRUCTION PROCESS
[Figure: database population, organization and querying stages over memcached, an index, and stored and replicated data]
• Data reside in RAM (memcached) and are eventually replicated and stored
• Querying = designing a database according to the type of queries / map-reduce model
• "On demand" data management: the database is virtually organized per view (external schema) in the cache, and some views are made persistent
• An elastic, easy-to-evolve and explicitly configurable architecture
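The first bullet (data live in RAM and are eventually stored) can be sketched as a write-behind cache. In this sketch a dict plays the memcached role and another dict stands in for persistent storage; `WriteBehindCache` and `batch_size` are hypothetical names, it does not use the memcached client API, and replication is left out.

```python
class WriteBehindCache:
    """Writes land in RAM first; dirty entries are persisted later in batches."""

    def __init__(self, store, batch_size=3):
        self.cache = {}        # in-memory copy (the memcached role)
        self.dirty = set()     # keys changed since the last flush
        self.store = store     # persistent store, here just a dict
        self.batch_size = batch_size

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)
        if len(self.dirty) >= self.batch_size:
            self.flush()

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)  # cache miss: load from storage
        if value is not None:
            self.cache[key] = value
        return value

    def flush(self):
        # Persist every dirty entry, then start a new batch.
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off is durability: writes not yet flushed are lost if the process crashes, which is the price paid for serving reads and writes from RAM.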

Use the right tool for the right job…
How do I know which is the right tool for the right job? (Katsov-2012)

Genoveva.Vargas@imag.fr http://www.vargas-solar.com/bigdata-management

