
Writing Datomic in Clojure (Rich Hickey)



  1. Writing Datomic in Clojure Rich Hickey Datomic, Clojure

  2. Overview • What is Datomic? • Architecture • Implementation - Clojure Applied • Summary

  3. What is Datomic? • A new kind of database • Bringing data power into the application • A sound model of information, with time • Enabled by architectural advances

  4. Why Datomic? • Architecture • Data Model

  5. Architectures: Client-Server [diagram: Apps, Server; labels: Queries, Transactions, Consistency, Storage]

  6. Architectures: Clustered Client-Server [diagram: Apps, Servers; labels: Queries, Transactions, Consistency, Storage]

  7. Architectures: Sharded Client-Server [diagram: Apps, Servers; labels: Queries, Transactions, Consistency, Storage]

  8. Architectures: Sharded Client-Server [diagram, continued: the Queries, Transactions, Consistency and Storage labels now appear more than once]

  9. Architectures: K/V Store [diagram: Apps, Servers; labels: Queries, Transactions, Consistency, Storage]

  10. Architectures: K/V Store [diagram, continued: the Queries, Transactions, Consistency and Storage labels are repeated again]

  11. Architectures: K/V Store [diagram, continued: the Servers are replaced by a Distributed Storage Service; labels: Queries, Transactions, Consistency, Storage]

  12. Datomic Architecture [diagram: Apps, Distributed Storage Service]

  13. Datomic Architecture [diagram: Apps, Distributed Storage Service; labels: Queries, Transactions, Consistency, Storage]

  14. Datomic Architecture [diagram: same labels as slide 13]

  15. Datomic Architecture [diagram: adds a Transactor to the labels of slide 13]

  16. Datomic Architecture [diagram: Apps, Transactor, Distributed Storage Service; the Queries, Transactions, Consistency and Storage labels appear more than once]

  17. Datomic Architecture [diagram: same labels as slide 16]

  18. Datomic Architecture [diagram: same labels as slide 16]

  19. Datomic Architecture [diagram: same labels as slide 16]

  20. Datomic Architecture [diagram: same labels as slide 16]

  21. The Database, Deconstructed [diagram comparing a Traditional DB with Datomic. Traditional DB labels: App Process (App, Data), Strings (DDL + DML), Result Sets, Server (Query, Transactions, Indexing, Cache, I/O), Data Segments, Storage. Datomic labels: App Process (App, Data, Peer Lib with Query, Cache, Live Index), Transactor (Transactions, Indexing), Storage Service (Data Segments)]

  22. Designed for the Cloud • Ephemeral instances, unreliable disks • Redundancy in storage service • Leverages reliable storage services • e.g. DynamoDB

  23. Elastic Scaling • More peers, more power • Fewer peers, less power, lower cost • Demand-driven • No configuration

  24. Get Your Own Brain • Query, communication and memory engine • Goes into your app, making it a peer • The db is effectively local • Ad hoc, long-running queries are OK

  25. Logic • Declarative search and business logic • The query language is Datalog • Simple rules and data patterns • Joins are implicit, meaning is evident • db and non-db sources
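
To make "joins are implicit" concrete, here is a minimal sketch of Datalog-style pattern matching, written in Python as a toy model. It is not Datomic's query engine or API: `match_clause`, `query` and the sample attributes are hypothetical names, and the evaluation is a naive nested loop that uses no indexes. The join arises only because the same `?variable` appears in more than one clause.

```python
def match_clause(clause, fact, binding):
    """Unify one (e, a, v) pattern with one fact.

    Returns an extended copy of `binding`, or None if they conflict.
    Terms starting with "?" are variables; everything else is a constant.
    """
    extended = dict(binding)
    for term, actual in zip(clause, fact):
        if isinstance(term, str) and term.startswith("?"):
            if term in extended and extended[term] != actual:
                return None
            extended[term] = actual
        elif term != actual:
            return None
    return extended


def query(find, where, facts):
    """Evaluate the clauses left to right, carrying variable bindings along."""
    bindings = [{}]
    for clause in where:
        next_bindings = []
        for binding in bindings:
            for fact in facts:
                extended = match_clause(clause, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {tuple(b[var] for var in find) for b in bindings}


# Hypothetical sample data: (entity, attribute, value) triples.
FACTS = [
    (1, "person/name", "Ada"),
    (2, "person/name", "Grace"),
    (10, "order/customer", 1),
    (10, "order/total", 40),
    (11, "order/customer", 2),
    (11, "order/total", 75),
]

# ?o joins the first two clauses, ?c joins the last two.
RESULT = query(
    ["?name", "?total"],
    [("?o", "order/total", "?total"),
     ("?o", "order/customer", "?c"),
     ("?c", "person/name", "?name")],
    FACTS,
)
```

Note that the query itself is plain data (lists and tuples), so it can be built or inspected by ordinary code.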

  26. Perception • Obtain a queue of transactions • not just your own • Query transactions for filtering/triggering

  27. Consistency • ACID transactions add new facts • Database presented to app as a value • Data in storage service is immutable

  28. Programmability • Transactions/Rules/Queries/Results are data • Extensible types, predicates, etc. • Queries can invoke your code

  29. A Database of Facts • A single storage construct, the datom • Entity/Attribute/Value/Transaction • Attribute definition is the only 'schema'
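
As an illustration of the datom as the single storage construct, the following Python sketch models a datom as a four-field tuple. The field names, the sample attributes and the helper `attributes_of` are assumptions made for this example, not Datomic's representation.

```python
from collections import namedtuple

# Toy datom: entity, attribute, value, transaction (names are illustrative).
Datom = namedtuple("Datom", ["e", "a", "v", "tx"])

FACTS = [
    Datom(1, "person/name", "Ada", 100),
    Datom(1, "person/email", "ada@example.com", 100),
    Datom(2, "person/name", "Grace", 101),  # entity 2 has no email: data is sparse
]


def attributes_of(datoms, entity):
    """Return the sorted attribute names that have a fact for one entity."""
    return sorted({d.a for d in datoms if d.e == entity})
```

Because every fact has the same shape, entities need not share a set of attributes; only the attributes themselves would need a definition.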

  30. Adaptability • Sparse, irregular, hierarchical data • Single and multi-valued attributes • No structural rigidity

  31. Time Built-in • Every datom retains its transaction • Transactions are totally ordered • Transactions are first-class entities • Get the db as-of, or since, a point in time
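
A rough sketch of why retaining the transaction on every datom is enough to answer as-of and since questions. This is a Python toy model under stated assumptions: the `added` flag (True for assert, False for retract), the helper names and the integer transaction ids are all illustrative, not the Datomic API.

```python
from collections import namedtuple

# Toy datom with an assert/retract flag (an assumption of this sketch).
Datom = namedtuple("Datom", ["e", "a", "v", "tx", "added"])

LOG = [
    Datom(1, "account/balance", 100, 1, True),
    Datom(1, "account/balance", 100, 2, False),  # tx 2 retracts the old value...
    Datom(1, "account/balance", 80, 2, True),    # ...and asserts the new one
    Datom(1, "account/balance", 80, 3, False),
    Datom(1, "account/balance", 50, 3, True),
]


def as_of(datoms, t):
    """Facts recorded up to and including transaction t."""
    return [d for d in datoms if d.tx <= t]


def since(datoms, t):
    """Facts recorded strictly after transaction t."""
    return [d for d in datoms if d.tx > t]


def current_value(datoms, e, a):
    """Replay asserts and retracts in transaction order (single-valued attribute)."""
    value = None
    # Within one transaction, retracts (False) sort before asserts (True).
    for d in sorted(datoms, key=lambda d: (d.tx, d.added)):
        if d.e == e and d.a == a:
            if d.added:
                value = d.v
            elif value == d.v:
                value = None
    return value
```

Nothing is overwritten: a past view is just a filter over the same facts, which only works because transactions are totally ordered.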

  32. Implementation

  33. Architecture [diagram labels: App Process (App, Peer Lib with Query, Comm, Cache, Live Index); Storage Service (DynamoDB) holding Data Segments; Transactor AMI (Transactions, Indexing, Data Segments); redundant segment storage on SSDs]

  34. State • Immutable, expanding value • Must be organized to support query • Sorted set of facts • Maintaining sort live in storage: bad • BigTable-style: memory + storage merge • Occasional merge into storage • Persistent trees
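
The memory + storage merge can be sketched as two sorted collections that are merged on read and occasionally folded together. This Python toy model is only an illustration: the `Db` class and its methods are hypothetical, and plain tuples stand in for the persistent trees the slide mentions.

```python
import heapq


class Db:
    """Toy immutable database value: a sorted storage part plus a sorted memory part."""

    def __init__(self, storage=(), memory=()):
        self.storage = tuple(storage)  # large, sorted, rarely rewritten
        self.memory = tuple(memory)    # small, sorted, holds recent facts

    def with_fact(self, fact):
        """Return a new value with one more fact; the old value is untouched."""
        return Db(self.storage, tuple(sorted(self.memory + (fact,))))

    def facts(self):
        """Merge both sorted parts on read, without rewriting storage."""
        return list(heapq.merge(self.storage, self.memory))

    def indexed(self):
        """Occasional merge: fold memory into a new storage part."""
        return Db(tuple(heapq.merge(self.storage, self.memory)), ())


DB0 = Db(storage=[(1, "a", 1), (3, "a", 3)])
DB1 = DB0.with_fact((2, "a", 2))
DB2 = DB1.indexed()
```

Readers see one sorted set either way; the expensive re-sort of storage happens only in `indexed`, not on every write.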

  35. Memory Index • New persistent sorted set • Large internal nodes • Pluggable comparators • 2 sorts always maintained • EAVT, AEVT • plus AVET, VAET
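
The four sort orders can be pictured as the same datoms sorted under different comparators. The Python sketch below is a toy model: sorted lists with binary search stand in for the persistent sorted set, and `ORDERS`, `build_index` and `seek` are hypothetical names. All sample values are integers so that every ordering is comparable.

```python
from bisect import bisect_left
from collections import namedtuple

Datom = namedtuple("Datom", ["e", "a", "v", "t"])

# Each index name spells out its sort order, field by field.
ORDERS = {
    "EAVT": ("e", "a", "v", "t"),
    "AEVT": ("a", "e", "v", "t"),
    "AVET": ("a", "v", "e", "t"),
    "VAET": ("v", "a", "e", "t"),
}


def comparator_key(order):
    """Pluggable ordering: a key function derived from the index name."""
    fields = ORDERS[order]
    return lambda d: tuple(getattr(d, f) for f in fields)


def build_index(datoms, order):
    """Return (sorted keys, datoms in the same order)."""
    key = comparator_key(order)
    ordered = sorted(datoms, key=key)
    return [key(d) for d in ordered], ordered


def seek(index, *prefix):
    """Binary-search to the first key with this prefix, then scan while it matches."""
    keys, ordered = index
    i = bisect_left(keys, prefix)
    found = []
    while i < len(keys) and keys[i][:len(prefix)] == prefix:
        found.append(ordered[i])
        i += 1
    return found


DATOMS = [
    Datom(1, "person/age", 36, 100),
    Datom(2, "person/age", 45, 101),
    Datom(1, "person/friend", 2, 102),
    Datom(3, "person/age", 36, 103),
]
EAVT = build_index(DATOMS, "EAVT")
AVET = build_index(DATOMS, "AVET")
VAET = build_index(DATOMS, "VAET")
```

Here `seek(EAVT, 1)` answers "everything about entity 1", `seek(AVET, "person/age", 36)` answers "who has this value", and `seek(VAET, 2)` answers "who points at entity 2".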

  36. Storage • Log of tx asserts/retracts (in tree) • Various covering indexes (trees) • Storage requirements • Data segment values (K->V) • atoms (consistent read) • pods (conditional put)
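
One possible reading of these storage requirements, sketched as an in-memory Python stand-in: write-once segment values under keys, plus a small set of mutable roots that support consistent reads and conditional puts. The class `ToyStorage`, its method names, and the split into "segments" and "roots" are assumptions of this sketch, not the interface Datomic actually requires of a storage service.

```python
import threading


class ToyStorage:
    """In-memory stand-in for a storage service (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._segments = {}  # key -> immutable segment value
        self._roots = {}     # name -> key of the current root

    def put_segment(self, key, value):
        """Write-once K->V: an existing segment is never overwritten."""
        with self._lock:
            self._segments.setdefault(key, value)

    def get_segment(self, key):
        with self._lock:
            return self._segments[key]

    def read_root(self, name):
        """Consistent read of a mutable pointer."""
        with self._lock:
            return self._roots.get(name)

    def conditional_put(self, name, expected, new):
        """Swap the pointer only if it still holds `expected`."""
        with self._lock:
            if self._roots.get(name) != expected:
                return False
            self._roots[name] = new
            return True


STORE = ToyStorage()
STORE.put_segment("seg-1", ("datom-a", "datom-b"))
FIRST = STORE.conditional_put("index-root", None, "seg-1")
STALE = STORE.conditional_put("index-root", None, "seg-2")  # loses: root already moved
```

With immutable segments, only the tiny root pointers need coordination, which is why a simple conditional put is enough.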

  37. Index Storage [diagram labels: Index; T; EAVT, AEVT, VAET, AVET, Lucene; 42; Root of key->dir; dirs; Sorted segs; Datoms; Storage Service]

  38. What’s in a DB Value? [diagram labels: db atom; db value; Memory index (live window): live, being-indexed, history; Storage-backed index; index keys; id->key; key->id; ids; asOfT; sinceT; Lucene index; Roots: live Lucene, EAVT, AEVT, VAET; t; Storage; Hierarchical Cache]

  39. Process • Assert/retract can’t express transformation • Transaction function: (f db & args) -> tx-data • tx-data: assert | retract | (tx-fn args...) • Expand/splice until all assert/retracts
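
The expand/splice step can be sketched as a small recursive function. This Python toy model assumes tx-data is a list of tuples whose first element is either "assert", "retract" or the name of a transaction function; the functions `inc` and `transfer`, the dict-based `db` and the "balance" attribute are hypothetical examples.

```python
def expand(db, tx_data, tx_fns):
    """Replace every transaction-function call with its output, recursively,
    until only assert/retract operations remain."""
    expanded = []
    for op in tx_data:
        kind = op[0]
        if kind in ("assert", "retract"):
            expanded.append(op)
        else:
            fn = tx_fns[kind]                   # (f db & args) -> tx-data
            expanded.extend(expand(db, fn(db, *op[1:]), tx_fns))
    return expanded


def inc(db, e, a, amount):
    """Read the current value from db and emit a retract/assert pair."""
    old = db[(e, a)]
    return [("retract", e, a, old), ("assert", e, a, old + amount)]


def transfer(db, src, dst, amount):
    """A transaction function may itself emit calls to other functions."""
    return [("inc", src, "balance", -amount), ("inc", dst, "balance", amount)]


TX_FNS = {"inc": inc, "transfer": transfer}
DB = {(1, "balance"): 100, (2, "balance"): 20}
EXPANDED = expand(DB, [("transfer", 1, 2, 30), ("assert", 3, "balance", 0)], TX_FNS)
```

The transformation (read a value, compute a new one) happens inside the function against a db value, so the final transaction is still nothing but asserts and retracts.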

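  40. Process Expansion [diagram: a sequence of asserts (+) and retracts (-) containing the transaction-function calls foo, bar and baz, each expanded in place into further asserts and retracts]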

  41. Transactor • Accepts transactions • Expands, applies, logs, broadcasts • Periodic indexing, in background • Indexing creates garbage • Storage GC

  42. Transactor Implementation • HornetQ for transaction communication • Extensive internal pipelining with java.util.concurrent (j.u.c.) queues • Async message decompression • Transaction expansion/application • Encoding for, communication with storage • Java interop to storage APIs

  43. Indexing • Extensive use of laziness • Parallel processing • Parallel I/O • Async, rejoins via queue

  44. Declarative Programming • Embedded Datalog • Takes data sources and rule sets as args • Extended to work with scalars/collections • Expression clauses call your code

  45. Datalog Implementation • Data driven, in and out • Query/Subquery Recursive (QSQR) • Dynamic, set oriented • DB joins leverage indexes • Expressions use Clojure compiler • Caching of transforms at all stages

  46. Over Here • Peers directly access storage service • Have own query engine • Have live mem index and merging • Two-tier cache • Segments (on/off heap) • Datoms w/object values (on heap)

  47. Peer Implementation • HornetQ for transaction communication • Google Guava caches • Java APIs for storage • Entities are like multimaps • key -> value(s) • reverse attrs
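
"Entities are like multimaps" can be illustrated with a short Python sketch over (entity, attribute, value) triples. The helpers `entity` and `reverse` and the sample attributes are hypothetical; a real peer would presumably answer both lookups from its indexes rather than by scanning.

```python
from collections import defaultdict

# Hypothetical sample facts as (entity, attribute, value) triples.
FACTS = [
    (1, "person/name", "Ada"),
    (1, "person/likes", "math"),
    (1, "person/likes", "engines"),
    (2, "person/name", "Grace"),
    (2, "person/friend", 1),
    (3, "person/friend", 1),
]


def entity(facts, eid):
    """Forward view: attribute -> set of values for one entity."""
    view = defaultdict(set)
    for e, a, v in facts:
        if e == eid:
            view[a].add(v)
    return dict(view)


def reverse(facts, attr, target):
    """Reverse view: which entities point at `target` through `attr`?"""
    return {e for e, a, v in facts if a == attr and v == target}
```

A multi-valued attribute such as "person/likes" simply maps to a set with more than one member, and a reverse attribute is the same data read in the other direction.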

  48. Consistency and Scale • Process/writes go through transactor • Traditional server scaling/availability • Immutability supports consistent reads • Without transactions • Scale reads by turning knobs on storage • Query scales with peers • Dynamic, e.g. auto-scaling
