Writing Datomic in Clojure
Rich Hickey
Overview • What is Datomic? • Architecture • Implementation: Clojure Applied • Summary
What is Datomic? • A database • A sound model of information, with time • Provides database as a value to applications • Brings declarative programming to applications • Focus on reducing complexity
Why Datomic? • Architecture • Data Model
Architectures [diagram sequence: many apps sharing one database tier, built up as Client-Server, Clustered Client-Server, Sharded Client-Server, K/V Store, and Distributed Storage Service; each stage labels where Queries, Transactions, Consistency and Storage are handled]
Datomic Architecture [diagram sequence: many apps over a Distributed Storage Service, with a Transactor added; labels: Queries, Transactions, Consistency, Storage]
The Database, Deconstructed [diagram comparing a Traditional DB and Datomic. Traditional DB: the app process sends query strings and DDL + DML to a server that does query, transactions, indexing, caching and I/O over storage, and gets back result sets. Datomic: the app process embeds the Peer Lib (query, live index, cache) and reads data segments from a Storage Service; a separate Transactor handles transactions and indexing]
Designed for the Cloud • Ephemeral instances, unreliable disks • Redundancy in storage service • Leverages reliable storage services • e.g. DynamoDB, Riak
Elastic Scaling • More peers, more power • Fewer peers, less power, lower cost • Demand-driven • No configuration
Get Your Own Brain • Query, communication and memory engine • Goes into your app, making it a peer • The db is effectively local • Ad hoc, long-running queries are OK
Logic • Declarative search and business logic • The query language is Datalog • Simple rules and data patterns • Joins are implicit, meaning is evident • db and non-db sources
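The idea that joins are implicit can be seen in a toy matcher. The following is a minimal sketch, not Datomic's query engine. Facts are plain [e a v] vectors, and symbols starting with `?` are variables. A variable shared by two patterns is what joins them. The names `variable?`, `match-pattern` and `q` are hypothetical. The facts are an ordinary vector, which also illustrates querying a non-db source.

```clojure
;; Hypothetical sketch of Datalog-style pattern matching; not Datomic's engine.
(ns datalog-sketch
  (:require [clojure.string :as str]))

(defn variable?
  "A pattern element is a variable if it is a symbol starting with ?."
  [x]
  (and (symbol? x) (str/starts-with? (name x) "?")))

(defn match-pattern
  "Unify one pattern, e.g. [?p :likes ?food], with one fact, extending
  the bindings map. Returns nil when the fact does not match."
  [bindings pattern fact]
  (reduce (fn [b [p v]]
            (cond
              (not (variable? p)) (if (= p v) b (reduced nil))
              (contains? b p)     (if (= (b p) v) b (reduced nil))
              :else               (assoc b p v)))
          bindings
          (map vector pattern fact)))

(defn q
  "Every pattern must match some fact. A variable shared between
  patterns must bind the same value in each, which is the implicit join."
  [patterns facts]
  (reduce (fn [binding-sets pattern]
            (for [b binding-sets
                  f facts
                  :let [b' (match-pattern b pattern f)]
                  :when b']
              b'))
          [{}]
          patterns))

;; Facts can come from any collection, not only a database.
(def facts [[:fred :likes "Pizza"]
            [:sally :likes "Ice cream"]
            [:fred :lives-in "Boston"]])

(q '[[?p :likes ?food] [?p :lives-in ?city]] facts)
;; => ({?p :fred, ?food "Pizza", ?city "Boston"})
```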
Perception • Obtain a queue of transactions • not just your own • Query transactions for filtering/triggering
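Here is a minimal sketch of perceiving every transaction. It assumes each peer gets its own java.util.concurrent queue, and a transactor-like broadcaster puts every committed transaction report on all of them. The report shape `{:t ... :tx-data ...}` and the functions `broadcast!` and `drain-matching` are hypothetical, not Datomic's API.

```clojure
;; Hypothetical sketch: every peer gets every transaction report on its own queue.
(ns tx-queue-sketch
  (:import (java.util.concurrent LinkedBlockingQueue)))

(defn new-tx-queue [] (LinkedBlockingQueue.))

(defn broadcast!
  "Put a committed transaction report on every subscribed peer's queue,
  so each peer perceives all transactions, not just its own."
  [queues tx-report]
  (doseq [^LinkedBlockingQueue q queues]
    (.put q tx-report)))

(defn drain-matching
  "Take reports off the queue until it is empty, keeping those with at
  least one datom satisfying pred (a filter or trigger condition)."
  [^LinkedBlockingQueue q pred]
  (loop [acc []]
    (if-let [report (.poll q)]
      (recur (if (some pred (:tx-data report)) (conj acc report) acc))
      acc)))

;; Usage: peer-b sees peer-a's transactions too, and filters for :likes.
(def peer-a (new-tx-queue))
(def peer-b (new-tx-queue))
(broadcast! [peer-a peer-b] {:t 1 :tx-data [[:fred :likes "Pizza"]]})
(broadcast! [peer-a peer-b] {:t 2 :tx-data [[:sally :lives-in "Paris"]]})
(map :t (drain-matching peer-b (fn [[_ a _]] (= a :likes))))
;; => (1)
```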
Consistency • ACID transactions add new facts • Database presented to app as a value • Data in storage service is immutable
Programmability • Transactions/Rules/Queries/Results are data • Extensible types, predicates, etc. • Queries can invoke your code
A Database of Facts • A single storage construct, the datom • Entity/Attribute/Value/Transaction • Attribute definition is the only 'schema'
Adaptability • Sparse, irregular, hierarchical data • Single and multi-valued attributes • No structural rigidity
Time Built-in • Every datom retains its transaction • Transactions are totally ordered • Transactions are first-class entities • Get the db as-of, or since, a point in time
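Every datom keeps its transaction, so as-of and since views reduce to filtering on that field. This sketch assumes a hypothetical datom shape [e a v tx added?] with integer transaction ids. The names `as-of`, `since` and `current-facts` are illustrative, not Datomic's functions.

```clojure
;; Hypothetical sketch: datoms keep their tx, so time views are filters.
(ns time-sketch)

;; Datom shape assumed here: [e a v tx added?]
(def history
  [[:fred  :likes "Pizza"     1000 true]
   [:sally :likes "Ice cream" 1001 true]
   [:fred  :likes "Pizza"     1002 false]   ; retracted in a later tx
   [:fred  :likes "Sushi"     1002 true]])

(defn as-of
  "Datoms from transactions up to and including t."
  [datoms t]
  (filter (fn [[_ _ _ tx _]] (<= tx t)) datoms))

(defn since
  "Datoms from transactions strictly after t."
  [datoms t]
  (filter (fn [[_ _ _ tx _]] (> tx t)) datoms))

(defn current-facts
  "Replay asserts and retracts in transaction order to get the
  [e a v] facts in effect."
  [datoms]
  (reduce (fn [facts [e a v _ added?]]
            (if added? (conj facts [e a v]) (disj facts [e a v])))
          #{}
          (sort-by #(nth % 3) datoms)))

;; As of tx 1001, Fred still likes Pizza; the full history says Sushi.
(current-facts (as-of history 1001))
```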
Clojure Time Model [diagram: pure functions F carry states v1 -> v2 -> v3 -> v4] • Process: events (pure functions) • States: immutable values • Identity: a succession of states • Observers: perception/memory
Datomic Time Model [diagram: transactions F carry DB values v1 -> v2 -> v3 -> v4] • Process events (pure functions): Transactions • States (immutable values): DB Values • Identity (succession of states): DB Connection • Observers/perception/memory: Queries
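This mapping can be sketched with a Clojure atom standing in for the connection (the identity) and a pure function producing each successive db value. It is a toy under stated assumptions: a db value here is just a map holding a transaction counter and a set of [e a v] facts. The names `apply-tx` and `transact!` are hypothetical.

```clojure
;; Hypothetical sketch of the time model: an identity holding successive values.
(ns time-model-sketch)

(defn apply-tx
  "Pure function from one db value to the next. A db value here is just
  a map with a transaction counter :t and a set of [e a v] :facts."
  [db tx-data]
  (-> db
      (update :facts into tx-data)
      (update :t inc)))

(def conn
  "The identity: an atom whose state is a succession of immutable db values."
  (atom {:t 0 :facts #{}}))

(defn transact!
  "Advance the identity to its next state and return the new db value."
  [conn tx-data]
  (swap! conn apply-tx tx-data))

;; Observers hold values; an old value never changes under them.
(def db-before @conn)
(transact! conn [[:fred :likes "Pizza"]])
(def db-after @conn)
```

A query given `db-before` keeps seeing an empty database after the transaction, because it holds a value rather than a reference to the connection.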
Implementation
Datomic Architecture [diagram: several App Server Processes, each embedding the Peer Lib (App, Query, Live Index, Comm, Cache); peers read Data Segments from the Storage Service through an optional memcached cluster; the Transactor handles Transactions and Indexing and writes to redundant segment storage, with a standby Transactor]
State • Immutable, expanding value • Must be organized to support query
Index • Sorted set of facts • Maintaining the sort live in storage is bad • BigTable-style: merge memory with storage • Occasional merge into storage • Persistent trees
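The BigTable-style approach keeps recent facts in a memory index and older facts in sorted storage, and presents both as one sorted sequence. Here is a minimal sketch using lazy sequences. It assumes both sides are already sorted by the same comparator. `merge-sorted` and `index-merge` are hypothetical names, and duplicates are not removed.

```clojure
;; Hypothetical sketch of a BigTable-style read: memory index + storage index.
(ns merge-sketch)

(defn merge-sorted
  "Lazily merge two sequences, each already sorted by cmp, into one
  sorted sequence. Duplicates are kept."
  [cmp xs ys]
  (lazy-seq
    (cond
      (empty? xs) ys
      (empty? ys) xs
      (neg? (cmp (first ys) (first xs)))
      (cons (first ys) (merge-sorted cmp xs (rest ys)))
      :else
      (cons (first xs) (merge-sorted cmp (rest xs) ys)))))

(defn index-merge
  "Occasional merge: fold the memory index into a new storage index
  value and start over with an empty memory index."
  [cmp storage mem]
  {:storage (vec (merge-sorted cmp storage mem))
   :memory  (sorted-set-by cmp)})

(def storage-index [[1 :name "Ann"] [3 :name "Cy"]])   ; older, sorted
(def mem-index (sorted-set [2 :name "Bo"]))             ; recent novelty

(merge-sorted compare storage-index mem-index)
;; => ([1 :name "Ann"] [2 :name "Bo"] [3 :name "Cy"])
```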
Memory Index • New persistent sorted set • Large internal nodes • Pluggable comparators • 2 sorts always maintained • EAVT, AEVT • plus AVET, VAET
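Clojure's own `sorted-set-by` is enough to sketch pluggable comparators and two sorts over the same datoms. It is a persistent red-black tree, not the large-node set described on the slide. The `Datom` record and the comparator names are assumptions for illustration. AVET and VAET would follow the same pattern.

```clojure
;; Hypothetical sketch: two sorts over the same datoms via pluggable comparators.
;; Uses Clojure's sorted-set-by (a red-black tree), not Datomic's own set.
(ns index-sketch)

(defrecord Datom [e a v tx])

(defn eavt-cmp
  "Entity, then attribute, value, transaction."
  [x y]
  (compare [(:e x) (:a x) (:v x) (:tx x)]
           [(:e y) (:a y) (:v y) (:tx y)]))

(defn aevt-cmp
  "Attribute first, so all datoms for one attribute are contiguous."
  [x y]
  (compare [(:a x) (:e x) (:v x) (:tx x)]
           [(:a y) (:e y) (:v y) (:tx y)]))

(defn index-datoms
  "Maintain both sorts; adding datoms yields new persistent sets."
  [datoms]
  {:eavt (into (sorted-set-by eavt-cmp) datoms)
   :aevt (into (sorted-set-by aevt-cmp) datoms)})

(defn attr-datoms
  "Range scan of the AEVT sort for one attribute. nil sorts before
  everything under compare, so it serves as the lower bound."
  [idx a]
  (take-while #(= a (:a %))
              (subseq (:aevt idx) >= (->Datom nil a nil nil))))

(def idx (index-datoms [(->Datom 2 :likes "Sushi" 1001)
                        (->Datom 1 :name  "Fred"  1000)
                        (->Datom 1 :likes "Pizza" 1000)]))

(map :v (attr-datoms idx :likes))
;; => ("Pizza" "Sushi")
```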
Storage • Log of tx asserts/retracts (in tree) • Various covering indexes (trees) • Storage requirements • Data segment values (K->V) • atoms (consistent read) • pods (conditional put)
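The storage requirements amount to a small interface:

- write-once segment values,
- a consistent read of a root,
- a conditional put to move that root.

This sketch is an in-memory stand-in built on `ConcurrentHashMap`. The slide does not spell out exactly how Datomic uses "atoms" and "pods". Treating them as a consistently read root and a compare-and-set root is an assumption, as are the protocol and its method names.

```clojure
;; Hypothetical in-memory stand-in for the storage requirements; not Datomic's API.
(ns storage-sketch
  (:import (java.util.concurrent ConcurrentHashMap)))

(defprotocol Storage
  (put-segment! [s k v] "Write an immutable segment value under a fresh key.")
  (get-segment [s k] "Read a segment value by key.")
  (read-pod [s k] "Consistent read of a mutable root.")
  (cas-pod! [s k old-v new-v]
    "Conditional put: set the root to new-v only if it still holds old-v
    (nil meaning absent). Returns true on success."))

(deftype MemStorage [^ConcurrentHashMap segments ^ConcurrentHashMap pods]
  Storage
  (put-segment! [_ k v]
    ;; Segments are immutable: a second write to the same key is ignored.
    (.putIfAbsent segments k v)
    k)
  (get-segment [_ k] (.get segments k))
  (read-pod [_ k] (.get pods k))
  (cas-pod! [_ k old-v new-v]
    (if (nil? old-v)
      (nil? (.putIfAbsent pods k new-v))
      (.replace pods k old-v new-v))))

(defn mem-storage []
  (->MemStorage (ConcurrentHashMap.) (ConcurrentHashMap.)))

;; Usage: two writers race to set the index root; only the first succeeds.
(def store (mem-storage))
(put-segment! store "seg-1" [[1 :name "Ann"]])
(cas-pod! store :index-root nil "seg-1")   ; => true
(cas-pod! store :index-root nil "seg-2")   ; => false
(read-pod store :index-root)               ; => "seg-1"
```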
Index in Storage [diagram: an Identity holds an Index ref (T, here 42, plus roots for EAVT, AEVT, VeAET, AVET and Lucene); each Index Root is a key->dir map leading to Value dirs, then to Sorted segs of Datoms]
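What’s in a DB Value? [diagram: the Identity (db atom) holds the current db value; a db value holds the Memory index (live window), the Storage index (live and history), the Lucene index, and nextT, asOfT and sinceT; the Storage-backed index has Roots (EAVT, AEVT, VeAET, Lucene) at t, read through a Value Hierarchical Cache]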
DB Values • Time travel and more • db.asOf: past; db.since: windowed • db.with(tx): speculative • db.filter(pred): slice • dbs are arguments to query, not implicit • Mock with datom-shaped data: [[:fred :likes "Pizza"] [:sally :likes "Ice cream"]]
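A query receives its db as an argument, so plain data can stand in for a database. With and filter then become functions returning new values. This is a minimal sketch, assuming a db value is just a collection of [e a v] vectors. `db-with`, `db-filter` and `likes` are hypothetical names, not Datomic's `with` or `filter`.

```clojure
;; Hypothetical sketch: db values as plain collections of [e a v] facts.
(ns db-value-sketch)

(defn db-with
  "Speculative: a new db value with tx-data added. Nothing is persisted
  and the original value is untouched."
  [db tx-data]
  (into db tx-data))

(defn db-filter
  "Slice: a db value holding only the facts that satisfy pred."
  [db pred]
  (into (empty db) (filter pred) db))

(defn likes
  "A query takes its db as an explicit argument, so any datom-shaped
  collection can stand in for a real database."
  [db person]
  (set (for [[e a v] db
             :when (and (= e person) (= a :likes))]
         v)))

;; Mock db straight from the slide.
(def mock-db [[:fred :likes "Pizza"] [:sally :likes "Ice cream"]])

;; Speculatively add a fact and query the result; mock-db is unchanged.
(likes (db-with mock-db [[:fred :likes "Sushi"]]) :fred)
```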
Process • Assert/retract can’t express transformation • Transaction function: (f db & args) -> tx-data • tx-data: assert|retract|(tx-fn args...) • Expand/splice until all are assert/retracts
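The expand/splice loop can be sketched as a recursive function. The sketch makes three assumptions:

- asserts and retracts are written [:db/add e a v] and [:db/retract e a v],
- transaction functions are looked up by name in a plain map,
- every call sees the same db value.

`expand`, `tx-fns` and the bank example are hypothetical.

```clojure
;; Hypothetical sketch of transaction-function expansion; not Datomic's transactor.
(ns tx-fn-sketch)

(defn primitive?
  "Asserts and retracts are the only forms left after expansion."
  [[op]]
  (contains? #{:db/add :db/retract} op))

(defn expand
  "Replace each call [fn-name & args] with the tx-data returned by
  (f db & args), splicing the result in place and expanding it again,
  until only asserts and retracts remain. Every call sees the same db."
  [db tx-fns tx-data]
  (mapcat (fn [item]
            (if (primitive? item)
              [item]
              (let [[fn-name & args] item
                    f (or (get tx-fns fn-name)
                          (throw (ex-info "Unknown transaction function"
                                          {:fn fn-name})))]
                (expand db tx-fns (apply f db args)))))
          tx-data))

;; A db value and two transaction functions, one calling the other.
(def bank-db {:alice {:balance 100} :bob {:balance 10}})

(def tx-fns
  {:inc-balance (fn [db e amount]
                  (let [old (get-in db [e :balance] 0)]
                    [[:db/retract e :balance old]
                     [:db/add e :balance (+ old amount)]]))
   :transfer    (fn [_db from to amount]
                  [[:inc-balance from (- amount)]
                   [:inc-balance to amount]])})

(expand bank-db tx-fns [[:transfer :alice :bob 30]
                        [:db/add :bob :name "Bob"]])
;; => ([:db/retract :alice :balance 100] [:db/add :alice :balance 70]
;;     [:db/retract :bob :balance 10] [:db/add :bob :balance 40]
;;     [:db/add :bob :name "Bob"])
```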
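Process Expansion [diagram: tx-data mixing asserts (+), retracts (-) and transaction-function calls (foo, bar, baz); each call is expanded and spliced in place, repeatedly, until only + and - remain]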
Transactor • Accepts transactions • Expands, applies, logs, broadcasts • Periodic indexing, in background • Indexing creates garbage • Storage GC
Transactor Implementation • HornetQ for transaction communication • Extensive internal pipelining via j.u.c. queues • Async message decompression • Transaction expansion/application • Encoding for, and communication with, storage • Java interop to storage APIs
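Pipelining over j.u.c. queues can be sketched as stages connected by `LinkedBlockingQueue`s. Each stage runs on its own thread, and a sentinel marks the end of input. In the usage line, `edn/read-string`, a rewrite, and `pr-str` are stand-ins for decompression, expansion and encoding. `stage!` and `run-pipeline` are hypothetical names, not the transactor's code.

```clojure
;; Hypothetical sketch of a queue-connected pipeline; not the transactor's code.
(ns pipeline-sketch
  (:require [clojure.edn :as edn])
  (:import (java.util.concurrent LinkedBlockingQueue)))

(def done ::done) ; sentinel marking the end of the input

(defn stage!
  "Start a daemon thread that applies f to each item taken from in and
  puts the result on out, forwarding the sentinel and then stopping.
  f must not return nil, since the queues reject nil."
  [f ^LinkedBlockingQueue in ^LinkedBlockingQueue out]
  (doto (Thread. ^Runnable
                 (fn []
                   (loop []
                     (let [x (.take in)]
                       (if (= x done)
                         (.put out done)
                         (do (.put out (f x))
                             (recur)))))))
    (.setDaemon true)
    (.start)))

(defn run-pipeline
  "Connect one stage per function with queues, feed items in, and
  collect results. Stages work on different items concurrently."
  [fs items]
  (let [qs (vec (repeatedly (inc (count fs)) #(LinkedBlockingQueue.)))
        ^LinkedBlockingQueue in  (first qs)
        ^LinkedBlockingQueue out (peek qs)]
    (doseq [[f q-in q-out] (map vector fs qs (rest qs))]
      (stage! f q-in q-out))
    (doseq [x items] (.put in x))
    (.put in done)
    (loop [acc []]
      (let [x (.take out)]
        (if (= x done) acc (recur (conj acc x)))))))

;; Stand-ins: edn/read-string for decompression, a rewrite for expansion,
;; pr-str for encoding.
(run-pipeline [edn/read-string
               (fn [tx] (mapv #(into [:db/add] %) tx))
               pr-str]
              ["[[1 :name \"Ann\"]]" "[[2 :name \"Bo\"]]"])
;; => ["[[:db/add 1 :name \"Ann\"]]" "[[:db/add 2 :name \"Bo\"]]"]
```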
Indexing • Extensive use of laziness • Parallel processing • Parallel I/O • Async, rejoins via queue
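Here is a sketch of background indexing under stated assumptions. Segments are cut lazily from sorted datoms. They are encoded in parallel on a fixed thread pool, standing in for parallel processing and I/O. The finished result rejoins the caller through a queue. `encode-segment`, `write-segments` and `start-indexing!` are hypothetical, and the segment encoding is a placeholder.

```clojure
;; Hypothetical sketch of background indexing; not Datomic's indexer.
(ns indexing-sketch
  (:import (java.util.concurrent Callable ExecutorService Executors Future
                                 LinkedBlockingQueue)))

(defn encode-segment
  "Stand-in for serializing one segment of sorted datoms for storage."
  [datoms]
  (pr-str (vec datoms)))

(defn write-segments
  "Encode (and, in a real system, write) segments in parallel on a pool.
  All tasks are submitted before any result is awaited."
  [segments]
  (let [^ExecutorService pool (Executors/newFixedThreadPool 4)]
    (try
      (->> segments
           (mapv (fn [seg] (.submit pool ^Callable (fn [] (encode-segment seg)))))
           (mapv (fn [^Future fut] (.get fut))))
      (finally (.shutdown pool)))))

(defn start-indexing!
  "Run an indexing job on a background thread; the result rejoins the
  caller through a queue. Errors are not propagated in this sketch."
  [datoms segment-size ^LinkedBlockingQueue results]
  (doto (Thread. ^Runnable
                 (fn []
                   ;; partition-all cuts segments lazily from the sorted datoms
                   (let [segments (partition-all segment-size (sort datoms))]
                     (.put results {:segments (write-segments segments)}))))
    (.setDaemon true)
    (.start)))

;; Usage: the caller keeps working, then picks up the result from the queue.
(def results (LinkedBlockingQueue.))
(start-indexing! [[2 :name "Bo"] [1 :name "Ann"] [3 :name "Cy"]] 2 results)
(:segments (.take results))
;; => ["[[1 :name \"Ann\"] [2 :name \"Bo\"]]" "[[3 :name \"Cy\"]]"]
```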
Recommend
More recommend