Distributed Collaborative Editing
LSEQ: an Adaptive Distributed Sequence Data Structure
On the Fly Order Preserving Object Renaming
Achour Mostefaoui
Joint work with Emmanuel Desmontils, Pascal Molli and Brice Nédelec
Distributed Collaborative Editors
1. Collaboration across space, time, and organizations.
2. Two phases:
   a. locally prepare the operations to send;
   b. execute remote operations.
3. Operational transformation (OT):
   + local operations are cheap
   - remote operations are complex
4. Conflict-free Replicated Data Type (CRDT): the two phases share the computational cost.
More collaborators ⇒ quadratic growth in remote operations.
[Diagram: distributed collaborative editors rely on optimistic replication, implemented with CRDT or OT; the editors named are Google Docs and CoVim.]
Distributed Collaborative Editors
A document can be seen as a sequence of basic elements (characters, words, lines, etc.). The problem is non-trivial because editing (updating the document) must ensure the following three properties (CCI):
1. Convergence: the different copies must converge to the same copy.
2. Causality: any operation must reflect the operations that causally precede it.
3. Intention: the effect of an operation must match the intention of the user who issued it.
CRDTs for sequences
1. Two commutative operations: insert and delete.
   Identify the basic elements.
   The set of ids is totally ordered.
   The ids make the sequence.
2. The operations:
   insert(p, elem, q) ⇒ basic function alloc(p, q)
   delete(id_elem)
   id_elem is immutable.
3. Deleted elements are only marked ⇒ they eventually need to be purged.
4. The size of identifiers may grow linearly with the number of operations, and very fast depending on the usage.
[Diagram: sequence CRDTs split into two families, variable-size ids (Logoot, Treedoc) and tombstones (WOOT, WOOTO, WOOTH, RGA, CT, Treedoc).]
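
The points above can be sketched in a few lines of Python. This is a toy replica, not Logoot, WOOT or LSEQ: the class name `SequenceCRDT`, the `(Fraction, site)` identifier shape, the `BEGIN`/`END` sentinels and the midpoint `alloc` are all assumptions made for illustration. It only shows that, once every element carries an immutable id from a totally ordered set, insert and delete commute.

```python
from bisect import bisect_left
from fractions import Fraction

# Hypothetical sentinels: every real id lies strictly between them.
BEGIN = (Fraction(0), "")
END = (Fraction(1), "")

def alloc(p, q, site):
    """Naive allocation (an assumption, not LSEQ): midpoint of p and q.

    Assumes p[0] < q[0]; the site name makes ids from different sites distinct.
    """
    return ((p[0] + q[0]) / 2, site)

class SequenceCRDT:
    """Toy replica: ids kept sorted, deletions kept as tombstones."""

    def __init__(self):
        self.ids = []         # sorted list of identifiers
        self.elems = {}       # identifier -> element
        self.deleted = set()  # tombstones (marked, never purged here)

    def apply_insert(self, ident, elem):
        if ident in self.elems:  # re-delivery has no effect
            return
        self.ids.insert(bisect_left(self.ids, ident), ident)
        self.elems[ident] = elem

    def apply_delete(self, ident):
        # Only marks the id, so a delete may arrive before its insert.
        self.deleted.add(ident)

    def value(self):
        return [self.elems[i] for i in self.ids if i not in self.deleted]
```

Two replicas that receive the same operations in different orders end with the same `value()`. With this midpoint `alloc`, inserting repeatedly at the same position adds one bit to the denominator each time, which is one way to see the linear growth mentioned in point 4.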
Motivations
Spectrum of two Wikipedia documents.
[Figure: for each page, revision number against line number (top) and Logoot id bit-size against line number (bottom).]
(a) Page edited at the end ⇒ 169.7 bits/id.
(b) Page edited at the front ⇒ 172.25 bits/id.
⇒ Allocation strategies are CRUCIAL.
Abstract Problem (1)
Cards: Michel, Eli, Maurice, Achour, Yehuda.
Arbitrary naming: Michel = 100, Eli = 011, Maurice = 010, Achour = 001, Yehuda = 000.
n cards can be named using ids of size O(log n).
Order-preserving renaming: Michel = 011, Eli = 001, Maurice = 010, Achour = 000, Yehuda = 100.
Even if one wants to preserve the order defined by the original names, n cards can be renamed with ids of size O(log n).
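
When all names are known in advance, the order-preserving renaming amounts to ranking the names. A minimal sketch, assuming alphabetical order is the order to preserve; the function name `rename` is hypothetical:

```python
def rename(names):
    """Give each name a fixed-width binary id that preserves sorted order."""
    # Number of bits needed to write ranks 0 .. n-1, i.e. O(log n).
    width = max(1, (len(names) - 1).bit_length())
    return {name: format(rank, f"0{width}b")
            for rank, name in enumerate(sorted(names))}

ids = rename(["Michel", "Eli", "Maurice", "Achour", "Yehuda"])
# ids["Achour"] == "000", ids["Eli"] == "001", ids["Maurice"] == "010",
# ids["Michel"] == "011", ids["Yehuda"] == "100"
```

This only works because the whole set is sorted before any id is handed out, which is exactly what is no longer possible when names arrive on the fly.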
Abstract Problem (2)
Cards: Michel, Maurice, Yehuda, Eli, Achour.
What if the original names are not known a priori?
If the first card is named 000, what id can the next card get (???)
One needs spare space (a dense set of ids).
If the first card is named 100 instead, the next one can take 000, 001 or 010.
Is it possible to avoid all this loss of space?
Bear confesses...
Problem
Variable-size identifier
A variable-size identifier id is a sequence of numbers $id = [p_1.p_2 \ldots p_n]$ which can designate a path in a tree.
[Figure: example identifier tree with Begin and End markers, elements a to g, and numeric path labels (0, 99, 10, 11, 14, 15, 13, 42, 92).]
Problem statement
Let $\mathcal{D}$ be a document on which n insert operations have been performed. Let $\mathcal{I}(\mathcal{D}) = \{ id \mid (\_, id) \in \mathcal{D} \}$. The function $alloc(id_p, id_q)$ should provide identifiers such that:
$$\sum_{id \in \mathcal{I}(\mathcal{D})} \frac{|id|_2}{n} < O(n)$$
$|id|_2$ means $\log_2(id)$, a.k.a. bit-length.
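
A small sketch of the two notions used in the problem statement: identifiers as paths compared lexicographically, and the average bit-length that alloc should keep low. The fixed cost `BITS_PER_LEVEL` and the sample identifiers are assumptions chosen for illustration, not values taken from the figure.

```python
# Assumption: every number of a path lies in [0, 99], so 7 bits per level.
BITS_PER_LEVEL = 7

def bit_length(ident):
    """Bit-length of a path under the fixed per-level cost assumed above."""
    return BITS_PER_LEVEL * len(ident)

def average_bit_length(ids):
    """Left-hand side of the inequality: sum of bit-lengths divided by n."""
    return sum(bit_length(i) for i in ids) / len(ids)

# Hypothetical identifiers; Python lists already compare lexicographically,
# which matches a depth-first, left-to-right walk of the tree.
ids = [[14, 42], [10], [14], [15], [14, 13]]
ordered = sorted(ids)  # [[10], [14], [14, 13], [14, 42], [15]]
```

Here three ids sit at depth 1 and two at depth 2, so `average_bit_length(ids)` is 7 × 7 / 5 = 9.8 bits. An allocation function that pushes most ids deep into the tree drives this average up.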
Proposal: LSEQ
Three components: base doubling, multiple allocation strategies, random strategy choice.
Intuition
Because editing behaviour is hard to predict, LSEQ accepts losing some depths of the tree on a given path, provided the reward compensates for the loss. In other words, even if LSEQ chooses the wrong strategy at a given time, it will eventually choose the right one, and that choice will amortize the cost of all previously lost depths.
Base doubling
Exponential trees: under a uniform distribution, the spatial complexity is $O(n \log \log n)$, where n is the number of ids.
$[p_1.p_2 \ldots p_n] \Rightarrow |p_n|_2 = |p_{n-1}|_2 + 1$, where $|p_1| = base$.
+1 bit ⇒ ×2 identifiers.
Intuition
If the number of insert operations is low, the id bit-length can stay small. On the other hand, when the number of insertions increases, it is profitable to allocate larger identifiers.
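
The doubling rule can be written down directly. A minimal sketch, assuming a depth-1 base of 32 values (5 bits); the function names and the `base_bits` parameter are hypothetical:

```python
def bits_at_depth(depth, base_bits=5):
    """Bits used by the number at a given depth (depth 1 is the root level)."""
    return base_bits + depth - 1

def arity_at_depth(depth, base_bits=5):
    """Number of values available at that depth: one more bit, twice as many."""
    return 1 << bits_at_depth(depth, base_bits)

def id_bit_length(depth, base_bits=5):
    """Total bit-length of an identifier that ends at the given depth."""
    return sum(bits_at_depth(d, base_bits) for d in range(1, depth + 1))
```

With these assumptions the first three depths offer 32, 64 and 128 values, and an identifier ending at depth 3 costs 5 + 6 + 7 = 18 bits.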
Multiple allocation strategies
boundary+: good when the page is edited at the end.
boundary-: good when the page is edited at the front.
[Figure: on the range 0 to 100 with a boundary of 20, boundary+ allocates 11 (within +20 of the lower bound) and boundary- allocates 89 (within -20 of the upper bound); a second panel shows an insertion between 50 and 51 for each strategy.]
Intuition
A single boundary strategy is not safe to use on its own. However, when it is paired with its antagonist strategy, each one cancels the other's deficiency.
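
The two strategies can be sketched at a single depth as follows. This is an illustration under assumptions: the function names, the `rng` parameter and the uniform random draw are choices made here, with the boundary of 20 taken from the figure.

```python
import random

def _step(p, q, boundary):
    """Largest usable offset between p and q, capped by the boundary."""
    gap = q - p - 1  # number of free values strictly between p and q
    if gap < 1:
        raise ValueError("no free value between p and q at this depth")
    return min(boundary, gap)

def boundary_plus(p, q, boundary=20, rng=random):
    """Allocate close to the lower bound p, leaving room above."""
    return p + rng.randint(1, _step(p, q, boundary))

def boundary_minus(p, q, boundary=20, rng=random):
    """Allocate close to the upper bound q, leaving room below."""
    return q - rng.randint(1, _step(p, q, boundary))
```

Between 0 and 100, `boundary_plus` returns a value in 1..20 and `boundary_minus` a value in 80..99. When the gap is exhausted the functions raise, which is the point where a real allocator would open a new depth.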
Random strategy choice
A unique strategy is not sufficient ⇒ strategy choice: when? Which?
Intuition: When
Opening a new space is significant: either the allocation strategy went wrong or, on the contrary, a high number of insertions saturated the previous depths and more space is required. The opening of a new space is therefore an ideal moment to decide which strategy to employ.
Intuition: Which
Since the editing behaviour cannot be known a priori, the strategy choice should not favour any behaviour. Consequently, each strategy must appear with equal frequency.
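
One way to sketch the "when" and the "which": draw a strategy with equal probability the first time a depth is opened, then keep it for that depth. The class name `StrategyChooser` and the optional seed are assumptions; the slides do not say how replicas share or agree on the choice, so this sketch is purely local.

```python
import random

class StrategyChooser:
    """Remembers one randomly drawn strategy per depth."""

    STRATEGIES = ("boundary+", "boundary-")

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._by_depth = {}

    def strategy_for(self, depth):
        # The draw happens only when a depth is opened for the first time.
        if depth not in self._by_depth:
            self._by_depth[depth] = self._rng.choice(self.STRATEGIES)
        return self._by_depth[depth]
```

Calling `strategy_for(2)` twice returns the same answer, while different depths are drawn independently and without bias toward either strategy.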
Synthesis: example
Exponential tree.
Two allocation strategies: boundary+ and boundary-.
Random strategy choice.

Depth   Base   Strategy
1       32     boundary+
2       64     boundary-
3       128    ???

[Figure: depth 1 shows the values 0, 9, 10, 23, 31 with the Begin and End markers; depth 2 shows the values 32, 51, 60.]
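
Putting the three components together gives an allocator along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the names `Lseq`, `prefix` and `to_id`, the boundary of 10, and the zero-padding of short identifiers are all choices made here, and site ids for concurrent inserts are left out.

```python
import random

BASE_BITS = 5   # depth 1 offers 2**5 = 32 values, as in the example
BOUNDARY = 10   # assumption: maximum offset from the chosen bound

def bits(depth):
    return BASE_BITS + depth - 1  # base doubling: one more bit per depth

def prefix(ident, depth):
    """Number formed by the first `depth` levels, padding with zeros."""
    value = 0
    for d in range(1, depth + 1):
        digit = ident[d - 1] if d <= len(ident) else 0
        value = (value << bits(d)) | digit
    return value

def to_id(value, depth):
    """Inverse of prefix: split a number back into one value per depth."""
    digits = []
    for d in range(depth, 0, -1):
        digits.append(value & ((1 << bits(d)) - 1))
        value >>= bits(d)
    return digits[::-1]

class Lseq:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.strategies = {}  # depth -> "boundary+" or "boundary-"

    def alloc(self, p, q):
        longest = max(len(p), len(q))
        if prefix(p, longest) >= prefix(q, longest):
            raise ValueError("p must come strictly before q")
        depth, interval = 0, 0
        while interval < 1:  # descend until at least one free value exists
            depth += 1
            interval = prefix(q, depth) - prefix(p, depth) - 1
        step = min(BOUNDARY, interval)
        if depth not in self.strategies:  # random choice when a depth opens
            self.strategies[depth] = self.rng.choice(("boundary+", "boundary-"))
        if self.strategies[depth] == "boundary+":
            value = prefix(p, depth) + self.rng.randint(1, step)
        else:
            value = prefix(q, depth) - self.rng.randint(1, step)
        return to_id(value, depth)
```

With `[0]` as Begin and `[31]` as End, `Lseq().alloc([0], [31])` returns a one-level identifier, and once depth 1 is saturated between two neighbours the allocator moves to depth 2 with its own randomly drawn strategy.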