Union-Find Data Structures
Carola Wenk
Slides courtesy of Charles Leiserson with small changes by Carola Wenk
CS 5633 Analysis of Algorithms, Spring 2008, 3/25/08

Disjoint-set data structure (Union-Find)
Problem:
• Maintain a dynamic collection of pairwise-disjoint sets $S = \{S_1, S_2, \dots, S_r\}$.
• Each set $S_i$ has one element distinguished as the representative element, $rep[S_i]$.
• Must support 3 operations:
  • MAKE-SET(x): adds the new set $\{x\}$ to $S$ with $rep[\{x\}] = x$ (for any $x \notin S_i$ for all $i$).
  • UNION(x, y): replaces the sets $S_x$, $S_y$ with $S_x \cup S_y$ in $S$ (for any x, y in distinct sets $S_x$, $S_y$).
  • FIND-SET(x): returns the representative $rep[S_x]$ of the set $S_x$ containing element x.

Disjoint-set data structure (Union-Find) II
• In all operations, pointers to the elements x, y in the data structure are given.
• Hence, we do not need to first search for the element in the data structure.
• Let $n$ denote the overall number of elements (equivalently, the number of MAKE-SET operations).

Union-Find Example
The representative of each set is written first.
              S = {}
MAKE-SET(2)   S = {{2}}
MAKE-SET(3)   S = {{2}, {3}}
MAKE-SET(4)   S = {{2}, {3}, {4}}
FIND-SET(4) = 4
UNION(2, 4)   S = {{2, 4}, {3}}
FIND-SET(4) = 2
MAKE-SET(5)   S = {{2, 4}, {3}, {5}}
UNION(4, 5)   S = {{2, 4, 5}, {3}}
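To pin down what the three operations must return, here is a deliberately naive Python sketch that replays the example trace. It is a hypothetical reference model (the class `NaiveDisjointSets` and its method names are illustrative, not from the slides): every call scans all sets, so it says nothing about efficiency. It assumes UNION(x, y) keeps the representative of x's set, which matches FIND-SET(4) = 2 after UNION(2, 4).

```python
class NaiveDisjointSets:
    """Hypothetical reference model: a list of (representative, set) pairs.
    Every operation scans all sets, so it only illustrates the semantics."""

    def __init__(self):
        self.sets = []

    def make_set(self, x):
        # The slides require x not to belong to any existing set.
        assert all(x not in s for _, s in self.sets)
        self.sets.append((x, {x}))      # rep[{x}] = x

    def _index_of(self, x):
        for i, (_, s) in enumerate(self.sets):
            if x in s:
                return i
        raise KeyError(x)

    def find_set(self, x):
        return self.sets[self._index_of(x)][0]

    def union(self, x, y):
        i, j = self._index_of(x), self._index_of(y)
        assert i != j, "x and y must lie in distinct sets"
        rep_x, s_x = self.sets[i]
        _, s_y = self.sets[j]
        # Assumption: the union keeps x's representative.
        self.sets[i] = (rep_x, s_x | s_y)
        del self.sets[j]


# Replay of the example trace.
ds = NaiveDisjointSets()
for x in (2, 3, 4):
    ds.make_set(x)
print(ds.find_set(4))   # 4
ds.union(2, 4)
print(ds.find_set(4))   # 2
ds.make_set(5)
ds.union(4, 5)
print(sorted(sorted(s) for _, s in ds.sets))   # [[2, 4, 5], [3]]
```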
Simple linked-list solution
Store each set $S_i = \{x_1, x_2, \dots, x_k\}$ as an (unordered) doubly linked list. Define the representative element $rep[S_i]$ to be the front of the list, $x_1$.
[Figure: $S_i$ drawn as the list $x_1, x_2, \dots, x_k$ with $rep[S_i] = x_1$.]
• MAKE-SET(x) initializes x as a lone node. – $\Theta(1)$
• FIND-SET(x) walks left in the list containing x until it reaches the front of the list. – $\Theta(n)$
• UNION(x, y) calls FIND-SET on y, finds the last element of the list containing x, and concatenates both lists, leaving the representative as FIND-SET(x). – $\Theta(n)$

Simple balanced-tree solution
Store each set $S_i = \{x_1, x_2, \dots, x_k\}$ as a balanced tree (ignoring keys). Define the representative element $rep[S_i]$ to be the root of the tree. (How is the balance maintained?)
[Figure: $S_i = \{x_1, \dots, x_5\}$ stored as a balanced tree with root $x_1 = rep[S_i]$.]
• MAKE-SET(x) initializes x as a lone node. – $\Theta(1)$
• FIND-SET(x) walks up the tree containing x until reaching the root. – $\Theta(\log n)$
• UNION(x, y) calls FIND-SET on y, finds a leaf of the tree containing x, and concatenates both trees, changing the representative of y. How? – $\Theta(\log n)$

Plan of attack
• We will build a simple disjoint-union data structure that, in an amortized sense, performs significantly better than $\Theta(\log n)$ per operation, even better than $\Theta(\log \log n)$, $\Theta(\log \log \log n)$, …, but not quite $\Theta(1)$.
• To reach this goal, we will introduce two key tricks. Each trick converts a trivial $\Theta(n)$ solution into a simple $\Theta(\log n)$ amortized solution. Together, the two tricks yield a much better solution.
• The first trick arises in an augmented linked list. The second trick arises in a tree structure.

Augmented linked-list solution
Store $S_i = \{x_1, x_2, \dots, x_k\}$ as an unordered doubly linked list. Augmentation: each element $x_j$ also stores a pointer $rep[x_j]$ to $rep[S_i]$ (which is the front of the list, $x_1$).
[Figure: the list $x_1, x_2, \dots, x_k$, where every element has a rep pointer to $x_1 = rep[S_i]$.]
• FIND-SET(x) returns $rep[x]$. – $\Theta(1)$
• UNION(x, y) concatenates the lists containing x and y and updates the rep pointers for all elements in the list containing y. – $\Theta(n)$
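A minimal Python sketch of the augmented linked-list solution, using a hypothetical `Node` class and free functions `make_set`, `find_set`, `union` (plus a `keys` helper for inspection). One assumption goes beyond the slides: the front node of each list also keeps a `tail` pointer, so the splice itself is constant time and the cost of UNION is exactly the walk that relabels y's list. `union` returns how many rep pointers it changed.

```python
class Node:
    """One list element x_j (hypothetical layout)."""

    def __init__(self, key):
        self.key = key
        self.prev = None    # doubly linked list pointers
        self.next = None
        self.rep = self     # rep[x_j]: pointer to the front of its list
        self.tail = self    # assumption: kept up to date on front nodes only


def make_set(key):
    return Node(key)        # Theta(1): a lone node is a one-element list


def find_set(x):
    return x.rep            # Theta(1): follow a single pointer


def union(x, y):
    """Append y's list to x's list; return the number of rep updates."""
    front_x, front_y = x.rep, y.rep
    assert front_x is not front_y, "x and y must lie in distinct sets"
    updates = 0
    node = front_y
    while node is not None:          # Theta(length of y's list)
        node.rep = front_x
        updates += 1
        node = node.next
    front_x.tail.next = front_y      # splice the two lists together
    front_y.prev = front_x.tail
    front_x.tail = front_y.tail
    return updates


def keys(x):
    """Keys of the list containing x, front to back."""
    out, node = [], x.rep
    while node is not None:
        out.append(node.key)
        node = node.next
    return out


# The figure's example: S_x = x1 x2 and S_y = y1 y2 y3.
xs = [make_set(k) for k in ("x1", "x2")]
ys = [make_set(k) for k in ("y1", "y2", "y3")]
union(xs[0], xs[1])
union(ys[0], ys[1])
union(ys[0], ys[2])
print(union(xs[1], ys[2]))   # 3
print(keys(ys[1]))           # ['x1', 'x2', 'y1', 'y2', 'y3']
```

Calling `union(ys[2], xs[1])` instead would relabel only the two x-nodes, which is the cost difference that the alternative concatenation exploits.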
Example of augmented linked-list solution
Each element $x_j$ stores a pointer $rep[x_j]$ to $rep[S_i]$. UNION(x, y)
• concatenates the lists containing x and y, and
• updates the rep pointers for all elements in the list containing y.
[Figure, before and after: $S_x$ is the list $x_1, x_2$ with $rep[S_x]$, and $S_y$ is the list $y_1, y_2, y_3$ with $rep[S_y]$; after UNION(x, y), $S_x \cup S_y$ is the list $x_1, x_2, y_1, y_2, y_3$, with all rep pointers pointing to $rep[S_x]$.]

Alternative concatenation
UNION(x, y) could instead
• concatenate the lists containing y and x, and
• update the rep pointers for all elements in the list containing x.
[Figure, before and after: the same lists $S_x$ and $S_y$; afterwards $S_x \cup S_y$ is the list $y_1, y_2, y_3, x_1, x_2$, with $rep[S_x \cup S_y] = rep[S_y]$.]
Trick 1: Smaller into larger (weighted-union heuristic)
To save work, concatenate the smaller list onto the end of the larger list. Cost = $\Theta(\text{length of smaller list})$.
Augment each list to store its weight (number of elements).
• Let $n$ denote the overall number of elements (equivalently, the number of MAKE-SET operations).
• Let $m$ denote the total number of operations.
• Let $f$ denote the number of FIND-SET operations.
Theorem: The cost of all UNION's is $O(n \log n)$.
Corollary: The total cost is $O(m + n \log n)$.

Analysis of Trick 1 (weighted-union heuristic)
Theorem: The total cost of all UNION's is $O(n \log n)$.
Proof.
• Monitor an element x and the set $S_x$ containing it.
• After the initial MAKE-SET(x), $weight[S_x] = 1$.
• Each time $S_x$ is united with $S_y$:
  • if $weight[S_y] \ge weight[S_x]$:
    – pay 1 to update $rep[x]$, and
    – $weight[S_x]$ at least doubles (it increases by $weight[S_y]$).
  • if $weight[S_y] < weight[S_x]$:
    – pay nothing, and
    – $weight[S_x]$ only increases.
Thus x pays at most $\log n$: every payment at least doubles $weight[S_x]$, which starts at 1 and never exceeds $n$. Summing over all $n$ elements gives $O(n \log n)$. ∎
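A small Python sketch of Trick 1 that counts rep-pointer updates, i.e. the quantity bounded by the theorem. The class `WeightedListSets` is hypothetical; Python lists stand in for the linked lists and their length plays the role of the stored weight. The first demo merges each fresh singleton with the growing set containing 0; the second checks the $n \log n$ bound on random unions.

```python
import math
import random


class WeightedListSets:
    """Trick 1 sketch (hypothetical names): each set is a Python list whose
    front element is the representative, every element x stores rep[x],
    and UNION relabels only the shorter list."""

    def __init__(self):
        self.rep = {}       # rep[x]: representative of the set containing x
        self.members = {}   # representative -> list of its set's elements
        self.updates = 0    # total rep-pointer updates (cost of all UNIONs)

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]  # Theta(1), as in the augmented linked list

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        assert rx != ry, "x and y must lie in distinct sets"
        # Weighted-union heuristic: ry's list is the one that gets relabeled,
        # so make sure it is not the longer one (ties: relabel y's list).
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx
        for z in self.members[ry]:
            self.rep[z] = rx
            self.updates += 1
        self.members[rx].extend(self.members.pop(ry))


n = 1024

# Merge every new singleton with the growing set containing 0.
ds = WeightedListSets()
for i in range(n):
    ds.make_set(i)
for i in range(1, n):
    ds.union(i, 0)
print(ds.updates)  # 1023: each singleton is relabeled exactly once

# Random unions: the theorem bounds the total by n log2 n.
random.seed(1)
ds2 = WeightedListSets()
for i in range(n):
    ds2.make_set(i)
for _ in range(5000):
    a, b = random.randrange(n), random.randrange(n)
    if ds2.find_set(a) != ds2.find_set(b):
        ds2.union(a, b)
assert ds2.updates <= n * math.log2(n)
```

Without the swap in `union`, the first demo would relabel the growing set every time, for $1 + 2 + \dots + (n-1)$ updates in total.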
Disjoint-set forest: Representing sets as trees
Store each set $S_i = \{x_1, x_2, \dots, x_k\}$ as an unordered, potentially unbalanced, not necessarily binary tree, storing only parent pointers. $rep[S_i]$ is the tree root.
[Figure: $S_i = \{x_1, \dots, x_6\}$ stored as a tree rooted at $x_1 = rep[S_i]$.]
• MAKE-SET(x) initializes x as a lone node. – $\Theta(1)$
• FIND-SET(x) walks up the tree containing x until it reaches the root. – $\Theta(depth[x])$
• UNION(x, y) calls FIND-SET twice and concatenates the trees containing x and y. – $\Theta(depth[x] + depth[y])$

Trick 1 adapted to trees
• UNION(x, y) can use a simple concatenation strategy: make the root FIND-SET(y) a child of the root FIND-SET(x). Afterwards, FIND-SET(y) = FIND-SET(x).
• Adapt Trick 1 to this context. Union-by-weight: merge the tree with smaller weight into the tree with larger weight.
• Variant of Trick 1 (see book). Union-by-rank: rank of a tree = its height.
[Figure: the tree rooted at $x_1$ and a tree rooted at $y_1$ containing $y_2, \dots, y_5$; the union makes $y_1$ a child of $x_1$.]

Trick 1 adapted to trees (union-by-weight)
• The height of a tree is logarithmic in its weight: $height(T) \le \log weight(T)$ (logarithms base 2). Proof by induction on the weight:
  • A lone node has $height(T) = 0 = \log 1$.
  • Otherwise T was united from two trees $T_1$, $T_2$. Assume $weight(T_1) \le weight(T_2)$, so the root of $T_1$ becomes a child of the root of $T_2$. Inductively, $height(T_i) \le \log weight(T_i)$ for $i = 1, 2$.
  • Then $height(T) = \max(height(T_2), height(T_1) + 1)$, where
    $height(T_2) \le \log weight(T_2) < \log weight(T)$, and
    $height(T_1) + 1 \le \log weight(T_1) + 1 = \log(2 \cdot weight(T_1)) \le \log(weight(T_1) + weight(T_2)) = \log weight(T)$.
• Hence each FIND-SET takes $O(\log n)$ time, and the total cost of any $m$ operations is $O(m \log n)$.

Trick 2: Path compression
When we execute a FIND-SET operation and walk up a path p to the root, we know the representative for all the nodes on path p. Path compression makes all of those nodes direct children of the root. The cost of FIND-SET(x) is still $\Theta(depth[x])$.
[Figure: FIND-SET($y_2$) on the combined tree rooted at $x_1$, before and after path compression.]
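A Python sketch of a disjoint-set forest combining union-by-weight (Trick 1) with optional path compression (Trick 2). The class `DisjointSetForest`, the `compress` flag, and the `depth` helper are hypothetical names for illustration; elements are any hashable keys, and `union` simply returns the common root if x and y already share a set. The first demo checks the height bound from the proof with compression turned off; the second shows FIND-SET flattening a path.

```python
import math
import random


class DisjointSetForest:
    """Parent-pointer forest (hypothetical names) with union-by-weight
    (Trick 1) and optional path compression (Trick 2)."""

    def __init__(self, compress=True):
        self.parent = {}
        self.weight = {}        # number of nodes; meaningful at roots only
        self.compress = compress

    def make_set(self, x):
        self.parent[x] = x      # a lone node is its own root
        self.weight[x] = 1

    def find_set(self, x):
        root = x
        while self.parent[root] != root:   # walk up: Theta(depth[x])
            root = self.parent[root]
        if self.compress:
            # Second pass: make every node on the path a child of the root.
            while self.parent[x] != root:
                nxt = self.parent[x]
                self.parent[x] = root
                x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.weight[rx] < self.weight[ry]:   # union-by-weight
            rx, ry = ry, rx
        self.parent[ry] = rx                    # lighter root becomes a child
        self.weight[rx] += self.weight[ry]
        return rx

    def depth(self, x):
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


# Union-by-weight alone: depth[x] <= log2(weight of x's tree) for every x.
random.seed(0)
n = 1000
f = DisjointSetForest(compress=False)
for i in range(n):
    f.make_set(i)
for _ in range(3000):
    f.union(random.randrange(n), random.randrange(n))
assert all(f.depth(i) <= math.log2(f.weight[f.find_set(i)]) for i in range(n))

# Path compression: d sits at depth 2 until FIND-SET(d) flattens its path.
g = DisjointSetForest()
for k in "abcd":
    g.make_set(k)
g.union("a", "b")
g.union("c", "d")
g.union("a", "c")
print(g.depth("d"))      # 2
g.find_set("d")
print(g.depth("d"))      # 1
```

The two-pass `find_set` keeps the walk iterative, so very deep trees (possible without union-by-weight) do not hit Python's recursion limit.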