CMPS 2200 -- Fall 2012
Union-Find Data Structures
Carola Wenk
Slides courtesy of Charles Leiserson with small changes by Carola Wenk
10/29/12 CMPS 2200 Intro. to Algorithms
Disjoint-set data structure (Union-Find)
Problem:
• Maintain a dynamic collection of pairwise-disjoint sets S = {S_1, S_2, …, S_r}.
• Each set S_i has one element distinguished as the representative element, rep[S_i].
• Must support 3 operations:
  • MAKE-SET(x): adds new set {x} to S with rep[{x}] = x (for any x ∉ S_i for all i)
  • UNION(x, y): replaces sets S_x, S_y with S_x ∪ S_y in S (for any x, y in distinct sets S_x, S_y)
  • FIND-SET(x): returns representative rep[S_x] of set S_x containing element x
Union-Find Example
The FIND-SET results show each set's representative (underlined on the original slide).
S = {}
MAKE-SET(2): S = {{2}}
MAKE-SET(3): S = {{2}, {3}}
MAKE-SET(4): S = {{2}, {3}, {4}}
FIND-SET(4) = 4
UNION(2, 4): S = {{2, 4}, {3}}
FIND-SET(4) = 2
MAKE-SET(5): S = {{2, 4}, {3}, {5}}
UNION(4, 5): S = {{2, 4, 5}, {3}}
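The trace above can be replayed with a deliberately naive sketch. It assumes a single dictionary `rep` (a hypothetical name) that maps every element to its representative, and it assumes UNION(x, y) keeps the representative of x's set, which matches FIND-SET(4) = 2 in the example. The helper `sets()` is only there to print the collection.

```python
# Naive sketch (assumption: one dictionary mapping element -> representative).
rep = {}

def make_set(x):
    # x must not already belong to any set
    assert x not in rep
    rep[x] = x

def find_set(x):
    return rep[x]

def union(x, y):
    # Assumption: the merged set keeps the representative of x's set.
    rx, ry = rep[x], rep[y]
    assert rx != ry, "x and y must be in distinct sets"
    for e in rep:                 # relabel all of y's set: Theta(n)
        if rep[e] == ry:
            rep[e] = rx

def sets():
    # Group elements by representative, for display only.
    groups = {}
    for e, r in rep.items():
        groups.setdefault(r, set()).add(e)
    return sorted(sorted(g) for g in groups.values())

make_set(2); make_set(3); make_set(4)
before = find_set(4)              # 4
union(2, 4)
after = find_set(4)               # 2
make_set(5)
union(4, 5)
final = sets()                    # [[2, 4, 5], [3]]
```

Every UNION here scans all elements, which is the trivial Θ(n) behavior that the rest of the lecture improves on.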
Plan of attack
• We will build a simple disjoint-set data structure that, in an amortized sense, performs significantly better than Θ(log n) per op., even better than Θ(log log n), Θ(log log log n), ..., but not quite Θ(1).
• To reach this goal, we will introduce two key tricks. Each trick converts a trivial Θ(n) solution into a simple Θ(log n) amortized solution. Together, the two tricks yield a much better solution.
• First trick arises in an augmented linked list. Second trick arises in a tree structure.
Augmented linked-list solution
Store S_i = {x_1, x_2, …, x_k} as an unordered doubly linked list.
Augmentation: Each element x_j also stores pointer rep[x_j] to rep[S_i] (which is the front of the list, x_1).
[Figure: list S_i with elements x_1, x_2, …, x_k; every element has a rep pointer to rep[S_i] = x_1.]
Assume pointer to x is given.
• FIND-SET(x) returns rep[x]: Θ(1)
• UNION(x, y) concatenates the lists containing x and y and updates the rep pointers for all elements in the list containing y: Θ(n)
Example of augmented linked-list solution
Each element x_j stores pointer rep[x_j] to rep[S_i].
UNION(x, y)
• concatenates the lists containing x and y, and
• updates the rep pointers for all elements in the list containing y.
[Figure, step 1: two separate lists, S_x = (x_1, x_2) with rep[S_x] = x_1 and S_y = (y_1, y_2, y_3) with rep[S_y] = y_1.]
[Figure, step 2: S_x ∪ S_y after concatenation, one list x_1, x_2, y_1, y_2, y_3; the rep pointers of y_1, y_2, y_3 still point to rep[S_y].]
[Figure, step 3: S_x ∪ S_y after the rep pointers of y_1, y_2, y_3 are updated; every element points to rep[S_x ∪ S_y] = x_1.]
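The three steps of this example can be sketched in Python. The `Node` class and the `tail` field are assumptions made for illustration: the slides do not say how the end of a list is located, so this sketch keeps a tail pointer on the representative.

```python
class Node:
    def __init__(self, value):
        # MAKE-SET: a singleton list whose only element is its own representative.
        self.value = value
        self.prev = None
        self.next = None
        self.rep = self
        self.tail = self    # assumption: only meaningful on the representative

def find_set(x):
    # Theta(1): follow the rep pointer.
    return x.rep

def union(x, y):
    rx, ry = x.rep, y.rep
    assert rx is not ry, "x and y must be in distinct sets"
    # Concatenate: splice y's list after the last element of x's list.
    rx.tail.next = ry
    ry.prev = rx.tail
    rx.tail = ry.tail
    # Update the rep pointer of every element that came from y's list.
    node = ry
    while node is not None:
        node.rep = rx
        node = node.next
    return rx

x1, x2 = Node("x1"), Node("x2")
y1, y2, y3 = Node("y1"), Node("y2"), Node("y3")
union(x1, x2)
union(y1, y2)
union(y1, y3)
union(x2, y3)   # concatenates S_y onto S_x and relabels y1, y2, y3
```

The loop over y's list is what makes UNION cost Θ(n) in the worst case, since the list containing y may hold almost all elements.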
Alternative concatenation
UNION(x, y) could instead
• concatenate the lists containing y and x, and
• update the rep pointers for all elements in the list containing x.
[Figure, step 1: two separate lists, S_x = (x_1, x_2) with rep[S_x] = x_1 and S_y = (y_1, y_2, y_3) with rep[S_y] = y_1.]
[Figure, step 2: S_x ∪ S_y after concatenation, one list y_1, y_2, y_3, x_1, x_2; the rep pointers of x_1, x_2 still point to rep[S_x].]
[Figure, step 3: S_x ∪ S_y after the rep pointers of x_1, x_2 are updated; every element points to rep[S_x ∪ S_y] = y_1.]
Trick 1: Smaller into larger (weighted-union heuristic)
To save work, concatenate the smaller list onto the end of the larger list. Cost = Θ(length of smaller list).
Augment list to store its weight (# elements).
• Let n denote the overall number of elements (equivalently, the number of MAKE-SET operations).
• Let m denote the total number of operations.
• Let f denote the number of FIND-SET operations.
Theorem: Cost of all UNIONs is O(n log n).
Corollary: Total cost is O(m + n log n).
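A minimal sketch of the weighted-union heuristic follows. To stay short it swaps the doubly linked lists for Python lists, so the weight is simply `len(...)`; the class name `WeightedListSets` and the `updates` counter are assumptions added to make the cost visible.

```python
import math

class WeightedListSets:
    def __init__(self):
        self.rep = {}       # element -> representative
        self.members = {}   # representative -> elements of its set
        self.updates = 0    # number of rep-pointer updates paid so far

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return rx
        # Smaller into larger: make ry the representative of the smaller set.
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx
        for e in self.members[ry]:          # cost = length of smaller list
            self.rep[e] = rx
            self.updates += 1
        self.members[rx].extend(self.members.pop(ry))
        return rx

# Merge n singletons pairwise, round by round, and count the updates.
n = 1024
ds = WeightedListSets()
for i in range(n):
    ds.make_set(i)
step = 1
while step < n:
    for i in range(0, n, 2 * step):
        ds.union(i, i + step)
    step *= 2
bound = n * math.log2(n)    # the theorem's n log n, with log base 2
```

After the loop, `ds.updates` stays below `bound`, as the theorem predicts for any sequence of unions.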
Analysis of Trick 1 (weighted-union heuristic)
Theorem: Total cost of UNIONs is O(n log n).
Proof.
• Monitor an element x and the set S_x containing it.
• After the initial MAKE-SET(x), weight[S_x] = 1.
• Each time S_x is united with S_y:
  • if weight[S_y] ≥ weight[S_x]:
    – pay 1 to update rep[x], and
    – weight[S_x] at least doubles (increases by weight[S_y]).
  • if weight[S_y] < weight[S_x]:
    – pay nothing, and
    – weight[S_x] only increases.
Since weight[S_x] never exceeds n, it can double at most log n times. Thus pay ≤ log n for x, and summing over all n elements gives O(n log n).
Disjoint-set forest: Representing sets as trees
Store each set S_i = {x_1, x_2, …, x_k} as an unordered, potentially unbalanced, not necessarily binary tree, storing only parent pointers. rep[S_i] is the tree root.
• MAKE-SET(x) initializes x as a lone node: Θ(1)
• FIND-SET(x) walks up the tree containing x until it reaches the root: Θ(depth[x])
• UNION(x, y) calls FIND-SET twice and concatenates the trees containing x and y …: Θ(depth[x] + depth[y])
[Figure: tree for S_i = {x_1, x_2, x_3, x_4, x_5, x_6} with root x_1 = rep[S_i]; x_4 and x_3 on the level below the root, and x_2, x_5, x_6 on the lowest level.]
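A minimal sketch of the forest representation, under two assumptions the slide leaves open: a root is marked by pointing to itself, and UNION links the trees in the simplest way, by making the root of y's tree a child of the root of x's tree. The `depth` helper is hypothetical and only used to inspect the result.

```python
parent = {}

def make_set(x):
    # A lone node; assumption: a root is its own parent.
    parent[x] = x

def find_set(x):
    # Walk up parent pointers until the root is reached: Theta(depth[x]).
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx != ry:
        parent[ry] = rx     # assumption: y's root goes under x's root
    return rx

def depth(x):
    # Number of parent links between x and its root.
    d = 0
    while parent[x] != x:
        x = parent[x]
        d += 1
    return d

# A bad sequence of unions: each new singleton becomes the root of the tree.
n = 8
for i in range(n):
    make_set(i)
for i in range(1, n):
    union(i, i - 1)
chain_depth = depth(0)      # 7: the tree is a single path
```

With this linking rule the tree can degenerate into a path, so a single FIND-SET may cost Θ(n).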
Trick 1 adapted to trees
• UNION(x, y) can use a simple concatenation strategy: Make root FIND-SET(y) a child of root FIND-SET(x).
• Adapt Trick 1 to this context:
  Union-by-weight: Merge tree with smaller weight into tree with larger weight.
• Variant of Trick 1 (see book):
  Union-by-rank: rank of a tree = its height
[Figure: example UNION(x_4, y_2): the tree rooted at y_1 (nodes y_1, …, y_5) becomes a child of x_1, the root of the tree with nodes x_1, …, x_6.]
Trick 1 adapted to trees (union-by-weight)
• Height of tree is logarithmic in weight, height(T) ≤ log weight(T) with logs base 2, because:
  • Induction on n. A single node has height 0 = log 1.
  • Height of a tree T is determined by the two subtrees T_1, T_2 that T has been united from. Assume weight(T_1) ≤ weight(T_2), so the root of T_1 becomes a child of the root of T_2 and height(T) = max(height(T_1) + 1, height(T_2)).
  • Inductively the heights of T_1, T_2 are at most the logs of their weights.
  • If height(T) = height(T_2):
      height(T) ≤ log weight(T_2) < log weight(T)
  • If height(T) = height(T_1) + 1:
      height(T) ≤ log weight(T_1) + 1 = log(2 · weight(T_1)) ≤ log weight(T)
• Thus the total cost of any m operations is O(m log n).
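The height bound can be checked empirically with a short sketch of union-by-weight. The class name `Forest`, the random workload, and the fixed seed are assumptions chosen for illustration; the check uses the equivalent integer form 2^height ≤ weight to avoid floating-point logarithms.

```python
import random

class Forest:
    def __init__(self):
        self.parent = {}
        self.weight = {}    # number of nodes in the tree; valid at roots only

    def make_set(self, x):
        self.parent[x] = x
        self.weight[x] = 1

    def find_set(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        # Union-by-weight: the lighter tree's root becomes a child of the heavier root.
        if self.weight[rx] < self.weight[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.weight[rx] += self.weight[ry]
        return rx

    def depth(self, x):
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d

random.seed(0)
n = 256
f = Forest()
for i in range(n):
    f.make_set(i)
for _ in range(200):
    f.union(random.randrange(n), random.randrange(n))

# Height of each tree = largest depth among its nodes.
height = {}
for x in range(n):
    r = f.find_set(x)
    height[r] = max(height.get(r, 0), f.depth(x))

# height(T) <= log2 weight(T)  is the same as  2**height(T) <= weight(T).
bound_holds = all((1 << h) <= f.weight[r] for r, h in height.items())
```

Because every FIND-SET walks at most height(T) ≤ log n links, each operation costs O(log n), which gives the O(m log n) total.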
Trick 2: Path compression
When we execute a FIND-SET operation and walk up a path p to the root, we know the representative for all the nodes on path p.
Path compression makes all of those nodes direct children of the root.
Cost of FIND-SET(x) is still Θ(depth[x]).
[Figure: FIND-SET(y_2) on the united tree with nodes x_1, …, x_6 and y_1, …, y_5.]
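Path compression can be sketched as a two-pass FIND-SET over parent pointers. As before, the `parent` dictionary and the convention that a root points to itself are assumptions; the chain built at the end is just a small test input.

```python
parent = {}

def make_set(x):
    parent[x] = x           # assumption: a root is its own parent

def find_set(x):
    # Pass 1: walk up the path p to find the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Pass 2: walk the same path again and hang every node directly under the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root

# Build the path 0 -> 1 -> 2 -> 3 -> 4 by hand, then compress it.
for i in range(5):
    make_set(i)
for i in range(4):
    parent[i] = i + 1
root = find_set(0)          # 4
```

The call still walks the whole path once, so its own cost is Θ(depth[x]); the benefit is that later FIND-SET calls on any node of that path reach the root in one step.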