union find data structures

Union-Find Data Structures (Carola Wenk; slides courtesy of Charles Leiserson)



  1. CMPS 6610 Algorithms, Fall 2018. Union-Find Data Structures. Carola Wenk. Slides courtesy of Charles Leiserson with small changes by Carola Wenk.

  2. Disjoint-set data structure (Union-Find)
     Problem:
     • Maintain a dynamic collection of pairwise-disjoint sets S = {S_1, S_2, …, S_r}.
     • Each set S_i has one element distinguished as the representative element, rep[S_i].
     • Must support 3 operations:
       • MAKE-SET(x): adds new set {x} to S with rep[{x}] = x (for any x ∉ S_i for all i)
       • UNION(x, y): replaces sets S_x, S_y with S_x ∪ S_y in S (for any x, y in distinct sets S_x, S_y)
       • FIND-SET(x): returns representative rep[S_x] of set S_x containing element x

  3. Union-Find Example (in the original slides the representative of each set is underlined)
     Operation            Result
     (initially)          S = {}
     MAKE-SET(2)          S = {{2}}
     MAKE-SET(3)          S = {{2}, {3}}
     MAKE-SET(4)          S = {{2}, {3}, {4}}
     FIND-SET(4)          returns 4
     UNION(2, 4)          S = {{2, 4}, {3}}
     FIND-SET(4)          returns 2
     MAKE-SET(5)          S = {{2, 4}, {3}, {5}}
     UNION(4, 5)          S = {{2, 4, 5}, {3}}

  4. Plan of attack
     • We will build a simple disjoint-set data structure that, in an amortized sense, performs significantly better than Θ(log n) per op., even better than Θ(log log n), Θ(log log log n), ..., but not quite Θ(1).
     • To reach this goal, we will introduce two key tricks. Each trick converts a trivial Θ(n) solution into a simple Θ(log n) amortized solution. Together, the two tricks yield a much better solution.
     • First trick arises in an augmented linked list. Second trick arises in a tree structure.

  5. Augmented linked-list solution
     Store S_i = {x_1, x_2, …, x_k} as an unordered doubly linked list. Augmentation: each element x_j also stores a pointer rep[x_j] to rep[S_i] (which is the front of the list, x_1).
     [Figure: list S_i: x_1, x_2, …, x_k, with every element's rep pointer leading to x_1 = rep[S_i].]
     Assume a pointer to x is given.
     • FIND-SET(x) returns rep[x]. Cost: Θ(1).
     • UNION(x, y) concatenates the lists containing x and y and updates the rep pointers for all elements in the list containing y. Cost: Θ(n).

  6. Example of augmented linked-list solution
     Each element x_j stores pointer rep[x_j] to rep[S_i]. UNION(x, y)
     • concatenates the lists containing x and y, and
     • updates the rep pointers for all elements in the list containing y.
     [Figure: before the union. S_x: x_1, x_2 with rep pointers to rep[S_x]; S_y: y_1, y_2, y_3 with rep pointers to rep[S_y].]

  7. Example of augmented linked-list solution (continued)
     [Figure: S_x ∪ S_y after concatenation. The list is x_1, x_2, y_1, y_2, y_3; the rep pointers of y_1, y_2, y_3 still lead to rep[S_y].]

  8. Example of augmented linked-list solution (continued)
     [Figure: S_x ∪ S_y after the rep pointers of y_1, y_2, y_3 have been updated. All elements now point to rep[S_x ∪ S_y], the front of the combined list.]
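The linked-list solution above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the `Node` class, the function names, and the `tail` field are assumptions, and a singly linked list with a tail pointer on the front node stands in for the doubly linked list so that concatenation takes constant time.

```python
class Node:
    """One element of a set; `rep` points at the front node of its list."""

    def __init__(self, value):
        self.value = value
        self.next = None   # next element in the list (None at the end)
        self.rep = self    # representative: the front of the list
        self.tail = self   # last node of the list; only kept up to date on the front node


def make_set(value):
    """MAKE-SET: create a one-element list whose representative is itself."""
    return Node(value)


def find_set(node):
    """FIND-SET: follow the stored rep pointer, constant time."""
    return node.rep


def union(x, y):
    """UNION: append the list containing y to the list containing x."""
    rx, ry = find_set(x), find_set(y)
    if rx is ry:
        return rx
    rx.tail.next = ry          # concatenate the two lists
    rx.tail = ry.tail
    node = ry
    while node is not None:    # update rep pointers in y's list, linear in its length
        node.rep = rx
        node = node.next
    return rx
```

Replaying the operations from the Union-Find example (MAKE-SET for 2, 3, 4, then UNION(2, 4)) with this sketch makes `find_set` on the node for 4 return the node for 2.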

  9. Alternative concatenation
     UNION(x, y) could instead
     • concatenate the lists containing y and x, and
     • update the rep pointers for all elements in the list containing x.
     [Figure: before the union. S_x: x_1, x_2 with rep pointers to rep[S_x]; S_y: y_1, y_2, y_3 with rep pointers to rep[S_y].]

  10. Alternative concatenation (continued)
      [Figure: S_x ∪ S_y after x's list has been attached to y's list; the rep pointers of x_1, x_2 still lead to rep[S_x].]

  11. Alternative concatenation (continued)
      [Figure: S_x ∪ S_y after the rep pointers of x_1, x_2 have been updated. All elements now point to rep[S_x ∪ S_y], the front of y's list.]

  12. Trick 1: Smaller into larger (weighted-union heuristic)
      To save work, concatenate the smaller list onto the end of the larger list. Cost = Θ(length of smaller list). Augment each list to store its weight (# elements).
      • Let n denote the overall number of elements (equivalently, the number of MAKE-SET operations).
      • Let m denote the total number of operations.
      • Let f denote the number of FIND-SET operations.
      Theorem: Cost of all UNIONs is O(n log n).
      Corollary: Total cost is O(m + n log n).

  13. Analysis of Trick 1 (weighted-union heuristic)
      Theorem: Total cost of UNIONs is O(n log n).
      Proof.
      • Monitor an element x and the set S_x containing it.
      • After the initial MAKE-SET(x), weight[S_x] = 1.
      • Each time S_x is united with S_y:
        • if weight[S_y] ≥ weight[S_x]: pay 1 to update rep[x], and weight[S_x] at least doubles (increases by weight[S_y]).
        • if weight[S_y] < weight[S_x]: pay nothing, and weight[S_x] only increases.
      Since weight[S_x] never exceeds n, it can double at most log n times. Thus pay ≤ log n for x, and O(n log n) over all n elements.
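The charging argument above can be checked empirically with a small sketch. The class name `WeightedListSets` and the `rep_updates` counter are hypothetical, and a Python list per set stands in for the weighted linked list; only the smaller-into-larger rule is taken from the slides.

```python
class WeightedListSets:
    """Weighted-union heuristic: always relabel the smaller set."""

    def __init__(self):
        self.rep = {}         # element -> representative of its set
        self.members = {}     # representative -> list of elements in that set
        self.rep_updates = 0  # total number of rep pointers rewritten by union

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return rx
        # Make rx the representative of the larger (or equally large) set.
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx
        smaller = self.members.pop(ry)
        for element in smaller:      # cost proportional to the smaller set only
            self.rep[element] = rx
            self.rep_updates += 1
        self.members[rx].extend(smaller)
        return rx
```

Merging 16 singletons pairwise in rounds (sets of size 1, then 2, 4, 8) rewrites 8 rep pointers per round, 32 in total, which is within the n log n = 64 bound from the theorem.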

  14. Disjoint set forest: Representing sets as trees
      Store each set S_i = {x_1, x_2, …, x_k} as an unordered, potentially unbalanced, not necessarily binary tree, storing only parent pointers. rep[S_i] is the tree root.
      [Figure: example tree for S_i = {x_1, x_2, x_3, x_4, x_5, x_6} with root x_1 = rep[S_i]; x_4 and x_3 on the second level; x_2, x_5, x_6 on the third level.]
      • MAKE-SET(x) initializes x as a lone node. Cost: Θ(1).
      • FIND-SET(x) walks up the tree containing x until it reaches the root. Cost: Θ(depth[x]).
      • UNION(x, y) calls FIND-SET twice and concatenates the trees containing x and y… Cost: Θ(depth[x]).
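A minimal sketch of the parent-pointer forest, with no heuristics yet. The class name `Forest` and the `depth` helper are hypothetical, and the sketch assumes UNION links the root of y's tree under the root of x's tree.

```python
class Forest:
    """Disjoint-set forest storing only parent pointers; roots point to themselves."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        self.parent[x] = x            # a lone node is its own root

    def find_set(self, x):
        while self.parent[x] != x:    # walk up until the root is reached
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            self.parent[ry] = rx      # assumption: y's root becomes a child of x's root
        return rx

    def depth(self, x):
        """Number of parent pointers between x and its root."""
        steps = 0
        while self.parent[x] != x:
            x = self.parent[x]
            steps += 1
        return steps
```

With this naive linking rule, the sequence UNION(1, 0), UNION(2, 1), …, UNION(n-1, n-2) produces a single path, so element 0 ends up at depth n-1 and FIND-SET can cost Θ(n).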

  15. Trick 1 adapted to trees
      • UNION(x, y) can use a simple concatenation strategy: make root FIND-SET(y) a child of root FIND-SET(x).
      • Adapt Trick 1 to this context: Union-by-weight: merge the tree with smaller weight into the tree with larger weight.
      • Variant of Trick 1 (see book): Union-by-rank: rank of a tree = its height.
      [Figure: example UNION(x_4, y_2). Root x_1 has children x_4, x_3 and y_1; x_2, x_5, x_6, y_4, y_3 are on the third level; y_2 and y_5 are on the fourth level.]

  16. Trick 1 adapted to trees (union-by-weight)
      • Height of tree is logarithmic in weight, because:
        • Induction on n.
        • Height of a tree T is determined by the two subtrees T_1, T_2 that T has been united from.
        • Inductively, the heights of T_1, T_2 are at most the logs of their weights.
        • If T_1 and T_2 have different heights:
          height(T) = max(height(T_1), height(T_2)) ≤ max(log weight(T_1), log weight(T_2)) < log weight(T)
        • If T_1 and T_2 have the same height (assume weight(T_1) ≤ weight(T_2)):
          height(T) = height(T_1) + 1 ≤ log(2 · weight(T_1)) ≤ log weight(T)
      • Thus the total cost of any m operations is O(m log n).
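The height bound can be observed with a short union-by-weight sketch. The names `WeightedForest` and `depth` are hypothetical; the rule for ties (keep x's root) is an assumption, since the slides do not specify one.

```python
class WeightedForest:
    """Disjoint-set forest with union-by-weight and no path compression."""

    def __init__(self):
        self.parent = {}
        self.weight = {}   # number of nodes in the tree; meaningful for roots only

    def make_set(self, x):
        self.parent[x] = x
        self.weight[x] = 1

    def find_set(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        # Merge the lighter tree into the heavier one; on a tie keep x's root.
        if self.weight[rx] < self.weight[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.weight[rx] += self.weight[ry]
        return rx

    def depth(self, x):
        """Number of parent pointers between x and its root."""
        steps = 0
        while self.parent[x] != x:
            x = self.parent[x]
            steps += 1
        return steps
```

Merging 64 singletons pairwise in rounds always unites two trees of equal weight and height, the case in which the height grows; the deepest node ends at depth 6 = log 64, matching the bound.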

  17. Trick 2: Path compression
      When we execute a FIND-SET operation and walk up a path p to the root, we know the representative for all the nodes on path p. Path compression makes all of those nodes direct children of the root.
      Cost of FIND-SET(x) is still Θ(depth[x]).
      [Figure: FIND-SET(y_2) on a tree with root x_1; x_4, x_3, y_1 on the second level; x_2, x_5, x_6, y_4, y_3 on the third level; y_2, y_5 on the fourth level.]

  18. Trick 2: Path compression (next animation step for FIND-SET(y_2); same text and tree as slide 17)

  19. Trick 2: Path compression (continued)
      [Figure: the tree after FIND-SET(y_2). Root x_1 now has children x_4, x_3, y_1, y_2, y_3; x_2, x_5, x_6, y_5, y_4 are on the third level.]
      Cost of FIND-SET(x) is still Θ(depth[x]).

  20. Trick 2: Path compression
      • Note that UNION(x, y) first calls FIND-SET(x) and FIND-SET(y). Therefore path compression also affects UNION operations.
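Path compression on its own can be sketched as a two-pass FIND-SET: one pass to locate the root, a second to re-point every node on the path. The class name `CompressingForest` is hypothetical, and UNION here uses plain linking (y's root under x's root) so that only Trick 2 is in play.

```python
class CompressingForest:
    """Disjoint-set forest with path compression and naive linking."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        self.parent[x] = x

    def find_set(self, x):
        # Pass 1: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Pass 2: make every node on the path a direct child of the root.
        while x != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        # Both calls compress their paths, so UNION benefits as well.
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            self.parent[ry] = rx
        return rx
```

After building a path 0 → 1 → … → n-1 with UNION(i+1, i), a single `find_set(0)` still walks the whole path, but afterwards every node points directly at the root.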

  21. Analysis of Trick 2 alone
      Theorem: Total cost of FIND-SETs is O(m log n).
      Proof: By amortization. Omitted.

  22. Analysis of Tricks 1 + 2 for disjoint-set forests
      Theorem: In general, total cost is O(m α(n)).
      Proof: Long, tricky proof by amortization. Omitted.
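A sketch combining both tricks in one structure. It uses the union-by-rank variant of Trick 1 (rank as an upper bound on height) rather than union-by-weight, which is a choice made for this illustration; the class name `DisjointSet` is hypothetical.

```python
class DisjointSet:
    """Disjoint-set forest with union-by-rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}   # upper bound on the height of the tree rooted at a node

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:                  # path compression
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:  # attach the lower-rank root under the other
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1             # equal ranks: the merged tree may be one taller
        return rx
```

A typical use is connectivity queries: two elements are in the same set exactly when `find_set` returns the same root for both.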

  23. Ackermann's function A, and its "inverse" α
      Define
      $$A_k(j) = \begin{cases} j + 1 & \text{if } k = 0, \\ A_{k-1}^{(j+1)}(j) & \text{if } k \ge 1, \end{cases}$$
      where $A_{k-1}^{(j+1)}$ means: iterate $A_{k-1}$ exactly $j+1$ times.
      • $A_0(j) = j + 1$, so $A_0(1) = 2$
      • $A_1(j) \sim 2j$, and $A_1(1) = 3$
      • $A_2(j) \sim 2j \cdot 2^{j} > 2^{j}$, and $A_2(1) = 7$
      • $A_3(j) > 2^{2^{\cdot^{\cdot^{\cdot^{2^{j}}}}}}$ (a tower of $j$ 2's topped by $j$), and $A_3(1) = 2047$
      • $A_4(1) > 2^{2^{\cdot^{\cdot^{\cdot^{2^{2047}}}}}}$ (a tower of 2048 2's topped by 2047); $A_4(j)$ is a lot bigger.
      Define $\alpha(n) = \min\{k : A_k(1) \ge n\} \le 4$ for practical $n$.
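The small values quoted above can be reproduced directly from the definition. The function names `ackermann` and `alpha` are hypothetical, and `alpha` returns 4 for every n above A_3(1) = 2047 without evaluating A_4(1), which assumes n does not exceed A_4(1), a number far too large to compute.

```python
def ackermann(k, j):
    """A_k(j): A_0(j) = j + 1; for k >= 1, apply A_{k-1} to j exactly j + 1 times."""
    if k == 0:
        return j + 1
    value = j
    for _ in range(j + 1):
        value = ackermann(k - 1, value)
    return value


def alpha(n):
    """Smallest k with A_k(1) >= n, assuming n <= A_4(1)."""
    for k in range(4):
        if ackermann(k, 1) >= n:
            return k
    return 4  # assumption: n is a "practical" size, so A_4(1) >= n


# A_0(1), A_1(1), A_2(1), A_3(1)
print([ackermann(k, 1) for k in range(4)])  # prints [2, 3, 7, 2047]
```

Only `ackermann(k, 1)` for k ≤ 3 is evaluated here; calling `ackermann(4, 1)` would not terminate in any reasonable time.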
