CMPS 6610 – Fall 2018 Union-Find Data Structures Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola Wenk 1 CMPS 6610 Algorithms
Disjoint-set data structure (Union-Find)
Problem:
• Maintain a dynamic collection of pairwise-disjoint sets S = {S_1, S_2, …, S_r}.
• Each set S_i has one element distinguished as the representative element, rep[S_i].
• Must support 3 operations:
  • MAKE-SET(x): adds new set {x} to S with rep[{x}] = x (for any x ∉ S_i for all i)
  • UNION(x, y): replaces sets S_x, S_y with S_x ∪ S_y in S (for any x, y in distinct sets S_x, S_y)
  • FIND-SET(x): returns representative rep[S_x] of set S_x containing element x
Union-Find Example
(On the slide, the representative of each set is underlined.)
S = {}
MAKE-SET(2): S = {{2}}
MAKE-SET(3): S = {{2}, {3}}
MAKE-SET(4): S = {{2}, {3}, {4}}
FIND-SET(4) = 4
UNION(2, 4): S = {{2, 4}, {3}}
FIND-SET(4) = 2
MAKE-SET(5): S = {{2, 4}, {3}, {5}}
UNION(4, 5): S = {{2, 4, 5}, {3}}
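The example can be replayed with a deliberately naive implementation. This is only a sketch: the class name `NaiveDisjointSets` and the helper `sets()` are made up for illustration, and the choice that UNION(x, y) keeps the representative of x's set is an assumption (it agrees with FIND-SET(4) = 2 above).

```python
class NaiveDisjointSets:
    """Naive union-find: every element maps directly to its representative."""

    def __init__(self):
        self.rep = {}  # element -> representative of its set

    def make_set(self, x):
        if x in self.rep:
            raise ValueError(f"{x!r} is already in some set")
        self.rep[x] = x

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        # Assumption: the merged set keeps the representative of x's set.
        for e in list(self.rep):
            if self.rep[e] == ry:
                self.rep[e] = rx

    def sets(self):
        """Return the current collection S as a sorted list of sorted lists."""
        groups = {}
        for e, r in self.rep.items():
            groups.setdefault(r, []).append(e)
        return sorted(sorted(g) for g in groups.values())


ds = NaiveDisjointSets()
for x in (2, 3, 4):
    ds.make_set(x)
ds.union(2, 4)
ds.make_set(5)
ds.union(4, 5)
print(ds.find_set(4), ds.sets())
```

Each UNION here scans every element, which is the trivial Θ(n) behaviour that the two tricks are meant to improve on.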
Plan of attack
• We will build a simple disjoint-set data structure that, in an amortized sense, performs significantly better than Θ(log n) per op., even better than Θ(log log n), Θ(log log log n), ..., but not quite Θ(1).
• To reach this goal, we will introduce two key tricks. Each trick converts a trivial Θ(n) solution into a simple Θ(log n) amortized solution. Together, the two tricks yield a much better solution.
• First trick arises in an augmented linked list. Second trick arises in a tree structure.
Augmented linked-list solution
Store S_i = {x_1, x_2, …, x_k} as an unordered doubly linked list.
Augmentation: Each element x_j also stores pointer rep[x_j] to rep[S_i] (which is the front of the list, x_1).
[Figure: list S_i with elements x_1, x_2, …, x_k; each element has a rep pointer to x_1 = rep[S_i]]
Assume pointer to x is given.
• FIND-SET(x) returns rep[x]. Cost: Θ(1)
• UNION(x, y) concatenates lists containing x and y and updates the rep pointers for all elements in the list containing y. Cost: Θ(n)
Example of augmented linked-list solution
Each element x_j stores pointer rep[x_j] to rep[S_i].
UNION(x, y)
• concatenates the lists containing x and y, and
• updates the rep pointers for all elements in the list containing y.
[Figure, step 1: separate lists S_x: x_1, x_2 with head rep[S_x], and S_y: y_1, y_2, y_3 with head rep[S_y]]
[Figure, step 2: the lists concatenated into S_x ∪ S_y: x_1, x_2, y_1, y_2, y_3; both rep[S_x] and rep[S_y] are still labeled]
[Figure, step 3: rep pointers of y_1, y_2, y_3 updated; x_1 is rep[S_x ∪ S_y]]
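A minimal sketch of this UNION in Python, under some simplifying assumptions: the list is singly linked with a tail pointer kept on the head node (the slides use a doubly linked list), and the names `Node`, `find_set`, and `union` are illustrative only. `union` returns the number of rep pointers it had to rewrite, which is the Θ(n) part of the cost.

```python
class Node:
    """One list element; rep points at the head of the list it belongs to."""

    def __init__(self, value):
        self.value = value
        self.next = None
        self.rep = self   # a new element is the head of its own list
        self.tail = self  # only meaningful while this node is a list head


def find_set(node):
    return node.rep  # Theta(1): just follow the rep pointer


def union(x, y):
    """Append the list containing y to the list containing x.

    Returns the number of rep pointers that were updated.
    """
    head_x, head_y = x.rep, y.rep
    if head_x is head_y:
        return 0
    # Concatenate: the tail of x's list now points at the head of y's list.
    head_x.tail.next = head_y
    head_x.tail = head_y.tail
    # Update rep for every element that came from y's list.
    updates = 0
    node = head_y
    while node is not None:
        node.rep = head_x
        updates += 1
        node = node.next
    return updates
```

With lists x_1, x_2 and y_1, y_2, y_3 as in the figure, `union(x2, y3)` rewrites three rep pointers and leaves x_1 as the representative of all five elements.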
Alternative concatenation
UNION(x, y) could instead
• concatenate the lists containing y and x, and
• update the rep pointers for all elements in the list containing x.
[Figure, step 1: separate lists S_x: x_1, x_2 with head rep[S_x], and S_y: y_1, y_2, y_3 with head rep[S_y]]
[Figure, step 2: the list containing x appended to the list containing y; both rep[S_x] and rep[S_y] are still labeled]
[Figure, step 3: rep pointers of x_1, x_2 updated; y_1 is rep[S_x ∪ S_y]]
Trick 1: Smaller into larger (weighted-union heuristic)
To save work, concatenate the smaller list onto the end of the larger list. Cost = Θ(length of smaller list).
Augment list to store its weight (# elements).
• Let n denote the overall number of elements (equivalently, the number of MAKE-SET operations).
• Let m denote the total number of operations.
• Let f denote the number of FIND-SET operations.
Theorem: Cost of all UNIONs is O(n log n).
Corollary: Total cost is O(m + n log n).
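The heuristic can be sketched as follows. For brevity this version keeps each set as a Python list keyed by its representative instead of an explicit linked list, so the weight is simply the list length; the class name `WeightedListSets` and the counter `rep_updates` are assumptions added for illustration.

```python
class WeightedListSets:
    """Linked-list style union-find with the smaller-into-larger rule."""

    def __init__(self):
        self.rep = {}         # element -> representative
        self.members = {}     # representative -> elements of its set
        self.rep_updates = 0  # total number of rep pointers rewritten

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        big, small = self.rep[x], self.rep[y]
        if big == small:
            return
        # Trick 1: always move the smaller set into the larger one.
        if len(self.members[big]) < len(self.members[small]):
            big, small = small, big
        for e in self.members[small]:
            self.rep[e] = big
            self.rep_updates += 1
        self.members[big].extend(self.members[small])
        del self.members[small]
```

Merging 8 singletons pairwise in three rounds rewrites 4 + 4 + 4 = 12 rep pointers, within the n log n = 24 bound of the theorem.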
Analysis of Trick 1 (weighted-union heuristic)
Theorem: Total cost of UNIONs is O(n log n).
Proof.
• Monitor an element x and set S_x containing it.
• After initial MAKE-SET(x), weight[S_x] = 1.
• Each time S_x is united with S_y:
  • if weight[S_y] ≥ weight[S_x]:
    – pay 1 to update rep[x], and
    – weight[S_x] at least doubles (increases by weight[S_y]).
  • if weight[S_y] < weight[S_x]:
    – pay nothing, and
    – weight[S_x] only increases.
Thus pay ≤ log n for x: weight[S_x] never exceeds n, so it can double at most log n times. Summing over all n elements gives O(n log n).
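Written as a calculation, the doubling argument looks as follows. The symbol k_x is introduced here only for illustration; it denotes the number of times rep[x] is updated over the whole sequence of operations.

```latex
\[
2^{k_x} \le \mathrm{weight}[S_x] \le n
\;\Longrightarrow\;
k_x \le \log_2 n,
\qquad
\sum_{x} k_x \;\le\; n \log_2 n \;=\; O(n \log n).
\]
```

Every MAKE-SET and FIND-SET costs Θ(1), and there are at most m of them, so adding O(m) to the bound above gives the corollary's total of O(m + n log n).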
Disjoint set forest: Representing sets as trees
Store each set S_i = {x_1, x_2, …, x_k} as an unordered, potentially unbalanced, not necessarily binary tree, storing only parent pointers. rep[S_i] is the tree root.
[Figure: tree for S_i = {x_1, x_2, x_3, x_4, x_5, x_6} with root x_1 = rep[S_i]; x_4 and x_3 on the second level; x_2, x_5, x_6 on the third level]
• MAKE-SET(x) initializes x as a lone node. Cost: Θ(1)
• FIND-SET(x) walks up the tree containing x until it reaches the root. Cost: Θ(depth[x])
• UNION(x, y) calls FIND-SET twice and concatenates the trees containing x and y … Cost: Θ(depth[x] + depth[y])
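A minimal sketch of the forest with parent pointers only and no heuristics. Two conventions here are assumptions rather than something the slide fixes: a root is marked by being its own parent, and UNION(x, y) hangs the root of y's tree under the root of x's tree. The class name `ParentPointerForest` and the helper `depth` are illustrative.

```python
class ParentPointerForest:
    """Disjoint-set forest storing only parent pointers (no heuristics)."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        self.parent[x] = x  # assumption: a root is its own parent

    def find_set(self, x):
        # Walk up until the root is reached: Theta(depth[x]).
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            self.parent[ry] = rx  # y's root becomes a child of x's root

    def depth(self, x):
        """Number of edges between x and the root of its tree."""
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


# An unlucky order of unions produces a chain, so FIND-SET costs Theta(n).
forest = ParentPointerForest()
n = 6
for i in range(n):
    forest.make_set(i)
for i in range(n - 1):
    forest.union(i + 1, i)
print(forest.depth(0))
```

The chain at the end is the reason a balancing rule is needed: without one, the tree can degenerate into a linked list.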
Trick 1 adapted to trees
• UNION(x, y) can use a simple concatenation strategy: Make root FIND-SET(y) a child of root FIND-SET(x).
• Adapt Trick 1 to this context: Union-by-weight: Merge tree with smaller weight into tree with larger weight.
• Variant of Trick 1 (see book): Union-by-rank: rank of a tree = its height
[Figure: Example: UNION(x_4, y_2). The tree rooted at x_1 (levels: x_1; x_4, x_3; x_2, x_5, x_6) and the tree rooted at y_1 (levels: y_1; y_4, y_3; y_2, y_5); y_1 becomes a child of x_1]
Trick 1 adapted to trees (union-by-weight)
• Height of tree is logarithmic in weight, because:
  • Induction on n
  • Height of a tree T is determined by the two subtrees T_1, T_2 that T has been united from.
  • Inductively the heights of T_1, T_2 are at most the logs of their weights.
  • If T_1 and T_2 have different heights:
    height(T) = max(height(T_1), height(T_2)) ≤ max(log weight(T_1), log weight(T_2)) < log weight(T)
  • If T_1 and T_2 have the same heights: (Assume weight(T_1) ≤ weight(T_2))
    height(T) = height(T_1) + 1 ≤ log(2 · weight(T_1)) ≤ log weight(T)
• Thus the total cost of any m operations is O(m log n).
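The height bound can be checked empirically. The sketch below implements union-by-weight on a parent-pointer forest and verifies that every node satisfies 2^depth ≤ weight of its tree, which is the claim height ≤ log weight in integer form. The names `WeightedForest` and `height_bound_holds`, and the use of random unions as a test workload, are assumptions made for illustration.

```python
import random


class WeightedForest:
    """Disjoint-set forest with union-by-weight (no path compression)."""

    def __init__(self):
        self.parent = {}
        self.weight = {}  # number of nodes in the tree; valid for roots only

    def make_set(self, x):
        self.parent[x] = x
        self.weight[x] = 1

    def find_set(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        # Merge the tree with smaller weight into the tree with larger weight.
        if self.weight[rx] < self.weight[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.weight[rx] += self.weight[ry]

    def depth(self, x):
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


def height_bound_holds(n, seed=0):
    """Run random unions and check 2**depth <= weight for every node."""
    rng = random.Random(seed)
    forest = WeightedForest()
    for x in range(n):
        forest.make_set(x)
    for _ in range(2 * n):
        forest.union(rng.randrange(n), rng.randrange(n))
    return all(
        2 ** forest.depth(x) <= forest.weight[forest.find_set(x)]
        for x in range(n)
    )
```

Since no tree is higher than log n, each FIND-SET and each UNION costs O(log n), which gives the O(m log n) total.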
Trick 2: Path compression
When we execute a FIND-SET operation and walk up a path p to the root, we know the representative for all the nodes on path p.
Path compression makes all of those nodes direct children of the root.
Cost of FIND-SET(x) is still Θ(depth[x]).
[Figure: FIND-SET(y_2) on the tree rooted at x_1. Before: x_4, x_3, y_1 on the second level; x_2, x_5, x_6, y_4, y_3 on the third; y_2, y_5 on the fourth. After: y_2 and y_3 have moved up to the second level next to x_4, x_3, y_1; y_5 and y_4 are on the third level with x_2, x_5, x_6]
Trick 2: Path compression
• Note that UNION(x, y) first calls FIND-SET(x) and FIND-SET(y). Therefore path compression also affects UNION operations.
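Path compression can be sketched as a two-pass FIND-SET: the first pass locates the root, the second re-points every node on the path at it. The dictionary representation and the convention that a root is its own parent are assumptions; the sample path y2 → y3 → y1 → x1 is how the figure appears to read and is used here only as an illustration.

```python
def find_set(parent, x):
    """Return the root of x's tree and compress the path from x to it."""
    # Pass 1: walk up to the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Pass 2: make every node on the path a direct child of the root.
    while parent[x] != root:
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root


# Assumed path from the figure: y2 -> y3 -> y1 -> x1 (x1 is the root).
parent = {"x1": "x1", "y1": "x1", "y3": "y1", "y2": "y3"}
root = find_set(parent, "y2")
print(root, parent)
```

The call itself still walks the full path, so its cost is Θ(depth[x]); the benefit goes to later operations on the same nodes.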
Analysis of Trick 2 alone
Theorem: Total cost of FIND-SETs is O(m log n).
Proof: By amortization. Omitted.
Analysis of Tricks 1 + 2 for disjoint-set forests
Theorem: In general, total cost is O(m α(n)), where α is the "inverse" of Ackermann's function.
Proof: Long, tricky proof by amortization. Omitted.
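Putting both tricks together gives the following sketch of a disjoint-set forest. It uses union-by-weight for Trick 1 (the union-by-rank variant from the book would also fit) and two-pass path compression for Trick 2; the class name `DisjointSetForest` and the root-is-its-own-parent convention are assumptions for illustration.

```python
class DisjointSetForest:
    """Disjoint-set forest with union-by-weight and path compression."""

    def __init__(self):
        self.parent = {}
        self.weight = {}  # number of nodes in the tree; valid for roots only

    def make_set(self, x):
        self.parent[x] = x
        self.weight[x] = 1

    def find_set(self, x):
        # Trick 2: find the root, then point the whole path at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        # Trick 1: the lighter tree becomes a child of the heavier root.
        if self.weight[rx] < self.weight[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.weight[rx] += self.weight[ry]


# Usage: group the vertices 1..6 by the edges of a small graph.
dsf = DisjointSetForest()
for v in range(1, 7):
    dsf.make_set(v)
for u, v in [(1, 2), (2, 3), (4, 5)]:
    dsf.union(u, v)
print(len({dsf.find_set(v) for v in range(1, 7)}))
```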
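Ackermann's function A, and its "inverse" α
Define
\[
A_k(j) =
\begin{cases}
j + 1 & \text{if } k = 0, \\
A_{k-1}^{(j+1)}(j) & \text{if } k \ge 1,
\end{cases}
\]
where A_{k-1}^{(j+1)} means: iterate A_{k-1} j+1 times.
\[
\begin{aligned}
A_0(j) &= j + 1, & A_0(1) &= 2, \\
A_1(j) &\sim 2j, & A_1(1) &= 3, \\
A_2(j) &\sim 2j \cdot 2^{j} > 2^{j}, & A_2(1) &= 7, \\
A_3(j) &> 2^{2^{\cdot^{\cdot^{\cdot^{2^{j}}}}}}, & A_3(1) &= 2047, \\
& & A_4(1) &> \underbrace{2^{2^{\cdot^{\cdot^{\cdot^{2^{2047}}}}}}}_{2048 \text{ times}}.
\end{aligned}
\]
A_4(j) is a lot bigger.
Define α(n) = min{k : A_k(1) ≥ n} ≤ 4 for practical n.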
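The small values on this slide can be reproduced directly from the definition. The sketch below is illustrative only: the function names `ackermann` and `alpha` are made up, and `alpha` returns 4 for every n above A_3(1) without computing A_4(1), which is far too large to evaluate. That shortcut is valid only for n ≤ A_4(1), the range the slide calls practical.

```python
def ackermann(k, j):
    """A_k(j): j + 1 if k == 0, else A_{k-1} iterated j + 1 times on j."""
    if k == 0:
        return j + 1
    result = j
    for _ in range(j + 1):
        result = ackermann(k - 1, result)
    return result


def alpha(n):
    """min{k : A_k(1) >= n}, assuming n <= A_4(1)."""
    for k in range(4):
        if ackermann(k, 1) >= n:
            return k
    # Assumption: n <= A_4(1), so the answer is 4. A_4(1) is not computed.
    return 4


print([ackermann(k, 1) for k in range(4)])
```

Only calls with small arguments are feasible: `ackermann(3, 1)` finishes quickly, but `ackermann(4, 1)` would require evaluating A_3(2047) and will not terminate in any reasonable time.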