CS 573: Algorithms, Fall 2014
Union-Find
Lecture 21, November 6, 2014

Part I
Union Find

Requirements from the data-structure
1. Maintain a collection of sets.
2. makeSet(x): creates a set that contains the single element x.
3. find(x): returns the set that contains x.
4. union(A, B): returns the set A ∪ B. That is, it merges the two sets A and B and returns the merged set.

Amortized Analysis
1. Use data-structure as a black-box inside an algorithm. ... Union-Find in Kruskal's algorithm for computing an MST.
2. Bounded worst-case time per operation.
3. Care: overall running time spent in the data-structure.
4. Amortized running-time of an operation = average time to perform an operation on the data-structure.
5. Amortized time per operation = (overall running time) / (number of operations).
Reversed Trees
Representing sets in the Union-Find DS
[Figure] The Union-Find representation of the sets A = {a, b, c, d, e} and B = {f, g, h, i, j, k}. The set A is uniquely identified by a pointer to the root of A, which is the node containing a.

Reversed Trees
!esrever ni retteb si gnihtyreve esuaceB
1. Reversed Trees:
   1.1 Initially: every element is its own node.
   1.2 Node v: p(v) is the pointer to its parent.
   1.3 A set is uniquely identified by its root node/element.
2. makeSet: create a singleton pointing to itself.
3. find(x):
   3.1 Start from the node containing x, traverse up the tree, till arriving at the root.
   3.2 find(x): x → b → a.
   3.3 a: returned as the set.

Union operation in reversed trees
Just hang them on each other.
union(a, p): merge two sets.
1. Hang the root of one tree on the root of the other.
2. A destructive operation: the two original sets no longer exist.

Pseudo-code of naive version...
makeSet(x)
    p(x) ← x

find(x)
    if x = p(x) then
        return x
    return find(p(x))

union(x, y)
    A ← find(x)
    B ← find(y)
    p(B) ← A
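The naive pseudo-code above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the dictionary name `parent` (standing in for p(·)), the snake_case function names, and the particular sequence of unions are assumptions made for the example.

```python
# Naive reversed-tree Union-Find, mirroring the pseudo-code.
# Assumption: parent[x] plays the role of p(x) from the slides.
parent = {}

def make_set(x):
    # A singleton is a node whose parent pointer points to itself.
    parent[x] = x

def find(x):
    # Walk up the parent pointers until reaching the root.
    if parent[x] == x:
        return x
    return find(parent[x])

def union(x, y):
    a = find(x)
    b = find(y)
    parent[b] = a  # hang the root of y's tree on the root of x's tree
    return a

# Build the set A = {a, b, c, d, e} (the order of unions is illustrative).
for e in "abcde":
    make_set(e)
union("a", "b")  # b hangs on a
union("c", "d")  # d hangs on c
union("a", "c")  # c hangs on a, so d -> c -> a
union("a", "e")  # e hangs on a
root_of_d = find("d")  # follows d -> c -> a
```

Since nothing here limits the depth of a tree, a find may have to follow a long path of parent pointers.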
Example...
The long chain
After: makeSet(a), makeSet(b), makeSet(c), makeSet(d), makeSet(e), makeSet(f), makeSet(g), makeSet(h),
union(g, h), union(f, g), union(e, f), union(d, e), union(c, d), union(b, c), union(a, b)
[Figure: the long chain.]

Find is slow, hack it!
1. find might require Ω(n) time.
2. Q: How to improve performance?
3. Two "hacks":
   (i) Union by rank: maintain in the root of a tree a bound on its depth (rank).
       Rule: hang the smaller tree on the larger tree in union.
   (ii) Path compression: during find, make all pointers on the path point to the root.

Path compression in action...
[Figure] (a) The tree before performing find(z), and (b) the reversed tree after performing find(z) that uses path compression.

Pseudo-code of improved version...
makeSet(x)
    p(x) ← x
    rank(x) ← 0

find(x)
    if x ≠ p(x) then
        p(x) ← find(p(x))
    return p(x)

union(x, y)
    A ← find(x)
    B ← find(y)
    if rank(A) > rank(B) then
        p(B) ← A
    else
        p(A) ← B
        if rank(A) = rank(B) then
            rank(B) ← rank(B) + 1
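A hedged Python sketch of the improved version follows. The class name `UnionFind`, the helper `hops`, and the early return when both elements are already in the same set are assumptions that are not in the slides. The guard is added so that a redundant union does not bump a rank. The first demo replays the long-chain sequence of operations. The second uses hypothetical elements w, x, y, z to show path compression.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}  # parent[x] plays the role of p(x)
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        # Path compression: every node on the path ends up pointing to the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return a  # assumption: guard not present in the slide pseudo-code
        if self.rank[a] > self.rank[b]:
            self.parent[b] = a
            return a
        self.parent[a] = b
        if self.rank[a] == self.rank[b]:
            self.rank[b] += 1
        return b

    def hops(self, x):
        # Hypothetical helper: count pointers from x to its root, no compression.
        count = 0
        while self.parent[x] != x:
            x = self.parent[x]
            count += 1
        return count

# Replay the long-chain sequence of operations.
chain = UnionFind()
for e in "abcdefgh":
    chain.make_set(e)
for x, y in ["gh", "fg", "ef", "de", "cd", "bc", "ab"]:
    chain.union(x, y)
chain_depths = {e: chain.hops(e) for e in "abcdefgh"}  # no long chain forms

# Path compression on a small tree of depth two.
pc = UnionFind()
for e in "wxyz":
    pc.make_set(e)
pc.union("z", "y")  # z hangs on y
pc.union("x", "w")  # x hangs on w
pc.union("y", "w")  # equal ranks: y hangs on w, so z -> y -> w
before = pc.hops("z")
pc.find("z")        # compresses the path: z now points directly to w
after = pc.hops("z")
```

With union by rank, the same sequence that produced the long chain leaves every element at most one pointer away from the root.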
Part II
Analyzing the Union-Find Data-Structure

Definition
v: a node in a UnionFind data-structure D.
v is a leader ⟺ v is the root of a (reversed) tree in D.
"When you're not a leader, you're little people."

Lemma
Once a node v stops being a leader, it can never become a leader again.
Proof.
1. x stopped being a leader because a union operation hung x on y.
2. From this point on...
3. x might change only its parent pointer (find).
4. The parent pointer of x will never become equal to x again.
5. x is never a leader again.

Another Lemma
Once a node stops being a leader, its rank is fixed.
Proof.
1. The rank of an element changes only by a union operation.
2. A union operation changes the rank only for... the "new" leader of the new set.
3. If an element is no longer a leader, then its rank is fixed.
Ranks are strictly monotonically increasing
Lemma
Ranks are monotonically increasing in the reversed trees...
...along a path from a node to the root of the tree.

Proof...
1. Claim: ∀ u → v in DS: rank(u) < rank(v).
2. Proof by induction. Base: all singletons. Holds.
3. Assume the claim holds at time t, before an operation.
4. If the operation is union(A, B), assume that we hung root(A) on root(B). It must be that rank(root(B)) is now larger than rank(root(A)) (verify!). Claim true after the operation!
5. If the operation is find: it traverses a path π, then all the nodes of π are made to point to the last node v of π. By induction, rank(v) > rank of all other nodes of π. For every node that gets compressed, the rank of its new parent is larger than its own rank.

Trees grow exponentially in size with rank
Lemma
When a node gets rank k ⟹ there are at least 2^k elements in its subtree.
Proof.
1. Proof is by induction.
2. For k = 0: obvious, since a singleton has rank zero and a single element in the set.
3. A node u gets rank k only if the two merged roots u, v have rank k − 1.
4. By induction, u and v each have ≥ 2^{k−1} nodes before the merge.
5. The merged tree has ≥ 2^{k−1} + 2^{k−1} = 2^k nodes.

Having higher rank is rare
Lemma
The number of nodes that get assigned rank k throughout the execution of the Union-Find DS is at most n / 2^k.
Proof.
1. By induction. For k = 0 it is obvious.
2. When v becomes of rank k: charge to the roots merged, u and v.
3. Before the union: u and v are of rank k − 1.
4. After the merge: rank(v) = k and rank(u) = k − 1.
5. u is no longer a leader. Its rank is now fixed.
6. u, v leave rank k − 1 ⟹ v enters rank k.
7. By induction: at most n / 2^{k−1} nodes of rank k − 1 are created.
   ⟹ # nodes of rank k: ≤ (n / 2^{k−1}) / 2 = n / 2^k.
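The rank lemmas above can be spot-checked empirically. The sketch below is only a sanity check under stated assumptions, not a proof. The array-based representation, the element count n = 1024, the seed, the number of random unions, and the `assigned` counter are all choices made for this illustration. The union also skips the case where both elements already share a root, which the slide pseudo-code does not do.

```python
import random

n = 1024                       # assumption: number of elements for the experiment
parent = list(range(n))        # parent[x] plays the role of p(x)
rank = [0] * n
assigned = {0: n}              # assigned[k] = how many nodes ever received rank k

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])  # path compression
    return parent[x]

def union(x, y):
    a, b = find(x), find(y)
    if a == b:
        return                 # assumption: guard not in the slide pseudo-code
    if rank[a] > rank[b]:
        parent[b] = a
    else:
        parent[a] = b
        if rank[a] == rank[b]:
            rank[b] += 1
            assigned[rank[b]] = assigned.get(rank[b], 0) + 1

rng = random.Random(573)       # fixed seed so the run is repeatable
for _ in range(4 * n):
    union(rng.randrange(n), rng.randrange(n))

# Ranks strictly increase along every parent pointer u -> p(u).
ranks_increase = all(rank[u] < rank[parent[u]] for u in range(n) if parent[u] != u)
# At most n / 2^k nodes were ever assigned rank k.
rank_is_rare = all(count <= n / 2 ** k for k, count in assigned.items())
# Consequently no rank exceeds lg n = 10.
max_rank = max(rank)
```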
Find takes logarithmic time
Lemma
The time to perform a single find operation when we perform union by rank and path compression is O(log n).
Proof.
1. The rank of the leader v of a reversed tree T bounds the depth of T.
2. By the previous lemma: max rank ≤ lg n.
3. The depth of the tree is O(log n).
4. The time to perform find is bounded by the depth of the tree.

log* in detail
1. log*(n): number of times to take lg of a number to get a number smaller than two.
2. log* 2 = 1.
3. log* 2^2 = 2.
4. log* 2^{2^2} = 1 + log*(2^2) = 2 + log* 2 = 3.
5. log* 2^{2^{2^2}} = log*(65536) = 4.
6. log* 2^{2^{2^{2^2}}} = log* 2^{65536} = 5.
7. log* is a monotone non-decreasing function.
8. β = 2^{2^{2^{2^2}}} = 2^{65536}: huge number.
For practical purposes, log* returns a value ≤ 5.

Can do much better!
Theorem
For a sequence of m operations over n elements, the overall running time of the UnionFind data-structure is O((n + m) log* n).
1. Intuitively: the UnionFind data-structure takes constant time per operation... (unless n is larger than β, which is unlikely).
2. Not quite correct if n is sufficiently large...

The tower function...
Definition
Tower(b) = 2^{Tower(b−1)} and Tower(0) = 1.
Tower(i): a tower of 2^{2^{2^{···^2}}} of height i.
Observe that log*(Tower(i)) = i.
Definition
For i ≥ 1, let Block(i) = [Tower(i−1) + 1, Tower(i)]; that is, Block(i) = [z, 2^{z−1}] for z = Tower(i−1) + 1. Also Block(0) = [0, 1]. As such,
Block(0) = [0, 1], Block(1) = [2, 2], Block(2) = [3, 4], Block(3) = [5, 16], Block(4) = [17, 65536], Block(5) = [65537, 2^{65536}], ...
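The definitions of log*, Tower and Block above translate into a few lines of Python. This is a sketch for positive integer inputs only. The function names are assumptions. `log_star` replaces lg by its integer floor (via `int.bit_length`), which gives the same count because every threshold Tower(i) is an integer.

```python
def log_star(n):
    """Number of times lg is applied to the integer n until the value is below two."""
    count = 0
    while n >= 2:
        n = n.bit_length() - 1   # floor(lg n), exact even for huge integers
        count += 1
    return count

def tower(i):
    """Tower(0) = 1 and Tower(i) = 2 ** Tower(i - 1)."""
    return 1 if i == 0 else 2 ** tower(i - 1)

def block(i):
    """Block(i) as an inclusive pair (low, high); Block(0) is the special case [0, 1]."""
    return (0, 1) if i == 0 else (tower(i - 1) + 1, tower(i))

first_blocks = [block(i) for i in range(5)]
# first_blocks == [(0, 1), (2, 2), (3, 4), (5, 16), (17, 65536)]
beta = tower(5)                  # 2 ** 65536, a 65537-bit integer
```

Python's arbitrary-precision integers make β = Tower(5) representable, although printing it is not advisable.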
Running time of find...
1. RT of find(x) is proportional to the length of the path from x to the root of its tree.
2. ...start from x and we visit the sequence: x_1 = x, x_2 = p(x_1), x_3 = p(x_2), ..., x_i = p(x_{i−1}), ..., x_m = p(x_{m−1}) = root of tree.
3. rank(x_1) < rank(x_2) < rank(x_3) < ... < rank(x_m).
4. RT of find(x) is O(m).
Definition
A node x is in the i-th block if rank(x) ∈ Block(i).
5. Looking for ways to pay for the find operation.
6. Since the other two operations take constant time...

Blocks and jumping pointers
1. The maximum rank of a node v is O(log n).
2. # of blocks is O(log* n), as O(log n) ∈ Block(c log* n) (c: constant, say 2).
3. find(x): π is the path used.
4. Partition π into blocks by rank.
5. Price of find: length of π.
6. Node x: ν = index_B(x) is the index of the block containing rank(x).
7. rank(x) ∈ Block(index_B(x)).
8. index_B(x): block of x.

The path of find operation, and its pointers
[Figure: a find path passing through Block(0), Block(1), Block(1...4), Block(5), Block(6...7), Block(8), Block(9), Block(10), with pointers labeled "internal jump" and "between jump".]

The pointers between blocks...
1. During a find operation...
2. π: path traversed.
3. Ranks of the nodes visited in π are monotone increasing.
4. Once we leave the i-th block, we never go back!
5. Charge visits to nodes in π that are next to an element in a different block...
6. ...to the total number of blocks ≤ O(log* n).
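To make the block bookkeeping concrete, the sketch below computes index_B for a list of ranks along a find path and splits the pointers into jumps that stay inside a block and jumps that go between blocks. The function names and the sample ranks in `path_ranks` are hypothetical. Only the definitions of Tower and Block come from the slides.

```python
def tower(i):
    """Tower(0) = 1 and Tower(i) = 2 ** Tower(i - 1)."""
    return 1 if i == 0 else 2 ** tower(i - 1)

def block_index(rank):
    """index_B: the i such that rank lies in Block(i) = [Tower(i-1) + 1, Tower(i)]."""
    i = 0
    while rank > tower(i):
        i += 1
    return i

def classify_jumps(ranks):
    """Count pointers x_j -> x_{j+1} that stay in one block versus cross blocks."""
    internal = between = 0
    for lo, hi in zip(ranks, ranks[1:]):
        if block_index(lo) == block_index(hi):
            internal += 1
        else:
            between += 1
    return internal, between

# Hypothetical strictly increasing ranks along a find path x_1, ..., x_m.
path_ranks = [0, 1, 2, 3, 4, 5, 9, 16, 17]
indices = [block_index(r) for r in path_ranks]   # [0, 0, 1, 2, 2, 3, 3, 3, 4]
internal, between = classify_jumps(path_ranks)   # 4 internal, 4 between
```

Because ranks only increase along the path, the block indices never decrease, so the number of between-block jumps is at most the number of blocks minus one.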