Approximating the Best-Fit Tree Under $L_p$ Norms


  1. Approximating the Best-Fit Tree Under $L_p$ Norms. Boulos Harb, Sampath Kannan and Andrew McGregor, UPenn

  2. The Problem(s)
     • Input: Distance matrix $D[i,j]$ on $n$ items
     • Output: Tree metric $T[i,j]$
     • Goal: Minimize the $L_p$ cost-of-fit, $L_p(D,T) = \left( \sum_{i,j} |D[i,j] - T[i,j]|^p \right)^{1/p}$

  3. The Problem(s)
     • Input: Distance matrix $D[i,j]$ on $n$ items
     • Output: Ultrametric $T[i,j]$
     • Goal: Minimize the $L_p$ cost-of-fit, $L_p(D,T) = \left( \sum_{i,j} |D[i,j] - T[i,j]|^p \right)^{1/p}$

  4. The Problem(s)
     • Input: Distance matrix $D[i,j]$ on $n$ items
     • Output: Ultrametric $T[i,j]$
     • Goal: Minimize the $L_{\mathrm{rel}}$ cost-of-fit, $L_{\mathrm{rel}}(D,T) = \max_{i,j} \left\{ \frac{D[i,j]}{T[i,j]},\ \frac{T[i,j]}{D[i,j]} \right\}$
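The two cost-of-fit measures above can be sketched in a few lines of Python. This is only an illustration: the function names `lp_cost` and `lrel_cost` are hypothetical, matrices are assumed to be symmetric lists of lists with positive off-diagonal entries, and the sum and max are taken over unordered pairs $i < j$ (an assumption; the slides just write $i,j$).

```python
from itertools import combinations

def lp_cost(D, T, p):
    """L_p cost-of-fit: (sum over pairs of |D[i][j] - T[i][j]|^p)^(1/p)."""
    n = len(D)
    total = sum(abs(D[i][j] - T[i][j]) ** p
                for i, j in combinations(range(n), 2))
    return total ** (1.0 / p)

def lrel_cost(D, T):
    """L_rel cost-of-fit: the largest ratio D/T or T/D over all pairs."""
    n = len(D)
    return max(max(D[i][j] / T[i][j], T[i][j] / D[i][j])
               for i, j in combinations(range(n), 2))

# Hypothetical example: D is not an ultrametric, T is.
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
T = [[0, 3, 5], [3, 0, 5], [5, 5, 0]]
print(lp_cost(D, T, 1), lrel_cost(D, T))
```

Only the pair (0, 2) differs between `D` and `T` in this example, so the $L_1$ cost is the single difference $|4 - 5|$ and the $L_{\mathrm{rel}}$ cost is the ratio $5/4$.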

  6. Tree Metric & Ultrametrics
     • Tree Metric: Distances between the leaves of a weighted tree. $\forall w, x, y, z \in [n]$: $T[w,x] + T[y,z] \le \max\{\, T[w,y] + T[x,z],\ T[w,z] + T[x,y] \,\}$
     • Ultrametric: Distances between the leaves of a rooted weighted tree in which all leaves are equidistant from the root. $\forall x, y, z \in [n]$: $T[x,y] \le \max\{\, T[x,z],\ T[z,y] \,\}$
     [Figure: example weighted trees; edge weights not recoverable from the extraction.]
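Both conditions can be checked by brute force on small matrices. The sketch below is illustrative only: the function names and the tolerance parameter `eps` are assumptions, and the loops cost $O(n^3)$ and $O(n^4)$ time, so they are meant for sanity checks rather than large inputs.

```python
from itertools import product

def is_ultrametric(T, eps=1e-9):
    """Three-point condition: T[x][y] <= max(T[x][z], T[z][y]) for all x, y, z."""
    n = len(T)
    return all(T[x][y] <= max(T[x][z], T[z][y]) + eps
               for x, y, z in product(range(n), repeat=3))

def is_tree_metric(T, eps=1e-9):
    """Four-point condition, quantified over all w, x, y, z (repeats allowed)."""
    n = len(T)
    return all(T[w][x] + T[y][z]
               <= max(T[w][y] + T[x][z], T[w][z] + T[x][y]) + eps
               for w, x, y, z in product(range(n), repeat=4))

# Hypothetical examples (not from the slides).
U = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]              # ultrametric
P = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]              # path a-b-c: tree metric only
C4 = [[0, 1, 2, 1], [1, 0, 1, 2],
      [2, 1, 0, 1], [1, 2, 1, 0]]                  # unit 4-cycle: not a tree metric
print(is_ultrametric(U), is_tree_metric(P), is_tree_metric(C4))
```

Allowing repeated indices in the four-point check means the triangle inequality is tested as a special case (take $w = x$).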

  8. Biological Motivation
     • View ultrametric as an evolutionary tree
     • $D[i,j]$ is an estimate of the time since species $i$ and $j$ diverged
     • Goal: Reconcile contradictory estimates
     [Figure: evolutionary tree with leaves Shell Fish, Fish, Spider, Wasp, Bee, Orangutan, Chimp, Theorist, Computational Geometer]

  13. Previous Work
      • Farach, Kannan & Warnow ’95: Exact construction of best-fit ultrametric under $L_\infty$
      • Agarwala, Bafna, Farach, Paterson & Thorup ’99: 3-approximation of best-fit tree under $L_\infty$
      • Ma, Wang & Zhang ’99: $n^{1/p}$ approximation of best-fit non-contracting ultrametric under $L_p$
      • Dhamdhere ’04: $O(\log^{1/p} n)$ approximation of best-fit line metric under $L_p$

  14. Our Results
      • Algorithm #1:
        $L_p$: $O(k \log n)^{1/p}$ approximation to best-fit tree, where $k$ is the number of distinct distances in $D$
        $L_{\mathrm{rel}}$: $O(\log^2 n)$ approximation to best-fit ultrametric
      • Algorithm #2:
        $L_p$: $n^{1/p}$ approximation to best-fit tree

  15. Algorithm #1

  21. Restricting Splitting Distances
      • Original distances are $d_1 < d_2 < \ldots < d_k$
      • Lemma:
        a) There exists a best-fit (under $L_1$) ultrametric whose distances are a subset of $\{d_1, d_2, \ldots, d_k\}$.
        b) There exists an ultrametric whose distances are a subset of $\{d_1, d_2, \ldots, d_k\}$ whose cost-of-fit is at most twice optimal (under $L_p$).
        c) There exists an ultrametric with $O(\log n)$ distances whose cost-of-fit is at most twice optimal (under $L_{\mathrm{rel}}$). [Assuming $d_k/d_1$ is polynomial in $n$.]
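One way to see why part (c) is plausible is a rounding argument. This is a sketch under stated assumptions, not necessarily the authors' proof: assume an optimal ultrametric $T^*$ under $L_{\mathrm{rel}}$ with cost $\mathrm{OPT}$ whose distances all lie in $[d_1, d_k]$, and round every distance up to the nearest value of the form $d_1 \cdot 2^j$.

```latex
T'[i,j] = d_1 \cdot 2^{\lceil \log_2 ( T^*[i,j] / d_1 ) \rceil}
\quad\Longrightarrow\quad
T^*[i,j] \le T'[i,j] \le 2\, T^*[i,j]

\frac{D[i,j]}{T'[i,j]} \le \frac{D[i,j]}{T^*[i,j]} \le \mathrm{OPT},
\qquad
\frac{T'[i,j]}{D[i,j]} \le \frac{2\, T^*[i,j]}{D[i,j]} \le 2\,\mathrm{OPT}

\#\{\text{distinct values of } T'\} \le \lceil \log_2 (d_k / d_1) \rceil + 1 = O(\log n)
\quad \text{when } d_k / d_1 = n^{O(1)}
```

Because the rounding map is monotone, it preserves the three-point condition, so $T'$ is still an ultrametric.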

  26. [Figure: ultrametric tree with levels labeled $d_4$, $d_3$, $d_2$, $d_1$] “Splitting Distance” of internal node $v$ = Distance between leaves of subtree rooted at $v$

  27. Algorithm Outline
      • Construct top partition $G \to G_1, G_2, G_3, \ldots$
        Set length of inter-cluster edges to $d_k$
        All other lengths will be set to $\le d_{k-1}$
      • Construct trees for $G_1, G_2, G_3, \ldots$

  29. [Figure: clusters $G_1$, $G_2$, $G_3$] Between clusters: $T[i,j] = d_k$

  30. [Figure: clusters $G_1$, $G_2$, $G_3$] Within a cluster: $T[i,j] \le d_{k-1}$
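The outline (partition at the top splitting distance, then recurse inside each cluster with the remaining distances) can be sketched as a short recursion. This is a minimal sketch, not the authors' implementation: `fit_ultrametric` and `threshold_clusters` are hypothetical names, and the clustering step here is a simple stand-in (connected components of the pairs with $D[i,j] < d_k$), not the correlation clustering routine the talk uses.

```python
from itertools import combinations

def threshold_clusters(items, D, d_top):
    """Stand-in clustering (NOT correlation clustering): connected components
    of the graph whose edges are the pairs with D[i][j] < d_top."""
    parent = {v: v for v in items}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in combinations(items, 2):
        if D[i][j] < d_top:
            parent[find(i)] = find(j)
    groups = {}
    for v in items:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

def fit_ultrametric(D, items, levels, T, cluster=threshold_clusters):
    """Fill T on `items` using splitting distances levels[0] < ... < levels[-1]."""
    if len(items) < 2:
        return
    d_top = levels[-1]
    if len(levels) == 1:
        # Only one distance left: every remaining pair gets it.
        for i, j in combinations(items, 2):
            T[i][j] = T[j][i] = d_top
        return
    parts = cluster(items, D, d_top)
    label = {v: idx for idx, part in enumerate(parts) for v in part}
    for i, j in combinations(items, 2):
        if label[i] != label[j]:
            T[i][j] = T[j][i] = d_top          # inter-cluster edges get d_k
    for part in parts:
        fit_ultrametric(D, part, levels[:-1], T, cluster)  # lengths <= d_{k-1}

# Hypothetical example: D is already an ultrametric, so it is recovered exactly.
D = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
T = [[0] * 3 for _ in range(3)]
fit_ultrametric(D, [0, 1, 2], [1, 2], T)
print(T)
```

Whatever clustering routine is plugged in, the output is an ultrametric: pairs separated at a level receive that level's distance, and all pairs inside a cluster receive strictly smaller ones.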

  35. Correlation Clustering
      • Input: Weighted (positive and negative) graph
      • Output: A partitioning of nodes
      • Goal: Minimize $\sum_{e : w_e > 0} \left( |w_e| \text{ if } e \text{ is split} \right) + \sum_{e : w_e < 0} \left( |w_e| \text{ if } e \text{ is not split} \right)$
      • $O(\log n)$ approximation [Charikar, Guruswami and Wirth ’03]
      [Figure: example graph with edge weights +1, +1, +2, +3, -1, +2, -5, -5, -7]
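The objective is easy to evaluate for a given partition. The sketch below uses a hypothetical representation (a dict from node pairs to signed weights, and a dict from nodes to cluster labels); it only scores a partition and does not implement the $O(\log n)$ approximation algorithm.

```python
def disagreement_cost(weights, label):
    """Sum |w_e| over positive edges that are split and negative edges that are not."""
    cost = 0
    for (u, v), w in weights.items():
        split = label[u] != label[v]
        if w > 0 and split:
            cost += w        # positive edge across clusters
        elif w < 0 and not split:
            cost += -w       # negative edge inside a cluster
    return cost

# Hypothetical 3-node example (not the graph from the slide).
weights = {(0, 1): 2, (1, 2): -5, (0, 2): 1}
print(disagreement_cost(weights, {0: 0, 1: 0, 2: 1}))  # nodes 0, 1 together
print(disagreement_cost(weights, {0: 0, 1: 0, 2: 0}))  # everything together
```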

  38. Using Correlation Clustering
      • Best-fit ultrametric instance, edge lengths: 20, 11, 20, 17, 14, 20, 18, 18, 20, 20
      • Possible splitting distances: 20, 18, 17, 14, 11
      • Top-level clustering: Increase some lengths to 20 and decrease some length-20 edges to 18

  41. Using Correlation Clustering
      • Best-fit ultrametric instance, edge lengths: 20, 11, 20, 17, 14, 20, 18, 18, 20, 20
      • Correlation clustering instance, edge weights in the same order: -2, +9, -2, +3, +6, -2, +2, +2, -2, -2

  43. Using Correlation Clustering
      • Edge lengths after the top-level clustering: 20, 11, 20, 17, 14, 20, 18, 20, 20, 20 (one length-18 edge raised to 20)
      • Cost of length changes = Cost of disagreements during clustering

  44. Using Correlation Clustering
      • Recurse: [Figure: remaining edge lengths 11, 18, 17, 14]

  45. Using Correlation Clustering
      • Recurse: [Figure: edge lengths 11, 14, 17, 14]
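The weights in this example are consistent with a simple rule for the top level under $L_1$: an edge shorter than the top distance $d_k$ gets weight $+(d_k - D[i,j])$, the cost of raising it to $d_k$ if it is split, and an edge of length $d_k$ gets weight $-(d_k - d_{k-1})$, the cost of lowering it if it is not split. The sketch below encodes that reading; the function name is hypothetical, and the assignment of lengths to node pairs is arbitrary because the slide's graph layout is not recoverable.

```python
from itertools import combinations

def correlation_weights(D, items, d_top, d_next):
    """Signed weights for the top-level clustering (L_1 reading of the example)."""
    w = {}
    for i, j in combinations(items, 2):
        if D[i][j] < d_top:
            w[(i, j)] = d_top - D[i][j]       # cost of raising to d_top if split
        else:
            w[(i, j)] = -(d_top - d_next)     # cost of lowering if kept together
    return w

# Edge lengths from the slide, assigned to the 10 pairs of 5 nodes in an
# arbitrary order (assumption).
lengths = [20, 11, 20, 17, 14, 20, 18, 18, 20, 20]
pairs = list(combinations(range(5), 2))
D = [[0] * 5 for _ in range(5)]
for (i, j), d in zip(pairs, lengths):
    D[i][j] = D[j][i] = d

w = correlation_weights(D, range(5), d_top=20, d_next=18)
print([w[pair] for pair in pairs])
```

With these inputs the list of weights matches the slide's correlation clustering instance, edge by edge.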

  47. Analysis (Outline)
      • Let OPT be the cost-of-fit of the best-fit tree (under $L_1$)
