Approximating the Best-Fit Tree Under L_p Norms
Boulos Harb, Sampath Kannan and Andrew McGregor, UPenn
The Problem(s)

• Input: Distance matrix D[i,j] on n items
• Output: A tree metric (or an ultrametric) T[i,j]
• Goal: Minimize the cost of fit, either the L_p cost

    L_p(D, T) = ( Σ_{i,j} |D[i,j] − T[i,j]|^p )^{1/p}

  or (for ultrametrics) the relative cost

    L_rel(D, T) = max_{i,j} max{ D[i,j] / T[i,j], T[i,j] / D[i,j] }
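For concreteness, here is a minimal sketch (not from the paper) of how the two cost-of-fit measures would be computed, with D and T given as n×n nested Python lists:

```python
# Minimal sketch (not from the paper): evaluating the two cost-of-fit measures.
# D and T are n x n symmetric matrices given as nested lists; each unordered
# pair {i, j} is counted once.

def lp_cost(D, T, p):
    """L_p cost of fit: (sum over pairs of |D[i][j] - T[i][j]|^p)^(1/p)."""
    n = len(D)
    total = sum(abs(D[i][j] - T[i][j]) ** p
                for i in range(n) for j in range(i + 1, n))
    return total ** (1.0 / p)

def lrel_cost(D, T):
    """L_rel cost of fit: the worst multiplicative distortion over all pairs
    (assumes all off-diagonal entries are positive)."""
    n = len(D)
    return max(max(D[i][j] / T[i][j], T[i][j] / D[i][j])
               for i in range(n) for j in range(i + 1, n))
```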
Tree Metrics & Ultrametrics

• Tree metric: distances between the leaves of a weighted tree. Equivalently, the four-point condition holds:

    ∀ w, x, y, z ∈ [n]   T[w,x] + T[y,z] ≤ max{ T[w,y] + T[x,z], T[w,z] + T[x,y] }

• Ultrametric: distances between the leaves of a rooted weighted tree in which all leaves are equidistant from the root. Equivalently, the three-point condition holds:

    ∀ x, y, z ∈ [n]   T[x,y] ≤ max{ T[x,z], T[z,y] }

[Figure: an example weighted tree and a rooted ultrametric tree with integer edge weights.]
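These two conditions can be checked directly; below is a minimal sketch (assuming T is an n×n matrix, and checking only the stated inequalities rather than all metric axioms):

```python
# Minimal sketch: checking the stated three-point and four-point conditions
# on an n x n distance matrix T.
from itertools import combinations

def is_ultrametric(T):
    """Three-point condition: T[x][y] <= max(T[x][z], T[z][y]) for all x, y, z."""
    n = len(T)
    return all(T[x][y] <= max(T[x][z], T[z][y])
               for x in range(n) for y in range(n) for z in range(n))

def satisfies_four_point(T):
    """Four-point condition, checked over distinct quadruples: of the three
    pairings T[w][x]+T[y][z], T[w][y]+T[x][z], T[w][z]+T[x][y], the two
    largest must be equal (equivalent to the max-form stated on the slide)."""
    n = len(T)
    for w, x, y, z in combinations(range(n), 4):
        s = sorted([T[w][x] + T[y][z], T[w][y] + T[x][z], T[w][z] + T[x][y]])
        if s[1] != s[2]:
            return False
    return True
```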
Biological Motivation

• View the ultrametric as an evolutionary tree
• D[i,j] is an estimate of the time since species i and j diverged
• Goal: Reconcile contradictory estimates

[Figure: a cartoon evolutionary tree with leaves Shell Fish, Fish, Spider, Wasp, Bee, Orangutan, Chimp, Theorist, Computational Geometer.]
Previous Work

• Farach, Kannan & Warnow '95: Exact construction of the best-fit ultrametric under L_∞
• Agarwala, Bafna, Farach, Paterson & Thorup '99: 3-approximation of the best-fit tree under L_∞
• Ma, Wang & Zhang '99: n^{1/p}-approximation of the best-fit non-contracting ultrametric under L_p
• Dhamdhere '04: O(log^{1/p} n)-approximation of the best-fit line metric under L_p
Our Results

• Algorithm #1:
  - L_p: O((k log n)^{1/p})-approximation to the best-fit tree, where k is the number of distinct distances in D
  - L_rel: O(log^2 n)-approximation to the best-fit ultrametric
• Algorithm #2:
  - L_p: n^{1/p}-approximation to the best-fit tree
Algorithm #1
Restricting Splitting Distances

• Original distances are d_1 < d_2 < ... < d_k
• Lemma:
  a) There exists a best-fit (under L_1) ultrametric whose distances are a subset of {d_1, d_2, ..., d_k}
  b) There exists an ultrametric whose distances are a subset of {d_1, d_2, ..., d_k} whose cost of fit is at most twice optimal (under L_p)
  c) There exists an ultrametric with O(log n) distances whose cost of fit is at most twice optimal (under L_rel) [assuming d_k/d_1 is polynomial in n] (see the sketch below)
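One way to see part (c) is the usual geometric-rounding idea, sketched below under the stated assumption that d_k/d_1 is polynomial in n: restrict the splitting distances to powers of 2 between d_1 and d_k, which leaves only O(log n) levels, and snapping any distance down to a level changes it by less than a factor of 2, so the L_rel cost at most doubles.

```python
# Sketch of part (c): restrict splitting distances to powers of 2 in [d_1, d_k].
# When d_k / d_1 is polynomial in n there are only O(log n) such levels, and
# rounding a distance down to the nearest level changes it by less than a
# factor of 2, so the L_rel cost of fit at most doubles.

def power_of_two_levels(d_1, d_k):
    levels = []
    v = d_1
    while v <= d_k:
        levels.append(v)
        v *= 2
    return levels

def snap_down(d, levels):
    """Largest level that is <= d (assumes d >= levels[0])."""
    return max(l for l in levels if l <= d)
```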
[Figure: a rooted ultrametric tree with internal nodes at heights corresponding to splitting distances d_4 > d_3 > d_2 > d_1.]

"Splitting distance" of an internal node v = distance between the leaves of the subtree rooted at v
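To make the picture concrete, a small sketch (with an assumed representation: the tree as a parent array and a dict of splitting distances at internal nodes): the ultrametric distance between two distinct leaves is the splitting distance of their lowest common ancestor.

```python
# Sketch: an ultrametric is determined by the splitting distances of its
# internal nodes (non-increasing from the root toward the leaves); the
# distance between two distinct leaves i and j is the splitting distance of
# their lowest common ancestor.

def ultrametric_distance(i, j, parent, split):
    """parent[v]: parent of node v (None at the root);
    split[v]: splitting distance of internal node v.
    Naive LCA computation by walking up from both leaves."""
    ancestors = set()
    v = i
    while v is not None:
        ancestors.add(v)
        v = parent[v]
    v = parent[j]
    while v not in ancestors:     # first ancestor of i reached is the LCA
        v = parent[v]
    return split[v]
```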
Algorithm Outline

• Construct the top partition G → G_1, G_2, G_3, ...
  - Set the length of inter-cluster edges to d_k
  - All other lengths will be set to ≤ d_{k-1}
• Construct trees for G_1, G_2, G_3, ...
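A rough sketch of this recursive outline (not the paper's exact procedure); `top_partition` is a hypothetical stand-in for the correlation-clustering step described on the following slides.

```python
# Rough sketch of the recursive outline. `top_partition` is a hypothetical
# subroutine (the correlation-clustering step described next) that partitions
# the current items into clusters G_1, G_2, ...

def build_ultrametric(items, D, distances, T, top_partition):
    """distances: candidate splitting distances d_1 < ... < d_k restricted to
    the current sub-instance. Fills in T[i][j] for all pairs i, j in items."""
    if len(items) <= 1:
        return
    if len(distances) == 1:
        for i in items:                      # only one distance left: use it
            for j in items:
                if i != j:
                    T[i][j] = distances[0]
        return
    d_k = distances[-1]
    clusters = top_partition(items, D, distances)
    for a, A in enumerate(clusters):
        for b, B in enumerate(clusters):
            if a != b:
                for i in A:
                    for j in B:
                        T[i][j] = d_k        # inter-cluster edges get length d_k
    for A in clusters:
        # intra-cluster lengths will all end up <= d_{k-1}
        build_ultrametric(A, D, distances[:-1], T, top_partition)
```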
[Figure: the top-level partition of G into clusters G_1, G_2, G_3. Inter-cluster distances are set to T[i,j] = d_k; intra-cluster distances satisfy T[i,j] ≤ d_{k-1}.]
Correlation Clustering

• Input: Graph with positive and negative edge weights
• Output: A partitioning of the nodes
• Goal: Minimize

    Σ_{e : w_e > 0} ( |w_e| if e is split )  +  Σ_{e : w_e < 0} ( |w_e| if e is not split )

• O(log n) approximation [Charikar, Guruswami and Wirth '03]

[Figure: an example graph with edge weights +1, +1, +2, +3, −1, +2, −5, −5, −7.]
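For reference, a minimal sketch of the objective: positive-weight edges pay |w_e| when their endpoints are split across clusters, negative-weight edges pay |w_e| when their endpoints are kept together.

```python
# Minimal sketch: cost of a clustering under the correlation clustering
# objective above. `weights` maps an edge (u, v) to its signed weight;
# `label` maps each node to its cluster id.

def correlation_cost(weights, label):
    cost = 0
    for (u, v), w in weights.items():
        split = label[u] != label[v]
        if (w > 0 and split) or (w < 0 and not split):
            cost += abs(w)
    return cost
```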
Using Correlation Clustering

• Best-fit ultrametric instance (pairwise distances): 20, 11, 20, 17, 14, 20, 18, 18, 20, 20
• Possible splitting distances: 20, 18, 17, 14, 11
• Top-level clustering: increase some lengths to 20 and decrease some length-20 edges to 18
• Correlation clustering instance: the distances 20, 11, 20, 17, 14, 20, 18, 18, 20, 20 get edge weights −2, +9, −2, +3, +6, −2, +2, +2, −2, −2

[Figure: the top-level clustering splits one length-18 edge, which is then increased to 20.]

• Cost of length changes = cost of disagreements during clustering
• Recurse on each cluster: in the example, the remaining sub-instance has distances 11, 18, 17, 14 (in the figure, the 18 is subsequently adjusted to 14 in the recursion)
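The weights in the example appear to follow a simple pattern; the sketch below reverse-engineers it (an inferred illustration, not necessarily the paper's exact reduction). With top splitting distance d_k = 20 and next distance d_{k-1} = 18, an edge of length D < 20 gets positive weight 20 − D (what it costs to raise it to 20 if it is split), while a length-20 edge gets negative weight −(20 − 18) (what it costs to lower it to 18 if it is kept together).

```python
# Inferred sketch of the edge weights used in the example above; the paper's
# exact reduction may differ.

def edge_weight(D_ij, d_k, d_k_minus_1):
    if D_ij < d_k:
        return d_k - D_ij            # cost of raising the edge to d_k if it is split
    return -(D_ij - d_k_minus_1)     # cost of lowering it to d_{k-1} if it is kept

# Reproduces the example: distances 20, 11, 20, 17, 14, 20, 18, 18, 20, 20
# give weights [-2, 9, -2, 3, 6, -2, 2, 2, -2, -2].
print([edge_weight(d, 20, 18) for d in (20, 11, 20, 17, 14, 20, 18, 18, 20, 20)])
```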
Analysis (Outline)

• Let OPT be the cost of fit of the best-fit tree (under L_1)