Analysis of gene copy number changes in tumor phylogenetics
Jun Zhou, Yu Lin, Vaibhav Rajan, William Hoskins, Bing Feng and Jijun Tang
Background
• Cancer progression can be modeled as a molecular evolutionary process driven by mutations such as deletions and insertions.
• A mutational phylogenetic tree can be built with clones and subclones as nodes and mutation processes as directed edges.
Image from Davis A. and Navin N., 2016
In this paper…
• Genetic marker: copy numbers of a panel of genes, detected by fluorescence in situ hybridization (FISH) at the single-cell level
  • Changes caused by insertion/deletion of genes
  • Changes caused by chromosomal aberrations
• Data structure:
  • A clone is represented as a tuple of copy numbers
  • A patient is represented as a matrix of copy numbers, which is the main (and only) input to the phylogenetic problem
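The data structure above can be sketched in a few lines. This is a minimal illustration, assuming each row of the patient matrix is the copy-number profile of one cell; the probe count, the values, and the names `patient` and `clone_frequencies` are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Tuple

# A clone is a tuple of integer copy numbers, one entry per gene probe.
Clone = Tuple[int, ...]

# Hypothetical patient: 4 cells, 3 gene probes (made-up values).
patient: List[Clone] = [
    (2, 2, 2),
    (2, 3, 2),
    (2, 3, 1),
    (2, 3, 1),
]

def clone_frequencies(cells: List[Clone]) -> Counter:
    """Collapse identical cell profiles into distinct clones with counts."""
    return Counter(cells)
```

Only the distinct tuples matter for the tree-building problem; the counts are kept here in case they are useful for display.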
Problem formulation
• Distance-based minimum tree
• NP-hard
Minimum Spanning Tree
• Input: a set of vertices 𝑊 and either a |𝑊| × |𝑊| distance matrix or a metric.
• Output: a 1-connected tree T = (𝑊, 𝐹) with minimum weight (the sum of the distances between vertices joined by an edge)
• Solved in polynomial time by Prim's or Kruskal's greedy algorithm
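As a reminder of how the greedy construction works, here is a minimal sketch of Prim's algorithm on a full distance matrix. The function name `prim_mst` and the example matrix are illustrative assumptions; the sketch assumes a symmetric matrix with finite entries (a complete graph).

```python
import math
from typing import List, Tuple

def prim_mst(dist: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Return (edges, total weight) of a minimum spanning tree.

    dist is assumed to be a symmetric matrix over a complete graph.
    Runs in O(n^2) time, which suits dense distance matrices.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost into the tree
    parent = [-1] * n       # tree endpoint of that cheapest connection
    edges: List[Tuple[int, int]] = []
    total = 0.0
    if n:
        best[0] = 0.0       # grow the tree from vertex 0
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax connection costs through the newly added vertex.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges, total
```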
Steiner Minimum Tree
• Steiner nodes: unobserved nodes (absent from the dataset)
• Input: a set of vertices 𝑊 and a metric.
• Output: a 1-connected tree T = (𝑊′, 𝐹), where 𝑊′ ⊇ 𝑊, with minimum weight (the sum of the distances between vertices joined by an edge)
• NP-hard
• For the case |𝑊| = 3, this reduces to the Median problem.
  • Sankoff's algorithm in linear time
Steiner Minimum Tree – Sankoff’s algorithm Image from Zhou J. et al., 2016
Rectilinear Steiner Minimum Tree (RSMT)
• Rectilinear (Manhattan) metric: the sum of the absolute differences between corresponding positions of two tuples
• Input: a set of vertices 𝑊.
• Output: a 1-connected tree T = (𝑊′, 𝐹), where 𝑊′ ⊇ 𝑊, with minimum weight (the sum of the distances between vertices joined by an edge) under the rectilinear metric.
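The rectilinear metric on copy-number tuples is short enough to write out. A minimal sketch follows; the function names and the example profiles are illustrative assumptions.

```python
from typing import List, Sequence

def rectilinear_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between corresponding positions."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of genes")
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(profiles: List[Sequence[int]]) -> List[List[int]]:
    """All pairwise rectilinear distances between the given profiles."""
    return [[rectilinear_distance(p, q) for q in profiles] for p in profiles]
```

For example, the profiles (2, 2, 2) and (3, 1, 2) differ by one copy in each of the first two genes, so their distance is 2.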
RSMT Exact Algorithm
• Hanan's theorem for the 2-D problem: there exists an RSMT whose Steiner points all lie on the Hanan grid
• The solution space is therefore bounded
• The theorem generalizes to n dimensions
• An exact (and naïve) algorithm enumerates all possible sets of Steiner nodes, computes the Minimum Spanning Tree on each augmented vertex set, and compares the weights
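The naïve enumeration can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: candidate Steiner nodes are taken from the n-dimensional Hanan grid (the Cartesian product of the values observed in each coordinate), and at most n − 2 Steiner nodes are tried for n observed nodes, a standard bound for Steiner trees in metric spaces. All function names are hypothetical, and the search is exponential, so it is only usable on tiny inputs.

```python
from itertools import combinations, product

def rectilinear(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def mst_weight(nodes):
    """Weight of a minimum spanning tree over distinct nodes (Prim, O(n^2))."""
    nodes = list(nodes)
    # best[v] = cheapest known connection of v to the growing tree
    best = {v: rectilinear(nodes[0], v) for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for w in best:
            best[w] = min(best[w], rectilinear(v, w))
    return total

def exact_rsmt(observed):
    """Brute-force search over subsets of Hanan-grid points.

    Returns (minimum weight, tuple of Steiner nodes used).
    """
    observed = list(dict.fromkeys(observed))      # drop duplicate profiles
    seen = set(observed)
    dim = len(observed[0])
    axes = [sorted({p[i] for p in observed}) for i in range(dim)]
    candidates = [c for c in product(*axes) if c not in seen]
    best_weight, best_steiner = mst_weight(observed), ()
    # Assumption: no more than n - 2 Steiner nodes are ever needed.
    for k in range(1, max(len(observed) - 2, 0) + 1):
        for extra in combinations(candidates, k):
            weight = mst_weight(observed + list(extra))
            if weight < best_weight:
                best_weight, best_steiner = weight, extra
    return best_weight, best_steiner
```

On the three profiles (0, 1), (1, 0) and (2, 2), the spanning tree without Steiner nodes weighs 5, while adding the grid point (1, 1) brings the weight down to 4.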
RSMT Heuristics
• Inspiration from the Median problem
  • Sankoff's algorithm in linear time
• Inspiration from the Maximum Parsimony (MP) problem
  • Maximum parsimony tree: MP heuristics borrowed from the TNT package
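For three profiles under the rectilinear metric, the median can be computed coordinate by coordinate, because the metric is a sum over genes. The sketch below is an illustration of that special case rather than the algorithm used in the paper; `median_of_three` and `star_weight` are hypothetical names.

```python
def rectilinear(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def median_of_three(a, b, c):
    """Coordinate-wise median of three profiles.

    In each coordinate the middle value minimizes the sum of absolute
    differences, so the resulting tuple minimizes the total rectilinear
    distance to a, b and c.
    """
    return tuple(sorted(values)[1] for values in zip(a, b, c))

def star_weight(center, leaves):
    """Weight of the star tree joining every leaf to the center."""
    return sum(rectilinear(center, leaf) for leaf in leaves)
```

If the median coincides with one of the three inputs, no unobserved node is needed for that triplet; otherwise the median is a candidate Steiner node.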
RSMT Heuristics from Minimum Spanning Tree – Sankoff’s Algorithm revisited Image from Zhou J. et al., 2016
RSMT Heuristics from Minimum Spanning Tree – iterative Sankoff’s Algorithm Image from Zhou J. et al., 2016
RSMT Heuristics from Minimum Spanning Tree (MST)
• Minimize the number of Steiner nodes added by carefully selecting which nodes to add first.
• Steiner count of an observed node A: the number of triplets containing A that require a Steiner node to optimize the tree weight
• Inference score of a Steiner node: the sum of the Steiner counts of the nodes in the triplet defining it
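The two scores defined above can be sketched as follows. Several details are assumptions made for illustration only: every triplet of observed nodes is considered (the paper may restrict which triplets count), a triplet is taken to "require a Steiner node" when its coordinate-wise median is not one of its three members, and when several triplets define the same Steiner node the largest score is kept. The function names are hypothetical.

```python
from itertools import combinations

def median_of_three(a, b, c):
    """Coordinate-wise median, optimal under the rectilinear metric."""
    return tuple(sorted(values)[1] for values in zip(a, b, c))

def steiner_counts(nodes):
    """For each observed node, count the triplets that need a Steiner node."""
    counts = {v: 0 for v in nodes}
    for triplet in combinations(nodes, 3):
        if median_of_three(*triplet) not in triplet:
            for v in triplet:
                counts[v] += 1
    return counts

def inference_scores(nodes):
    """Score each candidate Steiner node by the Steiner counts of its triplet."""
    counts = steiner_counts(nodes)
    scores = {}
    for triplet in combinations(nodes, 3):
        steiner = median_of_three(*triplet)
        if steiner not in triplet:
            score = sum(counts[v] for v in triplet)
            # Assumption: keep the best score if several triplets agree.
            scores[steiner] = max(scores.get(steiner, 0), score)
    return scores
```

With the observed nodes (0, 1), (1, 0), (2, 2) and (0, 0), only the first three form a triplet whose median, (1, 1), is unobserved, so each of those three nodes has a Steiner count of 1 and the candidate (1, 1) scores 3.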
Image from Zhou J. et al., 2016
RSMT heuristics from Maximum Parsimony (MP)
• MP heuristics (TNT package) derive a tree whose leaves contain the dataset
• Dynamic programming assigns states to the internal nodes
• Trivial edges (edges with weight 0 under the rectilinear metric) are contracted
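The dynamic-programming step can be illustrated with a Sankoff-style recursion on a single gene. Because the rectilinear distance is a sum over genes, each gene can be handled independently and the per-gene costs added up. This is a sketch under assumptions, not the paper's code: the tree is given as nested tuples with integer copy numbers at the leaves, `max_copy` is a hypothetical upper bound on the copy number, and only the minimum cost is returned (recovering the internal states would need a traceback).

```python
import math

def sankoff_costs(tree, max_copy):
    """Return costs[s]: minimum subtree weight if the subtree root has state s.

    tree is either an int (an observed leaf copy number) or a tuple of
    subtrees. The cost of an edge between states s and t is |s - t|.
    """
    states = range(max_copy + 1)
    if isinstance(tree, int):
        # A leaf is fixed to its observed copy number.
        return [0 if s == tree else math.inf for s in states]
    child_costs = [sankoff_costs(child, max_copy) for child in tree]
    return [
        sum(min(costs[t] + abs(s - t) for t in states) for costs in child_costs)
        for s in states
    ]

def min_tree_weight(tree, max_copy):
    """Minimum total weight over all assignments of internal states."""
    return min(sankoff_costs(tree, max_copy))
```

For the single-gene tree ((1, 3), 2), assigning the state 2 to both internal nodes gives a total weight of 2, and the edge between those two internal nodes has weight 0, which is the kind of trivial edge that would then be contracted.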
RSMT heuristics from Maximum Parsimony (MP) Image from Zhou J. et al., 2016
RSMT heuristics from Maximum Parsimony (MP) Image from Zhou J. et al., 2016
Results for Breast Cancer real data Image from Zhou J. et al., 2016
RSMT Results for simulated data Image from Zhou J. et al., 2016
Thank you
Q&A
Duplication Steiner Minimum Tree (DSMT) (Chowdhury et al.)
• Input: a set of vertices 𝑊.
• Output: a 1-connected tree T = (𝑊′, 𝐹), where 𝑊′ ⊇ 𝑊, with minimum weight (the sum of the distances between vertices joined by an edge) under a generalized metric that incorporates large-scale duplication events.
DSMT Results for simulated data Image from Zhou J. et al., 2016