finding representatives in a heterogeneous network

Finding representatives in a heterogeneous network, by Laura Langohr (presentation slides)



  1. Finding representatives in a heterogeneous network
     Laura Langohr
     Department of Computer Science, University of Helsinki
     May 19, 2009
     Outline: Introduction, K-medoids, Experiments, Future Work, Conclusion

  2. Outline
     • Introduction
     • K-medoids
     • Experiments
     • Future Work
     • Conclusion

  3. Motivation
     • Finding representative vertices
     • Given a list of 100 vertices
     • But resources to study only 10 vertices
     • Cluster the 100 vertices into 10 clusters
     • For each cluster, suggest one vertex as its representative

  4. Example graph
     [Figure: weighted graph on vertices A to F; edge weights range from 0.51 to 0.9]

  5. K-medoids
     • Clustering method
     • Objects are partitioned into k clusters
     • First, an initial partitioning is created
     • The partition is then iteratively improved
     • Cluster centers are objects → medoids

  6. Algorithm
     1. k objects are randomly chosen as medoids
     2. Assign each remaining object to its nearest medoid
     3. Calculate a new medoid for each cluster

  7. K-means
     • K-medoids is similar to k-means
     • K-means uses the mean value of a cluster as its center, which need not be one of the objects
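
The difference between the two kinds of cluster center can be illustrated with a minimal sketch. The data values and the helper name `medoid` are made up for illustration and do not come from the slides; absolute difference is assumed as the distance.

```python
from statistics import mean


def medoid(points):
    """Return the point whose total distance to all other points is smallest."""
    return min(points, key=lambda p: sum(abs(p - q) for q in points))


# Hypothetical one-dimensional cluster with two large outliers.
values = [1.0, 2.0, 3.0, 10.0, 20.0]

center_mean = mean(values)      # arithmetic mean; here not one of the objects
center_medoid = medoid(values)  # always one of the objects in the cluster

print(center_mean, center_medoid)
```

In this example the mean is pulled towards the outliers and matches no object, whereas the medoid is an actual member of the cluster, which is what makes it usable as a representative.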

  8. K-medoids vs k-means

  9. K-medoids in a heterogeneous network
     • Select few representatives from a large set of vertices
     • Representatives should be independent of each other
     • Relations between two vertices in a graph → link
     • Including undiscovered relations
     • Undiscovered relations are manifested as path(s)

  10. Measure for link strength
     • Probability of a path is the product of the probabilities of the edges along the path:
       $g(p) = \prod_{i=1}^{k} w(e_i)$
     • Probability of the best path between two vertices:
       $P_{bp}(G, o, o') = \max_{p \in Pa(G, o, o')} g(p)$
     [Figure: example graph on vertices A to F with edge weights 0.54, 0.62, 0.51, 0.55, 0.83 and 0.71]
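
The best-path probability can be computed with a Dijkstra-style search, sketched below. This is one possible approach, not necessarily the one used in the talk: it relies on the assumption that all edge weights lie in (0, 1], so extending a path never increases its probability. The function names and the small three-vertex graph are hypothetical, since the edge layout of the slide's figure is not recoverable.

```python
import heapq


def best_path_probability(graph, source, target):
    """Return the maximum, over all paths from source to target, of the
    product of edge probabilities; 0.0 if target is unreachable.

    Assumes every edge weight is in (0, 1].
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated probabilities
    while heap:
        neg_prob, u = heapq.heappop(heap)
        prob = -neg_prob
        if u == target:
            return prob
        if prob < best.get(u, 0.0):
            continue  # stale heap entry; a better path to u was already found
        for v, w in graph.get(u, {}).items():
            cand = prob * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return 0.0


def undirected(edges):
    """Build an adjacency dict from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


# Hypothetical graph: the direct edge A-C is weaker than the path A-B-C.
g = undirected([("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.5)])
p_ac = best_path_probability(g, "A", "C")
print(p_ac)
```

Here the indirect path A-B-C has probability 0.9 · 0.8, about 0.72, which beats the direct edge with probability 0.5, so the best-path measure can rate two vertices as strongly linked even when their direct edge is weak or missing.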

  11. Algorithm
     1. Calculate the similarity matrix
     2. Choose k objects randomly as initial medoids
     3. Assign each remaining object to the most similar medoid
     4. Calculate a new medoid for each cluster:
        $medoid(C_j) = \operatorname{argmax}_{o \in C_j} \sum_{o' \in C_j,\, o' \neq o} P_{bp}(G, o, o')$
     Repeat steps 3 and 4 until the clustering converges
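
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the similarity matrix is taken as a precomputed nested dict `sim[o][o2]` (in the talk it would hold best-path probabilities), and the function names, the optional `initial` parameter and the six-object toy data are hypothetical.

```python
import random


def assign(sim, objects, medoids):
    """Step 3: put each non-medoid object into the cluster of its most similar medoid."""
    clusters = {m: [m] for m in medoids}
    for o in objects:
        if o not in clusters:
            clusters[max(medoids, key=lambda m: sim[o][m])].append(o)
    return clusters


def update(sim, clusters):
    """Step 4: in each cluster, pick the object with the largest summed
    similarity to the other members."""
    return [
        max(members, key=lambda o: sum(sim[o][x] for x in members if x != o))
        for members in clusters.values()
    ]


def k_medoids(sim, k, initial=None, seed=0, max_iter=100):
    """Return {medoid: members}; `initial` overrides the random choice of step 2."""
    objects = sorted(sim)
    medoids = list(initial) if initial else random.Random(seed).sample(objects, k)
    for _ in range(max_iter):
        new_medoids = update(sim, assign(sim, objects, medoids))
        if set(new_medoids) == set(medoids):
            break  # converged: the medoids no longer change
        medoids = new_medoids
    return assign(sim, objects, medoids)


def make_similarity(names, pairs, default=0.1):
    """Build a symmetric similarity matrix (hypothetical toy data)."""
    sim = {a: {b: (1.0 if a == b else default) for b in names} for a in names}
    for a, b, s in pairs:
        sim[a][b] = sim[b][a] = s
    return sim


names = ["a", "b", "c", "d", "e", "f"]
sim = make_similarity(names, [
    ("a", "b", 0.9), ("a", "c", 0.8), ("b", "c", 0.7),
    ("d", "e", 0.9), ("d", "f", 0.8), ("e", "f", 0.7),
])
clusters = k_medoids(sim, k=2, initial=["a", "d"])
print(clusters)
```

Because the initial medoids are random, different seeds may start from different medoids; fixing `initial` or `seed` makes a run reproducible.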

  12. Biomine
     • 12 biological databases are integrated
     • Over 1 million vertices
     • Over 9 million edges
     [Figure: Biomine subgraph with vertices Gene:434, Pathway:04916, Gene:4157, Gene:4948, Phenotype:203200 and Gene:7299; edge weights 0.54, 0.62, 0.51, 0.55, 0.83 and 0.71]
     http://biomine.cs.helsinki.fi

  13. Artificial example
     • Three phenotypes, with three genes for each
     • k-medoids with nine genes and k = 3

  14. Result

  15. Future Work
     • Hierarchical clustering
     • Statistical evaluation
     • Comparison to an existing method

  16. Conclusion
     • Finding representative vertices, e.g. genes
     • K-medoids on Biomine
     • Example with nine genes is promising
