result clustering for keyword search on graphs

Result Clustering for Keyword Search on Graphs, by Madhulika Mohanty



  1. Result Clustering for Keyword Search on Graphs Madhulika Mohanty Supervisor: Dr Maya Ramanath

  2. ● Common data formats across the Web ● Easily interpretable by machines → “Web of data”

  3. LINKED DATA ● Collection of knowledge bases. ● All the knowledge bases are interlinked. ● Represented as RDF. ● RDF: Resource Description Framework – Data model to represent structured data – Triples: <subject> <predicate> <object> – Examples: <Tom_Hanks> <ActedIn> <Cast_Away> and <Tom_Hanks> <ActedIn> <Forrest_Gump> [Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
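As a rough illustration of the triple model on this slide, the two example triples can be held as plain tuples and indexed by subject. This is only a sketch: the helper name `build_graph` and the adjacency-list layout are assumptions made for illustration, not something the slides prescribe.

```python
from collections import defaultdict

# The two example triples from the slide, as (subject, predicate, object) tuples.
triples = [
    ("Tom_Hanks", "ActedIn", "Cast_Away"),
    ("Tom_Hanks", "ActedIn", "Forrest_Gump"),
]

def build_graph(triples):
    """Index triples as a labeled adjacency list: subject -> [(predicate, object)].

    `build_graph` is a hypothetical helper for this sketch.
    """
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

graph = build_graph(triples)

# All objects reachable from Tom_Hanks over an ActedIn edge.
movies = [obj for predicate, obj in graph["Tom_Hanks"] if predicate == "ActedIn"]
print(movies)
```

Each triple becomes one labeled, directed edge, which is why a collection of triples can be treated as a graph in the rest of the talk.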

  4. Sample YAGO graph [Figure: sample YAGO graph. Source: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/]

  5. Querying graphs ● SPARQL queries – Structured queries – Structured results – E.g., graph databases like Neo4j ● Natural language queries → SPARQL → Structured results ● Relationship queries – Unstructured text

  6. Relationship queries ● Unstructured text, like Google. ● Answers are relationships among queried entities. ● More popularly known as “Keyword Search”. ● Why Keyword Search? – Make graphs query-able by casual users. – Find interesting relationships – even surprise discoveries.

  7. Jeff Weiner Mark Zuckerberg

  8. I bet you know this.. Jeff Weiner Mark Zuckerberg

  9. Now that's interesting!! Jeff Weiner Mark Zuckerberg

  10. Another interesting one.. [Figure: nodes Mausam, Nobel Prize winner Edwin G. Krebs, Bill Gates, 14th Dalai Lama]

  11. Another interesting one.. [Figure: the same nodes (Mausam, Nobel Prize winner Edwin G. Krebs, Bill Gates, 14th Dalai Lama), now connected by edges labeled Doctorate, Faculty and Honorary Doctorate]

  12. Movie dataset graph [Figure: graph with class nodes Movie and Actor; year nodes 1994, 2000, 2006, 2011; movie nodes Forrest Gump, Larry Crowne, Cast Away, Casino Royale, The Girl with the Dragon Tattoo; actor nodes Tom Hanks, Robin Wright, Daniel Craig, Rooney Mara; edge labels ActedIn, InYear, IsA]

  13. Movie dataset graph ● Searching for 'Hanks Wright' [Figure: movie dataset graph, as on slide 12]

  14. [Figure: movie dataset graph, as on slide 12]

  15. [Figure: movie dataset graph, as on slide 12]

  16. [Figure: movie dataset graph, as on slide 12]

  17. [Figure: movie dataset graph, as on slide 12] ● Results are trees.

  18. [Figure: movie dataset graph, as on slide 12] ● Results are trees. ● There should exist interconnection between all pairs of keyword nodes.
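The two conditions on slide 18 (a result is a tree, and every keyword node is connected to every other) can be checked mechanically. The sketch below assumes a result is given as a list of undirected edges; the function name `is_answer_tree` and the sample edges are illustrative assumptions, since the exact edges of the slide's figure are not recoverable from the transcript.

```python
def is_answer_tree(edges, keyword_nodes):
    """Return True if `edges` form a tree that contains every keyword node.

    Hypothetical helper: edges are undirected (u, v) pairs. A connected
    undirected graph is a tree exactly when it has |V| - 1 edges.
    """
    nodes = set()
    adjacency = {}
    for u, v in edges:
        nodes.update((u, v))
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Every keyword node must appear in the result.
    if not set(keyword_nodes) <= nodes:
        return False
    # Edge-count condition for a tree (also rejects the empty result).
    if len(edges) != len(nodes) - 1:
        return False

    # Connectivity check by depth-first search from an arbitrary node.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

# Illustrative edges (assumed for this sketch).
good = [("Tom Hanks", "Forrest Gump"), ("Robin Wright", "Forrest Gump")]
cyclic = good + [("Tom Hanks", "Robin Wright")]
split = [("Tom Hanks", "Cast Away"), ("Robin Wright", "Forrest Gump")]

print(is_answer_tree(good, ["Tom Hanks", "Robin Wright"]))
print(is_answer_tree(cyclic, ["Tom Hanks", "Robin Wright"]))
print(is_answer_tree(split, ["Tom Hanks", "Robin Wright"]))
```

Because a tree is connected, containing all keyword nodes already guarantees a path between every pair of them.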

  19. Keyword Search in Graph-structured Data. Query: Given a set of query keywords $Q = \{k_1, k_2, \ldots, k_n\}$ and a graph $G = (V, E)$, find the top-$K$ minimal answer trees $A_1, A_2, \ldots, A_K$ ordered by their relevance score.
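To make the problem statement concrete, here is a deliberately simplified sketch for the two-keyword case, where a minimal answer tree is just a shortest path between one node matching each keyword. Everything here is an assumption for illustration: substring matching of keywords to node labels, an undirected unlabeled graph, and "fewer edges is better" as the relevance score. It is not the search algorithm used in the talk, and the general $n$-keyword case needs a proper answer-tree search.

```python
from collections import deque
from itertools import product

def match_nodes(graph, keyword):
    """Nodes whose label contains the keyword (case-insensitive). Assumed matching rule."""
    return [node for node in graph if keyword.lower() in node.lower()]

def shortest_path(graph, src, dst):
    """Breadth-first shortest path as a list of nodes, or None if unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def top_k_answers(graph, keyword_a, keyword_b, k):
    """Top-k path-shaped answers for two keywords, shortest first (assumed scoring)."""
    answers = []
    for a, b in product(match_nodes(graph, keyword_a), match_nodes(graph, keyword_b)):
        if a == b:
            continue
        path = shortest_path(graph, a, b)
        if path is not None:
            answers.append(path)
    answers.sort(key=len)
    return answers[:k]

# Small undirected graph with illustrative edges (assumed for this sketch).
edges = [
    ("Tom Hanks", "Forrest Gump"),
    ("Robin Wright", "Forrest Gump"),
    ("Tom Hanks", "Cast Away"),
]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

answers = top_k_answers(graph, "Hanks", "Wright", k=1)
print(answers)
```

For the query 'Hanks Wright' the only answer in this toy graph is the path through Forrest Gump, which matches the intuition behind the movie dataset example.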

  20. Research Areas Query

  21. Research Areas Query Efficiency

  22. Research Areas Query Efficiency ● Ranking of results ● Quality of results

  23. Research Areas Query User experience Efficiency ● Ranking of results ● Quality of results

  24. Research Areas Query User experience Efficiency ● Ranking of results ● Quality of results

  25. Searching for 'Rekha Bachchan'

  26. Searching for 'Rekha Bachchan' 18 such results

  27. Searching for 'Rekha Bachchan' 18 such results Different contexts

  28. User experience ● All kinds of results shown. ● Multiple results of the same type, e.g., Amitabh and Rekha were co-actors in multiple movies. – Most of them are ranked high. – User is forced to scroll through all of them before finding new answers. ● Results with different contexts. – User might completely miss some information.

  29. User experience ● All kinds of results shown. ● Multiple results of the same type, e.g., Amitabh and Rekha were co-actors in multiple movies. – Most of them are ranked high. – User is forced to scroll through all of them before finding new answers. ● Results with different contexts. – User might completely miss some information. ● One way to deal with it – Clustering similar results.

  30. Result clustering ● Cluster similar results together. ● Rank the clusters. ● Show one representative per cluster (Highest Ranked Tree). – User may click it and see all results. ● Advantages: – Can be used with any existing Keyword Search algorithm. – Provides user with a bird's eye view over the results. – Easy to analyze interesting patterns.
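The presentation step on slide 30 (one representative per cluster, chosen as the highest ranked tree) might look roughly like this as a post-processing layer over any keyword search algorithm. The slide does not say how clusters themselves are ranked, so ordering clusters by their representative's score is an assumption here, as are the function name `summarize_clusters`, the tree ids and the scores.

```python
def summarize_clusters(clusters):
    """Pick a representative per cluster and rank the clusters.

    `clusters` is a list of clusters, each a list of (tree_id, score) pairs,
    where a higher score means a better rank. Returns a list of
    (representative, members) pairs, best cluster first.
    Assumption: a cluster is ranked by the score of its representative.
    """
    summaries = []
    for members in clusters:
        ranked = sorted(members, key=lambda member: member[1], reverse=True)
        summaries.append((ranked[0], ranked))
    summaries.sort(key=lambda summary: summary[0][1], reverse=True)
    return summaries

# Hypothetical tree ids and relevance scores.
clusters = [
    [("t1", 0.90), ("t2", 0.80)],
    [("t3", 0.95)],
    [("t4", 0.40), ("t5", 0.60)],
]

summaries = summarize_clusters(clusters)
representatives = [rep[0] for rep, _ in summaries]
print(representatives)
```

The full `members` list is kept next to each representative so that clicking a representative can expand the whole cluster.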

  31. Result clustering (contd.)
      Isomorphism based: ● Cluster isomorphic trees together. ● Two trees need to have the exact same structure to be clustered together. ● Ends up generating too many clusters.
      Tree edit distance based: ● Clustering based on tree-edit distance with a similarity threshold of 0.9. ● Cannot differentiate different contexts like the “Amitabh Bachchan” and “Bol Bachchan” case.
      Language Model (LM) based: ● Agglomerative Complete Link Clustering. ● Each tree represented as an LM. ● JS Divergence as similarity measure.
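A minimal sketch of the LM-based variant follows, under several assumptions the slide does not settle: each tree's language model is taken to be a unigram distribution over its node and edge labels, Jensen-Shannon divergence is computed with log base 2 (so it lies in [0, 1], lower meaning more similar), and merging stops at an arbitrary cut-off of 0.3. The label lists and names such as `Movie_A` are hypothetical placeholders.

```python
import math
from collections import Counter

def language_model(labels):
    """Unigram distribution over a tree's labels (assumed LM estimate)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the value lies in [0, 1]."""
    mixture = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}

    def kl(dist):
        return sum(prob * math.log2(prob / mixture[w]) for w, prob in dist.items() if prob > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def complete_link_clusters(models, max_distance):
    """Agglomerative complete-link clustering over tree indices.

    The distance between two clusters is the largest pairwise divergence;
    merging stops once the closest pair exceeds `max_distance` (assumed cut-off).
    """
    clusters = [[i] for i in range(len(models))]

    def distance(c1, c2):
        return max(js_divergence(models[i], models[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        d, a, b = min(
            (distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > max_distance:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical label lists: two co-actor trees and one tree from another context.
trees = [
    ["Amitabh Bachchan", "ActedIn", "Movie_A", "ActedIn", "Rekha"],
    ["Amitabh Bachchan", "ActedIn", "Movie_B", "ActedIn", "Rekha"],
    ["Bol Bachchan", "IsA", "Movie", "IsA", "Movie_C", "ActedIn", "Rekha"],
]
models = [language_model(labels) for labels in trees]
clusters = complete_link_clusters(models, max_distance=0.3)
print(clusters)
```

Complete link is the conservative choice among agglomerative linkages: two clusters merge only if even their most dissimilar pair of trees is close, which keeps trees from different contexts apart.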

  32. Clustering quality measure: User evaluation ● Dataset: IMDB ● User evaluations over 20 manually selected queries. – Each query has 2 to 6 keywords. ● Users were not aware of the underlying technique. ● Users were asked to rate on a scale of 1-5: – How similar are the trees within a cluster? – How dissimilar are the trees in different clusters?

  33. Thank you
