Result Clustering for Keyword Search on Graphs
Madhulika Mohanty
Supervisor: Dr Maya Ramanath
● Common data formats across the Web ● Easily interpretable by machines → “Web of data”
LINKED DATA
● Collection of knowledge bases.
● All the knowledge bases are interlinked.
● Represented as RDF.
● RDF: Resource Description Framework
● Data model to represent structured data
● Triples: <subject> <predicate> <object>
● Example: <Tom_Hanks> <ActedIn> <Cast_Away>, <Tom_Hanks> <ActedIn> <Forrest_Gump>
[Figure: Tom Hanks connected to Cast Away by an ActedIn edge.]
[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
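As a minimal sketch of the triple model, the two example triples can be held as plain Python tuples and filtered by pattern. The `match` helper and the use of `None` as a wildcard are assumptions made for illustration; they are not part of RDF or of any RDF library.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The two example triples from the slide: (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("Tom_Hanks", "ActedIn", "Cast_Away"),
    ("Tom_Hanks", "ActedIn", "Forrest_Gump"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# All objects that Tom_Hanks is linked to through ActedIn.
movies = [o for (_, _, o) in match(TRIPLES, subject="Tom_Hanks", predicate="ActedIn")]
print(movies)
```

A real deployment would store the triples in an RDF store and query them with SPARQL; the tuple list only shows the shape of the data.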
Sample YAGO graph [1]
[1] http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
Querying graphs
● SPARQL queries
– Structured queries
– Structured results
– E.g., graph databases like Neo4j
● Natural language queries → SPARQL → Structured results
● Relationship queries
– Unstructured text
Relationship queries
● Unstructured text, like Google.
● Answers are relationships among the queried entities.
● More popularly known as “Keyword Search”.
● Why Keyword Search?
– Makes graphs queryable by casual users.
– Finds interesting relationships, even surprise discoveries.
[Figure: connections between Jeff Weiner and Mark Zuckerberg, revealed in two steps.]
● I bet you know this..
● Now that's interesting!!
Another interesting one..
[Figure: graph connecting Mausam, Edwin G. Krebs (Nobel Prize winner), Bill Gates and the 14th Dalai Lama. Edge labels: Doctorate, Faculty, Honorary Doctorate.]
Movie dataset graph
[Figure: movie graph. Type nodes: Movie, Actor. Years: 1994, 2000, 2006, 2011. Movies: Forrest Gump, Larry Crowne, Cast Away, Casino Royale, The Girl with the Dragon Tattoo. Actors: Tom Hanks, Robin Wright, Daniel Craig, Rooney Mara. Edge labels: ActedIn, InYear, IsA.]
Searching for 'Hanks Wright'
● Results are trees.
● There should exist an interconnection between all pairs of keyword nodes.
Keyword Search in Graph-Structured Data
Given a set of query keywords $Q = \{k_1, k_2, \ldots, k_n\}$ and a graph $G = (V, E)$, find the top-$K$ minimal answer trees $A_1, A_2, \ldots, A_K$, ordered by their relevance score.
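The problem statement can be sketched for the simplest case of two keywords, where a minimal answer tree is just a path between the two matching nodes. Everything specific here is an assumption for illustration: the small `EDGES` list is a hypothetical fragment loosely modelled on the movie graph, keywords are matched by substring, and relevance is taken to be "fewer edges is better". General n-keyword search needs Steiner-tree-style algorithms that this sketch does not attempt.

```python
from itertools import product

# Hypothetical fragment of a movie graph; edges are treated as undirected.
EDGES = [
    ("Tom Hanks", "ActedIn", "Forrest Gump"),
    ("Robin Wright", "ActedIn", "Forrest Gump"),
    ("Tom Hanks", "ActedIn", "Cast Away"),
    ("Tom Hanks", "IsA", "Actor"),
    ("Robin Wright", "IsA", "Actor"),
    ("Forrest Gump", "IsA", "Movie"),
    ("Cast Away", "IsA", "Movie"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbours, ignoring edge direction."""
    adj = {}
    for s, _, o in edges:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def keyword_nodes(adj, keyword):
    """Nodes whose label contains the keyword (case-insensitive substring)."""
    return [n for n in adj if keyword.lower() in n.lower()]


def simple_paths(adj, source, target, max_edges):
    """Yield simple paths (node lists) from source to target using DFS."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue
        for nxt in sorted(adj[node]):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def top_k_answers(edges, k1, k2, top_k=3, max_edges=4):
    """Return up to top_k connecting paths, shortest first."""
    adj = build_adjacency(edges)
    answers = []
    for a, b in product(keyword_nodes(adj, k1), keyword_nodes(adj, k2)):
        answers.extend(simple_paths(adj, a, b, max_edges))
    # Assumed relevance score: fewer nodes first, ties broken alphabetically.
    answers.sort(key=lambda p: (len(p), p))
    return answers[:top_k]


results = top_k_answers(EDGES, "Hanks", "Wright")
for path in results:
    print(" - ".join(path))
```

Note that the two shortest answers connect the actors through a shared movie and through the shared type node `Actor`, which already hints at why results with different contexts appear side by side.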
Research Areas
● Efficiency
● Ranking of results
● Quality of results
● User experience
Searching for 'Rekha Bachchan'
[Figure: example result trees for the query.]
● 18 such results
● Different contexts
User experience
● All kinds of results shown.
● Multiple results of the same type. E.g., Amitabh and Rekha were co-actors in multiple movies.
– Most of them are ranked high.
– User is forced to scroll through all of them before finding new answers.
● Results with different contexts.
– User might completely miss some information.
● One way to deal with it
– Clustering similar results.
Result clustering
● Cluster similar results together.
● Rank the clusters.
● Show one representative per cluster (Highest Ranked Tree).
– User may click it and see all results.
● Advantages:
– Can be used with any existing Keyword Search algorithm.
– Provides the user with a bird's eye view over the results.
– Easy to analyze interesting patterns.
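The presentation step above can be sketched as a small post-processing function over already-clustered results. The input tuples, the scores, and the cluster ids are hypothetical, and ranking clusters by the score of their best tree is an assumption; the slides do not state how clusters are ranked.

```python
from collections import defaultdict

# Hypothetical (tree_id, relevance_score, cluster_id) tuples produced by
# some keyword search algorithm followed by some clustering step.
RESULTS = [
    ("t1", 0.92, 0),
    ("t2", 0.90, 0),
    ("t3", 0.75, 1),
    ("t4", 0.88, 0),
    ("t5", 0.60, 1),
]


def summarize_clusters(results):
    """Group trees by cluster and pick the highest-ranked tree of each."""
    clusters = defaultdict(list)
    for tree_id, score, cluster_id in results:
        clusters[cluster_id].append((score, tree_id))

    summary = []
    for cluster_id, members in clusters.items():
        members.sort(reverse=True)  # best score first
        best_score, representative = members[0]
        summary.append({
            "cluster": cluster_id,
            "representative": representative,
            "score": best_score,
            "members": [tree_id for _, tree_id in members],
        })

    # Assumption: a cluster is ranked by the score of its representative.
    summary.sort(key=lambda c: c["score"], reverse=True)
    return summary


summary = summarize_clusters(RESULTS)
for cluster in summary:
    print(cluster["representative"], "stands for", cluster["members"])
```

Because the function only consumes scored trees and cluster ids, it stays independent of the keyword search algorithm that produced them.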
Result clustering (contd.)
Isomorphism based
● Cluster isomorphic trees together.
● Two trees need to have the exact same structure to be clustered together.
● Ends up generating too many clusters.
Tree edit distance based
● Clustering based on tree-edit distance with a similarity threshold of 0.9.
● Cannot differentiate different contexts like the “Amitabh Bachchan” and “Bol Bachchan” case.
Language Model (LM) based
● Agglomerative Complete Link Clustering.
● Each tree represented as an LM.
● JS Divergence as similarity measure.
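A minimal sketch of the LM-based variant follows. It assumes each tree is flattened into a bag of node and edge labels with maximum-likelihood unigram probabilities, and that merging stops at a distance cut-off of 0.3; neither choice comes from the slides. The three token lists are hypothetical, with `Movie_A` and `Movie_B` as placeholders. Jensen-Shannon divergence is a distance, so lower values mean more similar trees.

```python
import math
from collections import Counter


def language_model(tokens):
    """Maximum-likelihood unigram distribution over a tree's labels."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the value is in [0, 1]."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def complete_link_clusters(models, max_distance):
    """Agglomerative clustering; cluster distance is the farthest pair."""
    clusters = [[i] for i in range(len(models))]

    def dist(c1, c2):
        return max(js_divergence(models[i], models[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_distance:  # assumed stopping rule
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical result trees, flattened to label sequences.
TREES = [
    ["Amitabh Bachchan", "ActedIn", "Movie_A", "ActedIn", "Rekha"],
    ["Amitabh Bachchan", "ActedIn", "Movie_B", "ActedIn", "Rekha"],
    ["Bol Bachchan", "IsA", "Movie", "IsA", "Movie_A", "ActedIn", "Rekha"],
]

models = [language_model(t) for t in TREES]
clusters = complete_link_clusters(models, max_distance=0.3)
print(clusters)
```

Under these assumptions the two co-actor trees end up together while the “Bol Bachchan” tree stays apart, because its label distribution differs even though it shares the keyword.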
Clustering Quality measure: User evaluation
● Dataset: IMDB
● User evaluations over 20 manually selected queries.
– Each query has between 2 and 6 keywords.
● Users were not aware of the underlying technique.
● Users were asked to rate on a scale of 1-5:
– How similar are the trees within a cluster?
– How dissimilar are the trees in different clusters?
Thank you