
HashGraph: Semantic Hashing using external knowledge base



  1. HashGraph: Semantic Hashing using external knowledge base. C. Gravier¹, J. Subercaze¹. ¹ Satin team, LT2C laboratory, Université Jean Monnet, Télécom Saint-Étienne, France. C. Gravier, J. Subercaze (Universities of) HashGraph : Semantic Hashing using external knowledge base. 1 / 43

  2. Preamble: Outline
     1 Preamble
     2 Semantic Hashing
       ◮ Introduction
       ◮ Existing solutions
     3 HashGraph
       ◮ User profile: graph of terms
       ◮ Graph to binary footprint
       ◮ Evaluation
     4 HashGraph and HashWordnet
       ◮ On hashing node values
       ◮ Using an external is-a taxonomy
     5 Demos

  3. Preamble: References
     This presentation is based on:
     ◮ [BambaCIKM12]: Bamba P., Subercaze J., Gravier C., Benmira N., Fontaine J., The Twitaholic Next Door, Proc. of 21st ACM International Conference on Information and Knowledge Management (CIKM'12), pp. 2275–2278, Maui, Hawai'i, USA, October 30th, 2012
     ◮ [SubercazeWI13]: Subercaze J., Gravier C., HashGraph: an expressive and scalable Twitter users profile for recommendation, 2013 IEEE/WIC/ACM International Conference on Web Intelligence (WI'13), Atlanta, USA, November 17th–20th, 2013
     ... with a different agenda, additional information, and thoughts.

  4. Preamble: Who are we?
     ◮ Christophe Gravier
       ◮ Associate Professor in Computer Science
       ◮ Working at Télécom Saint-Étienne (Université Jean Monnet)
     ◮ Julien Subercaze
       ◮ Researcher in Computer Science
       ◮ Working at Télécom Saint-Étienne (Université Jean Monnet)
     ◮ Contacts:
       ◮ mail: {julien.subercaze,christophe.gravier}@univ-st-etienne.fr
       ◮ homepage: http://satin-ppl.telecom-st-etienne.fr/cgravier/ and http://satin-ppl.telecom-st-etienne.fr/jsubercaze/
       ◮ twitter: @chgravier and @JulienSubercaze



  7. Semantic Hashing: Introduction. Hashing techniques for Information Retrieval
     ◮ Methods for embedding high-dimensional data into a similarity-preserving low-dimensional Hamming space [Kim and Choi, 2011].
     ◮ Usually the hash space is an "absolute partitioning of the space of document representation" [Stein and Potthast, 2007].
     Historically, the goal is to learn a function $h_\varphi$ that partitions the Hamming space so that two documents whose similarity in the original space is at least a threshold $\theta$ are associated with the same bucket in the Hamming space.
     Figure: Hashing for information retrieval (from [Stein and Potthast, 2007]).


  9. Semantic Hashing: Introduction. Semantic hashing
     Similarity search
     ◮ In similarity search, a document is used as the query.
     ◮ This is fundamentally different from the standard keyword search paradigm, e.g., in TREC [Zhang et al., 2010].
     Semantic hashing
     Semantic hashing is about providing the function(s) $h_\varphi$ that build an index in the Hamming space for fast similarity search.
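
To make the idea of an index in the Hamming space concrete, here is a minimal sketch. It uses sign random projections (random hyperplanes) as a stand-in hash function; this is a swapped-in, well-known technique and not the HashGraph method presented in these slides. The function names, the toy vectors, and the code length are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, hyperplanes):
    """Return an integer code: one bit per hyperplane, set when vec is on its positive side."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vec, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def build_index(docs, hyperplanes):
    """Map each binary code (bucket) to the list of document ids hashed to it."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        index[hash_vector(vec, hyperplanes)].append(doc_id)
    return index


def query_bucket(index, vec, hyperplanes):
    """Similarity search with a document as the query: return the ids in its bucket."""
    return index.get(hash_vector(vec, hyperplanes), [])


# Hypothetical toy documents represented as term-weight vectors.
planes = make_hyperplanes(dim=4, n_bits=8, seed=1)
docs = {
    "a": [1.0, 0.0, 2.0, 0.0],
    "b": [2.0, 0.0, 4.0, 0.0],  # same direction as "a", so same code
    "c": [0.0, 3.0, 0.0, 1.0],
}
index = build_index(docs, planes)
print(query_bucket(index, docs["a"], planes))
```

The query is itself a document vector: it is hashed once, and only the matching bucket is inspected instead of the whole collection.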


  13. Semantic Hashing: Introduction. kNN and $\epsilon$-kNN problems
     ◮ We use a document $q$ as a query: we hash it to identify its bucket, then use the bucket value to address the two problems below¹:
       1. kNN search: find the $k$ nearest documents to $\mathrm{hash}(q)$ in the Hamming space (a.k.a. top-k search).
       2. $\epsilon$-kNN search: find all documents $p$ such that $d(q, p) \le (1 + \epsilon) \times d(q, P)$, where $d(q, P)$ is the distance from $q$ to its closest point in $P$ (Hamming ball of size $\epsilon$).
     Remark on perfect semantic hashing
     It is possible to provide a perfect hashing scheme [Linial et al., 1995], but at a prohibitive code length cost. All semantic hashing schemes try to provide either an approximation (which means hashing with semantic-relatedness preservation guarantees) or a heuristic.
     ¹ As coined by the founding paper on semantic hashing [Gionis et al., 1999].
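
The two search problems above can be sketched with a brute-force scan over binary codes. This is only an illustration under stated assumptions: codes are stored as Python integers, the collection is a small dictionary, and the helper names `hamming`, `knn`, and `eps_nn` are hypothetical. A real index would avoid scanning every code.

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def knn(query_code, codes, k):
    """Ids of the k codes closest to query_code (ties broken by id)."""
    ranked = sorted(
        codes.items(),
        key=lambda item: (hamming(query_code, item[1]), item[0]),
    )
    return [doc_id for doc_id, _ in ranked[:k]]


def eps_nn(query_code, codes, eps):
    """Ids of all codes p with d(q, p) <= (1 + eps) * d(q, P)."""
    if not codes:
        return []
    # d(q, P): distance from the query to its closest point in the collection.
    nearest = min(hamming(query_code, c) for c in codes.values())
    bound = (1 + eps) * nearest
    return sorted(
        doc_id for doc_id, c in codes.items() if hamming(query_code, c) <= bound
    )


# Hypothetical 4-bit codes for four documents.
codes = {"a": 0b0000, "b": 0b0001, "c": 0b0111, "d": 0b1111}
q = 0b0011
print(knn(q, codes, k=2))
print(eps_nn(q, codes, eps=0.5))
```

Note that when the query code collides exactly with a stored code, $d(q, P) = 0$ and the bound admits exact matches only, whatever the value of $\epsilon$.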


  18. Semantic Hashing: Introduction. A "good" semantic hashing function?
     1. Entropy maximizing [Baluja and Covell, 2008]. Large coverage of the set of $2^l$ binary strings of length $l$.
     2. Complexity. Obviously, a "good" semantic hashing function should have a computational complexity as low as possible.
     3. Monotonicity. The quality of the embedding should improve as the number of bits in the code increases.
     4. Independence from dimensions [Stein and Potthast, 2007]. As most approaches rely on embedding a high-dimensional space of dimension $d$ into a Hamming space of dimension $d'$, the semantic hashing strategy should scale well w.r.t. the increase of $d$.
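
The entropy criterion in item 1 can be checked empirically on a set of produced codes. The sketch below is one possible way to do it, not a measure taken from the cited papers: it computes the Shannon entropy of the bucket occupancy and the fraction of the $2^l$ possible codes actually used. The function names and the sample codes are assumptions.

```python
import math
from collections import Counter


def code_entropy(codes):
    """Shannon entropy (in bits) of the empirical distribution of codes."""
    counts = Counter(codes)
    total = len(codes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coverage(codes, n_bits):
    """Fraction of the 2**n_bits possible binary strings that appear in codes."""
    return len(set(codes)) / 2 ** n_bits


# Hypothetical 2-bit codes: evenly spread versus all colliding.
spread = [0b00, 0b01, 0b10, 0b11]
collapsed = [0b01, 0b01, 0b01, 0b01]
print(code_entropy(spread), coverage(spread, n_bits=2))
print(code_entropy(collapsed), coverage(collapsed, n_bits=2))
```

For codes of length $l$ the entropy is at most $l$ bits, reached when all $2^l$ buckets are equally populated; a value far below $l$ suggests that many bits are wasted on collisions.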
