  1. Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines. Michal Růžička, Vít Novotný, Petr Sojka; Jan Pomikálek, Radim Řehůřek. Masaryk University, Faculty of Informatics, Brno, Czech Republic: mruzicka@mail.muni.cz, witiko@mail.muni.cz, sojka@fi.muni.cz; RaRe Technologies: honza@rare-technologies.com, radim@rare-technologies.com. https://mir.fi.muni.cz/ https://rare-technologies.com/ Illustrations by Jiří Franek.

  2. Outline: 1 Semantic Indexing and Searching; 2 String Encoding of Semantic Vectors; 3 Results. (Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines, ISWC 2017 workshop HSSUES, Vienna, Austria, October 21, 2017)

  4. Semantic Indexing (pipeline diagram)
     • Input Document: document as a file (e-mail, …).
     • DataReader (e.g. pdf2text): document as plain text.
     • Tokenizer (e.g. … tokenizer): document as a token list.
     • Segmenter (e.g. paragraph / logical part [table, formula] segmenter): document as a segment list.
     • Segment2Vec with SemanticModeler (e.g. TF-IDF, LSI, deep learning, doc2vec; input: segments in all documents): document as a list of points representing segments.
     • Index of Vectors.

  5. Semantic Searching with Nuggets (pipeline diagram)
     • Query Document: query document as a file.
     • Indexing Pipeline: query as semantic vectors (Query Nuggets).
     • Similarity Search: Query Nuggets are matched against Document Nuggets (semantic vectors), giving Candidate Nuggets.
     • Ranker: Results as Sorted Nuggets.
     • Results as Sorted Documents.
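The last step of the slide above turns sorted nuggets into sorted documents, but the slide does not say how nugget scores are combined. The sketch below assumes one simple option: a document scores as high as its best nugget. The function name `documents_from_nuggets` and the `(doc_id, nugget_id, score)` tuple layout are hypothetical, chosen only for illustration.

```python
def documents_from_nuggets(ranked_nuggets):
    """Collapse scored nuggets into a ranked list of documents.

    ranked_nuggets: iterable of (doc_id, nugget_id, score) tuples.
    Assumption (not stated on the slide): a document's score is the
    maximum score among its nuggets.
    """
    best_score = {}
    for doc_id, _nugget_id, score in ranked_nuggets:
        # Keep only the highest nugget score seen for each document.
        if doc_id not in best_score or score > best_score[doc_id]:
            best_score[doc_id] = score
    # Highest-scoring documents first.
    return sorted(best_score.items(), key=lambda item: item[1], reverse=True)


nuggets = [("doc1", "n1", 0.4), ("doc2", "n2", 0.9), ("doc1", "n3", 0.7)]
ranked_documents = documents_from_nuggets(nuggets)
```

Other aggregations (sum or mean of nugget scores, for instance) would fit the same interface.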

  7. Re-Ranking Techniques
     1 Fast: find candidate nuggets via Elasticsearch.
     2 Slow but precise: re-rank candidate nuggets with an exact similarity metric.
        • Cosine similarity.
        • Euclidean similarity.
        • …
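The second, slow but precise stage can be sketched in a few lines. This is a minimal illustration that assumes the fast stage has already returned candidate nuggets as `(nugget_id, vector)` pairs; the names `cosine_similarity`, `rerank`, and `top_k` are hypothetical and not taken from the slides.

```python
import math


def cosine_similarity(a, b):
    """Exact cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


def rerank(query_vector, candidates, top_k=10):
    """Re-rank candidate nuggets by exact similarity to the query.

    candidates: list of (nugget_id, vector) pairs, assumed to come
    from the fast first stage (e.g. a fulltext engine).
    """
    scored = [
        (nugget_id, cosine_similarity(query_vector, vector))
        for nugget_id, vector in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
reranked = rerank([1.0, 0.0], candidates)
```

Because only the candidates are scored exactly, the cost of this stage depends on the candidate count rather than on the size of the whole index.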

  9. String Encoding of Semantic Vectors
     • Encoding of semantic vectors to strings (feature tokens):
     • Semantic vector of three dimensions: [0.12, -0.13, 0.065]
     • Rounding to two decimal places, string encoded step by step:
        [’0’ 0.12, ’1’ -0.13, ’2’ 0.065] (dimension index)
        [’0P2’ 0.12, ’1P2’ -0.13, ’2P2’ 0.07] (precision of two decimal places, values rounded)
        [’0P2i0d12’, ’1P2ineg0d13’, ’2P2i0d07’] (value appended)
     • Feature tokens:
        • 0P2i0d12
        • 1P2ineg0d13
        • 2P2i0d07
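The token layout can be reproduced with a short function. This is a sketch inferred only from the three feature tokens shown above: dimension index, `P` plus the number of decimal places, `i` plus the integer part (prefixed with `neg` for negative values), and `d` plus the decimal digits. The name `encode_vector` and the half-up rounding mode are assumptions, not the authors' stated implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def encode_vector(vector, precision=2):
    """Encode each component as '<index>P<precision>i[neg]<int>d<frac>'.

    Assumes precision >= 1. Rounding half up is an assumption made
    for this sketch.
    """
    quantum = Decimal(1).scaleb(-precision)  # Decimal('0.01') for precision 2
    tokens = []
    for index, value in enumerate(vector):
        # Go through str() so 0.065 is rounded as the decimal 0.065,
        # not as its binary floating-point approximation.
        rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
        sign = "neg" if rounded < 0 else ""
        integer_part, fraction_part = f"{abs(rounded):.{precision}f}".split(".")
        tokens.append(f"{index}P{precision}i{sign}{integer_part}d{fraction_part}")
    return tokens


tokens = encode_vector([0.12, -0.13, 0.065])
```

Each token is an ordinary word to a fulltext engine, so two vectors that share a rounded value in the same dimension share a token.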
