
Full-Text Search Explained, Philipp Krenn (@xeraa), Infrastructure | Developer Advocate

ViennaDB | Papers We Love Vienna

Who uses databases? Who uses search?

Databases vs full-text search

But I can do... SELECT * FROM my_table


  1. { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "1", "_score": 0.39556286, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } } ] } }

  2. POST /starwars/_search { "query": { "match": { "quote": { "query": "van", "fuzziness": "AUTO" } } } }

  3. { "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.18155496, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.18155496, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

  4. SELECT * FROM starwars WHERE quote LIKE '%_an%' OR quote LIKE '%V_n%' OR quote LIKE '%Va_%'

  5. Scoring

  6. MongoDB

  7. > db.starwars.find({ $text: { $search: "droid" }}, {score: {$meta: "textScore"}})
     { "_id": ObjectId("57f2d54de814412463c3adef"), "quote": "These are not the droids you are looking for.", "score": 0.75 }
     Fetched 1 record(s) in 14ms

  8. One Term
     https://github.com/mongodb/mongo/blob/v3.2/src/mongo/db/fts/fts_spec.cpp#L219
     double coeff = (0.5 * data.count / numTokens) + 0.5;
     data.count: occurrences of the search term in the field
     numTokens: tokens in the field after stop-word removal and stemming

  9. Search for droid in "These are not the droids you are looking for."
     Tokens: droid, look → 1 match, 2 tokens
     coeff = 0.5 · 1 / 2 + 0.5 = 0.75

  10. Search for father in "No. I am your father."
      Tokens: father → 1 match, 1 token
      coeff = 0.5 · 1 / 1 + 0.5 = 1

  11. Search for father in "Obi-Wan never told you what happened to your father."
      Tokens: obi, wan, never, told, happen, father → 1 match, 6 tokens
      coeff = 0.5 · 1 / 6 + 0.5 ≈ 0.5833

  12. > db.starwars.find({ $text: { $search: "obi-wan" }}, {score: {$meta: "textScore"}})
      { "_id": ObjectId("57f2d56fe814412463c3adf0"), "quote": "Obi-Wan never told you what happened to your father.", "score": 1.1666666666666667 }
      Fetched 1 record(s) in 6ms

  13. Multiple Terms
      https://github.com/mongodb/mongo/blob/v3.2/src/mongo/db/fts/fts_spec.cpp#L228
      score += (weight * data.freq * coeff * adjustment);
      weight: the field's weight, passed in as a method parameter (1 by default)
      data.freq, adjustment: 1 in these examples

  14. Search for obi-wan (term obi)
      Tokens: obi, wan, never, told, happen, father → 1 match, 6 tokens
      coeff = 0.5 · 1 / 6 + 0.5 ≈ 0.5833

  15. Search for obi-wan (term wan)
      Tokens: obi, wan, never, told, happen, father → 1 match, 6 tokens
      coeff = 0.5 · 1 / 6 + 0.5 ≈ 0.5833

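  16. Search for obi-wan
      Score per term: weight · data.freq · coeff · adjustment = 1 · 1 · 0.5833 · 1 ≈ 0.5833
      Sum: 0.5833 + 0.5833 ≈ 1.1667, the textScore returned on slide 12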

  17. Elasticsearch

  18. Term Frequency / Inverse Document Frequency (TF/IDF) Search one term

  19. BM25 https://speakerdeck.com/elastic/improved-text-scoring-with-bm25

  20. Term Frequency

  21. Inverse Document Frequency

  22. Field-Length Norm

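  23. Putting it Together
      $$\mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot \texttt{t.getBoost()} \cdot \mathrm{norm}(t,d) \Big)$$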
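The formula on slide 23 can be tried out on the three Star Wars quotes used in the MongoDB examples. This is a minimal sketch, not Lucene's implementation. It uses the definitions of Lucene's classic (pre-BM25) similarity: tf = √frequency, idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / √numTerms, queryNorm = 1 / √(sum of squared query-term weights), and coord = matched query terms / query terms. Assumptions: a naive tokenizer that lowercases and splits on non-word characters (no stemming, no stop words), every boost set to 1, and an exact field-length norm, whereas Lucene stores norms lossily in a single byte. The names `quotes`, `corpus`, `tokenize` and `score` are hypothetical.

```javascript
// Sketch of Lucene's classic TF/IDF practical scoring function.
// Assumptions: naive tokenizer, all boosts = 1, exact (not byte-encoded) norms.

const quotes = [
  "These are not the droids you are looking for.",
  "Obi-Wan never told you what happened to your father.",
  "No. I am your father.",
];

// Hypothetical tokenizer: lowercase and split on non-word characters.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Each document is represented by its list of tokens.
const corpus = quotes.map(tokenize);

// tf(t in d) = sqrt(frequency of t in d)
function tf(term, doc) {
  return Math.sqrt(doc.filter((t) => t === term).length);
}

// idf(t) = 1 + ln(numDocs / (docFreq + 1))
function idf(term) {
  const docFreq = corpus.filter((doc) => doc.includes(term)).length;
  return 1 + Math.log(corpus.length / (docFreq + 1));
}

// norm(t, d) = 1 / sqrt(number of terms in the field)
function norm(doc) {
  return 1 / Math.sqrt(doc.length);
}

function score(query, doc) {
  const terms = tokenize(query);
  const boost = 1; // t.getBoost(), assumed to be 1 for every term
  // queryNorm(q) = 1 / sqrt(sum of squared query-term weights)
  const sumOfSquaredWeights = terms.reduce((acc, t) => acc + (idf(t) * boost) ** 2, 0);
  const queryNorm = 1 / Math.sqrt(sumOfSquaredWeights);
  // coord(q, d) = fraction of the query terms that appear in d
  const coord = terms.filter((t) => doc.includes(t)).length / terms.length;
  // Sum over query terms; terms missing from d contribute 0 because tf = 0.
  const sum = terms.reduce(
    (acc, t) => acc + tf(t, doc) * idf(t) ** 2 * boost * norm(doc), 0);
  return queryNorm * coord * sum;
}

quotes.forEach((q, i) => console.log(score("father", corpus[i]).toFixed(4), q));
```

For the single-term query "father", idf is 1 + ln(3 / 3) = 1 and queryNorm and coord are both 1, so the score reduces to the field-length norm: the five-token "No. I am your father." outranks the ten-token Obi-Wan quote.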

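Slide 19 points to BM25, the similarity that replaced classic TF/IDF as Elasticsearch's default in version 5.0. The sketch below uses the textbook Okapi BM25 term weight with the idf variant Lucene uses, ln(1 + (N − n + 0.5) / (n + 0.5)), and Elasticsearch's default parameters k1 = 1.2 and b = 0.75. Assumptions: a naive tokenizer that lowercases and splits on non-word characters, exact document lengths (Lucene encodes them lossily), and hypothetical function names; scores from a real cluster will differ.

```javascript
// Sketch of Okapi BM25 with Lucene's idf variant and Elasticsearch's
// default parameters. Assumptions: naive tokenizer, exact document lengths.

const K1 = 1.2; // term-frequency saturation (Elasticsearch default)
const B = 0.75; // strength of length normalization (Elasticsearch default)

// Hypothetical tokenizer: lowercase and split on non-word characters.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// idf = ln(1 + (N - n + 0.5) / (n + 0.5)); N = docs, n = docs containing the term
function idf(docFreq, docCount) {
  return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
}

// Saturating term frequency: approaches K1 + 1 as freq grows, and documents
// longer than average are penalized in proportion to B.
function tfNorm(freq, docLength, avgDocLength) {
  return (freq * (K1 + 1)) / (freq + K1 * (1 - B + B * (docLength / avgDocLength)));
}

// BM25 score of one document: sum over query terms of idf * tfNorm.
function bm25(query, doc, corpus) {
  const avgDocLength = corpus.reduce((acc, d) => acc + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of tokenize(query)) {
    const freq = doc.filter((t) => t === term).length;
    if (freq === 0) continue; // absent terms contribute nothing
    const docFreq = corpus.filter((d) => d.includes(term)).length;
    score += idf(docFreq, corpus.length) * tfNorm(freq, doc.length, avgDocLength);
  }
  return score;
}

const corpus = [
  "These are not the droids you are looking for.",
  "Obi-Wan never told you what happened to your father.",
  "No. I am your father.",
].map(tokenize);

corpus.forEach((doc) => console.log(bm25("father", doc, corpus).toFixed(4), doc.join(" ")));
```

The two properties worth noticing are term-frequency saturation (tfNorm never exceeds k1 + 1, however often a term repeats) and length normalization controlled by b, which again favors the five-token quote over the ten-token one for "father".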

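The MongoDB calculations on slides 8 to 16 can be reproduced with a short sketch of the two quoted lines from fts_spec.cpp. Assumptions: the stop-word list and the suffix-stripping `stem` function are hypothetical stand-ins, just large enough for these three quotes (MongoDB uses full per-language stop-word lists and a Snowball stemmer); `weight` and `adjustment` are fixed at 1 as on slide 13; and the way repeated occurrences accumulate into `freq` (1, then 1/2, then 1/4, ...) is an assumption based on a reading of the v3.2 source that the slides do not show. Every term in the examples occurs once, so `freq` is 1 there. The names `termScores` and `textScore` are hypothetical.

```javascript
// Sketch of MongoDB's v3.2 text score for a single field (fts_spec.cpp).

// Hypothetical stop-word list: only what the three example quotes need.
const STOP_WORDS = new Set([
  "these", "are", "not", "the", "you", "for",
  "no", "i", "am", "your", "what", "to",
]);

// Hypothetical suffix stripper standing in for MongoDB's Snowball stemmer.
function stem(word) {
  return word.replace(/(ing|ed|s)$/, "");
}

// Lowercase, split on non-letters, drop stop words, then stem.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((w) => w && !STOP_WORDS.has(w))
    .map(stem);
}

// Per-term scores for one field:
//   coeff = (0.5 * count / numTokens) + 0.5
//   score = weight * freq * coeff * adjustment
function termScores(text, weight = 1) {
  const tokens = tokenize(text);
  const stats = new Map(); // term -> { count, freq }
  for (const term of tokens) {
    const s = stats.get(term) ?? { count: 0, freq: 0 };
    s.freq += 1 / 2 ** s.count; // assumption: adds 1, then 1/2, 1/4, ... per repeat
    s.count += 1;
    stats.set(term, s);
  }
  const scores = new Map();
  for (const [term, { count, freq }] of stats) {
    const coeff = (0.5 * count) / tokens.length + 0.5;
    const adjustment = 1; // 1 in all of the slide examples
    scores.set(term, weight * freq * coeff * adjustment);
  }
  return scores;
}

// A document's score for a search string: sum of its scores for the search terms.
function textScore(search, text) {
  const scores = termScores(text);
  return tokenize(search).reduce((acc, term) => acc + (scores.get(term) ?? 0), 0);
}

console.log(textScore("obi-wan", "Obi-Wan never told you what happened to your father."));
```

Under these assumptions the sketch gives 0.75 for droid, 1 for father in "No. I am your father.", 7/12 ≈ 0.5833 for father in the Obi-Wan quote, and 7/6 ≈ 1.1667 for obi-wan, in line with the textScore values on slides 7 and 12.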

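The fuzzy query on slide 2 finds "van" in "Obi-Wan" because the indexed token "wan" is one edit away. A minimal sketch of the idea: compute the edit distance between the query term and an indexed term, and accept the match if it does not exceed the number of edits that "fuzziness": "AUTO" allows for the query term's length (Elasticsearch's default AUTO:3,6 means terms of up to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). Assumptions: this uses plain Levenshtein distance, whereas Elasticsearch by default also counts swapping two adjacent characters as one edit; Lucene compiles the query into a Levenshtein automaton rather than comparing term by term; and the function names and token list are hypothetical.

```javascript
// Sketch of fuzzy term matching with "fuzziness": "AUTO".
// Assumption: plain Levenshtein distance (no transpositions).

// Levenshtein distance via dynamic programming, keeping one row at a time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Edits allowed for a query term under AUTO (defaults low = 3, high = 6):
// shorter than 3 chars -> 0, 3 to 5 chars -> 1, 6 or more -> 2.
function autoFuzziness(term, low = 3, high = 6) {
  if (term.length < low) return 0;
  if (term.length < high) return 1;
  return 2;
}

function fuzzyMatches(queryTerm, indexedTerm) {
  return levenshtein(queryTerm, indexedTerm) <= autoFuzziness(queryTerm);
}

// Hypothetical lowercase tokens for "Obi-Wan never told you what happened to your father."
const indexedTerms = ["obi", "wan", "never", "told", "you", "what", "happened", "to", "your", "father"];
console.log(indexedTerms.filter((t) => fuzzyMatches("van", t))); // [ 'wan' ]
```

The same three-letter query would not tolerate a second edit, which is why AUTO scales the allowed distance with term length instead of using a fixed number.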