stop stupid fuzzy searches

Table of contents
01 Fuzzy search
02 Smart Query Rewriting
03 Conclusion
04 Surprise

01 Fuzzy search

Why we need it / Distribution of spelling errors
(Chart: frequency and value per search by error type: edit distance 0, edit distance 1, edit distance 2, edit distance >2, singular & plural, decomposition)

Why we need it / Distribution of spelling errors by device type
(Chart: share of insert, delete, replace, transpose, singular & plural and decomposition errors, desktop vs. mobile)

Causes of spelling errors (query intent "spannbettlaken", German for "fitted sheet")

Query                   Share of intent   Result size   Error type (share)
-spannbettlaken          1%                0            format (4%)
spann-bettlaken          3%               83            format
spannbettlacken         13%               56            phonetic (22%)
spanbettlaken            9%               50            phonetic
spannbettlaken          61%
spannbettllaken          7%               47            typo (8%)
spammbettlaken           1%                0            typo
Spann bettlaken          4%               43            decomposition (5%)
Bettlaken zum spannen    1%                0            decomposition
…42 additional spellings

How it works

EditDistance 1:
```
GET catalog/products/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "spannbettlacken",
        "fuzziness": 1
      }
    }
  }
}
```
generates 835 candidates

EditDistance 2:
```
GET catalog/products/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "spannbettlacken",
        "fuzziness": 2
      }
    }
  }
}
```
generates ~650k candidates
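To see why the candidate space explodes, the Python sketch below enumerates every raw single-edit variant of a word (deletes, adjacent transpositions, replacements, insertions) over an assumed 26-letter lowercase alphabet. This is only an illustration of the candidate space: Lucene does not enumerate strings this way but intersects a Levenshtein automaton with the term dictionary. For the 15-letter "spannbettlacken" the raw count is 15 + 14 + 390 + 416 = 835, which matches the slide's figure, although the slide does not say how its number was produced. The function names are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 lowercase letters, no umlauts


def single_edits(word: str) -> list[str]:
    """All raw strings one edit away from word (duplicates included)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return deletes + transposes + replaces + inserts


def double_edit_count(word: str) -> int:
    """Raw (non-deduplicated) number of strings two edits away."""
    return sum(len(single_edits(e)) for e in single_edits(word))


if __name__ == "__main__":
    word = "spannbettlacken"
    raw = single_edits(word)
    print(len(raw))                 # 835
    print(len(set(raw)))            # fewer once duplicates are removed
    print(double_edit_count(word))  # 718879 raw, duplicates included
```

Every one of these strings is a potential term lookup, which is why moving from fuzziness 1 to 2 multiplies the work by roughly the size of the first candidate set.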

Resulting in / high recall but low precision
(Chart: precision (PREC) vs. recall (TPR))
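The chart's axes follow the standard definitions. For a query, TP counts returned results that match what the user meant, FP counts returned results that do not, and FN counts relevant products that were not returned:

```latex
\mathrm{PREC} = \frac{TP}{TP + FP} \qquad \mathrm{TPR} = \frac{TP}{TP + FN}
```

A larger edit distance lets more relevant products through (TP rises, so TPR rises) but also admits unrelated terms (FP rises faster), so precision drops.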

Resulting in / low search throughput
~0.1 seconds for spelling a short word
(Chart: searches per second vs. number of query terms, 1 to 6, for OR term, AND term, OR fuzzy 2 and AND fuzzy 2 queries)

Observations
+ Searches for all possible candidates within a given edit distance
+ Natively implemented in Elasticsearch and Lucene
- Increased CPU usage and query response time
- Inconsistent and not always relevant results
- Skewed search analytics

02 Smart Query Rewriting
Make fuzzy search as fast, easy and relevant as exact search

Our Solution / smart query rewrites
(Diagram: cluster similar queries → test & select the MasterQuery → send the MasterQuery to the search engine)
Clustered queries: spannbettlaken, spann-bettlaken, spannbettlacken, schpanbettlaken, spannbettllaken, spammbettlaken, spanmbettlaken
MasterQuery: spannbettlaken

Our Solution / smart query rewrites: cluster similar queries
Based on deep learning and crafted algorithms, we clean and cluster queries that have the same meaning (spannbettlaken, spann-bettlaken, spannbettlacken, schpanbettlaken, spannbettllaken, spammbettlaken, spanmbettlaken).
We use the concept of controlled precision reduction, stepping from the most to the least precise match: Exact Match → Fingerprint → Lemmatization & Phonemes → Fuzzy Match.
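The slide names the matching stages but not how each one works. The Python sketch below illustrates the cascade idea only: a query joins the MasterQuery's cluster at the most precise stage that matches. The fingerprint rule, the simplified "phonetic" key (a hypothetical stand-in, not a real algorithm such as Kölner Phonetik), the fuzzy threshold of 1 and all function names are assumptions; lemmatization and the deep-learning part are omitted.

```python
from typing import Callable, Optional


def fingerprint(query: str) -> str:
    """Lowercase and drop everything that is not a letter or digit."""
    return "".join(ch for ch in query.lower() if ch.isalnum())


def phonetic_key(query: str) -> str:
    """Hypothetical, heavily simplified phonetic key: map a few German
    spellings, then collapse runs of the same letter."""
    key = fingerprint(query)
    for src, dst in (("sch", "s"), ("ck", "k")):
        key = key.replace(src, dst)
    collapsed: list[str] = []
    for ch in key:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Stages ordered from most to least precise ("controlled precision reduction").
STAGES: list[tuple[str, Callable[[str, str], bool]]] = [
    ("exact", lambda q, m: q == m),
    ("fingerprint", lambda q, m: fingerprint(q) == fingerprint(m)),
    ("phonetic", lambda q, m: phonetic_key(q) == phonetic_key(m)),
    ("fuzzy", lambda q, m: levenshtein(phonetic_key(q), phonetic_key(m)) <= 1),
]


def match_stage(query: str, master: str) -> Optional[str]:
    """Return the most precise stage at which query joins master's cluster."""
    for name, matches in STAGES:
        if matches(query, master):
            return name
    return None
```

With "spannbettlaken" as the master, "Spann bettlaken" joins at the fingerprint stage, "spannbettlacken" and "schpanbettlaken" at the phonetic stage, and "spammbettlaken" only at the fuzzy stage. Because the cheap, precise stages run first, the fuzzy comparison is reserved for the few variants nothing else catches.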

Our Solution / smart query rewrites: test & select the MasterQuery
Based on tracked KPIs, deep learning and global parameter optimization, we test and select the query that maximises the balance between the probability of interaction with the search results and the economic outcome. For the spannbettlaken cluster (spann-bettlaken, spannbettlacken, schpanbettlaken, spannbettllaken, spammbettlaken, spanmbettlaken, spannbettlaken), the selected MasterQuery is spannbettlaken.
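The slide describes the selection only at a high level. As a much simpler stand-in for the deep-learning and global-optimisation approach, the Python sketch below scores each variant by a fixed blend of interaction rate and normalised revenue per search and picks the highest score. The `VariantStats` fields, the weight `alpha` and any numbers fed to it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical tracked KPIs for one query variant of a cluster."""
    query: str
    searches: int      # must be > 0
    interactions: int  # e.g. clicks on search results
    revenue: float


def select_master_query(variants: list[VariantStats], alpha: float = 0.5) -> str:
    """Pick the variant with the best blend of interaction rate and revenue.

    alpha weights interaction probability against revenue per search,
    normalised to the best revenue per search in the cluster.
    """
    best_revenue = max(v.revenue / v.searches for v in variants) or 1.0

    def score(v: VariantStats) -> float:
        interaction_rate = v.interactions / v.searches
        revenue_share = (v.revenue / v.searches) / best_revenue
        return alpha * interaction_rate + (1 - alpha) * revenue_share

    return max(variants, key=score).query
```

Shifting `alpha` towards 0 favours the economic outcome, towards 1 the interaction probability; a production system would learn this trade-off rather than fix it by hand.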

CXP search|hub / Query Intelligence Platform
(Diagram: search|hub sits between the frontend search and the search engine endpoint: Solr, Elasticsearch, FACT-Finder, Fredhopper, Celebros, Algolia, ACS)
Modules: high-performance Data|hub, caching & logging, semantic query parsing, site search analytics, guided selling, personalization, Smart|Query, query segmentation, query scoping, …
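The slides do not show the runtime path. One plausible reading, stated here as an assumption, is that the platform resolves each incoming query to its precomputed MasterQuery before forwarding it, so the search engine only has to run an exact, non-fuzzy query. In the sketch below, `REWRITES`, `rewrite` and `build_search_body` are hypothetical; the `match` query with `"operator": "and"` is standard Elasticsearch query DSL.

```python
import json

# Hypothetical rewrite table, produced offline by clustering and
# test & select; maps normalised user queries to their MasterQuery.
REWRITES = {
    "spann-bettlaken": "spannbettlaken",
    "spannbettlacken": "spannbettlaken",
    "schpanbettlaken": "spannbettlaken",
    "spannbettllaken": "spannbettlaken",
}


def rewrite(user_query: str) -> str:
    """Return the MasterQuery for a known variant, else the query unchanged."""
    key = " ".join(user_query.lower().split())
    return REWRITES.get(key, user_query)


def build_search_body(user_query: str, field: str = "title") -> dict:
    """Exact (non-fuzzy) Elasticsearch match query on the rewritten query."""
    return {"query": {"match": {field: {"query": rewrite(user_query), "operator": "and"}}}}


if __name__ == "__main__":
    print(json.dumps(build_search_body("Spannbettlacken"), indent=2))
```

The expensive work (clustering and selection) happens offline; at query time a rewrite is a single dictionary lookup.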

03 Conclusion

Impact – top-10 ecom player A
Already uses a highly optimized, state-of-the-art e-commerce search solution
(Chart: monthly values Jan to Dec, without vs. with smart|query; y-axis 90% to 140%)

Impact – top-50 ecom player B
Uses an optimized Solr implementation
(Chart: monthly values Apr to Mar, without vs. with smart|query; y-axis 90% to 140%)

Resulting in / high recall & high precision
(Chart: recall (TPR) and precision (PREC) vs. number of queries, 0 to 5,000,000)

Resulting in / insane query performance
~0.00005 seconds for spelling a short word (80 ops/ms)
(Chart: searches per second with search|hub & Elastic vs. number of query terms, 1 to 6, for OR term, AND term, OR fuzzy 2 and AND fuzzy 2 queries)

Observations
+ More relevant results
+ Consistent results
+ Reduced manual effort for curated search results
+ Reduced CPU usage
+ Improved query response time
+ Consistent site search analytics
- Additional complexity

04 Surprise
CXP smart|query PreDictLib: fast & accurate spell correction at scale

search|hub PreDictLib: fast & accurate spell correction at scale
Quick highlights:
• extremely fast & constant index access
• truly language-independent edit distance
• ability to add records to the index at runtime without performance decrease
• based on one of the most efficient spell-correction implementations out there, SymSpell by Wolf Garbe
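SymSpell's core idea is the symmetric delete: for every dictionary word, precompute all strings obtained by deleting up to k characters; at lookup time, generate only the deletes of the input term and look them up. No inserts or replacements are generated, so the work does not depend on the alphabet, which is what makes the approach fast and language-independent. The Python sketch below illustrates that idea under these assumptions; it is not SymSpell's or PreDictLib's code (both are considerably more optimised), and the class and method names are hypothetical.

```python
from collections import defaultdict


def deletes(word: str, max_distance: int) -> set[str]:
    """The word plus every string reachable by deleting up to max_distance chars."""
    result = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        result |= frontier
    return result


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


class SymmetricDeleteIndex:
    """Hypothetical minimal symmetric-delete spelling index."""

    def __init__(self, max_distance: int = 2):
        self.max_distance = max_distance
        self.frequency: dict[str, int] = {}
        self.index: dict[str, set[str]] = defaultdict(set)

    def add(self, word: str, frequency: int = 1) -> None:
        """Add a word; can be called at any time, also between lookups."""
        self.frequency[word] = self.frequency.get(word, 0) + frequency
        for variant in deletes(word, self.max_distance):
            self.index[variant].add(word)

    def lookup(self, term: str) -> list[str]:
        """Dictionary words within max_distance, closest and most frequent first."""
        candidates: set[str] = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        scored = [(osa_distance(term, c), -self.frequency[c], c) for c in candidates]
        return [c for dist, _, c in sorted(scored) if dist <= self.max_distance]
```

Because `add` only appends delete variants to the index, new records can be inserted at runtime without rebuilding anything; the final distance check filters out delete collisions that are not real matches.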

SymSpell / some benchmarks: throughput vs. accuracy
(Chart: accuracy and searches/sec for Lucene WordCorrect, Elasticsearch WordCorrect, the No. 2 e-commerce search, the No. 1 e-commerce search and SymSpell)

search|hub PreDictLib: fast & accurate spell correction at scale
• modifies the edit distance into a weighted edit distance
• replaces the Damerau-Levenshtein distance with a weighted Damerau-Levenshtein distance that takes keyboard distance into account
• re-ranks the candidate list by applying additional similarity algorithms
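The slide does not give PreDictLib's weights, keyboard model or re-ranking algorithms. The Python sketch below illustrates the idea under stated assumptions: a simplified German QWERTZ letter grid (row stagger ignored), a hypothetical cost of 0.5 for substituting a neighbouring key and 1.0 for every other edit, the restricted (optimal string alignment) variant of Damerau-Levenshtein, and Python's `difflib.SequenceMatcher` ratio as one possible "additional similarity" for tie-breaking. All names and costs are hypothetical.

```python
from difflib import SequenceMatcher

# Assumption: simplified QWERTZ letter grid, row stagger ignored.
KEYBOARD_ROWS = ["qwertzuiop", "asdfghjkl", "yxcvbnm"]
KEY_POSITION = {ch: (r, c) for r, row in enumerate(KEYBOARD_ROWS) for c, ch in enumerate(row)}

# Hypothetical cost: hitting a neighbouring key is a "cheap" mistake.
ADJACENT_SUBSTITUTION_COST = 0.5


def substitution_cost(a: str, b: str) -> float:
    """0 for equal chars, reduced cost for keyboard neighbours, else 1."""
    if a == b:
        return 0.0
    pa, pb = KEY_POSITION.get(a), KEY_POSITION.get(b)
    if pa is not None and pb is not None:
        if max(abs(pa[0] - pb[0]), abs(pa[1] - pb[1])) <= 1:
            return ADJACENT_SUBSTITUTION_COST
    return 1.0


def weighted_osa(a: str, b: str) -> float:
    """Weighted OSA distance; insert, delete and transpose cost 1.0."""
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = float(i)
    for j in range(len(b) + 1):
        d[0][j] = float(j)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + substitution_cost(a[i - 1], b[j - 1]),
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)  # transposition
    return d[len(a)][len(b)]


def rerank(term: str, candidates: list[str]) -> list[str]:
    """Sort by weighted distance, breaking ties with difflib's similarity ratio."""
    return sorted(
        candidates,
        key=lambda c: (weighted_osa(term, c), -SequenceMatcher(None, term, c).ratio()),
    )
```

With these costs, "spammbettlaken" is 1.0 away from "spannbettlaken" (two neighbouring-key slips) instead of 2 under the unweighted distance, so typical fat-finger typos rank ahead of equally distant but less plausible candidates.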

search|hub PreDict (CE) & PreDict (EE) / some benchmarks: throughput vs. accuracy
(Chart: accuracy and searches/sec for Lucene WordCorrect, Elasticsearch WordCorrect, the No. 2 e-commerce search, the No. 1 e-commerce search, SymSpell, CXP PreDict (CE) and CXP Searchhub)

What you'll get
CXP SmartQuery PreDictLib (CE): fast & accurate spell correction at scale
• the library as Java source
• accuracy and benchmark tests
• real-life test data
• https://github.com/searchhub/preDict

Questions
