Elasticsearch - Aditi Choksi (EW18455)
Elasticsearch
• Search engine
• Distributed search
• Full-text search
• Near-real-time search
Evolution of Data
• The amount of data being generated and stored has grown exponentially over the past few decades.
Need for Distributed Data Systems
• Vertical scaling: increase the size of a single machine.
• Horizontal scaling: add more machines.
• Elasticsearch scales horizontally. It sends a query to every node holding a shard of the index (one copy of each shard), then collects and combines the per-shard results before returning them to the user.
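The scatter-gather pattern described above can be sketched in a few lines. This is a toy model under stated assumptions: the three shards, their documents, and the scoring (plain term frequency) are all hypothetical, and each "shard" is just a dict searched in a thread. Real Elasticsearch uses relevance scoring (BM25 by default) and follows this query phase with a separate fetch phase that loads the winning documents; that fetch step is omitted here.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps doc_id -> text. In a real cluster each
# shard would sit on some node and be searched there.
SHARDS = [
    {1: "winter is coming"},
    {2: "ours is the fury"},
    {3: "the choice is yours"},
]

def rank_key(hit):
    """Order hits by score (highest first), then by doc_id for a stable result."""
    score, doc_id = hit
    return (-score, doc_id)

def search_shard(shard, term, k):
    """Per-shard step: score local documents and keep only the local top k."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    hits = [h for h in hits if h[0] > 0]  # toy score: term frequency
    return heapq.nsmallest(k, hits, key=rank_key)

def scatter_gather(shards, term, k=10):
    """Scatter the query to every shard in parallel, then gather and merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term, k), shards))
    merged = [hit for partial in partials for hit in partial]
    return [doc_id for _, doc_id in heapq.nsmallest(k, merged, key=rank_key)]

print(scatter_gather(SHARDS, "is"))        # [1, 2, 3]
print(scatter_gather(SHARDS, "the", k=1))  # [2]
```

Each shard only returns its own top k, so the coordinator merges at most k hits per shard instead of every matching document.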
Elasticsearch Cluster
[Figure: cluster diagram with Shards 1, 2 and 3 and their copies distributed across nodes]
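How a document lands on a shard, and how copies are spread over nodes, can be illustrated with a small sketch. Elasticsearch picks a primary shard roughly as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID; it uses a murmur3 hash, and `zlib.crc32` stands in for it here. The allocator below is a hypothetical round-robin simplification, not Elasticsearch's real allocation logic; it only shows the key rule that a primary and its replica never share a node. The node names are made up.

```python
import zlib

NUM_PRIMARY_SHARDS = 3
NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names

def route(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Choose a primary shard: hash(routing value) % number of primary shards.
    CRC32 is a stand-in for the murmur3 hash Elasticsearch actually uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def allocate(num_shards: int, nodes: list[str]) -> dict[int, tuple[str, str]]:
    """Toy allocator: primary i goes to one node, its replica to the next node,
    so the two copies of a shard never sit on the same machine."""
    layout = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]
        layout[shard] = (primary, replica)
    return layout

for shard, (primary, replica) in allocate(NUM_PRIMARY_SHARDS, NODES).items():
    print(f"shard {shard}: primary on {primary}, replica on {replica}")
print("doc-42 ->", route("doc-42"))
```

Because the shard is computed from the number of primaries, that number cannot simply be changed later without moving documents.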
Lucene Index
[Figure: a Lucene index (one per shard) made up of multiple segments]
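The segment structure can be modelled as a list of small, immutable inverted indexes that are all searched and whose results are unioned. This is a simplified sketch: the class names `Segment` and `ShardIndex` are hypothetical, and `refresh` mimics what Elasticsearch calls a refresh (by default about once per second), which writes buffered documents into a new searchable segment. That buffering step is why search is near real time rather than instant.

```python
from collections import defaultdict

class Segment:
    """An immutable mini inverted index over one batch of documents."""
    def __init__(self, docs: dict[int, str]):
        postings = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        # frozenset: postings are never modified once the segment is written
        self.postings = {t: frozenset(ids) for t, ids in postings.items()}

    def search(self, term: str) -> frozenset:
        return self.postings.get(term, frozenset())

class ShardIndex:
    """A Lucene-style index: a growing list of segments searched together."""
    def __init__(self):
        self.segments: list[Segment] = []
        self.buffer: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.buffer[doc_id] = text  # buffered in memory, not yet searchable

    def refresh(self) -> None:
        """Write buffered documents as a new segment, making them searchable."""
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def search(self, term: str) -> set[int]:
        hits = set()
        for segment in self.segments:  # every segment must be consulted
            hits |= segment.search(term)
        return hits

idx = ShardIndex()
idx.add(1, "Winter is coming")
print(idx.search("winter"))  # set(): not refreshed yet
idx.refresh()
print(idx.search("winter"))  # {1}
```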
Inverted Indexes
Term       Frequency   Documents
choice     1           3
coming     1           1
contours   2           2, 3
fury       1           2
is         3           1, 2, 3
ours       1           2
the        2           2, 3
winter     1           1
yours      1           3
The terms and their frequencies form the dictionary; the document lists are the postings.
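A table like the one above can be produced by a short indexing routine. The three sentences below are hypothetical sample documents, not taken from the slides; they reproduce every row of the table except `contours`. The analyzer (lowercase, split on whitespace) is also a simplification of what Lucene's analyzers do.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, tuple[int, list[int]]]:
    """Map each term to (document frequency, sorted postings list)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # toy analyzer: lowercase + whitespace
            postings[term].add(doc_id)
    # The dictionary is kept sorted by term, as in a real term index.
    return {term: (len(ids), sorted(ids)) for term, ids in sorted(postings.items())}

# Hypothetical sample documents
DOCS = {1: "Winter is coming", 2: "Ours is the fury", 3: "The choice is yours"}

for term, (freq, doc_ids) in build_inverted_index(DOCS).items():
    print(f"{term:8} {freq}  {doc_ids}")
```

Looking up a term is then a single dictionary access, instead of a scan over every document.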
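Wildcard Queries
• Wildcard searches are difficult.
• The term dictionary is sorted, which makes prefix lookups such as our* cheap, but a pattern with a leading wildcard cannot use that ordering, so it is effectively an unindexed query.
• Searching for something like *our* therefore requires going through every term in the index.
Term       Frequency
choice     1
coming     1
contours   2
fury       1
is         3
ours       1
the        2
winter     1
yours      1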
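The cost of a leading wildcard can be made concrete with a brute-force scan. This sketch uses the term list from the slide and Python's `fnmatch.fnmatchcase` as a stand-in for wildcard matching; the helper name `wildcard_scan` is hypothetical. It counts how many terms had to be examined to show that every term is checked.

```python
from fnmatch import fnmatchcase

# Term dictionary from the slide, sorted as in a real index
TERMS = ["choice", "coming", "contours", "fury", "is",
         "ours", "the", "winter", "yours"]

def wildcard_scan(terms, pattern):
    """With a leading '*', the sorted order cannot narrow the search,
    so every term in the dictionary is tested against the pattern."""
    matches, checked = [], 0
    for term in terms:
        checked += 1
        if fnmatchcase(term, pattern):
            matches.append(term)
    return matches, checked

print(wildcard_scan(TERMS, "*our*"))  # (['contours', 'ours', 'yours'], 9)
```

On nine terms this is trivial, but a real term dictionary can hold millions of terms, and the scan grows linearly with it.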
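Question
• Can you think of a way to make queries like *ours efficient? What kind of index can we create?
• Reverse indexing: store every term reversed, so *ours → sruo*, an ordinary prefix query on the reversed terms.
• search(our*) ∪ search(sruo*): the normal index answers prefix queries and the reversed index answers suffix queries, and their results can be combined.
Term       Reversed word
choice     eciohc
coming     gnimoc
contours   sruotnoc
fury       yruf
is         si
ours       sruo
the        eht
winter     retniw
yours      sruoy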
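Reverse indexing turns a suffix query into a prefix query, and a prefix query on a sorted list only needs a binary search plus a short forward scan. The sketch below uses the slide's terms; the helpers `prefix_search` and `suffix_search` are hypothetical names. In Elasticsearch, a reversed field of this kind could be built with the `reverse` token filter, though the exact mapping is not shown here.

```python
from bisect import bisect_left

TERMS = sorted(["choice", "coming", "contours", "fury", "is",
                "ours", "the", "winter", "yours"])

def prefix_search(sorted_terms, prefix):
    """Binary-search to the first term >= prefix, then read while terms match."""
    start = bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        out.append(term)
    return out

# Reverse index: every term stored backwards, kept sorted.
REVERSED = sorted(term[::-1] for term in TERMS)

def suffix_search(suffix):
    """*ours becomes the prefix query 'sruo' on the reversed dictionary."""
    return sorted(t[::-1] for t in prefix_search(REVERSED, suffix[::-1]))

print(suffix_search("ours"))  # ['contours', 'ours', 'yours']
print(sorted(set(prefix_search(TERMS, "our")) | set(suffix_search("ours"))))
```

The trade-off is extra storage for the second dictionary in exchange for suffix queries that avoid a full scan.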
Bottom Up
• Segments are immutable: a deleted or updated document is only marked as deleted, and obsolete entries are cleaned up when segments are merged.
[Figure: cluster diagram with Shards 1, 2 and 3 and their copies distributed across nodes]
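Immutability plus merging can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, each segment's document data never changes, and deletions are tracked in a separate per-segment set (loosely similar to Lucene's per-segment live-docs information). An update is modelled as "mark the old copy deleted, write the new copy to a new segment". Only `merge` physically drops obsolete entries.

```python
class Segment:
    def __init__(self, docs):
        self.docs = dict(docs)  # document data: never modified after creation
        self.deleted = set()    # deletion marks kept alongside the segment

class Index:
    def __init__(self):
        self.segments = []

    def write_segment(self, docs):
        self.segments.append(Segment(docs))

    def delete(self, doc_id):
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)  # mark only; the data stays in place

    def update(self, doc_id, text):
        self.delete(doc_id)                 # old version becomes obsolete
        self.write_segment({doc_id: text})  # new version goes to a new segment

    def live_docs(self):
        return {d: t for seg in self.segments
                for d, t in seg.docs.items() if d not in seg.deleted}

    def stored_docs(self):
        """Documents still physically stored, including obsolete ones."""
        return sum(len(seg.docs) for seg in self.segments)

    def merge(self):
        """Rewrite all segments into one, dropping deleted entries for good."""
        self.segments = [Segment(self.live_docs())]

idx = Index()
idx.write_segment({1: "winter is coming", 2: "ours is the fury"})
idx.update(2, "ours is the fury!")
idx.delete(1)
print(idx.stored_docs(), idx.live_docs())  # 3 {2: 'ours is the fury!'}
idx.merge()
print(idx.stored_docs())  # 1
```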
Thanks ☺