SASI, Cassandra on the full text search ride DuyHai DOAN – Apache Cassandra™ Evangelist

1. 5-minute introduction to Apache Cassandra™
2. SASI introduction
3. SASI cluster-wide
4. SASI local read/write path
5. Query planner
6. Some benchmarks
7. Take away
@doanduyhai

Trademark Policy
From now on … Cassandra ⩵ Apache Cassandra™

5-minute introduction to Apache Cassandra™

The tokens
• Random hash of #partition → token = hash(#p)
• Hash: ]−x, x]
• Hash range: 2^64 values, x = 2^64 / 2
(Diagram: a ring of eight C* nodes.)

Token ranges
A: ]−x, −3x/4]         E: ]0, x/4]
B: ]−3x/4, −2x/4]      F: ]x/4, 2x/4]
C: ]−2x/4, −x/4]       G: ]2x/4, 3x/4]
D: ]−x/4, 0]           H: ]3x/4, x]
(Diagram: ring of nodes A to H.)
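The two slides above can be illustrated with a small sketch that maps a partition key to a token in ]−x, x] and then to the node owning that token. This is only an illustration under stated assumptions: the MD5-based `token` function is a stand-in for the real partitioner hash, and the names `token`, `owner`, `NODES` and `UPPER_BOUNDS` are hypothetical.

```python
import bisect
import hashlib

X = 2**63                       # x = 2^64 / 2, so tokens live in ]-x, x]
NODES = "ABCDEFGH"
# Upper bound of each node's range: A ends at -3x/4, B at -2x/4, ..., H at x.
UPPER_BOUNDS = [(-3 + i) * X // 4 for i in range(len(NODES))]


def token(partition_key: str) -> int:
    """Stand-in hash (assumption, not the real partitioner): key -> token in ]-x, x]."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    unsigned = int.from_bytes(digest[:8], "big")   # 0 .. 2^64 - 1
    return unsigned - X + 1                         # -x + 1 .. x


def owner(tok: int) -> str:
    """Return the node whose range ]lower, upper] contains the token."""
    # bisect_left finds the first upper bound >= tok, which matches
    # ranges that exclude their lower bound and include their upper bound.
    return NODES[bisect.bisect_left(UPPER_BOUNDS, tok)]


if __name__ == "__main__":
    for key in ("user_id 1", "user_id 2", "user_id 3"):
        print(key, "->", owner(token(key)))
```

Because the ranges are half-open on the left, a token equal to an upper bound (for example 0) belongs to the node that ends there (D), and the next token up belongs to the following node (E).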

Distributed tables
    CREATE TABLE users(
        user_id int,
        …,
        PRIMARY KEY(user_id)
    );
Rows: user_id 1, user_id 2, user_id 3, user_id 4, user_id 5
(Diagram: ring of nodes A to H.)

Distributed tables
(Diagram: rows user_id 1 to user_id 5 spread across the ring of nodes A to H.)

Coordinator node
Responsible for handling requests (read/write)
Every node can be coordinator
• masterless
• no SPOF
• proxy role
(Diagram: a request arriving at the coordinator node on the ring.)

Q & A

SASI introduction

What is SASI?
• SSTable-Attached Secondary Index → new 2nd index implementation that follows the SSTable life-cycle
• Objective: provide a more performant & capable 2nd index

Who created it?
Open-source contribution by a team of engineers

Why is it better than the native 2nd index?
• follows the SSTable life-cycle (flush, compaction, rebuild …) → more optimized
• new data structures
• range queries (<, ≤, >, ≥) possible
• full text search options

Demo

SASI cluster-wide

Distributed index
On cluster level, SASI works exactly like the native 2nd index
(Diagram: index entries held by different nodes of the ring.)
UK → user 87, user 176, …, user 987
UK → user 1, user 102, …, user 493
UK → user 17, user 409, …, user 787
US → user 54, user 483, …, user 938

Distributed search algorithm
• 1st round: concurrency factor = 1
• Not enough results?
• 2nd round: concurrency factor = 2
• Still not enough results?
• 3rd round: concurrency factor = 4
(Diagrams: the coordinator querying more nodes of the ring at each round.)
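The rounds above can be sketched as a loop that queries a growing batch of token ranges until enough rows are collected. This is a simplified illustration, not the actual coordinator code: the doubling rule only mirrors the 1, 2, 4 progression shown on the slides (the real concurrency factor formula is not reproduced here), and `distributed_search`, `token_ranges` and `query_range` are hypothetical names.

```python
from typing import Callable, List, Sequence


def distributed_search(
    token_ranges: Sequence[str],
    query_range: Callable[[str], List[str]],
    limit: int,
) -> List[str]:
    """Query token ranges round by round until `limit` rows are found."""
    results: List[str] = []
    position = 0
    concurrency_factor = 1
    while position < len(token_ranges) and len(results) < limit:
        batch = token_ranges[position:position + concurrency_factor]
        for token_range in batch:   # a real coordinator would query these in parallel
            results.extend(query_range(token_range))
        position += len(batch)
        concurrency_factor *= 2     # assumption: doubling, as in the 1, 2, 4 example
    return results[:limit]


if __name__ == "__main__":
    rows_per_range = {name: [f"row-{name}"] for name in "ABCDEFGH"}
    print(distributed_search("ABCDEFGH", lambda r: rows_per_range[r], limit=3))
```

With one matching row per range and a limit of 3, the loop stops after the second round; with no matching row at all, every range ends up being queried.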

Concurrency factor formula
• more details at: http://www.doanduyhai.com/blog/?p=13191

Caveat 1: non-restrictive filters
Hit all nodes eventually ☹

Caveat 1 solution: always use LIMIT
    SELECT * FROM … WHERE ... LIMIT 1000

Caveat 2: 1-to-1 index (user_email)
    WHERE user_email = 'xxx'
• Not found
• Still no result
• At best 1 user found, at worst 0 users found

Caveat 2 solution: materialized views
For a 1-to-1 index/relationship, use materialized views instead
    CREATE MATERIALIZED VIEW user_by_email AS
    SELECT * FROM users
    WHERE user_id IS NOT NULL AND user_email IS NOT NULL
    PRIMARY KEY (user_email, user_id);
But range queries (<, >, ≤, ≥) are not possible …

Caveat 3: fetch all rows for analytics use-case
(Diagram: the client fetching rows from the ring through the coordinator.)

Caveat 3 solution: use co-located Spark
• Local index query
• Local index filtering in Cassandra
• Aggregation in Spark

SASI local read/write path

SASI life-cycle: in-memory
(Diagram, steps numbered 1 to 3: Commit log 1 … n; MemTable for Table 1 … Table N in memory; Index MemTable 1 … N; then ACK the client.)

Local write path data structures
Index mode, data type | Data structure            | Usage
PREFIX, text          | Guava ConcurrentRadixTree  | name LIKE 'John%'
CONTAINS, text        | Guava ConcurrentSuffixTree | name LIKE '%John%', name LIKE '%ny'
PREFIX, other         | JDK ConcurrentSkipListSet  | age = 20, age >= 20 AND age <= 30
SPARSE, other         | JDK ConcurrentSkipListSet  | age = 20, age >= 20 AND age <= 30; suitable for 1-to-N index with N ≤ 5

SASI life-cycle: flush to SSTable
(Diagram, steps numbered 1 to 4: Commit log 1 … n; in-memory Table 1 … 3 flushed to SSTable 1 … 3, each with its own OnDiskIndex 1 … 3.)

SASI life-cycle: compaction
SSTable 1, SSTable 2, SSTable 3 (with OnDiskIndex 1, OnDiskIndex 2, OnDiskIndex 3) → New SSTable + New OnDiskIndex

Local write path summary
Index files are built
• on memtable flush
• on compaction flush
To avoid OOM, index files are split into chunks of
• 1 GB for memtable flush
• max_compaction_flush_memory_in_mb for compaction flush
→ consequences: SASI has an impact on write bandwidth (CPU & disk I/O)

Local read path
• first, optimize the query using the Query Planner (see later)
• then load chunks (4k) of index files from disk into memory
• perform binary search to find the indexed value(s)
• retrieve the corresponding partition keys and push them into the Partition Key Cache
→ Yes, currently SASI only keeps partition key(s), so on wide partitions it's not very optimized ...

OnDiskIndex files
SSTable 1: user_id 4 → FR, user_id 1 → US, user_id 5 → FR
OnDiskIndex 1: FR, US
SSTable 2: user_id 3 → UK, user_id 2 → DE
OnDiskIndex 2: UK, DE
B+Tree-like data structures
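The slide's example can be mimicked with one small index per SSTable, each mapping an indexed term to the partition keys found in that SSTable only. This is a deliberately simplified sketch: plain Python dictionaries stand in for the real B+Tree-like on-disk structures, and `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (partition key, indexed country value)

sstable_1: List[Row] = [("user_id 4", "FR"), ("user_id 1", "US"), ("user_id 5", "FR")]
sstable_2: List[Row] = [("user_id 3", "UK"), ("user_id 2", "DE")]


def build_index(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Build one index per SSTable: term -> partition keys, with terms kept sorted."""
    index: Dict[str, List[str]] = defaultdict(list)
    for partition_key, term in rows:
        index[term].append(partition_key)
    return dict(sorted(index.items()))


def search(indexes: Iterable[Dict[str, List[str]]], term: str) -> List[str]:
    """Look the term up in every per-SSTable index and merge the hits."""
    hits: List[str] = []
    for index in indexes:
        hits.extend(index.get(term, []))
    return hits


if __name__ == "__main__":
    indexes = [build_index(sstable_1), build_index(sstable_2)]
    print(indexes)
    print(search(indexes, "FR"))
```

Keeping one index per SSTable is what lets the index follow the SSTable life-cycle: it is written once at flush time and rebuilt when SSTables are compacted.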

OnDiskIndex Layout
Header Block (4k) | Data Block (multiple of 4k) | Pointer Block (multiple of 4k) | Meta Data Info | Level Index Offset
Meta Data Info: Levels Count, Pointer Block Meta, Data Block Meta

Header Block layout
Field              | Type
Descriptor Version | variable
Term Size          | short
Min Term           | short
Max Term           | short
Min Pk             | short
Max Pk             | short
Index Mode         | variable
Has Partial        | byte

Data Block layout
Each data block is laid out on 4k boundaries:
Terms Count | Offset Array | Term Block | Padding | TokenTree Block | Padding
Example offset arrays: [0, 10, 22, …], [0, 23, 35, …], [0, 17, 34, …], …, [0, 12, 28, …]

Pointer Block building
• Data Level: Data Block 1, Data Block 2, …, Data Block N (4k each)
• Pointer Level 1: Pointer Block 1, Pointer Block 2, …, Pointer Block N (4k each), with LastTerm 1, LastTerm 2, …, LastTerm N
• Pointer Level 2: Pointer Block N+1, Pointer Block N+2, …, with LastTerm M, LastTerm M+1, …, LastTerm O
• …
• Pointer Root Level: Root Pointer Block

Binary search using OnDiskIndex files
• Pointer Root Level: Root Pointer Block
• Pointer Level 3: Pointer Block, Pointer Block, Pointer Block, …
• Pointer Level 2: Pointer Block, Pointer Block, Pointer Block, …
• Pointer Level 1: Pointer Block, Pointer Block, Pointer Block, …
• Data Level: Data Block 1, Data Block 2, Data Block 3, …, Data Block N
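The idea of searching through pointer blocks before touching data blocks can be sketched with a single pointer level. This is an illustration under assumptions, not the on-disk format: the pointer level is taken to hold the last term of each data block (as the LastTerm labels suggest), only one pointer level is modelled, and `build_pointer_level` and `lookup` are hypothetical names.

```python
import bisect
from typing import List, Optional, Tuple


def build_pointer_level(data_blocks: List[List[str]]) -> List[str]:
    """Assumption: each pointer entry is the last (largest) term of a data block."""
    return [block[-1] for block in data_blocks]


def lookup(
    data_blocks: List[List[str]], pointers: List[str], value: str
) -> Optional[Tuple[int, int]]:
    """Return (block number, position in block) of `value`, or None if absent."""
    # Step 1: binary search on the pointer level to pick the only candidate block.
    block_no = bisect.bisect_left(pointers, value)
    if block_no == len(data_blocks):
        return None                      # value is larger than every indexed term
    # Step 2: binary search inside that data block.
    block = data_blocks[block_no]
    pos = bisect.bisect_left(block, value)
    if pos < len(block) and block[pos] == value:
        return block_no, pos
    return None


if __name__ == "__main__":
    blocks = [["apple", "bob"], ["carl", "dave"], ["eve", "zed"]]
    pointers = build_pointer_level(blocks)
    print(pointers, lookup(blocks, pointers, "dave"))
```

Adding more pointer levels repeats step 1 on progressively smaller lists, so only a handful of 4k blocks need to be read for any lookup.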

Term Block Binary Search
• val < Term 100? → Term 1, Term 25, Term 50, Term 75, Term 100
• val > Term 50? → Term 50, Term 75, Term 100
• val < Term 75? → Term 50, Term 63, Term 75
• …
• val = Term 57? → Term 57
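The narrowing shown on this slide is a classic binary search over sorted terms. A minimal sketch, assuming the terms of a block are available as a sorted Python list of integers (the function name `binary_search` is hypothetical):

```python
from typing import List, Optional


def binary_search(sorted_terms: List[int], value: int) -> Optional[int]:
    """Return the position of `value` in `sorted_terms`, or None if it is absent."""
    low, high = 0, len(sorted_terms) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_terms[middle] == value:
            return middle
        if sorted_terms[middle] < value:
            low = middle + 1     # value can only be in the upper half
        else:
            high = middle - 1    # value can only be in the lower half
    return None


if __name__ == "__main__":
    terms = list(range(1, 101))          # Term 1 .. Term 100
    print(binary_search(terms, 57))
```

Each comparison halves the remaining interval, so a block of 100 terms needs at most 7 comparisons.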

Query Planner

Query planner
• build predicates tree
• predicates push-down & re-ordering
• predicate fusions for != operator

Query optimization example
    WHERE age < 100 AND fname LIKE 'p%' AND fname != 'pa%' AND age > 21

Query optimization example (continued)
• AND is associative and commutative
• != transformed to exclusion on range scan
• AND is associative and commutative
(Diagrams: the predicate tree after each transformation.)
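The effect of these rewrites on the example query can be sketched by regrouping predicates per column and attaching `!=` predicates as exclusions on that column's scan. This is a toy model, not the SASI query planner: the tuple representation and the names `PREDICATES` and `plan` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Predicate = Tuple[str, str, object]  # (column, operator, value)

# WHERE age < 100 AND fname LIKE 'p%' AND fname != 'pa%' AND age > 21
PREDICATES: List[Predicate] = [
    ("age", "<", 100),
    ("fname", "LIKE", "p%"),
    ("fname", "!=", "pa%"),
    ("age", ">", 21),
]


def plan(predicates: List[Predicate]) -> Dict[str, Dict[str, list]]:
    """Regroup AND-ed predicates by column; turn != into exclusions on the scan."""
    grouped: Dict[str, Dict[str, list]] = defaultdict(
        lambda: {"scans": [], "exclusions": []}
    )
    # AND is associative and commutative, so predicates can be reordered freely.
    for column, operator, value in predicates:
        bucket = "exclusions" if operator == "!=" else "scans"
        grouped[column][bucket].append((operator, value))
    return {column: grouped[column] for column in sorted(grouped)}


if __name__ == "__main__":
    for column, parts in plan(PREDICATES).items():
        print(column, parts)
```

After regrouping, the two `age` bounds sit together and can be served by one range scan, while the `fname` prefix scan carries its exclusion instead of requiring a separate lookup.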

Some benchmarks
