hardware accelerated similarity search



  1. Hardware Accelerated Similarity Search George Williams

  2. Who Am I? ● Director, GSI Technology ● Previously: Chief Data Scientist, Senior Data Scientist, AI Research Scientist, Software Engineer

  3. Recent Headlines

  4. Convergence and Integration That was then...

  5. This is Now: Technology Disintegration

  6. More Innovation Around The Corner?

  7. GSI’s Similarity Search Accelerator

  8. Agenda ● Chip Explosion ● GSI Technology ● What is Vector Similarity Search? ● GSI’s Similarity Search Accelerator ● Integration Case Studies: Bio, Database ● Early Adopters Program

  9. Who Is GSI Technology?

  10. What We Do ● High Performance SRAM and DRAM ● Aerospace, Government, R&D ● GSI Vector Similarity Search Accelerator Chip

  11. What Is Vector Similarity Search?

  12. What is Vector Similarity Search? ● Numeric Representation ● Bit-vector: 0110000100 ● Coordinates: (2.3, 5.6)

  13. What is Vector Similarity Search? ● Numeric Representation ● Simple “Distance” Function: d = Func(a, b)

  14. What is Vector Similarity Search? ● Numeric Representation ● Simple “Distance” Function ● K Nearest Neighbor (Top-K) [figure labels: K = 3, K = 5]

  15. What is Vector Similarity Search? ● Numeric Representation ● Simple “Distance” Functions ● K Nearest Neighbor (Top-K) ● Search is Computational ● E-Commerce, Bioinformatics [figure labels: K = 3, K = 5]
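
The K Nearest Neighbor (Top-K) idea from the slides above can be sketched in a few lines of plain Python. This is only an illustrative brute-force scan on the CPU, not the accelerator's method; the function names `euclidean` and `top_k` and the sample points are assumptions made up for this example.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, database, k, distance=euclidean):
    """Return the k (distance, index) pairs closest to query, nearest first."""
    scored = ((distance(query, vec), i) for i, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


# Hypothetical 2-D points; (2.3, 5.6) is the coordinate example from slide 12.
points = [(2.3, 5.6), (2.0, 5.0), (9.0, 1.0), (2.5, 5.5), (8.0, 8.0)]
nearest = top_k((2.3, 5.6), points, k=3)
nearest_ids = [index for _, index in nearest]
```

Every query touches every vector in the database, which is why the slides describe search as computational.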

  16. E-Commerce: Visual Search

  17. Visual Search ● Binary Codes, Continuous Embeddings ● Euclidean, L1, Hamming, Cosine ● >1 Billion Images
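
The four distance functions named on this slide can be written out as a minimal sketch. The function names are assumptions for illustration, and bit-vectors are represented here as strings of "0" and "1" purely for readability.

```python
import math


def euclidean(a, b):
    """L2 distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1(a, b):
    """L1 (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Hamming distance suits binary codes, while Euclidean, L1, and cosine apply to continuous embeddings.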

  18. Visual Search: Embedding Space

  19. Bioinformatics: Molecule Similarity ● Fingerprints ● Tanimoto ● Many Large DBs ● 100s of GB

  20. Molecule Similarity: Tanimoto ● Jaccard ● Intersection / Union
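
As a small worked example of the intersection-over-union formula on this slide, the sketch below computes Tanimoto similarity for two bit-vector fingerprints stored as Python integers. The function name `tanimoto` and the sample fingerprints are assumptions for illustration; real molecular fingerprints are much longer.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity: shared set bits divided by all set bits."""
    union = bin(a | b).count("1")
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return bin(a & b).count("1") / union


fp_a = int("0110000100", 2)  # bit-vector from slide 12
fp_b = int("0110010000", 2)  # hypothetical second fingerprint
similarity = tanimoto(fp_a, fp_b)  # 2 shared bits / 4 bits in the union
```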


  21. Bioinformatics: Molecule Similarity ● Drug Discovery of Novel Molecules ● Virtual Screening ● Activity (Toxicity) Prediction

  22. Programming Interfaces ● Idiomatic SQL ● Integrate Into Data Pipelines ● Leverage Skills of Data Eng & Scientists

  23. Many Domains and Applications ● E-Commerce / Recommendations ● Bioinformatics / Genomics ● Healthcare / Medical Records ● Cybersecurity / Malware Detection ● Computer Vision / Video Surveillance

  24. GSI’s Similarity Search Accelerator

  25. Computational Memory

  26. “In-Place” Associative Processing Bit Logic ● Programmable ● 2 million

  27. Consumer Board Solution ● PCIe Card ● 2 Chips Per Board ● 16GB On-Board DDR4 Main Memory ● 128Mb SRAM Cache Per Chip

  28. 1 Chassis (4U) Solution ● 4 Boards Per Chassis [diagram: chassis holding four PCIe cards, each with 16GB memory and two 128Mb chips]

  29. Multiple Chassis Solution ● One Chassis Is The Master

  30. Network Attached Storage ● RDMA Support (NAS As Data Source) [diagram: network connected to a chassis holding four PCIe cards, each with 16GB memory and two 128Mb chips]

  31. Segmentation by Clustering ● Offline Clustering / K-Means ● Avoids Full DB “Scan” ● Faster Performance
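
One way to picture how offline clustering avoids a full database scan is the sketch below: run k-means ahead of time, then at query time scan only the cluster whose centroid is nearest. This is a generic illustration, not GSI's implementation; the names `assign`, `kmeans`, and `search`, the starting centroids, and the sample points are all assumptions.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Group point indices by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        clusters[nearest].append(i)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means from caller-supplied starting centroids (offline step)."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(points[i][d] for i in members) / len(members)
                  for d in range(len(centroids[c])))
            if members else centroids[c]  # keep an empty cluster's centroid in place
            for c, members in enumerate(clusters)
        ]
    return centroids, assign(points, centroids)


def search(query, points, centroids, clusters, k):
    """Online step: scan only the cluster whose centroid is closest to the query."""
    c = min(range(len(centroids)), key=lambda j: dist(query, centroids[j]))
    return sorted(clusters[c], key=lambda i: dist(query, points[i]))[:k]


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
result = search((8.2, 8.4), points, centroids, clusters, k=2)
```

Scanning a single cluster makes the result approximate: a true nearest neighbor that falls just across a cluster boundary is missed, a trade-off that is commonly softened by probing several nearby clusters.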

  32. Availability ● Q4 2018: chip ● Q1 2019: demo boards ● Q2 2019: mass production

  33. Weizmann Institute Case Study “GSI’s [Accelerator] can dramatically reduce the time required to search our small molecules database...” - Dr. Efrat Ben-Zeev, Computational Chemist

  34. Weizmann Institute Case Study ● Molecule Similarity Search ● Biovia Pipeline Pilot Application ● Query of 34M Molecule DB Takes 10 Minutes! ● Using GSI Accelerator (estimated): Query Latency Reduced To 300ms, 400 Queries In 1 Second
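
The latency figures on this slide imply a speedup that is easy to check with a line of arithmetic. The variable names are illustrative, and the 300ms figure is the slide's own estimate rather than a measured result.

```python
baseline_ms = 10 * 60 * 1000   # 10 minutes per query, in milliseconds
accelerated_ms = 300           # estimated latency with the accelerator
speedup = baseline_ms / accelerated_ms   # 600000 / 300
```

That works out to a 2000x reduction in per-query latency, taking the slide's estimate at face value.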

  35. Biovia Application Integration [stack diagram, top to bottom: Application, Python Library, C Library, Drivers, GSI Accelerator]

  36. In-Memory Database Integration [stack diagram, top to bottom: Database, C Library, Drivers, GSI Accelerator]

  37. In-Memory: Expected Performance
      System | Memory Speed | Vector Size | Throughput (images/sec)
      MemSQL | ~50 GB/sec   | 4 KB        | 12.5 million
      GSI    | ~100 GB/sec  | 4 KB        | 25 million
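
The throughput figures on this slide follow from dividing memory bandwidth by vector size. The sketch below reproduces that arithmetic; the function name is hypothetical, and decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes) are an assumption, chosen because they match the slide's round numbers.

```python
def vectors_per_second(bandwidth_bytes_per_sec: int, vector_bytes: int) -> int:
    """Upper bound on vectors scanned per second when memory bandwidth is the limit."""
    return bandwidth_bytes_per_sec // vector_bytes


GB = 10**9  # assumed decimal gigabyte
KB = 10**3  # assumed decimal kilobyte

memsql_rate = vectors_per_second(50 * GB, 4 * KB)
gsi_rate = vectors_per_second(100 * GB, 4 * KB)
```

With binary units (4096-byte vectors) the results would come out slightly lower, at roughly 12.2 million and 24.4 million vectors per second.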

  38. Early Adopters Program ● Consult With Our Hardware and AI Experts ● Co-Development and App Integration ● Access to Simulator and Test Hardware ● Co-Marketing Opportunity

  39. GSI Upcoming Events ● Nov: Open Data Science Panel, Visual Search ● Nov: PyData (Washington DC) ● Dec: GSI Similarity Search Accelerator Workshop ● Coming Soon: GSI’s Tech Meetup ● 2019: First Chips and Boards Available

  40. Contact Us ● Twitter: @gsitechnology, @cgeorgewilliams ● Blogs: gsitechnology.com, medium.com/gsitechnology ● Email: associativecomputing@gsitechnology.com

  41. The End. Thanks!

  42. Query Option 1: Inference In Production ● Image2Vec (VGG, ResNet) ● NLP ● LSTM; Option 2: Query Vector Done By External Application ● Fingerprint Query ● In-Memory Vector (Cosine); Option 3: 3rd Party Inference Vector

  43. Single Board ● Small Database ● Fit All Data Into Cache For Lowest Latency ● If Larger, Paging Occurs To Memory ● Cluster Techniques [diagram: PCIe card with 16GB memory and two 128Mb chips]

  44. Large Database ● Large Database (<1TB, Flat) ● Pharma, Drug Search, Weizmann Molecule Search, In-Memory [diagram: chassis holding four PCIe cards, each with 16GB memory and two 128Mb chips]

  45. Multi Board Solutions ● Chassis Host Master ● For Huge Databases (~1TB) ● Merges The Results ● For Throughput: Batch Queries Split Across Boards [diagram: chassis holding four PCIe cards, each with 16GB memory and two 128Mb chips]
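
The "host master merges the results" pattern on this slide can be illustrated with a scatter-gather sketch: the database is split into shards, each shard returns its own top-k, and the host merges the partial lists into a global top-k. This is a generic illustration under stated assumptions, not GSI's driver API; the shards are plain Python lists standing in for boards, they are searched sequentially here rather than in parallel, and all names are hypothetical.

```python
import heapq
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def board_top_k(query, shard, offset, k):
    """Top-k within one shard; offset maps local positions to global ids."""
    scored = ((dist(query, vec), offset + i) for i, vec in enumerate(shard))
    return heapq.nsmallest(k, scored)


def host_search(query, shards, k):
    """Host step: collect each shard's top-k, then merge into a global top-k."""
    partials, offset = [], 0
    for shard in shards:
        partials.append(board_top_k(query, shard, offset, k))
        offset += len(shard)
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


database = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (6.0, 6.0),
            (0.5, 0.5), (9.0, 9.0), (0.2, 0.1), (7.0, 7.0)]
shards = [database[i:i + 2] for i in range(0, len(database), 2)]  # 4 stand-in boards
merged = host_search((0.0, 0.0), shards, k=3)
merged_ids = [index for _, index in merged]
```

Because each shard returns its full local top-k, the merged list matches what a single scan over the whole database would return.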

  46. Offline Data Preparation Optimize For Training Inference Cache and Memory

  47. Clustering For Large Databases ● Offline Clustering ● Centroids List <16GB ● Reduces Storage ● Only Centroids Are Kept Local For Real-Time Performance

  48. Huge Database ● Large Scale Similarity Search, FAISS ● Exact Nearest Neighbors For Small Vectors ● Approximate For Large Vectors (Quantization) [diagram: chassis holding four PCIe cards, each with 16GB memory and two 128Mb chips]

  49. Biovia Application Integration ● Biovia Application Used By Thousands of Bio-Tech Companies

  50. Weizmann: Load A Database

  51. Weizmann: 3rd Party Search

  52. Weizmann: Select Search Method

  53. Weizmann: Search

  54. Weizmann: Define Parameters

  55. Weizmann: Run Protocol
