Leveraging AWS and Machine Learning to Power Search at Zocdoc

Pedro Rubio, Head of Search Engineering
Brian d’Alessandro, Head of Data Science
This document and its contents are proprietary and confidential of Zocdoc, Inc. and may not be reproduced or shared, in whole or in part, without the express written authorization of Zocdoc, Inc.

Agenda
- How we’re built - People and Architecture
- How we’re built - the Data
- Questions

Problem Statements:
1. Patients need to find and book with a doctor.
2. Patients often don’t know what kind of doctor they need.

How we’re built And solving “what the patient means”

Core Optimization Problems for ZD Search
• Cross-team collaboration enabling maximum iteration speed
• Deliver recommendations in < 200 ms
• Patient satisfaction
(And our architecture plays a big role here!)

The Search Team: Product, Engineering, Data Science, Design

Zocdoc Tech Stack
- NodeJS, ES6, Babel, React
- AWS
- CloudFormation, Docker, ECR, EC2, ELB
- Kinesis / Firehose - S3 - reporting to data lake
- Monitoring with Datadog
- Routes with Express

Our Legacy Search

Free Text (Patient Powered) Search

Types of Intent: name of doctor, medical procedure, specialty, symptom

Intent Parsing: Architecture Design, Machine Learning

[Architecture diagram. Labels: Browser; Phase I - Auto-Suggest; Phase II - Backend Search; Doctor Name Retrieval; Specialty Retrieval; Visit Reason Retrieval; NLP Search Pipeline; Semantic Retrieval Service; Semantic Corpus Building Service; Service Handler; Auto-Suggest Ranking; Results Ranking; Models; Logging]

Solving for the Long Tail
Structured queries make up a reasonable share of traffic but are a minority of the total search terms we service. We use Natural Language Processing (NLP) algorithms to map unstructured terms into our structured search set.
Specialties = O(10^2), Procedures = O(10^3), Names = O(10^6), Other = O(10^7)

Different Representations of Same Concept
Variations:
• Heart beats too fast
• Heart flutters
• Pulse rate too high
• Irregular pulse
• Heart out of rhythm
• Irregular heartbeat
• Heart palpitations
Concept Interpretation:
• Irregular heartbeat
• Heart palpitations
Medical Term:
• Atrial Fibrillation

Zocdoc Semantic Service
f(“presentation anxiety”) = {[{Specialty = “Psychologist”, Relevance = 0.8}, …, {Specialty = “Psychiatrist”, Relevance = 0.7}]}
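The shape of that call (free text in, ranked specialties with relevance scores out) can be illustrated with a toy lookup. This is only a sketch under stated assumptions: the corpus entries, the `semantic_lookup` name, and the token-overlap (Jaccard) scoring are hypothetical stand-ins, not the NLP models behind the actual service, so the scores will not match the slide.

```python
from typing import Dict, List

# Hypothetical mini-corpus: specialty -> phrases a patient might type.
# Illustrative entries only, not Zocdoc data.
CORPUS: Dict[str, List[str]] = {
    "Psychologist": ["presentation anxiety", "fear of public speaking"],
    "Psychiatrist": ["anxiety medication", "panic attacks"],
    "Cardiologist": ["heart palpitations", "irregular heartbeat"],
}


def tokens(text: str) -> set:
    """Lowercase whitespace tokenization."""
    return set(text.lower().split())


def semantic_lookup(query: str, top_k: int = 2) -> List[dict]:
    """Return up to top_k specialties ranked by a toy relevance score."""
    q = tokens(query)
    scored = []
    for specialty, phrases in CORPUS.items():
        # Best Jaccard overlap between the query and any phrase
        # listed for this specialty.
        best = max(len(q & tokens(p)) / len(q | tokens(p)) for p in phrases)
        if best > 0:
            scored.append({"Specialty": specialty, "Relevance": round(best, 2)})
    scored.sort(key=lambda r: r["Relevance"], reverse=True)
    return scored[:top_k]


results = semantic_lookup("presentation anxiety")
print(results)
```

A production service would replace the overlap score with learned representations, but the interface stays the same: the caller only sees a ranked list of structured search terms.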

Early Results (And Why You Need to Always Experiment)

Searches that Lead to “Nephrology”
Many patients don’t know what a Nephrologist is. They don’t need to know to find one now.

How We’re Built - The Data

Data - Indexing so we can Search

Lesson Learned with Indexing Data
[Diagram labels: Legacy Layer; AWS; Elastic.co; Monolith; Live Feed; Process; S3; ƛ (Lambda); Cache]
Lambdas act as a mini ETL layer, getting the documents ready for our retrieval stage.
- Complex “stateless” ETL process that transforms this data into the data that we need in Elasticsearch
- Lambda memory max 1500 MB
- Our data much larger
- Manage state in S3 and Elasticsearch
- Load piecemeal into Elasticsearch
- At the very end, swap alias to use newly uploaded indexes
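The final alias swap maps onto Elasticsearch's `_aliases` endpoint, which applies all actions in one request atomically, so searches never see a half-loaded index. A minimal sketch that only builds the request body follows; the alias and index names are hypothetical, and the slides do not show how the swap is actually issued.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that repoints `alias`.

    Both actions travel in a single request, so the alias moves from
    the old index to the new one in one step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = build_alias_swap("providers", "providers_v1", "providers_v2")
print(json.dumps(body, indent=2))
```

Because readers query the alias rather than a concrete index name, the piecemeal load can take as long as it needs without affecting live traffic.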

New ETL - Spark
- Spark ETL
- Get over 1500 MB limit
- Get over 5 minute runtime limit
- More complex processing
- Easily add more data sets
- Currently in Databricks
- Plan to migrate to EMR (Elastic MapReduce)
[Diagram: data sets 1, 2, and 3; Mapping from 1 -> 2; Mapping from 1 -> 3; Joined Data Set; Business Logic Application]
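The flow in the diagram (map data set 1 onto data sets 2 and 3, join, then apply business logic) can be sketched in plain Python as a stand-in for the Spark job. Everything named here is an assumption made for illustration: the field names, the sample rows, and the placeholder business rule are hypothetical.

```python
from typing import Dict, List


def join_data_sets(
    set_1: List[dict],
    set_2: Dict[str, dict],
    set_3: Dict[str, dict],
    map_1_to_2: Dict[str, str],
    map_1_to_3: Dict[str, str],
) -> List[dict]:
    """Left-join data sets 2 and 3 onto data set 1 via the two mappings."""
    joined = []
    for row in set_1:
        merged = dict(row)
        # A missing mapping or missing target row contributes no fields.
        merged.update(set_2.get(map_1_to_2.get(row["id"]), {}))
        merged.update(set_3.get(map_1_to_3.get(row["id"]), {}))
        joined.append(merged)
    return joined


def apply_business_logic(rows: List[dict]) -> List[dict]:
    """Placeholder rule: only index rows that have a specialty."""
    return [r for r in rows if r.get("specialty")]


# Hypothetical sample data.
set_1 = [{"id": "d1", "name": "Dr. A"}, {"id": "d2", "name": "Dr. B"}]
set_2 = {"s9": {"specialty": "Nephrologist"}}
set_3 = {"l4": {"city": "New York"}}
map_1_to_2 = {"d1": "s9"}
map_1_to_3 = {"d1": "l4", "d2": "l4"}

documents = apply_business_logic(
    join_data_sets(set_1, set_2, set_3, map_1_to_2, map_1_to_3)
)
print(documents)
```

Adding another data set in this structure means adding one more mapping and one more `update` step, which is the kind of extension the Lambda memory and runtime limits made hard.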

Data - Event Data So we can Learn

The Marketplace
Goal: Make it as easy as possible to match the user to the right doctor.
Considerations:
• How to weight distance vs. availability vs. experience vs. reviews?
• Does Dr. take this type of patient?
• Are we meeting regulatory requirements?
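One simple way to frame the weighting question is a linear score over normalized features. The sketch below assumes equal placeholder weights and hypothetical feature definitions; the slides pose the weighting as an open question and do not describe the ranking model actually used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doctor:
    name: str
    distance_km: float
    days_until_available: int
    years_experience: int
    avg_review: float  # assumed to be on a 0-5 scale


# Placeholder weights; in practice these would be tuned or learned.
WEIGHTS: Dict[str, float] = {
    "distance": 0.25,
    "availability": 0.25,
    "experience": 0.25,
    "reviews": 0.25,
}


def score(doctor: Doctor, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of features, each scaled to the range 0-1."""
    features = {
        "distance": 1.0 / (1.0 + doctor.distance_km),
        "availability": 1.0 / (1.0 + doctor.days_until_available),
        "experience": min(doctor.years_experience, 30) / 30.0,
        "reviews": doctor.avg_review / 5.0,
    }
    return sum(weights[name] * features[name] for name in weights)


def rank(doctors: List[Doctor]) -> List[Doctor]:
    """Order doctors from highest to lowest score."""
    return sorted(doctors, key=score, reverse=True)


# Hypothetical doctors that differ only in distance and availability.
near = Doctor("Dr. Near", 1.0, 0, 10, 4.5)
far = Doctor("Dr. Far", 20.0, 14, 10, 4.5)
ranked = rank([far, near])
print([d.name for d in ranked])
```

Eligibility checks such as whether the doctor takes this type of patient, and regulatory constraints, fit more naturally as filters applied before scoring than as weighted terms.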

Organizational Optimization
Optimize: algo iteration speed
Subject to:
• Org too small to justify full-time data scientists within search
• Throwing models over the wall to be implemented doesn’t work

Agile Machine Learning
[Diagram. Production: Query; Zocdoc DB; Prod Service (Search); Filtered Results; Model API (Model Transformations + Model Scoring); Scored Results; Ranked Results. Offline: Logs (S3/Redshift); Research, Analysis, Model Development (Spark/Redshift). Ownership: Engineering Owned; DS Owned]

Aqueduct: Filling the Data Lake
Some Data Lake principles:
• Allow producers to easily push data
• Allow data format changes
• Smart ETL to make consumption very easy

Cistern: Making Datalake Drinkable
• “Raw” data lake good for exploratory research (we use Spark)
• “Clean” data lake better for analytics and quick exploration

Data - Insights

We’ve got our fingers on the pulse of public health trends.
[Chart: Searches for Therapy/Therapist on Zocdoc]

How Much is that Smile Worth?
[Chart: Click Conversion by Search Rank and DrIsSmiling]
We’re exploring AWS Rekognition to research what drives user interest in Dr. profiles.

Thank you and Questions!
