Deal Personalization Systems @ Groupon Ameya Kanitkar - PowerPoint PPT Presentation

Deal Personalization Systems @ Groupon
Ameya Kanitkar
ameya@groupon.com

Relevance & Personalization Systems @ Groupon

What are Groupon Deals?

Our Relevance Scenario
Users

Our Relevance Scenario
Users: how do we surface relevant deals?
• Deals are perishable (deals expire or are sold out)
• No direct user intent (unlike traditional search advertising, where a query expresses intent)
• Relatively limited user information
• Deals are highly local

Two Sides to the Relevance Problem
• Algorithmic issues: how to find relevant deals for individual users given a set of optimization criteria
• Scaling issues: how to handle relevance for all users across multiple delivery platforms

Developing Deal Ranking Algorithms
• Exploring data: understanding signals, finding patterns
• Building models/heuristics: employ both classical machine learning techniques and heuristic adjustments to estimate user purchasing behavior
• Conducting experiments: try out ideas on real users and evaluate their effect

Data Infrastructure
[Chart: "Growing Deals" and "Growing Users", with year labels 2011, 2012, 2013 and value labels 20+, 400+, 2000+]
• 100 million+ subscribers
• We need to store data such as user click history, email records, service logs, etc. This amounts to billions of data points and terabytes of data.

Deal Personalization Infrastructure Use Cases
• Deliver personalized emails (offline system): personalize billions of emails for hundreds of millions of users
• Deliver personalized website & mobile experience (online system): personalize one of the most popular e-commerce mobile & web apps for hundreds of millions of users & page views

Earlier System
[Diagram components: Email; Offline Personalization (Map/Reduce); Online Deal Personalization API; MySQL Store; Data Pipeline (user logs, email records, user history, etc.)]

Earlier System
• Scaling MySQL for data such as user click history and email records was painful unless we sharded the data
• We needed to maintain two separate data pipelines for essentially the same data

Ideal System
• Common data store that serves data to both online and offline systems
• Data store that scales to hundreds of millions of records
• Data store that plays well with our existing Hadoop-based systems
• Data store that supports get()/put() access patterns based on a key (User ID)
[Diagram components: Email; Offline Personalization (Map/Reduce); Online Deal Personalization API; Ideal Data Store; Data Pipeline]

Why HBase?
• Open source distributed map data store modeled after Google's Bigtable
• Distributed data store: store data on a 1-700 node cluster. Linear scaling. Add capacity by adding more machines.
• Very light schema. Each row may have any number of columns. Columns need not be defined upfront. (Something like: Row1 -> Map<byte[], byte[]>)
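The "Row1 -> Map<byte[], byte[]>" idea can be illustrated with a small in-memory sketch. This is not the HBase client API; the `SparseTable` class and its method names are hypothetical and only mimic the shape of the data model, where each row carries its own set of columns.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for "Row -> Map<byte[], byte[]>".
# It only illustrates the data model; it is not how HBase is accessed.
Row = Dict[bytes, bytes]


class SparseTable:
    def __init__(self) -> None:
        self._rows: Dict[bytes, Row] = {}

    def put(self, row_key: bytes, column: bytes, value: bytes) -> None:
        # Columns come into existence on first write; nothing is declared up front.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: bytes, column: bytes) -> Optional[bytes]:
        # Missing rows and missing columns both yield None.
        return self._rows.get(row_key, {}).get(column)

    def columns(self, row_key: bytes) -> List[bytes]:
        # Each row can have a different set of columns.
        return sorted(self._rows.get(row_key, {}))


table = SparseTable()
table.put(b"row1", b"cf1:qual1", b"a")
table.put(b"row1", b"cf1:qual2", b"b")
table.put(b"row2", b"cf2:qual1", b"c")
```

Here `row1` and `row2` end up with different columns, which is the property the "very light schema" bullet describes.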

Why HBase?
• Consistent database. Highly available. Automatically shards/scales. Can scale to billions of rows and multi-terabyte data sizes.
• Writes: 1-10 ms, reads: 20-50 ms
• Tight out-of-the-box integration with Hadoop and Map/Reduce

HBase Table
Row | Cf:<qual> | Cf:<qual> | … | Cf:<qual>
row1 | Cf1:qual1 | Cf1:qual2
row11 | Cf1:qual2 | Cf1:qual22 | Cf1:qual3
row2 | Cf2:qual1
rowN |

Architecture Options
[Diagram components: Email; Offline Personalization (Map/Reduce); Online Personalization; HBase System; Data Pipeline]

Architecture Options: Pros
• Simple design
• Consolidated system that serves both online and offline personalization

Architecture Options: Cons
• We now have the same uptime SLA on both the offline and online systems
• Maintaining the online latency SLA during bulk writes and bulk reads is hard. And here is why…

Architecture
• We can now maintain different SLAs on the online and offline systems
• We can tune the HBase cluster differently for the online and offline systems
[Diagram components: Email; Relevance Map/Reduce; Real Time Relevance; HBase Offline System; HBase for Online System; Replication between the two HBase systems; Data Pipeline]

HBase Schema Design
Row key: User ID. Unique identifier for users.
Column Family 1: user history and profile information. Overwrite user history and profile info.
Column Family 2: email history for users. Append email history for each day as a separate column. (On average each row has over 200 columns.)
• Most of our data access patterns are via "User Key"
• This makes it easy to design the HBase schema
• The actual data is kept in JSON
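A minimal sketch of the layout above, using a plain Python dictionary in place of HBase. The family names `cf1`/`cf2`, the `profile` qualifier, the date-shaped column names, and the sample values are all assumptions made for illustration; the slides do not give the real names.

```python
import json
from collections import defaultdict

# Hypothetical names; the slides only say "Column Family 1" and "Column Family 2".
CF_PROFILE = "cf1"  # user history and profile information (overwritten)
CF_EMAIL = "cf2"    # email history (one new column per day)

# row key (user ID) -> {"family:qualifier": JSON string}
store = defaultdict(dict)


def put_profile(user_id, profile):
    # Overwrite: the same column is rewritten on every update.
    store[user_id][f"{CF_PROFILE}:profile"] = json.dumps(profile)


def append_email_history(user_id, day, deal_ids):
    # Append: each day becomes its own column, so rows grow wider over time.
    store[user_id][f"{CF_EMAIL}:{day}"] = json.dumps(deal_ids)


def get_email_history(user_id):
    # Single lookup by user key, then filter the email-history family.
    prefix = CF_EMAIL + ":"
    row = store.get(user_id, {})
    return {
        col[len(prefix):]: json.loads(val)
        for col, val in sorted(row.items())
        if col.startswith(prefix)
    }


# Made-up sample data.
put_profile("user42", {"city": "Chicago"})
put_profile("user42", {"city": "Seattle"})
append_email_history("user42", "2013-06-01", [101, 102])
append_email_history("user42", "2013-06-02", [103])
```

After these calls the row for `user42` holds one profile column (the second write replaced the first) and two email-history columns, one per day.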

Cluster Sizing
• Machine profile
  • 96 GB RAM (HBase 25 GB)
  • 24 virtual cores CPU
  • 8 x 2 TB disks
• Data profile
  • 100 million+ records
  • 2 TB+ data
  • Over 4.2 billion data points
• Hadoop + HBase cluster: 100+ machine Hadoop cluster that runs heavy Map/Reduce jobs. The same cluster also hosts a 15 node HBase cluster.
• Online HBase cluster: 10 machine dedicated HBase cluster to serve the real-time SLA
• HBase replication connects the two clusters
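As a rough back-of-the-envelope reading of the data profile, the stated figures can be turned into per-record averages. The inputs are the lower bounds quoted on the slide ("100 million+", "2 TB+", "over 4.2 billion"), decimal terabytes are assumed, and the slides do not define what counts as a "data point", so the results are only indicative.

```python
# Lower bounds quoted on the slide; the true values are somewhat larger.
records = 100_000_000            # "100 million+ records"
total_bytes = 2 * 10**12         # "2 TB+ data", assuming 1 TB = 10**12 bytes
data_points = 4_200_000_000      # "over 4.2 billion data points"

bytes_per_record = total_bytes / records
points_per_record = data_points / records

print(f"approx. bytes per record: {bytes_per_record:,.0f}")
print(f"approx. data points per record: {points_per_record:.0f}")
```

Under these assumptions a record averages about 20 KB and about 42 data points.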

Other Takeaways
• Choose the data storage format carefully. (We are using JSON, but one can consider Avro, Protobufs, etc.)
• Always store compressed data. We use LZO; it is easy to use with Map/Reduce.
• Always store processed data in HBase.
• HBase needs some tuning before it scales. Tuning garbage collection is important, and so are various timeout and caching parameters. The cluster can be unstable without this tuning.
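The first two takeaways can be sketched as a pair of encode/decode helpers. This is only an illustration: the slides say LZO is used, but LZO is not in the Python standard library, so `zlib` stands in for it here. The helper names and the sample record are hypothetical, and whether compression was applied in the application or configured in the store is not stated in the slides.

```python
import json
import zlib


def encode_value(record: dict) -> bytes:
    # Serialize to compact JSON, then compress (zlib as a stand-in for LZO).
    raw = json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw)


def decode_value(blob: bytes) -> dict:
    # Reverse of encode_value: decompress, then parse the JSON.
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Made-up, repetitive record to show the effect of compression.
record = {"user_id": "user42", "clicks": ["deal-101"] * 50}
blob = encode_value(record)
uncompressed_size = len(json.dumps(record).encode("utf-8"))
```

Swapping JSON for Avro or Protobufs, as the slide suggests, would only change the serialization step inside `encode_value` and `decode_value`.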

Questions?

Thanks!
ameya@groupon.com
www.groupon.com/techjobs
