Extending the Enterprise Data Warehouse with Hadoop Robert - PowerPoint PPT Presentation

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012

Who I Am • Robert Lancaster • Solutions Architect, Hotel Supply Team • rlancaster@orbitz.com • @rob1lancaster • Organizer of Chicago Machine Learning Study Group • Co-organizer of Chicago Big Data. page 2

Launched in 2001 Over 160 million bookings page 3

Some History… page 4

In 2009… • The Machine Learning team is formed to improve site performance. For example, improving hotel search results. • This required access to large volumes of behavioral data for analysis. • Fortunately, the required data was collected in session data stored in web analytics logs. page 5

The Problem… • The only archive of the required data went back about two weeks. [Diagram labels: Transactional data (e.g. bookings) and aggregated non-transactional data; Non-transactional data (e.g. searches); Data Warehouse.] page 6

Hadoop Provided a Solution… [Diagram labels: Hadoop: detailed non-transactional data (what every user sees, clicks, etc.); Data Warehouse: transactional data (e.g. bookings) and aggregated non-transactional data.] page 7

What is Hadoop? • Distributed file system and parallel processing platform. • Open source Apache project created by Doug Cutting. • Modeled on papers published by Google on the Google File System and MapReduce. • Intended to run on a cluster of relatively inexpensive machines (aka commodity hardware). • Bring processing to the data. page 8
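
The MapReduce model mentioned on this slide can be illustrated with a small sketch. This is plain Python that imitates the map, shuffle, and reduce steps in a single process; it does not use the Hadoop API, and the tab-separated record layout and function names are assumptions made for illustration only.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, 1) per record; the key here is the (hypothetical) event type."""
    event_type = record.split("\t")[0]
    yield event_type, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample records: event type, then a free-form detail field.
sample = ["search\tchicago", "click\thotel-1", "search\tnyc"]
counts = run_job(sample)
```

On a real cluster the map and reduce functions run in parallel on the machines that hold the data, which is the "bring processing to the data" idea above.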

The Hadoop Ecosystem [Diagram: ZooKeeper & Oozie; Sqoop & Flume; Pig; Hive; HBase; MapReduce; Hadoop Distributed File System.] page 9

Deploying Hadoop Enabled Multiple Applications… [Chart: series "Queries" and "Searches"; y-axis 0% to 100%; x-axis 1 to 20; labeled values 71.67%, 34.30%, 31.87%, 2.78%.] page 10

And Useful Analyses… page 11

But Brought New Challenges… • Most of these efforts are driven by development teams. • The challenge now is unlocking the value of this data for non-technical users. • Support for Hadoop via traditional BI/reporting tools is still meager. page 12

BI Vendors Are Working on Hadoop Integration Both big (relatively)… page 13

And small… page 14

In 2011 & 2012 • The Big Data team is formed under the Business Intelligence team at Orbitz Worldwide. • This allows the Big Data team to work more closely with the data warehouse and BI teams. • It reflects the importance of big data to the future of the company. • Our production cluster has grown 40-fold since it was launched. page 15

A View Shared Beyond Orbitz… “We strongly believe that Hadoop is the nucleus of the next-generation cloud EDW…” “…but that promise is still three to five years from fruition.”* *James Kobielus, Forrester Research, “Hadoop, Is It Soup Yet?” page 16

Two Primary Ways We Use Hadoop to Complement the EDW • Extraction and transformation of data for loading into the data warehouse – “ETL”. • Off-loading of analysis from the data warehouse. page 17

ETL Example: Proposed Processing [Diagram: Raw logs → Hadoop → Dimensional model.] page 18

ETL Example: Click Data Processing • Previous processing in the data warehouse. [Diagram labels: Web Servers; Web Server Logs; ETL; DW; Data Cleansing (stored procedure); DW. Annotations: several hours of processing; ~20% original data size.] page 19

ETL Example: Click Data Processing • Moving to Hadoop: • Removed load from the data warehouse. • Facilitated adding additional attributes for processing. • Allowed processing to be run more frequently. [Diagram, "Processing in Hadoop", labels: Web Servers; Web Server Logs; HDFS; Data Cleansing (MapReduce); DW.] page 20
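
As a rough illustration of the data-cleansing step that moves from a stored procedure to MapReduce, the sketch below shows a mapper-style function that drops malformed click records and normalizes the rest. The three-field tab-separated layout (session ID, timestamp, URL) and all names are hypothetical; the actual Orbitz log format and cleansing rules are not given in the slides.

```python
def clean_click_line(line):
    """Return a cleaned (session_id, timestamp, url) tuple, or None if malformed.

    Assumes a hypothetical layout: session_id <TAB> timestamp <TAB> url.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None
    session_id, timestamp, url = (field.strip() for field in fields)
    if not session_id or not timestamp.isdecimal():
        return None
    return session_id, int(timestamp), url.lower()

def mapper(lines):
    """Yield one cleaned tab-separated line per valid input line."""
    for line in lines:
        cleaned = clean_click_line(line)
        if cleaned is not None:
            yield "\t".join(str(field) for field in cleaned)

# Hypothetical raw input: one valid record and three malformed ones.
raw = ["s1\t100\t/Hotels/Search\n", "bad line\n", "\t200\t/x\n", "s2\tabc\t/y\n"]
cleaned_lines = list(mapper(raw))
```

Because each line is handled independently, a function like this can run as a map-only job across many nodes, so the cleansing work no longer competes with queries on the warehouse.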

Analysis Example: Geo-Targeting Ads • Facilitated analysis that allows for more personalized ad content. • Allowed the marketing team to analyze over a year's worth of search data. • Provided analysis that was difficult to perform in the data warehouse. page 21

Example Processing Pipeline for Web Analytics Data page 22

Example Use Case: Selection Errors page 23

Use Case – Selection Errors: Introduction • Multiple points of entry. • Multiple paths through site. • Goal: tie events together to form picture of customer behavior. page 24
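
One way to picture "tying events together" is to group events by session and order them by time, giving each customer's path through the site. The sketch below does this in memory; the (session ID, timestamp, page) tuple layout and the page names are assumptions for illustration, not the fields used in the actual pipeline.

```python
from collections import defaultdict

def build_paths(events):
    """Group (session_id, timestamp, page) events into time-ordered page paths."""
    by_session = defaultdict(list)
    for session_id, timestamp, page in events:
        by_session[session_id].append((timestamp, page))
    # Sorting the (timestamp, page) tuples orders each session's visits by time.
    return {
        session_id: [page for _, page in sorted(visits)]
        for session_id, visits in by_session.items()
    }

# Hypothetical events arriving out of order from multiple entry points.
events = [
    ("s1", 3, "booking"),
    ("s2", 1, "home"),
    ("s1", 1, "search"),
    ("s1", 2, "hotel_details"),
]
paths = build_paths(events)
```

In a MapReduce job the session ID would typically serve as the shuffle key, so that all events for one session reach the same reducer.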

Use Case – Selection Errors: Processing page 25

Use Case – Selection Errors: Visualization page 26

Example Use Case: Beta Data page 27

Use Case – Beta Data: Introduction • Hotel Sort Optimization: compare A vs. B. • Web Analytics Data: what the user saw; how the user behaved. • Server Log Data: sorting behavior used. page 28
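
The comparison described here amounts to joining two sources on a shared key: server logs say which sort variant a session received, and web analytics data says how the user behaved. The sketch below assumes a session ID is the join key and uses booking rate as the outcome; both choices, and all names, are hypothetical.

```python
def booking_rate_by_variant(analytics_rows, server_log_rows):
    """Join behavior to sort variant by session ID and compute bookings per session.

    analytics_rows: iterable of (session_id, booked) from web analytics data.
    server_log_rows: iterable of (session_id, variant) from server logs.
    """
    variant_by_session = dict(server_log_rows)
    totals = {}  # variant -> (session count, booking count)
    for session_id, booked in analytics_rows:
        variant = variant_by_session.get(session_id)
        if variant is None:
            continue  # no server log entry, so the variant is unknown
        sessions, bookings = totals.get(variant, (0, 0))
        totals[variant] = (sessions + 1, bookings + int(booked))
    return {v: bookings / sessions for v, (sessions, bookings) in totals.items()}

# Hypothetical sample data; session "s5" has no matching server log entry.
analytics = [("s1", True), ("s2", False), ("s3", True), ("s4", True), ("s5", True)]
server_logs = [("s1", "A"), ("s2", "A"), ("s3", "B"), ("s4", "B")]
rates = booking_rate_by_variant(analytics, server_logs)
```

A real analysis would also need enough sessions per variant to judge whether a difference is meaningful; this sketch only shows the join and aggregation.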

Use Case – Beta Data: Processing page 29

Use Case – Beta Data: Visualization page 30

Example Use Case: RCDC page 31

Use Case – RCDC: Introduction • Understand and improve cache behavior. • Improve “coverage”. • Traditionally, we search one page of hotels at a time. • We get “just enough” information to present to consumers. • Goal: increase the amount of availability information we have when a consumer performs a search. • The data is needed to support needs beyond reporting. page 32
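
The slide does not define "coverage" precisely. One plausible reading, assumed here purely for illustration, is the share of hotels matching a search for which availability information is already cached when the search runs. The function and identifiers below are hypothetical.

```python
def coverage(search_hotel_ids, cached_hotel_ids):
    """Fraction of hotels in a search for which cached availability exists.

    This definition is an assumption; the slides do not spell one out.
    """
    if not search_hotel_ids:
        return 0.0
    covered = sum(1 for hotel_id in search_hotel_ids if hotel_id in cached_hotel_ids)
    return covered / len(search_hotel_ids)

# Hypothetical example: four hotels match a search, two have cached availability.
ratio = coverage(["h1", "h2", "h3", "h4"], {"h1", "h3", "h9"})
```

Tracking a ratio like this per search over time would show whether changes to cache behavior actually increase the availability information on hand.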

Use Case – RCDC: Processing page 33

Use Case – RCDC: Visualization page 34

Conclusions • Hadoop market is still immature, but growing quickly. Better tools are on the way. • Look beyond the usual (enterprise) suspects. Many of the most interesting companies in the big data space are small startups. • Hadoop won’t replace your EDW, but any organization with a large EDW should at least be exploring Hadoop as a complement to their BI infrastructure. page 35

Conclusions • Work closely with your existing data management teams. • Your idea of what constitutes “big data” might quickly diverge from theirs. • The flip side is that Hadoop can be an excellent tool to off-load resource-consuming jobs from your data warehouse. page 36

Thank you! Questions? page 37
