Predicting Hotel Cancellations with Machine Learning Michael Grogan Machine Learning Consultant @ MGCodesandStats michael-grogan.com Big Data Conference Europe 2019 - join at Slido.com with #bigdata2019
Why are hotel cancellations a problem? • Inefficient allocation of rooms and other resources • Customers who would follow through with bookings cannot do so due to lack of capacity • Indication that hotels are targeting their services to the wrong groups of customers
How does machine learning help solve this issue? • Allows for identification of factors that could lead a customer to cancel • Time series forecasts can provide insights as to fluctuations in cancellation frequency • Offers hotel businesses the opportunity to rethink their target markets
Original Authors • Antonio, Almeida, Nunes (2016): Using Data Science to Predict Hotel Booking Cancellations. • This presentation describes alternative machine learning models that I have applied to these datasets. • Notebooks and datasets available at: https://github.com/MGCodesandStats.
Three components
• Identifying important customer features – ExtraTreesClassifier
• Classifying potential customers in terms of cancellation risk – Logistic Regression, SVM
• Forecasting fluctuations in hotel cancellation frequency – ARIMA, LSTM
Question What do you think is the most important Python library in a machine learning project?
Answer Oh, really? pandas
Most of the machine learning process… is not machine learning
Data Manipulation → Effective Analysis → Machine Learning
You may have data – but it is not the data you want What we have is a classification set: What we want is a time series:
Data Manipulation with pandas 1. Merge year and week number
Data Manipulation with pandas 2. Merge dates and cancellation incidences
Data Manipulation with pandas 3. Sum weekly cancellations and order by date
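The three pandas steps above can be sketched as follows. The column names (ArrivalDateYear, ArrivalDateWeekNumber, IsCanceled) are assumptions modelled on the hotel booking dataset; the toy values are for illustration only:

```python
import pandas as pd

# Hypothetical classification-style booking records (column names assumed)
df = pd.DataFrame({
    "ArrivalDateYear": [2015, 2015, 2015, 2015],
    "ArrivalDateWeekNumber": [27, 27, 28, 28],
    "IsCanceled": [1, 0, 1, 1],
})

# 1. Merge year and week number into a single date key
#    (zero-pad the week so string ordering matches calendar ordering)
df["YearWeek"] = (df["ArrivalDateYear"].astype(str) + "-"
                  + df["ArrivalDateWeekNumber"].astype(str).str.zfill(2))

# 2-3. Sum weekly cancellation incidences and order by date
weekly = (df.groupby("YearWeek")["IsCanceled"]
            .sum()
            .sort_index()
            .reset_index())
print(weekly)
```

The result is the weekly time series that the forecasting models later in the talk consume.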
Feature Selection – What Is Important? • Of all the potential features, only a select few are important in classifying future bookings in terms of cancellation risk. • ExtraTreesClassifier is used to rank features – the higher the score, the more important the feature – in most cases…
Feature Selection – What Is Important?
• Top six features:
  • Reservation status (big caveat here)
  • Country of origin
  • Required car parking spaces
  • Deposit type
  • Customer type
  • Lead time
• STATISTICALLY SIGNIFICANT AND MAKES THEORETICAL SENSE vs. STATISTICALLY INSIGNIFICANT OR THEORETICALLY REDUNDANT
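A minimal sketch of ranking features with ExtraTreesClassifier. The data here is synthetic (not the hotel datasets), with the target deliberately driven by the first feature so the ranking is visible:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for booking features (real ones include
# lead time, country of origin, deposit type, etc.)
lead_time = rng.uniform(0, 400, n)
parking = rng.integers(0, 2, n).astype(float)
noise = rng.normal(size=n)

# Cancellation outcome driven mainly by lead time (illustrative assumption)
y = (lead_time / 400 + 0.1 * noise > 0.5).astype(int)
X = np.column_stack([lead_time, parking, noise])

# Fit the ensemble and rank features: the higher the score,
# the more important the feature
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

for name, score in zip(["lead_time", "parking", "noise"],
                       model.feature_importances_):
    print(f"{name}: {score:.3f}")
```

On the real datasets the same `feature_importances_` attribute produces the ranking shown above, with the caveat that a high score does not by itself make a feature theoretically sound (e.g. reservation status).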
Accuracy 90% is great. 100% means you’ve overlooked something. Training accuracy • Accuracy of the model in predicting other values in the training set (the dataset which was used to train the model in the first instance). Validation accuracy • Accuracy of the model in predicting a segment of the dataset which has been “split off” from the training set. Test accuracy • Accuracy of the model in predicting completely unseen data. This metric is typically seen as the litmus test to ensure a model’s predictions are reliable.
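The three-way split described above can be sketched with scikit-learn's `train_test_split` (the data and split proportions here are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, 100)

# First split off a test set of completely unseen data...
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# ...then split the remainder into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Training accuracy is measured on `X_train`, validation accuracy on `X_val`, and the litmus-test accuracy on the held-out `X_test`.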
Classification: Support Vector Machines
• Building model on H1 dataset
• Testing accuracy on H2 dataset
Classification: Logistic Regression vs. Support Vector Machines

Metric         Logistic Regression   Support Vector Machines
0              0.68                  0.68
1              0.72                  0.77
macro avg      0.70                  0.73
weighted avg   0.70                  0.73
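The comparison above comes from scikit-learn's classification report. A hedged sketch of the workflow on synthetic data (the real pipeline trains on H1 and tests on H2; the numbers below will not match the table):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the H1 (train) / H2 (test) split
X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for model in (LogisticRegression(max_iter=1000), SVC()):
    preds = model.fit(X_train, y_train).predict(X_test)
    scores[type(model).__name__] = accuracy_score(y_test, preds)
    print(type(model).__name__)
    print(classification_report(y_test, preds))
```

`classification_report` produces exactly the per-class and macro/weighted average rows shown in the table.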
Did a neural network do any better? • Only slight increase in accuracy – and the neural network used 500 epochs to train the model! AUC for SVM = 0.743 AUC for Neural Network = 0.755
More complex models are not always the best • As we have seen, training a neural network only resulted in a very slight increase in AUC. • This must be weighed against the additional time and resources needed to train the model – squeezing out an extra couple of points in accuracy is not always viable.
Two time series – what is the difference? H1 H2
Findings
• H1: ARIMA performed better
• H2: LSTM performed better
ARIMA
Major tool used in time series analysis to forecast future values of a variable based on its past values.
• p = number of autoregressive terms
• d = differences needed to make the series stationary
• q = number of moving average terms (lags of the forecast errors)
LSTM (Long Short-Term Memory Network)
• Traditional neural networks are not particularly suitable for time series analysis.
• This is because they do not account for the sequential (or step-wise) nature of time series.
• In this regard, a long short-term memory network (or LSTM model) must be used in order to examine long-term dependencies across the data.
• LSTMs are a type of recurrent neural network and work particularly well with volatile data.
Constructing an LSTM model
• Choosing the time parameter: in this case, the cancellation value at time t is being predicted by the previous five values
• Scaling data appropriately: MinMaxScaler used to scale data between 0 and 1
• Configure the neural network: loss = mean squared error; optimizer = adam; trained across 20 epochs – further iterations proved redundant
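The scaling and windowing steps can be sketched as follows (a toy series stands in for the real cancellation counts; the network definition itself is only described in comments, since it depends on a Keras installation):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the weekly cancellation series
series = np.arange(20, dtype=float).reshape(-1, 1)

# Scale to [0, 1] - neural networks are sensitive to input magnitude
scaler = MinMaxScaler()
scaled = scaler.fit_transform(series).ravel()

# Time parameter of 5: predict the value at time t
# from the previous five values
lookback = 5
X = np.array([scaled[i:i + lookback]
              for i in range(len(scaled) - lookback)])
y = scaled[lookback:]

print(X.shape, y.shape)  # (15, 5) (15,)

# X would then be reshaped to (samples, timesteps, features) and fed to
# a Keras LSTM layer, compiled with loss="mean_squared_error" and
# optimizer="adam", and trained for 20 epochs as described above.
# Predictions are passed through scaler.inverse_transform to recover
# actual cancellation counts.
```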
LSTM Results for H2 Dataset
“No Free Lunch” Theorem
• This model solves problem A – but another model is needed for problem B
Model Selection Considerations
1. Run a subset of the data across many models
2. Identify the best-performing model
3. Run the full dataset on this model
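The three steps above can be sketched with cross-validation over a handful of candidate models (synthetic data and an arbitrary candidate list, for illustration only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# Steps 1-2: score candidate models on a subset of the data
X_sub, y_sub = X[:400], y[:400]
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "forest": RandomForestClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X_sub, y_sub, cv=5).mean()
          for name, m in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)

# Step 3: fit the best-performing model on the full dataset
best = candidates[best_name].fit(X, y)
```

The subset keeps the expensive many-model search cheap; only the winner pays the full-dataset training cost.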
Data Architecture • Designing a machine learning model is only one component of an ML project. • Under what environment will the model be run? Cloud? Locally? • What are the relative advantages and disadvantages of each?
Amazon SageMaker: Some Advantages
• Ability to modify computing resources as needed to run models
• Easier to coordinate Python versions across users
• Running and maintaining a data center becomes unnecessary
• No need for upfront investment
Sample workflow on Amazon SageMaker
1. Add repository from GitHub or AWS CodeCommit
2. Select instance type, e.g. t2.medium, t2.large…
3. Create notebook instance and generate ML solution in the cloud
Summary of Findings
• AUC for Support Vector Machine = 0.74 (or 74% classification accuracy)

H1 dataset:
Metric   ARIMA    LSTM
MDA      0.86     0.8
RMSE     57.95    31.98
MFE      -12.72   -22.05

H2 dataset:
Metric   ARIMA    LSTM
MDA      0.86     0.8
RMSE     274.07   74.80
MFE      156.32   28.52
Conclusion
• Data manipulation is an integral part of an ML project
• “No free lunch” – make sure the model is appropriate to the data
• Pay attention to the workflow(s) being used and the relative advantages and disadvantages of each