Using Deep Learning to rank and tag millions of hotel images
15/11/2018 - PyParis 2018
Christopher Lennan (Senior Data Scientist) @chris_lennan
Tanuj Jain (Data Scientist) @tjainn
#idealoTech

Agenda
1. idealo.de
2. Business Motivation
3. Models and Training
4. Image Tagging
5. Image Aesthetics
6. Summary

Some Key Facts
● More than 18 years of experience
● 16 million users/month
● Germany's 4th largest eCommerce website
● 50,000 shops
● Active in 6 different countries (DE, AT, ES, IT, FR, UK)
● Over 330 million offers for 2 million products
● 700 “idealos” from 40 nations
● TÜV-certified comparison portal

Motivation

idealo hotel price comparison
hotel.idealo.de
● 2,306,658 accommodations
● 308,519,299 images
● ~133 images per accommodation

Importance of Photography for Hotels
“.. after price, photography is the most important factor for travelers and prospects scanning OTA sites..”
“.. Photography plays a role of 60% in the decision to book with a particular hotel ..”
“.. study published today by TripAdvisor, it would seem like photos have the greatest impact driving engagement from travelers researching on hotel and B&B pages ..”

Image Aesthetics
Current image placement
[Figure: two gallery images, labelled Position: 19 and Position: 1]

Image Aesthetics
Current image placement
[Figure: two gallery images, labelled Position: 17 and Position: 3]

Beautiful images should appear earlier in the gallery

Ensure different areas get depicted

[Figure: example gallery of images numbered 1 to 8, with area tags Bathroom, Bedroom, Restaurant, Facade, Fitness Studio, Kitchen]

Understanding Image Content
Two-part problem
1. Tag the image with the hotel property area
2. Predict aesthetic quality

Models & Training

Transfer Learning
● Use a pre-trained CNN that was trained on millions of images (e.g. MobileNet or VGG16)
● Replace the top layers so that the output fits the classification task
● Train existing and new layer weights

Transfer Learning
[Figure: CNN architecture (VGG16)]

Training regime
1. Only train the newly added dense layers, with a high learning rate
2. Then train all layers, with a low learning rate
Goal: avoid shifting the pre-trained convolutional weights too much

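
The two phases above can be sketched without any deep learning framework. This is a minimal illustration, not the speakers' training code: the `Layer` class, the layer names, and both learning-rate values are hypothetical and chosen only to show which layers train in which phase.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for one network layer."""
    name: str
    pretrained: bool        # True for layers taken from the pre-trained CNN
    trainable: bool = True


def configure_phase(layers, phase):
    """Set the trainable flags for a phase and return its learning rate.

    Phase 1: only the newly added layers train, with a high learning rate.
    Phase 2: all layers train, with a low learning rate.
    Both learning-rate values are illustrative assumptions.
    """
    if phase == 1:
        for layer in layers:
            layer.trainable = not layer.pretrained
        return 1e-3
    if phase == 2:
        for layer in layers:
            layer.trainable = True
        return 1e-5
    raise ValueError("phase must be 1 or 2")


# Toy model: two pre-trained blocks plus one newly added dense layer.
model = [
    Layer("conv_block_1", pretrained=True),
    Layer("conv_block_2", pretrained=True),
    Layer("dense_new", pretrained=False),
]

lr_phase1 = configure_phase(model, phase=1)
trainable_phase1 = [layer.name for layer in model if layer.trainable]

lr_phase2 = configure_phase(model, phase=2)
trainable_phase2 = [layer.name for layer in model if layer.trainable]
```

In a framework such as Keras, the same idea would typically be expressed by toggling each layer's `trainable` attribute and recompiling the model with a different learning rate between the two phases.
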
Loss functions
Cross-entropy loss (CEL)
● CEL is generally used for “one-class” ground truth classifications (e.g. image tagging)
● CEL ignores inter-class relationships between score buckets
source: https://ssq.github.io/2017/02/06/Udacity%20MLND%20Notebook/

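
A small numeric sketch of the second point: with a one-hot target, cross-entropy only looks at the probability assigned to the true bucket, so it cannot tell a near miss from a far miss. The five score buckets and the two predictions are made-up values for illustration.

```python
import numpy as np


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))


# Five ordered score buckets; the true bucket is index 2.
y_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# Both predictions put 0.6 on the true bucket. The remaining 0.4 goes to the
# neighbouring bucket in `near` and to a bucket two steps away in `far`.
near = np.array([0.0, 0.4, 0.6, 0.0, 0.0])
far = np.array([0.4, 0.0, 0.6, 0.0, 0.0])

loss_near = cross_entropy(y_true, near)
loss_far = cross_entropy(y_true, far)

# Both losses equal -log(0.6): where the wrong mass sits does not matter.
print(loss_near, loss_far)
```
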
Loss functions
Earth Mover’s Distance (EMD)
● For ordered classes, a classification setup can outperform regression
● Training on datasets with intrinsic ordering can benefit from an EMD loss objective

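
The slides do not give the EMD formula. The sketch below assumes one common formulation for ordered classes: the normalized distance between the two cumulative distributions with exponent r = 2. The score buckets and predictions are made-up values.

```python
import numpy as np


def emd_loss(y_true, y_pred, r=2):
    """Normalized Earth Mover's Distance between two distributions
    over ordered classes, computed from their cumulative sums.
    The exponent r = 2 is an assumption, not taken from the slides."""
    cdf_true = np.cumsum(y_true)
    cdf_pred = np.cumsum(y_pred)
    return float(np.mean(np.abs(cdf_true - cdf_pred) ** r) ** (1.0 / r))


# Five ordered score buckets; the true bucket is index 2.
y_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# Same probability on the true bucket, but the wrong mass sits one step
# away in `near` and two steps away in `far`.
near = np.array([0.0, 0.4, 0.6, 0.0, 0.0])
far = np.array([0.4, 0.0, 0.6, 0.0, 0.0])

emd_near = emd_loss(y_true, near)
emd_far = emd_loss(y_true, far)

# Unlike a loss that only looks at the true bucket, EMD penalizes the
# prediction whose mass lies further from the true bucket more heavily.
print(emd_near, emd_far)
```
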
GPU training workflow
[Diagram: workflow across Local and AWS, labels grouped by stage]
Setup: Dockerfile → build Docker image → push to ECR; custom AMI with nvidia-docker; datasets
Train: Docker Machine launches EC2 GPU instance → pull image → copy existing model → launch training container with nvidia-docker (train script) → store train outputs; S3
Evaluate: Docker Machine launches EC2 GPU instance → pull image → copy existing model → launch evaluation container with nvidia-docker (evaluation script); SSH; Jupyter notebook

Image Tagging

Tagging Problem
● Given an image, tag it as belonging to a single class
● Multiclass classification model with classes:
  ○ Bedroom
  ○ Bathroom
  ○ Foyer
  ○ Restaurant
  ○ Swimming Pool
  ○ Kitchen
  ○ View of Exterior (Facade)
  ○ Reception

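
Single-class tagging over these eight classes can be sketched as a softmax over the model's raw outputs followed by an argmax. The logits below are made-up numbers standing in for a network's output; the slides do not show the model's output layer, so the softmax head is an assumption.

```python
import numpy as np

CLASSES = [
    "Bedroom", "Bathroom", "Foyer", "Restaurant",
    "Swimming Pool", "Kitchen", "View of Exterior (Facade)", "Reception",
]


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def tag(logits):
    """Return the single most likely class and its probability."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    return CLASSES[best], float(probs[best])


# Hypothetical network output for one image, one score per class.
logits = [0.1, 2.5, -0.3, 0.0, 1.2, 0.4, -1.0, 0.8]
label, confidence = tag(logits)
print(label, round(confidence, 3))
```
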
Multiple Datasets
We go over them one by one and look at:
● Dataset properties
● Results
● Issues

Wellness Dataset
● idealo in-house pre-labelled images
● Mostly pictures of 2- or 3-star properties

Wellness Dataset
● Balanced: equal sample count in all categories for all sets

Wellness Dataset: Metrics
Top-1 accuracy: 86%

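
For reference, top-1 accuracy is the share of images whose highest-scoring class equals the true class. A minimal sketch with made-up probabilities for four images and three classes:

```python
import numpy as np


def top1_accuracy(probs, labels):
    """Fraction of samples whose most probable class equals the true label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == labels))


# Made-up predictions: one row per image, one column per class.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])
labels = np.array([0, 1, 2, 2])

accuracy = top1_accuracy(probs, labels)  # 3 of 4 images are tagged correctly
print(accuracy)
```
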
Wellness Dataset: Wrong Predictions
True class of these images: BATHROOM, predicted as: RECEPTION
[Figure: misclassified example images]
Rectangular structure = Reception with high probability → BIAS!

Wellness Dataset: Wrong Predictions
True class of these images: BATHROOM
[Figure: example images with wrong labels]
Wrong true label of images → NOISE in the dataset!

Correcting Bias
● Augmentation operations, same for every class:
  ○ Random cropping
  ○ Rotation
  ○ Horizontal flipping
● Data enrichment:
  ○ External data from Google Images

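
Two of the augmentation operations above, random cropping and horizontal flipping, can be sketched with plain NumPy array slicing. Rotation by an arbitrary angle needs an image library and is left out here. The image size, crop size, and flip probability are illustrative assumptions, and the random array merely stands in for a real photo.

```python
import numpy as np


def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of an H x W x C image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left to right with probability p."""
    if rng.random() < p:
        return image[:, ::-1]
    return image


def augment(image, rng, crop_h=200, crop_w=200):
    """Apply the same operations regardless of the image's class."""
    cropped = random_crop(image, crop_h, crop_w, rng)
    return random_horizontal_flip(cropped, rng)


rng = np.random.default_rng(0)
# Random pixels standing in for a 224 x 224 RGB hotel photo.
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
augmented = augment(image, rng)
print(augmented.shape)
```
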
Augmented Wellness + Google Dataset: Metrics
Top-1 accuracy: 88%

Gotta Clean!

Cleaning Dataset
● Hand-cleaned each category:
  ○ Deleted pictures that do not belong in their category
  ○ Removed duplicates (the presence of duplicates can give us wrong metrics)
  ○ Added more images from external sources for classes with a small number of images left after cleaning

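
The cleaning described above was done by hand. As a complementary sketch only, byte-identical duplicates can be found automatically by hashing file contents. The file names and byte strings below are hypothetical, and this approach does not catch resized or re-encoded copies of the same picture.

```python
import hashlib


def remove_exact_duplicates(images):
    """Keep the first occurrence of each distinct byte content.

    `images` maps a file name to the raw image bytes.
    """
    seen = set()
    unique = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[name] = data
    return unique


# Hypothetical files; short byte strings stand in for real image data.
images = {
    "bathroom_001.jpg": b"\x01\x02\x03",
    "bathroom_002.jpg": b"\x04\x05\x06",
    "bathroom_copy.jpg": b"\x01\x02\x03",  # same bytes as bathroom_001.jpg
}

deduplicated = remove_exact_duplicates(images)
print(list(deduplicated))
```

Duplicates matter for the metrics because the same picture can end up in both the training and the test set, which inflates the measured accuracy.
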
Cleaned Data: Metrics
Top-1 accuracy: 91%

Cleaned Dataset: Results
● Bathroom vs. Reception confusion has almost vanished!
● View_of_exterior vs. Pool confusion has been reduced
● Foyer performance:
  ○ Most misclassified Foyer images get assigned to Reception
  ○ This is a hard problem for humans as well!

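
Class-pair confusions such as Foyer vs. Reception are usually read off a confusion matrix. The sketch below uses three of the classes and made-up labels purely to show the mechanics; it does not reproduce the talk's results.

```python
import numpy as np

CLASSES = ["Bathroom", "Foyer", "Reception"]


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix


# Made-up labels (indices into CLASSES) for eight images.
true_labels = [0, 0, 1, 1, 1, 1, 2, 2]
predicted_labels = [0, 0, 1, 2, 2, 1, 2, 2]

matrix = confusion_matrix(true_labels, predicted_labels, len(CLASSES))

foyer = CLASSES.index("Foyer")
reception = CLASSES.index("Reception")
# Number of Foyer images that the model tagged as Reception.
foyer_as_reception = int(matrix[foyer, reception])
print(matrix)
print(foyer_as_reception)
```
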
Foyer or Reception?
[Figure: example images that are hard to assign to Foyer or Reception]

Learnings so far
● The model can only be as good as the data (cleaning)
● Foyer is a hard category to predict

Understanding Model Decisions

Understanding Decisions: Class Activation Maps
● Use the penultimate Global Average Pooling (GAP) layer to get the class activation map
● Highlights the discriminative regions that lead to a classification

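
In the standard CAM formulation, the map for a class is the sum of the last convolutional feature maps, each weighted by the weight that connects its GAP output to that class. The sketch below assumes that formulation; the array shapes, the toy activations, and the rescaling to [0, 1] for display are illustrative choices, not details from the talk.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Compute a class activation map.

    feature_maps:  (H, W, K) activations of the last convolutional layer.
    class_weights: (K,) weights from the GAP outputs to one class.
    Returns an (H, W) map rescaled to [0, 1].
    """
    cam = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: two 4 x 4 feature maps, each active at one location.
feature_maps = np.zeros((4, 4, 2))
feature_maps[0, 0, 0] = 1.0   # channel 0 fires in the top-left corner
feature_maps[3, 3, 1] = 1.0   # channel 1 fires in the bottom-right corner

# Hypothetical class weights: channel 0 matters more for this class.
class_weights = np.array([2.0, 0.5])

cam = class_activation_map(feature_maps, class_weights)
hottest = np.unravel_index(np.argmax(cam), cam.shape)
print(hottest)
```

Upsampled to the input resolution and overlaid on the image, such a map shows which regions pushed the model towards the class.
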
Insights With CAM
Swimming Pool misclassified as Bathroom
[Figure: input image and its CAM]

Insights With CAM
Swimming Pool misclassified as Bathroom
The model relies on the rails, which leads it to misidentify the pool as a bathroom.

Insights With CAM
Bathroom correctly classified
[Figure: input image and its CAM]
