Food/Non-food Image Classification and Food Categorization using Pre-Trained GoogLeNet Model
Ashutosh Singla, Lin Yuan, and Touradj Ebrahimi
lin.yuan@epfl.ch
Multimedia Signal Processing Group, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Wearable, October 13, 2016
Outline
o Introduction
o Image Dataset
o Experiments and Analysis
o Demonstration
o Conclusion
Introduction
o Dietary assessment based on multimedia techniques, e.g., image analysis
o Initial and crucial steps:
  – Detect food images among everyday images
  – Identify the food item in a food image
o Food categorization
  – Recognizing the major food category may help to estimate nutritional value approximately
Introduction
o Objectives of the work
  – Food/non-food image classification
  – Food categorization (11 pre-defined classes)
o Convolutional Neural Networks (CNN) and pre-trained GoogLeNet model
Dataset
o Food-5K
  – 2.5K food and 2.5K non-food
  – Image source:
    § Wearable camera
    § Mobile phone
    § Existing datasets:
      – Food-101
      – UEC-FOOD-100
      – UEC-FOOD-256
  – High variety
Dataset
o Food-11
  – 11 major food categories
  – Image source:
    § Social media, e.g., Instagram, Flickr
    § Existing datasets:
      – Food-101
  – Multiple types of food in each category
Dataset
o Food-11
Experiments
o Food/Non-food Classification
  – Fine-tuning the last 6 layers of a pre-trained GoogLeNet model on Food-5K
  – 3K training, 1K validation and 1K testing
  – Max. accuracy of 99.2%

                 Predicted classes
Actual classes   Food     Non-food
Food             99.4%    0.6%
Non-food         1.0%     99.0%
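The 99.2% figure can be recovered from the row-normalized confusion matrix on this slide. The following is a minimal sketch, assuming the 1K test split is balanced (500 food, 500 non-food); the slide gives only the total split sizes, so the per-class counts and all function and variable names here are illustrative assumptions.

```python
# Row-normalized confusion matrix from the slide (outer key: actual, inner key: predicted).
rates = {
    "food": {"food": 0.994, "non-food": 0.006},
    "non-food": {"food": 0.010, "non-food": 0.990},
}

# ASSUMPTION: the 1K test split is balanced; the slide does not state per-class counts.
test_counts = {"food": 500, "non-food": 500}


def overall_accuracy(rates, counts):
    """Overall accuracy = correctly classified images / all images."""
    total = sum(counts.values())
    correct = sum(rates[c][c] * counts[c] for c in counts)
    return correct / total


def precision(rates, counts, label):
    """Precision for `label` = true positives / everything predicted as `label`."""
    predicted_as_label = sum(rates[c][label] * counts[c] for c in counts)
    return rates[label][label] * counts[label] / predicted_as_label


accuracy = overall_accuracy(rates, test_counts)
food_precision = precision(rates, test_counts, "food")
print(f"overall accuracy: {accuracy:.1%}")
print(f"food precision:   {food_precision:.1%}")
```

With a balanced split the overall accuracy is simply the mean of the two diagonal entries, which matches the reported maximum accuracy.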
Experiments
o Food Category Recognition
  – Fine-tune on Food-11
  – GoogLeNet: last 6 layers
  – Best results:
    § Overall Acc. 83.5%
    § F-measure 0.911
    § Kappa 0.816

Confusion matrix (%), rows: actual classes, columns: predicted classes
Actual class     Bread  Dairy products  Dessert   Egg  Fried food  Meat  Noodles/Pasta  Rice  Seafood  Soup  Vegetable/Fruit
Bread             67.7             3.8     10.9   4.6         6.5   1.9            0.3   0.0      0.3   4.1              0.0
Dairy products     0.0            87.2      9.5   0.7         0.7   0.7            0.0   0.7      0.0   0.7              0.0
Dessert            1.6             6.0     81.4   0.8         0.8   2.0            0.4   0.0      2.4   4.6              0.0
Egg                4.8             2.4      6.9  77.3         2.4   0.3            0.0   1.5      0.6   3.6              0.3
Fried food         1.7             1.7      5.2   0.7        81.9   3.1            0.0   0.7      1.4   3.5              0.0
Meat               3.7             0.2      5.3   0.9         3.0  79.6            0.0   0.2      2.1   4.9              0.0
Noodles/Pasta      0.0             0.7      0.0   0.0         0.7   0.0           95.9   0.0      0.7   2.0              0.0
Rice               0.0             0.0      2.1   0.0         0.0   0.0            0.0  95.8      2.1   0.0              0.0
Seafood            1.7             1.3      6.9   0.7         0.0   1.0            0.0   0.3     83.8   4.3              0.0
Soup               0.2             0.6      0.4   0.2         0.0   0.0            0.0   0.2      0.2  98.0              0.2
Vegetable/Fruit    0.0             2.2      5.2   0.4         0.4   1.3            0.9   0.4      3.0   0.4             85.7
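The diagonal of the confusion matrix on this slide is the per-class recall. The sketch below averages those values without weighting (macro-averaged recall) and picks out the best and worst classes. The slide does not give per-class test counts, so the reported overall accuracy, which weights each class by its size, cannot be recomputed here; the variable names are illustrative.

```python
# Diagonal of the Food-11 confusion matrix (per-class recall, in percent).
recall = {
    "Bread": 67.7,
    "Dairy products": 87.2,
    "Dessert": 81.4,
    "Egg": 77.3,
    "Fried food": 81.9,
    "Meat": 79.6,
    "Noodles/Pasta": 95.9,
    "Rice": 95.8,
    "Seafood": 83.8,
    "Soup": 98.0,
    "Vegetable/Fruit": 85.7,
}

# Macro-averaged recall: every class counts equally, regardless of its size.
macro_recall = sum(recall.values()) / len(recall)

best_class = max(recall, key=recall.get)
worst_class = min(recall, key=recall.get)

print(f"macro-averaged recall: {macro_recall:.1f}%")
print(f"best class:  {best_class} ({recall[best_class]}%)")
print(f"worst class: {worst_class} ({recall[worst_class]}%)")
```

The unweighted mean comes out slightly above the reported 83.5% overall accuracy, which would be consistent with the test classes having unequal sizes.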
Experiments
o Food Category Recognition
  – Top 10 misclassified pairs
  – Reasons:
    § Images within different classes have similar appearance, shape or color.
    § Images have more than one type of food items mixed.
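One way to obtain a list of misclassified pairs is to rank the off-diagonal entries of the Food-11 confusion matrix reported on the Food Category Recognition slide. The sketch below does this on the row-normalized percentages. This is an assumption: the slide's own ranking may be based on raw image counts, in which case the order could differ. The function name `top_confusions` is illustrative.

```python
CLASSES = [
    "Bread", "Dairy products", "Dessert", "Egg", "Fried food", "Meat",
    "Noodles/Pasta", "Rice", "Seafood", "Soup", "Vegetable/Fruit",
]

# Confusion matrix in percent (rows: actual class, columns: predicted class).
MATRIX = [
    [67.7, 3.8, 10.9, 4.6, 6.5, 1.9, 0.3, 0.0, 0.3, 4.1, 0.0],
    [0.0, 87.2, 9.5, 0.7, 0.7, 0.7, 0.0, 0.7, 0.0, 0.7, 0.0],
    [1.6, 6.0, 81.4, 0.8, 0.8, 2.0, 0.4, 0.0, 2.4, 4.6, 0.0],
    [4.8, 2.4, 6.9, 77.3, 2.4, 0.3, 0.0, 1.5, 0.6, 3.6, 0.3],
    [1.7, 1.7, 5.2, 0.7, 81.9, 3.1, 0.0, 0.7, 1.4, 3.5, 0.0],
    [3.7, 0.2, 5.3, 0.9, 3.0, 79.6, 0.0, 0.2, 2.1, 4.9, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.7, 0.0, 95.9, 0.0, 0.7, 2.0, 0.0],
    [0.0, 0.0, 2.1, 0.0, 0.0, 0.0, 0.0, 95.8, 2.1, 0.0, 0.0],
    [1.7, 1.3, 6.9, 0.7, 0.0, 1.0, 0.0, 0.3, 83.8, 4.3, 0.0],
    [0.2, 0.6, 0.4, 0.2, 0.0, 0.0, 0.0, 0.2, 0.2, 98.0, 0.2],
    [0.0, 2.2, 5.2, 0.4, 0.4, 1.3, 0.9, 0.4, 3.0, 0.4, 85.7],
]


def top_confusions(matrix, classes, k=10):
    """Return the k largest off-diagonal entries as (percent, actual, predicted)."""
    pairs = [
        (matrix[i][j], classes[i], classes[j])
        for i in range(len(classes))
        for j in range(len(classes))
        if i != j  # skip correct predictions on the diagonal
    ]
    # Sort by percentage only; ties keep their row-major order (stable sort).
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:k]


top10 = top_confusions(MATRIX, CLASSES)
for percent, actual, predicted in top10:
    print(f"{actual} -> {predicted}: {percent}%")
```

Under this percentage-based ranking, most of the top pairs are images of other classes predicted as Dessert, which fits the first reason listed on the slide (similar appearance across classes).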
Demonstration
o NutriTake Android App
Conclusion
o Two datasets
  – Food-5K: 5,000 food/non-food images
  – Food-11: 11 food categories
  Link to dataset/App:
o Pre-trained GoogLeNet model for
  – Food/non-food classification
    § Max. accuracy of 99.2%
  – Food categorization
    § Max. accuracy of 83.5%
o A. Singla, L. Yuan, and T. Ebrahimi. Food/Non-food Image Classification and Food Categorization using Pre-Trained GoogLeNet Model. In Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management (MADiMa '16).
Questions