Computer Vision: Weakly-supervised learning from video and images

CSClub Saint Petersburg, November 17, 2014. Computer Vision: Weakly-supervised learning from video and images. Ivan Laptev, ivan.laptev@inria.fr, WILLOW, INRIA/ENS/CNRS, Paris. Joint work with: Piotr Bojanowski, Rémi Lajugie, Maxime Oquab


  1. CSClub Saint Petersburg November 17, 2014 Computer Vision: Weakly-supervised learning from video and images Ivan Laptev ivan.laptev@inria.fr WILLOW, INRIA/ENS/CNRS, Paris Joint work with: Piotr Bojanowski – Rémi Lajugie – Maxime Oquab – Francis Bach – Leon Bottou – Jean Ponce – Cordelia Schmid – Josef Sivic

  2. – Advertisement – About the company: VisionLabs is a team of professionals with substantial knowledge and considerable practical experience in developing computer vision algorithms and intelligent systems. We create and deploy computer vision technologies, opening up new opportunities to change the world around us for the better. Contacts: Official website: http://visionlabs.ru/ Contact person: Alexander Khanin. E-mail: a.khanin@visionlabs.ru Tel.: +7 (926) 988-7891

  3. – Advertisement – Areas of activity: • Face recognition technology: system for detecting fraudsters in banks • License plate recognition technology: system for vehicle access logging and automation • Safe city technologies: system for detecting violations and dangerous situations. Team (given names as listed): Alexander, Alexey, Slava, Sergey, Sergey, Ivan, Alexey, Ivan; (surnames as listed): Khanin, Nekhaev, Kazmin, Laptev, Milyaev, Kordichev, Truskov, Cherepanov. Roles: Chief Executive Officer, Chief Technical Officer, Senior CV engineer, Software developer (×2), Scientific advisor, Financial advisor, Executive Officer. Our team is a symbiosis of science and business.

  4. – Advertisement – Achievements: projects of national scale

  5. – Advertisement – We are looking for like-minded people: creating and deploying intelligent systems; solving interesting practical problems; working in a friendly, ambitious team. Thank you for your attention! Contacts: Official website: http://visionlabs.ru/ Contact person: Alexander Khanin. E-mail: a.khanin@visionlabs.ru Tel.: +7 (926) 988-7891

  6. What is Computer Vision?

  7. What is Computer Vision?

  8. What is the recent progress? Industry: 1990s: automated quality inspection (controlled lighting, scale, …); now: face recognition in social media. Research: 1990s: recognition at the level of a few toy objects (COIL-20 dataset); now: ImageNet with 14M images and 21K classes, 6% top-5 error rate in the 2014 Challenge

  9. Why image and video analysis? Data: • ~2.5 billion new images / month • TV channels recorded since the 60's • ~5K image uploads every min. • >34K hours of video uploaded every day • ~30M surveillance cameras in US => ~700K video hours/day • And even more with future wearable devices

  10. Why looking at people? How many person-pixels are in the video? Movies TV YouTube

  11. Why looking at people? How many person-pixels are in the video? Movies: 35%, TV: 34%, YouTube: 40%

  12. How many person pixels in our daily life? • Wearable camera data: Microsoft SenseCam dataset

  13. How many person pixels in our daily life? • Wearable camera data: Microsoft SenseCam dataset: ~4%

  14. What are the difficulties? • Large variations in appearance: occlusions, non-rigid motion, viewpoint changes, clothing… • Manual collection of training samples is prohibitive: many action classes, rare occurrence • Action vocabulary is not well-defined. Example actions shown: Hugging, Open

  15. This talk: • Brief overview of recent techniques • Weakly-supervised learning from video and scripts • Weakly-supervised learning with convolutional neural networks

  16. Standard visual recognition pipeline • Collect image/video samples and corresponding class labels (e.g. GetOutCar, AnswerPhone, HandShake, StandUp, DriveCar, Kiss) • Design appropriate data representation, with certain invariance properties • Design / use existing machine learning methods for learning and classification

  17. Bag-of-Features action recognition: (1) extraction of local features from space-time patches; (2) feature description; (3) feature quantization by K-means clustering (k=4000); (4) occurrence histogram of visual words; (5) non-linear SVM with χ² kernel [Laptev, Marszałek, Schmid, Rozenfeld 2008]
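The quantization, histogram and kernel steps of the bag-of-features slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation from [Laptev, Marszałek, Schmid, Rozenfeld 2008]: the vocabulary is assumed to be already learned (for example by K-means), the toy data uses 3 visual words in 2-D instead of k=4000, and the function names and the `gamma` scaling of the χ² kernel are assumptions introduced here.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[k])),
    )

def bof_histogram(descriptors, vocabulary):
    """L1-normalized occurrence histogram of visual words for one video."""
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def chi2_distance(h1, h2):
    """Chi-square distance: 0.5 * sum((a - b)^2 / (a + b)) over non-empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel; gamma is an assumed scaling parameter."""
    return math.exp(-gamma * chi2_distance(h1, h2))

# Toy data (hypothetical): a 3-word vocabulary and two videos with 4 descriptors each.
vocabulary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
video_a = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]]
video_b = [[0.0, 0.1], [0.1, 0.1], [0.0, 0.9], [0.1, 1.2]]

hist_a = bof_histogram(video_a, vocabulary)
hist_b = bof_histogram(video_b, vocabulary)
similarity = chi2_kernel(hist_a, hist_b)
```

The kernel values computed for all pairs of training videos form a Gram matrix, which is the usual way to hand a custom kernel to an SVM solver that accepts precomputed kernels.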

  18. Action classification Test episodes from movies “The Graduate”, “It’s a Wonderful Life”, “Indiana Jones and the Last Crusade”

  19. Where to get training data? • Shoot actions in the lab: KTH dataset, Weizmann dataset, … - Limited variability - Unrealistic • Manually annotate existing content: HMDB, Olympic Sports, UCF50, UCF101, … - Very time-consuming • Use readily-available video scripts - Scripts are available for 1000's of hours of movies and TV series: www.dailyscript.com, www.movie-page.com, www.weeklyscript.com - Scripts describe dynamic and static content of videos

  20. As the headwaiter takes them to a table they pass by the piano, and the woman looks at Sam. Sam, with a conscious effort, keeps his eyes on the keyboard as they go past. The headwaiter seats Ilsa...

  24. Script-based video annotation • Scripts available for >500 movies (no time synchronization): www.dailyscript.com, www.movie-page.com, www.weeklyscript.com … • Subtitles (with time info.) are available for most movies • Can transfer time to scripts by text alignment. Subtitles: … 1172, 01:20:17,240 --> 01:20:20,437: "Why weren't you honest with me? Why'd you keep your marriage a secret?"; 1173, 01:20:20,640 --> 01:20:23,598: "It wasn't my secret, Richard. Victor wanted it that way."; 1174, 01:20:23,800 --> 01:20:26,189: "Not even our closest friends knew about our marriage." … Movie script: … RICK: "Why weren't you honest with me? Why did you keep your marriage a secret?" Rick sits down with Ilsa. (transferred times: 01:20:17, 01:20:23) ILSA: "Oh, it wasn't my secret, Richard. Victor wanted it that way. Not even our closest friends knew about our marriage." … [Laptev, Marszałek, Schmid, Rozenfeld 2008]
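The idea of transferring subtitle times to the script by text alignment can be illustrated with a small sketch built on Python's standard `difflib.SequenceMatcher`. This is not the alignment procedure of the cited work: the word-level matching, the `align_script` function, the `(kind, text)` script representation and the rule that brackets a scene description by the neighbouring speech intervals are all assumptions made for illustration. The example data is the Casablanca excerpt from the slide, with times converted to seconds.

```python
import re
from difflib import SequenceMatcher

def words(text):
    """Lowercased word tokens with punctuation stripped."""
    return re.findall(r"[a-z']+", text.lower())

def align_script(subtitles, script):
    """subtitles: list of (start_s, end_s, text).
    script: list of (kind, text), kind is 'speech' or 'scene'.
    Returns one (start_s, end_s) interval, or None, per script element."""
    sub_words, sub_owner = [], []
    for i, (_, _, text) in enumerate(subtitles):
        for w in words(text):
            sub_words.append(w)
            sub_owner.append(i)
    scr_words, scr_owner = [], []
    for j, (kind, text) in enumerate(script):
        if kind == "speech":
            for w in words(text):
                scr_words.append(w)
                scr_owner.append(j)

    spans = [None] * len(script)
    matcher = SequenceMatcher(None, scr_words, sub_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            j = scr_owner[block.a + k]
            start, end, _ = subtitles[sub_owner[block.b + k]]
            if spans[j] is None:
                spans[j] = (start, end)
            else:
                spans[j] = (min(spans[j][0], start), max(spans[j][1], end))

    # Assumption: a scene description happens somewhere between the start of
    # the previous aligned speech and the end of the next aligned speech.
    for j, (kind, _) in enumerate(script):
        if kind != "scene":
            continue
        prev = next((spans[i] for i in range(j - 1, -1, -1)
                     if script[i][0] == "speech" and spans[i]), None)
        nxt = next((spans[i] for i in range(j + 1, len(script))
                    if script[i][0] == "speech" and spans[i]), None)
        if prev and nxt:
            spans[j] = (prev[0], nxt[1])
    return spans

subtitles = [
    (4817.24, 4820.437, "Why weren't you honest with me? Why'd you keep your marriage a secret?"),
    (4820.64, 4823.598, "It wasn't my secret, Richard. Victor wanted it that way."),
    (4823.8, 4826.189, "Not even our closest friends knew about our marriage."),
]
script = [
    ("speech", "Why weren't you honest with me? Why did you keep your marriage a secret?"),
    ("scene", "Rick sits down with Ilsa."),
    ("speech", "Oh, it wasn't my secret, Richard. Victor wanted it that way. "
               "Not even our closest friends knew about our marriage."),
]
spans = align_script(subtitles, script)
```

The interval obtained for the scene description is deliberately loose, which mirrors the imprecise temporal localization that makes scripts a weak form of supervision.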

  25. Scripts as weak supervision. Challenges: • Imprecise temporal localization (uncertainty, e.g. 24:25 to 24:51) • No explicit spatial localization • NLP problems, scripts ≠ training labels: “… Will gets out of the Chevrolet. …” and “… Erin exits her new truck…” vs. Get-out-car

  26. Previous work: • Sivic, Everingham, and Zisserman, "Who are you?" -- Learning Person Specific Classifiers from Video, In CVPR 2009. • Buehler, Everingham, and Zisserman, "Learning sign language by watching TV (using weakly aligned subtitles)", In CVPR 2009 (subtitle example: "…wanted to know about the history of the trees"). • Duchenne, Laptev, Sivic, Bach and Ponce, "Automatic Annotation of Human Actions in Video", In ICCV 2009.

  27. Joint Learning of Actors and Actions [Bojanowski et al. ICCV 2013]. Script: "Rick walks up behind Ilsa". Candidate labels for each person in the frame: Rick? Walks?

  28. Joint Learning of Actors and Actions [Bojanowski et al. ICCV 2013]. Script: "Rick walks up behind Ilsa". Assigned labels: Rick, Walks

  29. Formulation: Cost function. Terms: actor labels, actor image features, actor classifier. Example actors: Rick, Ilsa, Sam

  30. Formulation: Cost function Weak supervision from scripts: Person p appears at least once in clip N : p = Rick

  31. Formulation: Cost function Weak supervision from scripts: Action a appears at least once in clip N : a = Walk

  32. Formulation: Cost function. Weak supervision from scripts: • Person p appears in clip N • Action a appears in clip N • Person p and Action a appear in clip N
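The three kinds of script constraints listed on the formulation slides can be read as "at least once" conditions on binary assignment matrices. The sketch below is a hedged illustration of that reading, not the cost function of [Bojanowski et al. ICCV 2013], whose exact form is not recoverable from these slides. The matrices `Z` and `T`, the function `satisfies_script_constraint` and the toy indices are hypothetical names introduced here.

```python
def satisfies_script_constraint(Z, T, clip_tracks, person=None, action=None):
    """Check one script constraint on a clip.

    Z[n][p] = 1 if person track n is assigned to person p (assumed encoding).
    T[n][a] = 1 if person track n is assigned to action a (assumed encoding).
    clip_tracks: indices of the tracks that belong to the clip.
    """
    if person is not None and action is not None:
        # Person p and action a must co-occur on at least one track.
        return sum(Z[n][person] * T[n][action] for n in clip_tracks) >= 1
    if person is not None:
        # Person p appears at least once in the clip.
        return sum(Z[n][person] for n in clip_tracks) >= 1
    if action is not None:
        # Action a appears at least once in the clip.
        return sum(T[n][action] for n in clip_tracks) >= 1
    return True

# Toy clip with three person tracks (hypothetical labelling).
RICK, ILSA, SAM = 0, 1, 2
WALK, OTHER = 0, 1
Z = [[1, 0, 0],   # track 0 is Rick
     [0, 1, 0],   # track 1 is Ilsa
     [0, 0, 1]]   # track 2 is Sam
T = [[0, 1],      # track 0: other
     [1, 0],      # track 1: walk
     [0, 1]]      # track 2: other
clip = [0, 1, 2]

rick_present = satisfies_script_constraint(Z, T, clip, person=RICK)
walk_present = satisfies_script_constraint(Z, T, clip, action=WALK)
rick_walks = satisfies_script_constraint(Z, T, clip, person=RICK, action=WALK)
```

In this toy labelling both individual constraints hold, yet the joint constraint for "Rick walks" fails because the walking track is assigned to Ilsa. This is why the joint person-and-action constraint carries more information than the two separate ones.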

  33. Image and video features. Face features: • Facial features [Everingham’06] • HOG descriptor on normalized face image. Action features: • Dense Trajectory features in person bounding box [Wang et al.,’11]

  34. Results for Person Labelling: American Beauty (11 character names), Casablanca (17 character names)

  35. Results for Person + Action Labelling: Casablanca, Walking

  36. Finding Actions and Actors in Movies [Bojanowski, Bach, Laptev, Ponce, Sivic, Schmid, 2013]

  37. Action Learning with Ordering Constraints [Bojanowski et al. ECCV 2014]
