VCI2R at the NTCIR-13 Lifelog-2 LSAT Task

Presented by: Qianli Xu
Co-authors: Jie Lin, Ana del Molino, Qianli Xu, Fen Fang, V. Subbaraju, Joo-Hwee Lim, Liyuan Li, V. Chandrasekhar
Organization: Institute for Infocomm Research, A*STAR, Singapore

About VCI2R
• Institute for Infocomm Research (I2R), A*STAR, Singapore
  – Visual Computing
  – Human Language Tech
  – Data Analytics
  – Neural Biomedical Tech
  – etc.
• Visual Computing Department
  – Video/image analytics & search
  – Augmented visual intelligence
  – Visual inspection
Website: www.a-star.edu.sg/i2r/

LSAT Framework
Example query topics: “Castle @ Night”, “Working in a coffee shop”, “Gardening in my home”
[Diagram: a semantic gap separates the query topics from the images + metadata. Offline: training images → CNN predictions → relevant concepts and feature weights. Online: query topic + NTCIR-13 Lifelog images → weighted features (w1 ... w7: CNN object classifier, CNN places classifier, Faster R-CNN object detector, NTCIR-13 classifier, user-given time tag, location tag, # people) → temporal smoothing → post filtering.]
• Relevant concepts: What are the CNN predictions relevant to query topics?
• Feature weighting: Which features contribute the most?
• Temporal smoothing: Temporal coherence, remove outliers
• Post filtering: refine search using location (GPS) and time
del Molino, et al., 2017, VC-I2R at ImageCLEF2017: Ensemble of deep learned features for lifelog video summarization. CLEF Working Notes, CEUR.

1. Getting the Basic Semantics
• CNN classifiers
  – Object: ResNet152 – ImageNet1K
  – Place: ResNet152 – Places365
• CNN detector
  – Faster R-CNN – MSCOCO (80 categories)
• NTCIR-13 classifier
  – VGG-16 – ImageNet1K
  – Replace the last layer (1K neurons) with 634 neurons
  – Sigmoid as the activation function
• Human detection and counting
  – Sighthound (https://www.sighthound.com)
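The NTCIR-13 classifier bullet describes swapping the 1K-way output layer for 634 sigmoid units. The sketch below is not the authors' network: it is a toy, framework-free stand-in that only illustrates what such a head computes, namely one independent probability per concept rather than a softmax distribution. The feature width, the random initialisation, and the names `make_head` and `concept_probabilities` are assumptions made for illustration.

```python
import math
import random

NUM_CONCEPTS = 634  # from the slide: the new last layer has 634 neurons


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_head(num_features, num_concepts=NUM_CONCEPTS, seed=0):
    """Randomly initialised linear layer; a stand-in for a trained one."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(num_features)]
               for _ in range(num_concepts)]
    biases = [0.0] * num_concepts
    return weights, biases


def concept_probabilities(features, weights, biases):
    """One independent probability per concept (multi-label output)."""
    return [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, biases)]
```

Because each unit is squashed separately, several concepts can be active for the same image, which suits lifelog frames that show many things at once.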

2. Aggregating & Weighting Features

Relevance mapping for each topic (first entry per column shown; the slide lists further entries per task whose column is not recoverable)
Task | Objects: Relevant | Objects: Avoid | Places: Relevant | Places: Avoid | MSCOCO: Relevant
1 | computer | - | computer | - | laptop
  further entries: group meeting, group meeting, keyboard, etc.
2 | television | computer | living room | conference room | tv
  further entries: food, group meeting, television room, lecture room, remote, glass, etc.
3 | computer | office | coffee shop | conference room | laptop
  further entries: group meeting, living room, office, keyboard, etc.
4 | computer | office | living room | conference room | laptop
  further entries: pencil, hotel room, office, book, notebook, etc.
5 | food | drum | food court | - | fork
  further entries: glass, white goods, restaurant, sandwich, menu, etc.

[Diagram: training images yield relevant concepts from ImageNet1K, Places365, MSCOCO and NTCIR, plus Time, # People and Location tag; each carries a feature weight (w1 ... w12).]

CRF for feature weighting that accommodates individual differences

$$E_\theta(s) = \lambda \underbrace{\sum_i \phi_u(s_i)}_{\text{unary}} + \underbrace{\sum_{ij} \phi_p(s_i, s_j)}_{\text{pairwise}},$$

the unary potentials enforce the selection of static …
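As a rough illustration of how a relevance mapping and per-feature weights could be combined into one score per image, here is a minimal sketch. It is not the CRF on the slide: it is a plain weighted sum, and the rule that an active "avoid" concept zeroes a source's contribution is an assumption, as are the weight values, the `avoid_threshold` parameter, and the function names.

```python
def source_relevance(activations, relevant, avoid, avoid_threshold=0.5):
    """Relevance of one feature source (e.g. Places) for one image.

    activations: dict mapping concept name -> score in [0, 1].
    Assumption: any 'avoid' concept at or above the threshold vetoes the source.
    """
    if any(activations.get(c, 0.0) >= avoid_threshold for c in avoid):
        return 0.0
    return max((activations.get(c, 0.0) for c in relevant), default=0.0)


def image_score(per_source_activations, mapping, weights):
    """Weighted sum of per-source relevance values.

    mapping: source -> (relevant concepts, avoid concepts)
    weights: source -> weight (hypothetical values, not from the slides)
    """
    total = 0.0
    for source, weight in weights.items():
        relevant, avoid = mapping.get(source, ((), ()))
        acts = per_source_activations.get(source, {})
        total += weight * source_relevance(acts, relevant, avoid)
    return total


# Task 3 from the relevance mapping, first entry per column only.
TASK3 = {
    "objects": (("computer",), ("office",)),
    "places": (("coffee shop",), ("conference room",)),
    "mscoco": (("laptop",), ()),
}
WEIGHTS = {"objects": 0.3, "places": 0.5, "mscoco": 0.2}  # illustrative only
```

With these inputs, an image showing a computer in a coffee shop scores high, while one showing a computer in a conference room keeps only its MSCOCO contribution.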

3. Temporal Smoothing
• Adjacent lifelog images may share a similar event.
• Temporal smoothing is used to ensure semantic coherence.
• A triangular window of size w is used. w is adaptive to event topics.

4. Post-filtering
• Increase diversity of retrieved images (avoid retrieving images of the same event)
• Use time and location (GPS) to filter images
• Exclude images that are close in time and location.
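The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the window is taken to be odd-sized and renormalised at sequence boundaries, the post-filter is a greedy pass over images sorted by score, and the 10-minute and 100-metre cut-offs, the dictionary keys, and all function names are hypothetical.

```python
import math


def triangular_weights(w):
    """Normalised triangular kernel of odd size w, peaking at the centre."""
    half = w // 2
    raw = [half + 1 - abs(k) for k in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def smooth_scores(scores, w=5):
    """Smooth a time-ordered list of relevance scores with a triangular window."""
    if w < 1 or w % 2 == 0:
        raise ValueError("window size w must be a positive odd integer")
    half = w // 2
    kernel = triangular_weights(w)
    smoothed = []
    for i in range(len(scores)):
        acc, norm = 0.0, 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(scores):  # renormalise near the boundaries
                acc += kernel[k + half] * scores[j]
                norm += kernel[k + half]
        smoothed.append(acc / norm)
    return smoothed


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS points."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def post_filter(images, min_gap_s=600, min_dist_m=100):
    """Greedily keep top-scoring images, dropping any image that is close in
    both time and location to one already kept.

    images: list of dicts with keys 'id', 'score', 't' (seconds), 'lat', 'lon'.
    """
    kept = []
    for img in sorted(images, key=lambda d: d["score"], reverse=True):
        near = any(
            abs(img["t"] - k["t"]) < min_gap_s
            and haversine_m(img["lat"], img["lon"], k["lat"], k["lon"]) < min_dist_m
            for k in kept
        )
        if not near:
            kept.append(img)
    return kept
```

Smoothing pulls an isolated high score toward its neighbours (outlier removal), while the post-filter keeps one representative per moment so the returned list covers more distinct events.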

Result
• Official score (precision): 57.6%
[Figure: bar chart of mAP (0 to 1) per query topic for User 1 and User 2. Topics: Eat Lunch, Coffee, Graveyard, Working Late, Juice, Work w Coffee, Painting Walls, Eating Pasta, Exercises, Turtles, Gardening, Castle at Night, Sunset, Lecturing, Shopping, On Computer, Cooking, Flying, Photo of Sea, Beers in Bar, Greek Amphit, TV Recording, Mountain Hiking]

Analysis (Fine-tuning)
[Figure: three bar charts of mAP for User 1 and User 2. Bar labels on the slide: 0.826, 0.789, 0.761, 0.748, 0.761, 0.654, 0.528, 0.543, 0.502, 0.528.]
• Feature importance (bars: All, −NTCIR-13, −ImageNet1K, −Places365, −MSCOCO, −Location, −Time, −#People): decrease in performance when one type of feature is removed. The bigger the decrease, the more important the feature.
• Effect of threshold for relevant concept searching (bars: Fixed, Adaptive (User), Adaptive (User + Event)): semantic concepts whose activation level is above the threshold are considered relevant to the query topic.
• Effect of temporal smoothing (bars: No smoothing, Temporal smoothing): whether temporal smoothing is performed or not.
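The threshold rule in the analysis can be sketched in a few lines. The slide does not say how the adaptive thresholds are obtained, so the lookup below (most specific threshold wins, falling back from user + event to user to a fixed value) is an assumption, as are the function names and all numeric values.

```python
def select_threshold(fixed, user=None, event=None, per_user=None, per_user_event=None):
    """Return the most specific threshold available, else the fixed one."""
    if per_user_event and (user, event) in per_user_event:
        return per_user_event[(user, event)]
    if per_user and user in per_user:
        return per_user[user]
    return fixed


def relevant_concepts(activations, threshold):
    """Concepts whose activation level is above the threshold, strongest first."""
    hits = [(c, a) for c, a in activations.items() if a > threshold]
    return [c for c, _ in sorted(hits, key=lambda ca: ca[1], reverse=True)]
```

A lower threshold admits more concepts for a topic and a higher one fewer, which is the trade-off the "Fixed" versus "Adaptive" comparison explores.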

Summary
• A lot of fine-tuning and manual intervention are involved in the retrieval → Over-fitting?
• “Relevant” concepts may not be contributing, and vice versa.
• Interactive retrieval is probably a good intermediate solution.
[Diagram labels: Effective Lifelog Image Retrieval; Reasonable Ground Truth; Intelligence in Interpretation of Query Topics; Good Semantic Features; High Quality Data; Intelligence in Model Fine-tuning]
Email: qxu@i2r.a-star.edu.sg

