
VCI2R at the NTCIR-13 Lifelog-2 LSAT Task. Presented by: Qianli Xu



  1. VCI2R at the NTCIR-13 Lifelog-2 LSAT Task. Presented by: Qianli Xu. Co-authors: Jie Lin, Ana del Molino, Qianli Xu, Fen Fang, V. Subbaraju, Joo-Hwee Lim, Liyuan Li, V. Chandrasekhar. Organization: Institute for Infocomm Research, A*STAR, Singapore.

  2. About VCI2R
     • Institute for Infocomm Research (I2R), A*STAR, Singapore
       – Visual Computing
       – Human Language Tech
       – Data Analytics
       – Neural Biomedical Tech
       – etc.
     • Visual Computing Department
       – Video/image analytics & search
       – Augmented visual intelligence
       – Visual inspection
     Website: www.a-star.edu.sg/i2r/

  3. LSAT Framework
     A semantic gap separates the query topics (e.g. “Castle @ Night”, “Working in a coffee shop”, “Gardening in my home”) from the images and metadata. The framework addresses it with four components:
     • Relevant concepts: What are the CNN predictions relevant to query topics?
     • Feature weighting: Which features contribute the most?
     • Temporal smoothing: Temporal coherence, remove outliers
     • Post filtering: refine search using location (GPS) and time
     [Diagram: an offline stage in which training images yield feature weights and relevant concepts, and an online stage in which NTCIR-13 lifelog images are scored against a query topic from weighted features (w1 to w7) and then temporally smoothed. Feature sources: CNN object classifier, CNN places classifier, Faster RCNN object detector, NTCIR-13 lifelog classifier, time tag, location tag, # people. Additional label: User-given.]
     del Molino, et al., 2017, VC-I2R at ImageCLEF2017: Ensemble of deep learned features for lifelog video summarization. CLEF Working Notes, CEUR.

  4. 1. Getting the Basic Semantics
     • CNN classifiers
       – Object: ResNet152, ImageNet1K
       – Place: ResNet152, Places365
     • CNN detector
       – Faster R-CNN, MSCOCO (80 classes)
     • NTCIR-13 classifier
       – VGG-16, ImageNet1K
       – Replace the last layer (1K neurons) with 634 neurons
       – Sigmoid as the activation function
     • Human detection and counting
       – Sighthound (https://www.sighthound.com)
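The choice of a sigmoid on the replaced last layer can be illustrated with a minimal sketch. Unlike a softmax, a sigmoid scores each concept independently, so several of the 634 concepts can be active for one image. The concept names, the logits, and the 0.5 threshold below are assumptions for illustration, not values from the slides.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def concept_activations(logits):
    """Map raw last-layer outputs to independent per-concept scores in (0, 1)."""
    return [sigmoid(v) for v in logits]

def active_concepts(logits, names, threshold=0.5):
    """Return the names of concepts whose sigmoid score reaches the threshold."""
    scores = concept_activations(logits)
    return [n for n, s in zip(names, scores) if s >= threshold]

# Hypothetical logits for three of the 634 output neurons.
names = ["computer", "coffee", "garden"]
logits = [2.0, 0.5, -3.0]
print(active_concepts(logits, names))  # ['computer', 'coffee']
```

Because the scores are not forced to sum to one, an image showing both a computer and a coffee cup can activate both concepts at once.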

  5. 2. Aggregating & Weighting Features
     Relevance mapping for each topic. Columns: Task; Objects (Relevant, Avoid); Places (Relevant, Avoid); MSCOCO (Relevant). Entries per task, in slide order:
     • Task 1: computer, -, computer, -, laptop, group meeting, group meeting, keyboard, etc.
     • Task 2: television, computer, living room, conference room, tv, food, group meeting, television room, lecture room, remote, glass, etc., etc., etc.
     • Task 3: computer, office, coffee shop, conference room, laptop, group meeting, living room, office, keyboard, etc., etc.
     • Task 4: computer, office, living room, conference room, laptop, pencil, hotel room, office, book, notebook, etc., etc., etc.
     • Task 5: food, drum, food court, -, fork, glass, white goods, restaurant, sandwich, menu, etc., etc.
     [Diagram: training images yield a feature weight and relevant concepts for each feature source. Sources: ImageNet1K, Places365, MSCOCO, NTCIR, Time, # People, Location tag. Weights: w1 to w12.]
     CRF for feature weighting that accommodates individual differences:
     $$E_\theta(s) = \lambda \underbrace{\sum_i \phi_u(s_i)}_{\text{unary}} + \underbrace{\sum_{ij} \phi_p(s_i, s_j)}_{\text{pairwise}},$$
     the unary potentials enforce the selection of static
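The idea of combining relevant and avoided concepts across weighted feature sources can be sketched as follows. This is not the CRF-based weighting named on the slide: it is a plain weighted sum with fixed weights, and the scoring rule (mean relevant activation minus mean avoided activation), the weights, the activations, and the assignment of concepts to sources are all assumptions for illustration.

```python
def source_score(activations, relevant, avoid):
    """Mean activation of relevant concepts minus mean activation of avoided ones."""
    rel = [activations.get(c, 0.0) for c in relevant]
    avo = [activations.get(c, 0.0) for c in avoid]
    pos = sum(rel) / len(rel) if rel else 0.0
    neg = sum(avo) / len(avo) if avo else 0.0
    return pos - neg

def image_score(features, mapping, weights):
    """Weighted sum of per-source scores for one image and one query topic."""
    total = 0.0
    for source, w in weights.items():
        relevant, avoid = mapping.get(source, ([], []))
        total += w * source_score(features.get(source, {}), relevant, avoid)
    return total

# Hypothetical mapping for a "working in a coffee shop" topic.
mapping = {
    "Places365": (["coffee shop"], ["conference room"]),
    "MSCOCO": (["laptop"], []),
}
weights = {"Places365": 0.6, "MSCOCO": 0.4}  # assumed, not learned
features = {
    "Places365": {"coffee shop": 0.8, "conference room": 0.1},
    "MSCOCO": {"laptop": 0.5},
}
print(round(image_score(features, mapping, weights), 2))  # 0.62
```

In this sketch an avoided concept lowers the score only within its own source, so a strong "conference room" activation cannot cancel evidence from the MSCOCO detector.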

  6. 3. Temporal Smoothing
     • Adjacent lifelog images may belong to the same or a similar event.
     • Temporal smoothing is used to ensure semantic coherence.
     • A triangular window of size w is used; w is adapted to the event topic.
     4. Post-filtering
     • Increase the diversity of retrieved images (avoid retrieving images of the same event).
     • Use time and location (GPS) to filter images.
     • Exclude images that are close to one another in time and location.
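Both steps on this slide can be sketched in a few lines. The sketch below assumes an odd window size w, per-image relevance scores in temporal order, and planar coordinates in metres instead of raw GPS; the greedy filter, the tuple layout, and the two thresholds are hypothetical choices, not details given in the slides.

```python
import math

def triangular_smooth(scores, w):
    """Smooth scores with a triangular window spanning w images (w odd)."""
    half = w // 2
    out = []
    for i in range(len(scores)):
        num = den = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(scores):
                weight = half + 1 - abs(k)  # peak at the centre, linear decay
                num += weight * scores[j]
                den += weight
        out.append(num / den)  # renormalise at the sequence borders
    return out

def post_filter(ranked, min_gap_s, min_dist_m):
    """Greedy diversity filter over images ranked best-first.

    Each image is (id, timestamp_s, x_m, y_m). An image is dropped when it is
    close in BOTH time and location to an image that was already kept.
    """
    kept = []
    for img in ranked:
        _, t, x, y = img
        near = any(
            abs(t - kt) < min_gap_s and math.hypot(x - kx, y - ky) < min_dist_m
            for _, kt, kx, ky in kept
        )
        if not near:
            kept.append(img)
    return kept

print(triangular_smooth([0, 0, 1, 0, 0], 3))  # [0.0, 0.25, 0.5, 0.25, 0.0]

ranked = [("a", 0, 0, 0), ("b", 60, 5, 0), ("c", 7200, 3, 0), ("d", 90, 5000, 0)]
print([img[0] for img in post_filter(ranked, 600, 100)])  # ['a', 'c', 'd']
```

An isolated spike is spread over its neighbours and reduced in height, which is how smoothing suppresses outliers; image "b" is dropped because it was taken a minute after "a" at nearly the same place.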

  7. Result
     • Official score (precision): 57.6%
     [Figure: mAP (0 to 1) per query topic for User 1 and User 2. Topics: Eat Lunch, Coffee, Graveyard, Working Late, Juice, Work w Coffee, Painting Walls, Eating Pasta, Exercises, Turtles, Gardening, Castle at Night, Sunset, Lecturing, Shopping, On Computer, Cooking, Flying, Photo of Sea, Beers in Bar, Greek Amphit, TV Recording, Mountain Hiking.]

  8. Analysis (Fine-tuning)
     [Figure: three panels showing mAP for User 1 and User 2. Data labels recovered from the charts: 0.826, 0.789, 0.761, 0.748, 0.761, 0.654, 0.528, 0.543, 0.502, 0.528.]
     • Feature importance (All, −NTCIR-13, −ImageNet1K, −Places365, −MSCOCO, −Location, −Time, −#People): decrease in performance when one type of feature is removed. The bigger the decrease, the more important the feature.
     • Effect of threshold for relevant concept searching (Fixed, Adaptive (User), Adaptive (User + Event)): semantic concepts whose activation level is above the threshold are considered relevant to the query topic.
     • Effect of temporal smoothing (No smoothing, Temporal smoothing): whether temporal smoothing is performed or not.
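The feature-importance analysis is a leave-one-out ablation, which can be sketched as below. The `evaluate` callback stands in for a full retrieval run that returns mAP; the additive toy evaluator and its numbers are invented for illustration and are not the results reported on the slide.

```python
def feature_importance(evaluate, features):
    """Return {feature: score drop when that feature alone is removed}."""
    features = frozenset(features)
    full = evaluate(features)
    return {f: full - evaluate(features - {f}) for f in sorted(features)}

# Toy stand-in for a retrieval run: each feature adds a fixed, made-up gain.
toy_gain = {"NTCIR-13": 0.30, "Places365": 0.20, "Time": 0.05}

def toy_evaluate(active):
    """Hypothetical score of a system using only the active features."""
    return sum(toy_gain[f] for f in active)

importance = feature_importance(toy_evaluate, toy_gain)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)  # ['NTCIR-13', 'Places365', 'Time']
```

With a real evaluator the drops need not be additive, since features can be redundant with one another; a small drop then means the feature is replaceable, not necessarily uninformative.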

  9. Summary
     • A lot of fine-tuning and manual intervention are involved in the retrieval → Over-fitting?
     • “Relevant” concepts may not be contributing, and vice versa.
     • Interactive retrieval is probably a good intermediate solution.
     [Diagram labels: Reasonable Ground Truth; Intelligence in Interpretation of Query Topics; Good Semantic Features; High Quality Data; Intelligence in Model Fine-tuning; Effective Lifelog Image Retrieval; LIT.]
     Email: qxu@i2r.a-star.edu.sg
