  1. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation
     J. Shotton*, J. Winn†, C. Rother† and A. Criminisi†
     * University of Cambridge
     † Microsoft Research Ltd, Cambridge, UK

  2. Introduction
     - Simultaneous recognition and segmentation
       - Explain every pixel (dense features)
       - Appearance + shape + context
       - Exploit class generalities + image specifics
     - Contributions
       - New low-level features
       - New texture-based discriminative model
       - Efficiency and scalability
     [Figure: example results]

  3. Structure of Presentation
     - The MSRC 21-Class Object Recognition Database
     - New 'Shape Filter' features
     - Randomised boosting with shared features
     - Adapting to the Pascal VOC Challenge

  4. Image Databases
     - MSRC 21-Class Object Recognition Database
       - 591 hand-labelled images (45% train, 10% validation, 45% test)
     - Corel (7-class) and Sowerby (7-class) [He et al. CVPR 04]

  5. Sparse vs. Dense Features
     - Successes using sparse features, e.g. [Sivic et al. ICCV 2005], [Fergus et al. ICCV 2005], [Leibe et al. CVPR 2005]
     - But...
       - they do not explain the whole image
       - they cannot cope well with all object classes
     - We use dense features
       - 'shape filters'
       - local texture-based image descriptions
     - Cope with
       - textured and untextured objects, occlusions,
       - whilst retaining high efficiency
     [Figure: problem images for sparse features?]

  6. Textons
     - Shape filters use texton maps [Varma & Zisserman IJCV 05], [Leung & Malik IJCV 01]
     - Compact and efficient characterisation of local texture
     [Figure: input image, filter bank, clustering, texton map (colours indicate texton indices)]
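A minimal sketch of how a texton map can be built, assuming a single-channel image and a small hypothetical filter bank (Gaussians, derivatives of Gaussian and a finite-difference Laplacian of Gaussian at a few scales). The slide does not give the actual filter bank or colour space, so these choices are illustrative only. Filter responses are clustered with plain k-means, and every pixel is then labelled with the index of its nearest cluster centre (its texton).

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalised to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur(img, sigma):
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    p = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def filter_bank(img):
    """Per-pixel responses of a small hypothetical filter bank: H x W x 10."""
    feats = [blur(img, s) for s in (1, 2, 4)]              # Gaussians
    for s in (2, 4):                                       # derivatives of Gaussian
        gy, gx = np.gradient(blur(img, s))
        feats += [gx, gy]
    for s in (1, 2, 4):                                    # Laplacian of Gaussian (finite differences)
        g = blur(img, s)
        gyy = np.gradient(np.gradient(g, axis=0), axis=0)
        gxx = np.gradient(np.gradient(g, axis=1), axis=1)
        feats.append(gxx + gyy)
    return np.stack(feats, axis=-1)


def nearest(X, centres):
    """Index of the closest centre (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centres returned are the textons."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(X, centres)
        for j in range(k):
            members = X[labels == j]
            if len(members):                               # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def texton_map(img, centres):
    """Label every pixel with the index of its nearest texton."""
    F = filter_bank(img)
    H, W, D = F.shape
    return nearest(F.reshape(-1, D), centres).reshape(H, W)


# Usage on a random toy image (real textons would be learned from training images).
rng = np.random.default_rng(1)
img = rng.random((24, 24))
F = filter_bank(img)
centres = kmeans(F.reshape(-1, F.shape[-1]), k=8)
tmap = texton_map(img, centres)
```

In practice the clustering step would run once over responses pooled from many training images; `texton_map` is then applied to each new image with the fixed centres.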

  7. Shape Filters
     - Pair: (r, t), with rectangle r and texton t
     - Feature responses v(i, r, t)
     - Integral images
     [Figure: example responses v(i1, r, t) = a, v(i2, r, t) = 0, v(i3, r, t) = a/2; panels labelled 'appearance' and 'context']
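The example responses on the slide are consistent with v(i, r, t) counting the pixels of texton t inside rectangle r placed relative to pixel i, with a the area of r. The sketch below assumes that reading and a hypothetical (dy0, dx0, dy1, dx1) half-open offset convention. Keeping one integral image per texton makes every response four lookups, whatever the rectangle size.

```python
import numpy as np


def texton_integral_images(tmap, n_textons):
    """One integral image per texton, zero-padded so that ii[t, y, x] counts the
    pixels with texton t in rows [0, y) and columns [0, x)."""
    H, W = tmap.shape
    ii = np.zeros((n_textons, H + 1, W + 1), dtype=np.int64)
    for t in range(n_textons):
        ii[t, 1:, 1:] = (tmap == t).cumsum(axis=0).cumsum(axis=1)
    return ii


def shape_filter_response(ii, i, rect, t):
    """v(i, r, t): number of texton-t pixels inside rectangle r offset from pixel i.
    rect = (dy0, dx0, dy1, dx1) are half-open offsets; parts outside the image are clipped."""
    _, Hp1, Wp1 = ii.shape
    y, x = i
    dy0, dx0, dy1, dx1 = rect
    y0, y1 = np.clip([y + dy0, y + dy1], 0, Hp1 - 1)
    x0, x1 = np.clip([x + dx0, x + dx1], 0, Wp1 - 1)
    if y1 <= y0 or x1 <= x0:
        return 0
    # Four lookups, independent of the rectangle's size.
    return int(ii[t, y1, x1] - ii[t, y0, x1] - ii[t, y1, x0] + ii[t, y0, x0])


# Toy texton map: texton 1 fills the top-left 4 x 4 block, texton 0 elsewhere.
tmap = np.zeros((10, 10), dtype=int)
tmap[0:4, 0:4] = 1
ii = texton_integral_images(tmap, n_textons=2)
r = (-2, -2, 2, 2)                                # 4 x 4 rectangle around i, so a = 16
print(shape_filter_response(ii, (2, 2), r, t=1))  # 16: rectangle entirely on texton 1 (= a)
print(shape_filter_response(ii, (8, 8), r, t=1))  # 0: no texton 1 under the rectangle
print(shape_filter_response(ii, (2, 4), r, t=1))  # 8: half the rectangle on texton 1 (= a/2)
```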

  8. Shape and Appearance
     [Figure: texton map (textons t0 to t4) and ground truth; shape filters (r1, t1) and (r2, t2); feature response images v(i, r1, t1) and v(i, r2, t2)]

  9. Shape and Appearance
     [Figure: texton map and ground truth; shape filters (r1, t1) and (r2, t2); summed response image v(i, r1, t1) + v(i, r2, t2)]
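A sketch of dense response images and their sum, assuming v(i, r, t) counts the texton-t pixels inside rectangle r offset from pixel i, with a hypothetical (dy0, dx0, dy1, dx1) half-open offset convention (dy0 <= dy1, dx0 <= dx1). The toy map puts texton 1 in a top band and texton 2 in a bottom band; a pixel between them scores highly only when both the "look up" and the "look down" filters fire, illustrating how summed filters respond to a spatial layout rather than to local appearance alone.

```python
import numpy as np


def texton_integral_images(tmap, n_textons):
    """ii[t, y, x] = number of texton-t pixels in rows [0, y) and columns [0, x)."""
    H, W = tmap.shape
    ii = np.zeros((n_textons, H + 1, W + 1), dtype=np.int64)
    for t in range(n_textons):
        ii[t, 1:, 1:] = (tmap == t).cumsum(axis=0).cumsum(axis=1)
    return ii


def response_image(ii, rect, t):
    """v(i, r, t) for every pixel i at once, via vectorised integral-image lookups."""
    _, Hp1, Wp1 = ii.shape
    H, W = Hp1 - 1, Wp1 - 1
    dy0, dx0, dy1, dx1 = rect
    ys, xs = np.mgrid[0:H, 0:W]
    # Clipping keeps y0 <= y1 and x0 <= x1, so empty rectangles give 0.
    y0, y1 = np.clip(ys + dy0, 0, H), np.clip(ys + dy1, 0, H)
    x0, x1 = np.clip(xs + dx0, 0, W), np.clip(xs + dx1, 0, W)
    return ii[t, y1, x1] - ii[t, y0, x1] - ii[t, y1, x0] + ii[t, y0, x0]


tmap = np.zeros((12, 12), dtype=int)
tmap[0:4, :] = 1          # texton 1: top band
tmap[8:12, :] = 2         # texton 2: bottom band
ii = texton_integral_images(tmap, n_textons=3)

r1, t1 = (-6, -1, -2, 2), 1   # 4 x 3 rectangle above the pixel, looking for texton 1
r2, t2 = (2, -1, 6, 2), 2     # 4 x 3 rectangle below the pixel, looking for texton 2
v1 = response_image(ii, r1, t1)
v2 = response_image(ii, r2, t2)
summed = v1 + v2              # v(i, r1, t1) + v(i, r2, t2)
print(summed[6, 6], summed[1, 6])   # 24 (between the bands), 0 (inside the top band)
```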

  10. Shape-Texture Potentials
      - JointBoost algorithm [Torralba et al. CVPR 2004]
        - iteratively combines many shape filters
        - builds a multi-class logistic classifier
      - Resulting combination exploits: shape, texture, context (!)
      - Shape-texture potentials: [Equation: shape-texture potentials defined by the logistic classifier]
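A sketch of evaluating such a boosted multi-class logistic classifier at one pixel. It assumes the shared decision-stump form associated with JointBoost: each weak learner returns a[v > θ] + b for classes in its sharing set N and a per-class constant k^c otherwise; H(c) sums the weak learners, and the potential is log P(c | x, i) with P proportional to exp H(c). The `SharedStump` class and all parameter values are hypothetical, chosen only for illustration; training (choosing stumps and sharing sets) is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class SharedStump:
    """Hypothetical weak learner: a thresholded shape-filter response shared by classes in N."""
    feature: int            # index into a vector of precomputed responses v(i, r, t)
    theta: float            # threshold on the response
    a: float
    b: float
    shared: FrozenSet[int]  # the sharing set N
    k: Dict[int, float]     # constants k^c for classes outside N

    def __call__(self, v: List[float], c: int) -> float:
        if c in self.shared:
            return self.a * (v[self.feature] > self.theta) + self.b
        return self.k.get(c, 0.0)


def strong_classifier(stumps, v, n_classes):
    """H(c) = sum over boosting rounds m of h_m(c), for every class c."""
    return [sum(h(v, c) for h in stumps) for c in range(n_classes)]


def shape_texture_potential(stumps, v, n_classes):
    """log P(c | x, i) with P proportional to exp H(c), via a stable log-softmax."""
    H = strong_classifier(stumps, v, n_classes)
    m = max(H)
    log_z = m + math.log(sum(math.exp(h - m) for h in H))
    return [h - log_z for h in H]


# Two toy stumps; all parameters are made up.
stumps = [
    SharedStump(feature=0, theta=5.0, a=2.0, b=-0.5, shared=frozenset({0, 1}), k={2: 0.1}),
    SharedStump(feature=1, theta=1.0, a=1.0, b=0.0, shared=frozenset({2}), k={0: 0.0, 1: 0.2}),
]
v = [8.0, 0.0]   # responses of the two shape filters at one pixel
psi = shape_texture_potential(stumps, v, n_classes=3)
# H = [1.5, 1.7, 0.1], so class 1 gets the highest potential.
```

Sharing a stump across a set of classes is what lets one feature serve several classes at once, which is where the method's scalability to many classes comes from.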

  11. Feature Selection by Boosting
      [Figure: input image, inferred segmentation and confidence after 30, 1000 and 2000 rounds; segmentation colour = most likely label; confidence: white = high entropy, black = low entropy]
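The confidence panels encode per-pixel entropy. A minimal sketch of producing both panels from per-pixel class distributions, assuming entropy is normalised by log C so that 1 (maximally uncertain, drawn white) and 0 (certain, drawn black) are comparable across numbers of classes:

```python
import numpy as np


def label_and_confidence(probs):
    """probs: H x W x C per-pixel class distributions.
    Returns (most likely label per pixel, entropy normalised to [0, 1]).
    With the slide's colouring, 1 (high entropy) is white and 0 is black."""
    labels = probs.argmax(axis=-1)
    # Clip only inside the log so that 0 * log(0) contributes 0.
    plogp = probs * np.log(np.clip(probs, 1e-12, 1.0))
    entropy = -plogp.sum(axis=-1)
    return labels, entropy / np.log(probs.shape[-1])


probs = np.zeros((1, 2, 3))
probs[0, 0] = [1 / 3, 1 / 3, 1 / 3]   # completely unsure pixel
probs[0, 1] = [0.0, 1.0, 0.0]         # certain pixel
labels, conf = label_and_confidence(probs)
```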


  13. Randomised Boosting
      - Avoid an expensive search over all features
        - only check a random fraction (e.g. 0.3%) at each round
        - over several thousand rounds, probably try all possible features
      [Figure: non-randomised boosting vs. randomised boosting]
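The "probably try all possible features" claim can be checked with simple arithmetic, assuming each round samples its candidate fraction independently: a given feature is never considered in R rounds with probability (1 - f)^R. For f = 0.003 that is roughly 5% after 1000 rounds and about 0.25% after 2000. The sketch below computes this and shows one round of selection over a random subset; `cost` stands in for whatever boosting criterion scores a candidate (hypothetical here).

```python
import random


def prob_never_tried(fraction, rounds):
    """Chance that one particular feature is never a candidate when every round
    independently samples `fraction` of all features."""
    return (1.0 - fraction) ** rounds


def randomised_round(n_features, fraction, cost, rng):
    """One round of randomised feature selection: score only a random subset
    and return the index with the lowest cost."""
    k = max(1, round(fraction * n_features))
    candidates = rng.sample(range(n_features), k)
    return min(candidates, key=cost)


for rounds in (1000, 2000, 5000):
    print(rounds, prob_never_tried(0.003, rounds))

rng = random.Random(0)
best = randomised_round(100_000, 0.003, cost=lambda j: abs(j - 50_000), rng=rng)
```

The trade-off is per-round optimality for speed: each round may pick a slightly worse feature, but the cost per round drops by a factor of about 1/f.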

  14. Accurate Segmentation?
      - Shape-texture potentials alone
        - effectively recognise objects
        - but are not sufficient for pixel-perfect segmentation
      - Conditional Random Field (CRF): see oral presentation tomorrow!
      [Figure: shape-texture potentials alone vs. shape-texture + CRF]
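The CRF itself is deferred to the oral presentation. As a rough illustration of why adding pairwise terms helps, the sketch below scores a labelling under a generic 4-connected CRF, not the full TextonBoost model: the unary term is the shape-texture potential log P(c | x, i), and the pairwise term is a contrast-sensitive Potts penalty w * exp(-β‖x_i - x_j‖²) for neighbouring pixels with different labels, with β = (2⟨‖x_i - x_j‖²⟩)^-1 as in GrabCut-style models. The weight `w` and the choice of β are assumptions.

```python
import numpy as np


def crf_energy(labels, unary, image, w=1.0):
    """Energy of a labelling under a generic 4-connected CRF (assumed form):
    E(c) = -sum_i psi_i(c_i) + w * sum_(i,j) exp(-beta * ||x_i - x_j||^2) * [c_i != c_j]
    labels: H x W ints; unary: H x W x C with psi = log P(c | x, i); image: H x W or H x W x 3."""
    img = (image[..., None] if image.ndim == 2 else image).astype(float)
    H, W = labels.shape
    ys, xs = np.mgrid[0:H, 0:W]
    energy = -unary[ys, xs, labels].sum()                   # unary term

    # Squared colour differences to the right-hand and lower neighbours.
    sq_h = ((img[:, 1:] - img[:, :-1]) ** 2).sum(axis=-1)
    sq_v = ((img[1:, :] - img[:-1, :]) ** 2).sum(axis=-1)
    mean_sq = np.concatenate([sq_h.ravel(), sq_v.ravel()]).mean()
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0    # assumed: beta = (2 <||x_i - x_j||^2>)^-1

    # Contrast-sensitive Potts term: label changes across strong edges cost less.
    energy += w * (np.exp(-beta * sq_h) * (labels[:, 1:] != labels[:, :-1])).sum()
    energy += w * (np.exp(-beta * sq_v) * (labels[1:, :] != labels[:-1, :])).sum()
    return float(energy)


# 1 x 3 toy: the middle pixel weakly prefers class 1, its neighbours strongly prefer class 0.
psi = np.log(np.array([[[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]]))
flat = np.zeros((1, 3))                                     # uniform colour: no edges
noisy = crf_energy(np.array([[0, 1, 0]]), psi, flat)        # per-pixel argmax labelling
smooth = crf_energy(np.array([[0, 0, 0]]), psi, flat)       # lower energy despite worse unaries
```

Minimising such an energy (e.g. with graph cuts) is what cleans up the isolated, weakly supported labels that the shape-texture potentials alone produce.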

  15. Adapting TextonBoost to the Pascal VOC Challenge

  16. Training
      - Pascal training data is bounding boxes.
      - Need pixelwise labelling: use GrabCut based on the bounding box (noisy labelling!)
      - Add a 'background' label for non-object regions and train a background class.
      - ~1 day training time (for 10 classifiers on 1/3 of the data)
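A sketch of turning bounding boxes into a noisy pixelwise label map with an explicit background class. It assumes a hypothetical background index of 0, half-open (y0, x0, y1, x1) box coordinates, and optional per-box foreground masks standing in for the GrabCut result; GrabCut itself is not implemented here.

```python
import numpy as np

BACKGROUND = 0  # hypothetical label index for the added 'background' class


def labels_from_boxes(shape, boxes, fg_masks=None):
    """Noisy pixelwise labels from bounding boxes.

    boxes: list of (class_id, y0, x0, y1, x1) with half-open pixel bounds and class_id >= 1.
    fg_masks: optional list of H x W boolean masks, one per box, e.g. the output of a
    GrabCut-style segmentation initialised from that box (assumed; not computed here).
    Pixels not claimed by any box stay BACKGROUND; later boxes overwrite earlier ones."""
    labels = np.full(shape, BACKGROUND, dtype=int)
    for n, (c, y0, x0, y1, x1) in enumerate(boxes):
        claim = np.zeros(shape, dtype=bool)
        claim[y0:y1, x0:x1] = True
        if fg_masks is not None:
            claim &= fg_masks[n]       # keep only the box pixels judged foreground
        labels[claim] = c
    return labels


boxes = [(3, 1, 1, 5, 5)]                  # one object of class 3 in a 6 x 6 image
coarse = labels_from_boxes((6, 6), boxes)  # whole box labelled class 3
fg = np.zeros((6, 6), dtype=bool)
fg[2:4, 2:4] = True                        # pretend segmentation keeps only the box centre
refined = labels_from_boxes((6, 6), boxes, [fg])
```

Without the masks every box pixel is labelled as the object, which is exactly the noise the segmentation step is meant to reduce; with them, in-box pixels rejected as foreground fall back to the background class.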

  17. Results

  18. Classification (competition 1)
      - To give an uncertainty measure, use only the boosted classifier and take the normalised sum of its softmax output over all image pixels.
      - Test time: 30 s per image (three seconds per classifier)

      Area under curve (AUC):
        bicycle     0.873
        bus         0.864
        car         0.887
        cat         0.822
        cow         0.850
        dog         0.768
        horse       0.754
        motorbike   0.844
        person      0.715
        sheep       0.866

      VOC experiments by Jamie Shotton
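A minimal sketch of the per-image score, assuming "normalised sum" means summing each class's softmax probability over all pixels and dividing by the pixel count, so every class score lies in [0, 1]. Ranking test images by these scores is what an ROC/AUC evaluation then consumes.

```python
import numpy as np


def image_class_scores(probs):
    """probs: H x W x C softmax outputs of the boosted classifier at every pixel.
    Returns one score per class: per-pixel probabilities summed, then divided by H * W."""
    H, W, C = probs.shape
    return probs.reshape(-1, C).sum(axis=0) / (H * W)


probs = np.zeros((2, 2, 3))
probs[0, :] = [1.0, 0.0, 0.0]   # top row: certain of class 0
probs[1, :] = [0.0, 0.5, 0.5]   # bottom row: split between classes 1 and 2
scores = image_class_scores(probs)   # [0.5, 0.25, 0.25]
```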

  19. Detection (competition 3)
      - Work in progress: scale/viewpoint-invariant Layout Consistent Random Field
      [Figure: input image, layout-consistent regions and instance labelling]

  20. Detection (competition 3)
      - Work in progress: scale/viewpoint-invariant Layout Consistent Random Field
      - Instead, used the connected components of the most probable labelling (ignoring components smaller than 1000 pixels) and then computed the normalised sum (as before)

      Average precision (AP):
        bicycle     0.249
        bus         0.138
        car         0.254
        cat         0.151
        cow         0.149
        dog         0.118
        horse       0.091
        motorbike   0.178
        person      0.030
        sheep       0.131
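A sketch of this detection fallback, assuming 4-connectivity, a hypothetical background class index of 0 that produces no detections, and a score equal to the component's summed class probability divided by its pixel count. The 1000-pixel threshold from the slide is the default; the toy usage lowers it to fit a 6 x 6 example.

```python
from collections import deque

import numpy as np


def detections(probs, min_pixels=1000, background=0):
    """Detections from connected components of the most probable labelling.
    probs: H x W x C per-pixel class probabilities.
    Returns (class, score, (y0, x0, y1, x1)) per kept component, with a half-open box."""
    labels = probs.argmax(axis=-1)
    H, W = labels.shape
    seen = np.zeros((H, W), dtype=bool)
    out = []
    for sy in range(H):
        for sx in range(W):
            c = labels[sy, sx]
            if seen[sy, sx] or c == background:
                continue
            # Breadth-first flood fill over 4-connected pixels with the same label.
            queue, pixels = deque([(sy, sx)]), []
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < H and 0 <= nx < W and not seen[ny, nx] and labels[ny, nx] == c:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_pixels:
                continue                                   # ignore small components
            ys, xs = np.array(pixels).T
            score = probs[ys, xs, c].sum() / len(pixels)   # normalised sum over the component
            box = (int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1)
            out.append((int(c), float(score), box))
    return out


probs = np.zeros((6, 6, 3))
probs[..., 0] = 1.0                       # background everywhere...
probs[0:3, 0:3] = [0.2, 0.8, 0.0]         # ...except a 3 x 3 blob of class 1
probs[5, 5] = [0.1, 0.0, 0.9]             # and a single class-2 pixel (too small)
dets = detections(probs, min_pixels=4)
```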

  21. Suggestions for Pascal VOC 2007
      - Include other types of object classes:
        - unstructured classes (e.g. sky, grass)
        - semi-structured classes (e.g. building)
      - Have a small number of pixel-wise labelled images and include a segmentation competition.
      - Keep it hard!!!

  22. Thank you
      TextonBoost code will be available shortly from http://mi.eng.cam.ac.uk/~jdjs2/
