VALSE webinar, May 27, 2015
Feature Selection in Image and Video Recognition
Jianxin Wu
National Key Laboratory for Novel Software Technology, Nanjing University
http://lamda.nju.edu.cn
Introduction
For image classification, how to represent an image?
• With strong discriminative power; and,
• manageable storage and CPU costs
Bag of words
• Dense sampling
• Extract a visual descriptor (e.g., SIFT or CNN) at every sample location, usually with PCA to reduce dimensionality
• Learn a visual codebook by k-means
The VLAD pipeline
• $K$ code words $\boldsymbol{d}_j \in \mathbb{R}^D$
• Pooling: $\boldsymbol{g}_j = \sum_{\boldsymbol{x} \in \boldsymbol{d}_j} (\boldsymbol{x} - \boldsymbol{d}_j)$, summing over descriptors assigned to code word $\boldsymbol{d}_j$
• Concatenation: $[\boldsymbol{g}_1\ \boldsymbol{g}_2\ \cdots\ \boldsymbol{g}_K]$
• Dimensionality: $D \times K$
Jegou et al. Aggregating local image descriptors into compact codes. TPAMI, 2012.
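A minimal numpy sketch of this pooling step, assuming PCA-reduced descriptors and a k-means codebook computed elsewhere (all names here are illustrative, not from the slides):

```python
import numpy as np

def vlad_encode(X, codebook):
    """VLAD: X is (n, D) local descriptors, codebook is (K, D) code words d_j.
    Returns the D*K-dimensional concatenation of per-word residual sums g_j."""
    K, D = codebook.shape
    # assign each descriptor to its nearest code word
    dists = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (n, K)
    assign = dists.argmin(axis=1)
    G = np.zeros((K, D))
    for j in range(K):
        members = X[assign == j]
        if len(members) > 0:
            G[j] = (members - codebook[j]).sum(axis=0)  # g_j = sum of residuals
    return G.reshape(-1)  # [g_1 g_2 ... g_K], D*K dimensions
```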
Effect of High Dimensionality
Blessing
• Fisher Vector: $K \times (2D + 1)$
• Super Vector: $K \times (D + 1)$
• State-of-the-art results in many application domains
Curse
• 1 million images, 8 spatial pyramid regions, $K = 256$, $D = 64$, 4 bytes to store a floating-point number
• 1056 GB of storage!
J. Sanchez et al. Image classification with the Fisher vector: Theory and practice. IJCV, 2013.
X. Zhou et al. Image classification using super-vector coding of local image descriptors. ECCV, 2010.
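The 1056 GB figure follows directly from the Fisher Vector dimensionality; a worked check (assuming $1\,\text{GB} = 10^9$ bytes):

$$8 \;\text{regions} \times \underbrace{256 \times (2 \cdot 64 + 1)}_{K(2D+1)=33{,}024} \times 4\;\text{bytes} \approx 1.06\;\text{MB per image}, \qquad 10^6 \;\text{images} \times 1.06\;\text{MB} \approx 1056\;\text{GB}.$$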
Solution?
• Use fewer examples / dimensions? Reduces accuracy quickly
• Feature compression: introduced next
• Feature selection: this talk
To compress?
Methods in the literature: feature compression
Compress the long feature vectors so that
• much fewer bytes are needed to store them
• (possibly) learning is faster
Product Quantization illustration
For every 8 dimensions:
1. Generate a codebook with 256 words
2. VQ an 8-d vector (32 bytes) into an index (1 byte)
On-the-fly decoding:
1. Get the stored index $j$
2. Expand it into the 8-d code word $\boldsymbol{d}_j$
Does not change learning time
Jegou et al. Product quantization for nearest neighbor search. TPAMI, 2011.
Vedaldi & Zisserman. Sparse kernel approximations for efficient classification and detection. CVPR, 2012.
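A minimal sketch of the scheme above (8-dimensional sub-vectors, 256-word sub-codebooks assumed to be trained elsewhere; names are illustrative):

```python
import numpy as np

def pq_encode(x, sub_codebooks):
    """x: (D,) vector with D divisible by 8; sub_codebooks: (D//8, 256, 8).
    Each 8-d sub-vector (32 bytes as float32) becomes one uint8 index (1 byte)."""
    codes = []
    for m, sub in enumerate(x.reshape(-1, 8)):
        dists = ((sub_codebooks[m] - sub) ** 2).sum(axis=1)  # distance to the 256 words
        codes.append(dists.argmin())
    return np.array(codes, dtype=np.uint8)

def pq_decode(codes, sub_codebooks):
    """On-the-fly decoding: look up each stored index j and expand it into its 8-d word d_j."""
    return np.concatenate([sub_codebooks[m][c] for m, c in enumerate(codes)])
```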
Thresholding
A simple idea: $x \leftarrow -1$ if $x < 0$; $x \leftarrow +1$ if $x \geq 0$
• 32x compression
• Works surprisingly well!
• But why?
Perronnin et al. Large-scale image retrieval with compressed Fisher vectors. CVPR, 2010.
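A one-line version of this binarization; np.packbits is used here only to make the 32x saving over float32 storage explicit (an illustrative detail, not from the slide):

```python
import numpy as np

fv = np.random.randn(262144).astype(np.float32)  # a Fisher Vector: 1 MB as float32
bits = (fv >= 0)                                 # threshold at 0: False -> -1, True -> +1
packed = np.packbits(bits)                       # 262,144 bits = 32,768 bytes (32x smaller)
decoded = np.where(np.unpackbits(packed).astype(bool), 1.0, -1.0)  # back to +/-1 features
```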
Bilinear projections (BPBC)
• FV or VLAD requires a rotation: a large matrix times the long vector
• Bilinear projection + binary feature. Example:
  • reshape the $KD$ vector $\boldsymbol{x}$ into a $K \times D$ matrix $Y$
  • bilinear projection / rotation: $\mathrm{sgn}(R_1^{\top} Y R_2)$, with $R_1: K \times K$, $R_2: D \times D$
• Smaller storage and faster computation than PQ
• But learning $R_1, R_2$ is very time consuming (circulant?)
Gong et al. Learning binary codes for high-dimensional data using bilinear projections. CVPR, 2013.
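A rough sketch of the bilinear rotation plus sign step; the random orthogonal matrices below merely stand in for the learned rotations (an assumption for illustration):

```python
import numpy as np

def bpbc_encode(x, R1, R2, K, D):
    """Reshape the K*D vector into a K x D matrix, rotate bilinearly, then binarize."""
    Y = x.reshape(K, D)
    return np.sign(R1.T @ Y @ R2).reshape(-1)  # sgn(R1^T Y R2), K*D binary codes

K, D = 256, 64
rng = np.random.default_rng(0)
R1 = np.linalg.qr(rng.standard_normal((K, K)))[0]  # random K x K rotation
R2 = np.linalg.qr(rng.standard_normal((D, D)))[0]  # random D x D rotation
code = bpbc_encode(rng.standard_normal(K * D), R1, R2, K, D)
```

The two small rotations need only $K^2 + D^2$ parameters instead of the $(KD)^2$ of a full rotation, which is where the storage and speed advantage over a single large projection comes from.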
The commonality
Linear projection! New features are linear combinations of multiple dimensions of the original vector
What does this mean? It assumes strong multicollinearity exists!
Is this true in reality?
Collinearity and multicollinearity
Examining real data, we find that:
• Collinearity almost never exists
• It is too expensive to test directly for multicollinearity, but we have something to say
Collinearity
Existence of a strong linear dependency between two dimensions of the VLAD / FV vector
Pearson's correlation coefficient (on centered dimensions $\boldsymbol{x}_{:i}$, $\boldsymbol{x}_{:j}$): $r = \dfrac{\boldsymbol{x}_{:i}^{\top} \boldsymbol{x}_{:j}}{\|\boldsymbol{x}_{:i}\| \, \|\boldsymbol{x}_{:j}\|}$
• $r = \pm 1$: perfect collinearity
• $r = 0$: no linear dependency at all
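A small sketch of this kind of check: computing the correlation between a pair of feature dimensions across a set of images (np.corrcoef handles the centering and normalization; the random data is only a placeholder):

```python
import numpy as np

# X: (n_images, n_dims) matrix of FV / VLAD features; random data as a stand-in
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))

i, j = 3, 200                                  # any pair of dimensions
r = np.corrcoef(X[:, i], X[:, j])[0, 1]        # Pearson's correlation coefficient
print(f"corr(dim {i}, dim {j}) = {r:.3f}")     # |r| near 1 would indicate collinearity
```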
Three types of checks
[Feature layout: 8 spatial regions; within each region, code words Word 1 … Word K; within each word, dimensions Dim 1 … Dim D]
1. Random pair
2. In the same spatial region
3. In the same code word / Gaussian component (all regions)
• Mostly no correlation at all!
• Dimensions from the same Gaussian show a slightly stronger correlation
From 2 to $n$
Multicollinearity: a strong linear dependency among more than 2 dimensions
• Given the absence of collinearity, the chance of multicollinearity is also small
• PCA is essential for FV and VLAD, and dimensions after PCA are uncorrelated
Thus, we should choose, not compress!
MI-based feature selection
A simple mutual information based importance sorting algorithm to choose features
• Computationally very efficient
• When the ratio changes, no need to repeat the selection
• Highly accurate
Yes, to choose!
Choosing is better than compressing, given that multicollinearity is missing
But we cannot afford expensive feature selection:
• features are too big to fit into memory
• complex algorithms take too long
Usefulness measure
Mutual information: $I(\boldsymbol{x}, \boldsymbol{y}) = H(\boldsymbol{x}) + H(\boldsymbol{y}) - H(\boldsymbol{x}, \boldsymbol{y})$
• $H$: entropy
• $\boldsymbol{x}$: one feature dimension
• $\boldsymbol{y}$: the image label vector
Selection: sort all MI values, choose the top $D'$
• Only one pass over the data
• No additional work if $D'$ changes
Entropy computation
Too expensive using complex methods, e.g., kernel density estimation
Use discrete quantization:
• 1-bit: $x \leftarrow -1$ if $x < 0$, $+1$ if $x \geq 0$
• N-bin: uniformly quantize into N bins (note: 1-bit and 2-bin are different)
Discrete entropy: $H = -\sum_i p_i \log_2 p_i$
Larger N gives a bigger $H$ value
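A minimal sketch of the MI-based importance sorting with 1-bit quantization, as described on the last two slides (function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def entropy(counts):
    """Discrete entropy H = -sum_i p_i log2 p_i from a table of counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mi_scores(X, labels):
    """MI between each 1-bit-quantized dimension of X (n, D) and the class labels (n,)."""
    B = (X >= 0).astype(int)          # 1-bit quantization, threshold kept at 0
    n_classes = labels.max() + 1
    h_y = entropy(np.bincount(labels))
    scores = np.empty(X.shape[1])
    for d in range(X.shape[1]):
        joint = np.zeros((2, n_classes))
        np.add.at(joint, (B[:, d], labels), 1)           # joint histogram of (bit, label)
        h_x = entropy(joint.sum(axis=1))
        scores[d] = h_x + h_y - entropy(joint.ravel())   # I(x,y) = H(x) + H(y) - H(x,y)
    return scores

def select_top(scores, d_prime):
    """Sort all MI values and keep the top D' dimensions; changing D' needs no rescoring."""
    return np.argsort(scores)[::-1][:d_prime]
```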
Most features are not useful
Choosing a small subset is not only for speed or scalability, but also for accuracy!
1-bit >> 4/8-bin: keeping the threshold at 0 is important!
The pipeline
1. Generate a FV / VLAD vector
2. Keep only the chosen $D'$ dimensions
3. Further quantize the $D'$ dimensions into $D'$ bits (store 8 bits per byte)
Compression ratio: $\dfrac{32 D}{D'}$, where $D$ is the full FV / VLAD dimensionality
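A compact sketch of steps 2-3 and the resulting compression ratio, building on the selection above (values and names are illustrative):

```python
import numpy as np

def compress(fv, keep_idx):
    """Keep only the selected dimensions and quantize each to one bit (threshold at 0)."""
    return np.packbits(fv[keep_idx] >= 0)    # 8 bits stored per byte

D_total, D_prime = 262144, 16384
rng = np.random.default_rng(0)
fv = rng.standard_normal(D_total).astype(np.float32)
keep_idx = rng.choice(D_total, size=D_prime, replace=False)  # stand-in for MI-selected dims
code = compress(fv, keep_idx)                # D_prime / 8 = 2,048 bytes

ratio = 32 * D_total / D_prime               # 32-bit floats vs 1 bit per kept dimension
print(f"compression ratio: {ratio:.0f}x")    # 512x in this example
```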
Image results
• Much faster in feature dimensionality reduction and learning
• Requires almost no extra storage
• In general, significantly higher accuracy at the same compression ratio
Features
The Fisher Vector is used:
• D = 64 (128-dim SIFT, reduced by PCA)
• K = 256
• mean and variance parts used
• 8 spatial regions
• Total dimensionality: 256 × 64 × 2 × 8 = 262,144
VOC2007: accuracy
#classes: 20, #training: 5,000, #testing: 5,000
ILSVRC2010: accuracy
#classes: 1,000, #training: 1,200,000, #testing: 150,000
SUN397: accuracy
#classes: 397, #training: 19,850, #testing: 19,850
Fine-Grained Categorization
Selecting features is even more important
Selection of subtle differences?
What features (parts) are chosen?
How about accuracy?
Published results
• Compact Representation for Image Classification: To Choose or to Compress? Yu Zhang, Jianxin Wu, Jianfei Cai. CVPR 2014.
• Towards Good Practices for Action Video Encoding. Jianxin Wu, Yu Zhang, Weiyao Lin. CVPR 2014.
New methods & results on arXiv
• VOC 2012: 90.7%, VOC 2007: 92.0%
  http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=2
  http://arxiv.org/abs/1504.05843
• SUN 397: 61.83%
  http://arxiv.org/abs/1504.05277
  http://arxiv.org/abs/1504.04792
• Details of fine-grained categorization
  http://arxiv.org/abs/1504.04943
DSP
An intuitive, principled, efficient, and effective image representation for image recognition
• Uses only the convolutional layers of a CNN: very efficient, yet impressive representational power; no fine-tuning at all
• Extremely small but effective FV / VLAD encoding (K = 1 or 2): small memory footprint
• New normalization strategy: matrix norm to utilize global information
• Spatial pyramid: a natural and principled way to integrate spatial information
D3: Discriminative Distribution Distance
• FV, VLAD, and Super Vector are generative representations: they ask "how is one set generated?"
• But for image recognition, we care about "how are two sets separated?"
• Proposed a directional distribution distance to compare two sets
• Proposed using a classifier (MPM) to robustly estimate the distance
• D3 is very stable and very efficient
Multiview image representation
• DSP serves as the global view
• But context is also important: what is the neighborhood structure?
• Distance metric learning solved with a DNN, called the label view
• Integrated (global + label) views:
  90.7% on the VOC 2012 recognition task
  92.0% on the VOC 2007 recognition task
Thanks!