UT DALLAS Erik Jonsson School of Engineering & Computer Science
Convolutional Prototype Ensemble: Robust Stream Classification & Novel Class Detection
Zhuoyi Wang*, Hemeng Tao*, Swarup Chandra*, Latifur Khan*
* The University of Texas at Dallas, Richardson, TX, USA
This material is based upon work supported by
FEARLESS engineering
Agenda
❑ High Dimensional Data Stream Mining and Challenges
❑ Shortcomings of Current Solutions
❑ The Proposed Approach
– Novel Class Detection
– Classification
– Incremental Learning
– Performance Analysis & Improvement
❑ Experiments
❑ Discussion
High Dimensional Stream Mining
➢ High Dimensional Data Stream:
– A continuous flow of high-dimensional instances.
– Common in real-life image recognition and text applications, e.g., scene streams in autonomous systems, or flows of news summaries in social networks.
➢ Challenges:
➢ New classes may emerge during the stream.
➢ Limited amount of labeled data.
➢ Limited time for the execution of learning methods.
Evolving New Class (Novel Class)
[Figure: a novel class in the high-dimensional space of a real-world image dataset (FASHION-MNIST), versus a novel class in the traditional low-dimensional space of the IRIS dataset.]
Previous work (low-dimensional): ECSMiner[1], SAND[2], ECHO[3]. Previous work (high-dimensional): ODIN[4], Open-Set[5].
[1] Al-Khateeb, T., Masud, M. M., Khan, L., Aggarwal, C., Han, J., & Thuraisingham, B. "Stream classification with recurring and novel class detection using class-based ensemble." In ICDM 2012.
[2] Haque, Ahsanul, Latifur Khan, and Michael Baron. "SAND: Semi-supervised adaptive novel class detection and classification over data stream." In AAAI 2016.
[3] Haque, Ahsanul, et al. "Efficient handling of concept drift and concept evolution over stream data." In ICDE 2016.
[4] Liang, Shiyu, Yixuan Li, and R. Srikant. "Enhancing the reliability of out-of-distribution image detection in neural networks." In ICLR 2017.
[5] Bendale, Abhijit, and Terrance E. Boult. "Towards open set deep networks." In CVPR 2016.
Limitation of Time and Space
➢ Generating instances from novel/unseen class sets.
➢ Incrementally training a classifier ensemble from newly emerging class sets.
[Figure: newly arriving stream instances are split into chunks D_1 ... D_n, where D_i contains instances from novel class set i; network models M_1 ... M_5 are trained on the chunks, and an ensemble M_a, M_b, M_c produces the prediction.]
Previous work:
[1] Han, Shizhong, et al. "Incremental boosting convolutional neural network for facial action unit recognition." In NIPS 2016.
[2] Rebuffi, Sylvestre-Alvise, et al. "iCaRL: Incremental classifier and representation learning." In CVPR 2017.
Shortcomings of Current Solutions
❖ Shortcomings:
– Novel Class Detection: Traditional approaches such as SAND[1] and ECHO[2] are typically suited to low-dimensional feature spaces, where novel class instances lie far from the clusters containing known-class examples. More recent Deep Neural Network (DNN) based methods such as [3] and [4] use the DNN's softmax output with a filtering threshold. However, the softmax function tends to assign newly arriving samples to a known class with high confidence[5], so relying on the softmax output alone is not sufficient for rejecting novel-class instances.
[1] Haque, Ahsanul, Latifur Khan, and Michael Baron. "SAND: Semi-supervised adaptive novel class detection and classification over data stream." In AAAI 2016.
[2] Haque, Ahsanul, et al. "Efficient handling of concept drift and concept evolution over stream data." In ICDE 2016.
[3] Han, Shizhong, et al. "Incremental boosting convolutional neural network for facial action unit recognition." In NIPS 2016.
[4] Liang, Shiyu, Yixuan Li, and R. Srikant. "Enhancing the reliability of out-of-distribution image detection in neural networks." In ICLR 2017.
[5] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. "Intriguing properties of neural networks." In ICLR 2014.
Shortcomings of Current Solutions
❖ Shortcomings:
– Incremental Learning: Current methods must also apply incremental learning to adapt to changes along a high-dimensional stream over a long period of time. Typical DNN solutions rely on network ensembles [1] or fine-tuning, e.g., an incremental ensemble of different DNNs or layers.
Shortcoming: growing either the DNN structure parameters or the layer embeddings during a continual or lifelong learning scenario is costly in both time and space.
[1] Han, Shizhong, et al. "Incremental boosting convolutional neural network for facial action unit recognition." In NIPS 2016.
Motivation
We model the data of each existing class as a Gaussian mixture component, so a novel class can be regarded as a distribution different from the existing ones, even though it may bear some resemblance to the existing classes.
[Figure: existing class distributions versus a potential novel class distribution.]
Novel Class Detection: can be addressed by filtering out instances at an anomalously large distance from the existing distributions.
Incremental Learning: can be solved by adding new distributions and updating existing ones.
Proposed Approach: Prototype Ensemble Learning
➢ A prototype is a class-characteristic distribution: if each class is regarded as a Gaussian mixture, the prototypes act as the means of that class's Gaussian components.
[Figure: class "Car" represented by Prototypes 1-4.]
✓ Novel Class Detection: similar instances of a class form different prototypes within that class; outlier examples potentially form a new prototype associated with a novel class, which is easier to detect.
✓ Stream Classification: an ensemble of prototypes is trained as a classifier on different sections of the stream instances and used for classification.
✓ Incremental Learning: new prototypes are created from novel class instances continuously during the stream, and existing prototypes are updated so they adapt to changes along the stream.
Overview: CPE
Prototype Establishment
We employ a Deep Neural Network architecture with convolutional layers. For a given input X, the output of the network is denoted by f(X; θ), where f is the feature representation and θ is the corresponding network parameters. For every class i, we select a small set of instances D_i from D and form the exemplar set Ɛ_i. Then we form the initial prototypes from the exemplar features.
Each prototype is denoted by q_jk, where j indicates a class label index in Z and k is the prototype index. We denote the set of prototypes for each class z_j ∈ Z by Q_j, with q_jk ∈ Q_j.
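The initialization step above can be sketched as follows. This is a minimal sketch, not the paper's exact procedure: it forms a single prototype per class as the mean of that class's exemplar features (CPE allows up to K prototypes per class, e.g., via clustering); function names and shapes are illustrative.

```python
import numpy as np

def init_prototypes(features, labels):
    """Form one initial prototype per class as the mean of its
    exemplar features f(X; theta). Returns {class_id: prototype}."""
    protos = {}
    for c in np.unique(labels):
        protos[int(c)] = features[labels == c].mean(axis=0)
    return protos

# Toy usage: 2-D exemplar features for two classes.
feats = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
labs = np.array([0, 0, 1, 1])
P = init_prototypes(feats, labs)
# P[0] -> [1., 0.], P[1] -> [11., 10.]
```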
Prototype Ensemble Loss
We focus on improving the local separation between prototypes.
Overall loss function for Training
Similar to softmax/cross-entropy, the probability that x belongs to the prototype q_jk is defined over all prototypes, where C is the size of the class set Y and K is the maximum number of prototypes per class. The probability of a class label assignment for x is then given by summing over that class's prototypes.
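The probability on this slide was rendered as an image in the original. A hedged reconstruction consistent with the surrounding text ("similar to softmax/cross-entropy", distances to prototypes) is a distance-based softmax, assuming Euclidean distance and a scale hyper-parameter γ (γ is an assumption, not from the slides):

```latex
p(q_{jk} \mid x) =
  \frac{\exp\!\left(-\gamma \,\lVert f(x;\theta) - q_{jk} \rVert^{2}\right)}
       {\sum_{i=1}^{C} \sum_{l=1}^{K}
        \exp\!\left(-\gamma \,\lVert f(x;\theta) - q_{il} \rVert^{2}\right)},
\qquad
p(y = j \mid x) = \sum_{k=1}^{K} p(q_{jk} \mid x)
```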
Overall loss function for Training
➢ The overall objective function maximizes the probability of x being associated with a prototype in P; it can be regarded as a cross-entropy loss over prototypes. An additional term acts as a regularization of the loss function.
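A minimal NumPy sketch of such an objective, assuming a distance-based softmax over prototypes plus a pull-toward-nearest-own-prototype regularizer (the hyper-parameters `gamma` and `lam`, and the exact form of the regularizer, are assumptions, not taken from the slides):

```python
import numpy as np

def prototype_ce_loss(feat, label, protos, gamma=1.0, lam=0.01):
    """Cross-entropy over prototype probabilities, plus a regularizer
    that pulls feat toward its nearest own-class prototype.
    protos: dict class -> (K, d) array of prototypes."""
    # Squared distances from feat to every prototype of every class.
    d2 = {c: np.sum((P - feat) ** 2, axis=1) for c, P in protos.items()}
    logits = np.concatenate([-gamma * d2[c] for c in sorted(d2)])
    # Numerically stable log of the normalizer over all prototypes.
    m = logits.max()
    log_z = m + np.log(np.sum(np.exp(logits - m)))
    # p(y = label | x) sums over that class's prototypes.
    log_p_y = np.log(np.sum(np.exp(-gamma * d2[label] - log_z)))
    reg = d2[label].min()  # distance to nearest own-class prototype
    return -log_p_y + lam * reg
```

As a sanity check, the loss should be smaller when the feature lies near a prototype of its labeled class than when it is labeled with a far-away class.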
Novel Class Detection
For a coming instance x: pass it through the DNN to obtain f(x; θ), then calculate and compare its distance to the prototypes P_i1, P_i2, ..., P_ik in each class ensemble Q_j (Step 1). A per-prototype threshold determines accept or reject (Step 2): if the distance of x to its nearest prototype is larger than the corresponding threshold, we determine x to be a potential novel class instance; otherwise, x comes from the class of that prototype.
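The two-step check above can be sketched as follows. This is a hedged illustration: here the thresholds are assumed to be given per class, whereas how CPE actually estimates them is not shown on the slide.

```python
import numpy as np

def detect_novel(feat, protos, thresholds):
    """Step 1: find the class of the nearest prototype to feat.
    Step 2: reject as a potential novel-class instance when the
    nearest distance exceeds that class's threshold.
    protos: dict class -> (K, d) array; thresholds: dict class -> float.
    Returns (class_or_None, nearest_distance)."""
    best_c, best_d = None, np.inf
    for c, P in protos.items():
        d = np.sqrt(np.sum((P - feat) ** 2, axis=1)).min()
        if d < best_d:
            best_c, best_d = c, d
    if best_d > thresholds[best_c]:
        return None, best_d   # potential novel-class instance
    return best_c, best_d     # accepted as known class best_c
```

For example, with prototypes at the origin (class 0) and at (10, 10) (class 1) and a threshold of 2 for each class, a point near the origin is accepted as class 0, while a point midway between the prototypes is flagged as potentially novel.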
Incremental Learning
Prototype Based Incremental Learning
[Figure: across Periods 1-3, a new prototype is established for arriving novel-class instances.]
Then: apply back-propagation to update the parameters θ of the network model.
Prototype Based Incremental Learning
[Figure: across Periods 1-3, existing prototypes are updated with newly arriving instances.]
Then: apply back-propagation to update the parameters θ of the network model.
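One plausible way to update an existing prototype, sketched below, is a running mean over the features assigned to it; the slides do not give the exact update rule, so this is an assumption for illustration only.

```python
import numpy as np

def update_prototype(proto, feat, count):
    """Fold a newly labeled feature into an existing prototype as a
    running mean over the `count` features already absorbed.
    Returns (updated_prototype, updated_count)."""
    return proto + (feat - proto) / (count + 1), count + 1

# Toy usage: a prototype built from 2 exemplars absorbs a third point.
p = np.array([1.0, 0.0])
n = 2
p, n = update_prototype(p, np.array([4.0, 3.0]), n)
# p is now the mean of all 3 points: [2., 1.]
```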
Complexity
Time Complexity: Let s_B be the size of the novel class candidate buffer, C the (constant) time to calculate the gradient of one example, s_mini the mini-batch size, n_e the number of epochs, and Y′ the number of classes in the stream. The time complexity of CPE is O(2 C n_e (s_mini + s_B Y′)).
Space Complexity: constant, since the space used by the exemplars, prototypes, buffer, and network is constant.
Experiment
Data Set        | Number of Instances | Number of Features
FASHION-MNIST   | 70,000              | 784
SVHN            | 100,000             | 3,072
CIFAR-10        | 60,000              | 3,072
LSUN            | 80,000              | 4,096
CINIC           | 106,110             | 4,096
New-York-News   | 66,000              | 300
CPE setup:
1. DenseNet as the DNN structure.
2. M = 2000 exemplars, K = 10 (maximum number of prototypes per class).