Stress Classification: A Deep Stacked Autoencoder Approach
Yusuf Gandhi Putra
Faculty of Information Technology, Universitas Advent Indonesia, 2017
Table of Contents
1. Introduction
2. Literature Review
3. Dataset
4. Experiment
5. Conclusions
Introduction
Stress is an inseparable part of modern people's lives. From personal problems to pressure in the workplace, people encounter stress stimuli in challenging circumstances within their daily activities. Stress is considered harmful to personal well-being and may negatively impact both physical and mental health; it is known as a factor that can lead to unfavourable states, even terminal illnesses (Sharma et al., 2013). Understanding stress better has become crucial, as doing so may prevent its negative effects on people's well-being. Several studies have reported using facial data to detect stress. Capturing these types of data uses non-intrusive methods, e.g. recording facial patterns with video or thermal cameras. Despite the unconventional way of collecting data for stress identification, these studies show promising results. One study successfully identified stress using the thermal spectrum (TS), exploiting changes in skin temperature (Yuen et al., 2009). Furthermore, a number of studies use the visual spectrum (VS) to model emotion (Dhall et al., 2011) and depression (Joshi et al., 2012). In a more recent study within this non-obtrusive approach, Sharma et al. attempted to classify stress based on facial patterns using the combination of VS and TS (Sharma et al., 2013) from the ANU StressDB. Building on Sharma's work, we attempt to classify stress from thermal images using a deep neural network approach.
Literature Review
Neural networks
Artificial Neural Networks (ANNs) are computational models inspired by how human brains process information (Rojas, 2013). The model is built from neurons, the basic building blocks of ANNs, which are highly interconnected to simulate the human brain's mechanisms for solving problems. McCulloch and Pitts's work in 1943 is considered the first attempt at building neural networks and laid the foundation of the field (Abraham, 2002). Their model of neurons, known as McCulloch-Pitts (MCP) neurons, was built on several assumptions about how actual neurons work. One of the assumptions was that activities of neurons are "all-or-none" processes (McCulloch and Pitts, 1943), which is represented within modern neural network models by the use of activation functions.
A neuron, the basic building block of neural networks, is a computational unit that takes $x_1, x_2, \ldots, x_n$ as inputs. The unit sums the inputs multiplied by their respective weights $w_1, w_2, \ldots, w_n$, adds a bias $b$, and passes the result through an activation function $f$ to produce its output:
$h_{W,b}(x) = f(W^{T}x + b) = f\left(\sum_{i=1}^{n} W_i x_i + b\right)$.
Figure 1: Neuron architecture
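The computation above can be sketched in a few lines (a minimal NumPy sketch; the sigmoid activation is an assumed choice, since the slide does not fix a particular $f$):

```python
import numpy as np

def sigmoid(z):
    # The "all-or-none" behaviour smoothed into a differentiable activation.
    return 1.0 / (1.0 + np.exp(-z))

def neuron(W, b, x, f=sigmoid):
    """Single neuron: h_{W,b}(x) = f(W^T x + b)."""
    return f(np.dot(W, x) + b)

# With zero weights and bias the pre-activation is 0, so sigmoid gives 0.5.
h = neuron(np.zeros(3), 0.0, np.array([1.0, 2.0, 3.0]))
```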
Autoencoders
Autoencoders, which are sometimes referred to as autoassociative neural networks, autoassociators, or Diabolo networks (Scholz, 2012; Hinton and Salakhutdinov, 2006; Bengio, 2009), are neural networks trained to replicate their inputs in the output layer. An autoencoder attempts to produce $h_{W,b}(x) \approx x$, i.e. it tries to approximate the identity function. Although approximating the identity function seems trivial, constraining the number of hidden neurons forces the autoassociative network to compress the input and, as a result of the compression, produce a lower-dimensional representation of the inputs (Ng, 2011).
Figure 2: Architecture of autoencoders
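As a toy illustration of the reconstruction objective $h_{W,b}(x) \approx x$, a two-layer linear autoencoder with a one-neuron bottleneck can be trained by gradient descent (a minimal NumPy sketch; the data, layer sizes, and learning rate are illustrative assumptions, not values from this work):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[1.0, 2.0]])  # 2-D points on a line

W1 = rng.normal(scale=0.1, size=(2, 1))  # encoder: 2 -> 1 (the bottleneck)
W2 = rng.normal(scale=0.1, size=(1, 2))  # decoder: 1 -> 2

def loss(X, W1, W2):
    # Mean squared reconstruction error of h(x) = decode(encode(x)).
    return np.mean((X @ W1 @ W2 - X) ** 2)

initial_loss = loss(X, W1, W2)
lr = 0.01
for _ in range(500):
    H = X @ W1                           # code (compressed representation)
    G = 2.0 * (X @ W1 @ W2 - X) / X.size
    gW2 = H.T @ G                        # gradient w.r.t. decoder weights
    gW1 = X.T @ (G @ W2.T)               # gradient w.r.t. encoder weights
    W1 -= lr * gW1
    W2 -= lr * gW2

final_loss = loss(X, W1, W2)             # far below initial_loss after training
```

The bottleneck forces the network to learn the one-dimensional structure of the data, which is exactly the compression behaviour described above.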
Stress Classification
Stress has become an inseparable part of modern people's lives. It has become important to understand it better, as stress may cause undesirable conditions, e.g. physical illnesses. Thus, stress identification has been explored in numerous studies, including within the Computer Science field. Various machine learning techniques have been employed to this end, ranging from decision trees to Support Vector Machines (SVMs). The measures used as stress input vary as well, e.g. brain activity and skin response. These measures, however, often rely on obtrusive methods, i.e. they require wearing sensors on certain body parts. Sharma et al. (2013) introduced a non-intrusive stress recognition model using the thermal (TS) and visual (VS) spectrums. The TS captures the blood flow under the surface of the facial skin and has been successfully used as a stress measure (Yuen et al., 2009). The best stress recognition rate reported by Sharma et al. was 72%, using the combination of histograms of dynamic thermal patterns (HDTP) on TS and local binary patterns on three orthogonal planes (LBP-TOP) on VS.
Dataset
The description of the data set is as follows:
- There are 31 participants in total.
- The data set consists of normal RGB and thermal data, referred to as the visual (VS) and thermal (TS) spectrums respectively.
- Each participant watched a sequence of stressful and non-stressful film clips while being recorded.
Figure 3: The experimental setup (Sharma et al., 2013)
Example data
Figure 4: An example frame from the ANU StressDB dataset
Data Preprocessing
A script from previous research is used to preprocess the thermal videos. The script extracts the video frames and stores them as still images in 20 separate segments per video. As there are 31 subjects in the data collection, this yields 620 segments. The number of frames in each segment varies, and each frame is 640x480 pixels, reflecting the thermal video's 4:3 aspect ratio. The following algorithm is used to obtain the middle frame as one approach to frame summarization (Sze, Lam, and Qiu, 2005).
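The middle-frame selection can be sketched as a small helper (a minimal sketch; the helper name and the assumption that a segment is a list of already-loaded frames are ours, and the actual frame extraction, e.g. with OpenCV, is omitted):

```python
def middle_frame(frames):
    """Frame summarization: represent a segment by its middle frame."""
    if not frames:
        raise ValueError("segment contains no frames")
    return frames[len(frames) // 2]

# Segments have varying lengths; each one is summarized by a single frame.
segment = ["frame0", "frame1", "frame2", "frame3", "frame4"]
summary = middle_frame(segment)  # "frame2"
```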
Further Data Preprocessing
As the aim of the research is to identify the state of stress from the thermal data of participants' facial regions, we cropped the facial region to 200 × 200 pixels.
Figure 5: An example image from the original and the further-preprocessed datasets
By reducing the images, the input size to the network decreased considerably to 120,000 neurons, only 13.02% of the original input size of 921,600 (640 × 480 × 3).
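The crop and the resulting reduction can be checked directly (a NumPy sketch; the crop's top-left corner is a hypothetical value, since the slide does not give the detected face location):

```python
import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # one 640x480 RGB thermal frame
y0, x0 = 140, 220                                # hypothetical facial-region corner
face = frame[y0:y0 + 200, x0:x0 + 200]           # 200x200 facial crop

original_inputs = frame.size                     # 921,600 input neurons
cropped_inputs = face.size                       # 120,000 input neurons
ratio = cropped_inputs / original_inputs         # ~0.1302, i.e. 13.02%
```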
Experiment
Experiment - Introduction
Autoencoders are able to extract lower-level features of images, as their behaviour is similar to compression (Ng, 2011). Thus, it is possible to recognize features solely by using autoencoders with a final softmax layer for classification. In this experiment, we use stacked autoencoders on the ANU StressDB to classify stress.
Experiment - Architecture Overview
As the images are in RGB mode, we separated the channels and trained each with a different autoencoder, i.e. one autoencoder per colour channel.
At the end of each autoencoder there is a softmax layer that performs the per-channel classification. The final classification result is the majority vote of the three channels:
$C(X) = \mathrm{mode}\{a_R(X_R), a_G(X_G), a_B(X_B)\}$ (1)
where $C(X)$ is the final classification function; $a_R$, $a_G$, $a_B$ are the autoencoders for the red, green, and blue channels respectively; and $X$ is the image input, while $X_R$, $X_G$, $X_B$ are its red, green, and blue sub-channels.
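Equation (1) amounts to a majority vote over the three per-channel predictions, which can be sketched as follows (the class labels and example predictions are illustrative):

```python
from statistics import mode

def classify(pred_r, pred_g, pred_b):
    """Final decision C(X) = mode{a_R(X_R), a_G(X_G), a_B(X_B)}."""
    return mode([pred_r, pred_g, pred_b])

# Two of the three channel classifiers vote "stressed", so that label wins.
label = classify("stressed", "stressed", "not-stressed")
```

With two classes and three voters, a strict majority always exists, so the mode is always well defined.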
Experiment - Network Configuration
We designed two different architectures. In the first, we compress the pixels to 4,000 neurons in the first autoencoder layer and to 400 in the second; the last layer is a softmax layer for the final classification. The second topology uses the first architecture as its basis and adds two more autoencoder layers: the third layer consists of 40 neurons and the fourth of 10 neurons. The architectures are as follows:
First stacked autoencoder: 200-by-200 pixel images → compressed to 4,000 neurons → compressed to 400 neurons → softmax classification layer (2 classes)
Second deep stacked autoencoder: 200-by-200 pixel images → compressed to 4,000 neurons → compressed to 400 neurons → compressed to 40 neurons → compressed to 10 neurons → softmax classification layer (2 classes)
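The two topologies differ only in the two extra encoder layers. As an illustration (the parameter counts are computed here for fully connected layers with biases; the slides do not state them), the per-channel layer sizes give:

```python
def param_counts(layer_sizes):
    """Weights + biases for each fully connected layer in a stack."""
    return [n_in * n_out + n_out
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# One 200x200 channel = 40,000 inputs per per-channel autoencoder stack.
arch1 = [200 * 200, 4000, 400, 2]          # first stacked autoencoder + softmax
arch2 = [200 * 200, 4000, 400, 40, 10, 2]  # deeper variant + softmax

counts1 = param_counts(arch1)  # [160004000, 1600400, 802]
counts2 = param_counts(arch2)  # [160004000, 1600400, 16040, 410, 22]
```

The first layer dominates the parameter budget in both topologies, which is why the initial compression to 4,000 neurons matters most for memory.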
Results of the Stacked Autoencoders
Conclusions
Conclusions & Future Work
Conclusions:
- Autoencoders are able to classify stress with decent performance.
- The autoencoder with more layers is less accurate, possibly due to overfitting.
Future work:
- Choosing appropriate hyper-parameters is challenging for the stress classification task; a number of different configurations are yet to be tested.
- Employing the full TS video to increase the stress classification accuracy.