Saha, K., et al. 2019. Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior. In Proceedings of International Conference on Affective Computing and Intelligent Interaction (ACII 2019). http://koustuv.com/papers/ACII19_SM_Imputation.pdf
Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior
Saha, K., Reddy, M. D., Das Swain, V., Gregg, J. M., Grover, T., Lin, S., Martinez, G. J., Mattingly, S. M., Mirjafari, S., Mulukutla, R., Nies, K., Robles-Granda, P., Sirigiri, A., Yoo, D. W., Audia, P., Campbell, A. T., Chawla, N. V., D’Mello, S. K., Dey, A. K., Jiang, K., Liu, Q., Mark, G., Moskal, E., Striegel, A., & De Choudhury, M.
Koustuv Saha, Georgia Tech
Sensing Human Behavior • Survey Instruments: Self-Report Questionnaires • Active Sensing: Ecological Momentary Assessments (EMAs) • Passive Sensing: Smartphones and Wearables, Social Media
Social Media as a Passive Sensor • Naturalistic setting • Unobtrusive access • Longitudinal and extended periods (beyond the study period) • Verbal and behavioral. There are limitations associated with the social media data stream.
Limitations (Social Media Data Stream): Retrospective in nature, so the availability and quality of data depend on the participant's social media use.
Limitations (Social Media Data Stream): Not everybody is on social media. The social media population is skewed towards young adults (Pew, 2018).
Limitations (Social Media Data Stream): Data collection challenges. Changing nature of social media APIs (Facebook, Twitter, Instagram, LinkedIn, etc.).
Consequences in Studies of Human Behavior. Multimodal sensing studies have to either: - focus on a very (social media) active participant cohort, which hurts generalizability and recruitment - disregard participants with no social media data, which hurts scalability - disregard the social media data stream altogether, which hurts multisensor-fusion capabilities
Our work concerns… …how can we leverage the potential of social media data in multimodal sensing studies of human behavior, while navigating the challenges and limitations of acquiring social media data?
Our work contributes… … a statistical framework to impute missing social media features by learning individuals’ observed behaviors from other passive sensing streams (Bluetooth beacons, wearables, and smartphone sensors).
Social Ecological Framework: Human behaviors and attributes can be considered to be deeply embedded in the complex interplay between an individual, their relationships, the communities they belong to, and societal factors (Catalano, 1979). Reference: Ralph Catalano. 1979. Health, behavior and the community: An ecological perspective. Pergamon Press, New York.
The Tesserae Project: 757 participants. Data streams: Wearable, Smartphone, BT Beacon, Social Media, Surveys. By leveraging passive sensors, this study aims at proactively identifying changes in an individual that may impact their wellbeing and job performance.
Data and Problem (Predicting Psychological Attributes) • 603 participants with physical sensor (Bluetooth, smartphone, and wearable) data • 496 participants with social media (Facebook) data (~82% of the dataset). Therefore: • to include all the participants, we could only incorporate the physical sensor features; or • to include all the sensor modalities, we could only include a subset of participants. Imputing missing social media features helps us use all sensors and all participants’ data.
Feature Engineering • Features known in theory to be predictive of psychological constructs (personality traits, affect) • Physical Sensor Features: heart rate, heart rate variability, sleep, stress, step count, physical activity, mobility, phone use, call use, work duration (130 raw features) • Social Media Features: psycholinguistic attributes (LIWC), top n-grams, sentiment, social capital (number of check-ins, engagement, activity with friends, etc.) (5,077 raw features)
Feature Engineering (Selection & Transformation)
Steps applied to each feature set:
- Feature selection on coefficient of variation (cv): drop features with cv > (mean + 2*stdev.)
- Feature selection on pairwise correlation (r): drop features from those pairs that show |r| > 0.8
- Feature transformation with Principal Component Analysis (PCA): transform features to n components, based on explained variance
Sensor features (HRV, Stress, Fitness, Sleep, Phy. Activity, Location, Phone Use, Desk Activity): 130 derived → 124 after cv selection → 51 after correlation selection → 30 transformed features
Social media features (LIWC + N-grams + Sentiment + Social Capital): 5,077 derived → 4,806 after cv selection → 3,716 after correlation selection → 200 transformed features
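The selection and transformation steps on this slide can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the toy data, the 90% explained-variance target, and the reading of "mean + 2*stdev." as the mean and standard deviation of cv across features are all assumptions.

```python
import numpy as np

def select_features(X, r_max=0.8):
    """Return indices of columns of X (participants x features) that pass both filters."""
    # Filter 1: coefficient of variation (cv). Assumption: the threshold is
    # mean(cv) + 2 * stdev(cv), computed across all features.
    cv = X.std(axis=0) / np.maximum(np.abs(X.mean(axis=0)), 1e-12)
    keep = np.flatnonzero(cv <= cv.mean() + 2 * cv.std())
    # Filter 2: pairwise correlation. Greedy rule (an assumption): a feature is kept
    # only if |r| <= r_max against every feature kept before it.
    r = np.corrcoef(X[:, keep], rowvar=False)
    chosen = []
    for j in range(len(keep)):
        if all(abs(r[j, i]) <= r_max for i in chosen):
            chosen.append(j)
    return keep[chosen]

def pca_transform(X, var_target=0.9):
    """Project standardized X onto the fewest components reaching var_target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    n = min(int(np.searchsorted(explained, var_target)) + 1, len(s))
    return U[:, :n] * s[:n]  # component scores, one row per participant

# Toy data (hypothetical): 20 well-behaved features, one near-duplicate, one noisy.
rng = np.random.default_rng(0)
base = rng.normal(10.0, 1.0, size=(200, 20))
near_copy = base[:, [0]] + rng.normal(0.0, 0.01, size=(200, 1))  # |r| > 0.8 with column 0
noisy = rng.normal(0.1, 5.0, size=(200, 1))                      # very high cv
X = np.hstack([base, near_copy, noisy])

kept = select_features(X)
components = pca_transform(X[:, kept])
```

On the toy data, the noisy column is removed by the cv filter and the near-duplicate by the correlation filter, leaving the 20 base columns for PCA.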
Imputing Social Media Features: Can sensor features predict social media features? [Figure: Pearson's correlation between Sensors Features and Facebook Features; scale from −0.2 to 0.2] Pearson’s correlation (r) ranges between -0.21 and 0.22, indicating weak correlations.
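A correlation check of this kind could be computed as below. This is a sketch under assumptions: `cross_correlation` is a hypothetical helper, and the arrays are random toy data rather than the study's features.

```python
import numpy as np

def cross_correlation(A, B):
    """Pearson r between every column of A and every column of B (same rows)."""
    Az = (A - A.mean(axis=0)) / A.std(axis=0)
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return Az.T @ Bz / len(A)  # shape: (A columns, B columns)

# Toy stand-ins for sensor and social media transformed features.
rng = np.random.default_rng(0)
sensor = rng.normal(size=(50, 3))
social = rng.normal(size=(50, 4))

R = cross_correlation(sensor, social)
r_min, r_max = R.min(), R.max()  # the range of r reported on a slide like this one
```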
Imputing Social Media Features: Methods For each of the 200 social media transformed features, we build a separate model that: - uses the sensor transformed features as the independent variables and - predicts the corresponding social media transformed feature as the dependent variable.
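The one-model-per-feature setup described above might look like the following sketch. It assumes scikit-learn's `GradientBoostingRegressor` with default hyperparameters as the regressor; the slides do not give hyperparameters, and the helper names and toy data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_imputers(X_sensor, X_social):
    """Fit one regressor per social media transformed feature (column of X_social)."""
    models = []
    for j in range(X_social.shape[1]):
        model = GradientBoostingRegressor(random_state=0)
        model.fit(X_sensor, X_social[:, j])  # sensor features -> one social media feature
        models.append(model)
    return models

def impute(models, X_sensor):
    """Predict every social media feature for participants who lack that data."""
    return np.column_stack([m.predict(X_sensor) for m in models])

# Toy data: 80 participants with both streams, 20 with sensor data only.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(80, 4))
X1_social = np.column_stack([X1[:, 0] + X1[:, 1], X1[:, 2] ** 2, -X1[:, 3]])
X2 = rng.normal(size=(20, 4))

X2_social_imputed = impute(fit_imputers(X1, X1_social), X2)
```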
Imputing Social Media Features: Results. k-fold cross-validation and pooled accuracy (Pearson’s correlation (r) between actual and predicted features). GBR (Gradient Boosted Random Forest Regression) performs the best: mean r = 0.78. [Figure: histogram of # Components (y-axis) by Correlation (x-axis, 0.0 to 1.0); mean: 0.78]
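Pooled k-fold evaluation can be sketched as follows: out-of-fold predictions from all folds are gathered into one vector and compared with the actual values using a single Pearson r. The value k = 5 and the least-squares stand-in model are assumptions for illustration; the slides report GBR as the best performer and do not state k.

```python
import numpy as np

def pooled_cv_pearson(X, y, fit_predict, k=5, seed=0):
    """Pearson r between y and out-of-fold predictions pooled over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_hat = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        y_hat[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return np.corrcoef(y, y_hat)[0, 1]

def least_squares(X_train, y_train, X_test):
    """Stand-in model (not GBR): ordinary least squares with an intercept."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ w

# Toy data with a known linear signal plus small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

r = pooled_cv_pearson(X, y, least_squares)
```

Any model can be plugged in through `fit_predict`, so the same loop would serve each of the 200 per-feature models.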
Is Imputation Effective? PREDICTING PSYCHOLOGICAL CONSTRUCTS WITH MULTIMODAL SENSING DATA
Predicting Psychological Constructs with Multisensor Data
Notation: X = sensor transformed features, X' = social media transformed features, Y = psychological constructs.
Participants Type 1 (who have social media data): actual feature set X_1, X'_1; final feature set X_1, X'_1
- Base models: S_1: X_1 : Y_1 and SS_1: X_1 + X'_1 : Y_1
Participants Type 2 (who do not have social media data): actual feature set X_2; final feature set X_2, X'_2
- Models: S_2: X_2 : Y_2 and SS_2: X_2 + X'_2 : Y_2
Imputation model: Imp: X_1 : X'_1, applied as X'_2 : Imp(X_2)
Final models (all participants): S_3: (X_1 + X_2) : (Y_1 + Y_2) and SS_3: (X_1 + X_2) + (X'_1 + X'_2) : (Y_1 + Y_2)
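The inputs to the final models on this slide could be assembled as in the sketch below. It assumes that "+" between participant groups means stacking rows and "+" between X and X' means concatenating feature columns; `build_final_sets` and the array sizes are hypothetical.

```python
import numpy as np

def build_final_sets(X1, X1_social, Y1, X2, X2_social_imputed, Y2):
    """Return inputs for the sensor-only and sensor+social final models, plus targets."""
    X_s = np.vstack([X1, X2])                              # all participants, sensor only
    X_social = np.vstack([X1_social, X2_social_imputed])   # actual rows, then imputed rows
    X_ss = np.hstack([X_s, X_social])                      # sensor + social media features
    Y = np.concatenate([Y1, Y2])
    return X_s, X_ss, Y

# Toy shapes: 5 participants with social media data, 4 without.
X1, X1_social, Y1 = np.zeros((5, 3)), np.zeros((5, 2)), np.zeros(5)
X2, X2_social_imputed, Y2 = np.ones((4, 3)), np.ones((4, 2)), np.ones(4)

X_s, X_ss, Y = build_final_sets(X1, X1_social, Y1, X2, X2_social_imputed, Y2)
```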
Predicting Psychological Constructs with Multisensor Data: We evaluate all our prediction models using three kinds of algorithms: • Linear Regression • Gradient Boosted Regression • Neural Network Regression. These algorithms cover a broad spectrum of algorithm families.
Effectiveness of Social Media Feature Imputation. [Figure: three panels (Linear Regression, Gradient Boosted Regression, Multilayer Perceptron), each comparing Sensors vs. Sensors+Imputed FB for extraversion, agreeableness, conscientiousness, neuroticism, openness, pos.affect, neg.affect, and stai.trait; x-axis: SMAPE % (0 to 60)] SMAPE comparing three models that use physical sensor features vs. those that use sensor and imputed features to predict psychological constructs on the entire dataset. Outcome: Imputed Social Media Features Improve Predictions.
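For reference, SMAPE (symmetric mean absolute percentage error) is commonly computed as in the sketch below. The slides do not say which SMAPE variant was used; this version, with the mean of |actual| and |predicted| in the denominator and a 0 to 200% range, is an assumption.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = np.abs(y_pred - y_true)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Where actual and predicted are both exactly 0, count the error as 0.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)
    return 100.0 * ratio.mean()

example = smape([100.0, 50.0], [110.0, 50.0])  # one 10-point miss, one exact hit
```

Lower SMAPE means better predictions, which is how the bars in the figure compare the Sensors and Sensors+Imputed FB models.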
Robustness Against Other Imputation Approaches. [Figure: two panels (Mean Imputation, Random Imputation), each comparing Sensors vs. Sensors+Imputed FB for extraversion, agreeableness, conscientiousness, neuroticism, openness, pos.affect, neg.affect, and stai.trait; x-axis: SMAPE % (0 to 60)] SMAPE comparing prediction models that use sensor features vs. those that use sensor and mean- / random-imputed features. Outcome: Mean / random imputation does not improve predictions, and can even degrade them.
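The two baseline imputations could be implemented as in this sketch. The slides do not define random imputation; sampling each missing value from the observed values of the same feature is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def mean_impute(X_observed, n_missing):
    """Give every participant without data the column means of the observed features."""
    return np.tile(X_observed.mean(axis=0), (n_missing, 1))

def random_impute(X_observed, n_missing, seed=0):
    """Assumed variant: draw each missing value from observed values of that column."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, X_observed.shape[0], size=(n_missing, X_observed.shape[1]))
    return np.take_along_axis(X_observed, rows, axis=0)

# Toy data: 10 participants with social media features, 4 without.
rng = np.random.default_rng(1)
X_obs = rng.normal(size=(10, 3))

X_mean = mean_impute(X_obs, 4)
X_rand = random_impute(X_obs, 4)
```

Neither baseline uses a participant's sensor data, so the imputed values carry no participant-specific signal.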
Discussion • Contribution: A framework to impute social media features in longitudinal and large-scale multimodal sensing studies of human behavior • Theoretically situated in the Social Ecological Model • A similar approach can be applied to other sensors • Ethics: Should imputation be done on those individuals who do not want to share their social media data?
Ethics • Latent dimensions do not necessarily translate to social media activity or behavior • Caution against the use as a means to surveil • Should imputation be done on those individuals who do not want to share their social media data?
This research is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2017-17042800007. Thank You. @kous2v | koustuv.saha@gatech.edu | koustuv.com