
Fabio Celli fabio.celli@unitn.it Unsupervised Personality Recognition from Text: Possible Applications


  1. Fabio Celli fabio.celli@unitn.it Unsupervised Personality Recognition from Text: Possible Applications Sep 16, 2014

  6. Unsupervised Personality Recognition from Text: Possible Applications
     - What is personality?
     - What is personality recognition?
     - How can we recognize personality from text?
     - How can we recognize it in an unsupervised way?
     - Which applications?

  7. "Personality describes persistent human behavioral responses to broad classes of environmental stimuli." [Adelstein et al. 2011]

  10. The Big 5 factor theory
     - self assessments
     - observed assessments (+ agreement)
     Ground truth:
     - 100-item test
     - 50-item test
     - 44-item test
     - 10-item test

  11. Personality recognition [figure]

  12. Personality Recognition is the automatic classification of the personality of authors from behavioral features (text, facial expressions, profile pictures, works, and so on). Gold standard labels can be obtained by means of the Big 5 personality tests. [Norman 1963; Costa & McCrae 1985; Digman 1990]

  13. Personality Recognition, results reported in [Mairesse et al. 2007]:
     - predict classes: ~.57% (better)
     - predict scores: RAE ~.97% (bad)

  14. Personality Recognition: 5 classifiers (one per trait) predict binary classes or scores.

  15. Personality recognition from text [figure]

  16. Approaches to Personality Recognition from text
     - Top-down approach: exploit lexical resources as features, finding correlations with personality trait poles (labeled text + extralinguistic resource → feature correlation).
     - Bottom-up approach: search for patterns associated with personality trait poles (labeled text → pattern).
     Cited on the slide: [Mairesse et al. 2007], [Oberlander & Nowson 2006], [Schwartz et al. 2013], [Iacobelli et al. 2011]

  17. Approaches to Personality Recognition from text
     - Mixed approach: use many resources (sentiment, psycholinguistic, semantic) + word patterns + feature selection [Markovikj et al. 2013]

  18. Approaches to Personality Recognition from text: 5 classifiers (one per trait) predict binary classes or scores. Large feature space, reduced with feature selection.

  19. Unsupervised personality recognition from text [figure]

  20. Unsupervised personality recognition from text
     We need:
     - unlabeled text + authors (many texts per author)
     - small labeled test set
     - correlations between language and personality
     In literature:
     - 3 classes: high (y), mid (o), low (n)
     - 2 classes: high (y), low (n)

  21. Unsupervised personality recognition from text, input layout (many texts per author):
     Author1 text1-1
     Author1 text1-2
     Author2 text2-1
     Author2 text2-2
     ...
     AuthorN textN-1
     AuthorN textN-2
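
The ingredients listed on the slides (unlabeled posts per author, plus known correlations between language features and traits) could be combined roughly as in this minimal Python sketch. The trait order, the feature names, the correlation signs, and the scoring rule (compare each feature to the corpus mean and add or subtract the correlation sign) are all illustrative assumptions, not the values or the exact procedure used in the talk.

```python
from statistics import mean

# Assumed trait order for the 5-character labels (hypothetical).
TRAITS = ("ext", "sta", "agr", "con", "opn")

# Hypothetical correlation set: feature -> {trait: sign of the correlation}.
CORRELATIONS = {
    "exclamation_marks": {"ext": +1},
    "negations": {"agr": -1},
    "long_words": {"opn": +1, "ext": -1},
}

def extract_features(text):
    """Toy feature extractor; a real system would use LIWC/MRC-style features."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    n = max(len(words), 1)
    return {
        "exclamation_marks": text.count("!") / n,
        "negations": sum(w in {"not", "no", "never"} for w in words) / n,
        "long_words": sum(len(w) > 6 for w in words) / n,
    }

def label_posts(posts):
    """Return one label per post: y (high), o (mid), n (low) for each trait."""
    feats = [extract_features(p) for p in posts]
    means = {f: mean(x[f] for x in feats) for f in CORRELATIONS}
    labels = []
    for x in feats:
        score = dict.fromkeys(TRAITS, 0)
        for feature, by_trait in CORRELATIONS.items():
            if x[feature] == means[feature]:
                continue  # feature carries no evidence for this post
            direction = 1 if x[feature] > means[feature] else -1
            for trait, sign in by_trait.items():
                score[trait] += direction * sign
        labels.append("".join(
            "y" if score[t] > 0 else "n" if score[t] < 0 else "o" for t in TRAITS
        ))
    return labels

POSTS = [
    "I am so happy today!!!",
    "I do not like this, never again.",
    "Extraordinary philosophical conversations yesterday",
]
print(label_posts(POSTS))
```

Traits with no entry in the correlation set always receive the mid label `o`, which is why richer correlation sets matter in this kind of setup.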

  24. Pipeline: sample data → data distribution → one label per post → one label per user, with a confidence score (conf).
     Per-post labels:
     - post > onoon
     - post > nnnoy
     - post > noyny
     Per-user labels:
     - user1: posts ynnyn, nnyon > onoon
     - user2: posts nnyyy, nyyyo > noyyo
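
The step from per-post labels to per-user labels can be sketched as a per-trait majority vote. This is a minimal Python sketch under assumptions: ties are resolved to the mid label `o`, and the confidence is taken to be the average share of a user's posts that agree with the final label at each position. Both rules are chosen because they reproduce the user labels shown on the slide; the talk does not spell out the exact formulas.

```python
from collections import Counter, defaultdict

def majority_or_mid(votes):
    """Most frequent label among the votes; ties fall back to 'o' (assumption)."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "o"
    return counts[0][0]

def aggregate_by_user(post_labels):
    """post_labels: iterable of (user, label). Returns {user: (label, conf)}."""
    by_user = defaultdict(list)
    for user, label in post_labels:
        by_user[user].append(label)

    result = {}
    for user, labels in by_user.items():
        columns = list(zip(*labels))  # one tuple of votes per trait position
        user_label = "".join(majority_or_mid(col) for col in columns)
        # Hypothetical confidence: mean agreement of posts with the final label.
        shares = [col.count(final) / len(col)
                  for col, final in zip(columns, user_label)]
        result[user] = (user_label, sum(shares) / len(shares))
    return result

# Per-post labels taken from the slide.
POST_LABELS = [
    ("user1", "ynnyn"), ("user1", "nnyon"),
    ("user2", "nnyyy"), ("user2", "nyyyo"),
]
print(aggregate_by_user(POST_LABELS))
```

With the four post labels from the slide, this yields `onoon` for user1 and `noyyo` for user2.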

  25. Evaluation: the predicted user labels (onoon, noyyo) are evaluated against the test set.
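
One simple way to score predictions against the small labeled test set is per-trait accuracy. The sketch below is only an illustration: the gold labels are made up for the example, and the talk does not state which metric was used.

```python
def per_trait_accuracy(predicted, gold):
    """Share of users whose predicted class matches the gold class, per trait."""
    users = sorted(set(predicted) & set(gold))
    if not users:
        raise ValueError("no users in common between predictions and gold labels")
    n_traits = len(gold[users[0]])
    return [
        sum(predicted[u][i] == gold[u][i] for u in users) / len(users)
        for i in range(n_traits)
    ]

# Predicted labels from the slide; gold labels are hypothetical.
PREDICTED = {"user1": "onoon", "user2": "noyyo"}
GOLD = {"user1": "ynoyn", "user2": "noyno"}

print(per_trait_accuracy(PREDICTED, GOLD))
```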

  26. Problems of supervised approaches to Computational Personality Recognition:
     1) overfitting → social network data samples are too small to extract good models, and bottom-up approaches extract very few good patterns
     2) multilinguality → top-down approaches use language-dependent resources
     (Diagram labels: labeled data, features (bottom-up or top-down), model, new unseen data.)

  27. Domain adaptation is a learning problem where a model is generalized across domains, and it is successful when it minimizes the difference of performance from a source to a target domain [Ben-David et al. 2006].
     Advantage of unsupervised personality recognition: domain adaptability.
     (Diagram labels: source domain data, target domain sample, features (bottom-up or top-down), model.)

  28. We added a semi-supervised part to the algorithm. We exploit the high-confidence predictions from the unsupervised system to label a large unlabeled training set, extract n-grams from it, and add them to the initial correlation set.
     (Diagram: large unlabeled set → high-confidence labels → n-grams.)
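
The semi-supervised step can be sketched as follows: keep only users whose predicted label has high confidence, treat their texts as labeled, and collect n-grams that occur mostly with one pole of a trait. This is a minimal Python sketch; the thresholds `min_conf` and `min_count`, the use of bigrams, and the count-difference rule are assumptions made for illustration, not the settings from the talk.

```python
from collections import Counter

def ngrams(text, n=2):
    """Word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_ngram_correlations(texts_by_user, user_predictions, trait_index,
                               min_conf=0.6, min_count=2, n=2):
    """Return {ngram: +1 or -1} for one trait, using high-confidence users only."""
    high, low = Counter(), Counter()
    for user, (label, conf) in user_predictions.items():
        pole = label[trait_index]
        if conf < min_conf or pole == "o":
            continue  # skip low-confidence users and mid labels
        target = high if pole == "y" else low
        for text in texts_by_user.get(user, []):
            target.update(set(ngrams(text, n)))  # count each n-gram once per text

    signs = {}
    for gram in set(high) | set(low):
        diff = high[gram] - low[gram]
        if abs(diff) >= min_count:
            signs[gram] = 1 if diff > 0 else -1
    return signs

# Hypothetical data: texts per user and (label, confidence) per user.
TEXTS = {
    "u1": ["so much fun tonight", "so much fun again"],
    "u2": ["i stayed home", "i stayed home again"],
    "u3": ["so much fun maybe"],
}
PREDICTIONS = {"u1": ("ynnyn", 0.9), "u2": ("nnyyy", 0.8), "u3": ("ynnyn", 0.3)}

print(extract_ngram_correlations(TEXTS, PREDICTIONS, trait_index=0))
```

The resulting signed n-grams can then be merged into the initial correlation set and used in the next labeling pass.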

  29. Two different datasets:
     - Essays [Pennebaker & King 1999; Mairesse et al. 2007] is a big collection of stream-of-consciousness writings of students who took the Big 5. Lang: English. Unlabeled = ~2000 users; Test = ~200 users.
     - PersFB [Celli & Polonio 2013] is a small collection of Facebook statuses of students who took the Big 5. Lang: Italian. Unlabeled = ~200 users; Test = ~30 users.

  30. Many different correlation sets:
     - MRC (Mairesse et al. 2007)
     - LIWC (Mairesse et al. 2007)
     - lang. indep. (Mairesse et al. 2007)
     - LIWC (Golbeck et al. 2011)
     - n-grams (Iacobelli et al. 2011)
     - n-grams (from unlabeled text)

  32. Many different correlation sets. MRC, 12 dimensions: Nchar, Nphon, Nsyl, Kffrq, Kfcat, Brownfrq, Tlfrq, Conc, Fam, Imag, aoa
