Fabio Celli fabio.celli@unitn.it
Unsupervised Personality Recognition from Text: Possible Applications
Sep 16, 2014
-what is personality?
-what is personality recognition?
-how can we recognize personality from text?
-how can we recognize it in an unsupervised way?
-which applications?
“Personality describes persistent human behavioral responses to broad classes of environmental stimuli.” [Adelstein et al 2011]
The Big 5 factor theory
-self assessments
-observed assessments (+agreement)
Ground truth:
-100 item test
-50 item test
-44 item test
-10 item test
Personality recognition
Personality Recognition is the automatic classification of the personality of authors from behavioral features (text, facial expressions, profile pictures, works, and so on). Gold standard labels can be obtained by means of the Big5 personality tests. [Norman 1963; Costa & McCrae 1985; Digman 1990]
-5 classifiers (one per trait: x, e, a, c, o)
-predict binary classes or scores
-predicting classes: f ~.57% (better)
-predicting scores: rae ~.97% (bad) [Mairesse et al. 2007]
Approaches to Personality Recognition from text
Top-down approach: exploit lexical resources as features, finding correlations with personality trait poles
Bottom-up approach: search for patterns associated with personality trait poles
Cited on the slide: [Mairesse et al 2007] [Oberlander & Nowson 2006] [Schwartz et al 2013] [Iacobelli et al 2011]
[Diagrams: labeled text + extraling. resource → feature correlation; labeled text → pattern]
Mixed approach: use many resources (sentiment, psycholinguistic, semantic) + word patterns + feature selection [Markovikj et al 2013]
-5 classifiers (one per trait)
-predict binary classes or scores
-large feature space, reduced with feature selection
Unsupervised personality recognition from text
We need:
-unlabeled text + authors (many texts per author)
-small labeled test set
-correlations between language and personality
In literature:
-3 classes: high (y), mid (o), low (n)
-2 classes: high (y), low (n)
Example:
Author1 text1-1
Author1 text1-2
Author2 text2-1
Author2 text2-2
AuthorN textN-1
AuthorN textN-2
...
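A minimal sketch of how unlabeled texts and a set of language/personality correlations could yield a y/o/n label per trait. Everything specific here is an assumption made for illustration, not taken from the slides: the three toy features, the signs in `CORRELATIONS`, the helper names, and the rule itself (compare each feature with its average over a sample of the unlabeled data, then vote according to the sign of the correlation). Only the trait letters and the y/o/n classes come from the talk.

```python
from statistics import mean

TRAITS = "xeaco"  # trait letters as they appear on the slides

# Hypothetical correlation set: feature -> {trait: sign of the correlation}.
# A real system would take these signs from published correlation tables.
CORRELATIONS = {
    "exclamations": {"x": +1},
    "negations": {"x": -1, "a": -1},
    "long_words": {"o": +1, "c": +1},
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    stripped = (t.strip(".,!?;:").lower() for t in text.split())
    return [w for w in stripped if w]


def extract_features(text):
    """Toy feature extractor; the three features are illustrative only."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    return {
        "exclamations": text.count("!") / n,
        "negations": sum(w in {"no", "not", "never"} for w in tokens) / n,
        "long_words": sum(len(w) > 6 for w in tokens) / n,
    }


def feature_means(texts):
    """Average value of each feature over a sample of unlabeled texts."""
    rows = [extract_features(t) for t in texts]
    return {f: mean(r[f] for r in rows) for f in CORRELATIONS}


def label_text(text, means):
    """Return one label per trait: y (high), n (low), o (no evidence)."""
    feats = extract_features(text)
    scores = dict.fromkeys(TRAITS, 0)
    for feat, signs in CORRELATIONS.items():
        if feats[feat] == means[feat]:
            continue  # feature at the sample average: no vote
        direction = 1 if feats[feat] > means[feat] else -1
        for trait, sign in signs.items():
            scores[trait] += direction * sign
    return "".join(
        "y" if scores[t] > 0 else "n" if scores[t] < 0 else "o" for t in TRAITS
    )


sample = [
    "I am not sure. No, never mind.",
    "What a fantastic, wonderful evening!!",
    "ok",
]
means = feature_means(sample)
print(label_text(sample[1], means))  # yoyyy
print(label_text(sample[0], means))  # nonnn
```

No labeled training data is used: the only supervision is the sign of each correlation, and the sample averages come from the unlabeled texts themselves.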
[Diagram: sample data → data distrib → one label per post → one label per user + conf]
post > onoon
post > nnnoy
post > noyny
user1 post ynnyn
user1 post nnyon
user1 > onoon (conf)
user2 post nnyyy
user2 post nyyyo
user2 > noyyo (conf)
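The user1 and user2 examples on this slide are consistent with a per-trait majority vote over a user's posts in which ties become o. The sketch below implements that reading; the tie rule is inferred from the two examples, and the confidence formula (share of post-level labels that agree with the user-level label) is purely an assumption, since the slide only names a "conf" value. The function name `aggregate_user` is hypothetical.

```python
from collections import Counter


def aggregate_user(post_labels):
    """Combine per-post label strings into (user_label, confidence).

    Assumptions: per-trait majority vote, ties resolved as 'o';
    confidence = share of post-level labels equal to the user-level label.
    """
    if not post_labels:
        raise ValueError("need at least one post label")
    n_traits = len(post_labels[0])
    user_label = []
    agreeing = 0
    for i in range(n_traits):
        votes = Counter(label[i] for label in post_labels)
        ranked = votes.most_common()
        top, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            top = "o"  # tie between classes: fall back to the mid class
        user_label.append(top)
        agreeing += votes[top]  # 0 if 'o' came only from a tie
    confidence = agreeing / (n_traits * len(post_labels))
    return "".join(user_label), confidence


print(aggregate_user(["ynnyn", "nnyon"]))  # ('onoon', 0.5)
print(aggregate_user(["nnyyy", "nyyyo"]))  # ('noyyo', 0.7)
```

Both calls reproduce the user-level labels shown on the slide; the confidence values depend entirely on the assumed formula.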
[Diagram: user labels (onoon, noyyo) → evaluate against the test set]
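One simple way to carry out this evaluation step is per-trait accuracy over the users in the labeled test set. This is a sketch under assumptions: the slide does not name the metric, the function `per_trait_accuracy` is hypothetical, and the gold labels in the example are invented for illustration only.

```python
def per_trait_accuracy(predicted, gold, traits="xeaco"):
    """Share of test users whose predicted label matches gold, per trait.

    predicted, gold: dicts mapping user id -> label string (one char per trait).
    Only users present in both dicts are scored.
    """
    users = sorted(set(predicted) & set(gold))
    if not users:
        raise ValueError("no users in common between predictions and test set")
    accuracy = {}
    for i, trait in enumerate(traits):
        hits = sum(predicted[u][i] == gold[u][i] for u in users)
        accuracy[trait] = hits / len(users)
    return accuracy


predicted = {"user1": "onoon", "user2": "noyyo"}  # user labels from the slides
gold = {"user1": "onyon", "user2": "nnyyo"}       # invented gold labels
print(per_trait_accuracy(predicted, gold))
# {'x': 1.0, 'e': 0.5, 'a': 0.5, 'c': 1.0, 'o': 1.0}
```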
Problems of supervised approaches to Computational Personality Recognition:
1) overfitting → social network data samples are too small to extract good models, and bottom-up approaches extract very few good patterns
2) multilinguality → top-down approaches use language-dependent resources
[Diagram: labeled data → features (bottom-up or top-down) → model; new unseen data]
Domain adaptation is a learning problem where a model is generalized across domains, and it is successful when it minimizes the difference in performance between a source and a target domain [Ben-David et al. 2006]
Advantage of unsupervised Personality Recognition:
-domain adaptability
[Diagram: sample → features (bottom-up or top-down) → model; target domain = source domain data]
We added a semi-supervised part to the algorithm. We exploit the high-confidence predictions of the unsupervised system to label a large unlabeled training set, extract n-grams from it, and add them to the initial correlation set.
[Diagram: large unlabeled set → high conf labels → n-grams]
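The semi-supervised step can be sketched as follows. The source only says that n-grams are extracted from high-confidence predictions and added to the correlation set; the selection rule used here (keep an n-gram for a trait when it occurs at least `min_count` times and only with one pole), the thresholds, and the function names are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def ngrams(text, n=2):
    """Lowercased word n-grams of a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_ngram_correlations(predictions, traits="xeaco",
                               min_conf=0.8, min_count=2, n=2):
    """Build {ngram: {trait: +1 | -1}} from high-confidence predictions.

    predictions: iterable of (text, label, confidence) produced by the
    unsupervised system. Texts below min_conf are ignored.
    """
    counts = defaultdict(Counter)  # (ngram, trait) -> counts of 'y' / 'n'
    for text, label, conf in predictions:
        if conf < min_conf:
            continue
        for gram in set(ngrams(text, n)):
            for i, trait in enumerate(traits):
                if label[i] in "yn":  # 'o' carries no pole information
                    counts[(gram, trait)][label[i]] += 1

    new_set = defaultdict(dict)
    for (gram, trait), c in counts.items():
        if c["y"] >= min_count and c["n"] == 0:
            new_set[gram][trait] = +1
        elif c["n"] >= min_count and c["y"] == 0:
            new_set[gram][trait] = -1
    return dict(new_set)


predictions = [
    ("we had a great party last night", "yonoo", 0.9),
    ("what a great party with friends", "yoooo", 0.85),
    ("a great party maybe", "noooo", 0.4),  # below min_conf, ignored
]
print(extract_ngram_correlations(predictions))
```

With these toy inputs the result holds two bigrams, "a great" and "great party", each mapped to `{"x": 1}`. Entries of this shape could then be merged into the initial correlation set.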
Two different datasets
essays [Pennebaker & King 1999] [Mairesse et al. 2007]
is a big collection of stream-of-consciousness writings of students who took the Big5.
Lang: English. Unlabeled = ~2000 users, Test = ~200 users
PersFB [Celli & Polonio (2013)]
is a small collection of Facebook statuses of students who took the Big5.
Lang: Italian. Unlabeled = ~200 users, Test = ~30 users
[Diagram: essays, fb → large unlabeled set → high conf labels → n-grams]
Many different correlation sets:
-MRC (Mairesse et al 2007)
-LIWC (Mairesse et al 2007)
-lang.indep (Mairesse et al 2007)
-LIWC (Golbeck et al 2011)
-n-grams (Iacobelli et al 2011)
-n-grams (from unlabeled text)
12 dimensions: Nchar, Nphon, Nsyl, Kffrq, Kfcat, Brownfrq, Tlfrq, Conc, Fam, Imag, aoa