
Outline of the talk Part I: Representation(s) and Categorization(s)

Structuring temporal sparse data with application to opinion mining, by Julien Velcin. Outline of the talk: Part I: Representation(s) and Categorization(s); Part II: Evolutionary Clustering for Sparse Data; Part III: Application to the ImagiWeb


  1. Structuring temporal sparse data with application to opinion mining
     Julien Velcin, University of Lyon – ERIC Lab
     Joint work with Y.M. Kim, A. Hasnat, S. Bonnevay, J. Jacques and more…
     1st Lyon-Columbia Research Workshop, ISFA, June 27, 2016

     Outline of the talk
     Part I: Representation(s) and Categorization(s)
     Part II: Evolutionary Clustering for Sparse Data
     Part III: Application to the ImagiWeb Project
     Part IV: Conclusion and Future Work

     Studying representations

  2. Nowadays with Internet
     Chance and curse of big data:
     - Volume
     - Variety (sources and data)
     - Velocity
     - etc.
     For textual data:
     - semantic gap
     - language is living
     - curse of dimensionality
     → Data Science

     Representing ≈ categorizing
     Philosophy, logic
     - Necessary and Sufficient Conditions [Aristotle]
     - Family resemblance [Wittgenstein,1958]
     Psychology, linguistics
     - Cognitive representations and prototypes [Rosch,1973]
     - Linguistic categories [Lakoff,1987]
     Sociology
     - Social representations [Lippmann,1922] [Moscovici,1961]

     Representation and sparseness
     Image of a movie, described by: Title, Type, Plot, Actors, Rhythm, Originality, etc.
     Example: Tomorrowland; Sci-fi; (…); G. Clooney, H. Laurie…; polarity marks as shown on the slide: o + + o - - +
     Sample reviews:
     - “New Disney rather disappointing. But I like so much sci-fi movies I couldn’t miss it.”
     - “Ambitious and visually stunning, this movie…”
     - “The film stars George Clooney, Hugh Laurie, Britt Robertson, and Raffey Cassidy”
     - “‘Tomorrowland’ forgettable look into future”
     - “Like the whole plot but obviously too long for kids”
     - “How do you spell boring? T-O-M-O-R-R-O-W-L-A-N-D.”

     Key idea to take home
     Machine learning (weakly supervised clustering) can help for studying representations:
     - What? topic learning
     - How? opinion mining
     - Who? SN analysis, role detection
     - When? time-aware models

  3. Temporal evolution of entities
     An entity is followed over time t.
     category of similar representations = cluster of similar objects
     Examples of entities:
     - movie
     - politician
     - company
     - brand
     - etc.

     Sparse matrix as input
     Columns: Author, Time, then the description features f1, f2, f3, f4, f5, f6, …, fn-1, fn.
     Most cells are empty; the column positions of the non-zero counts were lost in this transcript.
       Author   Time  Non-zero counts
       pseudo1  t1    1 2 1
       pseudo1  t2    1 1
       pseudo1  t3    2 2
       pseudo2  t1    3 1 1
       pseudo3  t1    3
       pseudo3  t2    2
       pseudo3  t3    2
       pseudo4  t3    3 1
       pseudo5  t3    3 2

     Some state of the art
     Taking time into account
     - incremental clustering [Aggarwal,2003] [Labroche,2014]
     - evolutionary clustering [Chakrabarti,2006] [Chi,2007]
     - monitoring cluster evolution [Spiliopoulou,2006,2013]
     Dealing with sparse data
     - mixture models [Dempster,1977]
     - topic models [Hofmann,1999] [Blei,2003]
     - default clustering [Velcin,2005]
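The sparse author-by-time input can be sketched as follows. This is only an illustration: the feature positions in `events` are hypothetical (the transcript does not preserve them), and the helper names `build_sparse_rows`, `density` and `rows_by_period` are assumptions, not part of the talk.

```python
from collections import defaultdict


def build_sparse_rows(events):
    """Aggregate (author, period, feature, count) events into sparse rows.

    Only non-zero cells are stored, keyed by (author, period).
    """
    rows = defaultdict(dict)
    for author, period, feature, count in events:
        row = rows[(author, period)]
        row[feature] = row.get(feature, 0) + count
    return dict(rows)


def density(rows, n_features):
    """Fraction of non-empty cells in the implied dense matrix."""
    filled = sum(len(row) for row in rows.values())
    return filled / (len(rows) * n_features)


def rows_by_period(rows):
    """Group the sparse rows into one slice per time period."""
    slices = defaultdict(dict)
    for (author, period), row in rows.items():
        slices[period][author] = row
    return dict(slices)


# Hypothetical events: the feature columns are made up for illustration.
events = [
    ("pseudo1", "t1", "f1", 1),
    ("pseudo1", "t1", "f4", 2),
    ("pseudo1", "t2", "f2", 1),
    ("pseudo2", "t1", "f3", 3),
]
rows = build_sparse_rows(events)
slices = rows_by_period(rows)
```

Storing only the non-zero cells keeps the memory cost proportional to the number of observations rather than to the number of features, which is the usual reason for a sparse layout.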

  4. Our objective
     Analyze temporal sparse data using clustering:
     - identify groups of users who use similar descriptions
     - track an entity's image over time
     - detect and interpret temporal changes
     Test on real data within the ImagiWeb project:
     - case study 1: image of French politicians given by Twitter users
     - case study 2: image of a big national company about nuclear energy given by bloggers

     Model 1: Temporal Mixture Model
     TMM = probabilistic generative model [Kim,2015]
     What's new?
     - retrospective approach: the recent past matters
     - no Dirichlet prior, unlike most topic models
     Parameters to estimate: [equation not preserved in this transcript]
     Optimization by Expectation-Maximization (EM)

     Model 2: Parametric link approach
     MM-Plink = MM + linear link between (t-1) and (t) + model selection using BIC
     Relation between the parameters μ_{t-1} and μ_t: [equation not preserved in this transcript; its labels read "cluster at time t", "cluster at time t", "link parameter (mult.)" and "link parameter (add.)"]
     Clustering estimated with classic EM
     Different combinations tested for (δ,γ), each with its interpretation:
     - (1,0) = no change
     - (0,γ_{j,k}) = totally new clusters
     - (δ,γ) = same global change
     - etc.

     Differences between the two models
     [Figure: two diagrams along the time axis t, one for model 1 (TMM) and one for model 2 (MM-Plink)]
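Both models are estimated with EM. As a minimal sketch of that step, the code below fits a plain multinomial mixture (the simple MM baseline, not TMM or MM-Plink, whose exact update equations are not preserved in this transcript) to sparse count rows. The function name, the deterministic initialisation and the smoothing constant are assumptions made for illustration.

```python
import math


def em_multinomial_mixture(rows, n_features, k, n_iter=50, smooth=0.01):
    """Fit a k-component multinomial mixture to sparse count rows with EM.

    rows: list of dicts {feature_index: count}.
    Returns (weights, profiles, responsibilities).
    """
    n = len(rows)
    weights = [1.0 / k] * k
    # Assumed initialisation: seed component j with an add-one smoothed
    # version of one of the rows, spread evenly over the data set.
    profiles = []
    for j in range(k):
        seed = rows[(j * n) // k]
        total = sum(seed.values()) + n_features
        profiles.append(
            [(seed.get(f, 0) + 1.0) / total for f in range(n_features)]
        )
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row,
        # computed in log space to avoid underflow.
        resp = []
        for row in rows:
            log_p = [
                math.log(weights[j])
                + sum(c * math.log(profiles[j][f]) for f, c in row.items())
                for j in range(k)
            ]
            top = max(log_p)
            unnorm = [math.exp(v - top) for v in log_p]
            norm = sum(unnorm)
            resp.append([u / norm for u in unnorm])
        # M-step: re-estimate mixing weights and feature profiles.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            weights[j] = (mass + smooth) / (n + k * smooth)
            counts = [smooth] * n_features
            for r, row in zip(resp, rows):
                for f, c in row.items():
                    counts[f] += r[j] * c
            total = sum(counts)
            profiles[j] = [c / total for c in counts]
    return weights, profiles, resp


# Hypothetical toy data: two groups of users with disjoint vocabularies.
toy_rows = [
    {0: 3, 1: 1}, {0: 2}, {0: 4, 1: 2},
    {2: 3, 3: 1}, {3: 2}, {2: 1, 3: 3},
]
weights, profiles, resp = em_multinomial_mixture(toy_rows, n_features=4, k=2)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

Because each row is a dict of non-zero counts, the E-step only touches the features a user actually used, which is what makes the approach practical on very sparse data.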

  5. ImagiWeb project
     Studying the image (representation) of entities conveyed on social media and its evolution over time [Velcin,2014]
     Sources: Twitter, blogs
     Aspects of an entity:
     - Political line
     - Future project
     - Balance sheet
     - Ethic
     - Injunction
     - Communication
     - etc.
     Funded by the ANR for 3 years (2012-2015)
     Needs complementary skills: NLP, machine learning, software engineering, analysis of public opinion, semiology…
     6 partners: ERIC (management), CEPEL, LIA, AMI Software, EDF R&D, Xerox Research Centre Europe (XRCE)

     Design of a full annotation scheme
     Example annotations (entity, aspect, polarity):
     - (Sarkozy, -)
     - (Sarkozy, communication, +)
     - (Sarkozy, bilan [balance sheet], --)
     - (Sarkozy, compétence [skills], +)

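  6. Automatic annotation
     Example tweets with their annotation (original French, English gloss in brackets):
     - (Ethique, ++) La France est une république indivisible, démocratique, laïque et sociale, voilà mon engagement. #FH2012 [France is an indivisible, democratic, secular and social republic: that is my commitment.]
     - (Positionnement, +) Geste fort du président #Hollande qui participera ce jeudi à la journée des mémoires, de la traite, de l'esclavage et de leurs abolitions. [A strong gesture from President #Hollande, who will take part this Thursday in the day of remembrance of the slave trade, slavery and their abolition.]
     - (Injonction, +) Pourquoi j'aime bien Mélenchon et je voterai Hollande http://t.co/TVM8RwoH via @*************** [Why I like Mélenchon and will vote Hollande]
     - (Ethique:Honnêteté, +) #Delanoë "ce qui me frappe ds la campagne de #Hollande c son honnêteté intellectuelle alors que #Sarkozy dit tout et n importe quoi" ["what strikes me in #Hollande's campaign is his intellectual honesty, whereas #Sarkozy says anything and everything"]
     - (Injonction, +) @aut-1154 Neuilly sur Seine 61100 habitants , France 65000 000 .Votez Hollande. [Neuilly-sur-Seine: 61,100 inhabitants; France: 65,000,000. Vote Hollande.]
     - (Personne:Charisme, -) @****** Hollande n'a aucun charisme ! Il fait honte à la France et aux Français ! [Hollande has no charisma! He is a disgrace to France and to the French!]
     - (Personne:Charisme, -) Sympatisch, ce Hollande. Et cultivé avec ça. On a parlé saucisses toute la soirée. [Likeable, this Hollande. And cultured too. We talked about sausages all evening.]
     - (Ethique:Honnêteté, -) Je savais qu'Hollande était un gros mou de socialiste. Mais là si ce n'est pas du reniement ou du renoncement ?#Libertédeconscience [I knew Hollande was a spineless socialist. But if this is not disavowal or giving up?]
     - (Ethique:Honnêteté, --) François Hollande : le mensonge c'est maintenant: C'est cela un président . Il y a pas comme un léger bug [François Hollande: lying is now. So that is a president. Isn't there a slight bug?]
     - (Compétence, -) Copé appelle Hollande à "reprendre en main" son gouvernement "incompétent" http://t.co/lPanwi5r via @LePoint [Copé calls on Hollande to "take back control" of his "incompetent" government]

     Extracting and monitoring images
     E.g. with politicians, annotation pairs shown on the slide: (Attribut, ++), (Entité, ++), (Entité, --), (Projet, --), (Entité, +), (Entité, o), (Projet, -), (Entité, -)
     Columns: Author, Time, then one column per (aspect, polarity) pair: (a1,++), (a1,+), (a1,o), (a1,-), (a1,--), (a2,++), …, (ap,-), (ap,--).
     The column positions of the non-zero counts were lost in this transcript.
       Author   Time  Non-zero counts
       pseudo1  t1    1 2 1
       pseudo1  t2    1 1
       pseudo1  t3    2 2
       pseudo2  t1    3 1 1
       pseudo3  t1    3
       pseudo3  t2    2
       pseudo3  t3    2
       pseudo4  t3    1
       pseudo5  t3    3 2

     Testing model-based evolutionary clustering
     [Figure: François Hollande; 1 cluster of 254 users (before election); probability distribution over the aspects Entity, Attribute, Injunction, Political line, Communication, Person, Skills, Ethic, Balance sheet and Project; axis ticks 20, 50, 90; polarity legend ++, +, o, -, --]

     Quantitative results of TMM
     Tested on a subset of tweets (entity FH, before and after election, k=9)
     Comparison with:
     - Dynamic Topic Model (DTM) [Blei,2006]
     - Simple Mixture Model (MM)
     - Probabilistic Latent Semantic Analysis (pLSA) [Hofmann,1999]
     Internal criteria:
     - Co-occurrence level (COL)
     - Average Unsmoothness (AUS)
     - Average Homogeneity (AHM)
     - Author Consistency Sum (ACS)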
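The column layout of the annotation matrix (one column per aspect and polarity pair, five polarities per aspect) can be sketched as below. The aspect list, the sample annotations and the helper names `column_index` and `count_rows` are hypothetical; only the five polarity levels and the label spellings come from the slides.

```python
# The five polarity levels used in the column headers of the slide.
POLARITIES = ["++", "+", "o", "-", "--"]


def column_index(aspects, aspect, polarity):
    """Column of an (aspect, polarity) pair, five columns per aspect."""
    return aspects.index(aspect) * len(POLARITIES) + POLARITIES.index(polarity)


def count_rows(annotations, aspects):
    """Turn (author, period, aspect, polarity) annotations into sparse rows."""
    rows = {}
    for author, period, aspect, polarity in annotations:
        row = rows.setdefault((author, period), {})
        col = column_index(aspects, aspect, polarity)
        row[col] = row.get(col, 0) + 1
    return rows


# Hypothetical aspect list and annotations, for illustration only.
aspects = ["Entité", "Ethique", "Projet"]
annotations = [
    ("pseudo1", "t1", "Entité", "+"),
    ("pseudo1", "t1", "Ethique", "++"),
    ("pseudo1", "t1", "Ethique", "++"),
    ("pseudo2", "t1", "Projet", "--"),
]
rows = count_rows(annotations, aspects)
```

With p aspects this yields 5p columns, of which a single author rarely fills more than a handful in a given period, hence the sparseness of the input.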

  7. Quantitative results of TMM
     [Figure: four bar charts, one per criterion (COL, AHM, AUS, ACS), comparing DTM, MM, pLSA, TMM and TTM; the bar values are not recoverable from this transcript]

     Quantitative results of MM-Plink
     Tested on a subset of tweets (entity FH, 3 time periods, k=3, total of ~3000 observations)
     Comparison between:
     - Simple Mixture Model (MM)
     - Temporal Mixture Model (TMM) [Kim,2015]
     - Parametric-link MM (MM-Plink) = our new proposal
     Additional criterion: Average Perplexity (APL)

     Towards an understanding of evolution
     Link parameter δ_{j,k}: model (δ_{j,k}, 0) selected
     [Figure legend: δ_{j,k} < 0.9; 0.9 <= δ_{j,k} < 1.1; 1.1 <= δ_{j,k}]

     Integrated into the final prototype
     With TMM; MM-Plink soon…
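To make the reading of the link parameter concrete, here is a small sketch. It assumes the linear link has the form μ_t = δ · μ_{t-1} + γ, applied component by component; the slides only name a multiplicative and an additive link parameter, so this exact form is an assumption. The three bins are the ones in the figure legend, and the function names and numeric values are hypothetical.

```python
def apply_link(mu_prev, delta, gamma):
    """Assumed linear link: mu_t[i] = delta[i] * mu_prev[i] + gamma[i]."""
    return [d * m + g for m, d, g in zip(mu_prev, delta, gamma)]


def delta_bin(delta):
    """Place a multiplicative link parameter in one of the legend's bins."""
    if delta < 0.9:
        return "delta < 0.9"
    if delta < 1.1:
        return "0.9 <= delta < 1.1"
    return "1.1 <= delta"


# Hypothetical cluster parameters at time t-1 and link parameters.
mu_prev = [0.5, 0.3, 0.2]
delta = [1.2, 1.0, 0.5]
gamma = [0.0, 0.0, 0.0]  # the selected model is of the form (delta, 0)

mu_t = apply_link(mu_prev, delta, gamma)
bins = [delta_bin(d) for d in delta]
```

Under this assumption, a δ close to 1 with γ = 0 leaves the component unchanged, which matches the "(1,0) = no change" case, while values outside the middle bin flag components whose weight shrinks or grows between two periods.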

  8. Conclusion
     New models for evolutionary clustering:
     - dedicated to sparse data
     - taking temporal transition into account
     - trying to add more interpretation of the evolution process
     Applied to social media analysis:
     - extraction and monitoring of opinionated images
     In close collaboration with social sciences:
     - joint work with specialists in political studies and semiologists (all along the process)

     Future work
     From the methodology point of view:
     - going further into the interpretation process
     - more comparisons needed (see MONIC [Spiliopoulou,2006] for instance)
     - testing non-parametric approaches
     - looking for change points in the timeline
     For the ImagiWeb project:
     - testing TMM and MM-Plink on the rest of the available data and on the 2nd case study
     - more (qualitative) evaluation needed
     - qualifying users' groups using additional variables

     [ THANK YOU ]
