
Word Storms: Multiples of Word Clouds for Visual Comparison of Documents

Word Storms: Multiples of Word Clouds for Visual Comparison of Documents. Quim Castellá, Charles Sutton (WWW-2014). Zoltán Szabó, Gatsby Unit, Tea Talk, December 18, 2014.


  1. Word Storms: Multiples of Word Clouds for Visual Comparison of Documents. Quim Castellá, Charles Sutton (WWW-2014). Zoltán Szabó, Gatsby Unit, Tea Talk, December 18, 2014.

  2. Motivation. Vast number of documents on the web; need for quick scanning. Word clouds are popular (Google: 963,000 hits; LDA: 172,000 hits). One of the most popular generators: Wordle. Font size = frequency of the word.

  3. Key Problem. Word clouds are difficult to compare visually. Word storm: a group of word clouds, where each word cloud represents a subset of the documents; it allows efficient contrasting and comparison of documents. Goal: visualize an entire corpus.

  4. Cloud Examples. What one cloud can represent: one document (comparing individual docs); one track of a conference (∼ areas); papers from a given period (∼ time evolution); one scientific field and its subfields (∼ hierarchical categories).

  5. Guiding Principles. (1) Each cloud should represent its own document. (2) Clouds should be easy to compare and contrast. ⇒ Co-occurring words should have similar font size, color, position, and orientation.

  6. Creating a Single Cloud: Notations. Word cloud = set of words: $W = \{w_1, \dots, w_M\}$. Each word $w \in W$ has a position $p_w = (x_w, y_w)$, a font size $s_w$, and a color $c_w$. Importance of a word (its weight): term frequency (tf). $W$ = the words with the top $M$ weights.

  7. Creating a Single Cloud. Font size ∝ word weight. Color, orientation: random. Position: spiral algorithm (next slide).
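The word selection and font sizing from slides 6 and 7 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `build_cloud_words` and the `max_font` scaling constant are assumptions, and the weight is the raw term frequency as stated on the slide.

```python
from collections import Counter


def build_cloud_words(tokens, m, max_font=48.0):
    """Return [(word, font_size)] for the m words with the highest term frequency.

    Font size is proportional to the word weight (tf); `max_font` is a
    hypothetical size assigned to the most frequent word.
    """
    counts = Counter(tokens)
    top = counts.most_common(m)  # words with the top-m weights
    if not top:
        return []
    peak = top[0][1]
    return [(word, max_font * count / peak) for word, count in top]
```

For example, `build_cloud_words("a a a b b c".split(), 2)` keeps the two most frequent words and scales the second one to two thirds of the largest font.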

  8. Creating a Single Cloud: Spiral Algorithm. Given: a word cloud with $i - 1$ words. The new word $w$ is put at its desired (or random) location. If it does not intersect the previous words and lies inside the frame, go to the next word. Otherwise, $w$ is moved outward along a spiral until a valid position is found.

  9. Spiral Algorithm: Formally
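Since the formal statement on this slide did not survive extraction, here is a minimal sketch of the spiral placement described on slide 8. It assumes words are axis-aligned boxes, uses an Archimedean spiral, and introduces the hypothetical names `overlaps`, `place_word`, `step`, and `max_steps`; the original pseudocode may differ in all of these details.

```python
import math


def overlaps(a, b):
    """True if axis-aligned boxes (x_min, y_min, x_max, y_max) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_word(size, desired, placed, frame, step=0.5, max_steps=10000):
    """Place a box of `size` (w, h) as close as possible to `desired` (x, y).

    `placed` holds the boxes of the previous words, `frame` is (width, height).
    The candidate centre moves outward on a spiral until the box is inside the
    frame and intersects no previous word. Returns the box, or None on failure.
    """
    w, h = size
    for k in range(max_steps):
        theta = step * k
        radius = step * theta  # Archimedean spiral: radius grows with the angle
        cx = desired[0] + radius * math.cos(theta)
        cy = desired[1] + radius * math.sin(theta)
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        inside = (box[0] >= 0 and box[1] >= 0
                  and box[2] <= frame[0] and box[3] <= frame[1])
        if inside and not any(overlaps(box, other) for other in placed):
            return box
    return None
```

The first word lands exactly on its desired location; a second word requested at the same location is pushed outward until it no longer intersects the first.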

  10. Creating a Storm. $i$-th document: $u_i = (u_{iw})$, where $u_{iw}$ is the count of word $w$ in the $i$-th doc. $i$-th word cloud: $v_i = (W_i, \{p_{iw}\}, \{c_{iw}\}, \{s_{iw}\})$. Alg-1. Color: $\alpha$-channel $= \mathrm{idf} = \log\left(\frac{|\text{docs}|}{|\text{docs containing } w|}\right)$ ⇒ transparent: the word appears in many docs. Locations: initialization by the spiral method; iterate: desired locations := $\hat{E}_{\text{clouds}}[\text{previous locations}]$.
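The two ingredients of Alg-1 can be illustrated as follows. This is a sketch under assumptions: the helper names `idf_alpha` and `desired_locations` are hypothetical, the empirical mean is taken over the clouds that contain the word, and the mapping from the raw idf value to a displayable opacity is left out.

```python
import math


def idf_alpha(docs):
    """docs: list of word sets. Returns {word: log(|docs| / |docs containing word|)}.

    A word present in every document gets idf 0, i.e. it would be drawn
    fully transparent.
    """
    n = len(docs)
    doc_freq = {}
    for words in docs:
        for w in set(words):
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w: math.log(n / k) for w, k in doc_freq.items()}


def desired_locations(positions):
    """positions: one {word: (x, y)} dict per cloud.

    Returns the mean previous location of each word, averaged over the
    clouds that contain it (an assumed reading of the empirical expectation).
    """
    sums = {}
    for cloud in positions:
        for w, (x, y) in cloud.items():
            sx, sy, count = sums.get(w, (0.0, 0.0, 0))
            sums[w] = (sx + x, sy + y, count + 1)
    return {w: (sx / count, sy / count) for w, (sx, sy, count) in sums.items()}
```

In a full iteration, each cloud would be laid out again with the spiral method, using these averaged positions as the desired locations.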

  11. Coordinated Layout: Alg-1. Problem: the iteration tends to move words far away from the center.

  12. Coordinated Layout: Alg-2 – Objective. Set of documents: $u_{1:N} = \{u_1, \dots, u_N\}$. Storm: $v_{1:N} = \{v_1, \dots, v_N\}$. Objective (how well the storm fits the corpus):

     $$f_{u_{1:N}}(v_{1:N}) = \underbrace{\sum_{i,j=1}^{N} \left[ d_u(u_i, u_j) - d_v(v_i, v_j) \right]^2}_{\text{similar docs are mapped to similar clouds}} + \underbrace{\sum_{i=1}^{N} c(u_i, v_i)}_{\text{faithful repr. of the own doc}}.$$

     First term: MDS. $d_u$: Euclidean distance. With $\kappa \ge 0$,

     $$d_v(v_i, v_j) = \sum_{w \in W_i \cup W_j} (s_{iw} - s_{jw})^2 + \kappa \sum_{w \in W_i \cap W_j} \left\| p_{iw} - p_{jw} \right\|_2^2.$$

     Second term:

     $$c(u_i, v_i) = \sum_{w \in W_i} (u_{iw} - s_{iw})^2.$$
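To make the objective concrete, the sketch below evaluates it for a small storm. Several things here are assumptions rather than statements from the slides: documents are dicts of word counts, each cloud is a dict with `"size"` and `"pos"` entries, a word missing from a cloud is given font size 0 in the size term, and the function names are hypothetical.

```python
import math


def d_u(ui, uj):
    """Euclidean distance between two word-count dicts."""
    words = set(ui) | set(uj)
    return math.sqrt(sum((ui.get(w, 0) - uj.get(w, 0)) ** 2 for w in words))


def d_v(vi, vj, kappa=1.0):
    """Cloud dissimilarity: size term over the union, position term over shared words."""
    si, sj = vi["size"], vj["size"]
    union = set(si) | set(sj)
    shared = set(si) & set(sj)
    # Assumption: a word absent from a cloud has size 0 there.
    size_term = sum((si.get(w, 0.0) - sj.get(w, 0.0)) ** 2 for w in union)
    pos_term = sum(
        (vi["pos"][w][0] - vj["pos"][w][0]) ** 2
        + (vi["pos"][w][1] - vj["pos"][w][1]) ** 2
        for w in shared
    )
    return size_term + kappa * pos_term


def fit_cost(u, v):
    """c(u_i, v_i): squared gap between word counts and font sizes in one cloud."""
    return sum((u.get(w, 0) - s) ** 2 for w, s in v["size"].items())


def objective(docs, clouds, kappa=1.0):
    """f(v) = sum_{i,j} [d_u - d_v]^2 + sum_i c(u_i, v_i)."""
    n = len(docs)
    stress = sum(
        (d_u(docs[i], docs[j]) - d_v(clouds[i], clouds[j], kappa)) ** 2
        for i in range(n) for j in range(n)
    )
    fidelity = sum(fit_cost(u, v) for u, v in zip(docs, clouds))
    return stress + fidelity
```

Two identical documents drawn as identical clouds, with font sizes equal to the counts, give an objective of zero, which matches the reading of the two terms.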

  13. Coordinated Layout: Alg-2 – Objective. Two more penalties ($\lambda > 0$, $\mu > 0$):

     $$r(v_{1:N}) = \underbrace{\lambda \sum_{i=1}^{N} \sum_{w, w' \in W_i} O_{i:w,w'}^2}_{\text{words do not overlap}} + \underbrace{\mu \sum_{i=1}^{N} \sum_{w \in W_i} \left\| p_{iw} \right\|_2^2}_{\text{compact configuration}}.$$

     $O_{i:w,w'}$: minimum distance required to separate the overlapping words $(w, w')$. Final objective:

     $$f_{u_{1:N}}(v_{1:N}) + r(v_{1:N}) \to \min_{v_{1:N}}.$$

     Optimization: homotopy scheme in $\lambda$; for each fixed $\lambda$, the subtask is solved by gradient descent.
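The two penalties can be evaluated with a short sketch. It rests on assumptions not stated on the slide: words are axis-aligned boxes given by centre and size, the separation distance is the smaller of the horizontal and vertical overlaps, and each pair of words is counted once (counting ordered pairs would only rescale $\lambda$). The names `separation` and `regularizer` are hypothetical.

```python
from itertools import combinations


def separation(a, b):
    """Smallest axis-aligned shift that separates boxes a and b (0 if disjoint).

    Boxes are (x_min, y_min, x_max, y_max).
    """
    overlap_x = min(a[2], b[2]) - max(a[0], b[0])
    overlap_y = min(a[3], b[3]) - max(a[1], b[1])
    if overlap_x <= 0 or overlap_y <= 0:
        return 0.0
    return float(min(overlap_x, overlap_y))


def regularizer(clouds, lam=1.0, mu=1.0):
    """r(v) = lam * sum of squared separations + mu * sum of squared position norms.

    clouds: list of {word: (cx, cy, width, height)}, centres relative to the
    cloud centre.
    """
    overlap_penalty = 0.0
    spread_penalty = 0.0
    for cloud in clouds:
        boxes = {
            w: (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
            for w, (cx, cy, bw, bh) in cloud.items()
        }
        for w1, w2 in combinations(boxes, 2):
            overlap_penalty += separation(boxes[w1], boxes[w2]) ** 2
        spread_penalty += sum(cx ** 2 + cy ** 2 for cx, cy, _, _ in cloud.values())
    return lam * overlap_penalty + mu * spread_penalty
```

A homotopy scheme in $\lambda$ would call a gradient-based minimizer repeatedly on the full objective while changing `lam` between runs; that outer loop is not shown here.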

  14. Coordinated Layout: Combined Algorithm. Iterative algorithm: fast, but not compact. Gradient method: compact storm, but slow. In practice: the combination gives decent results.

  15. Numerical Illustration. User study: with word storms, users are better at detecting outlier documents and at discovering the two most similar documents. ICML-2012: visualization of sessions, http://icml.cc/2012/whatson-all/ . Research grant abstract visualization (EPSRC): 1st-5th programmes = material sciences, 6th = maths; independent vs. coordinated layout.

  16. EPSRC programmes: independent clouds

  17. EPSRC programmes: coordinated storm

  18. Coordinated Storm: Interpretation. (a)-(e) are similar: 'material', 'applications', 'properties'. Contrast, absence of words: 'coating' only in (b) and (d), no 'material' in (f). Informative words (transparency): 'electron' (a), 'metal' (b), 'light' (c), 'crack' (d), 'composite' (e), 'problems' (f).

  19. Summary. Independent word clouds are difficult to compare. Word storm: similar clouds represent similar documents; it emphasizes the most informative words and is useful in comparing and contrasting documents. Source code: http://groups.inf.ed.ac.uk/cup/wordstorm/wordstorm.html
