Some Remarks on Text Data Visualization and Codec Transparency

  1. Some Remarks on Text Data Visualization and Codec Transparency
     Bryan Jurish, jurish@bbaw.de
     VisiHu 2017: Visualisierungsprozesse in den Humanities
     Universität Zürich, 17th July, 2017

  2. Overview
     Preliminaries
     - Full Disclosure
     - Terminology: Data, Text, & Visualization
     Remarks
     - Pipelines, Parameters, & (visualization) Procedures
     - Visualizations as Filters
     - Lossiness, Compression, & ‘Universal’ Filters
     - ‘Intuitivity’, Exploitation, & Coherence
     - Co-operation & Codec Transparency
     Summary
     2017-07-17 / Jurish / Text Data Visualization & Codec Transparency

  5. Full Disclosure
     - I am a computational linguist
       - tinker of algorithms
       - tweaker of data structures
       - not a philosopher (... but I played one as an undergraduate)
     - ... I am also an incorrigible Platonist
       - ∃x. x = ∅
       - formal (mathematical) objects really exist!
       - good company: [figure not preserved]
     - Please adjust your interpretative apparatus if and where required
       - to accommodate my bottomless naïveté, and/or
       - according to your own epistemological commitments (or lack thereof)

  6. Terminology
     Visualization
     - an algorithmic procedure by which an underlying data source is transformed to graphical form for direct human consumption
     - e.g. as a network graph, tag cloud, motion chart, etc.
     Text Data
     - a (digital) text corpus, possibly including extralinguistic information such as bibliographic meta-data, document structure, etc.
     Text Data Visualization
     - a visualization procedure using a (digital) text corpus as its underlying data source (usually indirectly)
     Visualization Pipeline
     - a cascade of algorithmic procedures by which (raw) text data is prepared for and formatted by a particular visualization procedure, including any preprocessing and application-specific modeling

  7. Remark 1: Pipelines versus Procedures
     Facts
     - raw text data itself does not directly support most visualization procedures
     - each visualization procedure imposes formal constraints on its parameters
     Claim
     - (preprocessing) pipelines ⊥̸ (visualization) procedures (the two are not independent)
     - “generic” visualization procedures cannot be clearly distinguished from the preprocessing machinery (“pipeline”) which supplies their input
     Rhetoric
     - Q: how does one visualize a flat list of unweighted terms as a network graph?
       A: one doesn’t! (at least not in any meaningful way)
     - Q: why is Mike Bostock’s D3.js API so mind-bogglingly complex?
       A: because it needs to be! (“generic” visualization procedures are fictional)
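The first question in Remark 1 can be made concrete with a minimal Python sketch. It assumes a toy corpus of tokenized sentences and a hypothetical helper `cooccurrence_edges` (not from the talk) that derives the weighted edge list a network-graph procedure would need. Counting sentence-level co-occurrences is only one illustrative modeling choice among many.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, min_weight=1):
    """Derive weighted edges (term_a, term_b, weight) from tokenized sentences.

    Hypothetical preprocessing step: the weight is the number of sentences
    in which both terms occur.
    """
    counts = Counter()
    for tokens in sentences:
        # count each unordered pair of distinct terms once per sentence
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return [(a, b, w) for (a, b), w in sorted(counts.items()) if w >= min_weight]


# toy corpus (assumption): three tokenized sentences
sentences = [
    ["text", "data", "visualization"],
    ["text", "corpus"],
    ["data", "visualization", "pipeline"],
]
edges = cooccurrence_edges(sentences, min_weight=2)

# a flat term list carries no co-occurrence contexts, so no edges can be derived
flat_terms = ["text", "data", "corpus"]
no_edges = cooccurrence_edges([[term] for term in flat_terms])
```

The graph procedure's input constraint (weighted pairs) is satisfied only by the preprocessing step, which is the sense in which pipeline and procedure cannot be cleanly separated.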

  11. Remark 2: Visualizations ∼ Filters
     [Figure: Shannon's communication diagram. Information Source (Text) → Message → Transmitter/Encoder (Preprocessing) → Signal → Received Signal → Receiver/Decoder (Visualization) → Message → Destination (User's Eye); a Noise Source acts on the signal.]
     - noisy channel model of communication (Shannon 1948)
       - “codec” = encoder ⊕ decoder
     - text data visualization codec (naïve tinker’s version)
       - → not the whole story!

  14. Remark 2: Visualizations ∼ Filters
     - noisy channel model (Shannon 1948)
       - “codec” = encoder ⊕ decoder
     - natural language is a lossy codec (Reddy 1979)
     - text data visualization is a (lossy) filter
       - → what about the decoder?

  15. Remark 2: Visualizations ∼ Filters
     [Figure: extended channel diagram. Information Source (Author) → Transmitter (Encoder+Filter: NLG, TxtVis) → Receiver (Filter+Decoder: optical intake, interp.) → Destination (User); Noise Source (Lossy Compression).]
     - noisy channel model (Shannon 1948)
       - “codec” = encoder ⊕ decoder
     - natural language is a lossy codec (Reddy 1979)
     - text data visualization is a (lossy) filter (transmission side)
     - reception (interpretation) is filtered too!
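A minimal Python sketch of Remark 2's two-sided filtering follows. The function names `encode` and `decode`, the top-k frequency cutoff, and the "known terms" model of the recipient are all illustrative assumptions, not the talk's formalization; they only show that what arrives is the result of composing a transmission-side filter with a reception-side one.

```python
from collections import Counter


def encode(tokens, k):
    """Transmission-side filter (assumed): keep only the k most frequent terms."""
    ranked = sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def decode(display, known_terms):
    """Reception-side filter (assumed): the user takes in only familiar terms."""
    return {term: freq for term, freq in display.items() if term in known_terms}


tokens = "the codec filters the text and the user filters the codec".split()

# what the visualization shows after the pipeline's lossy filter
display = encode(tokens, k=3)

# what the user actually takes away after their own filter
received = decode(display, known_terms={"codec", "text", "user"})
```

Here "text" and "user" are lost on the transmission side, while "the" and "filters" are lost on the reception side; only "codec" survives both.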

  16. Remark 3: Lossiness & ‘Universal’ Filters
     Visualization Pipelines → Lossy Compression
     - information is lost when messages are passed through the codec
       - usually by design (we already have the text-encoding)
       - no lossless formal model of natural language available (yet)
     ‘Universal’ Filters
     - as humans, we’re already equipped with a whole bevy of (lossy) filters:
       - linguistic (minimal attachment, semantic priming)
       - perceptual (motion detection, color sensitivity)
       - cognitive (object independence, causal relations)
       - cultural (common knowledge, conventional signs)
     Lossiness ∼ ‘Distance’
     - lossy filters increase “reading distance” (Moretti 2013)
     - the communication channel was already fallible
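Lossiness "by design" can be illustrated with a small Python sketch. It assumes a bag-of-words frequency profile as the pipeline's intermediate representation (the helper `bag_of_words` is hypothetical); this is a common preprocessing choice, not one the talk prescribes.

```python
from collections import Counter


def bag_of_words(text):
    """Reduce a text to term frequencies, deliberately discarding word order."""
    return Counter(text.lower().split())


a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# two different messages collapse onto the same representation,
# so the original text cannot be recovered from the profile alone
same_profile = (a == b)
```

Because distinct inputs map to one output, the encoding is not invertible, which is exactly what makes it a lossy rather than a lossless compression.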

  17. Remark 4: ‘Intuitivity’ ∼ Exploitation
     ‘Intuitivity’
     - ‘intuitive’ visualizations exploit users’ pre-existing (‘universal’) filters
       - perceptual → size, motion, color
       - cognitive → physical simulations, display “objects”
       - cultural → shared conventional signs
     - reduced recipient processing load
       - “progressive disclosure” → conscious focus
     Exploitation & Coherence
     - successful exploitation ⇔ coherence of pipeline- & user-filters
       - all and only relevant information passes unchanged through both codecs
       - relevance depends on user’s individual research question
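One possible reading of "all and only relevant information passes" is in terms of precision and recall over sets of items. The Python sketch below adopts that reading as an assumption; the function `coherence` and the example sets are hypothetical and are not taken from the talk.

```python
def coherence(passed, relevant):
    """Return (precision, recall) of the passed items relative to the relevant ones.

    "only relevant" corresponds to precision == 1.0,
    "all relevant" corresponds to recall == 1.0.
    """
    passed, relevant = set(passed), set(relevant)
    hits = passed & relevant
    precision = len(hits) / len(passed) if passed else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# assumed research question: these three items matter to the user
relevant = {"codec", "filter", "loss"}

# assumed outcome: what survives both the pipeline's and the user's filters
shown = {"codec", "filter", "the"}

precision, recall = coherence(shown, relevant)
ideal = coherence(relevant, relevant)
```

Under this reading, the filters are fully coherent only when both values equal 1.0, and any change in the research question changes `relevant` and therefore the score.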

  18. Remark 5: Co-operation → Transparency
     Co-operation
     - “Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” (Grice 1975)
     Codec Transparency
     - no perceptible data loss (e.g. mp3, ogg audio codecs)
     - visualization → no apprehensible (relevant) data loss
     Visualization as (co-operative) Communication
     - Task: maximize transparency → optimize for users’ common research goals
     - Challenges:
       - research goals vary widely between users, projects
       - commonalities can be hard to identify and formally model

  19. Summary
     Visualization Procedures
     - non-modular, interface constraints (preprocessing pipelines)
     Visualization Pipelines
     - noisy-channel filters (lossy, usually by design)
     ‘Universal’ Filters
     - recipient-internal (perceptual, cognitive, cultural)
     ‘Intuitivity’
     - exploitation of recipient filters (relevance, coherence)
     Co-operative Communication
     - maximize codec transparency (minimize apprehensible loss)

  20. The End
     [Figure: plot of German terms (e.g. schön, lieb, freundlich, herzlich, danken) with a numeric scale from 0.0 to 6.0]
     Thank you for listening!
     http://kaskade.dwds.de/~jurish/visihu2017/danke
